# Severed Blog
This repository contains the Infrastructure-as-Code (IaC) and manifest definitions for **Severed**, a modern blog and
observability stack running on Kubernetes (K3d).
## Introduction
We are taking a simple static website, the **Severed Blog**, and engineering a production-grade infrastructure around
it.
Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that
application alive, scalable, and observable.
In this project, we utilize **K3d** (Kubernetes in Docker) to mimic a real cloud environment locally. Beyond simple
deployment, we implement:
* **High Availability:** Running multiple replicas so the site never goes down.
* **Auto-Scaling:** Automatically detecting traffic spikes (RPS) and launching new pods.
* **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the
cluster.
* **Persistence:** Dynamic storage provisioning for databases using OpenEBS.
## Architecture
The stack is designed to represent a modern, cloud-native environment.
* **Cluster:** K3d (Lightweight Kubernetes).
* **Ingress:** Traefik (Routing `*.localhost` domains).
* **Storage:** OpenEBS (Local PV provisioner for Prometheus/Loki persistence).
* **Application:**
    * **Workload:** Nginx serving static assets.
    * **Sidecar:** Prometheus Exporter for scraping Nginx metrics (a minimal manifest sketch follows this list).
    * **Scaling:** HPA driven by Custom Metrics (Requests Per Second).
* **Observability (LGTM):**
    * **Loki:** Log Aggregation.
    * **Prometheus:** Metric Storage (Scraping Kube State Metrics & Application Sidecars).
    * **Grafana:** Stateless UI with dashboards-as-code.
    * **Alloy:** OpenTelemetry Collector running as a DaemonSet.
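To make the web + sidecar pattern concrete, here is a minimal sketch of what such a Deployment could look like. It is illustrative only: the image tags, exporter flags, and `stub_status` endpoint are assumptions, not necessarily what `apps/severed-blog.yaml` actually contains.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  replicas: 2                      # baseline; the HPA scales this up under load
  selector:
    matchLabels:
      app: severed-blog
  template:
    metadata:
      labels:
        app: severed-blog          # the label Alloy/Loki use to correlate logs
    spec:
      containers:
        - name: web                # Nginx serving the static blog assets
          image: nginx:1.27-alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-conf     # config injected from the ConfigMap, not baked into the image
              mountPath: /etc/nginx/conf.d
        - name: metrics-exporter   # sidecar translating Nginx stub_status into Prometheus metrics
          image: nginx/nginx-prometheus-exporter:1.3.0
          args:
            - --nginx.scrape-uri=http://127.0.0.1:80/stub_status
          ports:
            - containerPort: 9113  # scraped by Prometheus
      volumes:
        - name: nginx-conf
          configMap:
            name: severed-blog-config
```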
## Repository Structure
```text
Severed-Infra/
├── apps/                         # Application Manifests
│   ├── severed-blog.yaml         # Deployment (Web + Sidecar)
│   ├── severed-blog-hpa.yaml     # Auto-Scaling Rules (CPU/RAM/RPS)
│   ├── severed-blog-config.yaml  # Nginx ConfigMap
│   └── severed-ingress.yaml      # Routing Rules (blog.localhost)
├── infra/                        # Infrastructure & Observability
│   ├── namespaces.yaml
│   ├── alloy-setup.yaml          # DaemonSet for Metrics/Logs Collection
│   ├── observer/                 # The Observability Stack
│   │   ├── loki.yaml             # Log Database
│   │   ├── prometheus.yaml       # Metric Database
│   │   ├── adapter-values.yaml   # Custom Metrics Rules (Prometheus Adapter)
│   │   └── grafana.yaml          # Dashboard UI
│   └── storage/                  # StorageClass Definitions
└── scripts/                      # Automation
    ├── deploy-all.sh             # One-click deployment
    └── tests/                    # Stress testing tools (Apache Bench)
```
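The `storage/` classes back the Loki and Prometheus volumes. As a rough illustration, an OpenEBS LocalPV (hostpath) StorageClass might look like the following; the name and reclaim policy are assumptions rather than the repo's exact definition:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath                   # assumed name; see infra/storage/ for the real one
provisioner: openebs.io/local              # OpenEBS LocalPV provisioner
volumeBindingMode: WaitForFirstConsumer    # provision only once a pod is scheduled to a node
reclaimPolicy: Delete
```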
## Quick Start
### 1. Prerequisites
Ensure you have the following installed:
* [Docker Desktop](https://www.docker.com/)
* [K3d](https://k3d.io/)
* `kubectl`
* `helm` (Required for Kube State Metrics and Prometheus Adapter)
### 2. Deploy
We have automated the bootstrap process. The `deploy-all.sh` script handles cluster creation, Helm chart installation,
and manifest application.
```bash
cd scripts
./deploy-all.sh
```
### 3. Verify
Once the script completes, check the status of your pods:
```bash
kubectl get pods -A
```
## Access Points
| Service           | URL                                                             | Credentials                           |
|-------------------|-----------------------------------------------------------------|---------------------------------------|
| **Severed Blog**  | [http://blog.localhost:8080](http://blog.localhost:8080)       | Public                                |
| **Grafana**       | [http://grafana.localhost:8080](http://grafana.localhost:8080) | **User:** `admin` / **Pass:** `admin` |
| **K8s Dashboard** | `https://localhost:8443`                                        | Requires Token (see below)            |
To retrieve the K8s Dashboard Admin Token:
```bash
kubectl -n kubernetes-dashboard get secret admin-user-token -o jsonpath='{.data.token}' | base64 -d
```
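The command above assumes an `admin-user` ServiceAccount with a long-lived token Secret. For reference, a minimal sketch of the objects involved; the names are chosen to match the command, and your cluster may define them differently:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                         # full access; fine for a local lab, not production
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-user-token                      # the Secret read by the command above
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: admin-user
type: kubernetes.io/service-account-token    # the token controller fills in .data.token
```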
## Engineering Decisions
### Auto-Scaling (HPA)
We implemented a custom **Horizontal Pod Autoscaler**.
* **Metrics:** `nginx_http_requests_total`, CPU usage, RAM usage.
* **Pipeline:** Sidecar Exporter -> Prometheus -> Prometheus Adapter -> Custom Metrics API -> HPA Controller.
* **Behavior:** Scales up by at most 1 pod every 15s to prevent thrashing; stabilizes for 30s before scaling down (see the sketch below).
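A rough sketch of how those rules could be expressed in `apps/severed-blog-hpa.yaml`. The replica bounds follow the 2 -> 6 range exercised by the stress test below; the metric name comes from the sidecar exporter, but the per-pod target value is an assumption (the real thresholds live in the repo's HPA manifest and `adapter-values.yaml`):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: severed-blog
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Pods
      pods:
        metric:
          name: nginx_http_requests_total   # exposed via the Prometheus Adapter
                                            # (in practice usually re-published as a per-second rate)
        target:
          type: AverageValue
          averageValue: "50"                # assumed RPS threshold per pod
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 1                          # at most 1 new pod...
          periodSeconds: 15                 # ...every 15s, to prevent thrashing
    scaleDown:
      stabilizationWindowSeconds: 30        # wait 30s of calm before scaling down
```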
### Observability
* **Dashboards-as-Code:** Grafana dashboards are injected via ConfigMaps. If the pod restarts, the dashboards persist (see the sketch after this list).
* **Log Correlation:** Alloy enriches logs with Kubernetes metadata (Namespace, Pod Name), allowing us to filter logs by
`app=severed-blog` instead of container IDs.
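A minimal sketch of the dashboards-as-code approach, assuming Grafana is configured to discover dashboards from labelled ConfigMaps (the `grafana_dashboard` label is a common sidecar convention; the namespace and file name here are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-health-dashboard
  namespace: monitoring              # assumed namespace for the observability stack
  labels:
    grafana_dashboard: "1"           # convention watched by Grafana dashboard sidecars
data:
  cluster-health.json: |
    {
      "title": "Cluster Health",
      "panels": []
    }
```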
---
## Testing
To verify the auto-scaling capabilities, run the stress test script. This uses Apache Bench (`ab`) to generate massive
concurrency.
```bash
# Triggers the HPA to scale from 2 -> 6 replicas
cd scripts/tests
./stress-blog.sh
```
Watch the scaling happen in real-time:
```bash
kubectl get hpa -n severed-apps -w
```