# Severed Blog
This repository contains the Infrastructure-as-Code (IaC) and manifest definitions for **Severed**, a modern blog and
observability stack running on Kubernetes (K3d).
## Introduction
We are taking a simple static website, the **Severed Blog**, and engineering a production-grade infrastructure around
it.
Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that
application alive, scalable, and observable.
In this project, we utilize **K3d** (Kubernetes in Docker) to mimic a real cloud environment locally. Beyond simple
deployment, we implement:
* **High Availability:** Running multiple replicas so the site never goes down.
* **Auto-Scaling:** Automatically detecting traffic spikes (RPS) and launching new pods.
* **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the
cluster.
* **Persistence:** Dynamic storage provisioning for databases using OpenEBS.
## Architecture
The stack is designed to represent a modern, cloud-native environment.
* **Cluster:** K3d (Lightweight Kubernetes).
* **Ingress:** Traefik (Routing `*.localhost` domains).
* **Storage:** OpenEBS (Local PV provisioner for Prometheus/Loki persistence).
* **Application:**
    * **Workload:** Nginx serving static assets.
    * **Sidecar:** Prometheus Exporter for scraping Nginx metrics (a minimal manifest sketch follows this list).
    * **Scaling:** HPA driven by Custom Metrics (Requests Per Second).
* **Observability (LGTM):**
    * **Loki:** Log Aggregation.
    * **Prometheus:** Metric Storage (Scraping Kube State Metrics & Application Sidecars).
    * **Grafana:** Stateless UI with dashboards-as-code.
    * **Alloy:** OpenTelemetry Collector running as a DaemonSet.
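To make the web + sidecar pattern concrete, here is a minimal sketch of what such a Deployment could look like. It is illustrative only: the image tags, exporter flags, and `stub_status` endpoint are assumptions, not necessarily what `apps/severed-blog.yaml` actually contains.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  replicas: 2                      # baseline; the HPA scales this up under load
  selector:
    matchLabels:
      app: severed-blog
  template:
    metadata:
      labels:
        app: severed-blog          # the label Alloy/Loki use to correlate logs
    spec:
      containers:
        - name: web                # Nginx serving the static blog assets
          image: nginx:1.27-alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-conf     # config injected from the ConfigMap, not baked into the image
              mountPath: /etc/nginx/conf.d
        - name: metrics-exporter   # sidecar translating Nginx stub_status into Prometheus metrics
          image: nginx/nginx-prometheus-exporter:1.3.0
          args:
            - --nginx.scrape-uri=http://127.0.0.1:80/stub_status
          ports:
            - containerPort: 9113  # scraped by Prometheus
      volumes:
        - name: nginx-conf
          configMap:
            name: severed-blog-config
```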
## Repository Structure
```text
Severed-Infra/
├── apps/                         # Application Manifests
│   ├── severed-blog.yaml         # Deployment (Web + Sidecar)
│   ├── severed-blog-hpa.yaml     # Auto-Scaling Rules (CPU/RAM/RPS)
│   ├── severed-blog-config.yaml  # Nginx ConfigMap
│   └── severed-ingress.yaml      # Routing Rules (blog.localhost)
├── infra/                        # Infrastructure & Observability
│   ├── namespaces.yaml
│   ├── alloy-setup.yaml          # DaemonSet for Metrics/Logs Collection
│   ├── observer/                 # The Observability Stack
│   │   ├── loki.yaml             # Log Database
│   │   ├── prometheus.yaml       # Metric Database
│   │   ├── adapter-values.yaml   # Custom Metrics Rules (Prometheus Adapter)
│   │   └── grafana.yaml          # Dashboard UI
│   └── storage/                  # StorageClass Definitions
└── scripts/                      # Automation
    ├── deploy-all.sh             # One-click deployment
    └── tests/                    # Stress testing tools (Apache Bench)
```
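The `storage/` classes back the Loki and Prometheus volumes. As a rough illustration, an OpenEBS LocalPV (hostpath) StorageClass might look like the following; the name and reclaim policy are assumptions rather than the repo's exact definition:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath                   # assumed name; see infra/storage/ for the real one
provisioner: openebs.io/local              # OpenEBS LocalPV provisioner
volumeBindingMode: WaitForFirstConsumer    # provision only once a pod is scheduled to a node
reclaimPolicy: Delete
```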
## Quick Start
### 1. Prerequisites
Ensure you have the following installed:
* [Docker Desktop](https://www.docker.com/)
* [K3d](https://k3d.io/)
* `kubectl`
* `helm` (Required for Kube State Metrics and Prometheus Adapter)
### 2. Deploy
We have automated the bootstrap process. The `deploy-all.sh` script handles cluster creation, Helm chart installation,
and manifest application.
```bash
cd scripts
./deploy-all.sh
```
### 3. Verify
Once the script completes, check the status of your pods:
```bash
kubectl get pods -A
```
## Access Points
| Service           | URL                                                             | Credentials                           |
|-------------------|-----------------------------------------------------------------|---------------------------------------|
| **Severed Blog**  | [http://blog.localhost:8080](http://blog.localhost:8080)       | Public                                |
| **Grafana**       | [http://grafana.localhost:8080](http://grafana.localhost:8080) | **User:** `admin` / **Pass:** `admin` |
| **K8s Dashboard** | `https://localhost:8443`                                        | Requires Token (see below)            |
To retrieve the K8s Dashboard Admin Token:
```bash
kubectl -n kubernetes-dashboard get secret admin-user-token -o jsonpath='{.data.token}' | base64 -d
```
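The command above assumes an `admin-user` ServiceAccount with a long-lived token Secret. For reference, a minimal sketch of the objects involved; the names are chosen to match the command, and your cluster may define them differently:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                         # full access; fine for a local lab, not production
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-user-token                      # the Secret read by the command above
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: admin-user
type: kubernetes.io/service-account-token    # the token controller fills in .data.token
```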
## Engineering Decisions
### Auto-Scaling (HPA)
We implemented a custom **Horizontal Pod Autoscaler**.
* **Metrics:** `nginx_http_requests_total`, CPU usage, RAM usage.
* **Pipeline:** Sidecar Exporter -> Prometheus -> Prometheus Adapter -> Custom Metrics API -> HPA Controller.
* **Behavior:** Scales up by at most 1 pod every 15s to prevent thrashing; stabilizes for 30s before scaling down (see the sketch below).
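A rough sketch of how those rules could be expressed in `apps/severed-blog-hpa.yaml`. The replica bounds follow the 2 -> 6 range exercised by the stress test below; the metric name comes from the sidecar exporter, but the per-pod target value is an assumption (the real thresholds live in the repo's HPA manifest and `adapter-values.yaml`):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: severed-blog
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Pods
      pods:
        metric:
          name: nginx_http_requests_total   # exposed via the Prometheus Adapter
                                            # (in practice usually re-published as a per-second rate)
        target:
          type: AverageValue
          averageValue: "50"                # assumed RPS threshold per pod
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 1                          # at most 1 new pod...
          periodSeconds: 15                 # ...every 15s, to prevent thrashing
    scaleDown:
      stabilizationWindowSeconds: 30        # wait 30s of calm before scaling down
```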
### Observability
* **Dashboards-as-Code:** Grafana dashboards are injected via ConfigMaps. If the pod restarts, the dashboards persist (see the sketch after this list).
* **Log Correlation:** Alloy enriches logs with Kubernetes metadata (Namespace, Pod Name), allowing us to filter logs by
`app=severed-blog` instead of container IDs.
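A minimal sketch of the dashboards-as-code approach, assuming Grafana is configured to discover dashboards from labelled ConfigMaps (the `grafana_dashboard` label is a common sidecar convention; the namespace and file name here are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-health-dashboard
  namespace: monitoring              # assumed namespace for the observability stack
  labels:
    grafana_dashboard: "1"           # convention watched by Grafana dashboard sidecars
data:
  cluster-health.json: |
    {
      "title": "Cluster Health",
      "panels": []
    }
```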
---
## Testing
To verify the auto-scaling capabilities, run the stress test script. This uses Apache Bench (`ab`) to generate massive
concurrency.
```bash
# Triggers the HPA to scale from 2 -> 6 replicas
cd scripts/tests
./stress-blog.sh
```
Watch the scaling happen in real-time:
```bash
kubectl get hpa -n severed-apps -w
```