diff --git a/README.md b/README.md
index 720e278..eeac9e3 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,60 @@
-# Severed Infra: Cloud-Native Home Lab
+# Severed Blog

-This repository contains the Infrastructure-as-Code (IaC) and manifest definitions for **Severed**, a modern blog and
-observability stack running on Kubernetes (K3d).
+## Introduction
+
+We are taking a simple static website, the **Severed Blog**, and engineering a production-grade infrastructure around it.
+
+Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that
+application alive, scalable, and observable.
+
+In this project, we use **K3d** (Kubernetes in Docker) to mimic a real cloud environment locally. Beyond simple
+deployment, we implement:
+
+* **High Availability:** Running multiple replicas so the site stays up even if individual pods fail.
+* **Auto-Scaling:** Automatically detecting traffic spikes (RPS) and launching new pods.
+* **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the
+  cluster.
+* **Persistence:** Dynamic storage provisioning for databases using OpenEBS.

## Architecture

-The stack is designed to mimic a real-world AWS/Cloud environment but optimized for local development using **K3d** (k3s
-in Docker).
-
-**The Stack:**
+The stack is designed to represent a modern, cloud-native environment.

* **Cluster:** K3d (Lightweight Kubernetes).
-* **Ingress Controller:** Traefik (Routing `*.localhost` domains).
-* **Application:** Jekyll Static Site served via Nginx (ConfigMap injected).
-* **Observability (LGTM Stack):**
-* **L**oki (Logs).
-* **G**rafana (Visualizations & Dashboards-as-Code).
-* **T**empo (Tracing - *Planned*).
-* **M**onitoring / Prometheus (Metrics).
-* **Agent:** Grafana Alloy (OpenTelemetry Collector) running as a DaemonSet.
+* **Ingress:** Traefik (Routing `*.localhost` domains).
+* **Storage:** OpenEBS (Local PV provisioner for Prometheus/Loki persistence).
+* **Application:**
+    * **Workload:** Nginx serving static assets.
+    * **Sidecar:** Prometheus exporter that exposes Nginx metrics for scraping.
+    * **Scaling:** HPA driven by Custom Metrics (Requests Per Second).
+* **Observability (LGTM):**
+    * **Loki:** Log Aggregation.
+    * **Prometheus:** Metric Storage (Scraping Kube State Metrics & Application Sidecars).
+    * **Grafana:** Stateless UI with dashboards-as-code.
+    * **Alloy:** OpenTelemetry Collector running as a DaemonSet.
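+
+For reference, a minimal sketch of the shape `apps/severed-blog.yaml` takes — one pod running Nginx plus the metrics
+sidecar. Image tags, exporter flags, and the `stub_status` path below are illustrative assumptions, not the actual manifest:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: severed-blog
+  namespace: severed-apps
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: severed-blog
+  template:
+    metadata:
+      labels:
+        app: severed-blog          # promoted into searchable Loki labels by Alloy
+    spec:
+      containers:
+        - name: web
+          image: nginx:alpine      # serves the static blog assets
+          ports:
+            - containerPort: 80
+          volumeMounts:
+            - name: nginx-config   # config injected from the ConfigMap, no image rebuild needed
+              mountPath: /etc/nginx/conf.d
+        - name: exporter
+          image: nginx/nginx-prometheus-exporter:1.1.0
+          args:
+            - --nginx.scrape-uri=http://127.0.0.1:80/stub_status   # assumes stub_status is enabled in the Nginx config
+          ports:
+            - containerPort: 9113  # scraped by Prometheus; exposes counters such as nginx_http_requests_total
+      volumes:
+        - name: nginx-config
+          configMap:
+            name: severed-blog-config
+```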

## Repository Structure

```text
Severed-Infra/
├── apps/                            # Application Manifests
-│   ├── severed-blog.yaml           # Deployment + Service
-│   ├── severed-blog-config.yaml    # Nginx ConfigMap (Decoupled Config)
+│   ├── severed-blog.yaml           # Deployment (Web + Sidecar)
+│   ├── severed-blog-hpa.yaml       # Auto-Scaling Rules (CPU/RAM/RPS)
+│   ├── severed-blog-config.yaml    # Nginx ConfigMap
│   └── severed-ingress.yaml        # Routing Rules (blog.localhost)
├── infra/                           # Infrastructure & Observability
-│   ├── alloy-agent.yaml            # DaemonSet for Metrics/Logs Collection
-│   ├── alloy-env.yaml              # Environment Variables
-│   └── observer/                   # The Observability Stack
-│       ├── loki.yaml               # Log Aggregation
-│       ├── prometheus.yaml         # Metric Storage
-│       ├── grafana.yaml            # Dashboard UI (Stateless)
-│       └── dashboard-json.yaml     # "Cluster Health" Dashboard as Code
-└── namespaces.yaml
+│   ├── alloy-setup.yaml            # DaemonSet for Metrics/Logs Collection
+│   ├── observer/                   # The Observability Stack
+│   │   ├── loki.yaml               # Log Database
+│   │   ├── prometheus.yaml         # Metric Database
+│   │   ├── adapter-values.yaml     # Custom Metrics Rules (Prometheus Adapter)
+│   │   └── grafana.yaml            # Dashboard UI
+│   └── storage/                    # StorageClass Definitions
+└── scripts/                        # Automation
+    ├── deploy-all.sh               # One-click deployment
+    └── tests/                      # Stress testing tools (Apache Bench)
```
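+
+The `infra/storage/` classes are what let Prometheus and Loki claim dynamically provisioned local volumes. A typical
+OpenEBS LocalPV (hostpath) StorageClass looks roughly like the sketch below — the class name and base path are illustrative:
+
+```yaml
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: openebs-hostpath             # referenced by the Prometheus/Loki PVCs
+  annotations:
+    openebs.io/cas-type: local
+    cas.openebs.io/config: |
+      - name: StorageType
+        value: hostpath
+      - name: BasePath
+        value: /var/openebs/local    # where the data lands on the K3d node
+provisioner: openebs.io/local
+reclaimPolicy: Delete
+volumeBindingMode: WaitForFirstConsumer
+```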

## Quick Start

@@ -46,88 +64,71 @@ Severed-Infra/
### 1. Prerequisites

Ensure you have the following installed:

* [Docker Desktop](https://www.docker.com/)
-* [K3d](https://k3d.io/) (`brew install k3d`)
-* `kubectl` (`brew install kubectl`)
+* [K3d](https://k3d.io/)
+* `kubectl`
+* `helm` (Required for Kube State Metrics and Prometheus Adapter)

-### 2. Boot the Cluster
+### 2. Deploy

-Create a cluster with port mapping for the Ingress controller:
+We have automated the bootstrap process. The `deploy-all.sh` script handles cluster creation, Helm chart installation,
+and manifest application.

```bash
-k3d cluster create severed-cluster -p "8080:80@loadbalancer"
+cd scripts
+./deploy-all.sh
```

-### 3. Deploy Infrastructure (Observability)
+### 3. Verify

-Spin up the database backends (Loki/Prometheus) and the UI (Grafana).
+Once the script completes, check the status of your pods:

```bash
-# 1. Create the secret for Grafana Admin
-kubectl create secret generic grafana-secrets \
-  --namespace monitoring \
-  --from-literal=admin-user=admin \
-  --from-literal=admin-password=severed_secure_password
-
-# 2. Deploy the stack
-kubectl apply -f infra/observer/
-
-# 3. Deploy the Collector Agent (Alloy)
-kubectl apply -f infra/alloy-setup.yaml
-```
-
-### 4. Deploy the Application
-
-Deploy the blog and its routing rules.
-
-```bash
-kubectl apply -f apps/
+kubectl get pods -A
```
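+
+The Prometheus Adapter installed by the deploy script is configured through `infra/observer/adapter-values.yaml`; it is
+the piece that turns the sidecar's Nginx counter into a per-pod custom metric for the HPA. A minimal sketch of such a
+rule — the series, label names, and rate window here are assumptions:
+
+```yaml
+# Hypothetical excerpt of adapter-values.yaml (prometheus-community/prometheus-adapter chart values)
+rules:
+  custom:
+    - seriesQuery: 'nginx_http_requests_total{namespace!="",pod!=""}'
+      resources:
+        overrides:
+          namespace: { resource: "namespace" }
+          pod: { resource: "pod" }
+      name:
+        matches: "nginx_http_requests_total"
+        as: "nginx_http_requests_total"   # exposed to the HPA under the same name, computed as a rate (RPS)
+      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'
+```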

## Access Points

-| Service           | URL                                                             | Credentials / Notes                                                                                                            |
-|-------------------|-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
-| **Severed Blog**  | [http://blog.localhost:8080](http://blog.localhost:8080)       | Public Access                                                                                                                  |
-| **Grafana**       | [http://grafana.localhost:8080](http://grafana.localhost:8080) | **User:** `admin` / **Pass:** `severed_secure_password` <br> *(Anonymous View Access Enabled)*                                 |
-| **Prometheus**    | *Internal Only*                                                 | Accessed via Alloy/Grafana                                                                                                     |
-| **Loki**          | *Internal Only*                                                 | Accessed via Alloy/Grafana                                                                                                     |
-| **K8s Dashboard** | `https://localhost:8443`                                        | **Auth:** Token-based. Access via `kubectl port-forward svc/kubernetes-dashboard-kong-proxy 8443:443 -n kubernetes-dashboard`. |
+| Service           | URL                                                             | Credentials                              |
+|-------------------|-----------------------------------------------------------------|------------------------------------------|
+| **Severed Blog**  | [http://blog.localhost:8080](http://blog.localhost:8080)       | Public                                   |
+| **Grafana**       | [http://grafana.localhost:8080](http://grafana.localhost:8080) | **User:** `admin` <br> **Pass:** `admin` |
+| **K8s Dashboard** | `https://localhost:8443`                                        | Requires Token (See below)               |
-## Observability Features
+To retrieve the K8s Dashboard Admin Token:
-This stack uses **Grafana Alloy** to automatically scrape metrics and tail logs from all pods.
+```bash
+kubectl -n kubernetes-dashboard get secret admin-user-token -o jsonpath="{.data.token}" | base64 -d
+```
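+
+The `*.localhost` hosts in the table are routed by Traefik through `apps/severed-ingress.yaml` (the `:8080` in the URLs
+comes from the K3d load-balancer port mapping `8080:80`). A minimal sketch of such an Ingress — the service name is an
+assumption:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: severed-ingress
+  namespace: severed-apps
+spec:
+  ingressClassName: traefik          # Traefik ships as the default ingress class in K3s/K3d
+  rules:
+    - host: blog.localhost
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: severed-blog   # ClusterIP service in front of the Nginx pods
+                port:
+                  number: 80
+```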
-* **Cluster Health Dashboard:** Pre-provisioned "Infrastructure as Code." No manual setup required.
-* Real-time **CPU/Memory** usage per node.
-* **Disk Usage** monitoring (filtering out overlay/tmpfs noise).
-* **RPS & Error Rates** derived directly from Nginx logs using LogQL.
-* **Log Relabeling:** Alloy automatically promotes hidden K8s metadata (like `app=severed-blog`) into searchable Loki
-  labels.
+## Highlights
-## Engineering Decisions
+### Auto-Scaling (HPA)
-* **ConfigMaps vs. Rebuilds:** The Nginx configuration is injected via a ConfigMap (`apps/blog-config.yaml`). We can
-  tweak caching headers or routing rules without rebuilding the Docker image.
-* **Host Networking Fix:** Alloy runs with `hostNetwork: true` to scrape node metrics but uses
-  `dnsPolicy: ClusterFirstWithHostNet` to ensure it can still resolve internal K8s services (`loki.monitoring.svc`).
-* **Security:** Grafana admin credentials are stored in Kubernetes Secrets, not plaintext YAML. Anonymous access is
-  restricted to `Viewer` role only.
+We implemented a custom **Horizontal Pod Autoscaler**.
----
+* **Metrics:** `nginx_http_requests_total`, CPU usage, RAM usage.
+* **Pipeline:** Sidecar Exporter -> Prometheus -> Prometheus Adapter -> Custom Metrics API -> HPA Controller.
+* **Behavior:** Scales up by at most 1 pod every 15s to prevent thrashing; stabilizes for 30s before scaling down.
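+
+Concretely, that behavior maps onto an `autoscaling/v2` manifest. A sketch of what `apps/severed-blog-hpa.yaml` might
+contain — the utilization and RPS targets are assumptions; the replica range follows the 2 -> 6 used in the Testing section:
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: severed-blog
+  namespace: severed-apps
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: severed-blog
+  minReplicas: 2
+  maxReplicas: 6
+  metrics:
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: 70           # assumed CPU target
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: Utilization
+          averageUtilization: 80           # assumed RAM target
+    - type: Pods
+      pods:
+        metric:
+          name: nginx_http_requests_total  # served by the Prometheus Adapter rule
+        target:
+          type: AverageValue
+          averageValue: "50"               # assumed RPS per pod before scaling out
+  behavior:
+    scaleUp:
+      policies:
+        - type: Pods
+          value: 1                         # at most 1 new pod...
+          periodSeconds: 15                # ...every 15 seconds
+    scaleDown:
+      stabilizationWindowSeconds: 30       # wait 30s of calm before scaling down
+```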
-### Future Roadmap
+### Observability
-* [ ] Add Cert-Manager for TLS (HTTPS).
-* [ ] Implement ArgoCD for automated GitOps syncing.
-* [ ] Move to a physical Home Server.
+* **Dashboards-as-Code:** Grafana dashboards are injected via ConfigMaps. If the pod restarts, the dashboards persist.
+* **Log Correlation:** Alloy enriches logs with Kubernetes metadata (Namespace, Pod Name), allowing us to filter logs by
+  `app=severed-blog` instead of container IDs.
----
+## Testing
-### todos/bugfixes
+To verify the auto-scaling capabilities, run the stress-test script. It uses Apache Bench (`ab`) to generate a heavy
+burst of concurrent requests.
-* **[ ] Automate Dashboard Auth:** Rotate/retrieve the `admin-user` token to avoid manually `create token` every session.
-* **[ ] External Secret Management:** Replace generic secrets with HashiCorp Vault to encrypt `grafana-secrets` and dashboard tokens.
-* **[ ] Ingress Hardening:** Resolve the `localhost` 401 loop using **Cert-Manager** with self-signed certificates, which allows Kong to see valid HTTPS traffic and accept session cookies natively.
-* **[ ] Persistence Layer:** Deploy a **Local Path Provisioner** or **HostPath** storage class for Loki and Prometheus
-  so that metrics and dashboard configurations survive a `k3d cluster stop`.
-* **[ ] Resource Quotas:** Define `resources: requests/limits` for the LGTM stack.
+```bash
+# Triggers the HPA to scale from 2 -> 6 replicas
+cd scripts/tests
+./stress-blog.sh
+```
+
+Watch the scaling happen in real time:
+
+```bash
+kubectl get hpa -n severed-apps -w
+```