readme update

# Severed Infra: Cloud-Native Home Lab

This repository contains the Infrastructure-as-Code (IaC) and manifest definitions for **Severed**, a modern blog and observability stack running on Kubernetes (K3d).

## Introduction

We are taking a simple static website, the **Severed Blog**, and engineering a production-grade infrastructure around it.

Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that application alive, scalable, and observable.

In this project, we use **K3d** (Kubernetes in Docker) to mimic a real cloud environment locally. Beyond simple deployment, we implement:

* **High Availability:** Running multiple replicas so the site never goes down.
* **Auto-Scaling:** Automatically detecting traffic spikes (RPS) and launching new pods.
* **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the cluster.
* **Persistence:** Dynamic storage provisioning for databases using OpenEBS.

## Architecture

The stack is designed to represent a modern, cloud-native environment: it mimics a real-world AWS-style setup while staying optimized for local development with **K3d** (k3s in Docker).

**The Stack:**

* **Cluster:** K3d (lightweight Kubernetes).
* **Ingress:** Traefik (routes the `*.localhost` domains).
* **Storage:** OpenEBS (local PV provisioner for Prometheus/Loki persistence).
* **Application** (sketched in the example below):
  * **Workload:** Nginx serving the Jekyll static site, with its configuration injected via ConfigMap.
  * **Sidecar:** Prometheus exporter scraping Nginx metrics.
  * **Scaling:** HPA driven by custom metrics (requests per second).
* **Observability (LGTM):**
  * **L**oki: Log aggregation.
  * **G**rafana: Stateless UI with dashboards-as-code.
  * **T**empo: Tracing (*planned*).
  * **M**onitoring / Prometheus: Metric storage, scraping Kube State Metrics and the application sidecars.
* **Agent:** Grafana Alloy (OpenTelemetry Collector) running as a DaemonSet.
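
For illustration, the web-plus-sidecar shape described above (Nginx serving the static site next to a Prometheus exporter, with the Nginx config mounted from a ConfigMap) could look roughly like the sketch below. Image tags, ports, and the exporter flag are assumptions; the real manifest lives in `apps/severed-blog.yaml`.

```yaml
# Illustrative sketch only; names, images, and ports are assumptions rather
# than a copy of apps/severed-blog.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: severed-blog
  namespace: severed-apps
  labels:
    app: severed-blog
spec:
  replicas: 2
  selector:
    matchLabels:
      app: severed-blog
  template:
    metadata:
      labels:
        app: severed-blog
    spec:
      containers:
        - name: web
          image: nginx:1.27-alpine          # serves the pre-built Jekyll site
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-conf              # config comes from the ConfigMap, so
              mountPath: /etc/nginx/conf.d  # no image rebuild is needed to change it
        - name: metrics-exporter            # sidecar scraped by Prometheus/Alloy
          image: nginx/nginx-prometheus-exporter:1.3.0
          args:
            - --nginx.scrape-uri=http://127.0.0.1/stub_status   # assumes the ConfigMap enables stub_status
          ports:
            - containerPort: 9113
      volumes:
        - name: nginx-conf
          configMap:
            name: severed-blog-config
```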

## Repository Structure

```text
Severed-Infra/
├── apps/                            # Application Manifests
│   ├── severed-blog.yaml            # Deployment (Web + Sidecar) + Service
│   ├── severed-blog-hpa.yaml        # Auto-Scaling Rules (CPU/RAM/RPS)
│   ├── severed-blog-config.yaml     # Nginx ConfigMap (Decoupled Config)
│   └── severed-ingress.yaml         # Routing Rules (blog.localhost)
├── infra/                           # Infrastructure & Observability
│   ├── alloy-setup.yaml             # Alloy DaemonSet for Metrics/Logs Collection
│   ├── alloy-env.yaml               # Environment Variables
│   ├── observer/                    # The Observability Stack
│   │   ├── loki.yaml                # Log Database
│   │   ├── prometheus.yaml          # Metric Database
│   │   ├── adapter-values.yaml      # Custom Metrics Rules (Prometheus Adapter)
│   │   ├── grafana.yaml             # Dashboard UI (Stateless)
│   │   └── dashboard-json.yaml      # "Cluster Health" Dashboard as Code
│   └── storage/                     # StorageClass Definitions
├── scripts/                         # Automation
│   ├── deploy-all.sh                # One-click deployment
│   └── tests/                       # Stress testing tools (Apache Bench)
└── namespaces.yaml                  # Namespace definitions
```
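
The `infra/storage/` directory holds the StorageClass definitions used for Prometheus/Loki persistence. As a rough sketch only (the class name, base path, and annotations are assumptions, not the repo's actual file), an OpenEBS local hostpath class looks like this:

```yaml
# Illustrative sketch of an OpenEBS Local PV (hostpath) StorageClass;
# the actual definition lives under infra/storage/.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/openebs/local
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # bind only once a pod is scheduled
```

A PVC in `prometheus.yaml` or `loki.yaml` would then simply request `storageClassName: openebs-hostpath`.
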
## Quick Start

### 1. Prerequisites

Ensure you have the following installed:

* [Docker Desktop](https://www.docker.com/)
* [K3d](https://k3d.io/) (`brew install k3d`)
* `kubectl` (`brew install kubectl`)
* `helm` (required for Kube State Metrics and the Prometheus Adapter)

### 2. Deploy

We have automated the bootstrap process. The `deploy-all.sh` script handles cluster creation, Helm chart installation, and manifest application.

```bash
cd scripts
./deploy-all.sh
```

Prefer to run the steps by hand? The manual path looks like this (note that `deploy-all.sh` also installs the Helm charts for Kube State Metrics and the Prometheus Adapter, which you would need to install separately):

```bash
# 1. Boot the cluster with port mapping for the Ingress controller
k3d cluster create severed-cluster -p "8080:80@loadbalancer"

# 2. Create the namespaces
kubectl apply -f namespaces.yaml

# 3. Create the secret for the Grafana admin user
kubectl create secret generic grafana-secrets \
  --namespace monitoring \
  --from-literal=admin-user=admin \
  --from-literal=admin-password=severed_secure_password

# 4. Spin up the database backends (Loki/Prometheus) and the Grafana UI
kubectl apply -f infra/observer/

# 5. Deploy the collector agent (Alloy)
kubectl apply -f infra/alloy-setup.yaml

# 6. Deploy the application: the blog and its routing rules
kubectl apply -f apps/
```

### 3. Verify

Once the script (or the manual steps above) completes, check the status of your pods:

```bash
kubectl get pods -A
```

## Access Points

| Service           | URL                                                             | Credentials / Notes                                                                                                           |
|-------------------|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| **Severed Blog**  | [http://blog.localhost:8080](http://blog.localhost:8080)        | Public access                                                                                                                   |
| **Grafana**       | [http://grafana.localhost:8080](http://grafana.localhost:8080)  | **User:** `admin` / **Pass:** `severed_secure_password` *(anonymous `Viewer` access enabled)*                                   |
| **Prometheus**    | *Internal only*                                                 | Accessed via Alloy/Grafana                                                                                                      |
| **Loki**          | *Internal only*                                                 | Accessed via Alloy/Grafana                                                                                                      |
| **K8s Dashboard** | `https://localhost:8443`                                        | Token-based auth (see below); expose with `kubectl port-forward svc/kubernetes-dashboard-kong-proxy 8443:443 -n kubernetes-dashboard` |

To retrieve the K8s Dashboard admin token:

```bash
kubectl -n kubernetes-dashboard get secret admin-user-token -o jsonpath='{.data.token}' | base64 -d
```
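
Both `blog.localhost` and `grafana.localhost` resolve through the Traefik ingress controller that ships with K3d, exposed on port 8080 by the load-balancer mapping created during cluster bootstrap. As an illustration only (resource names, namespace, and service port are assumptions; the real rules live in `apps/severed-ingress.yaml`), the blog route could be written as a standard Ingress:

```yaml
# Sketch of a Traefik-backed Ingress for the blog; names and ports are
# assumptions, not the contents of apps/severed-ingress.yaml.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  ingressClassName: traefik          # K3d bundles Traefik as the default ingress controller
  rules:
    - host: blog.localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: severed-blog   # ClusterIP service in front of the Nginx pods
                port:
                  number: 80
```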

## Observability Features

This stack uses **Grafana Alloy** to automatically scrape metrics and tail logs from all pods.

* **Cluster Health Dashboard:** Pre-provisioned as Infrastructure-as-Code; no manual setup required. It tracks:
  * Real-time **CPU/Memory** usage per node.
  * **Disk Usage** (filtering out overlay/tmpfs noise).
  * **RPS & Error Rates** derived directly from Nginx logs using LogQL.
* **Dashboards-as-Code:** Grafana dashboards are injected via ConfigMaps (see the sketch below), so they survive pod restarts.
* **Log Correlation & Relabeling:** Alloy enriches logs with Kubernetes metadata (namespace, pod name) and promotes hidden K8s labels (like `app=severed-blog`) into searchable Loki labels, so logs can be filtered by application instead of container ID.
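
A hedged sketch of how a dashboard can ride in a ConfigMap (the real "Cluster Health" definition lives in `infra/observer/dashboard-json.yaml`; the key name, namespace, and dashboard body here are placeholders):

```yaml
# Illustrative only; the actual dashboard JSON is defined in
# infra/observer/dashboard-json.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-health-dashboard
  namespace: monitoring
data:
  cluster-health.json: |
    {
      "title": "Cluster Health",
      "panels": []
    }
```

In this pattern, the Grafana Deployment mounts the ConfigMap into a dashboards directory referenced by a provisioning provider, so a restarted (stateless) pod comes back with identical dashboards.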

## Engineering Decisions

* **ConfigMaps vs. Rebuilds:** The Nginx configuration is injected via a ConfigMap (`apps/severed-blog-config.yaml`). We can tweak caching headers or routing rules without rebuilding the Docker image.
* **Host Networking Fix:** Alloy runs with `hostNetwork: true` to scrape node metrics, but uses `dnsPolicy: ClusterFirstWithHostNet` so it can still resolve internal K8s services (`loki.monitoring.svc`). A sketch of that pod spec follows this list.
* **Security:** Grafana admin credentials are stored in Kubernetes Secrets, not plaintext YAML. Anonymous access is restricted to the `Viewer` role only.
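
A trimmed sketch of the host-networking arrangement (only the fields relevant to the fix; the full DaemonSet with its config volume lives in `infra/alloy-setup.yaml`, and the image tag is an assumption):

```yaml
# Trimmed sketch; only the fields relevant to the host-networking fix are shown.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alloy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: alloy
  template:
    metadata:
      labels:
        app: alloy
    spec:
      hostNetwork: true                    # needed to scrape node-level metrics
      dnsPolicy: ClusterFirstWithHostNet   # keep resolving loki.monitoring.svc while on the host network
      containers:
        - name: alloy
          image: grafana/alloy:v1.4.2      # image tag is an assumption
          args:
            - run
            - /etc/alloy/config.alloy
```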

### Auto-Scaling (HPA)

We implemented a custom **Horizontal Pod Autoscaler** that reacts to real traffic (RPS) as well as CPU and RAM; a hedged manifest sketch follows the list below.

* **Metrics:** `nginx_http_requests_total` (requests per second), CPU usage, and RAM usage.
* **Pipeline:** Sidecar exporter -> Prometheus -> Prometheus Adapter -> Custom Metrics API -> HPA controller.
* **Behavior:** Scales up by at most 1 pod every 15s to prevent thrashing, and waits through a 30s stabilization window before scaling down.
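
A sketch of what `apps/severed-blog-hpa.yaml` expresses. The custom metric name, targets, and replica bounds are assumptions; the scale-up/scale-down behavior mirrors the bullets above.

```yaml
# Illustrative autoscaling/v2 HPA; metric names, targets, and bounds are
# assumptions, not a copy of apps/severed-blog-hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: severed-blog
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods                                   # served by the Prometheus Adapter
      pods:
        metric:
          name: nginx_http_requests_per_second     # derived from nginx_http_requests_total
        target:
          type: AverageValue
          averageValue: "50"
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 1                 # add at most one pod
          periodSeconds: 15        # per 15-second window, to avoid thrashing
    scaleDown:
      stabilizationWindowSeconds: 30   # wait 30s of calm before removing pods
```

Whatever metric name `infra/observer/adapter-values.yaml` registers for the rate of `nginx_http_requests_total` is the name the HPA consumes through the Custom Metrics API.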

### Future Roadmap

* [ ] Add Cert-Manager for TLS (HTTPS).
* [ ] Implement ArgoCD for automated GitOps syncing.
* [ ] Move to a physical home server.

---

## Testing

To verify the auto-scaling capabilities, run the stress test script. It uses Apache Bench (`ab`) to generate massive concurrency against the blog.

```bash
# Triggers the HPA to scale from 2 -> 6 replicas
cd scripts/tests
./stress-blog.sh
```

Watch the scaling happen in real time:

```bash
kubectl get hpa -n severed-apps -w
```

## Todos / Bugfixes

* [ ] **Automate Dashboard Auth:** Rotate/retrieve the `admin-user` token to avoid manually running `create token` every session.
* [ ] **External Secret Management:** Replace generic secrets with HashiCorp Vault to encrypt `grafana-secrets` and the dashboard tokens.
* [ ] **Ingress Hardening:** Resolve the `localhost` 401 loop using **Cert-Manager** with self-signed certificates, which allows Kong to see valid HTTPS traffic and accept session cookies natively.
* [ ] **Persistence Layer:** Deploy a **Local Path Provisioner** or **HostPath** storage class for Loki and Prometheus so that metrics and dashboard configurations survive a `k3d cluster stop`.
* [ ] **Resource Quotas:** Define `resources: requests/limits` for the LGTM stack.