added parts
@@ -1,17 +0,0 @@
---
layout: post
title: Architecture V1 (WIP)
date: 2025-12-27 02:00:00 -0400
categories:
- architectures
---

## Monitoring

```text
.
└── alloy
    ├── config
    │   └── config.alloy
    └── docker-compose.yml
```

_posts/blog_app/2025-12-27-concepts.md (new file)
@@ -0,0 +1,24 @@
---
layout: post
title: 'Kubernetes vs Docker'
date: 2025-12-27 02:00:00 -0400
categories:
- blog_app
---

# Kubernetes Concepts Cheat Sheet

| Object      | Docker Equivalent              | Kubernetes Purpose                                                |
| ----------- | ------------------------------ | ----------------------------------------------------------------- |
| Node        | The Host Machine               | A physical or virtual server in the cluster.                      |
| Pod         | A Container                    | The smallest deployable unit (can contain multiple containers).   |
| Deployments | `docker-compose up`            | Manages the lifecycle and scaling of Pods.                        |
| Services    | Network Aliases                | Provides a stable DNS name/IP for a group of Pods.                |
| HPA         | Auto-Scaling Group             | Automatically scales replicas based on traffic/load.              |
| Ingress     | Nginx Proxy / Traefik          | Manages external access to Services via HTTP/HTTPS.               |
| ConfigMap   | `docker run -v config:/etc...` | Decouples configuration files from the container image.           |
| Secret      | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64.      |
| DaemonSet   | `mode: global` (Swarm)         | Ensures one copy of a Pod runs on every Node (logs/monitoring).   |
| StatefulSet | N/A                            | Manages apps requiring stable identities and storage (Databases). |

[[2025-12-27-part-1]]

_posts/blog_app/2025-12-27-intro.md (new file)
@@ -0,0 +1,27 @@
---
layout: post
title: 'Deploying the Severed Blog'
date: 2025-12-28 02:00:00 -0400
categories:
- blog_app
highlight: true
---

# Introduction

We are taking a simple static website, the **Severed Blog**, and engineering a proper infrastructure around it.

Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that application alive, scalable, and observable.

In this project, we will build a local Kubernetes cluster that mimics a real cloud environment. We will not just deploy the app; we will implement:

- **High Availability:** Running multiple copies so the site never goes down.
- **Auto-Scaling:** Automatically detecting traffic spikes and launching new pods.
- **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the cluster.

The infra code can be found [here](https://git.severed.ink/Severed/Severed-Infra).
The blog code can be found [here](https://git.severed.ink/Severed/Severed-Blog).

Let's start by building the foundation.

[[2025-12-27-part-1]]

_posts/blog_app/2025-12-27-part-1.md (new file)
@@ -0,0 +1,135 @@

---
layout: post
title: 'Step 1: K3d Cluster Architecture'
date: 2025-12-28 03:00:00 -0400
categories:
- blog_app
highlight: true
---

[[2025-12-27-intro]]

# 1. K3d Cluster Architecture

In a standard Docker setup, containers share the host's kernel and networking space directly. In Kubernetes, we introduce an abstraction layer: a **Cluster**. For this project, we use **K3d**, which packages **K3s** (a lightweight production-grade K8s distribution) into Docker containers.

```text
Severed-Infra % tree
.
├── README.md
├── apps
│   ├── severed-blog-config.yaml
│   ├── severed-blog-hpa.yaml
│   ├── severed-blog-service.yaml
│   ├── severed-blog.yaml
│   └── severed-ingress.yaml
├── infra
│   ├── alloy-env.yaml
│   ├── alloy-setup.yaml
│   ├── dashboard
│   │   ├── dashboard-admin.yaml
│   │   ├── permanent-token.yaml
│   │   └── traefik-config.yaml
│   ├── observer
│   │   ├── adapter-values.yaml
│   │   ├── dashboard-json.yaml
│   │   ├── grafana-ingress.yaml
│   │   ├── grafana.yaml
│   │   ├── loki.yaml
│   │   └── prometheus.yaml
│   └── storage
│       └── openebs-sc.yaml
├── namespaces.yaml
└── scripts
    ├── README.md
    ├── access-hub.sh
    ├── deploy-all.sh
    ├── setup-grafana-creds.sh
    └── tests
        ├── generated-202-404-blog.sh
        └── stress-blog.sh
```

## 1.1 Multi-Node Simulation

- **Server (Control Plane):** The master node. Runs the API server, scheduler, and etcd.
- **Agents (Workers):** The worker nodes where our application pods run.

### Setting up the environment

We map port `8080` to the internal Traefik LoadBalancer to access services via `*.localhost`.

```bash
k3d cluster create severed-cluster \
  --agents 2 \
  -p "8080:80@loadbalancer" \
  -p "8443:443@loadbalancer"
```
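
To confirm the topology before moving on, a quick sanity check (standard k3d/kubectl commands; exact output varies by version):

```bash
# One server (control plane) and two agents should report Ready
k3d cluster list
kubectl get nodes -o wide
```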

## 1.2 Image Registry Lifecycle

Since our `severed-blog` image is local, we side-load it directly into the cluster's internal image store rather than pushing to Docker Hub.

```bash
docker build -t severed-blog:v0.3 .
k3d image import severed-blog:v0.3 -c severed-cluster
```
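
Because the Deployment in Step 2 uses `imagePullPolicy: Never`, every rebuild has to be re-imported. A minimal iteration loop might look like this (`v0.4` is a hypothetical next tag; `kubectl set image` is one way to roll the Deployment onto it):

```bash
# Rebuild, side-load, and roll the Deployment onto the new tag (hypothetical)
docker build -t severed-blog:v0.4 .
k3d image import severed-blog:v0.4 -c severed-cluster
kubectl set image deployment/severed-blog web=severed-blog:v0.4 -n severed-apps
```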

## 1.3 Namespaces & Storage

We partition the cluster into logical domains. We also install **OpenEBS** to provide dynamic storage provisioning (PersistentVolumes) for our databases.

**`namespaces.yaml`**

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: severed-apps
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: Namespace
metadata:
  name: kubernetes-dashboard
---
apiVersion: v1
kind: Namespace
metadata:
  name: openebs
```

**`infra/storage/openebs-sc.yaml`**

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: severed-storage
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
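
Applying both manifests and verifying is straightforward (the repo's `deploy-all.sh` presumably wraps steps like these):

```bash
kubectl apply -f namespaces.yaml
kubectl apply -f infra/storage/openebs-sc.yaml

# All four namespaces and the new StorageClass should be listed
kubectl get namespaces
kubectl get storageclass
```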

---

## 1.4 Infrastructure Concepts Cheat Sheet

| Object      | Docker Equivalent              | Kubernetes Purpose                                                |
| ----------- | ------------------------------ | ----------------------------------------------------------------- |
| Node        | The Host Machine               | A physical or virtual server in the cluster.                      |
| Pod         | A Container                    | The smallest deployable unit (can contain multiple containers).   |
| Deployments | `docker-compose up`            | Manages the lifecycle and scaling of Pods.                        |
| Services    | Network Aliases                | Provides a stable DNS name/IP for a group of Pods.                |
| HPA         | Auto-Scaling Group             | Automatically scales replicas based on traffic/load.              |
| Ingress     | Nginx Proxy / Traefik          | Manages external access to Services via HTTP/HTTPS.               |
| ConfigMap   | `docker run -v config:/etc...` | Decouples configuration files from the container image.           |
| Secret      | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64.      |
| DaemonSet   | `mode: global` (Swarm)         | Ensures one copy of a Pod runs on _every_ Node (logs/monitoring). |
| StatefulSet | N/A                            | Manages apps requiring stable identities and storage (Databases). |

[[2025-12-27-part-2]]

_posts/blog_app/2025-12-27-part-2.md (new file)
@@ -0,0 +1,285 @@

---
layout: post
title: 'Step 2: The Application Engine & Auto-Scaling'
date: 2025-12-28 04:00:00 -0400
categories:
- blog_app
highlight: true
---

[[2025-12-27-part-1]]

# 2. The Application Engine & Auto-Scaling

## 2.1 Decoupling Configuration (ConfigMaps)

In Docker, if you need to update an Nginx `default.conf`, you typically `COPY` the file into the image and rebuild it. In Kubernetes, we use a **ConfigMap** to treat configuration as a separate object. With a ConfigMap, we can update these rules and simply restart the pods to apply the changes; no Docker build is required.

We use a **ConfigMap** to inject the Nginx configuration:

**`apps/severed-blog-config.yaml`**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: severed-blog-config
  namespace: severed-apps
data:
  default.conf: |
    # 1. Define the custom log format
    log_format observability '$remote_addr - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$request_time"';

    server {
        listen 80;
        server_name localhost;
        root /usr/share/nginx/html;
        index index.html index.htm;

        # 2. Apply the format to stdout
        access_log /dev/stdout observability;
        error_log /dev/stderr;

        # gzip compression
        gzip on;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
        gzip_vary on;
        gzip_min_length 1000;

        # assets (images, fonts, favicons) - cache for 1 year
        location ~* \.(jpg|jpeg|gif|png|ico|svg|woff|woff2|ttf|eot)$ {
            expires 365d;
            add_header Cache-Control "public, no-transform";
            try_files $uri =404;
        }

        # code (css, js) - cache for 1 month
        location ~* \.(css|js)$ {
            expires 30d;
            add_header Cache-Control "public, no-transform";
            try_files $uri =404;
        }

        # standard routing
        location / {
            try_files $uri $uri/ $uri.html =404;
        }

        error_page 404 /404.html;
        location = /404.html {
            internal;
        }

        # logging / lb config
        real_ip_header X-Forwarded-For;
        set_real_ip_from 10.0.0.0/8;

        # metrics endpoint for Alloy/Prometheus
        location /metrics {
            stub_status on;
            access_log off; # Keep noise out of our main logs
            allow 127.0.0.1;
            allow 10.0.0.0/8;
            allow 172.16.0.0/12;
            deny all;
        }
    }
```

It is better practice to keep `default.conf` as a standalone file in our repo (e.g., `apps/config/default.conf`) and inject it like this:

```shell
kubectl create configmap severed-blog-config \
  -n severed-apps \
  --from-file=default.conf=apps/config/default.conf \
  --dry-run=client -o yaml | kubectl apply -f -
```
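
Note that running pods keep the old file until they are restarted; a minimal apply-and-roll sequence (assuming the Deployment from section 2.2 is already applied) is:

```bash
kubectl apply -f apps/severed-blog-config.yaml
kubectl rollout restart deployment/severed-blog -n severed-apps
kubectl rollout status deployment/severed-blog -n severed-apps
```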

## 2.2 Deploying the Workload: The Sidecar Pattern

The **Deployment** ensures the desired state is maintained. We request `replicas: 2`, meaning K8s will keep two instances of the blog running across our worker nodes.

**The Sidecar:** We add a second container (`nginx-prometheus-exporter`) to the same Pod.

1. **Web Container:** Serves the blog content.
2. **Exporter Container:** Scrapes the Web container's local `/metrics` endpoint and translates it into Prometheus format on port `9113`.

**`apps/severed-blog.yaml`**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: severed-blog
  namespace: severed-apps
spec:
  replicas: 2
  selector:
    matchLabels:
      app: severed-blog
  template:
    metadata:
      labels:
        app: severed-blog
    spec:
      containers:
        - name: web
          image: severed-blog:v0.3
          imagePullPolicy: Never
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: '50m'
              memory: '64Mi'
            limits:
              cpu: '200m'
              memory: '128Mi'
          volumeMounts:
            - name: nginx-config-vol
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf

        - name: exporter
          image: nginx/nginx-prometheus-exporter:latest
          args:
            - -nginx.scrape-uri=http://localhost:80/metrics
          ports:
            - containerPort: 9113
              name: metrics
          resources:
            requests:
              cpu: '10m'
              memory: '32Mi'
            limits:
              cpu: '50m'
              memory: '64Mi'

      volumes:
        - name: nginx-config-vol
          configMap:
            name: severed-blog-config
```

The `spec.volumes` block references our ConfigMap, and `volumeMounts` places that data exactly where Nginx expects its configuration.

### 2.2.1 Internal Networking (Services)

Pods are ephemeral; they die and get new IP addresses. If we pointed our Ingress directly at a Pod IP, the site would break every time a pod restarted.

We use a **Service** to solve this. A Service provides a stable virtual IP (ClusterIP) and an internal DNS name (`severed-blog-service.severed-apps.svc.cluster.local`) that load-balances traffic to any Pod matching the selector `app: severed-blog`.

**`apps/severed-blog-service.yaml`**

```yaml
apiVersion: v1
kind: Service
metadata:
  name: severed-blog-service
  namespace: severed-apps
spec:
  selector:
    app: severed-blog
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
```
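
A quick way to exercise the Service's DNS name from inside the cluster is a throwaway pod (the `busybox:1.36` image is an arbitrary choice here):

```bash
# Fetch the blog's index page through the Service's stable name
kubectl run dns-test --rm -it --restart=Never -n severed-apps \
  --image=busybox:1.36 -- wget -qO- http://severed-blog-service
```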

## 2.3 Traffic Routing (Ingress)

External users cannot talk to Pods directly. Traffic flows: **Internet → Ingress → Service → Pod**.

1. **The Service:** Acts as an internal load balancer with a stable DNS name.
2. **The Ingress:** Acts as a reverse proxy (Traefik) that reads the URL hostname.

**`apps/severed-ingress.yaml`**

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: severed-ingress
  namespace: severed-apps
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    # ONLY accept traffic for this specific hostname
    - host: blog.localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: severed-blog-service
                port:
                  number: 80
```
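
Since Step 1 mapped host port `8080` to Traefik's port `80`, the route can be tested from the host (on most systems `*.localhost` resolves to `127.0.0.1`, so either form works):

```bash
curl -H "Host: blog.localhost" http://127.0.0.1:8080/
curl http://blog.localhost:8080/
```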

## 2.4 Auto-Scaling (HPA)

We implement a **Horizontal Pod Autoscaler (HPA)** that scales the blog based on three metrics:

1. **CPU:** Target 90% of _Requests_ (not Limits).
2. **Memory:** Target 80% of _Requests_.
3. **Traffic (RPS):** Target 500 requests per second per pod.

To avoid scaling up and down too fast, we add a **stabilization window** and a strict **scale-up limit** (at most one new pod per minute). This prevents the cluster from exploding due to one-second spikes.

**`apps/severed-blog-hpa.yaml`**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: severed-blog-hpa
  namespace: severed-apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: severed-blog
  minReplicas: 2 # Never drop below 2 for HA
  maxReplicas: 6 # Maximum number of pods to prevent cluster exhaustion
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 90 # Scale up if CPU usage exceeds 90%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80 # Scale up if RAM usage exceeds 80%
    - type: Pods
      pods:
        metric:
          name: nginx_http_requests_total
        target:
          type: AverageValue
          averageValue: '500' # Scale up if requests per second > 500 per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60 # Wait 60s before removing a pod
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
```
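
Standard kubectl commands are enough to watch the autoscaler react:

```bash
# Live view of current metrics vs targets and the replica count
kubectl get hpa severed-blog-hpa -n severed-apps --watch
# Scaling events and per-metric readings
kubectl describe hpa severed-blog-hpa -n severed-apps
```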

[[2025-12-27-part-3]]

_posts/blog_app/2025-12-27-part-3.md (new file)
@@ -0,0 +1,663 @@

---
layout: post
title: 'Step 3: Observability (LGTM, KSM)'
date: 2025-12-28 05:00:00 -0400
categories:
- blog_app
highlight: true
---

[[2025-12-27-part-2]]

# 3. Observability: The LGTM Stack

In a distributed cluster, logs and metrics are scattered across different pods and nodes. We centralize them with the LGTM stack (Loki, Grafana, Prometheus), plus **Kube State Metrics** and the **Prometheus Adapter**.

## 3.1 The Databases (StatefulSets)

- **Prometheus:** Scrapes metrics. We updated the config to scrape **Kube State Metrics** via its internal DNS Service.
- **Loki:** Aggregates logs. Configured with a 168h (7-day) retention period.

**`infra/observer/prometheus.yaml`**

```yaml
# Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    storage:
      tsdb:
        out_of_order_time_window: 1m

    scrape_configs:
      # 1. Scrape Prometheus itself (Health Check)
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      # 2. Scrape Kube State Metrics (KSM)
      # We use the internal DNS: service-name.namespace.svc.cluster.local:port
      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.monitoring.svc.cluster.local:8080']

---
# Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090

---
# The Database (StatefulSet)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--web.enable-remote-write-receiver'
            - '--storage.tsdb.path=/prometheus'
            - '--web.console.libraries=/usr/share/prometheus/console_libraries'
            - '--web.console.templates=/usr/share/prometheus/consoles'
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: 'openebs-hostpath'
        resources:
          requests:
            storage: 5Gi
```
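
A port-forward is the quickest way to confirm both scrape jobs are healthy:

```bash
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# then open http://localhost:9090/targets - both jobs should show state UP
```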

**`infra/observer/loki.yaml`**

```yaml
# --- Configuration ---
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: monitoring
data:
  local-config.yaml: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    common:
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        instance_addr: 127.0.0.1
        kvstore:
          store: inmemory
    schema_config:
      configs:
        - from: 2020-10-24
          store: tsdb
          object_store: filesystem
          schema: v13
          index:
            prefix: index_
            period: 24h

---
# --- Storage Service ---
# Gives the StatefulSet a stable DNS entry. (A true headless Service would set
# clusterIP: None; a plain ClusterIP is sufficient for a single replica.)
apiVersion: v1
kind: Service
metadata:
  name: loki
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: loki
  ports:
    - port: 3100
      targetPort: 3100
      name: http-metrics

---
# --- The Database (StatefulSet) ---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: monitoring
spec:
  serviceName: loki
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
        - name: loki
          image: grafana/loki:latest
          args:
            - -config.file=/etc/loki/local-config.yaml
          ports:
            - containerPort: 3100
              name: http-metrics
          volumeMounts:
            - name: config
              mountPath: /etc/loki
            - name: data
              mountPath: /loki
      volumes:
        - name: config
          configMap:
            name: loki-config
  # Persistent Storage
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: 'openebs-hostpath'
        resources:
          requests:
            storage: 5Gi
```
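
Once Alloy (section 3.3) starts shipping logs, Loki's standard query API can be smoke-tested through a port-forward (this assumes the `namespace` label set by the Alloy pipeline, and uses `jq` only for readability):

```bash
kubectl port-forward -n monitoring svc/loki 3100:3100 &
# Ask Loki for recent blog logs via LogQL
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="severed-apps"}' | jq '.data.result | length'
```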

## 3.2 The Bridge: Prometheus Adapter & KSM

Standard HPA only understands CPU and Memory. To scale on **Requests Per Second**, we need two extra components.

**Helm (Package Manager)**
You will notice `kube-state-metrics` and `prometheus-adapter` are missing from our file tree. That is because we install them using **Helm**. Helm lets us install complex, pre-packaged applications ("Charts") without writing thousands of lines of YAML. We only provide a `values.yaml` file to override specific settings.

1. **Kube State Metrics (KSM):** A service that listens to the Kubernetes API and generates metrics about the state of objects (e.g., `kube_pod_created`).
2. **Prometheus Adapter:** Installed via Helm (see the sketch below). We use `infra/observer/adapter-values.yaml` to configure how it translates Prometheus queries into Kubernetes metrics.
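
The exact install commands are not part of this repo's tree; a minimal sketch using the community charts (the `prometheus-community` repo URL and chart names are assumptions here, though they are the conventional ones) could be:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Assumed chart names; the values file is the one from our repo
helm install kube-state-metrics prometheus-community/kube-state-metrics -n monitoring
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring -f infra/observer/adapter-values.yaml
```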

**`infra/observer/adapter-values.yaml`**

```yaml
prometheus:
  url: http://prometheus.monitoring.svc.cluster.local
  port: 9090

rules:
  custom:
    - seriesQuery: 'nginx_http_requests_total{pod!="",namespace!=""}'
      resources:
        overrides:
          namespace: { resource: 'namespace' }
          pod: { resource: 'pod' }
      name:
        matches: '^(.*)_total'
        as: 'nginx_http_requests_total'
      metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
```
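
If the adapter is wired up correctly, the custom metrics API should list and serve the renamed metric (using `jq` for readability):

```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total" | jq .
```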

## 3.3 The Agent: Grafana Alloy (DaemonSets)

We need to collect logs from every node in the cluster.

- **DaemonSet vs. Deployment:** A Deployment ensures _n_ replicas exist somewhere. A **DaemonSet** ensures exactly **one** Pod runs on **every** Node. This is perfect for infrastructure agents (logging, networking, monitoring).
- **Downward API:** We need to inject the Pod's own name and namespace into its environment variables so it knows "who it is."

**`infra/alloy-env.yaml`**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: monitoring-env
  namespace: monitoring
data:
  LOKI_URL: 'http://loki.monitoring.svc:3100/loki/api/v1/push'
  PROM_URL: 'http://prometheus.monitoring.svc:9090/api/v1/write'
```

**`infra/alloy-setup.yaml`**

```yaml
# --- RBAC configuration ---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: alloy-sa
  namespace: monitoring

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alloy-cluster-role
rules:
  # 1. Standard API Access
  - apiGroups: ['']
    resources: ['nodes', 'nodes/proxy', 'services', 'endpoints', 'pods']
    verbs: ['get', 'list', 'watch']
  # 2. ALLOW METRICS ACCESS (Crucial for cAdvisor/Kubelet)
  - apiGroups: ['']
    resources: ['nodes/stats', 'nodes/metrics']
    verbs: ['get']
  # 3. Log Access
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get', 'list', 'watch']
  # 4. Non-Resource URLs (Sometimes needed for /metrics endpoints)
  - nonResourceURLs: ['/metrics']
    verbs: ['get']

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alloy-cluster-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alloy-cluster-role
subjects:
  - kind: ServiceAccount
    name: alloy-sa
    namespace: monitoring

---
# --- Alloy pipeline configuration ---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-config
  namespace: monitoring
data:
  config.alloy: |
    // 1. Discovery: Find all pods
    discovery.kubernetes "k8s_pods" {
      role = "pod"
    }

    // 2. Relabeling: Filter and Label "severed-blog" pods
    discovery.relabel "blog_pods" {
      targets = discovery.kubernetes.k8s_pods.targets

      rule {
        action        = "keep"
        source_labels = ["__meta_kubernetes_pod_label_app"]
        regex         = "severed-blog"
      }

      // Explicitly set 'pod' and 'namespace' labels for the Adapter
      rule {
        action        = "replace"
        source_labels = ["__meta_kubernetes_pod_name"]
        target_label  = "pod"
      }

      rule {
        action        = "replace"
        source_labels = ["__meta_kubernetes_namespace"]
        target_label  = "namespace"
      }

      // Route to the sidecar exporter port
      rule {
        action        = "replace"
        source_labels = ["__address__"]
        target_label  = "__address__"
        regex         = "([^:]+)(?::\\d+)?"
        replacement   = "$1:9113"
      }
    }

    // 3. Direct Nginx Scraper
    prometheus.scrape "nginx_scraper" {
      targets    = discovery.relabel.blog_pods.output
      forward_to = [prometheus.remote_write.metrics_service.receiver]
      job_name   = "integrations/nginx"
    }

    // 4. Host Metrics (Unix Exporter)
    prometheus.exporter.unix "host" {
      rootfs_path = "/host/root"
      sysfs_path  = "/host/sys"
      procfs_path = "/host/proc"
    }

    prometheus.scrape "host_scraper" {
      targets    = prometheus.exporter.unix.host.targets
      forward_to = [prometheus.remote_write.metrics_service.receiver]
    }

    // 5. Remote Write: Send to Prometheus
    prometheus.remote_write "metrics_service" {
      endpoint {
        url = sys.env("PROM_URL")
      }
    }

    // 6. Logs Pipeline: Send to Loki
    loki.source.kubernetes "pod_logs" {
      targets    = discovery.relabel.blog_pods.output
      forward_to = [loki.write.default.receiver]
    }

    loki.write "default" {
      endpoint {
        url = sys.env("LOKI_URL")
      }
    }

    // 7. Kubelet Scraper (cAdvisor for Container Metrics)
    discovery.kubernetes "k8s_nodes" {
      role = "node"
    }

    prometheus.scrape "kubelet_cadvisor" {
      targets      = discovery.kubernetes.k8s_nodes.targets
      scheme       = "https"
      metrics_path = "/metrics/cadvisor"
      job_name     = "integrations/kubernetes/cadvisor"

      tls_config {
        insecure_skip_verify = true
      }
      bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"

      forward_to = [prometheus.remote_write.metrics_service.receiver]
    }

---
# --- Agent Deployment (DaemonSet) ---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alloy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: alloy
  template:
    metadata:
      labels:
        name: alloy
    spec:
      serviceAccountName: alloy-sa
      hostNetwork: true
      hostPID: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: alloy
          image: grafana/alloy:latest
          args:
            - run
            - --server.http.listen-addr=0.0.0.0:12345
            - --storage.path=/var/lib/alloy/data
            - /etc/alloy/config.alloy
          envFrom:
            - configMapRef:
                name: monitoring-env
                optional: false
          volumeMounts:
            - name: config
              mountPath: /etc/alloy
            - name: logs
              mountPath: /var/log
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: alloy-config
        - name: logs
          hostPath:
            path: /var/log
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
```
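
Since this is a DaemonSet, one Alloy pod should land on every node (three in our cluster):

```bash
kubectl get daemonset alloy -n monitoring
kubectl get pods -n monitoring -l name=alloy -o wide
# Tail the agent to confirm the pipeline started cleanly
kubectl logs ds/alloy -n monitoring --tail=50
```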

## 3.4 Visualization: Grafana

We deploy Grafana with pre-loaded dashboards via ConfigMaps.

**Key Dashboards Created:**

1. **Cluster Health:** CPU/Memory saturation.
2. **HPA Live Status:** A custom table showing the _real_ scaling drivers (RPS, CPU Request %) vs. the HPA's reaction.

**`infra/observer/grafana.yaml`**

```yaml
# 1. Datasources (Connection to Loki/Prom)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus.monitoring.svc:9090
        isDefault: false
      - name: Loki
        type: loki
        access: proxy
        url: http://loki.monitoring.svc:3100
        isDefault: true

---
# 2. Dashboard Provider (Tells Grafana to load from /var/lib/grafana/dashboards)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-provider
  namespace: monitoring
data:
  dashboard-provider.yaml: |
    apiVersion: 1
    providers:
      - name: 'Severed Dashboards'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        updateIntervalSeconds: 10 # Allow editing in UI, but it resets on restart
        options:
          path: /var/lib/grafana/dashboards

---
# 3. Service
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
  namespace: monitoring
spec:
  type: LoadBalancer
  selector:
    app: grafana
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000

---
# 4. Deployment (The App)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000

          env:
            - name: GF_SECURITY_ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: grafana-secrets
                  key: admin-user
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-secrets
                  key: admin-password

            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: 'true'
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: 'Viewer'
            - name: GF_AUTH_ANONYMOUS_ORG_NAME
              value: 'Main Org.'

          volumeMounts:
            - name: grafana-datasources
              mountPath: /etc/grafana/provisioning/datasources
            - name: grafana-dashboard-provider
              mountPath: /etc/grafana/provisioning/dashboards
            - name: grafana-dashboards-json
              mountPath: /var/lib/grafana/dashboards
            - name: grafana-storage
              mountPath: /var/lib/grafana
      volumes:
        - name: grafana-datasources
          configMap:
            name: grafana-datasources
        - name: grafana-dashboard-provider
          configMap:
            name: grafana-dashboard-provider
        - name: grafana-dashboards-json
          configMap:
            name: grafana-dashboards-json
        - name: grafana-storage
          emptyDir: {}
```

In the Deployment above, you see references to `grafana-secrets`. This Secret is **not** in our git repository.

```yaml
- name: GF_SECURITY_ADMIN_PASSWORD
  valueFrom:
    secretKeyRef:
      name: grafana-secrets # <--- where is this?
      key: admin-password
```

We don't commit it to version control. In our `deploy-all.sh` script, we generate this secret imperatively using `kubectl create secret generic`. In a real production environment, we would use tools like **ExternalSecrets** or **SealedSecrets** to inject these safely.
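
As a rough illustration of what such an imperative step could look like (the actual `deploy-all.sh` is not reproduced here; the keys match what the Deployment expects, the values are placeholders):

```bash
# Hypothetical sketch - the real script may differ
kubectl create secret generic grafana-secrets -n monitoring \
  --from-literal=admin-user=admin \
  --from-literal=admin-password="$(openssl rand -base64 24)"
```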

**`dashboard-json.yaml`**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards-json
  namespace: monitoring
data:
  severed-health.json: |
    ...
```

Just like our blog, we need an Ingress to access Grafana. Notice we map a different hostname (`grafana.localhost`) to the Grafana service port (`3000`).

**`infra/observer/grafana-ingress.yaml`**

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: grafana.localhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana-service # ...send them to Grafana
                port:
                  number: 3000
```

[[2025-12-27-part-4]]

_posts/blog_app/2025-12-27-part-4.md (new file)
@@ -0,0 +1,94 @@

---
layout: post
title: 'Step 4: RBAC & Security'
date: 2025-12-28 06:00:00 -0400
categories:
- blog_app
highlight: true
---

[[2025-12-27-part-3]]

# 4. Cluster Management & Security

## 4.1 RBAC: Admin user

In Kubernetes, a **ServiceAccount** is an identity for a process or a human to talk to the API. We create an `admin-user`, but identities have no power by default. We must link them to a **ClusterRole** (a set of permissions) using a **ClusterRoleBinding**.

- **ServiceAccount:** Creates the `admin-user` identity in the dashboard namespace.
- **ClusterRoleBinding:** Grants this specific user the `cluster-admin` role (full access to the entire cluster).

**`infra/dashboard/dashboard-admin.yaml`**:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
```

## 4.2 Authentication: Permanent Tokens

Modern Kubernetes no longer generates tokens automatically for ServiceAccounts. To log into the UI, we need a static, long-lived credential.

**`infra/dashboard/permanent-token.yaml`**:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: admin-user-token
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: 'admin-user'
type: kubernetes.io/service-account-token
```

This creates a **Secret** of type `kubernetes.io/service-account-token`. By adding the annotation `kubernetes.io/service-account.name: "admin-user"`, K8s automatically populates the Secret with a signed JWT that we can present at the dashboard login screen.
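
Retrieving the token for the login screen is then one command:

```bash
kubectl -n kubernetes-dashboard get secret admin-user-token \
  -o jsonpath='{.data.token}' | base64 -d
```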

## 4.3 Localhost: Ingress & Cookies

The Kubernetes Dashboard requires HTTPS, which creates issues with self-signed certificates on `localhost`. We need to reconfigure **Traefik** (the internal reverse proxy bundled with K3s) to allow insecure backends.

**Helm & CRDs**
K3s installs Traefik using **Helm** (the Kubernetes package manager). Usually, you manage Helm via the CLI (`helm install`). However, K3s includes a **Helm Controller** that lets us manage charts using YAML files called **HelmChartConfigs** (a Custom Resource Definition, or CRD).

This allows us to reconfigure a complex Helm deployment using a simple declarative file.

**`infra/dashboard/traefik-config.yaml`**

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      # Tell Traefik to ignore SSL errors when talking to internal services
      - "--serversTransport.insecureSkipVerify=true"
```

## 4.4 Stress Testing & Verification

We use **Apache Bench (`ab`)** to generate enough concurrency to trigger the HPA. The test produces tens of thousands of requests, which trips the RPS rule in our HPA configuration.

```bash
# Generate 50 concurrent users for 5 minutes
ab -k -c 50 -t 300 -H "Host: blog.localhost" http://127.0.0.1:8080/
```

@@ -7,9 +7,7 @@ categories:
 highlight: true
 ---

-This blog serves as the public documentation for **Severed**. While the main site provides the high-level vision,
-this space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary
-tooling that are used within the cluster.
+This blog serves as the public documentation for **Severed**. This space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary tooling that are used.

 ### Ecosystem

@@ -23,31 +21,24 @@ The following services are currently active within the `severed.ink` network:

 ### Core Infrastructure

-The ecosystem is powered by a **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution. This setup
-prioritizes local sovereignty and GitOps principles.
+The ecosystem is powered by a hybrid **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution and AWS services. We prioritize local sovereignty and GitOps principles.

-- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted
-  runners.
+- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted runners.
 - **GitOps & Deployment:** Automated synchronization and state enforcement via **ArgoCD**.
 - **Data Layer:** Persistent storage managed by **PostgreSQL**.
 - **Telemetry:** Full-stack observability provided by **Prometheus** (metrics) and **Loki** (logs) via **Grafana**.
-- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote
-  access.
+- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote access.

 ### Roadmap

-Engineering efforts are currently focused on the following milestones:
+Efforts are currently focused on the following milestones:

 1. **OSS Strategy:** Transitioning from a hybrid of AWS managed services toward a ~100% Open Source Software (OSS) stack.
-2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster
-   instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary
-   failover.
-3. **Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the
-   cloud-based failover.
-4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a
-   centralized, redundant storage architecture.
+2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary failover.
+3. **Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the cloud-based failover.
+4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a centralized, redundant storage architecture.

-### Terminal Redirect
+### Redirect

 For the full technical portfolio and expertise highlights, visit the main site: