From 786a98f4c5276a8ef8a4f4236beb12e0d05cf459 Mon Sep 17 00:00:00 2001 From: wboughattas Date: Tue, 30 Dec 2025 23:53:49 -0500 Subject: [PATCH] added parts --- Gemfile | 1 + Gemfile.lock | 9 + README.md | 33 - _config.yml | 1 + .../2025-12-27-architecture-0.1.md | 17 - _posts/blog_app/2025-12-27-concepts.md | 24 + _posts/blog_app/2025-12-27-intro.md | 27 + _posts/blog_app/2025-12-27-part-1.md | 135 ++++ _posts/blog_app/2025-12-27-part-2.md | 285 ++++++++ _posts/blog_app/2025-12-27-part-3.md | 663 ++++++++++++++++++ _posts/blog_app/2025-12-27-part-4.md | 94 +++ _posts/releases/2025-12-27-release-0.1.md | 27 +- 12 files changed, 1248 insertions(+), 68 deletions(-) delete mode 100644 README.md delete mode 100644 _posts/architectures/2025-12-27-architecture-0.1.md create mode 100644 _posts/blog_app/2025-12-27-concepts.md create mode 100644 _posts/blog_app/2025-12-27-intro.md create mode 100644 _posts/blog_app/2025-12-27-part-1.md create mode 100644 _posts/blog_app/2025-12-27-part-2.md create mode 100644 _posts/blog_app/2025-12-27-part-3.md create mode 100644 _posts/blog_app/2025-12-27-part-4.md diff --git a/Gemfile b/Gemfile index 10a5d60..7a0aa64 100644 --- a/Gemfile +++ b/Gemfile @@ -5,6 +5,7 @@ gem "minima", "~> 2.5" group :jekyll_plugins do gem "jekyll-feed", "~> 0.12" gem 'jekyll-archives' + gem "jekyll-wikirefs" end # Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem diff --git a/Gemfile.lock b/Gemfile.lock index 5d25436..4b89beb 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -83,6 +83,9 @@ GEM jekyll (>= 3.8, < 5.0) jekyll-watch (2.2.1) listen (~> 3.0) + jekyll-wikirefs (0.0.16) + jekyll + nokogiri (~> 1.13.3) json (2.15.1) kramdown (2.5.1) rexml (>= 3.3.9) @@ -93,13 +96,18 @@ GEM rb-fsevent (~> 0.10, >= 0.10.3) rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.4.0) + mini_portile2 (2.8.9) minima (2.5.2) jekyll (>= 3.5, < 5.0) jekyll-feed (~> 0.9) jekyll-seo-tag (~> 2.1) + nokogiri (1.13.10) + mini_portile2 (~> 2.8.0) + racc (~> 1.4) pathutil (0.16.2) forwardable-extended (~> 2.6) public_suffix (6.0.2) + racc (1.8.1) rake (13.3.0) rb-fsevent (0.11.2) rb-inotify (0.11.1) @@ -169,6 +177,7 @@ DEPENDENCIES jekyll (~> 4.4.1) jekyll-archives jekyll-feed (~> 0.12) + jekyll-wikirefs minima (~> 2.5) tzinfo (>= 1, < 3) tzinfo-data diff --git a/README.md b/README.md deleted file mode 100644 index 148daf1..0000000 --- a/README.md +++ /dev/null @@ -1,33 +0,0 @@ -# Severed blog (Jekyll app) - -**Live Production:** [https://blog.severed.ink/](https://blog.severed.ink/) - -## Development - -**Install Dependencies (Ruby & Node):** - -```bash -bundle install && pnpm install -``` - -**Start Local Server:** - -```bash -pnpm dev -``` - -**Build for Production:** - -```bash -pnpm build -``` - -## Code Quality - -**Format Code Manually:** - -```bash -pnpm format -``` - -_(Formatting also runs automatically on commit via Husky)_ diff --git a/_config.yml b/_config.yml index 88ce63c..b3133b2 100644 --- a/_config.yml +++ b/_config.yml @@ -17,6 +17,7 @@ minima: plugins: - jekyll-feed - jekyll-archives + - jekyll-wikirefs exclude: - .sass-cache/ diff --git a/_posts/architectures/2025-12-27-architecture-0.1.md b/_posts/architectures/2025-12-27-architecture-0.1.md deleted file mode 100644 index 380804a..0000000 --- a/_posts/architectures/2025-12-27-architecture-0.1.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -layout: post -title: Architecture V1 (WIP) -date: 2025-12-27 02:00:00 -0400 -categories: - - architectures ---- - -## Monitoring - -```bash -. 
-└── alloy - ├── config - │ └── config.alloy - └── docker-compose.yml -``` diff --git a/_posts/blog_app/2025-12-27-concepts.md b/_posts/blog_app/2025-12-27-concepts.md new file mode 100644 index 0000000..fbc793e --- /dev/null +++ b/_posts/blog_app/2025-12-27-concepts.md @@ -0,0 +1,24 @@ +--- +layout: post +title: 'Kubernetes vs Docker' +date: 2025-12-27 02:00:00 -0400 +categories: + - blog_app +--- + +# Kubernetes Concepts Cheat Sheet + +| Object | Docker Equivalent | Kubernetes Purpose | +| ----------- | ------------------------------ | ----------------------------------------------------------------- | +| Node | The Host Machine | A physical or virtual server in the cluster. | +| Pod | A Container | The smallest deployable unit (can contain multiple containers). | +| Deployments | `docker-compose up` | Manages the lifecycle and scaling of Pods. | +| Services | Network Aliases | Provides a stable DNS name/IP for a group of Pods. | +| HPA | Auto-Scaling Group | Automatically scales replicas based on traffic/load. | +| Ingress | Nginx Proxy / Traefik | Manages external access to Services via HTTP/HTTPS. | +| ConfigMap | `docker run -v config:/etc...` | Decouples configuration files from the container image. | +| Secret | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64. | +| DaemonSet | `mode: global` (Swarm) | Ensures one copy of a Pod runs on every Node (logs/monitoring). | +| StatefulSet | N/A | Manages apps requiring stable identities and storage (Databases). | + +[[2025-12-27-part-1]] diff --git a/_posts/blog_app/2025-12-27-intro.md b/_posts/blog_app/2025-12-27-intro.md new file mode 100644 index 0000000..960c5fe --- /dev/null +++ b/_posts/blog_app/2025-12-27-intro.md @@ -0,0 +1,27 @@ +--- +layout: post +title: 'Deploying the Severed Blog' +date: 2025-12-28 02:00:00 -0400 +categories: + - blog_app +highlight: true +--- + +# Introduction + +We are taking a simple static website, the **Severed Blog**, and engineering a proper infrastructure around it. + +Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that application alive, scalable, and observable. + +In this project, we will build a local Kubernetes cluster that mimics a real cloud environment. We will not just deploy the app; we will implement: + +- **High Availability:** Running multiple copies so the site never goes down. +- **Auto-Scaling:** Automatically detecting traffic spikes and launching new pods. +- **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the cluster. + +The infra code can be found in [here](https://git.severed.ink/Severed/Severed-Infra). +The blog code can be found in [here](https://git.severed.ink/Severed/Severed-Blog). + +Let's start by building the foundation. + +[[2025-12-27-part-1]] diff --git a/_posts/blog_app/2025-12-27-part-1.md b/_posts/blog_app/2025-12-27-part-1.md new file mode 100644 index 0000000..92f07e4 --- /dev/null +++ b/_posts/blog_app/2025-12-27-part-1.md @@ -0,0 +1,135 @@ +--- +layout: post +title: 'Step 1: K3d Cluster Architecture' +date: 2025-12-28 03:00:00 -0400 +categories: + - blog_app +highlight: true +--- + +[[2025-12-27-intro]] + +# 1. K3d Cluster Architecture + +In a standard Docker setup, containers share the host's kernel and networking space directly. In Kubernetes, we introduce an abstraction layer: a **Cluster**. 
For this project, we use **K3d**, which packages **K3s** (a lightweight production-grade K8s distribution) into Docker containers. + +```text +Severed-Infra % tree +. +├── README.md +├── apps +│ ├── severed-blog-config.yaml +│ ├── severed-blog-hpa.yaml +│ ├── severed-blog-service.yaml +│ ├── severed-blog.yaml +│ └── severed-ingress.yaml +├── infra +│ ├── alloy-env.yaml +│ ├── alloy-setup.yaml +│ ├── dashboard +│ │ ├── dashboard-admin.yaml +│ │ ├── permanent-token.yaml +│ │ └── traefik-config.yaml +│ ├── observer +│ │ ├── adapter-values.yaml +│ │ ├── dashboard-json.yaml +│ │ ├── grafana-ingress.yaml +│ │ ├── grafana.yaml +│ │ ├── loki.yaml +│ │ └── prometheus.yaml +│ └── storage +│ └── openebs-sc.yaml +├── namespaces.yaml +└── scripts + ├── README.md + ├── access-hub.sh + ├── deploy-all.sh + ├── setup-grafana-creds.sh + └── tests + ├── generated-202-404-blog.sh + └── stress-blog.sh +``` + +## 1.1 Multi-Node Simulation + +- **Server (Control Plane):** The master node. Runs the API server, scheduler, and etcd. +- **Agents (Workers):** The worker nodes where our application pods run. + +### Setting up the environment + +We map port `8080` to the internal Traefik LoadBalancer to access services via `*.localhost`. + +```bash +k3d cluster create severed-cluster \ + --agents 2 \ + -p "8080:80@loadbalancer" \ + -p "8443:443@loadbalancer" +``` + +## 1.2 Image Registry Lifecycle + +Since our `severed-blog` image is local, we side-load it directly into the cluster's internal image store rather than pushing to Docker Hub. + +```bash +docker build -t severed-blog:v0.3 . +k3d image import severed-blog:v0.3 -c severed-cluster +``` + +## 1.3 Namespaces & Storage + +We partition the cluster into logical domains. We also install **OpenEBS** to provide dynamic storage provisioning (PersistentVolumes) for our databases. + +**`namespaces.yaml`** + +```yaml +apiVersion: v1 +kind: Namespace +metadata: + name: severed-apps +--- +apiVersion: v1 +kind: Namespace +metadata: + name: monitoring +--- +apiVersion: v1 +kind: Namespace +metadata: + name: kubernetes-dashboard +--- +apiVersion: v1 +kind: Namespace +metadata: + name: openebs +``` + +**`infra/storage/openebs-sc.yaml`** + +```yaml +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: severed-storage +provisioner: openebs.io/local +reclaimPolicy: Delete +volumeBindingMode: WaitForFirstConsumer +``` + +--- + +## 1.4. Infrastructure Concepts Cheat Sheet + +| Object | Docker Equivalent | Kubernetes Purpose | +| ----------- | ------------------------------ | ----------------------------------------------------------------- | +| Node | The Host Machine | A physical or virtual server in the cluster. | +| Pod | A Container | The smallest deployable unit (can contain multiple containers). | +| Deployments | `docker-compose up` | Manages the lifecycle and scaling of Pods. | +| Services | Network Aliases | Provides a stable DNS name/IP for a group of Pods. | +| HPA | Auto-Scaling Group | Automatically scales replicas based on traffic/load. | +| Ingress | Nginx Proxy / Traefik | Manages external access to Services via HTTP/HTTPS. | +| ConfigMap | `docker run -v config:/etc...` | Decouples configuration files from the container image. | +| Secret | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64. | +| DaemonSet | `mode: global` (Swarm) | Ensures one copy of a Pod runs on _every_ Node (logs/monitoring). | +| StatefulSet | N/A | Manages apps requiring stable identities and storage (Databases). 
| + +[[2025-12-27-part-2]] diff --git a/_posts/blog_app/2025-12-27-part-2.md b/_posts/blog_app/2025-12-27-part-2.md new file mode 100644 index 0000000..f9fc890 --- /dev/null +++ b/_posts/blog_app/2025-12-27-part-2.md @@ -0,0 +1,285 @@ +--- +layout: post +title: 'Step 2: The Application Engine & Auto-Scaling' +date: 2025-12-28 04:00:00 -0400 +categories: + - blog_app +highlight: true +--- + +[[2025-12-27-part-1]] + +# 2. The Application Engine & Auto-Scaling + +## 2.1 Decoupling Configuration (ConfigMaps) + +In Docker, if you need to update an Nginx `default.conf`, you typically `COPY` the file into the image and rebuild it. In Kubernetes, we use a **ConfigMap** to treat configuration as a separate object. By using a ConfigMap, we can update these rules and simply restart the pods to apply changes, no Docker build required. + +We use a **ConfigMap** to inject the Nginx configuration. + +**`apps/severed-blog-config.yaml`** + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: severed-blog-config + namespace: severed-apps +data: + default.conf: | + # 1. Define the custom log format + log_format observability '$remote_addr - $remote_user [$time_local] "$request" ' + '$status $body_bytes_sent "$http_referer" ' + '"$http_user_agent" "$request_time"'; + + server { + listen 80; + server_name localhost; + root /usr/share/nginx/html; + index index.html index.htm; + + # 2. Apply the format to stdout + access_log /dev/stdout observability; + error_log /dev/stderr; + + # gzip compression + gzip on; + gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript; + gzip_vary on; + gzip_min_length 1000; + + # assets (images, fonts, favicons) - cache for 1 Year + location ~* \.(jpg|jpeg|gif|png|ico|svg|woff|woff2|ttf|eot)$ { + expires 365d; + add_header Cache-Control "public, no-transform"; + try_files $uri =404; + } + + # code (css, js) - cache for 1 month + location ~* \.(css|js)$ { + expires 30d; + add_header Cache-Control "public, no-transform"; + try_files $uri =404; + } + + # standard routing + location / { + try_files $uri $uri/ $uri.html =404; + } + + error_page 404 /404.html; + location = /404.html { + internal; + } + + # logging / lb config + real_ip_header X-Forwarded-For; + set_real_ip_from 10.0.0.0/8; + + # metrics endpoint for Alloy/Prometheus + location /metrics { + stub_status on; + access_log off; # Keep noise out of our main logs + allow 127.0.0.1; + allow 10.0.0.0/8; + allow 172.16.0.0/12; + deny all; + } + } +``` + +It is a better practice to keep `default.conf` as a standalone file in our repo (e.g., `apps/config/default.conf`) and inject it like: + +```shell +kubectl create configmap severed-blog-config \ + -n severed-apps \ + --from-file=default.conf=apps/config/default.conf \ + --dry-run=client -o yaml | kubectl apply -f - +``` + +## 2.2 Deploying the Workload: The Sidecar Pattern + +The **Deployment** ensures the desired state is maintained. We requested `replicas: 2`, meaning K8s will ensure two instances of the blog are running across our worker nodes. + +**The Sidecar:** We added a second container (`nginx-prometheus-exporter`) to the same Pod. + +1. **Web Container:** Serves the blog content. +2. **Exporter Container:** Scrapes the Web container's local `/metrics` endpoint and translates it into Prometheus format on port `9113`. 
+ +**`apps/severed-blog.yaml`** + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: severed-blog + namespace: severed-apps +spec: + replicas: 2 + selector: + matchLabels: + app: severed-blog + template: + metadata: + labels: + app: severed-blog + spec: + containers: + - name: web + image: severed-blog:v0.3 + imagePullPolicy: Never + ports: + - containerPort: 80 + resources: + requests: + cpu: '50m' + memory: '64Mi' + limits: + cpu: '200m' + memory: '128Mi' + volumeMounts: + - name: nginx-config-vol + mountPath: /etc/nginx/conf.d/default.conf + subPath: default.conf + + - name: exporter + image: nginx/nginx-prometheus-exporter:latest + args: + - -nginx.scrape-uri=http://localhost:80/metrics + ports: + - containerPort: 9113 + name: metrics + resources: + requests: + cpu: '10m' + memory: '32Mi' + limits: + cpu: '50m' + memory: '64Mi' + + volumes: + - name: nginx-config-vol + configMap: + name: severed-blog-config +``` + +The `spec.volumes` block references our ConfigMap, and `volumeMounts` places that data exactly where Nginx expects its +configuration. + +### 2.2.1 Internal Networking (Services) + +Pods are ephemeral; they die and get new IP addresses. If we pointed our Ingress directly at a Pod IP, the site would break every time a pod restarted. + +We use a **Service** to solve this. A Service provides a stable Virtual IP (ClusterIP) and an internal DNS name (`severed-blog-service.severed-apps.svc.cluster.local`) that load balances traffic to any Pod matching the selector: `app: severed-blog`. + +**`apps/severed-blog-service.yaml`** + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: severed-blog-service + namespace: severed-apps +spec: + selector: + app: severed-blog + ports: + - protocol: TCP + port: 80 + targetPort: 80 + type: ClusterIP +``` + +## 2.3 Traffic Routing (Ingress) + +External users cannot talk to Pods directly. Traffic flows: **Internet → Ingress → Service → Pod**. + +1. **The Service:** Acts as an internal LoadBalancer with a stable DNS name. +2. **The Ingress:** Acts as a reverse proxy (Traefik) that reads the URL hostname. + +**`apps/severed-ingress.yaml`** + +```yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: severed-ingress + namespace: severed-apps + annotations: + traefik.ingress.kubernetes.io/router.entrypoints: web +spec: + rules: + # ONLY accept traffic for this specific hostname + - host: blog.localhost + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: severed-blog-service + port: + number: 80 +``` + +## 2.4 Auto-Scaling (HPA) + +We implemented a **Horizontal Pod Autoscaler (HPA)** that scales the blog based on three metrics: + +1. **CPU:** Target 90% of _Requests_ (not Limits). +2. **Memory:** Target 80% of _Requests_. +3. **Traffic (RPS):** Target 500 requests per second per pod. + +To prevent scaling up and down too fast, we added a **Stabilization Window** and a strict **Scale Up Limit** (max 1 pod every 15s). This prevents the cluster from exploding due to 1-second spikes. 
+ +**`apps/severed-blog-hpa.yaml`** + +```yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: severed-blog-hpa + namespace: severed-apps +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: severed-blog + minReplicas: 2 # Never drop below 2 for HA + maxReplicas: 6 # Maximum number of pods to prevent cluster exhaustion + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 90 # Scale up if CPU Usage exceeds 90% + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 # Scale up if RAM Usage exceeds 80% + - type: Pods + pods: + metric: + name: nginx_http_requests_total + target: + type: AverageValue + averageValue: '500' # Scale up if requests per second > 500 per pod + behavior: + scaleDown: + stabilizationWindowSeconds: 60 # 60 sec before removing a pod + policies: + - type: Percent + value: 100 + periodSeconds: 15 + scaleUp: + stabilizationWindowSeconds: 60 + policies: + - type: Pods + value: 1 + periodSeconds: 60 +``` + +[[2025-12-27-part-3]] diff --git a/_posts/blog_app/2025-12-27-part-3.md b/_posts/blog_app/2025-12-27-part-3.md new file mode 100644 index 0000000..59564de --- /dev/null +++ b/_posts/blog_app/2025-12-27-part-3.md @@ -0,0 +1,663 @@ +--- +layout: post +title: 'Step 3: Observability (LGTM, KSM)' +date: 2025-12-28 05:00:00 -0400 +categories: + - blog_app +highlight: true +--- + +[[2025-12-27-part-2]] + +# 3. Observability: The LGTM Stack + +In a distributed cluster, logs and metrics are scattered across different pods and nodes. We centralized monitoring using the LGTM Stack (Loki, Grafana, Prometheus) plus **Kube State Metrics** and the **Prometheus Adapter** to centralize our logs and metrics. + +## 3.1 The Databases (StatefulSets) + +- **Prometheus:** Scrapes metrics. We updated the config to scrape **Kube State Metrics** via its internal DNS Service. +- **Loki:** Aggregates logs. Configured with a 168h (7-day) retention period. + +**`infra/observer/prometheus.yaml`** + +```yaml +# Configuration +apiVersion: v1 +kind: ConfigMap +metadata: + name: prometheus-config + namespace: monitoring +data: + prometheus.yml: | + global: + scrape_interval: 15s + evaluation_interval: 15s + storage: + tsdb: + out_of_order_time_window: 1m + + scrape_configs: + # 1. Scrape Prometheus itself (Health Check) + - job_name: 'prometheus' + static_configs: + - targets: ['localhost:9090'] + + # 2. 
Scrape Kube State Metrics (KSM) + # We use the internal DNS: service-name.namespace.svc.cluster.local:port + - job_name: 'kube-state-metrics' + static_configs: + - targets: ['kube-state-metrics.monitoring.svc.cluster.local:8080'] + +--- +# Service +apiVersion: v1 +kind: Service +metadata: + name: prometheus + namespace: monitoring +spec: + type: ClusterIP + selector: + app: prometheus + ports: + - port: 9090 + targetPort: 9090 + +--- +# The Database (StatefulSet) +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: prometheus + namespace: monitoring +spec: + serviceName: prometheus + replicas: 1 + selector: + matchLabels: + app: prometheus + template: + metadata: + labels: + app: prometheus + spec: + containers: + - name: prometheus + image: prom/prometheus:latest + args: + - '--config.file=/etc/prometheus/prometheus.yml' + - '--web.enable-remote-write-receiver' + - '--storage.tsdb.path=/prometheus' + - '--web.console.libraries=/usr/share/prometheus/console_libraries' + - '--web.console.templates=/usr/share/prometheus/consoles' + ports: + - containerPort: 9090 + volumeMounts: + - name: config + mountPath: /etc/prometheus + - name: data + mountPath: /prometheus + volumes: + - name: config + configMap: + name: prometheus-config + volumeClaimTemplates: + - metadata: + name: data + spec: + accessModes: ['ReadWriteOnce'] + storageClassName: 'openebs-hostpath' + resources: + requests: + storage: 5Gi +``` + +**`infra/observer/loki.yaml`** + +```yaml +# --- Configuration --- +apiVersion: v1 +kind: ConfigMap +metadata: + name: loki-config + namespace: monitoring +data: + local-config.yaml: | + auth_enabled: false + server: + http_listen_port: 3100 + common: + path_prefix: /loki + storage: + filesystem: + chunks_directory: /loki/chunks + rules_directory: /loki/rules + replication_factor: 1 + ring: + instance_addr: 127.0.0.1 + kvstore: + store: inmemory + schema_config: + configs: + - from: 2020-10-24 + store: tsdb + object_store: filesystem + schema: v13 + index: + prefix: index_ + period: 24h + +--- +# --- Storage Service (Headless) --- +# Required for StatefulSets to maintain stable DNS entries. +apiVersion: v1 +kind: Service +metadata: + name: loki + namespace: monitoring +spec: + type: ClusterIP + selector: + app: loki + ports: + - port: 3100 + targetPort: 3100 + name: http-metrics + +--- +# --- The Database (StatefulSet) --- +apiVersion: apps/v1 +kind: StatefulSet +metadata: + name: loki + namespace: monitoring +spec: + serviceName: loki + replicas: 1 + selector: + matchLabels: + app: loki + template: + metadata: + labels: + app: loki + spec: + containers: + - name: loki + image: grafana/loki:latest + args: + - -config.file=/etc/loki/local-config.yaml + ports: + - containerPort: 3100 + name: http-metrics + volumeMounts: + - name: config + mountPath: /etc/loki + - name: data + mountPath: /loki + volumes: + - name: config + configMap: + name: loki-config + # Persistent Storage + volumeClaimTemplates: + - metadata: + name: data + spec: + accessModes: ['ReadWriteOnce'] + storageClassName: 'openebs-hostpath' + resources: + requests: + storage: 5Gi +``` + +## 3.2 The Bridge: Prometheus Adapter & KSM + +Standard HPA only understands CPU and Memory. To scale on **Requests Per Second**, we needed two extra components. + +**Helm (Package Manager)** +You will notice `kube-state-metrics` and `prometheus-adapter` are missing from our file tree. That is because we install them using **Helm**. 
Helm allows us to install complex, pre-packaged applications ("Charts") without writing thousands of lines of YAML. We only provide a `values.yaml` file to override specific settings. + +1. **Kube State Metrics (KSM):** A service that listens to the Kubernetes API and generates metrics about the state of objects (e.g., `kube_pod_created`). +2. **Prometheus Adapter:** Installs via Helm. We use `infra/observer/adapter-values.yaml` to configure how it translates Prometheus queries into Kubernetes metrics. + +**`infra/observer/adapter-values.yaml`** + +```yaml +prometheus: + url: http://prometheus.monitoring.svc.cluster.local + port: 9090 + +rules: + custom: + - seriesQuery: 'nginx_http_requests_total{pod!="",namespace!=""}' + resources: + overrides: + namespace: { resource: 'namespace' } + pod: { resource: 'pod' } + name: + matches: '^(.*)_total' + as: 'nginx_http_requests_total' + metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])' +``` + +## 3.3 The Agent: Grafana Alloy (DaemonSets) + +We need to collect logs from every node in the cluster. + +- **DaemonSet vs. Deployment:** A Deployment ensures _n_ replicas exist somewhere. A **DaemonSet** ensures exactly **one** Pod runs on **every** Node. This is perfect for infrastructure agents (logging, networking, monitoring). +- **Downward API:** We need to inject the Pod's own name and namespace into its environment variables so it knows "who it is." + +**`infra/alloy-env.yaml`** + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: monitoring-env + namespace: monitoring +data: + LOKI_URL: 'http://loki.monitoring.svc:3100/loki/api/v1/push' + PROM_URL: 'http://prometheus.monitoring.svc:9090/api/v1/write' +``` + +**`infra/alloy-setup.yaml`** + +```yaml +# --- RBAC configuration --- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: alloy-sa + namespace: monitoring + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: alloy-cluster-role +rules: + # 1. Standard API Access + - apiGroups: [''] + resources: ['nodes', 'nodes/proxy', 'services', 'endpoints', 'pods'] + verbs: ['get', 'list', 'watch'] + # 2. ALLOW METRICS ACCESS (Crucial for cAdvisor/Kubelet) + - apiGroups: [''] + resources: ['nodes/stats', 'nodes/metrics'] + verbs: ['get'] + # 3. Log Access + - apiGroups: [''] + resources: ['pods/log'] + verbs: ['get', 'list', 'watch'] + # 4. Non-Resource URLs (Sometimes needed for /metrics endpoints) + - nonResourceURLs: ['/metrics'] + verbs: ['get'] + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: alloy-cluster-binding +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: alloy-cluster-role +subjects: + - kind: ServiceAccount + name: alloy-sa + namespace: monitoring + +--- +# --- Alloy pipeline configuration --- +apiVersion: v1 +kind: ConfigMap +metadata: + name: alloy-config + namespace: monitoring +data: + config.alloy: | + // 1. Discovery: Find all pods + discovery.kubernetes "k8s_pods" { + role = "pod" + } + + // 2. 
Relabeling: Filter and Label "severed-blog" pods + discovery.relabel "blog_pods" { + targets = discovery.kubernetes.k8s_pods.targets + + rule { + action = "keep" + source_labels = ["__meta_kubernetes_pod_label_app"] + regex = "severed-blog" + } + + // Explicitly set 'pod' and 'namespace' labels for the Adapter + rule { + action = "replace" + source_labels = ["__meta_kubernetes_pod_name"] + target_label = "pod" + } + + rule { + action = "replace" + source_labels = ["__meta_kubernetes_namespace"] + target_label = "namespace" + } + + // Route to the sidecar exporter port + rule { + action = "replace" + source_labels = ["__address__"] + target_label = "__address__" + regex = "([^:]+)(?::\\d+)?" + replacement = "$1:9113" + } + } + + // 3. Direct Nginx Scraper + prometheus.scrape "nginx_scraper" { + targets = discovery.relabel.blog_pods.output + forward_to = [prometheus.remote_write.metrics_service.receiver] + job_name = "integrations/nginx" + } + + // 4. Host Metrics (Unix Exporter) + prometheus.exporter.unix "host" { + rootfs_path = "/host/root" + sysfs_path = "/host/sys" + procfs_path = "/host/proc" + } + + prometheus.scrape "host_scraper" { + targets = prometheus.exporter.unix.host.targets + forward_to = [prometheus.remote_write.metrics_service.receiver] + } + + // 5. Remote Write: Send to Prometheus + prometheus.remote_write "metrics_service" { + endpoint { + url = sys.env("PROM_URL") + } + } + + // 6. Logs Pipeline: Send to Loki + loki.source.kubernetes "pod_logs" { + targets = discovery.relabel.blog_pods.output + forward_to = [loki.write.default.receiver] + } + + loki.write "default" { + endpoint { + url = sys.env("LOKI_URL") + } + } + + // 7. Kubelet Scraper (cAdvisor for Container Metrics) + discovery.kubernetes "k8s_nodes" { + role = "node" + } + + prometheus.scrape "kubelet_cadvisor" { + targets = discovery.kubernetes.k8s_nodes.targets + scheme = "https" + metrics_path = "/metrics/cadvisor" + job_name = "integrations/kubernetes/cadvisor" + + tls_config { + insecure_skip_verify = true + } + bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token" + + forward_to = [prometheus.remote_write.metrics_service.receiver] + } + +--- +# --- Agent Deployment (DaemonSet) --- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: alloy + namespace: monitoring +spec: + selector: + matchLabels: + name: alloy + template: + metadata: + labels: + name: alloy + spec: + serviceAccountName: alloy-sa + hostNetwork: true + hostPID: true + dnsPolicy: ClusterFirstWithHostNet + containers: + - name: alloy + image: grafana/alloy:latest + args: + - run + - --server.http.listen-addr=0.0.0.0:12345 + - --storage.path=/var/lib/alloy/data + - /etc/alloy/config.alloy + envFrom: + - configMapRef: + name: monitoring-env + optional: false + volumeMounts: + - name: config + mountPath: /etc/alloy + - name: logs + mountPath: /var/log + - name: proc + mountPath: /host/proc + readOnly: true + - name: sys + mountPath: /host/sys + readOnly: true + - name: root + mountPath: /host/root + readOnly: true + volumes: + - name: config + configMap: + name: alloy-config + - name: logs + hostPath: + path: /var/log + - name: proc + hostPath: + path: /proc + - name: sys + hostPath: + path: /sys + - name: root + hostPath: + path: / +``` + +## 3.4 Visualization: Grafana + +We deployed Grafana with pre-loaded dashboards via ConfigMaps. + +**Key Dashboards Created:** + +1. **Cluster Health:** CPU/Memory saturation. +2. 
**HPA Live Status:** A custom table showing the _real_ scaling drivers (RPS, CPU Request %) vs the HPA's reaction. + +**`infra/observer/grafana.yaml`** + +```yaml +# 1. Datasources (Connection to Loki/Prom) +apiVersion: v1 +kind: ConfigMap +metadata: + name: grafana-datasources + namespace: monitoring +data: + datasources.yaml: | + apiVersion: 1 + datasources: + - name: Prometheus + type: prometheus + access: proxy + url: http://prometheus.monitoring.svc:9090 + isDefault: false + - name: Loki + type: loki + access: proxy + url: http://loki.monitoring.svc:3100 + isDefault: true + +--- +# 2. Dashboard Provider (Tells Grafana to load from /var/lib/grafana/dashboards) +apiVersion: v1 +kind: ConfigMap +metadata: + name: grafana-dashboard-provider + namespace: monitoring +data: + dashboard-provider.yaml: | + apiVersion: 1 + providers: + - name: 'Severed Dashboards' + orgId: 1 + folder: '' + type: file + disableDeletion: false + updateIntervalSeconds: 10 # Allow editing in UI, but it resets on restart + options: + path: /var/lib/grafana/dashboards + +--- +# 3. Service +apiVersion: v1 +kind: Service +metadata: + name: grafana-service + namespace: monitoring +spec: + type: LoadBalancer + selector: + app: grafana + ports: + - protocol: TCP + port: 3000 + targetPort: 3000 + +--- +# 4. Deployment (The App) +apiVersion: apps/v1 +kind: Deployment +metadata: + name: grafana + namespace: monitoring +spec: + replicas: 1 + selector: + matchLabels: + app: grafana + template: + metadata: + labels: + app: grafana + spec: + containers: + - name: grafana + image: grafana/grafana:latest + ports: + - containerPort: 3000 + + env: + - name: GF_SECURITY_ADMIN_USER + valueFrom: + secretKeyRef: + name: grafana-secrets + key: admin-user + - name: GF_SECURITY_ADMIN_PASSWORD + valueFrom: + secretKeyRef: + name: grafana-secrets + key: admin-password + + - name: GF_AUTH_ANONYMOUS_ENABLED + value: 'true' + - name: GF_AUTH_ANONYMOUS_ORG_ROLE + value: 'Viewer' + - name: GF_AUTH_ANONYMOUS_ORG_NAME + value: 'Main Org.' + + volumeMounts: + - name: grafana-datasources + mountPath: /etc/grafana/provisioning/datasources + - name: grafana-dashboard-provider + mountPath: /etc/grafana/provisioning/dashboards + - name: grafana-dashboards-json + mountPath: /var/lib/grafana/dashboards + - name: grafana-storage + mountPath: /var/lib/grafana + volumes: + - name: grafana-datasources + configMap: + name: grafana-datasources + - name: grafana-dashboard-provider + configMap: + name: grafana-dashboard-provider + - name: grafana-dashboards-json + configMap: + name: grafana-dashboards-json + - name: grafana-storage + emptyDir: {} +``` + +In the Deployment above, you see references to `grafana-secrets`. However, this file is **not** in our git repository. + +```yaml +- name: GF_SECURITY_ADMIN_PASSWORD + valueFrom: + secretKeyRef: + name: grafana-secrets # <--- where is this? + key: admin-password +``` + +We don't commit it to version control. In our `deploy-all.sh` script, we generate this secret imperatively using `kubectl create secret generic`. In a real production environment, we would use tools like **ExternalSecrets** or **SealedSecrets** to inject these safely. + +**`dashboard-json.yaml`** + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: grafana-dashboards-json + namespace: monitoring +data: + severed-health.json: | + ... +``` + +Just like our blog, we need an Ingress to access Grafana. Notice we map a different hostname (`grafana.localhost`) to the Grafana service port (`3000`). 
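Once the Ingress below is applied, the route can be verified end-to-end from the workstation. This is a minimal sketch that reuses the `8080:80@loadbalancer` mapping from Step 1 and assumes `curl` is available locally.

```bash
# Traefik routes on the Host header, so point curl at the k3d loadbalancer port
# and send the hostname the Ingress matches on.
curl -s -H "Host: grafana.localhost" http://127.0.0.1:8080/api/health

# A healthy Grafana answers with a small JSON body containing "database": "ok".
```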
+ +**`infra/observer/grafana-ingress.yaml`** + +```yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: grafana-ingress + namespace: monitoring + annotations: + traefik.ingress.kubernetes.io/router.entrypoints: web +spec: + rules: + - host: grafana.localhost + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: grafana-service # ...send them to Grafana + port: + number: 3000 +``` + +[[2025-12-27-part-4]] diff --git a/_posts/blog_app/2025-12-27-part-4.md b/_posts/blog_app/2025-12-27-part-4.md new file mode 100644 index 0000000..0d52882 --- /dev/null +++ b/_posts/blog_app/2025-12-27-part-4.md @@ -0,0 +1,94 @@ +--- +layout: post +title: 'Step 4: RBAC & Security' +date: 2025-12-28 06:00:00 -0400 +categories: + - blog_app +highlight: true +--- + +[[2025-12-27-part-3]] + +# 4. Cluster Management & Security + +## 4.1 RBAC: Admin user + +In Kubernetes, a **ServiceAccount** is an identity for a process or a human to talk to the API. We created an `admin-user` but identities have no power by default. We must link them to a **ClusterRole** (a set of permissions) using a **ClusterRoleBinding**. + +- **ServiceAccount**: Creates the `admin-user` identity in the dashboard namespace. +- **ClusterRoleBinding**: Grants this specific user the `cluster-admin` role (Full access to the entire cluster). + +**`infra/dashboard/dashboard-admin.yaml`**: + +```yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: admin-user + namespace: kubernetes-dashboard +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: admin-user +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: cluster-admin +subjects: + - kind: ServiceAccount + name: admin-user + namespace: kubernetes-dashboard +``` + +## 4.2 Authentication: Permanent Tokens + +Modern Kubernetes no longer generates tokens automatically for ServiceAccounts. To log into the UI, we need a static, long-lived credential. + +**`infra/dashboard/permanent-token.yaml`**: + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: admin-user-token + namespace: kubernetes-dashboard + annotations: + kubernetes.io/service-account.name: 'admin-user' +type: kubernetes.io/service-account-token +``` + +This creates a **Secret** of type `kubernetes.io/service-account-token`. By adding the annotation `kubernetes.io/service-account.name: "admin-user"`, K8s automatically populates the Secret with a signed JWT token that we can use to bypass the login screen. + +## 4.3 Localhost: Ingress & Cookies + +The Kubernetes Dashboard requires HTTPS, which creates issues with self-signed certificates on `localhost`. We need to reconfigure **Traefik** (the internal reverse proxy bundled with K3s) to allow insecure backends. + +**Helm & CRDs** +K3s installs Traefik using **Helm** (the Kubernetes Package Manager). Usually, you manage Helm via CLI (`helm install`). However, K3s includes a **Helm Controller** that lets us manage charts using YAML files called **HelmChartConfigs** (a Custom Resource Definition or CRD). + +This allows us to reconfigure a complex Helm deployment using a simple declarative file. + +**`infra/dashboard/traefik-config.yaml`** + +```yaml +apiVersion: helm.cattle.io/v1 +kind: HelmChartConfig +metadata: + name: traefik + namespace: kube-system +spec: + valuesContent: |- + additionalArguments: + # Tell Traefik to ignore SSL errors when talking to internal services + - "--serversTransport.insecureSkipVerify=true" +``` + +## 4.4. 
Stress Testing & Verification

We used **Apache Bench (`ab`)** to generate enough concurrent load to trigger the HPA. The test produces tens of thousands of requests, which triggers the RPS rule in our HPA configuration.

```bash
# Generate 50 concurrent users for 5 minutes
ab -k -c 50 -t 300 -H "Host: blog.localhost" http://127.0.0.1:8080/
```
diff --git a/_posts/releases/2025-12-27-release-0.1.md b/_posts/releases/2025-12-27-release-0.1.md
index 844a298..721edda 100644
--- a/_posts/releases/2025-12-27-release-0.1.md
+++ b/_posts/releases/2025-12-27-release-0.1.md
@@ -7,9 +7,7 @@ categories:
 highlight: true
 ---

-This blog serves as the public documentation for **Severed**. While the main site provides the high-level vision,
-this space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary
-tooling that are used within the cluster.
+This blog serves as the public documentation for **Severed**. This space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary tooling that are used.

 ### Ecosystem

@@ -23,31 +21,24 @@ The following services are currently active within the `severed.ink` network:

 ### Core Infrastructure

-The ecosystem is powered by a **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution. This setup
-prioritizes local sovereignty and GitOps principles.
+The ecosystem is powered by a hybrid **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution and AWS services. We prioritize local sovereignty and GitOps principles.

-- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted
-  runners.
+- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted runners.
 - **GitOps & Deployment:** Automated synchronization and state enforcement via **ArgoCD**.
 - **Data Layer:** Persistent storage managed by **PostgreSQL**.
 - **Telemetry:** Full-stack observability provided by **Prometheus** (metrics) and **Loki** (logs) via **Grafana**.
-- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote
-  access.
+- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote access.

 ### Roadmap

-Engineering efforts are currently focused on the following milestones:
+Efforts are currently focused on the following milestones:

 1. **OSS Strategy:** Transitioning from a hybrid of AWS managed services toward a ~100% Open Source Software (OSS) stack.
-2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster
-   instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary
-   failover.
-3. **Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the
-   cloud-based failover.
-4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a
-   centralized, redundant storage architecture.
+2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary failover.
+3. 
**Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the cloud-based failover. +4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a centralized, redundant storage architecture. -### Terminal Redirect +### Redirect For the full technical portfolio and expertise highlights, visit the main site: