added parts

This commit is contained in:
wboughattas
2025-12-30 23:53:49 -05:00
parent 5a12dd0444
commit 786a98f4c5
12 changed files with 1248 additions and 68 deletions

View File

@@ -5,6 +5,7 @@ gem "minima", "~> 2.5"
group :jekyll_plugins do
gem "jekyll-feed", "~> 0.12"
gem 'jekyll-archives'
gem "jekyll-wikirefs"
end
# Windows and JRuby does not include zoneinfo files, so bundle the tzinfo-data gem

View File

@@ -83,6 +83,9 @@ GEM
jekyll (>= 3.8, < 5.0)
jekyll-watch (2.2.1)
listen (~> 3.0)
jekyll-wikirefs (0.0.16)
jekyll
nokogiri (~> 1.13.3)
json (2.15.1)
kramdown (2.5.1)
rexml (>= 3.3.9)
@@ -93,13 +96,18 @@ GEM
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
mercenary (0.4.0)
mini_portile2 (2.8.9)
minima (2.5.2)
jekyll (>= 3.5, < 5.0)
jekyll-feed (~> 0.9)
jekyll-seo-tag (~> 2.1)
nokogiri (1.13.10)
mini_portile2 (~> 2.8.0)
racc (~> 1.4)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (6.0.2)
racc (1.8.1)
rake (13.3.0)
rb-fsevent (0.11.2)
rb-inotify (0.11.1)
@@ -169,6 +177,7 @@ DEPENDENCIES
jekyll (~> 4.4.1)
jekyll-archives
jekyll-feed (~> 0.12)
jekyll-wikirefs
minima (~> 2.5)
tzinfo (>= 1, < 3)
tzinfo-data

View File

@@ -1,33 +0,0 @@
# Severed blog (Jekyll app)
**Live Production:** [https://blog.severed.ink/](https://blog.severed.ink/)
## Development
**Install Dependencies (Ruby & Node):**
```bash
bundle install && pnpm install
```
**Start Local Server:**
```bash
pnpm dev
```
**Build for Production:**
```bash
pnpm build
```
## Code Quality
**Format Code Manually:**
```bash
pnpm format
```
_(Formatting also runs automatically on commit via Husky)_

View File

@@ -17,6 +17,7 @@ minima:
plugins:
- jekyll-feed
- jekyll-archives
- jekyll-wikirefs
exclude:
- .sass-cache/

View File

@@ -1,17 +0,0 @@
---
layout: post
title: Architecture V1 (WIP)
date: 2025-12-27 02:00:00 -0400
categories:
- architectures
---
## Monitoring
```bash
.
└── alloy
├── config
│ └── config.alloy
└── docker-compose.yml
```

View File

@@ -0,0 +1,24 @@
---
layout: post
title: 'Kubernetes vs Docker'
date: 2025-12-27 02:00:00 -0400
categories:
- blog_app
---
# Kubernetes Concepts Cheat Sheet
| Object | Docker Equivalent | Kubernetes Purpose |
| ----------- | ------------------------------ | ----------------------------------------------------------------- |
| Node | The Host Machine | A physical or virtual server in the cluster. |
| Pod | A Container | The smallest deployable unit (can contain multiple containers). |
| Deployments | `docker-compose up` | Manages the lifecycle and scaling of Pods. |
| Services | Network Aliases | Provides a stable DNS name/IP for a group of Pods. |
| HPA | Auto-Scaling Group | Automatically scales replicas based on traffic/load. |
| Ingress | Nginx Proxy / Traefik | Manages external access to Services via HTTP/HTTPS. |
| ConfigMap | `docker run -v config:/etc...` | Decouples configuration files from the container image. |
| Secret | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64. |
| DaemonSet | `mode: global` (Swarm) | Ensures one copy of a Pod runs on every Node (logs/monitoring). |
| StatefulSet | N/A | Manages apps requiring stable identities and storage (Databases). |
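If you want to poke at these objects in a live cluster, `kubectl get` covers most of the table; the namespace below is just a placeholder:
```bash
# Listing the objects from the table above
kubectl get nodes
kubectl get pods,deployments,services,hpa,ingress -n <your-namespace>
kubectl get configmaps,secrets,daemonsets,statefulsets -n <your-namespace>
```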
[[2025-12-27-part-1]]

View File

@@ -0,0 +1,27 @@
---
layout: post
title: 'Deploying the Severed Blog'
date: 2025-12-28 02:00:00 -0400
categories:
- blog_app
highlight: true
---
# Introduction
We are taking a simple static website, the **Severed Blog**, and engineering a proper infrastructure around it.
Anyone can run `docker run nginx`. The real engineering challenge is building the **platform** that keeps that application alive, scalable, and observable.
In this project, we will build a local Kubernetes cluster that mimics a real cloud environment. We will not just deploy the app; we will implement:
- **High Availability:** Running multiple copies so the site never goes down.
- **Auto-Scaling:** Automatically detecting traffic spikes and launching new pods.
- **Observability:** Using the LGTM stack (Loki, Grafana, Prometheus) to visualize exactly what is happening inside the cluster.
The infra code can be found [here](https://git.severed.ink/Severed/Severed-Infra).
The blog code can be found [here](https://git.severed.ink/Severed/Severed-Blog).
Let's start by building the foundation.
[[2025-12-27-part-1]]

View File

@@ -0,0 +1,135 @@
---
layout: post
title: 'Step 1: K3d Cluster Architecture'
date: 2025-12-28 03:00:00 -0400
categories:
- blog_app
highlight: true
---
[[2025-12-27-intro]]
# 1. K3d Cluster Architecture
In a standard Docker setup, containers share the host's kernel and networking space directly. In Kubernetes, we introduce an abstraction layer: a **Cluster**. For this project, we use **K3d**, which packages **K3s** (a lightweight production-grade K8s distribution) into Docker containers.
```text
Severed-Infra % tree
.
├── README.md
├── apps
│ ├── severed-blog-config.yaml
│ ├── severed-blog-hpa.yaml
│ ├── severed-blog-service.yaml
│ ├── severed-blog.yaml
│ └── severed-ingress.yaml
├── infra
│ ├── alloy-env.yaml
│ ├── alloy-setup.yaml
│ ├── dashboard
│ │ ├── dashboard-admin.yaml
│ │ ├── permanent-token.yaml
│ │ └── traefik-config.yaml
│ ├── observer
│ │ ├── adapter-values.yaml
│ │ ├── dashboard-json.yaml
│ │ ├── grafana-ingress.yaml
│ │ ├── grafana.yaml
│ │ ├── loki.yaml
│ │ └── prometheus.yaml
│ └── storage
│ └── openebs-sc.yaml
├── namespaces.yaml
└── scripts
├── README.md
├── access-hub.sh
├── deploy-all.sh
├── setup-grafana-creds.sh
└── tests
├── generated-202-404-blog.sh
└── stress-blog.sh
```
## 1.1 Multi-Node Simulation
- **Server (Control Plane):** The master node. Runs the API server, scheduler, and etcd.
- **Agents (Workers):** The worker nodes where our application pods run.
### Setting up the environment
We map port `8080` to the internal Traefik LoadBalancer to access services via `*.localhost`.
```bash
k3d cluster create severed-cluster \
--agents 2 \
-p "8080:80@loadbalancer" \
-p "8443:443@loadbalancer"
```
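A quick sanity check, assuming your kubeconfig now points at the new cluster, is to list the nodes; k3d names them after the cluster and role, so the output should look roughly like this:
```bash
# Sanity check: one server and two agents, all Ready
kubectl get nodes
# Expected (names follow k3d's <cluster>-<role>-<n> convention):
#   k3d-severed-cluster-server-0   Ready   control-plane,master
#   k3d-severed-cluster-agent-0    Ready   <none>
#   k3d-severed-cluster-agent-1    Ready   <none>
```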
## 1.2 Image Registry Lifecycle
Since our `severed-blog` image is local, we side-load it directly into the cluster's internal image store rather than pushing to Docker Hub.
```bash
docker build -t severed-blog:v0.3 .
k3d image import severed-blog:v0.3 -c severed-cluster
```
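To double-check the side-load worked, you can peek into a node's containerd image store (the container name below follows k3d's default naming and may differ on your machine):
```bash
# Optional check that the image landed in a node's containerd store
docker exec k3d-severed-cluster-agent-0 crictl images | grep severed-blog
```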
## 1.3 Namespaces & Storage
We partition the cluster into logical domains. We also install **OpenEBS** to provide dynamic storage provisioning (PersistentVolumes) for our databases.
**`namespaces.yaml`**
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: severed-apps
---
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
---
apiVersion: v1
kind: Namespace
metadata:
name: kubernetes-dashboard
---
apiVersion: v1
kind: Namespace
metadata:
name: openebs
```
**`infra/storage/openebs-sc.yaml`**
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: severed-storage
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
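Applying both manifests is a plain `kubectl apply`. Note that OpenEBS itself is installed separately (e.g. via its Helm chart); these files only create our namespaces and StorageClass:
```bash
# Namespaces first, then the StorageClass used by our StatefulSets
kubectl apply -f namespaces.yaml
kubectl apply -f infra/storage/openebs-sc.yaml
kubectl get storageclass
```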
---
## 1.4 Infrastructure Concepts Cheat Sheet
| Object | Docker Equivalent | Kubernetes Purpose |
| ----------- | ------------------------------ | ----------------------------------------------------------------- |
| Node | The Host Machine | A physical or virtual server in the cluster. |
| Pod | A Container | The smallest deployable unit (can contain multiple containers). |
| Deployments | `docker-compose up` | Manages the lifecycle and scaling of Pods. |
| Services | Network Aliases | Provides a stable DNS name/IP for a group of Pods. |
| HPA | Auto-Scaling Group | Automatically scales replicas based on traffic/load. |
| Ingress | Nginx Proxy / Traefik | Manages external access to Services via HTTP/HTTPS. |
| ConfigMap | `docker run -v config:/etc...` | Decouples configuration files from the container image. |
| Secret | Environment Variables (Secure) | Stores sensitive data (passwords, tokens) encoded in Base64. |
| DaemonSet | `mode: global` (Swarm) | Ensures one copy of a Pod runs on _every_ Node (logs/monitoring). |
| StatefulSet | N/A | Manages apps requiring stable identities and storage (Databases). |
[[2025-12-27-part-2]]

View File

@@ -0,0 +1,285 @@
---
layout: post
title: 'Step 2: The Application Engine & Auto-Scaling'
date: 2025-12-28 04:00:00 -0400
categories:
- blog_app
highlight: true
---
[[2025-12-27-part-1]]
# 2. The Application Engine & Auto-Scaling
## 2.1 Decoupling Configuration (ConfigMaps)
In Docker, if you need to update an Nginx `default.conf`, you typically `COPY` the file into the image and rebuild it. In Kubernetes, we treat configuration as a separate object: a **ConfigMap**. We can update these rules and simply restart the pods to apply the changes, no Docker build required.
Here, a **ConfigMap** injects the Nginx configuration.
**`apps/severed-blog-config.yaml`**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: severed-blog-config
namespace: severed-apps
data:
default.conf: |
# 1. Define the custom log format
log_format observability '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$request_time"';
server {
listen 80;
server_name localhost;
root /usr/share/nginx/html;
index index.html index.htm;
# 2. Apply the format to stdout
access_log /dev/stdout observability;
error_log /dev/stderr;
# gzip compression
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
gzip_vary on;
gzip_min_length 1000;
# assets (images, fonts, favicons) - cache for 1 Year
location ~* \.(jpg|jpeg|gif|png|ico|svg|woff|woff2|ttf|eot)$ {
expires 365d;
add_header Cache-Control "public, no-transform";
try_files $uri =404;
}
# code (css, js) - cache for 1 month
location ~* \.(css|js)$ {
expires 30d;
add_header Cache-Control "public, no-transform";
try_files $uri =404;
}
# standard routing
location / {
try_files $uri $uri/ $uri.html =404;
}
error_page 404 /404.html;
location = /404.html {
internal;
}
# logging / lb config
real_ip_header X-Forwarded-For;
set_real_ip_from 10.0.0.0/8;
# metrics endpoint for Alloy/Prometheus
location /metrics {
stub_status on;
access_log off; # Keep noise out of our main logs
allow 127.0.0.1;
allow 10.0.0.0/8;
allow 172.16.0.0/12;
deny all;
}
}
```
It is a better practice to keep `default.conf` as a standalone file in our repo (e.g., `apps/config/default.conf`) and inject it like:
```shell
kubectl create configmap severed-blog-config \
-n severed-apps \
--from-file=default.conf=apps/config/default.conf \
--dry-run=client -o yaml | kubectl apply -f -
```
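Either way, once the ConfigMap changes, the pods need a restart to pick up the new `default.conf` (using the Deployment defined in the next section):
```bash
# Nginx does not hot-reload a mounted ConfigMap; roll the pods to apply it
kubectl rollout restart deployment/severed-blog -n severed-apps
kubectl rollout status deployment/severed-blog -n severed-apps
```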
## 2.2 Deploying the Workload: The Sidecar Pattern
The **Deployment** ensures the desired state is maintained. We requested `replicas: 2`, meaning K8s will ensure two instances of the blog are running across our worker nodes.
**The Sidecar:** We added a second container (`nginx-prometheus-exporter`) to the same Pod.
1. **Web Container:** Serves the blog content.
2. **Exporter Container:** Scrapes the Web container's local `/metrics` endpoint and translates it into Prometheus format on port `9113`.
**`apps/severed-blog.yaml`**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: severed-blog
namespace: severed-apps
spec:
replicas: 2
selector:
matchLabels:
app: severed-blog
template:
metadata:
labels:
app: severed-blog
spec:
containers:
- name: web
image: severed-blog:v0.3
imagePullPolicy: Never
ports:
- containerPort: 80
resources:
requests:
cpu: '50m'
memory: '64Mi'
limits:
cpu: '200m'
memory: '128Mi'
volumeMounts:
- name: nginx-config-vol
mountPath: /etc/nginx/conf.d/default.conf
subPath: default.conf
- name: exporter
image: nginx/nginx-prometheus-exporter:latest
args:
- -nginx.scrape-uri=http://localhost:80/metrics
ports:
- containerPort: 9113
name: metrics
resources:
requests:
cpu: '10m'
memory: '32Mi'
limits:
cpu: '50m'
memory: '64Mi'
volumes:
- name: nginx-config-vol
configMap:
name: severed-blog-config
```
The `spec.volumes` block references our ConfigMap, and `volumeMounts` places that data exactly where Nginx expects its
configuration.
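Once applied, a quick way to verify the sidecar pattern is to confirm each replica reports two ready containers and to spot-check the exporter's logs:
```bash
# Each replica should report READY 2/2 (web + exporter)
kubectl get pods -n severed-apps -l app=severed-blog
# Spot-check the exporter sidecar
kubectl logs -n severed-apps deploy/severed-blog -c exporter --tail=20
```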
### 2.2.1 Internal Networking (Services)
Pods are ephemeral; they die and get new IP addresses. If we pointed our Ingress directly at a Pod IP, the site would break every time a pod restarted.
We use a **Service** to solve this. A Service provides a stable Virtual IP (ClusterIP) and an internal DNS name (`severed-blog-service.severed-apps.svc.cluster.local`) that load balances traffic to any Pod matching the selector: `app: severed-blog`.
**`apps/severed-blog-service.yaml`**
```yaml
apiVersion: v1
kind: Service
metadata:
name: severed-blog-service
namespace: severed-apps
spec:
selector:
app: severed-blog
ports:
- protocol: TCP
port: 80
targetPort: 80
type: ClusterIP
```
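To confirm the Service resolves and load balances from inside the cluster, you can spin up a throwaway pod (the busybox image here is just an example):
```bash
# Hit the Service from inside the cluster
kubectl run tmp-debug -n severed-apps --rm -it --restart=Never --image=busybox:1.36 -- \
  wget -qO- http://severed-blog-service
# The FQDN severed-blog-service.severed-apps.svc.cluster.local works from any namespace
```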
## 2.3 Traffic Routing (Ingress)
External users cannot talk to Pods directly. Traffic flows: **Internet → Ingress → Service → Pod**.
1. **The Service:** Acts as an internal LoadBalancer with a stable DNS name.
2. **The Ingress:** Acts as a reverse proxy (Traefik) that reads the URL hostname.
**`apps/severed-ingress.yaml`**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: severed-ingress
namespace: severed-apps
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
# ONLY accept traffic for this specific hostname
- host: blog.localhost
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: severed-blog-service
port:
number: 80
```
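With the Ingress applied, a curl from the host (through the `8080` port we mapped in Step 1) should return the blog's HTML as long as the Host header matches:
```bash
# Traefik listens on the port we mapped to the loadbalancer in Step 1 (8080 -> 80)
curl -s -H "Host: blog.localhost" http://127.0.0.1:8080/ | head -n 10
# Browsers resolve *.localhost to 127.0.0.1, so http://blog.localhost:8080/ also works
```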
## 2.4 Auto-Scaling (HPA)
We implemented a **Horizontal Pod Autoscaler (HPA)** that scales the blog based on three metrics:
1. **CPU:** Target 90% of _Requests_ (not Limits).
2. **Memory:** Target 80% of _Requests_.
3. **Traffic (RPS):** Target 500 requests per second per pod.
To prevent scaling up and down too fast, we added a **Stabilization Window** and a strict **Scale Up Limit** (max 1 new pod per 60s). This prevents the cluster from exploding due to 1-second spikes.
**`apps/severed-blog-hpa.yaml`**
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: severed-blog-hpa
namespace: severed-apps
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: severed-blog
minReplicas: 2 # Never drop below 2 for HA
maxReplicas: 6 # Maximum number of pods to prevent cluster exhaustion
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 90 # Scale up if CPU Usage exceeds 90%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale up if RAM Usage exceeds 80%
- type: Pods
pods:
metric:
name: nginx_http_requests_total
target:
type: AverageValue
averageValue: '500' # Scale up if requests per second > 500 per pod
behavior:
scaleDown:
stabilizationWindowSeconds: 60 # 60 sec before removing a pod
policies:
- type: Percent
value: 100
periodSeconds: 15
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Pods
value: 1
periodSeconds: 60
```
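While a load test runs (see Step 4), you can watch the HPA make its decisions in real time:
```bash
# Watch replicas and metric targets change under load
kubectl get hpa severed-blog-hpa -n severed-apps -w
# Scaling events and the current value of each metric
kubectl describe hpa severed-blog-hpa -n severed-apps
```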
[[2025-12-27-part-3]]

View File

@@ -0,0 +1,663 @@
---
layout: post
title: 'Step 3: Observability (LGTM, KSM)'
date: 2025-12-28 05:00:00 -0400
categories:
- blog_app
highlight: true
---
[[2025-12-27-part-2]]
# 3. Observability: The LGTM Stack
In a distributed cluster, logs and metrics are scattered across different pods and nodes. We centralize them using the LGTM Stack (Loki, Grafana, Prometheus), plus **Kube State Metrics** and the **Prometheus Adapter**.
## 3.1 The Databases (StatefulSets)
- **Prometheus:** Scrapes metrics. We updated the config to scrape **Kube State Metrics** via its internal DNS Service.
- **Loki:** Aggregates logs. Configured with a 168h (7-day) retention period.
**`infra/observer/prometheus.yaml`**
```yaml
# Configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
storage:
tsdb:
out_of_order_time_window: 1m
scrape_configs:
# 1. Scrape Prometheus itself (Health Check)
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
# 2. Scrape Kube State Metrics (KSM)
# We use the internal DNS: service-name.namespace.svc.cluster.local:port
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.monitoring.svc.cluster.local:8080']
---
# Service
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
spec:
type: ClusterIP
selector:
app: prometheus
ports:
- port: 9090
targetPort: 9090
---
# The Database (StatefulSet)
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
namespace: monitoring
spec:
serviceName: prometheus
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--web.enable-remote-write-receiver'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
ports:
- containerPort: 9090
volumeMounts:
- name: config
mountPath: /etc/prometheus
- name: data
mountPath: /prometheus
volumes:
- name: config
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ['ReadWriteOnce']
storageClassName: 'openebs-hostpath'
resources:
requests:
storage: 5Gi
```
**`infra/observer/loki.yaml`**
```yaml
# --- Configuration ---
apiVersion: v1
kind: ConfigMap
metadata:
name: loki-config
namespace: monitoring
data:
local-config.yaml: |
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
schema_config:
configs:
- from: 2020-10-24
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
---
# --- Storage Service (Headless) ---
# Required for StatefulSets to maintain stable DNS entries.
apiVersion: v1
kind: Service
metadata:
name: loki
namespace: monitoring
spec:
type: ClusterIP
selector:
app: loki
ports:
- port: 3100
targetPort: 3100
name: http-metrics
---
# --- The Database (StatefulSet) ---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: loki
namespace: monitoring
spec:
serviceName: loki
replicas: 1
selector:
matchLabels:
app: loki
template:
metadata:
labels:
app: loki
spec:
containers:
- name: loki
image: grafana/loki:latest
args:
- -config.file=/etc/loki/local-config.yaml
ports:
- containerPort: 3100
name: http-metrics
volumeMounts:
- name: config
mountPath: /etc/loki
- name: data
mountPath: /loki
volumes:
- name: config
configMap:
name: loki-config
# Persistent Storage
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ['ReadWriteOnce']
storageClassName: 'openebs-hostpath'
resources:
requests:
storage: 5Gi
```
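Once both StatefulSets are applied, each should come up with a PersistentVolumeClaim bound through OpenEBS:
```bash
# Both databases running, each with a Bound PVC
kubectl get statefulsets,pvc -n monitoring
kubectl get pv
```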
## 3.2 The Bridge: Prometheus Adapter & KSM
Standard HPA only understands CPU and Memory. To scale on **Requests Per Second**, we needed two extra components.
**Helm (Package Manager)**
You will notice `kube-state-metrics` and `prometheus-adapter` are missing from our file tree. That is because we install them using **Helm**. Helm allows us to install complex, pre-packaged applications ("Charts") without writing thousands of lines of YAML. We only provide a `values.yaml` file to override specific settings.
1. **Kube State Metrics (KSM):** A service that listens to the Kubernetes API and generates metrics about the state of objects (e.g., `kube_pod_created`).
2. **Prometheus Adapter:** Installs via Helm. We use `infra/observer/adapter-values.yaml` to configure how it translates Prometheus queries into Kubernetes metrics.
**`infra/observer/adapter-values.yaml`**
```yaml
prometheus:
url: http://prometheus.monitoring.svc.cluster.local
port: 9090
rules:
custom:
- seriesQuery: 'nginx_http_requests_total{pod!="",namespace!=""}'
resources:
overrides:
namespace: { resource: 'namespace' }
pod: { resource: 'pod' }
name:
matches: '^(.*)_total'
as: 'nginx_http_requests_total'
metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[1m])'
```
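For reference, a rough sketch of the Helm side: both charts live in the `prometheus-community` repository, but the release names and namespace below are illustrative rather than a prescription. The last command checks that the custom metric is actually exposed to the HPA.
```bash
# Illustrative install; adjust release names/namespace to taste
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n monitoring
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring -f infra/observer/adapter-values.yaml

# Verify the custom metric is served through the custom metrics API
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
```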
## 3.3 The Agent: Grafana Alloy (DaemonSets)
We need to collect logs from every node in the cluster.
- **DaemonSet vs. Deployment:** A Deployment ensures _n_ replicas exist somewhere. A **DaemonSet** ensures exactly **one** Pod runs on **every** Node. This is perfect for infrastructure agents (logging, networking, monitoring).
- **Downward API:** We need to inject the Pod's own name and namespace into its environment variables so it knows "who it is."
**`infra/alloy-env.yaml`**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: monitoring-env
namespace: monitoring
data:
LOKI_URL: 'http://loki.monitoring.svc:3100/loki/api/v1/push'
PROM_URL: 'http://prometheus.monitoring.svc:9090/api/v1/write'
```
**`infra/alloy-setup.yaml`**
```yaml
# --- RBAC configuration ---
apiVersion: v1
kind: ServiceAccount
metadata:
name: alloy-sa
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: alloy-cluster-role
rules:
# 1. Standard API Access
- apiGroups: ['']
resources: ['nodes', 'nodes/proxy', 'services', 'endpoints', 'pods']
verbs: ['get', 'list', 'watch']
# 2. ALLOW METRICS ACCESS (Crucial for cAdvisor/Kubelet)
- apiGroups: ['']
resources: ['nodes/stats', 'nodes/metrics']
verbs: ['get']
# 3. Log Access
- apiGroups: ['']
resources: ['pods/log']
verbs: ['get', 'list', 'watch']
# 4. Non-Resource URLs (Sometimes needed for /metrics endpoints)
- nonResourceURLs: ['/metrics']
verbs: ['get']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: alloy-cluster-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: alloy-cluster-role
subjects:
- kind: ServiceAccount
name: alloy-sa
namespace: monitoring
---
# --- Alloy pipeline configuration ---
apiVersion: v1
kind: ConfigMap
metadata:
name: alloy-config
namespace: monitoring
data:
config.alloy: |
// 1. Discovery: Find all pods
discovery.kubernetes "k8s_pods" {
role = "pod"
}
// 2. Relabeling: Filter and Label "severed-blog" pods
discovery.relabel "blog_pods" {
targets = discovery.kubernetes.k8s_pods.targets
rule {
action = "keep"
source_labels = ["__meta_kubernetes_pod_label_app"]
regex = "severed-blog"
}
// Explicitly set 'pod' and 'namespace' labels for the Adapter
rule {
action = "replace"
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
action = "replace"
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
// Route to the sidecar exporter port
rule {
action = "replace"
source_labels = ["__address__"]
target_label = "__address__"
regex = "([^:]+)(?::\\d+)?"
replacement = "$1:9113"
}
}
// 3. Direct Nginx Scraper
prometheus.scrape "nginx_scraper" {
targets = discovery.relabel.blog_pods.output
forward_to = [prometheus.remote_write.metrics_service.receiver]
job_name = "integrations/nginx"
}
// 4. Host Metrics (Unix Exporter)
prometheus.exporter.unix "host" {
rootfs_path = "/host/root"
sysfs_path = "/host/sys"
procfs_path = "/host/proc"
}
prometheus.scrape "host_scraper" {
targets = prometheus.exporter.unix.host.targets
forward_to = [prometheus.remote_write.metrics_service.receiver]
}
// 5. Remote Write: Send to Prometheus
prometheus.remote_write "metrics_service" {
endpoint {
url = sys.env("PROM_URL")
}
}
// 6. Logs Pipeline: Send to Loki
loki.source.kubernetes "pod_logs" {
targets = discovery.relabel.blog_pods.output
forward_to = [loki.write.default.receiver]
}
loki.write "default" {
endpoint {
url = sys.env("LOKI_URL")
}
}
// 7. Kubelet Scraper (cAdvisor for Container Metrics)
discovery.kubernetes "k8s_nodes" {
role = "node"
}
prometheus.scrape "kubelet_cadvisor" {
targets = discovery.kubernetes.k8s_nodes.targets
scheme = "https"
metrics_path = "/metrics/cadvisor"
job_name = "integrations/kubernetes/cadvisor"
tls_config {
insecure_skip_verify = true
}
bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
forward_to = [prometheus.remote_write.metrics_service.receiver]
}
---
# --- Agent Deployment (DaemonSet) ---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: alloy
namespace: monitoring
spec:
selector:
matchLabels:
name: alloy
template:
metadata:
labels:
name: alloy
spec:
serviceAccountName: alloy-sa
hostNetwork: true
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: alloy
image: grafana/alloy:latest
args:
- run
- --server.http.listen-addr=0.0.0.0:12345
- --storage.path=/var/lib/alloy/data
- /etc/alloy/config.alloy
envFrom:
- configMapRef:
name: monitoring-env
optional: false
volumeMounts:
- name: config
mountPath: /etc/alloy
- name: logs
mountPath: /var/log
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
- name: root
mountPath: /host/root
readOnly: true
volumes:
- name: config
configMap:
name: alloy-config
- name: logs
hostPath:
path: /var/log
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
```
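Because this is a DaemonSet, there should be exactly one Alloy pod per node (the server plus both agents in our k3d setup):
```bash
# One Alloy pod scheduled on every node
kubectl get daemonset alloy -n monitoring
kubectl get pods -n monitoring -l name=alloy -o wide
```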
## 3.4 Visualization: Grafana
We deployed Grafana with pre-loaded dashboards via ConfigMaps.
**Key Dashboards Created:**
1. **Cluster Health:** CPU/Memory saturation.
2. **HPA Live Status:** A custom table showing the _real_ scaling drivers (RPS, CPU Request %) vs the HPA's reaction.
**`infra/observer/grafana.yaml`**
```yaml
# 1. Datasources (Connection to Loki/Prom)
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasources
namespace: monitoring
data:
datasources.yaml: |
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus.monitoring.svc:9090
isDefault: false
- name: Loki
type: loki
access: proxy
url: http://loki.monitoring.svc:3100
isDefault: true
---
# 2. Dashboard Provider (Tells Grafana to load from /var/lib/grafana/dashboards)
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-provider
namespace: monitoring
data:
dashboard-provider.yaml: |
apiVersion: 1
providers:
- name: 'Severed Dashboards'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10 # Allow editing in UI, but it resets on restart
options:
path: /var/lib/grafana/dashboards
---
# 3. Service
apiVersion: v1
kind: Service
metadata:
name: grafana-service
namespace: monitoring
spec:
type: LoadBalancer
selector:
app: grafana
ports:
- protocol: TCP
port: 3000
targetPort: 3000
---
# 4. Deployment (The App)
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:latest
ports:
- containerPort: 3000
env:
- name: GF_SECURITY_ADMIN_USER
valueFrom:
secretKeyRef:
name: grafana-secrets
key: admin-user
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
name: grafana-secrets
key: admin-password
- name: GF_AUTH_ANONYMOUS_ENABLED
value: 'true'
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: 'Viewer'
- name: GF_AUTH_ANONYMOUS_ORG_NAME
value: 'Main Org.'
volumeMounts:
- name: grafana-datasources
mountPath: /etc/grafana/provisioning/datasources
- name: grafana-dashboard-provider
mountPath: /etc/grafana/provisioning/dashboards
- name: grafana-dashboards-json
mountPath: /var/lib/grafana/dashboards
- name: grafana-storage
mountPath: /var/lib/grafana
volumes:
- name: grafana-datasources
configMap:
name: grafana-datasources
- name: grafana-dashboard-provider
configMap:
name: grafana-dashboard-provider
- name: grafana-dashboards-json
configMap:
name: grafana-dashboards-json
- name: grafana-storage
emptyDir: {}
```
In the Deployment above, you see references to `grafana-secrets`. However, that Secret is **not** defined in our git repository.
```yaml
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
name: grafana-secrets # <--- where is this?
key: admin-password
```
We don't commit it to version control. In our `deploy-all.sh` script, we generate this secret imperatively using `kubectl create secret generic`. In a real production environment, we would use tools like **ExternalSecrets** or **SealedSecrets** to inject these safely.
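As a rough sketch of what such an imperative step can look like (the username and generated password here are purely illustrative, not necessarily what `deploy-all.sh` does):
```bash
# Illustrative only; the real deploy-all.sh may differ
kubectl create secret generic grafana-secrets -n monitoring \
  --from-literal=admin-user=admin \
  --from-literal=admin-password="$(openssl rand -base64 24)" \
  --dry-run=client -o yaml | kubectl apply -f -
```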
**`dashboard-json.yaml`**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboards-json
namespace: monitoring
data:
severed-health.json: |
...
```
Just like our blog, we need an Ingress to access Grafana. Notice we map a different hostname (`grafana.localhost`) to the Grafana service port (`3000`).
**`infra/observer/grafana-ingress.yaml`**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: grafana.localhost
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: grafana-service # ...send them to Grafana
port:
number: 3000
```
[[2025-12-27-part-4]]

View File

@@ -0,0 +1,94 @@
---
layout: post
title: 'Step 4: RBAC & Security'
date: 2025-12-28 06:00:00 -0400
categories:
- blog_app
highlight: true
---
[[2025-12-27-part-3]]
# 4. Cluster Management & Security
## 4.1 RBAC: Admin user
In Kubernetes, a **ServiceAccount** is an identity for a process or a human to talk to the API. We created an `admin-user`, but identities have no power by default. We must link them to a **ClusterRole** (a set of permissions) using a **ClusterRoleBinding**.
- **ServiceAccount**: Creates the `admin-user` identity in the dashboard namespace.
- **ClusterRoleBinding**: Grants this specific user the `cluster-admin` role (Full access to the entire cluster).
**`infra/dashboard/dashboard-admin.yaml`**:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kubernetes-dashboard
```
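A quick way to confirm the binding took effect is to impersonate the ServiceAccount and ask the API what it is allowed to do:
```bash
# Impersonate the ServiceAccount and confirm it can do anything, cluster-wide
kubectl auth can-i '*' '*' --as=system:serviceaccount:kubernetes-dashboard:admin-user
# Expected output: yes
```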
## 4.2 Authentication: Permanent Tokens
Modern Kubernetes no longer generates tokens automatically for ServiceAccounts. To log into the UI, we need a static, long-lived credential.
**`infra/dashboard/permanent-token.yaml`**:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: admin-user-token
namespace: kubernetes-dashboard
annotations:
kubernetes.io/service-account.name: 'admin-user'
type: kubernetes.io/service-account-token
```
This creates a **Secret** of type `kubernetes.io/service-account-token`. By adding the annotation `kubernetes.io/service-account.name: "admin-user"`, K8s automatically populates the Secret with a signed JWT token that we can use to bypass the login screen.
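To use it, we extract the token from the Secret and supply it to the dashboard:
```bash
# Extract the signed JWT for the dashboard's token login
kubectl get secret admin-user-token -n kubernetes-dashboard \
  -o jsonpath='{.data.token}' | base64 -d
```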
## 4.3 Localhost: Ingress & Cookies
The Kubernetes Dashboard requires HTTPS, which creates issues with self-signed certificates on `localhost`. We need to reconfigure **Traefik** (the internal reverse proxy bundled with K3s) to allow insecure backends.
**Helm & CRDs**
K3s installs Traefik using **Helm** (the Kubernetes Package Manager). Usually, you manage Helm via CLI (`helm install`). However, K3s includes a **Helm Controller** that lets us manage charts using YAML files called **HelmChartConfigs** (a Custom Resource Definition or CRD).
This allows us to reconfigure a complex Helm deployment using a simple declarative file.
**`infra/dashboard/traefik-config.yaml`**
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: traefik
namespace: kube-system
spec:
valuesContent: |-
additionalArguments:
# Tell Traefik to ignore SSL errors when talking to internal services
- "--serversTransport.insecureSkipVerify=true"
```
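Applying the file is enough: the K3s Helm controller notices the `HelmChartConfig` and redeploys Traefik with the new arguments. A quick look at the kube-system pods shows the rollout:
```bash
# Apply the override and watch Traefik roll out with the new args
kubectl apply -f infra/dashboard/traefik-config.yaml
kubectl get pods -n kube-system | grep -i traefik
```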
## 4.4 Stress Testing & Verification
We used **Apache Bench (`ab`)** to generate massive concurrency capable of triggering the HPA. This test produces tens of thousands of requests, which triggers the RPS rule in our HPA configuration.
```bash
# Generate 50 concurrent users for 5 minutes
ab -k -c 50 -t 300 -H "Host: blog.localhost" http://127.0.0.1:8080/
```

View File

@@ -7,9 +7,7 @@ categories:
highlight: true
---
This blog serves as the public documentation for **Severed**. While the main site provides the high-level vision,
this space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary
tooling that are used within the cluster.
This blog serves as the public documentation for **Severed**. This space is dedicated to the technical source-of-truth for the experiments, infrastructure-as-code, and proprietary tooling that is used.
### Ecosystem
@@ -23,31 +21,24 @@ The following services are currently active within the `severed.ink` network:
### Core Infrastructure
The ecosystem is powered by a **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution. This setup
prioritizes local sovereignty and GitOps principles.
The ecosystem is powered by a hybrid setup: a **Home Server Cluster** managed via a **Kubernetes (k3s)** distribution, alongside AWS services. We prioritize local sovereignty and GitOps principles.
- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted
runners.
- **CI Pipeline:** Automated build and test suites are orchestrated by a private Jenkins server utilizing self-hosted runners.
- **GitOps & Deployment:** Automated synchronization and state enforcement via **ArgoCD**.
- **Data Layer:** Persistent storage managed by **PostgreSQL**.
- **Telemetry:** Full-stack observability provided by **Prometheus** (metrics) and **Loki** (logs) via **Grafana**.
- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote
access.
- **Security Layer:** Push/Pull GitOps operations require an active connection to a **WireGuard (VPN)** for remote access.
### Roadmap
Engineering efforts are currently focused on the following milestones:
Efforts are currently focused on the following milestones:
1. **OSS Strategy:** Transitioning from a hybrid of AWS managed services toward a ~100% Open Source Software (OSS) stack.
2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster
instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary
failover.
3. **Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the
cloud-based failover.
4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a
centralized, redundant storage architecture.
2. **High Availability (HA):** Implementing a "Cloud RAID-1" failover mechanism. In the event of home cluster instability, traffic automatically routes to a secondary cloud-instantiated Kubernetes cluster as a temporary failover.
3. **Data Resilience:** Automating PostgreSQL backup strategies to ensure parity between the primary cluster and the cloud-based failover.
4. **Storage Infrastructure:** Integrating a dedicated **TrueNAS** node to move from local SATA/NVMe reliance to a centralized, redundant storage architecture.
### Terminal Redirect
### Redirect
For the full technical portfolio and expertise highlights, visit the main site: