
Severed-Infra: Health & Diagnostics Guide

1. The Foundation: Node & Storage Stability

Before troubleshooting applications, confirm that the infrastructure layer (the k3d nodes, which run as Docker containers) is stable.

  • Node Readiness: All 3 nodes (1 server, 2 agents) must be Ready.
kubectl get nodes
  • Storage Binding: Verify that the OpenEBS Persistent Volume Claims (PVCs) for Loki and Prometheus are Bound.
kubectl get pvc -n monitoring

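  • Pod Health: Confirm that the pods in each namespace are Running (or Completed, for one-off jobs) with all containers ready.
kubectl get pods -n severed-apps
kubectl get pods -n monitoring
kubectl get pods -n kubernetes-dashboard
kubectl get pods -n openebs
  • Grafana Restart: If Grafana needs a restart, trigger a rolling restart of its deployment.
kubectl rollout restart deployment grafana -n monitoring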
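The checks in this section can be wrapped in a small Bash script. This is a minimal sketch, assuming Bash and kubectl on the PATH; all function names are hypothetical helpers, not part of any tool. The parsers rely on kubectl's default column order (NAME then STATUS for nodes and PVCs; NAME, READY, STATUS for pods) and read from stdin, so they can be tried on saved output without a live cluster.

```shell
#!/usr/bin/env bash
# Section 1 helpers (hypothetical). The parsers read `kubectl get ... --no-headers`
# output on stdin; only check_foundation talks to the cluster.

# Print every node whose STATUS column (field 2) is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is reported as well.
nodes_not_ready() {
  awk '$2 != "Ready" { print $1 }'
}

# Print every PVC whose STATUS column (field 2) is not "Bound".
pvcs_not_bound() {
  awk '$2 != "Bound" { print $1 }'
}

# Print "NAME READY STATUS" for pods that are not Completed and are either
# not Running or not fully ready (READY column such as "1/2").
unhealthy_pods() {
  awk '$3 == "Completed" { next }
       { split($2, r, "/")
         if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

# Run every check against the live cluster (requires kubectl and a kubeconfig).
check_foundation() {
  local ns count
  count=$(kubectl get nodes --no-headers | awk 'END { print NR }')
  [ "$count" -eq 3 ] || echo "Expected 3 nodes, found $count"
  echo "Nodes not Ready:"
  kubectl get nodes --no-headers | nodes_not_ready
  echo "PVCs not Bound in monitoring:"
  kubectl get pvc -n monitoring --no-headers | pvcs_not_bound
  for ns in severed-apps monitoring kubernetes-dashboard openebs; do
    echo "Unhealthy pods in $ns:"
    kubectl get pods -n "$ns" --no-headers | unhealthy_pods
  done
}
```

Source the file and call check_foundation, or run a single parser, for example kubectl get nodes --no-headers | nodes_not_ready. An empty list under a heading means nothing in that category needs attention.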


2. The Telemetry Bridge: Alloy & Exporter

Check that Alloy is successfully converting the plain-text Nginx status page into Prometheus metrics.

  • Error Scan: Check Alloy logs specifically for scrape_uri or connection refused errors.
kubectl logs -n monitoring -l name=alloy --tail=50
  • Internal Handshake: Run the access-hub.sh script, then open the Alloy UI at http://localhost:12345.
    ◦ Find the prometheus.exporter.nginx.blog component.
    ◦ Confirm that its health status is green (healthy).
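The log scan can be narrowed to just the two error signatures named above. A minimal sketch, assuming Bash and kubectl, and reusing the name=alloy label selector from the log command in this section; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only Alloy log lines (read on stdin) that mention scrape_uri or
# "connection refused", case-insensitively. As with grep, the exit status is
# 0 when at least one line matched and 1 when none did.
alloy_scrape_errors() {
  grep -iE 'scrape_uri|connection refused'
}

# Fetch the last 50 lines from each Alloy pod (requires kubectl) and filter them.
# --prefix tags each line with the pod and container it came from.
scan_alloy_logs() {
  kubectl logs -n monitoring -l name=alloy --tail=50 --prefix | alloy_scrape_errors
}
```

If scan_alloy_logs prints nothing, neither error appeared in the recent logs, and the Alloy UI check is the next step.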

3. The Database: Prometheus Query Test

If the exporter is working, the metrics will appear in the Prometheus time-series database.

  • Live Traffic Check: Verify that the query for nginx_http_requests_total returns a non-empty result vector (not an empty list []).
kubectl exec prometheus-0 -n monitoring -- \
  wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total"

  • Metric Discovery: List all Nginx-related metric names currently stored. The API returns every name on a single JSON line, so split it on commas before filtering.
kubectl exec prometheus-0 -n monitoring -- \
  wget -qO- "http://localhost:9090/api/v1/label/__name__/values" | tr ',' '\n' | grep nginx
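Both Prometheus checks can be turned into pass/fail helpers. A minimal sketch, assuming Bash, kubectl, the prometheus-0 pod name used above, and the compact JSON (no spaces around ":") that the Prometheus HTTP API normally returns; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Section 3 helpers (hypothetical). The parsers read JSON on stdin.

# Run an instant query inside the Prometheus pod (requires kubectl).
# The query is not URL-encoded, so pass a bare metric name only.
prom_query() {
  kubectl exec prometheus-0 -n monitoring -- \
    wget -qO- "http://localhost:9090/api/v1/query?query=$1"
}

# Read a /api/v1/query response. Return 0 if the result vector has data,
# 1 if it is empty ("result":[]), 2 if the query did not report success.
prom_has_data() {
  local body
  body=$(cat)
  case "$body" in
    *'"status":"success"'*) ;;
    *) return 2 ;;
  esac
  case "$body" in
    *'"result":[]'*) return 1 ;;
  esac
  return 0
}

# Read a /api/v1/label/__name__/values response and print each metric name
# starting with nginx_, one per line, sorted and de-duplicated.
nginx_metric_names() {
  grep -o '"nginx_[A-Za-z0-9_:]*"' | tr -d '"' | sort -u
}
```

For example, prom_query nginx_http_requests_total | prom_has_data && echo "traffic data present" prints the message only when the vector is non-empty, and piping the label-values output from the command above into nginx_metric_names gives a clean list of names.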


4. The "Brain": Horizontal Pod Autoscaler (HPA)

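The HPA is the final consumer of this data. If its targets resolve to real values, every stage before it (Nginx, Alloy, Prometheus, and the Prometheus Adapter) is working, and the HPA can scale its target based on live traffic.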

  • Target Alignment: The TARGETS column should show a real current/target value (e.g., 0/10) rather than <unknown>.
kubectl get hpa -n severed-apps

  • Adapter Check: Ensure the Prometheus Adapter is serving the Nginx metric through the Custom Metrics API to the Kubernetes API server.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total"
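The two HPA checks can be reduced to output that is empty when everything is healthy. A minimal sketch, assuming Bash and kubectl; the function names are hypothetical, and the JSON parsing assumes the MetricValueList shape returned by custom.metrics.k8s.io/v1beta1, where each item carries its value as a quantity string.

```shell
#!/usr/bin/env bash
# Section 4 helpers (hypothetical). Both parsers read on stdin.

# Print the name of every HPA (from `kubectl get hpa --no-headers`) whose line
# contains <unknown>; in practice that is the TARGETS column.
hpas_without_metrics() {
  awk '/<unknown>/ { print $1 }'
}

# Print the "value" of each item in a MetricValueList, one per line
# (Kubernetes quantities such as "0" or "500m").
# No output means the adapter returned an empty item list.
custom_metric_values() {
  grep -o '"value": *"[^"]*"' | cut -d'"' -f4
}

# Query the Custom Metrics API for the Nginx metric (requires kubectl).
nginx_custom_metric() {
  kubectl get --raw \
    "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total" |
    custom_metric_values
}
```

Run kubectl get hpa -n severed-apps --no-headers | hpas_without_metrics to list HPAs that still cannot read their metric; kubectl describe hpa -n severed-apps shows the related events (such as FailedGetPodsMetric). If nginx_custom_metric prints nothing, kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 lists the metric names the adapter actually exposes, which must match the name the HPA requests.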

Cheat Sheet

Symptom                   | Probable Cause              | Fix
--------------------------+-----------------------------+-----------------------------------------
502 Bad Gateway           | Node resource exhaustion    | Restart k3d or increase Docker RAM
strconv.ParseFloat error  | Missing Nginx Exporter      | Use prometheus.exporter.nginx in Alloy
HPA shows <unknown>       | Prometheus Adapter mismatch | Verify adapter-values.yaml metric names
No nodes found            | Corrupted cluster state     | Run k3d cluster delete and recreate
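The "delete and recreate" fix can be scripted for a cluster with the 1 server and 2 agents described in section 1. A minimal sketch, assuming Bash, k3d v5, and kubectl; CLUSTER_NAME defaults to a hypothetical name ("severed") and must be set to the real cluster name. Recreating the cluster discards everything that ran in it, so OpenEBS, the monitoring stack, and the apps have to be redeployed afterwards, and data on the old volumes should be assumed lost.

```shell
#!/usr/bin/env bash
# Hypothetical recreate script. CLUSTER_NAME defaults to an assumed name;
# set it to your real cluster name. With DRY_RUN=1, commands are only printed.
CLUSTER_NAME="${CLUSTER_NAME:-severed}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Delete the cluster, recreate it with 1 server and 2 agents,
# then wait up to two minutes for every node to report Ready.
recreate_cluster() {
  run k3d cluster delete "$CLUSTER_NAME" &&
    run k3d cluster create "$CLUSTER_NAME" --servers 1 --agents 2 &&
    run kubectl wait --for=condition=Ready nodes --all --timeout=120s
}
```

Running DRY_RUN=1 recreate_cluster first shows exactly which commands would be executed before anything is deleted.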