Severed-Infra: Health & Diagnostics Guide

Before troubleshooting apps, confirm that the underlying node layer (the Docker containers that k3d runs as cluster nodes) is stable. Every node should report Ready.

kubectl get nodes

Storage Binding: Verify that the OpenEBS Persistent Volume Claims (PVCs) for Loki and Prometheus are Bound.

kubectl get pvc -n monitoring
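To spot a stuck claim quickly, the PVC listing can be filtered down to the claims that are not Bound. This is a minimal sketch, not part of the Severed-Infra tooling: the function name unbound_pvcs is hypothetical, and it assumes the default column layout of kubectl get pvc with --no-headers, where the second column is STATUS.

```shell
# Hypothetical helper: print the names of PVCs whose STATUS is not "Bound".
# Reads `kubectl get pvc --no-headers` output on stdin.
# Assumed columns: NAME STATUS VOLUME CAPACITY ...
unbound_pvcs() {
  awk 'NF >= 2 && $2 != "Bound" { print $1 }'
}

# Usage (requires a running cluster):
#   kubectl get pvc -n monitoring --no-headers | unbound_pvcs
```

Empty output means every claim in the namespace is Bound.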

kubectl get pods -n severed-apps
kubectl get pods -n monitoring
kubectl get pods -n kubernetes-dashboard
kubectl get pods -n openebs
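Reading four pod listings by eye is error-prone, so the sketch below reduces each listing to the pods that need attention. The helper name unhealthy_pods is hypothetical, and it assumes the default kubectl get pods --no-headers columns (NAME, READY, STATUS). Pods in the Completed state are skipped because finished Jobs normally show 0/1.

```shell
# Hypothetical helper: print pods that are not Running or not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin.
unhealthy_pods() {
  awk '
    NF < 3 { next }                 # skip blank or malformed lines
    $3 == "Completed" { next }      # finished Jobs are expected to be 0/1
    {
      split($2, ready, "/")         # READY looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }
  '
}

# Usage (requires a running cluster):
#   for ns in severed-apps monitoring kubernetes-dashboard openebs; do
#     kubectl get pods -n "$ns" --no-headers | unhealthy_pods
#   done
```

No output from the loop means every pod in the four namespaces is Running with all containers ready.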

kubectl rollout restart deployment grafana -n monitoring

Check if Alloy is successfully translating raw Nginx text into Prometheus numbers.

Error Scan: Check Alloy logs specifically for scrape_uri or connection refused errors.

kubectl logs -n monitoring -l name=alloy --tail=50
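The error scan can be narrowed to the two patterns named above by piping the logs through a filter. This is only a sketch: the function name alloy_scrape_errors is hypothetical, and the match is a plain case-insensitive text search that makes no assumption about Alloy's log format.

```shell
# Hypothetical helper: keep only log lines that mention scrape_uri
# or a refused connection. Reads log text on stdin.
# Exit status is 1 when nothing matches (standard grep behaviour).
alloy_scrape_errors() {
  grep -iE 'scrape_uri|connection refused'
}

# Usage (requires a running cluster):
#   kubectl logs -n monitoring -l name=alloy --tail=50 | alloy_scrape_errors
```

If the filter prints nothing, the last 50 log lines contain neither pattern; raise --tail to look further back.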

Internal Handshake: Use your access-hub.sh script and visit localhost:12345.
Find the prometheus.exporter.nginx.blog component.
Ensure the health status is Green/Up.

If the exporter is working, the metrics will appear in the Prometheus time-series database.

Live Traffic Check: Verify that nginx_http_requests_total is returning a data vector (not an empty list []).

kubectl exec -it prometheus-0 -n monitoring -- \
  wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total"

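# List every metric name Prometheus knows, one per line, and keep the nginx ones.
# The API returns a single-line JSON array, so split it on commas before grepping.
kubectl exec -it prometheus-0 -n monitoring -- \
  wget -qO- "http://localhost:9090/api/v1/label/__name__/values" | tr ',' '\n' | grep nginx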
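The "data vector versus empty list" distinction in the Live Traffic Check can be tested mechanically. The sketch below uses a hypothetical helper, has_samples, and assumes the compact JSON that the Prometheus query API emits, in which an empty answer contains "result":[]. A real JSON parser such as jq would be more robust; plain pattern matching is used here only to avoid an extra dependency.

```shell
# Hypothetical helper: inspect a Prometheus /api/v1/query response on stdin.
# Returns 0 if the result vector has samples, 1 if it is empty,
# and 2 if the response does not report status "success".
has_samples() {
  hs_body=$(cat)
  case "$hs_body" in
    *'"status":"success"'*) ;;
    *) return 2 ;;
  esac
  case "$hs_body" in
    *'"result":[]'*) return 1 ;;
  esac
  return 0
}

# Usage (requires a running cluster):
#   kubectl exec prometheus-0 -n monitoring -- \
#     wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total" \
#     | has_samples && echo "metric present" || echo "no samples"
```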

The HPA is the final consumer of this data. If it reports a real metric value, the whole pipeline (Nginx, Alloy, Prometheus, adapter) is feeding the autoscaler correctly.

Target Alignment: The TARGETS column should show a real value (e.g., 0/10) rather than <unknown>.

kubectl get hpa -n severed-apps
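The Target Alignment check amounts to looking for the literal string <unknown> in the HPA listing. A minimal sketch, with a hypothetical helper name (hpa_unknown_targets) and the assumption that the first column of kubectl get hpa --no-headers is the HPA name:

```shell
# Hypothetical helper: print the names of HPAs whose TARGETS column
# contains "<unknown>". Reads `kubectl get hpa --no-headers` on stdin.
hpa_unknown_targets() {
  awk '/<unknown>/ { print $1 }'
}

# Usage (requires a running cluster):
#   kubectl get hpa -n severed-apps --no-headers | hpa_unknown_targets
```

Any name printed here is an HPA that cannot read its metric yet, which usually points back to the adapter check.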

Adapter Check: Ensure the Custom Metrics API is serving the translated Nginx metrics through the Kubernetes API server. A healthy response is a JSON list with one item per pod.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total"

Symptom	Probable Cause	Fix
`502 Bad Gateway`	Node resource exhaustion	Restart k3d or increase Docker RAM
`strconv.ParseFloat` error	Missing Nginx Exporter	Use `prometheus.exporter.nginx` in Alloy
HPA shows `<unknown>`	Prometheus Adapter mismatch	Verify `adapter-values.yaml` metric names
`No nodes found`	Corrupted cluster state	Run `k3d cluster delete` and recreate