# Severed-Infra: Health & Diagnostics Guide

### 1. The Foundation: Node & Storage Stability

Before troubleshooting apps, ensure the physical (Docker) layer is stable.

* **Node Readiness:** All 3 nodes (1 server, 2 agents) must be `Ready`.
  ```bash
  kubectl get nodes
  ```
* **Storage Binding:** Verify that the OpenEBS Persistent Volume Claims (PVCs) for Loki and Prometheus are `Bound`.
  ```bash
  kubectl get pvc -n monitoring
  ```

[//]: # (todo add: kubectl get pods -n openebs)

* `kubectl get pods -n severed-apps`
* `kubectl get pods -n monitoring`
* `kubectl get pods -n kubernetes-dashboard`
* `kubectl get pods -n openebs`
* `kubectl rollout restart deployment grafana -n monitoring`

---

### 2. The Telemetry Bridge: Alloy & Exporter

Check if Alloy is successfully translating raw Nginx text into Prometheus numbers.

* **Error Scan:** Check Alloy logs specifically for `scrape_uri` or `connection refused` errors.
  ```bash
  kubectl logs -n monitoring -l name=alloy --tail=50
  ```

[//]: # (kubectl apply -f infra/alloy-setup.yaml)
[//]: # (kubectl delete pods -n monitoring -l name=alloy)
[//]: # (kubectl get pods -n monitoring)
[//]: # (kubectl describe pod alloy-dq2cd -n monitoring)
[//]: # (kubectl logs -n monitoring -l name=alloy --tail=50)
[//]: # (kubectl get pod -n monitoring -l app=grafana -o jsonpath='{.items[0].spec.containers[0].env}' | jq)
[//]: # (kubectl apply -f apps/severed-blog-config.yaml)
[//]: # (kubectl rollout restart deployment severed-blog -n severed-apps)
[//]: # (kubectl logs -n severed-apps -l app=severed-blog -f)
[//]: # (kubectl logs loki-0 -n monitoring --tail=20)

* **Internal Handshake:** Use your `access-hub.sh` script and visit `localhost:12345`.
  * Find the `prometheus.exporter.nginx.blog` component.
  * Ensure the health status is **Green/Up**.

---

### 3. The Database: Prometheus Query Test

If the exporter is working, the metrics will appear in the Prometheus time-series database.
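
For scripted checks, the useful distinction is between a query response that carries samples and one whose `result` list is empty. The following is a minimal sketch, assuming the standard Prometheus HTTP API response shape (`"status":"success"` plus a `result` array); `has_samples` is a hypothetical helper name, not something shipped with this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the repo).
# Reads a Prometheus /api/v1/query JSON response on stdin.
#   exit 0: the query succeeded and the result list is non-empty
#   exit 1: the query failed or returned an empty list
has_samples() {
  local body
  body="$(cat)"

  # The response must report success at all.
  printf '%s' "$body" \
    | grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"' || return 1

  # An empty vector is serialized as "result":[]
  if printf '%s' "$body" \
    | grep -Eq '"result"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'; then
    return 1
  fi
  return 0
}

# Possible usage (requires cluster access, so it is left commented out):
#   kubectl exec prometheus-0 -n monitoring -- \
#     wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total" \
#     | has_samples && echo "metrics present" || echo "no samples yet"
```

The helper uses `grep` instead of `jq` so that it runs on a machine with no extra tooling; if `jq` is available, checking the length of `.data.result` would be a stricter test.
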

* **Live Traffic Check:** Verify that `nginx_http_requests_total` is returning a data vector (not an empty list `[]`).
  ```bash
  kubectl exec -it prometheus-0 -n monitoring -- \
    wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total"
  ```
* **Metric Discovery:** List all Nginx-related metrics currently being stored.
  ```bash
  kubectl exec -it prometheus-0 -n monitoring -- \
    wget -qO- "http://localhost:9090/api/v1/label/__name__/values" | grep nginx
  ```

---

### 4. The "Brain": Horizontal Pod Autoscaler (HPA)

The HPA is the final consumer of this data. If this is healthy, the cluster is auto-scaling correctly.

* **Target Alignment:** The `TARGETS` column should show a real value (e.g., `0/10`) rather than `<unknown>`.
  ```bash
  kubectl get hpa -n severed-apps
  ```
* **Adapter Check:** Ensure the Custom Metrics API is serving the translated Nginx metrics to the Kubernetes master.
  ```bash
  kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total"
  ```

### Cheat Sheet

| Symptom                    | Probable Cause              | Fix                                       |
|----------------------------|-----------------------------|-------------------------------------------|
| `502 Bad Gateway`          | Node resource exhaustion    | Restart K3d or increase Docker RAM        |
| `strconv.ParseFloat` error | Missing Nginx Exporter      | Use `prometheus.exporter.nginx` in Alloy  |
| HPA shows `<unknown>`      | Prometheus Adapter mismatch | Verify `adapter-values.yaml` metric names |
| `No nodes found`           | Corrupted cluster state     | Run `k3d cluster delete` and recreate     |
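
An HPA that cannot read its metric reports `<unknown>` in the `TARGETS` column, so that symptom can be spotted without reading the output by eye. The following is a minimal sketch; `hpa_missing_metrics` is a hypothetical helper name (an assumption, not part of the repo), and it only inspects text that `kubectl get hpa` has already printed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the repo).
# Reads `kubectl get hpa` rows on stdin and prints the name (first column)
# of every HPA whose row contains <unknown>.
# Exit status follows grep conventions:
#   exit 0: at least one HPA is missing a metric
#   exit 1: no HPA row contains <unknown>
hpa_missing_metrics() {
  awk '/<unknown>/ { print $1; found = 1 } END { exit !found }'
}

# Possible usage (requires cluster access, so it is left commented out):
#   if kubectl get hpa -n severed-apps --no-headers | hpa_missing_metrics; then
#     echo "HPA cannot read its metric; check adapter-values.yaml metric names"
#   fi
```

Matching on the whole row instead of a fixed column keeps the sketch working when the `TARGETS` column lists several metrics separated by spaces.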