Severed-Infra: Health & Diagnostics Guide
1. The Foundation: Node & Storage Stability
Before troubleshooting apps, ensure the physical (Docker) layer is stable.
- Node Readiness: All 3 nodes (1 server, 2 agents) must be `Ready`.
kubectl get nodes
- Storage Binding: Verify that the OpenEBS Persistent Volume Claims (PVCs) for Loki and Prometheus are `Bound`.
kubectl get pvc -n monitoring
kubectl get pods -n severed-apps
kubectl get pods -n monitoring
kubectl get pods -n kubernetes-dashboard
kubectl get pods -n openebs
kubectl rollout restart deployment grafana -n monitoring
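The readiness and binding checks above can be scripted so that a failure stands out without reading the whole table. The sketch below is a hypothetical helper, not part of Severed-Infra: `rows_not_matching` is an illustrative name, and it assumes the default `kubectl get ... --no-headers` column layout, where STATUS is column 2 for nodes and PVCs.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): print the NAME of every row
# whose given column differs from the expected value.
# Reads `kubectl get ... --no-headers` output on stdin.
# usage: rows_not_matching COLUMN EXPECTED
rows_not_matching() {
  awk -v col="$1" -v want="$2" 'NF >= col && $col != want { print $1 }'
}

# Example wiring (assumes kubectl already points at the cluster):
#   kubectl get nodes --no-headers | rows_not_matching 2 Ready
#   kubectl get pvc -n monitoring --no-headers | rows_not_matching 2 Bound
```

Empty output means every row matched. The comparison is exact, so a cordoned node that reports `Ready,SchedulingDisabled` would also be listed.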
2. The Telemetry Bridge: Alloy & Exporter
Check if Alloy is successfully translating raw Nginx text into Prometheus numbers.
- Error Scan: Check Alloy logs specifically for `scrape_uri` or `connection refused` errors.
kubectl logs -n monitoring -l name=alloy --tail=50
- Internal Handshake: Use your `access-hub.sh` script and visit `localhost:12345`.
  - Find the `prometheus.exporter.nginx.blog` component.
  - Ensure the health status is Green/Up.
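The error scan can be reduced to a single number. This is a minimal sketch, assuming the two patterns named above are the only ones of interest; `count_scrape_errors` is a hypothetical name, and a match on `scrape_uri` may also catch non-error lines that merely mention the setting.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention either pattern
# named in the Error Scan step. Reads log text on stdin, prints the count.
count_scrape_errors() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function's exit status clean so it can be used under `set -e`.
  grep -c -E 'scrape_uri|connection refused' || true
}

# Example wiring:
#   kubectl logs -n monitoring -l name=alloy --tail=50 | count_scrape_errors
```

A result of `0` suggests the last 50 log lines are clean; anything higher is a cue to read the matching lines in full.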
3. The Database: Prometheus Query Test
If the exporter is working, the metrics will appear in the Prometheus time-series database.
- Live Traffic Check: Verify that `nginx_http_requests_total` is returning a data vector (not an empty list `[]`).
kubectl exec -it prometheus-0 -n monitoring -- \
wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total"
- Metric Discovery: List all Nginx-related metrics currently being stored.
kubectl exec prometheus-0 -n monitoring -- \
wget -qO- "http://localhost:9090/api/v1/label/__name__/values" | tr ',' '\n' | grep nginx
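The "data vector, not an empty list" test can be automated by inspecting the query response. The sketch below is an assumption-laden illustration: `has_samples` is a hypothetical name, and it relies on the compact JSON the Prometheus HTTP API normally returns, where an empty result appears literally as `"result":[]`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a Prometheus query response (stdin)
# reports success and carries at least one sample.
# Assumes compact JSON, so an empty vector shows up as "result":[] .
has_samples() {
  body=$(cat)
  # Reject anything that is not a successful API response.
  case "$body" in
    *'"status":"success"'*) ;;
    *) return 1 ;;
  esac
  # Reject a successful response whose result list is empty.
  case "$body" in
    *'"result":[]'*) return 1 ;;
  esac
  return 0
}

# Example wiring:
#   kubectl exec prometheus-0 -n monitoring -- \
#     wget -qO- "http://localhost:9090/api/v1/query?query=nginx_http_requests_total" \
#     | has_samples && echo "metrics flowing" || echo "no samples"
```

String matching keeps the check dependency-free; if `jq` is available, parsing the JSON properly would be more robust.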
4. The "Brain": Horizontal Pod Autoscaler (HPA)
The HPA is the final consumer of this data. If this is healthy, the cluster is auto-scaling correctly.
- Target Alignment: The `TARGETS` column should show a real value (e.g., `0/10`) rather than `<unknown>`.
kubectl get hpa -n severed-apps
- Adapter Check: Ensure the Custom Metrics API is serving the translated Nginx metrics to the Kubernetes API server.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/severed-apps/pods/*/nginx_http_requests_total"
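The Target Alignment check lends itself to a quick filter. This is a rough sketch rather than an established tool: `hpas_with_unknown_targets` is a hypothetical name, and it simply looks for the literal text `<unknown>` anywhere in each row of `kubectl get hpa --no-headers`.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every HPA whose row still
# contains <unknown>. Reads `kubectl get hpa --no-headers` output on stdin.
hpas_with_unknown_targets() {
  awk '/<unknown>/ { print $1 }'
}

# Example wiring:
#   kubectl get hpa -n severed-apps --no-headers | hpas_with_unknown_targets
```

No output means every HPA is reading a metric value; any name printed points at an HPA that cannot yet resolve its metric.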
Cheat Sheet
| Symptom | Probable Cause | Fix |
|---|---|---|
| `502 Bad Gateway` | Node resource exhaustion | Restart K3d or increase Docker RAM |
| `strconv.ParseFloat` error | Missing Nginx Exporter | Use `prometheus.exporter.nginx` in Alloy |
| HPA shows `<unknown>` | Prometheus Adapter mismatch | Verify `adapter-values.yaml` metric names |
| `No nodes found` | Corrupted cluster state | Run `k3d cluster delete` and recreate |