grafana-dashboard Runbook
Metadata
| Field | Value |
|---|---|
| Service | grafana-dashboard |
| Criticality | Tier 2 |
| Owner | Platform / Observability owner |
| Namespace | grafana |
| Clusters | homelab, local |
| Last validated | 2026-05-20 |
| Related service page | ../services/grafana-dashboard.md |
Trigger Conditions
- Expected dashboards are missing.
- git-sync fails to update dashboard content.
- Grafana provisioning logs show parse or auth errors.
1. Health Checks
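A minimal health check might look like the following sketch. It assumes the Deployment is named `grafana` in the `grafana` namespace, matching the log command in the troubleshooting section; the function name `grafana_health` and the 60-second timeout are illustrative choices, not part of the service's tooling.

```bash
# Sketch only: assumes deploy/grafana; pass another namespace as $1 if needed.
grafana_health() {
  local ns="${1:-grafana}"
  # Pods should be Running with all containers ready.
  kubectl -n "$ns" get pods -o wide
  # Exits non-zero if the rollout has not completed within the timeout.
  kubectl -n "$ns" rollout status deploy/grafana --timeout=60s
}
```

Run it as `grafana_health`. For an API-level check, Grafana also exposes `/api/health`, which can be reached through `kubectl -n grafana port-forward deploy/grafana 3000:3000` if the container listens on Grafana's default port 3000 (an assumption here).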
2. Troubleshooting Workflows
Check the provisioning side first:
```bash
kubectl -n grafana describe configmap
kubectl -n grafana logs deploy/grafana --tail=400 | grep -i provision
```
Look for broken dashboard JSON, bad git credentials, or missing ConfigMaps.
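If the Grafana logs are inconclusive, the git-sync side can be inspected directly. The sketch below assumes git-sync runs as a sidecar container named `git-sync` inside the Grafana Deployment; that container name and the function name `provisioning_triage` are assumptions, so confirm the real names before relying on it.

```bash
# Sketch only: the container name "git-sync" is an assumption. List the real
# container names with:
#   kubectl -n grafana get deploy/grafana \
#     -o jsonpath='{.spec.template.spec.containers[*].name}'
provisioning_triage() {
  local ns="${1:-grafana}" sidecar="${2:-git-sync}"
  # ConfigMaps present in the namespace; a missing one shows up by omission.
  kubectl -n "$ns" get configmap
  # Recent sidecar output, where clone or credential errors would be logged.
  kubectl -n "$ns" logs deploy/grafana -c "$sidecar" --tail=200
  # Recent events, oldest first, to spot volume mount or restart problems.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}
```

Call it as `provisioning_triage`, or as `provisioning_triage grafana <container>` when the sidecar has a different name.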
3. Disaster Recovery
- Restore dashboard ConfigMaps from Git.
- Restore any git-sync credentials.
- Restart the Grafana Deployment if provisioning remains stale.
- Verify dashboard folders in the UI.
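The restore and restart steps above can be sketched as a single helper. It assumes the dashboard ConfigMaps live in a kustomize overlay inside a Git checkout; the overlay path argument and the function name `restore_dashboards` are hypothetical, and credentials still have to be restored separately from wherever they are kept.

```bash
# Sketch only: $1 is a hypothetical path to the kustomize overlay that holds
# the dashboard ConfigMaps; it must contain a kustomization file.
restore_dashboards() {
  local overlay_dir="$1" ns="${2:-grafana}"
  [ -d "$overlay_dir" ] || { echo "overlay not found: $overlay_dir" >&2; return 1; }
  # Re-apply the ConfigMaps from the checked-out Git revision.
  kubectl apply -k "$overlay_dir" || return 1
  # Restart so provisioning re-reads the restored content, then wait for it.
  kubectl -n "$ns" rollout restart deploy/grafana || return 1
  kubectl -n "$ns" rollout status deploy/grafana --timeout=120s
}
```

Once the rollout completes, check the dashboard folders in the UI as the last step.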
4. Scaling and Resource Management
Dashboard provisioning problems usually call for configuration fixes rather than scaling. Raise memory requests and limits only if git-sync or dashboard rendering is being OOM-killed.
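To confirm an OOM kill before changing resources, the last termination reason of each container can be read from the pod status. This is a sketch; the function name `oom_report` is illustrative.

```bash
# Sketch only: prints one line per pod in the form
#   <pod>\t<container>=<last termination reason> ...
# The reason is empty for containers that have never been terminated.
oom_report() {
  local ns="${1:-grafana}"
  kubectl -n "$ns" get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.name}{"="}{.lastState.terminated.reason}{" "}{end}{"\n"}{end}'
}
```

`oom_report | grep OOMKilled` narrows the output to affected pods. If a container appears there, it is probably safer to raise its memory limit in the manifests kept in Git than to patch the live object, so the change is not lost on the next restore.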
5. Maintenance Procedures
- Rotate git-sync credentials.
- Validate dashboard JSON before merging large updates.
- Recheck folder and provisioning names after directory reorganizations.
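The JSON validation step can be sketched as a small pre-merge check. It assumes `python3` is available and that the dashboards sit as `*.json` files at the top level of one directory; the directory argument and the function name `validate_dashboards` are hypothetical.

```bash
# Sketch only: checks JSON syntax, not Grafana's dashboard schema, and does
# not descend into subdirectories.
validate_dashboards() {
  local dir="${1:-.}" failed=0 f
  for f in "$dir"/*.json; do
    # Skip the literal pattern when the directory has no JSON files.
    [ -e "$f" ] || continue
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "invalid JSON: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The function returns non-zero if any file fails to parse, so it can gate a merge in a CI step or a pre-commit hook.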
6. Rollback Strategy
- Revert the dashboard overlay or provisioning ConfigMap to the previous revision.
- Restart Grafana if it still serves stale dashboard content after the revert.
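A rollback through Git might look like the sketch below. It assumes the overlay is applied with kustomize from a local checkout; the commit and overlay path arguments and the function name `rollback_dashboards` are hypothetical. If a GitOps controller applies the repository instead, pushing the revert commit would replace the `kubectl apply` step.

```bash
# Sketch only: $1 is the commit that introduced the bad change, $2 is the
# path to the kustomize overlay. Run from inside the Git checkout.
rollback_dashboards() {
  local bad_commit="$1" overlay_dir="$2"
  # Create a new commit that undoes the change, keeping history intact.
  git revert --no-edit "$bad_commit" || return 1
  # Re-apply the reverted manifests to the cluster.
  kubectl apply -k "$overlay_dir"
}
```

`git log --oneline -- <overlay path>` is one way to find the commit to revert.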
7. Post-Incident Actions
- Add a changelog fragment if manual remediation was required.
- Update the service page if provisioning behavior changed.
- Extend this runbook when a new failure mode appears.