grafana-dashboard Runbook

Metadata

Field                 Value
Service               grafana-dashboard
Criticality           Tier 2
Owner                 Platform / Observability owner
Namespace             grafana
Clusters              homelab, local
Last validated        2026-05-20
Related service page  ../services/grafana-dashboard.md

Trigger Conditions

  • Expected dashboards are missing.
  • git-sync fails to update dashboard content.
  • Grafana provisioning logs show parse or auth errors.

1. Health Checks

kubectl -n grafana get pods,configmap,secret
kubectl -n grafana logs deploy/grafana --tail=200

2. Troubleshooting Workflows

Check the provisioning side first:

kubectl -n grafana describe configmap
kubectl -n grafana logs deploy/grafana --tail=400 | grep -i provision

Look for broken dashboard JSON, bad git credentials, or missing ConfigMaps.
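
The three causes above can be roughly separated by scanning the log text. This is a minimal sketch: `triage_provisioning_log` is a hypothetical helper, and its keyword patterns are assumptions chosen for illustration, not Grafana's or git-sync's exact log messages, so adjust them to what your logs actually contain.

```shell
# Hypothetical helper: read log lines on stdin and count the lines that hint
# at each failure class. The patterns are illustrative assumptions only.
triage_provisioning_log() (
  log=$(cat)
  for entry in \
    'json_parse:invalid character|unmarshal|parse error' \
    'git_auth:authentication failed|permission denied|could not read' \
    'missing_config:no such file|not found'
  do
    name=${entry%%:*}      # text before the first colon
    pattern=${entry#*:}    # text after the first colon
    # grep -c prints the number of matching lines (0 when none match)
    printf '%s=%s\n' "$name" "$(printf '%s\n' "$log" | grep -Eic "$pattern")"
  done
)
```

Possible usage: `kubectl -n grafana logs deploy/grafana --all-containers --tail=400 | triage_provisioning_log`. A non-zero count only suggests where to look first; read the matching lines before acting on them.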

3. Disaster Recovery

  1. Restore dashboard ConfigMaps from Git.
  2. Restore any git-sync credentials.
  3. Restart the Grafana Deployment if provisioning remains stale.
  4. Verify dashboard folders in the UI.
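
Steps 1 and 3 above can be sketched as a small helper. It assumes the dashboard ConfigMaps are rendered from a kustomize overlay in your Git checkout; `restore_dashboards` and the overlay path are hypothetical, and the 180s timeout is an arbitrary choice. Restoring git-sync credentials (step 2) depends on where the secret is stored, so it is left out.

```shell
# Hypothetical helper: re-apply dashboard ConfigMaps from a Git checkout,
# then restart Grafana and wait for the rollout to finish.
# $1 = path to a kustomize overlay (an assumption about the repo layout).
restore_dashboards() {
  kubectl apply -k "$1" || return 1
  kubectl -n grafana rollout restart deploy/grafana || return 1
  # Blocks until the new pods are ready or the timeout expires
  kubectl -n grafana rollout status deploy/grafana --timeout=180s
}
```

Example call, with a placeholder path: `restore_dashboards ./path/to/overlay`. If the cluster is reconciled by a GitOps controller, pushing the restored manifests to Git may be the safer route than applying them by hand.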

4. Scaling and Resource Management

kubectl -n grafana top pod

Dashboard provisioning problems usually call for a config fix rather than more resources. Raise memory requests and limits only if the container running git-sync or dashboard rendering is being OOM-killed.
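
To confirm an OOM kill before changing resources, the last termination reason of each container can be read from pod status. This is a sketch; `find_oom_killed` is a hypothetical helper name.

```shell
# Hypothetical helper: print one line per pod in the grafana namespace that has
# a container whose last termination reason was OOMKilled.
# Line format: <pod>TAB<container>=<reason> ... (reason is empty if never terminated).
# Returns non-zero when no container was OOM-killed.
find_oom_killed() {
  kubectl -n grafana get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.name}{"="}{.lastState.terminated.reason}{" "}{end}{"\n"}{end}' \
    | grep 'OOMKilled'
}
```

If this prints a line, raise the memory limit for the named container in the manifest kept in Git rather than patching the live Deployment, so the change survives the next sync.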

5. Maintenance Procedures

  • Rotate git-sync credentials.
  • Validate dashboard JSON before merging large updates.
  • Recheck folder and provisioning names after directory reorganizations.
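
The JSON validation step can be run locally before a merge. This is a minimal sketch, assuming dashboards are stored as `*.json` files under one directory; `json_ok` and `validate_dashboards` are hypothetical helpers. It only checks that each file parses, not that it is a valid Grafana dashboard.

```shell
# Hypothetical helper: succeed if the file parses as JSON.
# Uses jq when installed, otherwise falls back to Python's json.tool.
json_ok() {
  if command -v jq >/dev/null 2>&1; then
    jq empty "$1" >/dev/null 2>&1
  else
    python3 -m json.tool "$1" >/dev/null 2>&1
  fi
}

# Hypothetical helper: check every *.json file under directory $1.
# Lists unparsable files on stderr and returns 1 if any were found.
validate_dashboards() {
  bad=$(find "$1" -type f -name '*.json' | while IFS= read -r f; do
    json_ok "$f" || printf '%s\n' "$f"
  done)
  if [ -n "$bad" ]; then
    printf 'Invalid JSON:\n%s\n' "$bad" >&2
    return 1
  fi
}
```

Example call, with a placeholder path: `validate_dashboards ./path/to/dashboards`. Running it in CI on every pull request catches broken files before Grafana tries to provision them.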

6. Rollback Strategy

  • Revert the dashboard overlay or provisioning ConfigMap to the previous revision.
  • Restart Grafana if it still serves stale provisioned content after the revert.
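
The revert can be sketched as follows, assuming the change to undo is a single Git commit and the ConfigMaps come from a kustomize overlay. `rollback_dashboards`, the commit, and the overlay path are hypothetical, and a GitOps-managed cluster would push the revert instead of applying it directly.

```shell
# Hypothetical helper: revert one commit in the current Git checkout and
# re-apply the overlay. Stops before touching the cluster if the revert fails.
# $1 = commit to revert, $2 = kustomize overlay path (both assumptions).
rollback_dashboards() {
  git revert --no-edit "$1" || return 1
  kubectl apply -k "$2"
}
```

Run it from the repository root, then check the dashboard folders in the UI before deciding whether a restart is needed.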

7. Post-Incident Actions

  1. Add a changelog fragment if manual remediation was required.
  2. Update the service page if provisioning behavior changed.
  3. Extend this runbook when a new failure mode appears.