argocd Runbook
Metadata
| Field | Value |
|---|---|
| Service | argocd |
| Criticality | Tier 1 |
| Owner | Platform / GitOps owner |
| Namespace | argocd |
| Clusters | homelab, oci |
| Last validated | 2026-05-20 |
| Related service page | ../services/argocd.md |
Trigger Conditions
- Argo CD UI or API is unavailable.
- Applications remain OutOfSync or degraded after manifest changes.
- Repository access fails.
- Legacy workloads still managed by Argo CD stop reconciling.
1. Health Checks
kubectl -n argocd get pods,svc,ingressroute
kubectl -n argocd get applications,applicationsets
kubectl -n argocd logs -l app.kubernetes.io/name=argocd-application-controller --tail=200
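For a quicker read of the pod list during an incident, the first check can be reduced to the pods that need attention. The following is a minimal sketch, assuming `kubectl` is on the PATH with access to the namespace; the function name `argocd_unhealthy_pods` is hypothetical and not part of any existing tooling.

```shell
#!/bin/sh
# Hypothetical helper: print Argo CD pods whose phase is neither Running
# nor Succeeded. A Running phase does not guarantee readiness, so treat an
# empty result as "nothing obviously broken", not as proof of health.
argocd_unhealthy_pods() {
  ns="${1:-argocd}"   # namespace, defaulting to the one in this runbook
  kubectl -n "$ns" get pods --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
    | awk '$2 != "Running" && $2 != "Succeeded" { print $1 " " $2 }'
}
```

Calling `argocd_unhealthy_pods` with no arguments checks the `argocd` namespace; any line it prints is a pod worth a `kubectl describe`.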
2. Troubleshooting Workflows
Check repository access and app generation first:
kubectl -n argocd describe application <name>
kubectl -n argocd describe applicationset <name>
kubectl -n argocd logs deploy/argocd-repo-server --tail=200
Focus on:
- repository credentials under `argocd/shared`
- invalid manifest paths after layout migrations
- broken ingress or TLS settings for the UI
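The first two focus areas can be narrowed down from the cluster side. This is a sketch under assumptions: it relies on the `argocd.argoproj.io/secret-type=repository` label that Argo CD uses to discover repository Secrets, and on the `status.sync.status` and `status.health.status` fields of the Application resource. Both function names are hypothetical. Credential templates use the `repo-creds` secret type instead and would need a second query.

```shell
#!/bin/sh
# Hypothetical helper: print Applications that are not both Synced and Healthy.
argocd_drifted_apps() {
  ns="${1:-argocd}"
  kubectl -n "$ns" get applications --no-headers \
    -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
    | awk '$2 != "Synced" || $3 != "Healthy" { print $1 ": sync=" $2 " health=" $3 }'
}

# Hypothetical helper: list repository credential Secrets by the label
# Argo CD uses to find them. A missing Secret here points at the
# credentials rather than at manifest paths.
argocd_repo_secrets() {
  ns="${1:-argocd}"
  kubectl -n "$ns" get secrets \
    -l argocd.argoproj.io/secret-type=repository -o name
}
```

If `argocd_repo_secrets` returns the expected Secrets but `argocd_drifted_apps` still lists entries, the repo-server logs and the manifest paths are the more likely cause.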
3. Disaster Recovery
- Restore repository and cluster secrets in the `argocd` namespace.
- Reapply `argocd/base`, `argocd/shared`, and the active overlay.
- Wait for controller and repo-server health.
- Reconcile affected Applications and confirm sync status.
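The steps after secret restoration can be sketched as one function. Several things here are assumptions rather than facts about this repository: that the three directories are kustomize roots that `kubectl apply -k` can build, that the pods carry the upstream `app.kubernetes.io/name` labels, and that the overlay path is passed in by the operator. The function name `argocd_recover` is hypothetical. Restoring the secrets themselves depends on the backup method and is deliberately left out.

```shell
#!/bin/sh
# Hypothetical recovery sketch. Run from the repository root, after the
# repository and cluster secrets are back in the argocd namespace.
argocd_recover() {
  overlay="${1:-}"   # path to the active overlay, supplied by the operator
  if [ -z "$overlay" ]; then
    echo "usage: argocd_recover <overlay-path>" >&2
    return 2
  fi

  # Reapply base, shared, and the active overlay, stopping on the first failure.
  # Assumes each directory is a kustomize root.
  for dir in argocd/base argocd/shared "$overlay"; do
    kubectl apply -k "$dir" || return 1
  done

  # Wait for the controller and repo-server pods to report Ready.
  for component in argocd-application-controller argocd-repo-server; do
    kubectl -n argocd wait pod -l "app.kubernetes.io/name=$component" \
      --for=condition=Ready --timeout=300s || return 1
  done

  # Request a hard refresh of every Application. This recomputes status;
  # whether a sync follows depends on each Application's sync policy.
  kubectl -n argocd annotate applications --all \
    argocd.argoproj.io/refresh=hard --overwrite
}
```

After it returns, `kubectl -n argocd get applications` shows whether sync status has settled.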
4. Scaling and Resource Management
Increase controller or repo-server resources through Git if app generation or diffing is CPU-bound.
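Before changing resources, it helps to confirm that CPU is the actual constraint. The sketch below assumes metrics-server (or another metrics API provider) is installed so that `kubectl top` works, and that CPU is reported in millicores such as `850m`. The function name and the 500m default threshold are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print argocd pods using at least <threshold> millicores.
argocd_hot_pods() {
  threshold="${1:-500}"   # millicores; illustrative default, tune per cluster
  kubectl -n argocd top pods --no-headers \
    | awk -v t="$threshold" '{
        cpu = $2
        sub(/m$/, "", cpu)            # strip the millicore suffix
        if (cpu + 0 >= t) print $1 " " $2
      }'
}
```

The helper only informs the decision; the resource change itself still goes through Git as described above.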
5. Maintenance Procedures
- Rotate repository credentials.
- Review ApplicationSet paths after repository migrations.
- Revalidate UI ingress after Traefik changes.
6. Rollback Strategy
- Reapply the previous Argo CD overlay revision.
- Restore the previous repository secret if auth changed.
- Revert ApplicationSet path changes that invalidated manifest generation.
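Finding the previous overlay revision can be sketched with plain Git. This assumes the overlay lives in a Git working tree and that the commit before the latest change to that path is the one to restore; the function name is hypothetical and `<overlay-path>` is a placeholder, not a path from this repository.

```shell
#!/bin/sh
# Hypothetical helper: print the commit hash of the second most recent
# commit that touched the given path, i.e. the revision before the latest change.
previous_revision_for() {
  target="$1"
  git log -n 2 --format=%H -- "$target" | sed -n '2p'
}

# Example usage (not executed here; <overlay-path> is a placeholder):
#   rev="$(previous_revision_for <overlay-path>)"
#   git checkout "$rev" -- <overlay-path>
#   git commit -m "Roll back Argo CD overlay to $rev"
```

Restoring the files in a new commit keeps the rollback visible in history, which fits the Git-driven workflow used elsewhere in this runbook. An empty result means the path has only one commit and there is no earlier revision to restore.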
7. Post-Incident Actions
- Add a changelog fragment for manual interventions.
- Update the argocd service page if overlays or access paths changed.
- Update this runbook when a new failure mode is discovered.