argocd Runbook

Metadata

Field                 Value
Service               argocd
Criticality           Tier 1
Owner                 Platform / GitOps owner
Namespace             argocd
Clusters              homelab, oci
Last validated        2026-05-20
Related service page  ../services/argocd.md

Trigger Conditions

  • Argo CD UI or API is unavailable.
  • Applications remain OutOfSync or Degraded after manifest changes.
  • Repository access fails.
  • Legacy workloads still managed by Argo CD stop reconciling.

1. Health Checks

kubectl -n argocd get pods,svc,ingressroute
kubectl -n argocd get applications,applicationsets
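# If this install runs the controller as a StatefulSet (the upstream default),
# use statefulset/argocd-application-controller instead.
kubectl -n argocd logs deploy/argocd-application-controller --tail=200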
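
When the pod list is long, it can help to filter it down to the pods that are not Ready. The helper below is a minimal sketch, not part of the original runbook; the function name `argocd_unready_pods` is made up for illustration, and it only reads the standard `Ready` pod condition.

```shell
# Print the name of every pod in the argocd namespace whose Ready
# condition is anything other than "True" (including pods that have
# no Ready condition yet). Prints nothing when all pods are ready.
argocd_unready_pods() {
  kubectl -n argocd get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | awk -F '\t' '$2 != "True" { print $1 }'
}
```

Run `argocd_unready_pods` after sourcing the function; an empty result means every pod reports Ready.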

2. Troubleshooting Workflows

Check repository access and app generation first:

kubectl -n argocd describe application <name>
kubectl -n argocd describe applicationset <name>
kubectl -n argocd logs deploy/argocd-repo-server --tail=200

Focus on:

  • repository credentials under argocd/shared
  • invalid manifest paths after layout migrations
  • broken ingress or TLS settings for the UI
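
To see which Applications need attention before describing them one by one, a short status summary can help. This is a sketch under the assumption that the Application resources expose the usual `status.sync.status` and `status.health.status` fields; the function names are hypothetical.

```shell
# Print "name<TAB>sync status<TAB>health status" for every Application.
argocd_app_status() {
  kubectl -n argocd get applications \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.sync.status}{"\t"}{.status.health.status}{"\n"}{end}'
}

# Keep only the Applications that are not both Synced and Healthy.
argocd_unhealthy_apps() {
  argocd_app_status | awk -F '\t' '$2 != "Synced" || $3 != "Healthy"'
}
```

Each name printed by `argocd_unhealthy_apps` is a candidate for the `kubectl -n argocd describe application <name>` check above.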

3. Disaster Recovery

  1. Restore repository and cluster secrets in the argocd namespace.
  2. Reapply argocd/base, argocd/shared, and the active overlay.
  3. Wait for controller and repo-server health.
  4. Reconcile affected Applications and confirm sync status.
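
Steps 2 to 4 above could be sketched roughly as follows. Several things here are assumptions rather than facts from this runbook: that `argocd/base`, `argocd/shared`, and the overlay are kustomize directories (use `kubectl apply -f` if they are plain manifests), that the controller pods carry the `app.kubernetes.io/name=argocd-application-controller` label, and that a hard refresh is an acceptable way to trigger reconciliation. The function names are hypothetical, and the overlay directory is passed in rather than guessed.

```shell
# Step 2: reapply base, shared, and the overlay directory given as $1.
argocd_reapply() {
  [ -n "${1:-}" ] || { echo "usage: argocd_reapply <overlay-dir>" >&2; return 2; }
  for dir in argocd/base argocd/shared "$1"; do
    kubectl apply -k "$dir" || return 1
  done
}

# Step 3: wait for the repo-server rollout and for the controller pods.
argocd_wait_healthy() {
  kubectl -n argocd rollout status deploy/argocd-repo-server --timeout=300s &&
  kubectl -n argocd wait pod \
    -l app.kubernetes.io/name=argocd-application-controller \
    --for=condition=Ready --timeout=300s
}

# Step 4: ask Argo CD to re-read Git and recompute the state of one Application.
argocd_refresh() {
  kubectl -n argocd annotate application "$1" \
    argocd.argoproj.io/refresh=hard --overwrite
}
```

A refresh recomputes the comparison but does not by itself sync an Application that has automated sync disabled, so confirm the final sync status with `kubectl -n argocd get applications`.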

4. Scaling and Resource Management

kubectl -n argocd top pod
kubectl -n argocd describe deploy argocd-application-controller

If app generation or diffing is CPU-bound, raise the controller or repo-server resource requests and limits through a Git change rather than by editing the live workloads.
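
Before changing anything in Git, it may be worth comparing current usage against the configured requests and limits. The helpers below are an illustrative sketch with hypothetical names. They take a `kind/name` reference because upstream Argo CD manifests ship the application controller as a StatefulSet, so the reference for it may need to be `statefulset/argocd-application-controller` rather than a Deployment.

```shell
# Show per-container usage in the argocd namespace, highest CPU first.
argocd_top_cpu() {
  kubectl -n argocd top pod --containers --sort-by=cpu
}

# Print "container<TAB>resources" for each container of a workload.
# $1 is a kind/name reference, e.g. deploy/argocd-repo-server.
argocd_show_resources() {
  kubectl -n argocd get "$1" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

For example, `argocd_show_resources deploy/argocd-repo-server` prints what is currently set, which is the baseline for the change made in Git.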

5. Maintenance Procedures

  • Rotate repository credentials.
  • Review ApplicationSet paths after repository migrations.
  • Revalidate UI ingress after Traefik changes.
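
Two of these tasks start with an inventory. The sketch below lists the Secrets Argo CD treats as repository credentials (selected by its `argocd.argoproj.io/secret-type` label) and the match rules of the Traefik IngressRoutes in the namespace. The function names are hypothetical, and the IngressRoute query assumes the usual `spec.routes[].match` layout.

```shell
# List repository Secrets and credential templates known to Argo CD.
argocd_repo_secrets() {
  kubectl -n argocd get secrets \
    -l 'argocd.argoproj.io/secret-type in (repository, repo-creds)' \
    -o name
}

# Print "ingressroute<TAB>match rules" for every IngressRoute.
argocd_ingress_rules() {
  kubectl -n argocd get ingressroute \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.routes[*].match}{"\n"}{end}'
}
```

Comparing the output before and after a rotation or a Traefik change gives a quick check that nothing was dropped.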

6. Rollback Strategy

  • Reapply the previous Argo CD overlay revision.
  • Restore the previous repository secret if auth changed.
  • Revert ApplicationSet path changes that invalidated manifest generation.
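
One possible way to reapply an overlay as it existed at an earlier Git revision is to check that revision out into a temporary worktree and apply from there. This is only a sketch: it assumes the overlay is a kustomize directory, that the command runs inside the Git repository holding the manifests, and the function name `argocd_apply_revision` is made up for illustration.

```shell
# Apply an overlay directory as it existed at a given Git revision.
# $1 = revision (for example HEAD~1), $2 = overlay directory in the repo.
argocd_apply_revision() {
  [ "$#" -eq 2 ] || { echo "usage: argocd_apply_revision <revision> <overlay-dir>" >&2; return 2; }
  tmp="$(mktemp -d)" || return 1
  if git worktree add --detach "$tmp" "$1" >/dev/null; then
    kubectl apply -k "$tmp/$2"
    status=$?
    # Clean up the temporary worktree whether or not the apply succeeded.
    git worktree remove --force "$tmp"
  else
    status=1
    rmdir "$tmp"
  fi
  return "$status"
}
```

A live reapply like this is a stopgap; reverting the offending commit in Git as well keeps the repository and the cluster in agreement.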

7. Post-Incident Actions

  1. Add a changelog fragment for manual interventions.
  2. Update the argocd service page if overlays or access paths changed.
  3. Update this runbook when a new failure mode is discovered.