
semaphore Runbook

Metadata

Field                 Value
Service               semaphore
Criticality           Tier 2
Owner                 Platform / Automation owner
Namespace             semaphore
Clusters              dev, homelab, oci
Last validated        2026-05-20
Related service page  ../services/semaphore.md

Trigger Conditions

  • Semaphore UI is unavailable.
  • Job execution fails.
  • MariaDB is degraded.
  • Generated secrets or .semaphore.env values drift.

1. Health Checks

kubectl -n semaphore get pods,svc,pvc,ingressroute
kubectl -n semaphore logs deploy/semaphore --tail=200
kubectl -n semaphore get statefulset
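
The checks above can be wrapped into a single pass/fail gate. The following is a minimal sketch, assuming the workloads are named `deploy/semaphore` and `statefulset/semaphore-db` (the names used by the log commands in this runbook). The function name `semaphore_health` and the 60-second timeout are illustrative choices, not part of the deployment.

```shell
# Hedged sketch: returns 0 only if both workloads report a completed rollout.
# Assumes deploy/semaphore and statefulset/semaphore-db exist in the
# semaphore namespace; the 60s timeout is an arbitrary starting point.
semaphore_health() {
  health_rc=0
  for target in deploy/semaphore statefulset/semaphore-db; do
    if kubectl -n semaphore rollout status "$target" --timeout=60s >/dev/null 2>&1; then
      echo "OK   $target"
    else
      echo "FAIL $target"
      health_rc=1
    fi
  done
  return "$health_rc"
}
```

Call `semaphore_health` and check its exit status. Note that `kubectl rollout status` on a StatefulSet only works when the StatefulSet uses the RollingUpdate strategy.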

2. Troubleshooting Workflows

kubectl -n semaphore describe deploy semaphore
kubectl -n semaphore logs statefulset/semaphore-db --tail=100
kubectl -n semaphore get secret

Before restarting jobs, confirm that the encryption keys are present, the app can reach the DB, and the expected volumes are mounted.
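
One way to check for missing keys without printing secret values is to list only the key names of a Secret and compare them with the keys you expect. This is a sketch under assumptions: the function name `missing_secret_keys` is hypothetical, and the Secret name and the required keys are supplied by the caller because this runbook does not name them.

```shell
# Hedged sketch: print every required key that is absent from a Secret.
# Only key names are read; values are never printed.
# Exit status: 0 = all keys present, 1 = at least one missing, 2 = lookup failed.
missing_secret_keys() {
  secret_name="$1"; shift
  present_keys=$(kubectl -n semaphore get secret "$secret_name" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}') || return 2
  any_missing=0
  for key in "$@"; do
    # -x matches whole lines, -F treats the key as a literal string
    if ! printf '%s\n' "$present_keys" | grep -qxF -- "$key"; then
      echo "missing: $key"
      any_missing=1
    fi
  done
  return "$any_missing"
}
```

For example, `missing_secret_keys <secret-name> SEMAPHORE_ACCESS_KEY_ENCRYPTION SEMAPHORE_DB_PASS`, where the Secret name and the key list depend on how your overlay maps .semaphore.env into Secrets.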

3. Disaster Recovery

  1. Restore .semaphore.env inputs and generated Secrets.
  2. Restore MariaDB state and PVC-backed inventory data.
  3. Reconcile the active overlay.
  4. Validate web access and a small test job.
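
The web-access check in step 4 can be scripted as below. This is a rough sketch, not a confirmed procedure: the function name `semaphore_web_ok` is hypothetical, the base URL must be supplied by the caller because the ingress hostname is not recorded here, and it assumes the `/api/ping` endpoint is reachable through the ingress without authentication.

```shell
# Hedged sketch: succeed only if the Semaphore API answers over HTTP.
# $1 is the base URL of the UI (hostname is environment-specific).
# -f makes curl fail on HTTP errors; --max-time bounds the wait.
semaphore_web_ok() {
  base_url="${1%/}"   # drop one trailing slash, if present
  curl -fsS -o /dev/null --max-time 10 "$base_url/api/ping"
}
```

A zero exit status only shows that the API responds. It does not replace running the small test job.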

4. Scaling and Resource Management

kubectl -n semaphore top pod

Scale app or DB resources in Git if queued jobs or heavy inventories strain the deployment.
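
To decide whether a resource change is needed, the `top` output can be filtered for pods above a memory threshold. This is a minimal sketch, assuming metrics-server is installed and that `kubectl top pod` reports memory in Mi. The function name `pods_over_memory` and any threshold you pass are illustrative.

```shell
# Hedged sketch: list pods in the semaphore namespace whose memory usage
# exceeds a threshold given in Mi. Requires metrics-server.
pods_over_memory() {
  limit_mi="$1"
  kubectl -n semaphore top pod --no-headers |
    awk -v limit="$limit_mi" '
      $3 ~ /Mi$/ {
        sub(/Mi$/, "", $3)              # strip the unit to compare numerically
        if ($3 + 0 > limit) print $1, $3 "Mi"
      }'
}
```

For example, `pods_over_memory 512` prints the name and usage of each pod using more than 512Mi. Compare the result with the requests and limits committed in Git before changing them.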

5. Maintenance Procedures

  • Rotate application and DB secrets.
  • Revalidate ingress after hostname changes.
  • Keep inventory and playbook references aligned with repository moves.

6. Rollback Strategy

  • Revert the active overlay to the last known-good revision.
  • Restore the previous DB snapshot if startup or migrations fail.

7. Post-Incident Actions

  1. Add a changelog fragment for manual remediation.
  2. Update the service page if deployment or secret handling changed.
  3. Extend this runbook with the incident signature.