semaphore Runbook
Metadata
| Field | Value |
|---|---|
| Service | semaphore |
| Criticality | Tier 2 |
| Owner | Platform / Automation owner |
| Namespace | semaphore |
| Clusters | dev, homelab, oci |
| Last validated | 2026-05-20 |
| Related service page | ../services/semaphore.md |
Trigger Conditions
- Semaphore UI is unavailable.
- Job execution fails.
- MariaDB is degraded.
- Generated secrets or `.semaphore.env` values drift.
1. Health Checks
```shell
kubectl -n semaphore get pods,svc,pvc,ingressroute
kubectl -n semaphore logs deploy/semaphore --tail=200
kubectl -n semaphore get statefulset
```
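The checks above can also be gated on rollout status, which fails loudly instead of relying on a visual scan of the pod list. This is a minimal sketch, not the team's established procedure: it assumes the Deployment is named `semaphore` and the StatefulSet `semaphore-db` (the names used elsewhere in this runbook), and the `semaphore_health` function name and the 120s default timeout are illustrative choices.

```shell
# Sketch: wait for the app and database rollouts to finish.
# Assumptions: Deployment "semaphore" and StatefulSet "semaphore-db" exist in
# the "semaphore" namespace; the function name and default timeout are made up.
semaphore_health() {
  ns="semaphore"
  timeout="${1:-120s}"
  # rollout status exits non-zero if the rollout does not complete in time.
  kubectl -n "$ns" rollout status deploy/semaphore --timeout="$timeout" || return 1
  kubectl -n "$ns" rollout status statefulset/semaphore-db --timeout="$timeout" || return 1
  echo "semaphore: app and database rollouts are complete"
}
```

Calling `semaphore_health 60s` after sourcing the function returns non-zero if either workload is not fully rolled out within the timeout.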
2. Troubleshooting Workflows
```shell
kubectl -n semaphore describe deploy semaphore
kubectl -n semaphore logs statefulset/semaphore-db --tail=100
kubectl -n semaphore get secret
```
Check encryption keys, DB connectivity, and volume mounts before restarting jobs.
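One way to work through those three checks without printing secret values is sketched below. It is an illustration under assumptions: the `semaphore_triage` function name is invented, the Secret name is passed in by the operator because this runbook does not name it, and EndpointSlices are used only as a rough signal of whether the database Service has ready backends.

```shell
# Sketch: pre-restart triage for mounts, DB reachability, and secret keys.
# Assumptions: Deployment "semaphore" in namespace "semaphore"; the optional
# first argument is the name of a generated Secret (not named in this runbook).
semaphore_triage() {
  ns="semaphore"
  secret_name="$1"
  # Recent events, oldest first; failed mounts and crash loops show up here.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
  # Volume mounts declared on the app pod template, one per line.
  kubectl -n "$ns" get deploy semaphore \
    -o jsonpath='{range .spec.template.spec.containers[*].volumeMounts[*]}{.name}{" -> "}{.mountPath}{"\n"}{end}'
  # Services with no ready backends list no endpoints here.
  kubectl -n "$ns" get endpointslices
  # describe shows key names and byte sizes only, never the values.
  if [ -n "$secret_name" ]; then
    kubectl -n "$ns" describe secret "$secret_name"
  fi
}
```

Run it as `semaphore_triage <secret-name>`, or with no argument to skip the Secret inspection.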
3. Disaster Recovery
- Restore `.semaphore.env` inputs and generated Secrets.
- Restore MariaDB state and PVC-backed inventory data.
- Reconcile the active overlay.
- Validate web access and a small test job.
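The web-access validation in the last step could be scripted roughly as follows. This is a sketch under assumptions: the base URL is supplied by the operator (no hostname is given in this runbook), the `semaphore_validate_web` name is invented, and it relies on the Semaphore API answering `GET /api/ping` with `pong`, which should be confirmed against the deployed version. It does not cover the test job itself.

```shell
# Sketch: confirm the restored UI answers before running a test job.
# Assumptions: $1 is the externally reachable base URL (hypothetical, e.g.
# https://semaphore.example.internal) and /api/ping replies with "pong".
semaphore_validate_web() {
  base_url="$1"
  if [ -z "$base_url" ]; then
    echo "usage: semaphore_validate_web <base-url>" >&2
    return 2
  fi
  # -f makes curl exit non-zero on HTTP errors; --max-time bounds the wait.
  body="$(curl -fsS --max-time 10 "$base_url/api/ping")" || {
    echo "UI unreachable at $base_url" >&2
    return 1
  }
  if [ "$body" != "pong" ]; then
    echo "unexpected ping response: $body" >&2
    return 1
  fi
  echo "web access OK"
}
```

A non-zero exit here suggests fixing ingress or the app before spending time on a test job.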
4. Scaling and Resource Management
If queued jobs or heavy inventories strain the deployment, scale the app or DB resources in Git rather than editing live objects.
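Before changing manifests, it can help to compare live usage with what the pods currently request. The sketch below is read-only and makes no changes to the cluster; it assumes metrics-server is installed (required by `kubectl top`), and the `semaphore_capacity` name is illustrative.

```shell
# Sketch: compare live usage with declared requests and limits (read-only).
# Assumption: metrics-server is available; without it "kubectl top" fails,
# but the second command still prints the declared values.
semaphore_capacity() {
  ns="semaphore"
  # Current CPU and memory usage per container.
  kubectl -n "$ns" top pods --containers
  # Declared requests and limits per pod, for comparison.
  kubectl -n "$ns" get pods \
    -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,MEM_LIM:.spec.containers[*].resources.limits.memory'
}
```

Usage that sits close to the declared limits during job runs is a reasonable signal to raise the values in Git.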
5. Maintenance Procedures
- Rotate application and DB secrets.
- Revalidate ingress after hostname changes.
- Keep inventory and playbook references aligned with repository moves.
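A possible follow-up after the first two items is sketched below. Pods that read Secrets through environment variables only see new values after a restart, so the sketch restarts the app and then prints the ingress match rules for a manual hostname check. It assumes the rotated Secret objects have already been applied, that the `ingressroute` resource is a Traefik IngressRoute with `spec.routes[].match`, and it does not change any database password; the `semaphore_after_maintenance` name is invented.

```shell
# Sketch: pick up rotated app secrets and show ingress host rules.
# Assumptions: rotated Secrets are already applied; "ingressroute" is the
# Traefik CRD, whose routes carry a "match" rule such as a Host(...) matcher.
semaphore_after_maintenance() {
  ns="semaphore"
  # Restart so containers re-read Secret-backed environment variables.
  kubectl -n "$ns" rollout restart deploy/semaphore || return 1
  kubectl -n "$ns" rollout status deploy/semaphore --timeout=120s || return 1
  # Print every route match rule so the hostname can be checked by eye.
  kubectl -n "$ns" get ingressroute -o jsonpath='{.items[*].spec.routes[*].match}'
}
```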
6. Rollback Strategy
- Revert the active overlay to the last known-good revision.
- Restore the previous DB snapshot if startup or migrations fail.
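The overlay revert can be expressed as a new commit that restores the overlay directory to a known-good revision, which keeps history intact. This is a sketch only: the overlay path and revision are supplied by the operator (neither is stated in this runbook), the `semaphore_rollback_overlay` name is invented, and `git restore` requires Git 2.23 or newer. Pushing and reconciling are left to whatever delivery tooling the cluster uses.

```shell
# Sketch: restore an overlay directory to a known-good revision as a new commit.
# Assumptions: $1 is the overlay path in the repo and $2 the last known-good
# revision; both are hypothetical inputs chosen by the operator.
semaphore_rollback_overlay() {
  overlay_path="$1"
  good_rev="$2"
  if [ -z "$overlay_path" ] || [ -z "$good_rev" ]; then
    echo "usage: semaphore_rollback_overlay <overlay-path> <good-rev>" >&2
    return 2
  fi
  # Make index and working tree under the path match the good revision.
  git restore --source="$good_rev" --staged --worktree -- "$overlay_path" || return 1
  # Fails with a non-zero exit if the path already matches the good revision.
  git commit -m "semaphore: roll back $overlay_path to $good_rev" || return 1
}
```

Reviewing `git log --oneline -- <overlay-path>` first helps pick the revision to pass as the second argument.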
7. Post-Incident Actions
- Add a changelog fragment for manual remediation.
- Update the service page if deployment or secret handling changed.
- Extend this runbook with the incident signature.