pulp3 Runbook
Metadata
| Field | Value |
|---|---|
| Service | pulp3 |
| Criticality | Tier 1 |
| Owner | Platform / Content supply owner |
| Namespace | pulp3 |
| Clusters | homelab, ozirepo |
| Last validated | 2026-05-20 |
| Related service page | ../services/pulp3.md |
Trigger Conditions
- Pulp API or content routes fail.
- Mirror sync or publication jobs fail.
- Operator reconciliation is unhealthy.
- Signing, storage, or DB-backed state becomes invalid.
1. Health Checks
kubectl -n pulp3 get pods,svc,pvc,ingressroute
kubectl -n pulp3 get pulp
kubectl -n pulp3 logs deploy/pulp-operator-controller-manager --tail=200
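To make the pod listing above quicker to scan during an incident, the following minimal sketch filters it down to pods that are not fully Ready. The helper name `not_ready_pods` is hypothetical, and it assumes the default column layout of `kubectl get pods --no-headers` (name, READY as `ready/total`, then status).

```shell
# Sketch: print the names of pods whose READY column is not "n/n".
# Reads `kubectl get pods --no-headers` output on stdin so it can be
# exercised without cluster access. Completed job pods are skipped.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2] && $3 != "Completed") print $1 }'
}

# Usage (requires cluster access):
#   kubectl -n pulp3 get pods --no-headers | not_ready_pods
```

An empty result means every listed pod reports all of its containers ready; it does not by itself prove that the API or content routes respond.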
2. Troubleshooting Workflows
Inspect the operator and the Pulp custom resource first:
kubectl -n pulp3 describe pulp
kubectl -n pulp3 get events --sort-by=.lastTimestamp | tail -20
kubectl -n pulp3 get secret
Check signing passphrases, admin credentials, and storage readiness before changing manifests.
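One way to confirm that a credential is actually populated, rather than only that the Secret object exists, is sketched below. The helper `secret_key_present` is hypothetical, and the Secret and key names in the usage line are placeholders because this runbook does not name them. The sketch assumes simple key names; keys containing dots need escaping in the jsonpath expression.

```shell
# Sketch: succeed only when KEY exists in SECRET and holds a non-empty value.
# The value is never printed, so it is safe to run in a shared terminal.
secret_key_present() {
  secret="$1"
  key="$2"
  value=$(kubectl -n pulp3 get secret "$secret" -o "jsonpath={.data.$key}" 2>/dev/null) || return 1
  [ -n "$value" ]
}

# Usage (placeholder names, replace with the real Secret and key):
#   secret_key_present <admin-secret-name> password || echo "admin password missing"
```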
3. Disaster Recovery
- Restore signing and admin secrets.
- Restore database and content storage if needed.
- Reapply pulp3/operator and the active overlay.
- Wait for the Pulp CR to become healthy.
- Validate API, content, and a test sync path.
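The "wait for the Pulp CR to become healthy" step above can be checked with a small filter over the CR's status conditions. This is a sketch under two assumptions: that the operator publishes `status.conditions` on the Pulp resource, and that every condition reads `True` when the deployment is healthy. The helper name `unhealthy_conditions` is hypothetical.

```shell
# Sketch: read "Type=Status" lines on stdin and print the condition
# types whose status is anything other than True.
unhealthy_conditions() {
  awk -F= '$2 != "True" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl -n pulp3 get pulp \
#     -o jsonpath='{range .items[*].status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

No output suggests reconciliation has settled; any printed condition type is a starting point for `kubectl -n pulp3 describe pulp`.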
4. Scaling and Resource Management
If sync or publication jobs queue up, scale worker or API resources by changing them in Git so the change is tracked with the rest of the deployment.
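To judge whether jobs are actually queuing, the Pulp REST API can report how many tasks are waiting. The sketch below extracts the `count` field from a paginated task listing; the helper name `task_count` and the variables `PULP_HOST` and `PULP_ADMIN_PASSWORD` are hypothetical, and the API is assumed to be served under the default `/pulp/api/v3/` prefix.

```shell
# Sketch: read a paginated Pulp API response on stdin and print its
# top-level "count" value (the first "count" key found in the JSON).
task_count() {
  grep -o '"count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Usage (hypothetical host and credential variables):
#   curl -fsS -u "admin:$PULP_ADMIN_PASSWORD" \
#     "https://$PULP_HOST/pulp/api/v3/tasks/?state=waiting&limit=1" | task_count
```

A waiting count that keeps growing while workers are busy is the signal to raise worker capacity; a one-off spike during a large sync usually is not.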
5. Maintenance Procedures
- Rotate admin and signing credentials.
- Revalidate exposed URLs after ingress changes.
- Schedule sync-heavy changes to avoid resource starvation.
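Revalidating exposed URLs after an ingress change can be reduced to a small probe like the sketch below. The helper `url_ok` and the `PULP_HOST` variable are hypothetical; the paths assume Pulp's default API prefix and its unauthenticated status endpoint.

```shell
# Sketch: succeed only when the URL answers with HTTP 200 within 10 seconds.
url_ok() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1") || return 1
  [ "$code" = "200" ]
}

# Usage (hypothetical host variable):
#   url_ok "https://$PULP_HOST/pulp/api/v3/status/" || echo "API route failing"
```

The same probe can be pointed at a known published distribution under the content route to cover both routes named in the trigger conditions.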
6. Rollback Strategy
- Revert the operator or overlay revision.
- Restore the previous DB and content snapshot if reconciliation broke runtime state.
7. Post-Incident Actions
- Add a changelog fragment describing the remediation.
- Update the service page if overlays or operational assumptions changed.
- Extend this runbook when a new operator failure mode is found.