infisical Runbook

Metadata

Field                 Value
Service               infisical
Criticality           Tier 1
Owner                 Platform / Security owner
Namespace             infisical
Clusters              local
Last validated        2026-06-30
Related service page  ../services/infisical.md

Trigger Conditions

  • Infisical UI or API at https://infisical.mutana.fr fails.
  • CLI or future GitOps clients cannot authenticate to Infisical.
  • Infisical, Postgres, or Redis pods are not Ready.
  • PVC-backed state or the infisical-secrets Secret is unavailable.
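The first trigger can be confirmed from outside the cluster before opening a kubectl session. A minimal sketch, assuming curl is installed on the operator workstation and that the /api/status path used elsewhere in this runbook is reachable through the public hostname; probe_status is a hypothetical helper name, not part of any existing tooling.

```shell
# Hypothetical helper: print only the HTTP status code returned by a URL.
# curl reports 000 when the connection or TLS handshake fails.
probe_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Usage (assumes /api/status is exposed through the IngressRoute):
#   probe_status https://infisical.mutana.fr/api/status
```

A 200 suggests the ingress path and the application are both up; 000 or a 5xx points toward the ingress and pod checks in the sections below.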

1. Health Checks

kubectl -n infisical get deploy,statefulset,pod,svc,pvc,ingressroute
kubectl -n infisical get events --sort-by='.lastTimestamp' | tail -n 20
kubectl -n infisical logs deploy/infisical --tail=200
kubectl -n infisical logs statefulset/infisical-postgres --tail=100
kubectl -n infisical logs statefulset/infisical-redis --tail=100

Verify the application health endpoint from inside the cluster:

kubectl -n infisical exec deploy/infisical -- \
  wget -qO- http://localhost:8080/api/status && echo

2. Troubleshooting Workflows

Check the Secret and dependency readiness before changing manifests.

kubectl -n infisical get secret infisical-secrets
kubectl -n infisical describe deploy infisical
kubectl -n infisical describe statefulset infisical-postgres
kubectl -n infisical describe statefulset infisical-redis
kubectl -n infisical describe pod -l app.kubernetes.io/part-of=infisical

Common causes:

  • Missing Secret: create infisical-secrets with the documented required keys.
  • KMS migration fails with ERR_CRYPTO_INVALID_KEYLEN / Invalid key length: ENCRYPTION_KEY does not decode to a 256-bit key. Infisical ≥ v0.161.10 requires a base64-encoded 32-byte key for KMS AES-256-GCM. Regenerate it with openssl rand -base64 32, update the Secret, and restart the Infisical pod so it reads the new value. If the instance already holds encrypted data, treat this as a key rotation (see Maintenance Procedures).
  • Infisical not Ready: inspect DB_CONNECTION_URI, REDIS_URL, and dependency pod readiness; do not print secret values.
  • Postgres not Ready: check PVC binding, PGDATA, and pg_isready probe failures.
  • Redis not Ready: check PVC binding and redis-cli ping probe failures.
  • Ingress failure: confirm the IngressRoute hostname, Traefik route, TLS, and service endpoints.
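The first two causes above can be checked without printing any secret value. The sketch below assumes GNU coreutils base64 on the operator workstation. The helper names are hypothetical. The expected length of 32 bytes comes from the ENCRYPTION_KEY note above.

```shell
# List the key names present in infisical-secrets (names only, no values).
secret_key_names() {
  kubectl -n infisical get secret infisical-secrets \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Count the bytes that a base64 string read from stdin decodes to.
key_bytes() {
  base64 -d | wc -c | tr -d '[:space:]'
}

# Print the decoded byte length of one key in the Secret.
# The first base64 -d undoes the Kubernetes Secret encoding; key_bytes
# then decodes the stored base64 key itself. Only the length is printed.
secret_key_bytes() {
  kubectl -n infisical get secret infisical-secrets \
    -o jsonpath="{.data.$1}" | base64 -d | key_bytes
}

# Usage:
#   secret_key_names
#   secret_key_bytes ENCRYPTION_KEY
```

Any result other than 32 for ENCRYPTION_KEY is consistent with the ERR_CRYPTO_INVALID_KEYLEN failure described above.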

3. Disaster Recovery

  1. Confirm whether the incident is app-only, dependency-only, storage-related, or cluster-wide.
  2. Restore the Postgres PVC from the selected snapshot.
  3. Restore the Redis PVC if the incident affected Redis persistence.
  4. Recreate infisical-secrets from the secure out-of-band source.
  5. Reconcile infisical/overlays/local.
  6. Validate pod readiness, /api/status, UI login, and a known secret read/write workflow.

The restore drill for this service still needs to be validated. Treat the first real restore as a high-attention operation and capture the result in this runbook.
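The automated part of step 6 can be sketched as a single function. This is an untested outline, in line with the note above that the restore drill has not been validated. The 300-second timeout is an assumption and validate_restore is a hypothetical name. UI login and the secret read/write check remain manual.

```shell
# Hypothetical helper: wait for each workload to finish rolling out, then
# query the health endpoint from inside the Infisical pod.
# Stops at the first step that fails and returns its exit status.
validate_restore() {
  kubectl -n infisical rollout status statefulset/infisical-postgres --timeout=300s &&
  kubectl -n infisical rollout status statefulset/infisical-redis --timeout=300s &&
  kubectl -n infisical rollout status deploy/infisical --timeout=300s &&
  kubectl -n infisical exec deploy/infisical -- \
    wget -qO- http://localhost:8080/api/status
}

# Usage:
#   validate_restore && echo "restore checks passed"
```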

4. Scaling and Resource Management

kubectl -n infisical top pod
kubectl -n infisical top pod --containers
kubectl -n infisical describe deploy infisical

The standalone deployment intentionally runs one replica for a low footprint. Scale and resource changes should be made in Git after checking Postgres and Redis capacity.
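Before proposing a change in Git, it can help to compare the live replica count and resource settings with what the manifests declare. A minimal sketch; show_sizing is a hypothetical helper name, and how the resources block is rendered depends on the kubectl version.

```shell
# Print the live replica count, then one line per container with its
# name and resources block (requests and limits).
show_sizing() {
  kubectl -n infisical get deploy infisical \
    -o jsonpath='{.spec.replicas}{"\n"}{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}

# Usage:
#   show_sizing
```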

5. Maintenance Procedures

  • Rotate AUTH_SECRET, ENCRYPTION_KEY, ROOT_ENCRYPTION_KEY, database credentials, or Redis credentials only with an explicit rotation plan and tested rollback path.
  • Snapshot Postgres before Infisical version upgrades.
  • Re-check upstream release notes before changing the pinned Infisical image tag.
  • Apply future secret-sync integration as a separate change; do not add the Infisical operator or CRDs without a dedicated plan.
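Before changing the pinned image tag, recording the image that is currently running gives a known reference point for the rollback section. A minimal sketch; current_images is a hypothetical helper name.

```shell
# Print the image reference of every container in the Infisical Deployment.
current_images() {
  kubectl -n infisical get deploy infisical \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}'
}

# Usage: note the output in the change record before upgrading.
#   current_images
```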

6. Rollback Strategy

  • Revert the Git changes that added the infisical/ workload, and remove its Fleet path.
  • If the workload already ran, preserve PVC snapshots and infisical-secrets before deleting resources.
  • A downgrade after application data changes may require restoring a pre-change Postgres snapshot rather than rolling the image back in place.

7. Post-Incident Actions

  1. Add a changelog fragment for any recovery or security-relevant intervention.
  2. Update the service page if dependencies, exposure, or secret handling changed.
  3. Update this runbook with commands or checks learned during the incident.
  4. Record backup and restore validation results in context/progress-tracker.md.