authelia Runbook

Metadata

Field                 Value
Service               authelia
Criticality           Tier 1
Owner                 Platform / Identity owner
Namespace             auth
Clusters              homelab, local, jls
Last validated        2026-06-29
Related service page  ../services/authelia.md

Trigger Conditions

  • Protected apps return authentication errors or redirect loops.
  • The Authelia portal is unavailable.
  • SMTP or identity flows fail.
  • MariaDB or session state becomes unhealthy.

1. Health Checks

kubectl -n auth get pods,svc,pvc,ingressroute
kubectl -n auth logs deploy/authelia --tail=200
kubectl -n auth get statefulset,deploy
kubectl -n auth exec deploy/authelia -- authelia validate-config --config /config/configuration.yaml
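
The rollout state of both workloads can be reduced to a single pass/fail result. This is a minimal sketch, not an established script: the workload names come from this runbook, while the function name `authelia_rollout_check` and the 60-second timeout are assumptions.

```shell
# Sketch: report whether each workload has finished rolling out.
# Namespace and workload names are taken from this runbook.
# The 60s timeout is an assumed value; tune it for the cluster.
authelia_rollout_check() {
  local ns="auth"
  local failed=0
  local target
  for target in deploy/authelia statefulset/authelia-db; do
    if kubectl -n "$ns" rollout status "$target" --timeout=60s; then
      echo "OK   $target"
    else
      echo "FAIL $target"
      failed=1
    fi
  done
  # Non-zero when at least one workload did not become ready in time.
  return "$failed"
}
```

After sourcing the snippet, run `authelia_rollout_check`; a non-zero exit status means at least one workload did not finish rolling out within the timeout.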

2. Troubleshooting Workflows

Check app, DB, and ingress together:

kubectl -n auth describe deploy authelia
kubectl -n auth describe statefulset authelia-db
kubectl -n auth logs statefulset/authelia-db --tail=100
kubectl -n auth describe ingressroute

Look for:

  • bad DB credentials or missing secrets
  • broken Traefik middleware references
  • SMTP endpoint or certificate errors
  • missing TOTP enrollment after MFA policy changes
  • Duo API hostname, integration key, or secret key problems
  • stale sessions after Authelia secret rotation
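
To narrow the logs down to the failure signatures listed above, it can help to filter for error-level lines first. The sketch below assumes Authelia emits either `level=error` style text logs or JSON logs with a `"level":"error"` field; the function name `authelia_error_lines` is hypothetical, and the pattern needs adjusting if the deployment uses a different log format.

```shell
# Sketch: print only error/fatal lines from recent Authelia logs.
# Assumption: log lines contain level=error (text) or "level":"error" (JSON).
# Optional first argument: number of log lines to scan (defaults to 200).
authelia_error_lines() {
  kubectl -n auth logs deploy/authelia --tail="${1:-200}" 2>&1 |
    grep -Ei 'level=(error|fatal)|"level":"(error|fatal)"' ||
    echo "no error-level lines found"
}
```

For example, `authelia_error_lines 500` scans the last 500 lines. An empty result does not rule out a problem; it only means nothing matched the assumed pattern.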

3. Disaster Recovery

  1. Restore secrets and DB credentials.
  2. Recover MariaDB state from a backup or a PVC snapshot.
  3. Reconcile the active overlay.
  4. Validate forward-auth against a protected application.
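
The final validation step can be sketched as an unauthenticated request that should be challenged rather than served. This assumes a redirect (302, 303, 307) or a 401 indicates that forward-auth intercepted the request; the function name `forward_auth_challenged` and the example hostname are placeholders, not values from this environment.

```shell
# Sketch: an unauthenticated request to a protected app should be challenged.
# Assumption: a redirect or 401 means forward-auth intercepted the request.
# $1 is the URL of a protected application (placeholder in the usage note).
forward_auth_challenged() {
  local code
  # -s: silent, -o /dev/null: discard body, -w: print only the HTTP status.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")"
  case "$code" in
    302|303|307|401)
      echo "challenged ($code)"
      return 0
      ;;
    *)
      echo "unexpected status $code"
      return 1
      ;;
  esac
}
```

Usage: `forward_auth_challenged https://app.example.test`. A 200 here would suggest the route is no longer protected; a 000 means the request did not complete at all.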

4. Scaling and Resource Management

kubectl -n auth top pod
kubectl -n auth describe deploy authelia

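If kubectl top or the deployment description shows that logins or session checks are resource-bound (for example, pods running close to their memory limit or restarting with OOMKilled), increase the app or DB resources in Git.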

Current Git resource profile:

  • Authelia: 100m CPU request, 128Mi memory request, 512Mi memory limit.
  • Authelia local overlay: 100m CPU request, 128Mi memory request, 1Gi memory limit.
  • MariaDB: 50m CPU request, 128Mi memory request, 512Mi memory limit.
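
To confirm that the live Deployment matches the profile above, the memory limit can be read back with a JSONPath query. This is a sketch under two assumptions: the Authelia container is the first container in the pod template, and the function name `authelia_memory_limit_matches` is hypothetical.

```shell
# Sketch: compare the live memory limit with the value expected from Git.
# Assumption: the Authelia container is containers[0] in the pod template.
# $1 is the expected limit, e.g. 512Mi (or 1Gi on the local overlay).
authelia_memory_limit_matches() {
  local expected="$1"
  local live
  live="$(kubectl -n auth get deploy authelia \
    -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}')"
  if [ "$live" = "$expected" ]; then
    echo "match: $live"
  else
    echo "drift: live=$live git=$expected"
    return 1
  fi
}
```

Usage: `authelia_memory_limit_matches 512Mi`. The comparison is a plain string match, so `1Gi` and `1024Mi` are reported as drift even though they are equivalent.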

5. Maintenance Procedures

  • Rotate Authelia and DB secrets; expect existing sessions to go stale after an Authelia secret rotation.
  • Rotate Duo API material through the authelia Secret, then validate a Duo push to a phone.
  • Confirm TOTP enrollment for users before tightening protected service routes.
  • Test SMTP and reset flows after mail changes.
  • Recheck middleware names after Traefik refactors.
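
Rotated Secret values are typically not picked up by running pods until they restart. The sketch below restarts the Deployment and waits for it to settle; whether a restart is required depends on how the Secret is consumed, so treat that as an assumption, along with the function name `authelia_restart_after_rotation` and the default 120-second timeout.

```shell
# Sketch: restart Authelia after a Secret rotation and wait for readiness.
# Assumption: the pods only read the rotated values at startup.
# Optional first argument: rollout timeout (defaults to 120s).
authelia_restart_after_rotation() {
  kubectl -n auth rollout restart deploy/authelia &&
    kubectl -n auth rollout status deploy/authelia --timeout="${1:-120s}"
}
```

Usage: `authelia_restart_after_rotation 180s`, followed by the Duo, TOTP, and SMTP validations listed above.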

6. Rollback Strategy

  • Revert the active overlay to the previous working revision.
  • Restore the previous DB snapshot if schema or startup changes failed.
  • If users are locked out by MFA policy, temporarily roll back the access-control rule change rather than editing secrets in place.
  • Keep trusted-network bypass rules available for internal services that cannot complete a 2FA flow.

7. Post-Incident Actions

  1. Record any manual recovery steps in a changelog fragment.
  2. Update the service page if auth flows or dependencies changed.
  3. Update this runbook with the exact failure signature and fix.