renovate Runbook

Metadata

Field                 Value
Service               renovate
Criticality           Tier 2
Owner                 Platform owner
Namespace             renovate
Clusters              local
Last validated        2026-06-30
Related service page  ../services/renovate.md

Trigger Conditions

  • No dependency-update PRs have been opened against the dev branch for longer than the CronJob schedule would explain.
  • The Renovate Dependency Dashboard issue in GitHub has stopped updating.
  • Recent Jobs under the renovate CronJob are failing.
  • A token rotation, expiry, or suspected GitHub PAT compromise occurred.

1. Health Checks

kubectl -n renovate get cronjob,job,pod
kubectl -n renovate get events --sort-by='.lastTimestamp' | tail -n 20
kubectl -n renovate logs -l app.kubernetes.io/name=renovate --tail=200

Confirm the last scheduled run produced a successful Job:

kubectl -n renovate get jobs --sort-by=.status.startTime
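
To read the result of that listing without scanning it by eye, a small filter along these lines may help. This is only a sketch: latest_job_outcome is a hypothetical helper, not something shipped with the service, and it assumes kubectl's custom-columns output, where unset fields print as <none>.

```shell
# Hypothetical helper: classify the newest Job.
# stdin: "NAME SUCCEEDED FAILED" rows, oldest first
# (kubectl prints <none> for fields that are not set).
latest_job_outcome() {
  awk '
    { name = $1; ok = $2; bad = $3 }   # keep only the last (newest) row
    END {
      if (NR == 0)                         { print "no jobs found"; exit 2 }
      if (ok ~ /^[0-9]+$/ && ok + 0 > 0)   { print name " succeeded"; exit 0 }
      if (bad ~ /^[0-9]+$/ && bad + 0 > 0) { print name " failed"; exit 1 }
      print name " still running or pending"; exit 3
    }'
}

# Usage against the cluster:
# kubectl -n renovate get jobs --sort-by=.status.startTime --no-headers \
#   -o custom-columns=NAME:.metadata.name,OK:.status.succeeded,FAILED:.status.failed \
#   | latest_job_outcome
```

The exit code mirrors the printed outcome (0 succeeded, 1 failed, 2 no Jobs, 3 still running), so the helper can gate a follow-up step in a script.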

2. Troubleshooting Workflows

Check the token secret and config before changing manifests.

kubectl -n renovate get secret renovate-env
kubectl -n renovate describe cronjob renovate

Common causes:

  • No PRs opening: RENOVATE_TOKEN is missing, expired, or lacks the repo scope. Recreate overlays/local/.renovate.env from .renovate.env.example with a fresh GitHub PAT, then re-apply the overlay or let Fleet reconcile. Never print the token to the terminal or to logs.
  • Jobs failing with auth errors: the GitHub token is invalid or revoked. Rotate it.
  • Jobs failing with rate-limit messages: lower prConcurrentLimit (fewer PRs open at once) or reduce the scan frequency in config.json, after reviewing the Renovate docs.
  • Invalid config: a malformed config.json prevents startup. Validate JSON locally and rebuild the overlay.
  • Image pull failure: the pinned tag in overlays/local/kustomization.yaml is unavailable; pick a valid renovate/renovate tag.
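
For the missing-token case, one way to confirm that the Secret carries a value without ever echoing it is to look only at its length. The sketch below assumes the Secret key is named RENOVATE_TOKEN; token_state is a hypothetical helper. It cannot tell whether the token is expired or under-scoped, only whether it is present.

```shell
# Hypothetical helper: report whether a Secret value exists without echoing it.
# stdin: the base64 value returned by jsonpath; only its length is printed.
token_state() (
  n=$(tr -d '\n' | wc -c | tr -d ' ')
  if [ "$n" -eq 0 ]; then
    echo "RENOVATE_TOKEN missing or empty"
    exit 1
  fi
  echo "RENOVATE_TOKEN present ($n base64 chars)"
)

# Usage (assumes the Secret key is named RENOVATE_TOKEN):
# kubectl -n renovate get secret renovate-env \
#   -o jsonpath='{.data.RENOVATE_TOKEN}' | token_state
```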

Force a manual run to reproduce a failure:

JOB="manual-$(date +%s)"
kubectl -n renovate create job --from=cronjob/renovate "$JOB"
kubectl -n renovate logs -l "job-name=$JOB" -f
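
If following the logs interactively is not practical, the same manual run can be wrapped so that it blocks until the Job finishes. This is a minimal sketch: run_manual_renovate is a hypothetical wrapper, and the 15m default timeout is an assumed value, not one taken from the service configuration.

```shell
# Hypothetical wrapper around the manual-run commands above.
# $1: timeout passed to kubectl wait (default 15m, an assumed value).
run_manual_renovate() (
  job="manual-$(date +%s)"
  kubectl -n renovate create job --from=cronjob/renovate "$job" >/dev/null || exit 1
  # Blocks until the Job reports condition Complete or the timeout expires.
  # A failed Job never becomes Complete, so it also lands in the else branch.
  if kubectl -n renovate wait --for=condition=complete "job/$job" \
       --timeout="${1:-15m}" >/dev/null; then
    echo "$job complete"
  else
    echo "$job did not complete; last log lines:" >&2
    kubectl -n renovate logs "job/$job" --tail=50 >&2
    exit 1
  fi
)

# Usage:
# run_manual_renovate 20m
```

Because a failed Job is only detected when the timeout expires, a shorter timeout gives faster feedback when reproducing a known failure.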

3. Disaster Recovery

  1. Confirm whether the issue is token-only, config-only, or cluster-wide.
  2. Recreate renovate-env with a valid GitHub PAT.
  3. Reconcile renovate/overlays/local (or let Fleet reconcile).
  4. Trigger a manual run and confirm a PR appears on dev when an update exists.

Renovate is stateless, so recovery consists of recreating the Secret and re-running the job.
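
Before re-applying the overlay, it can be worth confirming that the recreated env file actually sets a token. The sketch below is an illustration only: env_file_has_token is a hypothetical helper, and the usage line assumes it is run from the renovate/ directory with the overlay applied through kubectl's built-in kustomize support.

```shell
# Hypothetical preflight: succeed only if the env file sets a non-empty
# RENOVATE_TOKEN. grep -q prints nothing, so the token never reaches the terminal.
env_file_has_token() {
  [ -f "$1" ] && grep -Eq '^RENOVATE_TOKEN=.+' "$1"
}

# Usage (assumes the current directory is renovate/):
# env_file_has_token overlays/local/.renovate.env \
#   && kubectl apply -k overlays/local
```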

4. Scaling and Resource Management

kubectl -n renovate top pod
kubectl -n renovate describe cronjob renovate

Resource requests and limits are set on the CronJob pod template. If a run is OOM-killed, adjust them in Git. concurrencyPolicy: Forbid enforces one run at a time: a scheduled run is skipped while the previous one is still active.
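
To check whether a run was OOM-killed, the terminated reason of each pod's container can be listed and filtered. This sketch assumes the Renovate container is the first container in the pod and that the pod still exists; oom_killed_pods is a hypothetical helper.

```shell
# Hypothetical filter: print pods whose container ended with reason OOMKilled.
# stdin: "POD REASON" rows.
oom_killed_pods() {
  awk '$2 == "OOMKilled" { print $1 }'
}

# Usage (reads the terminated reason of the first container in each pod):
# kubectl -n renovate get pods --no-headers \
#   -o 'custom-columns=POD:.metadata.name,REASON:.status.containerStatuses[0].state.terminated.reason' \
#   | oom_killed_pods
```

If the container was restarted in place rather than replaced, the reason may appear under lastState.terminated instead, so an empty result does not rule out an OOM kill.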

5. Maintenance Procedures

  • Rotate the GitHub PAT on a schedule or after any suspected exposure. After rotation, recreate renovate-env.
  • Major updates are disabled by default and require a config change to enable. Once enabled, review major-update PRs manually.
  • When changing config.json, rebuild the overlay locally before pushing so Fleet reconciles a known-good state.
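
A syntax check before the local rebuild catches the malformed-config case early. The sketch below checks JSON syntax only, not whether the options are valid Renovate settings; json_is_valid is a hypothetical helper that uses jq when installed and falls back to Python's json.tool.

```shell
# Hypothetical check: exit 0 only if the file parses as JSON.
# Uses jq when installed, otherwise Python's json.tool; syntax only.
json_is_valid() {
  if command -v jq >/dev/null 2>&1; then
    jq empty "$1" >/dev/null 2>&1
  elif command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$1" >/dev/null 2>&1
  else
    echo "no JSON validator found (need jq or python3)" >&2
    return 2
  fi
}

# Usage (the config.json path is an assumption; adjust to the repo layout):
# json_is_valid config.json && kubectl kustomize overlays/local >/dev/null
```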

6. Rollback Strategy

  • Revert the renovate/ Git changes to restore the previous image tag, schedule, and config.
  • Recreate renovate-env if a token rotation happened in the same window.
  • Because the workload is stateless, rollback is a redeploy from Git.

7. Post-Incident Actions

  1. Add a changelog fragment for any recovery or security-relevant intervention (for example a token rotation).
  2. Update the service page if scope, schedule, or secret handling changed.
  3. Update this runbook with commands or checks learned during the incident.
  4. Record results in context/progress-tracker.md.