
Service Catalogue Strategy

Naming convention

The documentation model mirrors the repository layout.

  • Top-level deployment directory: SERVICE_NAME/
  • Service page: docs/services/SERVICE_NAME.md
  • Runbook page when required: docs/runbooks/SERVICE_NAME.md

This removes ambiguity and allows operators to infer the documentation path directly from the repository tree.
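Because the mapping is mechanical, a small helper can express it. The sketch below is illustrative only. `doc_paths` is a hypothetical name and not a function in this repository. It assumes SERVICE_NAME is always a bare top-level directory name.

```python
from pathlib import PurePosixPath


def doc_paths(service_name: str) -> dict[str, PurePosixPath]:
    """Derive the expected documentation paths for a top-level deployment directory.

    Hypothetical helper: it encodes the naming convention, nothing more.
    """
    # A service name must be a single directory component, never a nested path.
    if not service_name or "/" in service_name:
        raise ValueError(f"expected a bare top-level directory name, got {service_name!r}")
    return {
        "deployment": PurePosixPath(service_name),
        "service_page": PurePosixPath("docs/services") / f"{service_name}.md",
        "runbook": PurePosixPath("docs/runbooks") / f"{service_name}.md",
    }


print(doc_paths("longhorn")["service_page"])  # docs/services/longhorn.md
```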

When a service page is mandatory

Every top-level deployment directory should eventually gain a service page, including platform components and application workloads.

At a minimum, a service page must cover:

  • Service overview and owner
  • Architecture and dependency map
  • Deployment source and namespace details
  • Configuration and secrets handling notes
  • Access methods and URLs
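A check for the minimum standard could look like the sketch below. It makes a hypothetical assumption: each item maps to a level-2 Markdown heading whose title appears in `REQUIRED_SECTIONS`. The real template may use different headings, and owner information may live in the metadata block instead.

```python
import re

# Hypothetical heading titles: the minimum standard lists topics, not exact headings.
REQUIRED_SECTIONS = (
    "Overview",
    "Architecture",
    "Deployment",
    "Configuration and secrets",
    "Access",
)


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section titles that have no matching level-2 heading."""
    # "^##[ \t]+" matches "## Title" but not "### Title" or "# Title".
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##[ \t]+(.+)$", markdown_text, flags=re.MULTILINE)
    }
    return [title for title in REQUIRED_SECTIONS if title.lower() not in headings]
```

The comparison ignores case, so a page that uses "## Configuration And Secrets" still passes.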

When a runbook is mandatory

A dedicated runbook is required if any of the following are true:

  • The service is Tier 0 or Tier 1
  • The service is security-sensitive or part of the access path
  • The service is shared, stateful, or required for platform recovery
  • The service is externally exposed and has non-trivial rollback or restore steps
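The runbook criteria reduce to a single boolean test. This sketch uses a hypothetical `ServiceProfile` record with illustrative field names. It assumes tiers are integers, with 0 as the most critical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical per-service attributes; field names are illustrative."""

    tier: int                         # assumed integer tier, 0 = most critical
    security_sensitive: bool = False  # includes services on the access path
    shared: bool = False
    stateful: bool = False
    required_for_recovery: bool = False
    externally_exposed: bool = False
    complex_rollback_or_restore: bool = False


def runbook_required(p: ServiceProfile) -> bool:
    """Return True if any of the mandatory-runbook criteria holds."""
    return (
        p.tier in (0, 1)
        or p.security_sensitive
        or p.shared
        or p.stateful
        or p.required_for_recovery
        # External exposure alone is not enough; it must pair with non-trivial rollback or restore.
        or (p.externally_exposed and p.complex_rollback_or_restore)
    )
```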

Current coverage snapshot

Status as of 2026-07-02:

  • Current service pages: 26.
  • Current runbooks: 23.
  • Top-level directories that the validation heuristic classifies as deployments: 58.
  • Deployment directories with service pages: 25.
  • fleet is documented as a GitOps bootstrap service even though it is not a normal deployment directory.
  • scripts/validate_docs.py enforces the existing service pages and runbooks, so a registered page cannot silently disappear.
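The coverage figures above are simple set arithmetic: 25 of 58 deployment directories have a service page, roughly 43%, which leaves 33 gaps. The sketch below uses a hypothetical `coverage` helper. It does not reproduce the validation heuristic, which is not described here. It shows how a page with no matching directory, such as fleet, is excluded so that it does not inflate the ratio.

```python
def coverage(deployment_dirs: set[str], service_pages: set[str]) -> tuple[float, list[str]]:
    """Return the covered fraction of deployment directories and the sorted gaps.

    Pages without a matching deployment directory (for example fleet) are ignored.
    """
    if not deployment_dirs:
        # Nothing to document counts as fully covered.
        return 1.0, []
    covered = deployment_dirs & service_pages
    gaps = sorted(deployment_dirs - service_pages)
    return len(covered) / len(deployment_dirs), gaps
```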

Current priority gaps

The next documentation backfill should prioritize services with the highest operational impact.

| Priority | Services | Why they come first |
| --- | --- | --- |
| 1 | calico, csi-driver-nfs, longhorn, velero, democratic-csi | They control networking, storage, or recovery for the rest of the estate |
| 2 | cloudflared, crowdsec-lapi, headscale, wazuh | They protect or expose edge and security workflows |
| 3 | kube-prometheus-stack, mimir, grafana-agent, minio | They provide observability or shared state needed during incidents |
| 4 | arr-stack, netbox, seafile, portainer, uptime-kuma | They are stateful, externally useful, or operationally visible services |
| 5 | Remaining deployment directories | They have a lower blast radius or are easier to backfill during routine maintenance |
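When choosing what to backfill next, the priority table can be applied mechanically to the list of undocumented directories. This is a minimal sketch using a hypothetical `backfill_order` helper. The band assignments are copied from the priority table, and any unlisted directory falls into band 5.

```python
# Priority bands copied from the priority table; unlisted directories default to band 5.
PRIORITY = {
    **dict.fromkeys(["calico", "csi-driver-nfs", "longhorn", "velero", "democratic-csi"], 1),
    **dict.fromkeys(["cloudflared", "crowdsec-lapi", "headscale", "wazuh"], 2),
    **dict.fromkeys(["kube-prometheus-stack", "mimir", "grafana-agent", "minio"], 3),
    **dict.fromkeys(["arr-stack", "netbox", "seafile", "portainer", "uptime-kuma"], 4),
}
DEFAULT_PRIORITY = 5


def backfill_order(undocumented: list[str]) -> list[tuple[int, str]]:
    """Sort undocumented directories by priority band, then alphabetically."""
    return sorted((PRIORITY.get(name, DEFAULT_PRIORITY), name) for name in undocumented)
```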

Current catalogue pages

Service page creation workflow

  1. Copy the application template into docs/services/SERVICE_NAME.md.
  2. Fill in the metadata block first so that ownership, tier, clusters, and namespace are visible immediately.
  3. Add a separate runbook if the service meets the mandatory criteria.
  4. Add the service to mkdocs.yml and scripts/validate_docs.py once the README, the service page, and any required runbook exist.
  5. Add or update a changelog fragment when the service page is introduced during a broader deployment or validation change.
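Steps 1 and 2 can be partly scripted. This sketch assumes a hypothetical template path, `docs/templates/application.md`, because the workflow names the application template without giving its location. It copies the template, refuses to overwrite an existing page, and leaves the metadata to be filled in by hand.

```python
from pathlib import Path

# Hypothetical location: the workflow names "the application template" but not its path.
TEMPLATE = Path("docs/templates/application.md")


def scaffold_service_page(service_name: str, repo_root: Path, template: Path = TEMPLATE) -> Path:
    """Copy the template to docs/services/SERVICE_NAME.md without overwriting an existing page."""
    source = (repo_root / template).read_text(encoding="utf-8")
    target = repo_root / "docs" / "services" / f"{service_name}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of overwriting an existing page.
    with target.open("x", encoding="utf-8") as fh:
        fh.write(source)
    return target
```

Exclusive-create mode keeps an existing, already-filled page safe even if the helper is rerun by mistake.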

Completion standard

The catalogue is only complete when a new operator can locate a service, understand its dependencies, and open the correct runbook without searching outside the repository.