
lgtm-distributed

Metadata

| Field | Value |
| --- | --- |
| Service | lgtm-distributed |
| Purpose | Bundled shared observability stack for Grafana, Loki, Tempo, and Mimir |
| Criticality | Tier 1 |
| Owner | Platform / Observability owner |
| Clusters | homelab |
| Namespace | monitoring |
| Exposure | internal |
| Stateful | yes |
| Backup class | snapshot |
| RPO / RTO | Daily backups (RPO up to 24 hours); 4 to 8 hours to restore (RTO) |
| Last reviewed | 2026-05-20 |
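The RPO/RTO targets can be expressed as a simple compliance check. This is a minimal sketch that reads the RPO as 24 hours (one backup per day) and the RTO limit as the 8 hour end of the restore window; the function names are hypothetical and nothing here queries the real backup system.

```python
from datetime import datetime, timedelta, timezone

# Assumed readings of the metadata table: one backup per day gives an RPO of
# 24 hours, and the 8 hour end of the restore window is treated as the RTO limit.
RPO = timedelta(hours=24)
RTO_MAX = timedelta(hours=8)


def rpo_met(last_backup: datetime, now: datetime) -> bool:
    """True if the newest backup is no older than the RPO."""
    return now - last_backup <= RPO


def rto_met(restore_started: datetime, restore_finished: datetime) -> bool:
    """True if a restore (for example a drill) finished within the RTO limit."""
    return restore_finished - restore_started <= RTO_MAX


if __name__ == "__main__":
    now = datetime(2026, 5, 20, 12, 0, tzinfo=timezone.utc)
    print(rpo_met(datetime(2026, 5, 19, 23, 0, tzinfo=timezone.utc), now))  # True: 13 hours old
    print(rpo_met(datetime(2026, 5, 18, 23, 0, tzinfo=timezone.utc), now))  # False: 37 hours old
```

Timezone-aware timestamps are used so that backup times recorded in UTC compare correctly regardless of where the check runs.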

1. Service Overview

LGTM Distributed packages the shared observability backends (Loki for logs, Tempo for traces, Mimir for metrics) that Grafana and other monitoring consumers query.

Summary

If this stack fails, shared logs, traces, and metrics, along with the observability workflows built on them, degrade or become unavailable.

Dependencies

| Dependency | Type | Why it matters |
| --- | --- | --- |
| Object and persistent storage | storage | Required for Loki, Tempo, and Mimir state |
| Grafana consumers | application dependency | Shared dashboards and queries depend on the backends |
| Fleet | GitOps | Deploys and updates the Helm bundle |

2. Architecture Diagram

[Collectors / applications]
  -> [Loki / Tempo / Mimir]
  -> [Grafana]
  -> [Operators]

3. Deployment Specifications

| Item | Value |
| --- | --- |
| Source path | lgtm-distributed/prod |
| Deployment model | Fleet Helm bundle |
| Namespace | monitoring |
| Workload kind | Multiple StatefulSets and Deployments |
| Chart or image version | See lgtm-distributed/prod/fleet.yaml and values.yaml |
| Config files | prod/fleet.yaml, prod/values.yaml, root fleet.yaml |
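For orientation, the Fleet Helm bundle can be pictured as the structure below. This is a hypothetical sketch, not a copy of lgtm-distributed/prod/fleet.yaml: it assumes the upstream Grafana chart repository and an `lgtm-distributed` release name, and the chart version is a placeholder. It uses Fleet's `defaultNamespace` and `helm` keys (`repo`, `chart`, `version`, `releaseName`, `valuesFiles`) and prints JSON, which is also valid YAML.

```python
import json


def fleet_bundle(version: str = "0.0.0-placeholder") -> dict:
    """Hypothetical fleet.yaml content for this bundle; the real file is authoritative."""
    return {
        "defaultNamespace": "monitoring",  # namespace from the deployment table
        "helm": {
            "releaseName": "lgtm-distributed",  # assumed release name
            "repo": "https://grafana.github.io/helm-charts",  # assumed upstream repo
            "chart": "lgtm-distributed",
            "version": version,  # placeholder, not the deployed chart version
            "valuesFiles": ["values.yaml"],  # resolved relative to the bundle directory
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this can be compared against fleet.yaml side by side.
    print(json.dumps(fleet_bundle(), indent=2))
```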

Cluster mapping

| Cluster | Overlay path | Notes |
| --- | --- | --- |
| homelab | lgtm-distributed/prod | Current shared observability deployment |

4. Configuration Guide

Environment variables

| Variable | Source | Purpose | Secret? |
| --- | --- | --- | --- |
| Helm values-driven runtime settings | lgtm-distributed/prod/values.yaml and runtime Secrets | Configure Loki, Tempo, Mimir, and Grafana integration | mixed |

ConfigMaps

| Resource | Path | Purpose |
| --- | --- | --- |
| Helm-generated ConfigMaps | lgtm-distributed/prod/values.yaml | Configure the bundled observability stack |
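Because the ConfigMaps are rendered from Helm values, it helps to know how an override file combines with chart defaults. The sketch below approximates Helm's handling of multiple values files: nested maps merge key by key, while scalars and lists from the later file replace earlier ones. It does not model Helm's rule that a `null` value deletes a key, and the keys in the example are hypothetical, not taken from values.yaml.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Approximate Helm values precedence: later files win, maps merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists from the override replace the base value outright.
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    chart_defaults = {"loki": {"enabled": True, "replicas": 1}, "tempo": {"enabled": True}}
    prod_values = {"loki": {"replicas": 3}, "tempo": {"enabled": False}}
    print(merge_values(chart_defaults, prod_values))
    # {'loki': {'enabled': True, 'replicas': 3}, 'tempo': {'enabled': False}}
```

Copying both inputs keeps the chart defaults unchanged, so the same base can be merged with several override files in turn.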

Secrets management

  • Secret names: storage, credentials, and backend integration secrets created by the Helm bundle
  • Source of truth: values files plus runtime secret material
  • Rotation trigger: object-store credential changes or backend integration updates
  • Recovery note: restore backend and storage credentials before the bundle is reconciled
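When object-store credentials rotate, the replacement Secret must exist before the bundle reconciles. A minimal sketch of building such a manifest follows. The Secret name and the `access_key_id` / `secret_access_key` keys are hypothetical, so match them to whatever values.yaml actually references. Kubernetes requires values under `data` to be base64-encoded.

```python
import base64
import json


def object_store_secret(name: str, namespace: str, access_key: str, secret_key: str) -> dict:
    """Build an Opaque Secret manifest holding hypothetical object-store credentials."""
    def b64(value: str) -> str:
        # Secret `data` values must be base64-encoded strings.
        return base64.b64encode(value.encode()).decode()

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "access_key_id": b64(access_key),  # hypothetical key name
            "secret_access_key": b64(secret_key),  # hypothetical key name
        },
    }


if __name__ == "__main__":
    # Placeholder credentials; never commit real values to Git.
    print(json.dumps(object_store_secret("lgtm-object-store", "monitoring", "AK", "SK"), indent=2))
```

kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.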

5. Access Protocols

| Path | URL or endpoint | Audience | Auth | TLS terminates at |
| --- | --- | --- | --- | --- |
| Internal | monitoring namespace services for Grafana, Loki, Tempo, and Mimir | Operators and platform services | cluster RBAC | Service / ingress where configured |
| External | Usually consumed through shared Grafana or dedicated gateways | Platform operators | Grafana auth or service-specific auth | Traefik where configured |
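Inside the cluster, the backends are reached through standard Kubernetes service DNS names of the form `<service>.<namespace>.svc.<cluster-domain>`. The helper below builds those URLs. The service names and ports in `SERVICES` are hypothetical placeholders, and the default `cluster.local` domain is assumed, so confirm both with `kubectl get svc -n monitoring`.

```python
# Hypothetical service names and ports; replace with the output of
# `kubectl get svc -n monitoring` for the real deployment.
SERVICES = {
    "grafana": ("lgtm-distributed-grafana", 80),
    "loki": ("lgtm-distributed-loki-gateway", 80),
    "tempo": ("lgtm-distributed-tempo-query-frontend", 3100),
    "mimir": ("lgtm-distributed-mimir-nginx", 80),
}


def internal_url(component: str, namespace: str = "monitoring",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster HTTP URL for one backend service."""
    service, port = SERVICES[component]
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    for name in SERVICES:
        print(name, internal_url(name))
```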

6. Operations and Observability

  • Primary health indicators: all backend components healthy, persistent storage attached, and query paths available.
  • Dashboards or alerts: shared Grafana plus backend-specific health dashboards.
  • Log locations: pod logs across the monitoring namespace.
  • Known failure modes: object-store auth failures, compactor issues, PVC attach problems, or Helm drift.
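The primary health indicators above can be checked with plain HTTP probes. Loki, Tempo, and Mimir expose a `/ready` endpoint and Grafana exposes `/api/health`. The sketch below probes those paths and folds the results into one status. The base URLs passed to `probe` are up to the caller, and the healthy / degraded / down labels are an illustrative convention, not an existing alert definition.

```python
import urllib.request

READY_PATHS = {"grafana": "/api/health", "loki": "/ready", "tempo": "/ready", "mimir": "/ready"}


def probe(base_url: str, path: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections, and timeouts
        # are all OSError subclasses.
        return False


def summarize(results: dict[str, bool]) -> str:
    """Collapse per-component probe results into one stack status."""
    down = sorted(name for name, ok in results.items() if not ok)
    if not down:
        return "healthy"
    if len(down) == len(results):
        return "down"
    return "degraded: " + ", ".join(down)


if __name__ == "__main__":
    # Example with fixed results; in practice, build them with probe().
    print(summarize({"grafana": True, "loki": True, "tempo": False, "mimir": True}))
```

Keeping `summarize` free of network calls makes the status logic easy to test separately from the probes.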

7. Backup and Recovery Notes

  • Backup method: backend snapshots and object-store backup policy.
  • Restore prerequisites: restored storage credentials and recovered backend data.
  • Related runbook: ../runbooks/lgtm-distributed.md
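The restore prerequisites, together with the secrets recovery note, imply an ordering for a restore. This sketch encodes that ordering as a dependency graph and uses Python's `graphlib.TopologicalSorter` (Python 3.9+) to produce a valid sequence. The step names are illustrative. The assumption that backend data is recovered only after credentials are back, because the object store needs them, is not stated in the notes above.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must finish before it starts.
# Step names are illustrative, not commands from the runbook.
RESTORE_DEPS: dict[str, set[str]] = {
    "restore storage credentials": set(),
    # Assumption: the object store cannot be read until credentials are back.
    "recover backend data": {"restore storage credentials"},
    "reconcile Fleet bundle": {"restore storage credentials", "recover backend data"},
    "verify query paths in Grafana": {"reconcile Fleet bundle"},
}


def restore_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for number, step in enumerate(restore_order(RESTORE_DEPS), start=1):
        print(number, step)
```

With these dependencies the graph is a single chain, so exactly one order is valid. If the graph contained a cycle, `static_order` would raise `graphlib.CycleError`.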

8. Release and Change Notes

  • Current deployed app version: see the Helm values and chart release.
  • Current chart version: see lgtm-distributed/prod/fleet.yaml.
  • Last significant change: bundled observability stack documented and aligned with the current Fleet path.
  • Rollback reference: previous Helm bundle revision in Git.