lgtm-distributed
| Field | Value |
| --- | --- |
| Service | lgtm-distributed |
| Purpose | Bundled shared observability stack for Grafana, Loki, Tempo, and Mimir |
| Criticality | Tier 1 |
| Owner | Platform / Observability owner |
| Clusters | homelab |
| Namespace | monitoring |
| Exposure | internal |
| Stateful | yes |
| Backup class | snapshot |
| RPO / RTO | Daily backup target, 4 to 8 hours to restore |
| Last reviewed | 2026-05-20 |
1. Service Overview
LGTM Distributed packages the shared observability backends (Loki for logs, Tempo for traces, and Mimir for metrics) that Grafana and other monitoring consumers query.
Summary
If this stack fails, shared logs, traces, metrics, and related observability workflows degrade or disappear.
Dependencies
| Dependency | Type | Why it matters |
| --- | --- | --- |
| Object and persistent storage | storage | Required for Loki, Tempo, and Mimir state |
| Grafana consumers | application dependency | Shared dashboards and queries depend on the backends |
| Fleet | GitOps | Deploys and updates the Helm bundle |
2. Architecture Diagram
[Collectors / applications]
-> [Loki / Tempo / Mimir]
-> [Grafana]
-> [Operators]
3. Deployment Specifications
| Item | Value |
| --- | --- |
| Source path | lgtm-distributed/prod |
| Deployment model | Fleet Helm bundle |
| Namespace | monitoring |
| Workload kind | Multiple StatefulSets and Deployments |
| Chart or image version | See lgtm-distributed/prod/fleet.yaml and values.yaml |
| Config files | prod/fleet.yaml, prod/values.yaml, root fleet.yaml |
Cluster mapping
| Cluster | Overlay path | Notes |
| --- | --- | --- |
| homelab | lgtm-distributed/prod | Current shared observability deployment |
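Because the chart version lives in lgtm-distributed/prod/fleet.yaml rather than in this page, a small helper can read it during a review. The sketch below is illustrative only: it assumes the file uses Fleet's top-level `helm:` block with `chart`, `repo`, and `version` keys, and the sample content (repository URL and version number included) is a placeholder, not the deployed configuration. It uses a deliberately minimal line scanner so that it has no dependencies; a full YAML parser is the better choice for anything beyond a quick check.

```python
from typing import Dict, Optional


def helm_settings(fleet_yaml: str) -> Dict[str, str]:
    """Collect scalar keys directly under the top-level `helm:` block.

    Minimal scanner, not a YAML parser: nested mappings, lists, and
    multi-line values are ignored.
    """
    settings: Dict[str, str] = {}
    in_helm = False
    child_indent: Optional[int] = None
    for raw in fleet_yaml.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = raw.split(" #", 1)[0].rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key starts; track whether it is `helm:`.
            in_helm = line.strip() == "helm:"
            child_indent = None
            continue
        if not in_helm:
            continue
        if child_indent is None:
            child_indent = indent  # indentation of the first child key
        if indent == child_indent:
            key, sep, value = line.strip().partition(":")
            value = value.strip().strip("\"'")
            if sep and value:
                settings[key] = value
    return settings


# Placeholder content for illustration; real values live in the repository.
SAMPLE_FLEET_YAML = """\
defaultNamespace: monitoring
helm:
  chart: lgtm-distributed
  repo: https://example.invalid/helm-charts
  version: 0.0.0  # placeholder version
  values:
    version: nested-and-ignored
"""
```

Calling `helm_settings(SAMPLE_FLEET_YAML)["version"]` returns the placeholder version string; pointing the function at the text of the real file would report whatever the bundle currently pins.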
4. Configuration Guide
Environment variables
| Variable | Source | Purpose | Secret? |
| --- | --- | --- | --- |
| Helm values-driven runtime settings | lgtm-distributed/prod/values.yaml and runtime Secrets | Configure Loki, Tempo, Mimir, and Grafana integration | mixed |
ConfigMaps
| Resource | Path | Purpose |
| --- | --- | --- |
| Helm-generated ConfigMaps | lgtm-distributed/prod/values.yaml | Configure the bundled observability stack |
Secrets management
- Secret names: storage, credentials, and backend integration secrets created by the Helm bundle
- Source of truth: values files plus runtime secret material
- Rotation trigger: object-store credential changes or backend integration updates
- Recovery note: restore backend and storage credentials before the bundle is reconciled
5. Access Protocols
| Path | URL or endpoint | Audience | Auth | TLS terminates at |
| --- | --- | --- | --- | --- |
| Internal | monitoring namespace services for Grafana, Loki, Tempo, and Mimir | Operators and platform services | cluster RBAC | Service / ingress where configured |
| External | Usually consumed through shared Grafana or dedicated gateways | Platform operators | Grafana auth or service-specific auth | Traefik where configured |
6. Operations and Observability
- Primary health indicators: all backend components healthy, persistent storage attached, and query paths available.
- Dashboards or alerts: shared Grafana plus backend-specific health dashboards.
- Log locations: pod logs across the monitoring namespace.
- Known failure modes: object-store auth failures, compactor issues, PVC attach problems, or Helm drift.
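The "all backend components healthy" indicator above can be approximated with a readiness sweep. The following is a minimal sketch, not the stack's actual monitoring: it relies on the `/ready` endpoint that Loki, Tempo, and Mimir components expose and on Grafana's `/api/health`, while the hostnames and ports in `TARGETS` are hypothetical and must be replaced with the Service names the Helm release actually creates in the monitoring namespace.

```python
from typing import Callable, Dict, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Hypothetical in-cluster URLs: the real Service names and ports depend on
# the Helm release and are not specified in this document.
TARGETS: Dict[str, str] = {
    "loki": "http://loki.monitoring.svc:3100/ready",
    "tempo": "http://tempo.monitoring.svc:3200/ready",
    "mimir": "http://mimir.monitoring.svc:8080/ready",
    "grafana": "http://grafana.monitoring.svc:3000/api/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def check_backends(
    targets: Dict[str, str],
    probe: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each backend name to True when its endpoint returns HTTP 200."""
    return {name: probe(url) == 200 for name, url in targets.items()}


def unhealthy(results: Dict[str, bool]) -> List[str]:
    """List the backends that failed the readiness probe, sorted by name."""
    return sorted(name for name, ok in results.items() if not ok)
```

Running `unhealthy(check_backends(TARGETS))` from a pod inside the cluster would list any backend that did not answer with HTTP 200. The `probe` parameter exists so the logic can be exercised without network access.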
7. Backup and Recovery Notes
- Backup method: backend snapshots and object-store backup policy.
- Restore prerequisites: restored storage credentials and recovered backend data.
- Related runbook: ../runbooks/lgtm-distributed.md
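The recovery targets from the summary table can be expressed as simple checks for use in a backup audit or restore drill. This is a sketch under stated assumptions: it reads "Daily backup target" as a 24-hour RPO and takes 8 hours, the upper end of the stated restore window, as the RTO limit. The function names are illustrative, and how the last backup timestamp is obtained is left to the backup tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "Daily backup target" is interpreted as a 24-hour RPO.
RPO = timedelta(hours=24)
# Assumption: the upper bound of the "4 to 8 hours to restore" window.
RTO_MAX = timedelta(hours=8)


def backup_is_fresh(last_backup: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the most recent backup is within the RPO."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup <= RPO


def restore_within_rto(started: datetime, finished: datetime) -> bool:
    """Return True when a restore (or restore drill) met the RTO limit."""
    return finished - started <= RTO_MAX
```

Both functions expect timezone-aware datetimes so that comparisons do not mix naive and aware values.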
8. Release and Change Notes
- Current deployed app version: see the Helm values and chart release.
- Current chart version: see lgtm-distributed/prod/fleet.yaml.
- Last significant change: bundled observability stack documented and aligned with the current Fleet path.
- Rollback reference: previous Helm bundle revision in Git.