loki
| Field | Value |
| --- | --- |
| Service | loki |
| Purpose | Shared log aggregation and query backend |
| Criticality | Tier 1 |
| Owner | Platform / Observability owner |
| Clusters | prod, jls, layer7, oci, oci-free |
| Namespace | loki |
| Exposure | internet |
| Stateful | yes |
| Backup class | app-native |
| RPO / RTO | Daily storage target, 2 to 6 hours to restore depending on backend mode |
| Last reviewed | 2026-05-20 |
1. Service Overview
Loki stores and serves application and platform logs for the monitored clusters.
Summary
If Loki fails, log ingestion and historical log search become unavailable for the affected environments.
Dependencies
| Dependency | Type | Why it matters |
| --- | --- | --- |
| Object storage or persistent volumes | storage | Stores chunks, indexes, and compactor state |
| Traefik | ingress | Exposes Loki APIs where configured |
| Collectors and monitoring stack | ingest path | Feed logs into the backend |
2. Architecture Diagram
[Collectors]
-> [Loki gateway / distributors]
-> [Loki storage components]
-> [Grafana / log queries]
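
To make the collector-to-gateway hop in the diagram concrete, here is a minimal sketch of the JSON body that Loki's HTTP push endpoint (`/loki/api/v1/push`) accepts. The label names and values (`cluster`, `app`) and the helper name `build_push_payload` are illustrative assumptions, not settings taken from this deployment; real collectors build this payload themselves.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON body in the shape accepted by Loki's push endpoint.

    labels: dict of stream labels (hypothetical values in this sketch).
    lines:  list of log line strings; all share one timestamp for simplicity.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [<Unix epoch in nanoseconds, as a string>, <log line>].
    values = [[str(now_ns), line] for line in lines]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


if __name__ == "__main__":
    # Hypothetical labels; a collector would POST this body with
    # Content-Type: application/json to <gateway>/loki/api/v1/push.
    print(build_push_payload({"cluster": "prod", "app": "demo"}, ["hello"]))
```

The sketch only builds the body and sends nothing, so it can be run without access to any cluster.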
3. Deployment Specifications
| Item | Value |
| --- | --- |
| Source path | loki/prod and environment-specific directories under loki/ |
| Deployment model | Helm-based deployments with Fleet and environment-specific overlays |
| Namespace | loki |
| Workload kind | Gateway, StatefulSets, and Deployments depending on the mode |
| Chart or image version | See the Helm chart versions committed under each environment |
| Config files | fleet.yaml, prod/fleet.yaml, jls, layer7, oci, and oci-free manifests |
Cluster mapping
| Cluster | Overlay path | Notes |
| --- | --- | --- |
| homelab / prod | loki/prod | Fleet Helm bundle entry point |
| jls | loki/jls | Environment-specific deployment |
| layer7 | loki/layer7 | Legacy environment retained in repo history |
| oci | loki/oci | OCI-specific path |
| oci-free | loki/oci-free | OCI-free-specific path |
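
For tooling that needs to locate an environment's overlay, the cluster mapping can be expressed as a small lookup. This is only a sketch: the paths are copied from the table, while the helper name `overlay_path` and the treatment of `homelab` as an alias for `prod` are assumptions based on how the first row is labelled.

```python
# Overlay paths copied from the cluster mapping table.
OVERLAY_PATHS = {
    "prod": "loki/prod",
    "jls": "loki/jls",
    "layer7": "loki/layer7",
    "oci": "loki/oci",
    "oci-free": "loki/oci-free",
}

# Assumption: the "homelab / prod" row means both names refer to loki/prod.
ALIASES = {"homelab": "prod"}


def overlay_path(cluster):
    """Return the repo-relative overlay directory for a cluster name."""
    name = ALIASES.get(cluster, cluster)
    try:
        return OVERLAY_PATHS[name]
    except KeyError:
        # Fail loudly rather than guessing a path for an unknown cluster.
        raise ValueError(f"unknown cluster: {cluster!r}") from None
```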
4. Configuration Guide
Environment variables
| Variable | Source | Purpose | Secret? |
| --- | --- | --- | --- |
| Loki runtime settings | Helm values and overlay manifests | Configure gateways, storage, retention, and scaling | mixed |
ConfigMaps
| Resource | Path | Purpose |
| --- | --- | --- |
| Helm-generated runtime config | loki/* values and manifests | Configure the deployed Loki mode |
Secrets management
- Secret names: storage credentials and optional auth secrets in the loki namespace
- Source of truth: values files plus runtime secrets
- Rotation trigger: storage credential changes or endpoint rotation
- Recovery note: restore object-store credentials before restarting ingestion components
5. Access Protocols
| Path | URL or endpoint | Audience | Auth | TLS terminates at |
| --- | --- | --- | --- | --- |
| Internal | Loki services in the loki namespace | Collectors and Grafana | cluster RBAC / service auth | Service / gateway |
| External | https://loki.mutana.site where enabled | Operators and integrations | ingress auth policy | Traefik |
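
As an illustration of how an operator or integration might address the external endpoint, the sketch below builds a URL for Loki's `query_range` HTTP API. The helper name `query_range_url` and the example LogQL selector are hypothetical, and the sketch deliberately omits authentication because the ingress auth policy is not described here.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a Loki query_range URL for a LogQL query over a time window.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # Hypothetical selector; any auth required by the ingress must be added
    # by the caller when the request is actually sent.
    print(query_range_url("https://loki.mutana.site", '{namespace="loki"}', 1, 2))
```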
6. Operations and Observability
- Primary health indicators: ingestion succeeds, queries return, and storage backends are healthy.
- Dashboards or alerts: shared Grafana and Loki health dashboards.
- Log locations: gateway and backend pod logs in the loki namespace.
- Known failure modes: object-store auth failures, compactor drift, retention misconfiguration, or ingress breakage.
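
The first health indicator above can be spot-checked against Loki's `/ready` endpoint, which returns HTTP 200 once the component is ready. The following is a minimal sketch, not the monitoring stack's actual probe; the function names and the injectable `fetch` parameter are assumptions made so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 503 while starting) still carry a code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar errors.
        return None


def is_ready(base_url, fetch=fetch_status):
    """Report whether the Loki instance at base_url answers /ready with 200."""
    return fetch(f"{base_url.rstrip('/')}/ready") == 200
```

Calling `is_ready` with the in-cluster service URL or the external hostname gives a quick yes/no answer; a `False` result does not distinguish ingress breakage from a backend that is still starting, so pod logs in the loki namespace remain the next place to look.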
7. Backup and Recovery Notes
- Backup method: backend storage snapshots or object-store policy.
- Restore prerequisites: restored storage credentials and healthy compactor/query paths.
- Related runbook: ../runbooks/loki.md
8. Release and Change Notes
- Current deployed app version: see each environment's committed chart version.
- Current chart version: environment-specific under the loki directory.
- Last significant change: documentation coverage added for the multi-environment Loki layout and Fleet entry points.
- Rollback reference: previous Helm values or environment revision in Git.