loki

Metadata

| Field | Value |
|---|---|
| Service | loki |
| Purpose | Shared log aggregation and query backend |
| Criticality | Tier 1 |
| Owner | Platform / Observability owner |
| Clusters | prod, jls, layer7, oci, oci-free |
| Namespace | loki |
| Exposure | internet |
| Stateful | yes |
| Backup class | app-native |
| RPO / RTO | Daily storage target, 2 to 6 hours to restore depending on backend mode |
| Last reviewed | 2026-05-20 |

1. Service Overview

Loki stores and serves application and platform logs for the monitored clusters.

Summary

If Loki fails, log ingestion and historical log search become unavailable for the affected environments.

Dependencies

| Dependency | Type | Why it matters |
|---|---|---|
| Object storage or persistent volumes | storage | Stores chunks, indexes, and compactor state |
| Traefik | ingress | Exposes Loki APIs where configured |
| Collectors and monitoring stack | ingest path | Feed logs into the backend |

2. Architecture Diagram

[Collectors]
  -> [Loki gateway / distributors]
  -> [Loki storage components]
  -> [Grafana / log queries]
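The first hop in the diagram, collectors sending logs to the gateway or distributors, can be illustrated with the request body that Loki's push API (`POST /loki/api/v1/push`) accepts. This is a minimal sketch rather than the configuration of any collector in this repo: the function name and the label names (`cluster`, `app`) are hypothetical, and real collectors batch and retry on their own.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON body for Loki's push API (POST /loki/api/v1/push).

    labels: dict of stream labels (the names used in the demo are illustrative).
    lines: list of log line strings, stamped one nanosecond apart.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [<unix epoch in nanoseconds, as a string>, <log line>].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": dict(labels), "values": values}]})


if __name__ == "__main__":
    # Hypothetical labels; substitute whatever the collectors actually attach.
    print(build_push_payload({"cluster": "prod", "app": "demo"}, ["hello loki"]))
```

The sketch only builds the body. Sending it would also need a `Content-Type: application/json` header and whatever authentication the target path enforces.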

3. Deployment Specifications

| Item | Value |
|---|---|
| Source path | loki/prod and environment-specific directories under loki/ |
| Deployment model | Helm-based deployments with Fleet and environment-specific overlays |
| Namespace | loki |
| Workload kind | Gateway, StatefulSets, and Deployments depending on the mode |
| Chart or image version | See the Helm chart versions committed under each environment |
| Config files | fleet.yaml, prod/fleet.yaml, jls, layer7, oci, and oci-free manifests |

Cluster mapping

| Cluster | Overlay path | Notes |
|---|---|---|
| homelab / prod | loki/prod | Fleet Helm bundle entry point |
| jls | loki/jls | Environment-specific deployment |
| layer7 | loki/layer7 | Legacy environment retained in repo history |
| oci | loki/oci | OCI-specific path |
| oci-free | loki/oci-free | OCI-free-specific path |
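For scripts that need to resolve a cluster name to its overlay directory, the mapping above could be encoded as a small lookup. This is only a sketch: the `OVERLAYS` name and `overlay_path` helper are hypothetical, and treating "homelab" and "prod" as two names for the same overlay is an assumption drawn from the first table row.

```python
# Overlay paths copied from the cluster mapping table. Treating "homelab"
# and "prod" as aliases for one overlay is an assumption.
OVERLAYS = {
    "homelab": "loki/prod",
    "prod": "loki/prod",
    "jls": "loki/jls",
    "layer7": "loki/layer7",
    "oci": "loki/oci",
    "oci-free": "loki/oci-free",
}


def overlay_path(cluster):
    """Return the repo overlay path for a cluster, or raise for unknown names."""
    try:
        return OVERLAYS[cluster.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(OVERLAYS))
        raise ValueError(f"unknown cluster {cluster!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(overlay_path("jls"))
```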

4. Configuration Guide

Environment variables

| Variable | Source | Purpose | Secret? |
|---|---|---|---|
| Loki runtime settings | Helm values and overlay manifests | Configure gateways, storage, retention, and scaling | mixed |

ConfigMaps

| Resource | Path | Purpose |
|---|---|---|
| Helm-generated runtime config | loki/* values and manifests | Configure the deployed Loki mode |

Secrets management

  • Secret names: storage credentials and optional auth secrets in the loki namespace
  • Source of truth: values files plus runtime secrets
  • Rotation trigger: storage credential changes or endpoint rotation
  • Recovery note: restore object-store credentials before restarting ingestion components

5. Access Protocols

| Path | URL or endpoint | Audience | Auth | TLS terminates at |
|---|---|---|---|---|
| Internal | Loki services in the loki namespace | Collectors and Grafana | cluster RBAC / service auth | Service / gateway |
| External | https://loki.mutana.site where enabled | Operators and integrations | ingress auth policy | Traefik |
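As an illustration of how an operator or integration might address the external path, the sketch below builds a URL for Loki's range query endpoint (`GET /loki/api/v1/query_range`). The helper name and the example LogQL selector are hypothetical, and the sketch does not cover authentication, because the ingress auth policy is not specified on this page.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Return a URL for Loki's range query endpoint (GET /loki/api/v1/query_range).

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    # The selector below is illustrative; use labels that exist in your streams.
    print(query_range_url("https://loki.mutana.site", '{app="demo"}', 1, 2, limit=5))
```

The same helper works for the internal path by passing the in-cluster service URL as `base_url`.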

6. Operations and Observability

  • Primary health indicators: ingestion succeeds, queries return, and storage backends are healthy.
  • Dashboards or alerts: shared Grafana and Loki health dashboards.
  • Log locations: gateway and backend pod logs in the loki namespace.
  • Known failure modes: object-store auth failures, compactor drift, retention misconfiguration, or ingress breakage.
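A basic availability probe for the health indicators above could look like the following sketch. It assumes Loki's `/ready` endpoint is reachable at the chosen base URL, which may not hold when requests pass through the gateway or ingress. The helper names are hypothetical, and the probe covers readiness only, not ingestion or query success.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_ready(base_url, fetch=http_status):
    """Classify a Loki endpoint as 'ready', 'not ready', or 'unreachable'."""
    status = fetch(f"{base_url.rstrip('/')}/ready")
    if status is None:
        return "unreachable"
    return "ready" if status == 200 else "not ready"
```

The `fetch` parameter is injectable so the classification logic can be exercised without network access. In practice `check_ready` would be called with the internal service URL or the external endpoint.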

7. Backup and Recovery Notes

  • Backup method: backend storage snapshots or object-store policy.
  • Restore prerequisites: restored storage credentials and healthy compactor/query paths.
  • Related runbook: ../runbooks/loki.md

8. Release and Change Notes

  • Current deployed app version: see each environment's committed chart version.
  • Current chart version: environment-specific under the loki directory.
  • Last significant change: documentation coverage added for the multi-environment Loki layout and Fleet entry points.
  • Rollback reference: previous Helm values or environment revision in Git.