I have installed the Grafana Loki single binary in my Kubernetes cluster using the Helm chart. Everything works great except that my persistent storage (filesystem) is filling up. I have read the storage retention configuration docs from Grafana and many posts here and elsewhere about this. I believe that I have configured my Loki installation to remove logs using the compactor, but my persistent volume keeps filling up.
I am using version 3.1.0 of the Loki Helm chart (loki-3.1.0.tgz) to install version 2.6.1 of the Loki image (grafana/loki:2.6.1).
Here is my values.yaml file that I am using to install Loki:
# fullnameOverride: loki
# global:
#   image:
#     registry: null

monitoring:
  dashboards:
    enabled: false
  rules:
    enabled: false
  alerts:
    enabled: false
  serviceMonitor:
    enabled: false
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  lokiCanary:
    enabled: false

loki:
  image:
    # -- The Docker registry
    registry: harbor.fractilia.com/library
    # -- Docker image repository
    repository: grafana/loki
    # -- Overrides the image tag whose default is the chart's appVersion
    tag: 2.6.1
    # -- Docker image pull policy
    pullPolicy: IfNotPresent
  # Should authentication be enabled
  auth_enabled: false
  storage:
    type: filesystem
  compactor:
    shared_store: filesystem
    working_directory: /var/loki/boltdb-shipper-compactor
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 1h
    retention_delete_worker_count: 100
  limits_config:
    retention_period: 2d
  storage_config:
    boltdb_shipper:
      active_index_directory: /var/loki/boltdb-shipper-active
      cache_location: /var/loki/boltdb-shipper-cache
      cache_ttl: 24h
      shared_store: filesystem
    filesystem:
      directory: /var/loki/chunks
  # commonConfig:
  #   path_prefix: /var/loki
  #   replication_factor: 1
  # server:
  #   log_level: debug
  # NOTE: We need the chunk_store_config and ingester settings, and I don't see another way of getting them into the config.
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    memberlist:
      join_members:
        - {{ include "loki.memberlist" . }}

    {{- if .Values.loki.commonConfig}}
    common:
      {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.memcached.chunk_cache }}
    {{- if and .enabled .host }}
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: {{ .batch_size }}
          parallelism: {{ .parallelism }}
        memcached_client:
          host: {{ .host }}
          service: {{ .service }}
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig}}
    schema_config:
      {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- else }}
    schema_config:
      configs:
        - from: 2022-01-11
          store: boltdb-shipper
          {{- if eq .Values.loki.storage.type "s3" }}
          object_store: s3
          {{- else if eq .Values.loki.storage.type "gcs" }}
          object_store: gcs
          {{- else }}
          object_store: filesystem
          {{- end }}
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    {{- end }}

    {{- if or .Values.minio.enabled (eq .Values.loki.storage.type "s3") (eq .Values.loki.storage.type "gcs") }}
    ruler:
      storage:
      {{- include "loki.rulerStorageConfig" . | nindent 4}}
    {{- end -}}

    {{- with .Values.loki.memcached.results_cache }}
    query_range:
      align_queries_with_step: true
      {{- if and .enabled .host }}
      cache_results: {{ .enabled }}
      results_cache:
        cache:
          default_validity: {{ .default_validity }}
          memcached_client:
            host: {{ .host }}
            service: {{ .service }}
            timeout: {{ .timeout }}
      {{- end }}
    {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    chunk_store_config:
      max_look_back_period: "0s"

    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 30m
      chunk_retain_period: 1m
      lifecycler:
        ring:
          replication_factor: 1
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal

  # TODO: There might be nothing to do here.
  # memberlist:
  #   abort_if_cluster_join_fails: false
  #   join_members:
  #     - loki-memberlist
  #     - loki-memberlist.logging.svc.cluster.local

singleBinary:
  # -- Number of replicas for the single binary
  replicas: 1
  # -- Resource requests and limits for the single binary
  resources: {}
  # -- Node selector for single binary pods
  nodeSelector: {}
  persistence:
    # -- Size of persistent disk
    size: 500Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: "fame-storage-vsan-policy"
This renders the following Loki ConfigMap:
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        filesystem:
          chunks_directory: /var/loki/chunks
          rules_directory: /var/loki/rules
    compactor:
      compaction_interval: 10m
      retention_delete_delay: 1h
      retention_delete_worker_count: 100
      retention_enabled: true
      shared_store: filesystem
      working_directory: /var/loki/boltdb-shipper-compactor
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 30m
      chunk_retain_period: 1m
      lifecycler:
        ring:
          replication_factor: 1
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      retention_period: 2d
      split_queries_by_interval: 15m
    memberlist:
      join_members:
        - loki-memberlist
    query_range:
      align_queries_with_step: true
    schema_config:
      configs:
        - from: "2022-01-11"
          index:
            period: 24h
            prefix: loki_index_
          object_store: filesystem
          schema: v12
          store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: filesystem
      filesystem:
        directory: /var/loki/chunks
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
kind: ConfigMap
It looks like this is configured to delete logs after 2 days (I initially had 3), but the usage in my persistent volume keeps going up even after a week of running.
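In case it helps with diagnosis, this is roughly how I have been checking what is growing inside the pod. It is just a sketch: the pod name `loki-0` and the `LOKI_DATA` default are assumptions based on the paths configured above.

```shell
#!/bin/sh
# Run inside the single-binary pod, e.g.: kubectl exec -it loki-0 -- sh
# LOKI_DATA is an assumption for illustration; in the pod it is /var/loki.
LOKI_DATA="${LOKI_DATA:-/var/loki}"

# Per-directory usage: is it chunks, the index, the compactor working
# directory, or the WAL that is actually consuming the volume?
du -sh "$LOKI_DATA"/* 2>/dev/null

# Count chunk files past the retention window; if retention were working,
# this should trend toward 0 (-mtime +2 means modified more than 2 days
# ago, matching retention_period: 2d above).
find "$LOKI_DATA/chunks" -type f -mtime +2 2>/dev/null | wc -l
```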
Is there something that I am missing in this configuration to get log retention working correctly?