I have Loki running in distributed mode on Kubernetes (deployed with Helm), using MinIO as my chunk storage, and it seems to be mostly working. However, I don't think the compactor is working properly, because I keep seeing this error in its logs:
level=error ts=2022-02-09T14:20:56.444771061Z caller=delete_requests_table.go:89 msg="failed to upload delete requests file" err="mkdir index: read-only file system"
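For reference, this is roughly how I'm pulling that line out of the logs (the namespace and Deployment name are from the spec further down):

# Tail the compactor logs and filter for errors
kubectl logs -n obs deploy/obs-loki-compactor -f | grep 'level=error'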
This is my Loki config:
auth_enabled: false
server:
  http_listen_port: 3100
distributor:
  ring:
    kvstore:
      store: memberlist
memberlist:
  join_members:
    - obs-loki-memberlist
ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  chunk_idle_period: 30m
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_retain_period: 1m
  max_transfer_retries: 0
  wal:
    dir: /var/loki/wal
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  retention_period: 36h
schema_config:
  configs:
    - from: "2022-02-01"
      index:
        period: 24h
        prefix: index_
      object_store: aws
      schema: v11
      store: boltdb-shipper
storage_config:
  aws:
    # Note: use a fully qualified domain name, like localhost.
    # full example: http://loki:supersecret@localhost.:9000
    s3: http://loki:supersecret@minio.obs:80
    bucketnames: loki
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 12h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: s3
    index_gateway_client:
      server_address: dns:///obs-loki-index-gateway:9095
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s
query_range:
  align_queries_with_step: true
  max_retries: 5
  split_queries_by_interval: 15m
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h
frontend_worker:
  frontend_address: obs-loki-query-frontend:9095
frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  tail_proxy_url: http://obs-loki-querier:3100
compactor:
  working_directory: /var/loki/compactor/retention
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  alertmanager_url: https://alertmanager.xx
  external_url: https://alertmanager.xx
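To rule out a stale ConfigMap, I check that the compactor pod actually sees this config, roughly like this (the file path comes from the -config.file argument in the Deployment below):

# Print the config exactly as the compactor container sees it
kubectl exec -n obs deploy/obs-loki-compactor -- cat /etc/loki/config/config.yaml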
This is my compactor's Deployment spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    meta.helm.sh/release-name: obs
    meta.helm.sh/release-namespace: obs
  creationTimestamp: "2022-02-04T19:05:49Z"
  generation: 3
  labels:
    app.kubernetes.io/component: compactor
    app.kubernetes.io/instance: obs
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/version: 2.4.2
    helm.sh/chart: loki-0.42.0
  name: obs-loki-compactor
  namespace: obs
  resourceVersion: "3072053"
  uid: c4cc57c0-6c05-415e-8b6f-2c501bd87c89
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: compactor
      app.kubernetes.io/instance: obs
      app.kubernetes.io/name: loki
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        checksum/config: 684214a28423c534d7a4088fec6916acd29a3bcb9623d07b3860cf1c3dd0acd2
        prometheus.io/path: /metrics
        prometheus.io/port: "3100"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: compactor
        app.kubernetes.io/instance: obs
        app.kubernetes.io/name: loki
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/config/config.yaml
        - -target=compactor
        - -boltdb.shipper.compactor.working-directory=/var/loki/compactor
        image: docker.io/grafana/loki:2.4.2
        imagePullPolicy: IfNotPresent
        name: compactor
        ports:
        - containerPort: 3100
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: temp
        - mountPath: /etc/loki/config
          name: config
        - mountPath: /var/loki
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: obs-loki-compactor
      serviceAccountName: obs-loki-compactor
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: temp
      - configMap:
          defaultMode: 420
          name: obs-loki
        name: config
      - name: data
        persistentVolumeClaim:
          claimName: data-obs-loki-compactor
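For what it's worth, this is roughly how I'd verify that the PVC backing /var/loki is bound and writable from inside the container (the claim name is from the spec above; the write test assumes the image ships busybox/touch):

# Check that the compactor's PVC is bound
kubectl get pvc -n obs data-obs-loki-compactor
# Test write on the mounted volume (assumes busybox/touch is in the image)
kubectl exec -n obs deploy/obs-loki-compactor -- touch /var/loki/write-test
kubectl exec -n obs deploy/obs-loki-compactor -- rm /var/loki/write-test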
As you can see there, I do have the compactor enabled, with a persistent volume, even though I'm not sure the volume is actually needed. From my Helm values.yaml:
compactor:
  enabled: true
  persistence:
    # -- Enable creating PVCs for the compactor
    enabled: true
    # -- Size of persistent disk
    size: 4Gi
    # -- Storage class to be used.
    # If defined, storageClassName: <storageClass>.
    # If set to "-", storageClassName: "", which disables dynamic provisioning.
    # If empty or set to null, no storageClassName spec is
    # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
    storageClass: null
  serviceAccount:
    create: true
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "3100"
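Since the pod exposes Prometheus metrics (per the annotations above), I can also poke at the compactor's own metrics endpoint to see whether compaction ever runs, along these lines:

# Forward the compactor's HTTP port locally
kubectl port-forward -n obs deploy/obs-loki-compactor 3100:3100 &
# Grep any compaction/retention-related series
curl -s http://localhost:3100/metrics | grep -i compact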
Is this error benign, or is my compactor actually not working? I suspect the latter because of the error above, and because my MinIO usage keeps growing slowly, with none of the discernible drops I'd expect to see once compaction and retention kick in.
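For context, this is roughly how I'm watching bucket usage (myminio is just the alias I registered with mc alias set; adjust for your own setup):

# Total size of the loki bucket
mc du myminio/loki
# Break usage down by prefix to separate chunks from index uploads
mc du --depth 2 myminio/loki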