Loki Compactor error "failed to upload delete requests file" err="mkdir index: read-only file system"

I have Loki running in distributed mode on Kubernetes (installed via Helm), with MinIO as my chunk storage, and it mostly seems to work. However, I don't think the Compactor is working properly, because this error keeps appearing in its logs:

level=error ts=2022-02-09T14:20:56.444771061Z caller=delete_requests_table.go:89 msg="failed to upload delete requests file" err="mkdir index: read-only file system"

This is my Loki config:

    auth_enabled: false

    server:
      http_listen_port: 3100

    distributor:
      ring:
        kvstore:
          store: memberlist

    memberlist:
      join_members:
        - obs-loki-memberlist

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 30m
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      max_cache_freshness_per_query: 10m
      retention_period: 36h
    schema_config:
      configs:
      - from: "2022-02-01"
        index:
          period: 24h
          prefix: index_
        object_store: aws
        schema: v11
        store: boltdb-shipper
    storage_config:
      aws:
        # Note: use a fully qualified domain name, like localhost.
        # full example: http://loki:supersecret@localhost.:9000
        s3: http://loki:supersecret@minio.obs:80
        bucketnames: loki
        s3forcepathstyle: true
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 12h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: s3
        index_gateway_client:
          server_address: dns:///obs-loki-index-gateway:9095

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

    query_range:
      align_queries_with_step: true
      max_retries: 5
      split_queries_by_interval: 15m
      cache_results: true
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h

    frontend_worker:
      frontend_address: obs-loki-query-frontend:9095

    frontend:
      log_queries_longer_than: 5s
      compress_responses: true
      tail_proxy_url: http://obs-loki-querier:3100

    compactor:
      working_directory: /var/loki/compactor/retention
      shared_store: filesystem
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150

    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx

This is my Compactor deployment spec:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "3"
        meta.helm.sh/release-name: obs
        meta.helm.sh/release-namespace: obs
      creationTimestamp: "2022-02-04T19:05:49Z"
      generation: 3
      labels:
        app.kubernetes.io/component: compactor
        app.kubernetes.io/instance: obs
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: loki
        app.kubernetes.io/version: 2.4.2
        helm.sh/chart: loki-0.42.0
      name: obs-loki-compactor
      namespace: obs
      resourceVersion: "3072053"
      uid: c4cc57c0-6c05-415e-8b6f-2c501bd87c89
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app.kubernetes.io/component: compactor
          app.kubernetes.io/instance: obs
          app.kubernetes.io/name: loki
      strategy:
        type: Recreate
      template:
        metadata:
          annotations:
            checksum/config: 684214a28423c534d7a4088fec6916acd29a3bcb9623d07b3860cf1c3dd0acd2
            prometheus.io/path: /metrics
            prometheus.io/port: "3100"
            prometheus.io/scrape: "true"
          creationTimestamp: null
          labels:
            app.kubernetes.io/component: compactor
            app.kubernetes.io/instance: obs
            app.kubernetes.io/name: loki
        spec:
          containers:
          - args:
            - -config.file=/etc/loki/config/config.yaml
            - -target=compactor
            - -boltdb.shipper.compactor.working-directory=/var/loki/compactor
            image: docker.io/grafana/loki:2.4.2
            imagePullPolicy: IfNotPresent
            name: compactor
            ports:
            - containerPort: 3100
              name: http
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /ready
                port: http
                scheme: HTTP
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            resources: {}
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              readOnlyRootFilesystem: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /tmp
              name: temp
            - mountPath: /etc/loki/config
              name: config
            - mountPath: /var/loki
              name: data
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext:
            fsGroup: 10001
            runAsGroup: 10001
            runAsNonRoot: true
            runAsUser: 10001
          serviceAccount: obs-loki-compactor
          serviceAccountName: obs-loki-compactor
          terminationGracePeriodSeconds: 30
          volumes:
          - emptyDir: {}
            name: temp
          - configMap:
              defaultMode: 420
              name: obs-loki
            name: config
          - name: data
            persistentVolumeClaim:
              claimName: data-obs-loki-compactor
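Given `readOnlyRootFilesystem: true` in the container securityContext above, only the three mounted volumes (`/tmp`, `/etc/loki/config`, `/var/loki`) should accept writes. To confirm which paths are actually writable, I can run a small probe inside the pod (a sketch — the deployment and namespace names are from my setup, reached via something like `kubectl -n obs exec deploy/obs-loki-compactor -- sh`):

```shell
#!/bin/sh
# Probe a list of directories: try to create and remove a marker file
# in each, and report whether the write succeeded. Inside the compactor
# container this shows which mounts escape readOnlyRootFilesystem.
check_writable() {
  for d in "$@"; do
    if touch "$d/.writetest" 2>/dev/null; then
      rm -f "$d/.writetest"
      echo "$d: writable"
    else
      echo "$d: not writable"
    fi
  done
}

# Inside the pod I'd probe the mount points plus the working directory:
#   check_writable / /tmp /var/loki /var/loki/compactor
check_writable /tmp
```

If `/var/loki` shows up as writable (which it should, given the PVC mount), then the `mkdir index: read-only file system` error suggests the compactor is trying to create the `index` directory relative to some other, read-only path rather than under its working directory.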

As you can see, the compactor is enabled and has a persistent volume, though I'm not sure the volume is actually needed. From my Helm values.yaml:

    compactor:
      enabled: true
      persistence:
        # -- Enable creating PVCs for the compactor
        enabled: true
        # -- Size of persistent disk
        size: 4Gi
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: null
      serviceAccount:
        create: true
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "3100"

Is this error benign, or is my compactor actually broken? I suspect the latter, both because of the error above and because my MinIO usage keeps growing steadily, with none of the drops I'd expect to see once retention/compaction runs.
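One thing I noticed while comparing sections of my config: `storage_config.boltdb_shipper.shared_store` is `s3`, but the compactor block uses `shared_store: filesystem`. I'm not sure whether the compactor's shared store is supposed to match the index shipper's; if it is, I'd expect the block to look more like this (sketch, untested):

```yaml
compactor:
  working_directory: /var/loki/compactor/retention
  shared_store: s3   # match storage_config.boltdb_shipper.shared_store?
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
```

With `filesystem` there, the compactor would presumably try to treat a local directory as the object store, which could explain both the failed `mkdir` and the lack of any cleanup in MinIO.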
