Loki Compactor error "failed to upload delete requests file" err="mkdir index: read-only file system"

I have Loki running in distributed mode on Kubernetes (installed via Helm), with MinIO as my chunk storage, and it mostly seems to work. However, I don't think the Compactor is working properly, because this error keeps appearing in its logs:

level=error ts=2022-02-09T14:20:56.444771061Z caller=delete_requests_table.go:89 msg="failed to upload delete requests file" err="mkdir index: read-only file system"

This is my Loki config:

    auth_enabled: false

    server:
      http_listen_port: 3100

    distributor:
      ring:
        kvstore:
          store: memberlist

    memberlist:
      join_members:
        - obs-loki-memberlist

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 30m
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      max_cache_freshness_per_query: 10m
      retention_period: 36h
    schema_config:
      configs:
      - from: "2022-02-01"
        index:
          period: 24h
          prefix: index_
        object_store: aws
        schema: v11
        store: boltdb-shipper
    storage_config:
      aws:
        # Note: use a fully qualified domain name, like localhost.
        # full example: http://loki:supersecret@localhost.:9000
        s3: http://loki:supersecret@minio.obs:80
        bucketnames: loki
        s3forcepathstyle: true
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 12h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: s3
        index_gateway_client:
          server_address: dns:///obs-loki-index-gateway:9095

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

    query_range:
      align_queries_with_step: true
      max_retries: 5
      split_queries_by_interval: 15m
      cache_results: true
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h

    frontend_worker:
      frontend_address: obs-loki-query-frontend:9095

    frontend:
      log_queries_longer_than: 5s
      compress_responses: true
      tail_proxy_url: http://obs-loki-querier:3100

    compactor:
      working_directory: /var/loki/compactor/retention
      shared_store: filesystem
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150

    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx

This is my Compactor deployment spec:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "3"
        meta.helm.sh/release-name: obs
        meta.helm.sh/release-namespace: obs
      creationTimestamp: "2022-02-04T19:05:49Z"
      generation: 3
      labels:
        app.kubernetes.io/component: compactor
        app.kubernetes.io/instance: obs
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: loki
        app.kubernetes.io/version: 2.4.2
        helm.sh/chart: loki-0.42.0
      name: obs-loki-compactor
      namespace: obs
      resourceVersion: "3072053"
      uid: c4cc57c0-6c05-415e-8b6f-2c501bd87c89
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app.kubernetes.io/component: compactor
          app.kubernetes.io/instance: obs
          app.kubernetes.io/name: loki
      strategy:
        type: Recreate
      template:
        metadata:
          annotations:
            checksum/config: 684214a28423c534d7a4088fec6916acd29a3bcb9623d07b3860cf1c3dd0acd2
            prometheus.io/path: /metrics
            prometheus.io/port: "3100"
            prometheus.io/scrape: "true"
          creationTimestamp: null
          labels:
            app.kubernetes.io/component: compactor
            app.kubernetes.io/instance: obs
            app.kubernetes.io/name: loki
        spec:
          containers:
          - args:
            - -config.file=/etc/loki/config/config.yaml
            - -target=compactor
            - -boltdb.shipper.compactor.working-directory=/var/loki/compactor
            image: docker.io/grafana/loki:2.4.2
            imagePullPolicy: IfNotPresent
            name: compactor
            ports:
            - containerPort: 3100
              name: http
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /ready
                port: http
                scheme: HTTP
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            resources: {}
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop:
                - ALL
              readOnlyRootFilesystem: true
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /tmp
              name: temp
            - mountPath: /etc/loki/config
              name: config
            - mountPath: /var/loki
              name: data
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext:
            fsGroup: 10001
            runAsGroup: 10001
            runAsNonRoot: true
            runAsUser: 10001
          serviceAccount: obs-loki-compactor
          serviceAccountName: obs-loki-compactor
          terminationGracePeriodSeconds: 30
          volumes:
          - emptyDir: {}
            name: temp
          - configMap:
              defaultMode: 420
              name: obs-loki
            name: config
          - name: data
            persistentVolumeClaim:
              claimName: data-obs-loki-compactor
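Given `readOnlyRootFilesystem: true` in the container securityContext above, only the three mounted volumes (`/tmp`, `/etc/loki/config`, `/var/loki`) should accept writes. To confirm which paths are actually writable, I can run a small probe inside the pod (a sketch — the deployment and namespace names are from my setup, reached via something like `kubectl -n obs exec deploy/obs-loki-compactor -- sh`):

```shell
#!/bin/sh
# Probe a list of directories: try to create and remove a marker file
# in each, and report whether the write succeeded. Inside the compactor
# container this shows which mounts escape readOnlyRootFilesystem.
check_writable() {
  for d in "$@"; do
    if touch "$d/.writetest" 2>/dev/null; then
      rm -f "$d/.writetest"
      echo "$d: writable"
    else
      echo "$d: not writable"
    fi
  done
}

# Inside the pod I'd probe the mount points plus the working directory:
#   check_writable / /tmp /var/loki /var/loki/compactor
check_writable /tmp
```

If `/var/loki` shows up as writable (which it should, given the PVC mount), then the `mkdir index: read-only file system` error suggests the compactor is trying to create the `index` directory relative to some other, read-only path rather than under its working directory.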

As you can see, the compactor is enabled and has a persistent volume, though I'm not sure the volume is actually needed. From my Helm values.yaml:

    compactor:
      enabled: true
      persistence:
        # -- Enable creating PVCs for the compactor
        enabled: true
        # -- Size of persistent disk
        size: 4Gi
        # -- Storage class to be used.
        # If defined, storageClassName: <storageClass>.
        # If set to "-", storageClassName: "", which disables dynamic provisioning.
        # If empty or set to null, no storageClassName spec is
        # set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS, and OpenStack).
        storageClass: null
      serviceAccount:
        create: true
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "3100"

Is this error benign, or is my compactor actually broken? I suspect the latter, both because of the error above and because my MinIO usage keeps growing steadily, with none of the drops I'd expect to see once retention/compaction runs.
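One thing I noticed while comparing sections of my config: `storage_config.boltdb_shipper.shared_store` is `s3`, but the compactor block uses `shared_store: filesystem`. I'm not sure whether the compactor's shared store is supposed to match the index shipper's; if it is, I'd expect the block to look more like this (sketch, untested):

```yaml
compactor:
  working_directory: /var/loki/compactor/retention
  shared_store: s3   # match storage_config.boltdb_shipper.shared_store?
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
```

With `filesystem` there, the compactor would presumably try to treat a local directory as the object store, which could explain both the failed `mkdir` and the lack of any cleanup in MinIO.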
