Slow querying and 504 timeouts on AWS using s3

jandomanskiit · September 23, 2022, 7:18pm

Describe the bug
When querying all logs from a specific namespace, we get a 504 (Gateway Timeout) error despite increasing the timeouts. Our stack is based on the Helm chart loki-simple-scalable-1.8.5, and we are using an S3 bucket to store the data.

I’m also wondering whether we should switch from the S3 bucket to DynamoDB or an EFS volume.

To Reproduce
Steps to reproduce the behavior:

Started Loki (SHA or version): 2.6.1
Started Promtail (SHA or version): 2.6.1
Query: {namespace="some-namespace"} |= `504`

Expected behavior
Faster queries, without 504 errors, over logs older than 3 days.

Environment:

Infrastructure: K8s
Deployment tool: helm, loki-simple-scalable-1.8.5

Screenshots, Promtail config, or terminal output

loki:
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 30
    timeoutSeconds: 1
  image:
    registry: docker.io
    repository: grafana/loki
    tag: null
    pullPolicy: IfNotPresent
  podAnnotations: {}
  revisionHistoryLimit: 10
  podSecurityContext:
    fsGroup: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    runAsUser: 10001
  containerSecurityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
    allowPrivilegeEscalation: false
  existingSecretForConfig: ""
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    server:
      http_listen_port: 3100
      grpc_listen_port: 9095
      grpc_server_max_recv_msg_size: 15662562
      grpc_server_max_send_msg_size: 15662562
      http_server_read_timeout: 300s
      http_server_write_timeout: 300s

    memberlist:
      join_members:
        - {{ include "loki.name" . }}-memberlist

    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 14h
      max_cache_freshness_per_query: 10m
      split_queries_by_interval: 15m
      max_query_length: 0h
      max_query_parallelism: 20

    {{- with .Values.loki.memcached.chunk_cache }}
    {{- if and .enabled .host }}
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: {{ .batch_size }}
          parallelism: {{ .parallelism }}
        memcached_client:
          host: {{ .host }}
          service: {{ .service }}
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig}}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- else }}
    schema_config:
      configs:
        - from: 2022-09-01
          store: boltdb-shipper
          {{- if eq .Values.loki.storage.type "s3" }}
          object_store: s3
          {{- else if eq .Values.loki.storage.type "gcs" }}
          object_store: gcs
          {{- else }}
          object_store: filesystem
          {{- end }}
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
    {{- end }}

    {{- if or .Values.minio.enabled (eq .Values.loki.storage.type "s3") (eq .Values.loki.storage.type "gcs") }}
    ruler:
      storage:
      {{- include "loki.rulerStorageConfig" . | nindent 4}}
    {{- end -}}

    {{- with .Values.loki.memcached.results_cache }}
    query_range:
      match_max_concurrent: false
      parallelism: 24
      align_queries_with_step: true
      {{- if and .enabled .host }}
      cache_results: {{ .enabled }}
      results_cache:
        cache:
          default_validity: {{ .default_validity }}
          memcached_client:
            host: {{ .host }}
            service: {{ .service }}
            timeout: {{ .timeout }}
      {{- end }}
    {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}
  # common configuration
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3

  storage:
    bucketNames:
      admin: bucket-logs
      chunks: bucket-logs
      ruler: bucket-logs
    type: s3
    s3:
      s3: s3://eu-central-1
      region: eu-central-1
      s3ForcePathStyle: true
      insecure: false
    local:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules

  memcached:
    chunk_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: false
      host: ""
      service: "memcached-client"
      timeout: "500ms"
      default_validity: "12h"
  # schemas
  schemaConfig: {}

  structuredConfig: {}

  query_scheduler: {}

  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3

# Configuration for the write
write:
  replicas: 3
  image:
    registry: null
    repository: null
    tag: null
  priorityClassName: null
  podAnnotations: {}
  selectorLabels: {}
  serviceLabels: {}
  extraArgs: []
  extraEnv: []
  extraEnvFrom: 
    - secretRef:
        name: aws-s3-credentials
  extraVolumeMounts: []
  extraVolumes: []
  resources: {}


read:
  replicas: 3
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 3
    targetCPUUtilizationPercentage: 60
    targetMemoryUtilizationPercentage:
  image:
    registry: null
    repository: null
    tag: null
  priorityClassName: null
  podAnnotations: {}
  selectorLabels: {}
  serviceLabels: {}
  extraArgs: []
  extraEnv: []
  extraEnvFrom: 
    - secretRef:
        name: aws-s3-credentials

gateway:
  enabled: true
  replicas: 1
  # -- Enable logging of 2xx and 3xx HTTP requests
  verboseLogging: true
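
For a rough sense of scale, here is a back-of-the-envelope sketch (not part of the original configuration) of how many sub-queries a 3-day query becomes with the limits_config values above. It assumes each sub-query covers exactly one `split_queries_by_interval` window and that at most `max_query_parallelism` of them run at a time; the real scheduler is more involved, and `estimate_fanout` is a hypothetical helper, not a Loki API.

```python
from datetime import timedelta
import math

# Values copied from the limits_config section of the posted values file.
SPLIT_INTERVAL = timedelta(minutes=15)  # split_queries_by_interval
MAX_PARALLELISM = 20                    # max_query_parallelism


def estimate_fanout(query_range: timedelta,
                    split: timedelta = SPLIT_INTERVAL,
                    parallelism: int = MAX_PARALLELISM) -> tuple[int, int]:
    """Return (sub_queries, waves) under the simplifying assumptions above.

    Hypothetical helper for illustration only: it models one sub-query per
    split interval, executed in sequential waves of `parallelism` sub-queries.
    """
    sub_queries = math.ceil(query_range / split)
    waves = math.ceil(sub_queries / parallelism)
    return sub_queries, waves


if __name__ == "__main__":
    subs, waves = estimate_fanout(timedelta(days=3))
    print(f"3-day query: {subs} sub-queries, roughly {waves} waves of {MAX_PARALLELISM}")
```

Under these assumptions a 3-day range splits into 288 sub-queries, i.e. about 15 sequential waves, each of which has to fetch its chunks from S3 before the 300s HTTP timeouts expire. This is only an estimate of the fan-out, not a measurement of where the time actually goes.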

system · September 23, 2023, 7:19pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
