Loki error on port 9095 - error contacting scheduler

lucarepe · June 16, 2022, 11:37am

I’m developing a Docker Swarm application and I have a problem with Loki.
Sometimes I get this error in the logs:

level=error caller=scheduler_processor.go:87 msg="error contacting scheduler" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.0.0.47:9095 i/o timeout\"" addr=10.0.0.47:9095

I say "sometimes" because after some redeploys, seemingly at random, it runs like a charm.
The strange thing is that port 9095 isn’t used by anything inside my swarm, so I really don’t know why it says it cannot dial that port.

This is how I have created the service inside my docker-compose.yml:

...
mon_loki:
    image: grafana/loki:2.5.0
    hostname: mon_loki
    restart: always
    ports:
      - 3100:3100
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - /data/loki/config.yaml:/etc/loki/config.yaml
      - /data/loki:/data/loki
    deploy:
      placement:
        constraints:
          - node.role == worker
    depends_on: 
      - mon_node-exporter
      - mon_cadvisor
    networks:
      - docker
...

This is Loki’s config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
  - from: 2020-05-15
    store: boltdb
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index

  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

rengenesio · June 19, 2022, 1:47am

I was experiencing the same issue on Docker Swarm. In my setup the Loki container is attached to more than one network (the ingress network, where it exposes port 3100, and a private network shared with promtail). The issue seems to be related to setups in which the container has more than one network.

When inspecting the Loki configuration (with the -print-config-stderr argument), I noticed that the instance_interface_names setting listed all my network interfaces, in random order. Using telnet, I found that the scheduler (which listens on port 9095) was bound to only one of the addresses, and it was not the address the processor was trying to connect to.
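
The telnet check described above can also be scripted. This is only a minimal sketch in Python, assuming it is run from inside the Loki container or from another container on the same networks. The function names (can_connect, check_addresses) are made up for illustration, and the addresses you pass in are your own interface IPs, not values taken from Loki.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_addresses(addresses: list[str], port: int = 9095) -> dict[str, bool]:
    """Probe the same port on every candidate address (hypothetical helper)."""
    return {addr: can_connect(addr, port) for addr in addresses}
```

For example, check_addresses(["127.0.0.1", "10.0.0.47"]) probes port 9095 on the loopback address and on the address from the error message. If only one of them answers, the scheduler is reachable on that address only, which would match the behaviour described above.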

As a workaround, I set the following properties to force the internal traffic between Loki components to use only the loopback interface:

common:
  instance_interface_names:
    - "lo"
  ring:
    instance_interface_names:
      - "lo"

I don’t know if this is the best approach for production environments, and I’m still testing it, but it seems to have solved the issue in my setup.

lucarepe · June 19, 2022, 9:06am

Thank you @rengenesio, I have also changed the network configuration to solve the problem.

