I run Loki with S3 (MinIO) storage for logs and am trying to run it in HA mode in Docker Swarm with 2 nodes. The problem is that the two instances are not able to form a ring. This is my Loki configuration (the only difference between the nodes is the node name):

    auth_enabled: false
    server:
      http_listen_port: 3100
    common:
      instance_interface_names:
        - "lo"
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 2
      ring:
        instance_interface_names:
          - "lo"
        kvstore:
          store: memberlist
    memberlist:
      abort_if_cluster_join_fails: false
      randomize_node_name: false
      node_name: loki1
      bind_port: 7946
      join_members:
        - loki1:7946
        - loki2:7946
      max_join_backoff: 1m
      max_join_retries: 10
      min_join_backoff: 1s
    compactor:
      working_directory: /loki/compactor
      shared_store: s3
      compaction_interval: 5m
    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/index
        cache_location: /loki/index_cache
        shared_store: s3
        cache_ttl: 24h
      aws:
        s3: http://minio:9000
        bucketnames: loki
        endpoint: minio:9000
        insecure: true
        access_key_id: minio
        secret_access_key: miniominio
        s3forcepathstyle: true
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: aws
          schema: v11
          index:
            prefix: index_
            period: 24h

The two instances can reach each other, and both have port 7946 open:

    # From loki1
    netstat -ltpn | grep 7946 && ping -c 1 loki2
    tcp        0      0 :::7946      :::*      LISTEN      1/loki
    PING loki2 (10.0.6.162): 56 data bytes
    64 bytes from 10.0.6.162: seq=0 ttl=42 time=0.094 ms

    --- loki2 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max = 0.094/0.094/0.094 ms

    # From loki2
    netstat -ltpn | grep 7946 && ping -c 1 loki1
    tcp        0      0 :::7946      :::*      LISTEN      1/loki
    PING loki1 (10.0.6.160): 56 data bytes
    64 bytes from 10.0.6.160: seq=0 ttl=42 time=0.150 ms

    --- loki1 ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max = 0.150/0.150/0.150 ms

When I check the distributor ring API:
On Loki1:
On Loki2:
- Loki1 does not see Loki2.
- Loki2 does see Loki1 but evaluates it as Unhealthy. The ownership split differs from time to time; last time it was 49 to 51 percent.
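
For reference, here is a minimal sketch of how the ring pages can be fetched from a container on the same overlay network. It assumes the HTTP port 3100 from the configuration above and that the distributor ring page is served at `/distributor/ring`; the helper names `ring_url` and `ring_status` are only illustrative.

```shell
# Hypothetical helpers (names are illustrative, not part of Loki).

# Build the ring status URL for a host.
# Assumption: port defaults to 3100 (http_listen_port in the config above)
# and the distributor ring page lives at /distributor/ring.
ring_url() {
  printf 'http://%s:%s/distributor/ring\n' "$1" "${2:-3100}"
}

# Fetch the ring status page from one Loki instance.
ring_status() {
  curl -s "$(ring_url "$1")"
}

# Usage:
#   ring_status loki1
#   ring_status loki2
```

Comparing the two outputs side by side shows which members each instance believes are in the ring.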
From the logs on Loki1:

    2022-06-25T22:24:41.292941593Z ts=2022-06-25T22:24:41.292132302Z caller=memberlist_logger.go:74 level=warn msg="Failed to resolve loki2:7946: lookup loki2 on 127.0.0.11:53: no such host"
    2022-06-25T22:24:44.318437095Z ts=2022-06-25T22:24:44.317763386Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
    2022-06-25T22:24:46.319644804Z ts=2022-06-25T22:24:46.317614471Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42442"
    2022-06-25T22:24:46.319867679Z ts=2022-06-25T22:24:46.317690929Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
    2022-06-25T22:24:49.316437708Z ts=2022-06-25T22:24:49.315831583Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"
    2022-06-25T22:24:54.318574960Z ts=2022-06-25T22:24:54.31814571Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
    2022-06-25T22:24:56.318793253Z ts=2022-06-25T22:24:56.318574211Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42480"
    2022-06-25T22:24:56.318846295Z ts=2022-06-25T22:24:56.318640586Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
    2022-06-25T22:25:04.318133298Z ts=2022-06-25T22:25:04.316741132Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"
    2022-06-25T22:25:04.318222173Z ts=2022-06-25T22:25:04.317927423Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki2' from=[::]:7946"
    2022-06-25T22:25:06.318917258Z ts=2022-06-25T22:25:06.318676008Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node loki2 from=127.0.0.1:42496"
    2022-06-25T22:25:06.318963133Z ts=2022-06-25T22:25:06.318794924Z caller=memberlist_logger.go:74 level=error msg="Failed fallback ping: EOF"
    2022-06-25T22:25:09.322544342Z ts=2022-06-25T22:25:09.322220051Z caller=memberlist_logger.go:74 level=info msg="Marking loki2 as failed, suspect timeout reached (0 peer confirmations)"
    2022-06-25T22:25:19.316513625Z ts=2022-06-25T22:25:19.316218583Z caller=memberlist_logger.go:74 level=info msg="Suspect loki2 has failed, no acks received"

Following the documentation, I have configured the `common.ring` section and the `memberlist` section, but judging by the errors the instances still cannot form a cluster.
Can somebody help me find out why loki1 and loki2 cannot form a cluster?
Is it OK for both of them to use the same S3 bucket? From the logs, the problem looks unrelated to shared storage.
Thank you for any tips.