Hi, I have a k8s cluster with loki-stack deployed. I have a Loki server that worked well until its volume filled up (100%), at which point it started eating RAM. After enlarging the volume it showed permission problems, which I fixed with chown/chmod.
Now I can’t start it; it seems it doesn’t pass the liveness probe (though honestly I can no longer see the warning I saw this morning). I just get:
# kubectl describe -n monitoring pod/loki-0
Name: loki-0
Namespace: monitoring
...
Containers:
loki:
Container ID: containerd://b310f8f6edf97de394424ba21c905340e972013a1b3324b67854ce633c6a2efe
Image: grafana/loki:2.5.0
Image ID: docker.io/grafana/loki@sha256:f9ef133793af0b8dc9091fb9694edebb2392a17558639b8a17767afddcca7a0f
Ports: 3100/TCP, 9095/TCP, 7946/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
-config.file=/etc/loki/loki.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 06 Oct 2022 13:47:45 +0000
Finished: Thu, 06 Oct 2022 13:48:55 +0000
Ready: False
Restart Count: 73
Liveness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http-metrics/ready delay=45s timeout=1s period=10s #success=1 #failure=3
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m32s (x816 over 4h12m) kubelet Back-off restarting failed container
and the logs report:
# kubectl logs -f -n monitoring loki-0
level=info ts=2022-10-06T13:53:59.136420745Z caller=main.go:106 msg="Starting Loki" version="(version=2.5.0, branch=HEAD, revision=2d9d0ee23)"
level=info ts=2022-10-06T13:53:59.136966393Z caller=server.go:260 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2022-10-06T13:53:59.137071848Z caller=modules.go:597 msg="RulerStorage is not configured in single binary mode and will not be started."
level=info ts=2022-10-06T13:53:59.137615082Z caller=memberlist_client.go:394 msg="Using memberlist cluster node name" name=loki-0-f8157810
level=info ts=2022-10-06T13:53:59.14276134Z caller=memberlist_client.go:513 msg="joined memberlist cluster" reached_nodes=1
level=warn ts=2022-10-06T13:53:59.144747503Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache"
level=info ts=2022-10-06T13:53:59.181752254Z caller=table_manager.go:239 msg="loading table index_19256"
[...]
level=info ts=2022-10-06T13:55:05.103199819Z caller=table.go:443 msg="cleaning up unwanted dbs from table index_19267"
level=info ts=2022-10-06T13:55:05.10380821Z caller=table.go:358 msg="uploading table index_19256"
level=info ts=2022-10-06T13:55:05.411105415Z caller=table.go:385 msg="finished uploading table index_19256"
level=info ts=2022-10-06T13:55:05.411164686Z caller=table.go:443 msg="cleaning up unwanted dbs from table index_19256"
level=info ts=2022-10-06T13:55:05.411244085Z caller=module_service.go:96 msg="module stopped" module=store
level=info ts=2022-10-06T13:55:05.412562643Z caller=modules.go:877 msg="server stopped"
level=info ts=2022-10-06T13:55:05.412613538Z caller=module_service.go:96 msg="module stopped" module=server
level=info ts=2022-10-06T13:55:05.412644661Z caller=loki.go:373 msg="Loki stopped"
level=error ts=2022-10-06T13:55:05.412703241Z caller=log.go:100 msg="error running loki" err="failed services\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:419\nmain.main\n\t/src/loki/cmd/loki/main.go:108\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"
At this point the container restarts…
When I had permission problems, the logs stated that clearly. What else could it be? How can I debug this further?
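In case it helps, here is what I was planning to try next (a sketch, assuming the default loki-stack namespace/ports from the output above; the StatefulSet name is a guess):

```shell
# Logs of the crashed container, not the restarted one --
# the real error may only appear in the previous instance
kubectl logs -n monitoring loki-0 --previous

# Bump Loki's verbosity by adding -log.level=debug to the container args
# (assuming the StatefulSet is named "loki"):
kubectl edit statefulset -n monitoring loki
#   args:
#     - -config.file=/etc/loki/loki.yaml
#     - -log.level=debug

# Hit the same endpoint the liveness/readiness probes use
kubectl port-forward -n monitoring loki-0 3100:3100 &
curl -s http://localhost:3100/ready
```

But `--previous` shows the same output as above, so I'm stuck.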
TIA