Hello,
We’ve deployed a beefy test server (100+ cores, 124 GB RAM, 1.8 TB of SSD storage) running Loki with MinIO and Grafana, in order to evaluate Loki and its performance.
For now, we have only installed Loki in monolithic mode, from the RPM, alongside Grafana.
My issue is that data visualization in Grafana is quite slow. Looking at a rate of all logs over 7 days can take 30 s to load in Grafana's Explore view.
So I tested Loki's performance directly with logcli and got these results:
[bla]# time /usr/local/bin/logcli query --stats --since=168h 'rate({filename="/var/log/squid/access.log"}[1m])' > file.txt
http://<ip>:3100/loki/api/v1/query_range?direction=BACKWARD&end=1668017851935752943&limit=30&query=rate%28%7Bfilename%3D%22%2Fvar%2Flog%2Fsquid%2Faccess.log%22%7D%5B1m%5D%29&start=1667413051935752943
Ingester.TotalReached 80
Ingester.TotalChunksMatched 4
Ingester.TotalBatches 13067
Ingester.TotalLinesSent 6686238
Ingester.TotalChunksRef 162
Ingester.TotalChunksDownloaded 162
Ingester.ChunksDownloadTime 591.392749ms
Ingester.HeadChunkBytes 430 kB
Ingester.HeadChunkLines 2092
Ingester.DecompressedBytes 1.5 GB
Ingester.DecompressedLines 6697519
Ingester.CompressedBytes 201 MB
Ingester.TotalDuplicates 0
Querier.TotalChunksRef 4340
Querier.TotalChunksDownloaded 4340
Querier.ChunksDownloadTime 11.741086355s
Querier.HeadChunkBytes 0 B
Querier.HeadChunkLines 0
Querier.DecompressedBytes 57 GB
Querier.DecompressedLines 252310768
Querier.CompressedBytes 4.7 GB
Querier.TotalDuplicates 0
Summary.BytesProcessedPerSecond 4.1 GB
Summary.LinesProcessedPerSecond 18190820
Summary.TotalBytesProcessed 58 GB
Summary.TotalLinesProcessed 259010379
Summary.ExecTime 14.238520823s
Summary.QueueTime 5m10.55325166s
So 4.1 GB processed per second does not look that bad, and I’m not sure what to look at next to improve this.
Grafana takes about twice as long for the same request (~30 s).
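As a sanity check on how I read these stats, here is a small Python sketch that recomputes the summary rates from the totals. It assumes the rates are simply the totals divided by Summary.ExecTime and that "GB" means decimal gigabytes; both are my assumptions, not something I have confirmed in the Loki documentation.

```python
# Figures copied from the logcli --stats output above.
total_bytes = 58e9            # Summary.TotalBytesProcessed (58 GB, decimal units assumed)
total_lines = 259_010_379     # Summary.TotalLinesProcessed
exec_seconds = 14.238520823   # Summary.ExecTime

# Assumption: the reported rates are totals divided by ExecTime only.
bytes_per_second = total_bytes / exec_seconds
lines_per_second = total_lines / exec_seconds

print(f"{bytes_per_second / 1e9:.1f} GB/s")
print(f"{lines_per_second:,.0f} lines/s")
```

Both values line up with Summary.BytesProcessedPerSecond and Summary.LinesProcessedPerSecond, so the reported throughput appears to cover execution time only and does not account for the 5m10s shown in Summary.QueueTime.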
My ideas:
- Is Loki simply not meant for this kind of workload?
- Is Grafana not using Loki to its full potential, limiting the query somehow?
- Are the chunks not optimized? The label count is very low (too low?): we only have hostname and filename as labels, and we log less than 10 GB a day.
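To put rough numbers on the chunk question, the sketch below derives the average chunk size, the compression ratio, and the daily volume from the Querier stats above. It assumes decimal units and that the 7-day query window is fully covered by the stored chunks; the variable names are only illustrative.

```python
# Figures copied from the Querier.* lines of the logcli --stats output.
querier_chunks = 4340          # Querier.TotalChunksDownloaded
querier_compressed = 4.7e9     # Querier.CompressedBytes (decimal GB assumed)
querier_decompressed = 57e9    # Querier.DecompressedBytes (decimal GB assumed)
days = 7                       # --since=168h

# Average size of one chunk, before and after decompression.
avg_compressed_mb = querier_compressed / querier_chunks / 1e6
avg_decompressed_mb = querier_decompressed / querier_chunks / 1e6

# How well the stored chunks compress, and how much raw log data one day holds.
compression_ratio = querier_decompressed / querier_compressed
gb_per_day = querier_decompressed / days / 1e9

print(f"avg chunk: {avg_compressed_mb:.2f} MB compressed, "
      f"{avg_decompressed_mb:.2f} MB decompressed")
print(f"compression ratio: {compression_ratio:.1f}x")
print(f"decompressed volume: {gb_per_day:.1f} GB/day")
```

The average compressed chunk size is the figure to compare against the configured `chunk_target_size` when judging whether chunks are being flushed too small, and the daily volume is consistent with the "less than 10 GB a day" estimate for this one file.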
Thank you for your help, and for any links or ideas I could explore to dig further into Loki/Grafana performance and optimization!