Hi folks – I’m trying to debug a Loki query timeout for a deployment I’m running. The query makes it to Loki just fine, but times out after 60s. Here’s a trace for the query that’s failing:
Here are the details:
- we’re running Loki 2.6 on Kubernetes using the simple-scalable Helm chart with 5 read replicas (rough sketch of the relevant config after this list)
- we’re using GCS for chunk storage and boltdb-shipper for the index
- the query is over a 6-hour time range
- the log volume is substantial but not huge (roughly 10 GB over that range; see the metric query sketch after this list)
- this is the query:
{app_logs_service=~".+", environment_id="5343"} | json | userspaceLogLevel>=30 | connectionSyncId = "96"
- the same query works over a much shorter time range
- I don’t see any errors in the logs about the underlying store being rate limited or anything like that (no 429s from GCS that I can spot)
- it seems like restarting the Loki pods sometimes fixes queries, but other times it doesn’t
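
For reference, here’s a minimal sketch of the timeout-related settings I believe are in play on the read path. We’re mostly on chart defaults, so the values below are my assumptions for 2.6 rather than confirmed settings (the querier.query_timeout default of 1m lining up with the 60s cutoff is what caught my eye):

```yaml
# Loki config (not Helm values); the defaults shown are assumed, please correct me if they're off for 2.6
server:
  http_server_read_timeout: 30s    # assumed default
  http_server_write_timeout: 30s   # assumed default
querier:
  query_timeout: 1m                # assumed default; matches the 60s cutoff we're seeing
  engine:
    timeout: 5m                    # assumed default
limits_config:
  split_queries_by_interval: 30m   # assumed default; how the 6h range gets split across the read replicas
```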
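
For what it’s worth, the ~10 GB figure is a ballpark. A metric query along these lines (same selector, no parser stages) is roughly how I’d sanity-check the volume for the window; treat it as a sketch rather than a number I’ve pinned down:

```logql
# total log bytes matching the selector over the 6h window
sum(bytes_over_time({app_logs_service=~".+", environment_id="5343"}[6h]))
```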
Can anyone see anything in that trace that would explain this timeout pattern? Or can I provide any more info that might help illuminate the issue?