I have a Grafana dashboard that displays logs for several pods in a Kubernetes environment. We ingest logs from both the console and log files, and apply a `source` label to differentiate between them. The dashboard lets the user select the Kubernetes namespace, Kubernetes pod and source via dropdowns (Grafana variables), so they can choose which logs to see.
One of these pods only outputs a single log line roughly every 15 minutes. For this pod, I noticed that the `source` dropdown does not get populated, i.e. **the Grafana `label_values` call fails for Loki data**. The variable then defaults to `None`, which, once applied to the LogQL query, results in no logs being found. The dashboard ends up suggesting there are no logs, while in fact there are.
For reference, the `source` variable is a Grafana query variable configured as: `label_values({namespace="$namespace", pod="$pod"}, source)`
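To make the failure mode concrete: the panel query is presumably of the shape `{namespace="$namespace", pod="$pod", source="$source"}` (an assumption on my part, the exact expression is not shown here). With `source` falling back to `None`, the effective query is equivalent to something like the following, which matches no streams:
```
# Effective query once the variables resolve and `source` defaults to "None"
# (namespace/pod values taken from the curl calls below):
curl -G -s 'http://localhost:42135/loki/api/v1/query_range' \
  --data-urlencode 'start=1672763401037029205' \
  --data-urlencode 'end=1672767401037029205' \
  --data-urlencode 'query={namespace="devic1-shared",pod="yarn-resourcemanager-0-0",source="None"}'
```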
I tracked this problem down to the Loki series API, as this is the one [used by `label_values`](https://grafana.com/docs/grafana/latest/datasources/prometheus/template-variables/).
When using the `/loki/api/v1/series` API, Loki returns no data:
```
curl -g 'http://localhost:42135/loki/api/v1/series?end=1672767401037029205&start=1672763401037029205' --data-urlencode 'match[]={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}'
{"status":"success","data":[]}
```
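This matters because the `source` dropdown is effectively populated by collecting the `source` label from each series returned by this call; with an empty `data` array there is nothing to fill it with. A rough equivalent of what Grafana derives from the response (the `jq` step is my illustration, not literally what Grafana runs):
```
# Same series request as above, then collect the distinct `source` values the
# dropdown would be filled with; with "data":[] this yields an empty list.
curl -g -s 'http://localhost:42135/loki/api/v1/series?end=1672767401037029205&start=1672763401037029205' \
  --data-urlencode 'match[]={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}' \
  | jq '[.data[].source] | unique'
```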
However, when using the `/loki/api/v1/query_range` endpoint over the same time range, data is returned:
```
curl -G -s 'http://localhost:42135/loki/api/v1/query_range?' --data-urlencode 'end=1672767401037029205' --data-urlencode 'start=1672763401037029205' --data-urlencode 'query={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}' | jq
{
"status": "success",
"data": {
"resultType": "streams",
"result": [
{
"stream": {
"component": "resourcemanager",
"filename": "/var/lib/kubelet/pods/d59a01f9-8733-4560-b7cc-78cd0514c375/volumes/kubernetes.io~empty-dir/logs/gc.log",
"job": "devic1-shared/yarn",
"namespace": "devic1-shared",
"node_name": "10.178.0.95",
"pod": "yarn-resourcemanager-0-0",
"source": "gc.log",
"app": "yarn"
},
"values": [
[
"1672766971030608973",
"[2023-01-03T17:29:30.927+0000][94592.430s][1] GC(152) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 1.925ms"
],
[
"1672766250984737472",
"[2023-01-03T17:17:30.950+0000][93872.452s][1] GC(151) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 26.382ms"
],
[
"1672765471022969603",
"[2023-01-03T17:04:30.927+0000][93092.430s][1] GC(150) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 1.888ms"
],
[
"1672764691019264949",
"[2023-01-03T16:51:30.954+0000][92312.457s][1] GC(149) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 28.135ms"
],
[
"1672763970962902691",
"[2023-01-03T16:39:30.944+0000][91592.446s][1] GC(148) Pause Young (Normal) (G1 Evacuation Pause) 125M->39M(145M) 19.950ms"
]
]
}
],
"stats": {
"summary": {
"bytesProcessedPerSecond": 231286,
"linesProcessedPerSecond": 1644,
"totalBytesProcessed": 703,
"totalLinesProcessed": 5,
"execTime": 0.00303952,
"queueTime": 6.1705e-05,
"subqueries": 1,
"totalEntriesReturned": 5
},
"querier": {
"store": {
"totalChunksRef": 0,
"totalChunksDownloaded": 0,
"chunksDownloadTime": 0,
"chunk": {
"headChunkBytes": 0,
"headChunkLines": 0,
"decompressedBytes": 0,
"decompressedLines": 0,
"compressedBytes": 0,
"totalDuplicates": 0
}
}
},
"ingester": {
"totalReached": 1,
"totalChunksMatched": 0,
"totalBatches": 1,
"totalLinesSent": 5,
"store": {
"totalChunksRef": 5,
"totalChunksDownloaded": 5,
"chunksDownloadTime": 521213,
"chunk": {
"headChunkBytes": 0,
"headChunkLines": 0,
"decompressedBytes": 703,
"decompressedLines": 5,
"compressedBytes": 778,
"totalDuplicates": 0
}
}
}
}
}
}
```
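As a sanity check that the two calls really cover the same window: both use the same nanosecond start/end timestamps (a 4000 s window, roughly 66 minutes), and all five log timestamps returned by `query_range` fall inside it, so the series call should have had matching data:
```
# Quick check: same window for both calls, and every returned log timestamp
# lies inside it (prints 1 for each).
start=1672763401037029205
end=1672767401037029205
echo "window: $(( (end - start) / 1000000000 )) s"
for ts in 1672766971030608973 1672766250984737472 1672765471022969603 \
          1672764691019264949 1672763970962902691; do
  echo "$ts in window: $(( ts > start && ts < end ))"
done
```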
The strange thing is that the behavior of the first query depends on when exactly you execute it (a rough way to observe this is sketched after the list):
- Less than 5 minutes after a log entry was ingested: the query returns the series.
- More than 5 minutes but less than some point X after ingestion: the query returns empty (as above).
- After X (at some point in the future; I am unsure what triggers this): the query returns the series again.
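A minimal polling sketch for this, assuming GNU `date` for nanosecond timestamps and the same endpoint and labels as above:
```
# Poll the series endpoint once a minute with a window ending "now" and print
# how many series are returned (0 while the call is in its "empty" phase).
# The 1h window is an arbitrary choice; requires GNU date for %N.
while true; do
  end=$(date +%s%N)
  start=$((end - 3600 * 1000000000))
  count=$(curl -g -s "http://localhost:42135/loki/api/v1/series?end=${end}&start=${start}" \
    --data-urlencode 'match[]={namespace="devic1-shared",pod="yarn-resourcemanager-0-0"}' \
    | jq '.data | length')
  echo "$(date -Is) series returned: ${count}"
  sleep 60
done
```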
I am using Loki v2.6.1.