Hi, I am currently trying to run Loki in simple scalable deployment mode using StatefulSets. I have one StatefulSet each for the read and write instances, both with a persistent volume. I also configured Loki to use the filesystem, with rook-cephfs as the persistent volume, to store everything except the WAL directory. That is working well, but I get error messages for the TSDB index:
remove /data/tsdb-index/multitenant/index_19558/1689841408-loki-write-5.tsdb: no such file or directory
The index file was already deleted by a different node/pod; since the data is stored on a shared volume, it can no longer be deleted from within this pod. I read the documentation and found the option to set an index_gateway_client. Unfortunately, the documentation is rather vague about that component, and it is not clear to me how to configure it in simple scalable deployment mode. How should it be set up there?
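From what I could piece together, the client side would be set under the shipper block, roughly as sketched below. The server address is a placeholder I made up from my headless service name and gRPC port, and I don't know which target would actually answer on it in simple scalable mode, so treat this as a guess rather than a working setup:

```yaml
storage_config:
  tsdb_shipper:
    active_index_directory: /data/tsdb-index
    cache_location: /data/tsdb-cache
    shared_store: filesystem
    index_gateway_client:
      # Hypothetical address: assumes the read pods serve index gateway
      # requests on the gRPC port (9095) behind the headless service.
      server_address: dns:///loki-read-headless:9095
```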
// Edit:
Or, to ask it another way: is it possible to run Loki in simple scalable deployment mode and use only the filesystem as storage? The Helm chart, for example, states that this is not possible, but I am already running Loki in this mode. However, I have problems with random restarts and the error messages mentioned above in my log.
My Loki config is shown below. It is used by two StatefulSets, one configured as the read and one as the write Loki deployment. In front of the pods is a load balancer that distributes the traffic round-robin.
auth_enabled: false
chunk_store_config:
  max_look_back_period: 120h
common:
  path_prefix: /data
  storage:
    filesystem:
      chunks_directory: /data/chunks
      rules_directory: /data/rules
  compactor_address: http://loki-loadbalancer:3100
compactor:
  working_directory: /data/compactor
  shared_store: filesystem
frontend:
  log_queries_longer_than: 5s
  compress_responses: true
  max_outstanding_per_tenant: 2048
ingester:
  lifecycler:
    join_after: 10s
    observe_period: 5s
    ring:
      replication_factor: 3
      kvstore:
        store: memberlist
    final_sleep: 0s
  chunk_idle_period: 1m
  wal:
    enabled: true
    dir: /wal
    checkpoint_duration: 15m
  max_chunk_age: 1m
  chunk_retain_period: 30s
  chunk_encoding: snappy
  chunk_target_size: 1.572864e+06
  chunk_block_size: 262144
  flush_op_timeout: 10s
limits_config:
  max_cache_freshness_per_query: '10m'
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 30m
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  # parallelize queries in 15min intervals
  split_queries_by_interval: 15m
query_range:
  # make queries more cache-able by aligning them with their step intervals
  align_queries_with_step: true
  max_retries: 5
  parallelise_shardable_queries: true
  cache_results: true
ruler:
  enable_api: true
  wal:
    dir: /wal/ruler-wal
  storage:
    type: local
    local:
      directory: /data/rules
  rule_path: /tmp/prom-rules
  remote_write:
    enabled: true
    clients:
      local:
        url: http://prometheus:9090/api/v1/write
        queue_config:
          # send immediately as soon as a sample is generated
          capacity: 1
          batch_send_deadline: 0s
schema_config:
  configs:
    - from: "2023-07-01"
      index:
        period: 24h
        prefix: index_
      object_store: filesystem
      schema: v12
      store: tsdb
server:
  http_listen_address: 0.0.0.0
  grpc_listen_address: 0.0.0.0
  http_listen_port: 3100
  grpc_listen_port: 9095
  log_level: info
storage_config:
  tsdb_shipper:
    active_index_directory: /data/tsdb-index
    cache_location: /data/tsdb-cache
    shared_store: filesystem
  filesystem:
    directory: /data/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 120h
memberlist:
  join_members: ["loki-read-headless", "loki-write-headless"]
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946
  gossip_interval: 2s
querier:
  query_ingesters_within: 2h
query_scheduler:
  max_outstanding_requests_per_tenant: 1024
// Edit 2:
I have now configured Loki as follows:
- Each write pod deployed through the StatefulSet has a dedicated volume for the index, so every write pod writes and stores its own index files.
- The read pods have a shared volume that stores the index cache, so all read pods access the same cache files.
So far I don't see any index deletion errors. Can I be sure that every read request accesses the whole index?
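To make the volume layout above concrete, here is a trimmed sketch of the relevant StatefulSet parts. It is not my full manifest: the volume names, the claim name `loki-read-cache`, the storage size, and the storage class name are placeholders, and the mount paths simply follow the directories from my config above.

```yaml
# Write pods: volumeClaimTemplates give each pod its own PVC for the index.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki-write
spec:
  serviceName: loki-write-headless
  selector:
    matchLabels:
      app: loki-write
  template:
    metadata:
      labels:
        app: loki-write
    spec:
      containers:
        - name: loki
          image: grafana/loki            # pin the tag you actually run
          args: ["-config.file=/etc/loki/config.yaml", "-target=write"]
          volumeMounts:
            - name: tsdb-index           # placeholder volume name
              mountPath: /data/tsdb-index
  volumeClaimTemplates:
    - metadata:
        name: tsdb-index
      spec:
        accessModes: ["ReadWriteOnce"]   # one dedicated volume per pod
        storageClassName: rook-cephfs    # assumed storage class name
        resources:
          requests:
            storage: 10Gi                # placeholder size
---
# Read pods: every replica mounts the same pre-created ReadWriteMany claim.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki-read
spec:
  serviceName: loki-read-headless
  selector:
    matchLabels:
      app: loki-read
  template:
    metadata:
      labels:
        app: loki-read
    spec:
      containers:
        - name: loki
          image: grafana/loki
          args: ["-config.file=/etc/loki/config.yaml", "-target=read"]
          volumeMounts:
            - name: tsdb-cache
              mountPath: /data/tsdb-cache
      volumes:
        - name: tsdb-cache
          persistentVolumeClaim:
            claimName: loki-read-cache   # hypothetical shared RWX claim
```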