Hey, I configured Loki and Promtail to grab the logs from my Docker containers. I had some tags on my containers and was able to create labels from them no problem. My containers are web applications, so I wanted to add labels for things like IP address, status code, etc., but ever since I updated my config I keep getting these errors in the Promtail logs, and when I try to view the Loki dashboards in Grafana I get gateway timeouts. I'm guessing my configuration is bad, but I can't figure out what's wrong.
The ERROR:
level=warn ts=2022-03-04T09:42:57.65872831Z caller=client.go:349 component=client host=10.128.2.123:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Maximum active stream limit exceeded, reduce the number of active streams (reduce labels or reduce label values), or contact your Loki administrator to see if the limit can be increased"
My Promtail Config:
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://10.128.2.123:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    static_configs:
      - targets:
          - localhost
        labels:
          job: containerlogs
          host: ${HOSTNAME}
          __path__: /var/lib/docker/containers/*/*log
    pipeline_stages:
      - json:
          expressions:
            output: log
            stream: stream
            attrs:
      - regex:
          expression: (?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).+(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(.+) ).+\".(?P<status>([0-9]{3}))
          source: output
      - json:
          expressions:
            tag:
          source: attrs
      - regex:
          expression: (?P<image_name>(?:[^|]*[^|])).(?P<container_name>(?:[^|]*[^|])).(?P<image_id>(?:[^|]*[^|])).(?P<container_id>(?:[^|]*[^|]))
          source: tag
      - timestamp:
          format: RFC3339Nano
          source: time
      - labels:
          tag:
          stream:
          image_name:
          container_name:
          image_id:
          container_id:
          ip:
          request:
          endpoint:
          status:
      - output:
          source: output
The lines I added that started causing the issues (my best guess at a lower-cardinality alternative is sketched below):
...
      - regex:
          expression: (?P<ip>((?:[0-9]{1,3}\.){3}[0-9]{1,3})).+(?P<request>(GET|POST|HEAD|PUT|DELETE|CONNECT|OPTIONS|TRACE|PATCH)).(?P<endpoint>(.+) ).+\".(?P<status>([0-9]{3}))
          source: output
...
          ip:
          request:
          endpoint:
          status:
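Based on the "reduce labels or reduce label values" part of the error, my best guess at a fix is to stop promoting the per-request fields (ip, request, endpoint, status) to labels and keep only the per-container ones, so the labels stage would look something like this (just a sketch, I haven't tested it yet):

      - labels:
          tag:
          stream:
          image_name:
          container_name:
          image_id:
          container_id:

Is that the right direction, or is there a better way to keep ip/status/endpoint queryable in Grafana without turning them into labels?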
Any help would be greatly appreciated!
EDIT
The error on the Loki side when I got the gateway timeout was:
level=error ts=2022-03-04T14:08:59.793376377Z caller=scheduler_processor.go:199 org_id=fake msg="error notifying frontend about finished query" err="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (7954289 vs. 4194304)" frontend=172.28.0.25:9095
I also found the logcli tool and ran the following:
# logcli --addr="http://localhost:3100" series {} --analyze-labels
Total Streams: 147
Unique Labels: 13
Label Name      Unique Values  Found In Streams
filename        45             147
endpoint        45             126
container_id    36             134
tag             36             134
image_id        30             134
container_name  21             134
image_name      19             134
ip              10             126
status          5              126
host            4              147
request         2              126
stream          2              147
job             1              147
Looking at the config, the default value for grpc_server_max_concurrent_streams is 100 (and logcli reported 147 streams), and the 4194304 in the gateway-timeout error lines up with the default 4 MB gRPC message size, so I changed the following configuration options:
grpc_server_max_concurrent_streams: 500
grpc_server_max_recv_msg_size: 10000000
grpc_server_max_send_msg_size: 10000000
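For reference, all three of those go under the server block of the Loki config, so the relevant part of mine now looks roughly like this (everything else omitted):

server:
  grpc_server_max_concurrent_streams: 500
  grpc_server_max_recv_msg_size: 10000000
  grpc_server_max_send_msg_size: 10000000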
That seems to have stopped the errors for now. The Grafana dashboard is slow to load, but it doesn't throw the error anymore. I'm still not 100% sure whether this is the real fix or just a band-aid for my bad Promtail config.