I’m trying to collect almost 3 GB of log files per day with Promtail.
Promtail runs fine for a few hours, then starts throwing errors such as:
Jul 07 14:54:53 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:54:53.188346106Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml3507440037425283515: too many open files"
Jul 07 14:54:58 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:54:58.078325647Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml6187472917671051760: too many open files"
Jul 07 14:55:03 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:03.246668661Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml5397198011226637596: too many open files"
Jul 07 14:55:08 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:08.157937338Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml4642695468272810694: too many open files"
Jul 07 14:55:13 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:13.048319052Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml6508309409450339181: too many open files"
Jul 07 14:55:18 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:18.146311846Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml4642093189922208207: too many open files"
Jul 07 14:55:23 ip-127.0.0.1 promtail-linux-arm64[74099]: level=error ts=2023-07-07T14:55:23.149764251Z caller=positions.go:179 msg="error writing positions file" error="open /local/promtail/.promtail_positions.yaml3585195896468389427: too many open files"
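When the errors appear, it helps to see how many descriptors the promtail process is actually holding and what its ceiling is. A quick check via /proc (the PID lookup is an assumption; `$$` below is only a stand-in so the snippet runs on its own):

```shell
#!/bin/sh
# Count open file descriptors for a process via /proc.
# $$ (this shell) is used as a stand-in PID; substitute the promtail PID,
# e.g. pid=$(pgrep -f promtail-linux-arm64).
pid=$$
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "PID $pid holds $fd_count open fds"
# The per-process ceiling that "too many open files" is hitting:
grep "Max open files" "/proc/$pid/limits"
```

If the fd count climbs steadily toward the limit over hours, that points at descriptors being opened faster than they are released rather than at a limit that is simply too low.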
Notes:
- Not running in containers or pods, but directly on a VM
- The Loki and Promtail configs are tuned for this log volume, and both run on the same host (aarch64)
Earlier GitHub issues suggested changing the following settings:
fs.inotify.max_user_watches = 100000
fs.inotify.max_user_instances = 512
fs.inotify.max_queued_events = 100000
ulimit -n = 1000000
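For reference, this is roughly how I applied the settings above (the file paths and unit name are assumptions; adjust for your distro). One detail worth noting: if promtail runs as a systemd service, a `ulimit -n` set in a shell does not carry over to it; the limit has to be raised in the unit itself via `LimitNOFILE`:

```shell
# Persist the inotify limits (drop-in path assumed)
cat <<'EOF' | sudo tee /etc/sysctl.d/99-promtail.conf
fs.inotify.max_user_watches = 100000
fs.inotify.max_user_instances = 512
fs.inotify.max_queued_events = 100000
EOF
sudo sysctl --system

# Raise the fd limit for the service itself (unit name assumed)
sudo mkdir -p /etc/systemd/system/promtail.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/promtail.service.d/limits.conf
[Service]
LimitNOFILE=1000000
EOF
sudo systemctl daemon-reload
sudo systemctl restart promtail
```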
I applied all of those changes, but the errors continue, and I have not been able to identify the root cause.
Any help would be appreciated.