*Resolved* - Promtail "runs away" until system lockup within seconds on a near-stock Debian Stable (bookworm) install

Hello everyone, first post. I am trying to build an IDS panel leveraging an on-prem (no, I won't pay for the cloud, ever) Grafana + Loki + Promtail + Snort3 stack.

I have all of these working EXCEPT Promtail, because it just loops and chokes itself to death.

The VM is hosted on Proxmox VE with 4 vCPUs and 24GB of RAM.

Version:

promtail, version 2.8.2 (branch: HEAD, revision: 9f809eda7)
  build user:       root@b7e9ca0bf6e0
  build date:       2023-05-03T11:13:57Z
  go version:       go1.20.4
  platform:         linux/amd64

The Issue:

  • Regardless of whether the VM has 8GB or 64GB of RAM, as soon as the promtail service starts, memory usage leaks/runs away until complete lockup (under 20 seconds)

What I have tried:

  • Different vCPU types (now using host CPU passthrough, an EPYC 7551P, which supports AVX2, to be safe)
  • Different RAM
  • Various bandwidth limits in the .yaml file (no change in the end result)

Current Configs:

  • Promtail .service file
[Unit]
Description=Promtail Service
After=network.target

[Service]
Type=simple
User=promtail
ExecStart=/opt/loki/promtail-linux-amd64 -config.file=/opt/loki/promtail-local-config.yaml

[Install]
WantedBy=multi-user.target
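
For anyone following along, these are the standard systemd commands I use to load and start the unit (assuming the file above is saved as /etc/systemd/system/promtail.service):

# Pick up the new/changed unit file
sudo systemctl daemon-reload

# Enable at boot and start immediately
sudo systemctl enable --now promtail.service

# Follow the service output live while reproducing the runaway
sudo journalctl -u promtail.service -f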
  • Current YAML file for Promtail
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

limits_config:
  readline_rate_enabled: true
  readline_rate: 10
  readline_burst: 20



scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/*log
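
One sanity check worth running while Promtail is up (however briefly): the HTTP port from the server block above (9080) exposes Promtail's stock readiness and metrics endpoints, which show which files it is tailing and how much it has read from each. The endpoint and metric names below are standard Promtail, but double-check them against your version:

# Is Promtail up and ready?
curl -s http://localhost:9080/ready

# How many files is it tailing, and how many bytes has it read from each path?
curl -s http://localhost:9080/metrics | grep -E 'promtail_files_active_total|promtail_read_bytes_total'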

Just for reference:

  • Current Loki YAML file
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
  • All the other services
user@host:/opt/opensearch$ systemctl status snort3
● snort3.service - Snort Daemon
     Loaded: loaded (/etc/systemd/system/snort3.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:43:59 EDT; 43min ago
   Main PID: 636 (snort)
      Tasks: 2 (limit: 28769)
     Memory: 295.5M
        CPU: 33.325s
     CGroup: /system.slice/snort3.service
             └─636 /usr/local/bin/snort -c /usr/local/etc/snort/snort.lua -s 65535 -k none -l /var/log/snort -D -i ens18 -m 0x1b -u snort ->

user@host:/opt/opensearch$ systemctl status grafana-server.service 
● grafana-server.service - Grafana instance
     Loaded: loaded (/lib/systemd/system/grafana-server.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:44:02 EDT; 43min ago
       Docs: http://docs.grafana.org
   Main PID: 890 (grafana)
      Tasks: 20 (limit: 28769)
     Memory: 172.6M
        CPU: 5.336s
     CGroup: /system.slice/grafana-server.service
             └─890 /usr/share/grafana/bin/grafana server --config=/etc/grafana/grafana.ini --pidfile=/run/grafana/grafana-server.pid --pack>

user@host:/opt/opensearch$ systemctl status loki.service
● loki.service - Loki logging daemon
     Loaded: loaded (/etc/systemd/system/loki.service; enabled; preset: enabled)
     Active: active (running) since Thu 2023-07-06 23:43:59 EDT; 43min ago
   Main PID: 632 (loki-linux-amd6)
      Tasks: 9 (limit: 28769)
     Memory: 93.5M
        CPU: 5.291s
     CGroup: /system.slice/loki.service
             └─632 /opt/loki/loki-linux-amd64 -config.file=/opt/loki/loki-local-config.yaml

Thank you for your time,
Scott

RESOLVED!!

There was a corrupt system log file in /var/log

I ran ls -lshat /var/log and noticed that /var/log/lastlog was 531 GIGABYTES.

For a system with less than 100GB of storage, that seemed impossible. (It turns out lastlog is a sparse file indexed by UID, so its apparent size can vastly exceed the space actually used on disk, but Promtail still tries to read all 531GB of it.)
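
If you want to check for the same thing on your box, comparing a file's apparent size against its actual allocated blocks makes a sparse file obvious (GNU coreutils du has an --apparent-size flag):

# Apparent size, i.e. what a reader like Promtail tries to chew through
du -h --apparent-size /var/log/lastlog

# Actual blocks allocated on disk
du -h /var/log/lastlog

# Or list the whole directory like I did; with -s, ls prints the
# allocated size next to the apparent size for each file
ls -lshat /var/log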

I cleared the log file by running (as root) >/var/log/lastlog, and that emptied it out. Be aware that this file tracks who has signed into the machine, so there may be operational consequences for you if you remove or truncate it.

I restarted the Promtail service, and this time CPU and RAM usage didn't run away and lock the VM up.
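
To keep Promtail away from lastlog (and the other binary utmp-style files under /var/log) for good, the scrape config can exclude it. Promtail has supported a __path_exclude__ label since 2.3, so it should work on 2.8.2, but verify against the docs for your exact version; a sketch:

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/*log
      # lastlog is a binary, potentially huge sparse file, not a line-oriented log
      __path_exclude__: /var/log/lastlog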
