Creating Alerts

Here is my log pipeline:

{filename="/var/log/laravel/laravel.log"}|~"(?i)error"

When I try to create an alert rule, I get the following error:

Failed to evaluate queries and expressions: failed to execute conditions: failed to query data: Run out of attempts while querying the server.

But when I run the same query in Grafana Explore, I get nice output that shows everything I need. Why can’t the same be accomplished when creating rules?

Hey @kilio

Alerting rules can only be defined with metric queries. You could change your query to:

sum by (some_label) (rate({filename="/var/log/laravel/laravel.log"}|~"(?i)error"[1m]))

Loki uses Prometheus’ alerting rules code, so you can view more detail on alerting rules here:


Hi @dannykopping,

This was very helpful. However, when creating the alert rule from the panel using the metric query above, I get the following health errors:

Invalid format of evaluation results for the alert definition A: looks like time-series data, only reduced data can be alerted on.

Failed to query data: run out of attempts while querying data.

For the first error, I changed the Query Type and defined a time range, but the second one doesn’t go away, even though I tried every combination of the alerting options for when all values are null or no data is received.

Which interface are you using? I’m not too familiar with the Grafana-based mechanisms for creating alerts unfortunately.

Hi @dannykopping,

Yes, I am focusing on Grafana alerts because Loki/Cortex rules can’t be defined. I am facing the same problem as reported here: if the Cortex/Loki managed alert rule type is selected, there are no data sources to select, even though Loki is added as a data source.

I think you should ask over at the Grafana #support forum channel; we in the Loki team do not have much involvement on the frontend side of things.

But technically, rules (written in YAML format) placed in the Loki container, with the ruler configured to send alerts to Alertmanager, should work without even using Grafana, right?

This is the ruler config:

ruler:
  alertmanager_url: http://localhost:9093
  enable_alertmanager_v2: true
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      store: inmemory
  rule_path: /loki/rules
  storage:
    type: local
    local:
      directory: /tmp/rules

This is rules.yaml:

groups:
    - name: should_fire
      rules:
        - alert: Error
          expr: sum by (job) (rate({filename="/var/log/laravel/laravel.log"}|~"(?i)error"[1m]))
          labels:
            category: app-logs
            severity: Critical
            environment: development
          annotations:
            action: action
            description: Application is logging errors
            impact: impact
            summary: Laravel application has error
            title: 'Alert: Application error'

I have this setup, but when an error is logged to the Laravel logs, no alert is fired, and I can’t understand why.

In the ruler config, I replaced http://localhost:9093 with http://alertmanager:9093, and the alert was fired to the right notification channel.
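
For anyone hitting the same thing, here is a sketch of the ruler block with that change applied. It assumes Alertmanager is reachable from the Loki container under the hostname alertmanager (for example, a service name on a shared Docker network; that setup is an assumption, it isn't shown in this thread). Inside a container, localhost refers to the Loki container itself, which is presumably why the original URL never reached Alertmanager.

```yaml
ruler:
  # Assumption: the hostname "alertmanager" resolves from inside the Loki container
  alertmanager_url: http://alertmanager:9093
  enable_alertmanager_v2: true
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      store: inmemory
  rule_path: /loki/rules
  storage:
    type: local
    local:
      directory: /tmp/rules
```

Everything except alertmanager_url is unchanged from the config posted above.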

Hi all, I have had this error for a couple of days now and I still have no clue what is causing it. I have a dashboard where the log file is displayed at the top, with a time series below it that counts occurrences of the word “Scheduler” in that log file. I need to build an alert on this time series that fires when the word “Scheduler” appears fewer than 6 times in one hour. Below is a picture:

In this picture you can see that the data import for the job is working. When I click on the time series panel and try to create an alert, I end up on the alert page. Below is a picture:

When I press the Run queries button, it gives the same error as above:

Failed to evaluate queries and expressions: failed to execute conditions: failed to query data: Run out of attempts while querying the server

How can I resolve this weird error? Things I have done:

  • Changed “http://localhost:9093” to “http://alertmanager:9093” in the ruler config file.
  • Changed the job definition in promtail.yml on the server.
  • Changed the evaluation settings to different time frames.

Hopefully I can work this out. Help is very much appreciated!

Kind regards,

Elias
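
As a possible alternative to the Grafana-managed alert, the condition described above (fewer than 6 “Scheduler” lines in one hour) could be sketched as a Loki ruler rule. This does not address the Grafana error itself. The selector {job="scheduler"}, the group name, and the alert names are hypothetical placeholders, and the severity labels are arbitrary. One thing to watch for: when no line matches at all, count_over_time returns no series, so the comparison has nothing to fire on; the second rule uses absent_over_time to cover that case.

```yaml
groups:
  - name: scheduler_activity            # hypothetical group name
    rules:
      - alert: SchedulerTooQuiet        # hypothetical alert name
        # Hypothetical selector: replace job="scheduler" with your stream's labels
        expr: sum(count_over_time({job="scheduler"} |= "Scheduler" [1h])) < 6
        labels:
          severity: warning
        annotations:
          summary: 'Fewer than 6 "Scheduler" lines in the last hour'
      - alert: SchedulerSilent          # hypothetical alert name
        # Fires when there are no matching lines at all in the last hour
        expr: absent_over_time({job="scheduler"} |= "Scheduler" [1h])
        labels:
          severity: critical
        annotations:
          summary: 'No "Scheduler" lines in the last hour'
```

As with the rules.yaml earlier in the thread, this only takes effect if the ruler can reach Alertmanager at its configured URL.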
