What Grafana version and what operating system are you using?
Grafana v8.3.4, installed from RPM on Linux.
What are you trying to achieve?
To have a single multi-dimensional alert rule over multiple queries. These queries fetch the same metrics from several different environments.
The goal is to avoid duplicating the same alert rule for each of the dev, test, QA and other environments.
The real use case I am struggling with: “collect all Kafka Connect connectors that are not running across the dev, test and QA environments in a single alert rule, and send an alert for each degraded connector”.
The Kafka metrics for each environment are scraped by a different Prometheus instance, and each instance is a separate data source in Grafana.
How are you trying to achieve it?
I configure a separate time-series query for each environment. The metrics differ only in label values (i.e. the job and env labels have different values); the set of label names is the same in all environments.
A metric looks something like this:
kafka_connect_connect_connector_metrics{connector="connector1", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="running"} 1
kafka_connect_connect_connector_metrics{connector="connector2", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="stopped"} 1
My queries look something like this:
kafka_connect_connect_connector_metrics{job="dev1-kafka-connect", status=~"^(stopped|failed)$", task="", connector_class="", connector_version="", connector_type=""}
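To make the selector above concrete, here is a minimal Python sketch that applies the same matchers to the two sample series. It assumes PromQL's documented semantics that `label=""` also matches series without that label and that `=~` regexes are fully anchored; the parser and the helper names (`parse_line`, `matches`) are hypothetical and deliberately simplified (no escaped quotes, no timestamps).

```python
import re

# The two sample series from the post (Prometheus text exposition format).
SAMPLES = [
    'kafka_connect_connect_connector_metrics{connector="connector1", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="running"} 1',
    'kafka_connect_connect_connector_metrics{connector="connector2", env="dev1", instance="broker1.example.com", job="dev1-kafka-connect", prometheus="dev1-monitoring/dev1-monitoring", prometheus_replica="prometheus-dev1-monitoring-0", status="stopped"} 1',
]

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Parse one exposition line into (name, labels, value). Simplified: no escaped quotes."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))


def matches(labels, equal, regex):
    """Mimic PromQL matchers: label="" also matches an absent label; =~ is fully anchored."""
    for key, want in equal.items():
        if labels.get(key, "") != want:
            return False
    for key, pattern in regex.items():
        if re.fullmatch(pattern, labels.get(key, "")) is None:
            return False
    return True


# Matchers copied from the query above.
EQUAL = {"job": "dev1-kafka-connect", "task": "", "connector_class": "",
         "connector_version": "", "connector_type": ""}
REGEX = {"status": r"^(stopped|failed)$"}

degraded = [labels["connector"] for _, labels, _ in map(parse_line, SAMPLES)
            if matches(labels, EQUAL, REGEX)]
print(degraded)
```

Only the series with status="stopped" survives, so each query returns one series per degraded connector, each with its own label set.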
Say I have three such queries, each using a different data source (the dev, test and QA Prometheus instances), returning time series from three different environments.
Then I apply a Reduce operation to each query's result so that only one value per series is left for the alert rule, using the Min function. This means I have three Reduce expressions.
What I do not understand is how to combine these several queries/expressions into a single “alert condition”.
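The Reduce step described above can be pictured with a small Python model: every series collapses to its minimum value but keeps its own label set, so a multi-dimensional query stays multi-dimensional after Reduce. This is a sketch of the idea, not Grafana's implementation; the label values, the extra `connector3` series and the helper names are hypothetical, and empty series are simply skipped here rather than handled the way Grafana does.

```python
from typing import Dict, List, Tuple

# A hashable, sorted label set, used as the identity of a series.
Labels = Tuple[Tuple[str, str], ...]


def labelset(**kv: str) -> Labels:
    return tuple(sorted(kv.items()))


def reduce_min(series: Dict[Labels, List[float]]) -> Dict[Labels, float]:
    """Collapse every series to its minimum value. Each result keeps the labels of
    its series, so N input series give N labelled numbers. Empty series are skipped
    in this sketch."""
    return {labels: min(values) for labels, values in series.items() if values}


# Hypothetical result of query A (dev1): two degraded connectors over a time range.
query_a = {
    labelset(connector="connector2", env="dev1", job="dev1-kafka-connect"): [1.0, 1.0, 1.0],
    labelset(connector="connector3", env="dev1", job="dev1-kafka-connect"): [1.0, 1.0],
}

reduced_a = reduce_min(query_a)
for labels, value in reduced_a.items():
    print(dict(labels), value)
```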
What happened?
When I try to combine several expressions using another Math expression (e.g. $A + $B + $C > 0), that expression returns “No Data”. A similar issue has already been described here.
UPDATE: It looks like Math returns “No Data” when any of the queries/expressions used within it returns more than one time series (i.e. is multi-dimensional). This is complete nonsense, because Math is the only, and the officially recommended, way to work with multi-dimensional alerts.
Furthermore, even for single-dimensional queries, applying a Math expression to them means losing the ability to use the $labels variable, because all labels disappear.
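One possible explanation for both symptoms is label matching inside Math. The following Python model assumes the rule we understand Grafana's server-side expression documentation to describe: when both operands hold several items, only items whose label sets are equal, or where one is a subset of the other, are combined. Series from different environments differ in env and job, so nothing matches and the result is empty, i.e. “No Data”. For the one-item-per-side case, the sketch drops the labels purely to reproduce the behaviour reported above; that part is an assumption, not Grafana's code. The function names are hypothetical, and other mixes (one item against many) are outside its scope.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]
Numbers = Dict[Labels, float]  # output of a Reduce expression: one number per label set


def labelset(**kv: str) -> Labels:
    return tuple(sorted(kv.items()))


def compatible(a: Labels, b: Labels) -> bool:
    """Assumed matching rule: equal label sets, or one is a subset of the other."""
    sa, sb = set(a), set(b)
    return sa <= sb or sb <= sa


def add(left: Numbers, right: Numbers) -> Numbers:
    """Simplified model of a binary Math operation such as $A + $B."""
    if len(left) == 1 and len(right) == 1:
        # One item on each side: combined regardless of labels. Dropping the labels
        # here mirrors the behaviour reported in the post (assumption).
        return {(): next(iter(left.values())) + next(iter(right.values()))}
    out: Numbers = {}
    for lk, lv in left.items():
        for rk, rv in right.items():
            if compatible(lk, rk):
                # Keep the larger (more specific) label set of the matched pair.
                key = lk if len(lk) >= len(rk) else rk
                out[key] = lv + rv
    return out


# Hypothetical Reduce outputs: two degraded connectors per environment.
A = {labelset(connector="c1", env="dev1", job="dev1-kafka-connect"): 1.0,
     labelset(connector="c2", env="dev1", job="dev1-kafka-connect"): 1.0}
B = {labelset(connector="c1", env="test1", job="test1-kafka-connect"): 1.0,
     labelset(connector="c3", env="test1", job="test1-kafka-connect"): 1.0}
print(add(A, B))  # empty: no label set matches across environments

single_a = {labelset(connector="c1", env="dev1"): 1.0}
single_b = {labelset(connector="c1", env="test1"): 1.0}
print(add(single_a, single_b))  # a value, but no labels left for $labels
```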
Without combining them, I cannot select more than one condition in the “Define alert conditions” section.
With classic conditions, it is also not possible to use $labels, because alerts based on classic conditions don’t support them…
What did you expect to happen?
If a rule can contain several queries, it should be possible to use all of them together in a single alert rule.
It should be possible either to “combine several queries in another query” (using a “special” data source, e.g. “This Alert”), so that the alert treats the time series returned by all queries as the result of a single query,
or to choose several alert conditions in the “Define alert conditions” section.
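The expected behaviour can be sketched in Python: the per-environment results are merged into one multi-dimensional set, the condition is evaluated per series, and every firing series keeps its labels, giving one alert (with a usable $labels) per degraded connector. This only models the requested behaviour; it is not an existing Grafana feature or API, and the functions `merge` and `firing`, the connector names and the values are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]
Numbers = Dict[Labels, float]  # one reduced value per series


def labelset(**kv: str) -> Labels:
    return tuple(sorted(kv.items()))


def merge(*results: Numbers) -> Numbers:
    """Treat the outputs of several queries as one multi-dimensional result.
    Series from different environments differ at least in env/job, so their
    label sets should not collide; a collision is reported as an error."""
    merged: Numbers = {}
    for result in results:
        for labels, value in result.items():
            if labels in merged:
                raise ValueError(f"duplicate series: {dict(labels)}")
            merged[labels] = value
    return merged


def firing(numbers: Numbers, threshold: float = 0.0) -> List[Dict[str, str]]:
    """Evaluate `value > threshold` per series; each firing series keeps its labels."""
    return [dict(labels) for labels, value in numbers.items() if value > threshold]


# Hypothetical per-environment Reduce outputs.
dev_result = {labelset(connector="connector2", env="dev1", job="dev1-kafka-connect"): 1.0}
test_result = {labelset(connector="connector7", env="test1", job="test1-kafka-connect"): 1.0,
               labelset(connector="connector9", env="test1", job="test1-kafka-connect"): 1.0}
qa_result: Numbers = {}  # no degraded connectors in QA

alerts = firing(merge(dev_result, test_result, qa_result))
for alert in alerts:
    print(f"{alert['env']}: connector {alert['connector']} is not running")
```

Merging by label set rather than by arithmetic is the point of the design: no Math expression is needed, so no series is lost to label matching and no labels are dropped.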
Can you copy/paste the configuration(s) that you are having problems with?
This is more of a general question, so I don’t think it would help.
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
No errors.
Did you follow any online instructions? If so, what is the URL?
I used only the official Grafana docs, which, to be honest, are not in an ideal state: