I've been struggling to implement a really simple alert, but the behaviour I'm seeing is not what I expected, so I wanted to know if the community has any ideas about what's going wrong.
Here is what I want to alert on:
metric: argocd_app_info
condition: when health_status!="Healthy" (aka 1)
for: 5m (when the condition is true for 5 mins or longer)
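For comparison, this is roughly how I would express the same intent as a plain Prometheus alerting rule. It is only a sketch: the group name and alert name are made up for illustration, and it assumes argocd_app_info reports a value of 1 for each matching series.

```yaml
# Hypothetical Prometheus rule file, shown only to illustrate the intent.
groups:
  - name: argocd            # illustrative group name
    rules:
      - alert: ArgoCDApplicationUnhealthy   # illustrative alert name
        # One series per application whose health_status is not "Healthy"
        expr: argocd_app_info{health_status!="Healthy"} == 1
        # Only fire once the expression has been continuously true for 5 minutes
        for: 5m
        labels:
          severity: warning
```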
I am using Grafana v9.4.4 and alerts are provisioned via file provisioning (yaml)
Here is the definition I'm using:
- orgId: 1
  name: 'ArgoCD'
  folder: 'Infrastructure Alerts'
  interval: 2m
  rules:
    - uid: xxx
      title: 'ArgoCD - Application Unhealthy - Warning'
      condition: C
      data:
        - refId: A
          datasourceUid: 'xxx'
          model:
            datasource:
              type: 'prometheus'
              uid: 'xxxx'
            expr: |
              sum by (name, health_status) (avg_over_time(argocd_app_info{health_status!="Healthy"}[1m])) > 0
            queryType: 'range'
            refId: 'A'
          queryType: 'range'
          relativeTimeRange:
            from: 600
            to: 0
        - refId: B
          datasourceUid: '__expr__'
          model:
            refId: B
            expression: A
            reducer: 'mean'
            type: reduce
            datasource:
              type: __expr__
              uid: '__expr__'
        - refId: C
          datasourceUid: '__expr__'
          model:
            refId: C
            expression: '$B > 0'
            type: math
            datasource:
              type: __expr__
              uid: '__expr__'
      noDataState: OK
      for: 5m
      annotations:
        summary: 'xxx'
        description: |
          xxx
        owner: 'xxx'
        support_team: 'xxxx'
      labels:
        severity: warning
Behaviour: the alert fires whenever there is ANY status == 1, regardless of whether it lasts for 1 minute or 2 minutes, rather than only after 5 minutes as desired.
Can someone please point out the obvious mistake, or suggest why this is happening?