What Grafana version and what operating system are you using?
Grafana: v8.0.5 (cbb2aa5001)
OS: Linux prometheus-dbp01-grafana-85dcc79d6-4btbw 5.4.0-1059-azure #62~18.04.1-Ubuntu SMP Tue Sep 14 17:53:18 UTC 2021 x86_64 Linux
Note: additional sidecars provision the datasources, notifiers, and dashboards.
What are you trying to achieve?
The goal is to:
- use MySQL instead of SQLite3 as the database to circumvent the “Database is locked” issue
- run more than one pod so that a rolling update won’t cause a service interruption for Grafana users.
How are you trying to achieve it?
We changed the 1-pod/SQLite3 setup in the prometheus-operator config to run two replicas and use MySQL instead of SQLite3.
The first pod (instance) starts up without any issues; the second pod errors out and is not able to start.
What did you expect to happen?
I expected both instances to be able to share one database and both pods (instances) to start up.
Can you copy/paste the configuration(s) that you are having problems with?
grafana:
  replicas: 2
  podDisruptionBudget:
    maxUnavailable: 1
  grafana.ini:
    database:
      type: mysql
      host: 10.0.8.15:3306
      name: grafana-cluster-stage
      user: 'grafana-cluster-stage@some-mysql-dbms'
      password: ****************
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
service init failed: Alert notification provisioning error: alert notification with same name already exists
Did you follow any online instructions? If so, what is the URL?
DB created, pointed 1st Grafana to it, Grafana created the tables, all good.
Started 2nd Grafana … not working
1st Grafana works, 2nd not
opened 06:36AM - 10 Jun 21 UTC
closed 02:39PM - 14 Oct 21 UTC
needs investigation
area/alerting/unified
**What happened**: We have enabled new alerting and routed the alerts to webhook… . We see the same notifications twice (both on fire and resolved). Although I must admit that sometimes the notification arrives only once (so the behavior seems to be unstable). We have an HA setup over Thanos and PostgreSQL.
Thanos series seem to be properly deduplicated (there is really just one time series)
Tested also locally with one instance+sqlite+testdb: no issues observed.
**What you expected to happen**: Deduplication in HA setup
**How to reproduce it (as minimally and precisely as possible)**:
**Anything else we need to know?**: I also see "firing" time in UI switching between "1h 23mins" and "13mins". So it really seems that two replicas are fighting over the alert
**Environment**:
- Grafana version: 8.0.0
- Data source type & version: Prometheus
- OS Grafana is installed on: Azure AKS, official Grafana
- User OS & Browser: not relevant
- Grafana plugins: clean install
- Others:
Support for Unified Alerting in HA setups was introduced in Grafana 8.2.
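For illustration, here is a minimal sketch of what the HA-related part of the chart values might look like. It is not confirmed anywhere in this thread. The `ha_listen_address` and `ha_peers` keys come from the `[unified_alerting]` section of the Grafana configuration; the headless service name `grafana-alerting` and the `monitoring` namespace are hypothetical placeholders, and the exact switch for turning on unified alerting differs between 8.x releases, so check the documentation for your version.

```yaml
grafana:
  replicas: 2
  grafana.ini:
    alerting:
      # Legacy alerting turned off, as described in the thread.
      enabled: false
    unified_alerting:
      # Assumption: some 8.x releases use a feature toggle instead of this key.
      enabled: true
      # Address each replica listens on for alerting cluster (gossip) traffic.
      ha_listen_address: "0.0.0.0:9094"
      # Hypothetical headless service that resolves to all Grafana pods.
      ha_peers: "grafana-alerting.monitoring:9094"
```

Without reachable peers, each replica evaluates and sends alerts on its own, which would match the duplicate notifications described in the linked issue.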
Hi Matt.
Thanks for the instant reply. I updated our setup to use Grafana 8.2.3 in the prometheus-operator chart's values.yaml, and we are now indeed running v8.2.3.
grafana:
  image:
    repository: grafana/grafana
    tag: 8.2.3
But sadly I get the same result: service init failed: Alert notification provisioning error: alert notification with same name already exists. I also truncated all tables in the MySQL DB, but the issue persists. I enabled unified_alerting and disabled alerting.
Am I still missing something that needs to be enabled for alert deduplication, or do I just need more patience until the HA setup is in a stable state?
thanks for working with the squad to get this sorted @manfredackermann . super interesting edge case
system
Closed
November 5, 2022, 12:22pm
This topic was automatically closed after 365 days. New replies are no longer allowed.