Server unresponsive after upgrade to Grafana OSS 8.5.2 from 8.2.0

lloiacono · May 17, 2022, 8:09am

After upgrading our self-hosted (AWS ECS) Grafana from 8.2.0 to 8.5.2, I noticed that Grafana had become very slow. Even loading the homepage ('/') would take more than 1 minute, and eventually the server would stop sending responses altogether and remain in that state. I have rolled back to 8.2.0 and it works better, but it is not 100% back to normal: after rolling back I eventually get one very slow response, and then everything behaves normally again.

I checked the ECS task metrics: CPU is mostly idle (always below 3%), memory usage is fine, and there are no network issues. RDS (Postgres) is also mostly idle and shows no issues. I also set the log level to debug and re-deployed 8.5.2, but I could not find any relevant logs that would explain the issue.

I need help debugging this, and I would also like to know if anyone else is experiencing this issue.

yosiasz · May 17, 2022, 1:33pm

Have you tried the good old fix of rebooting the server and restarting the service?

lloiacono · May 17, 2022, 2:39pm

Yes, several times. Rather than rebooting, I stopped the ECS task and started a new one with the same config. I also redeployed several times, which has the same effect. It is always the same issue with 8.5.2. Is there a way to get more detailed logs on what Grafana is doing in the background? Is it possible to have Grafana log all incoming HTTP requests?

yosiasz · May 17, 2022, 7:22pm

Checking the Grafana logs would be one way.

I had the same issue. I rebooted and it was all good, except that I lost all of my HTTPS configuration. Check whether you lost any config settings, which you are supposed to save before upgrading.

lloiacono · May 18, 2022, 11:08am

No luck for me. I can't reboot the server as I'm running Grafana in ECS; what I'm doing is stopping the task and starting a new one.

I managed to get more details on the issue. The delay seems to be in the initial connection, that is, the time taken to perform the initial TCP handshake and negotiate SSL. Slowness here is usually due to congestion: the server has hit a limit and can't respond to new connections. I was wondering if it would be possible to see this in the logs.
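To check from outside the browser where the initial connection time goes, the phases can be timed separately. The following is only a minimal sketch using the Python standard library; the function name `time_connection_phases`, the `use_tls` flag, and the host `grafana.example.com` are illustrative assumptions, not something taken from this setup.

```python
import socket
import ssl
import time


def time_connection_phases(host, port=443, timeout=10.0, use_tls=True):
    """Return seconds spent in DNS lookup, TCP connect and (optionally) TLS handshake.

    Hypothetical helper for illustration; host and port are supplied by the caller.
    """
    timings = {}

    # Phase 1: name resolution.
    start = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP three-way handshake.
        start = time.perf_counter()
        sock.connect(sockaddr)
        timings["tcp_connect"] = time.perf_counter() - start

        if use_tls:
            # Phase 3: TLS negotiation (wrap_socket performs the handshake
            # immediately on an already connected socket).
            context = ssl.create_default_context()
            start = time.perf_counter()
            tls_sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls_handshake"] = time.perf_counter() - start
            tls_sock.close()
    finally:
        sock.close()
    return timings


# Example call (assumed hostname, replace with the real endpoint):
# print(time_connection_phases("grafana.example.com"))
```

If `tcp_connect` or `tls_handshake` dominates while Grafana itself is idle, the delay is more likely in front of the application (load balancer, proxy, network path) than in Grafana.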

jangaraj · May 18, 2022, 3:04pm

It works fine on my ECS. It doesn't look like a Grafana issue, but an issue with your infrastructure. Investigate it in your browser (network console: which requests are slow, which timings are slow, ...), check your proxy/VPN, check your ALB CloudWatch metrics, and so on. There are many moving parts before a request reaches your ECS task, and the problem can be in any of them.

lloiacono · May 19, 2022, 1:40pm

I found the issue. It was indeed a problem with my infrastructure and not related to Grafana. Thanks for your support.
