I have three graph panels on my dashboard displaying the number of concurrent calls in progress on a PBX setup. One graph has a timescale of 1 hour, the second has a timescale of 24 hours, and the third has a timescale of 7 days.
Under normal circumstances, the PBX system handles up to 4 calls at once, and all three graphs generally get a Y-axis autoscaled to 0 to 5. So far, so good.
Yesterday someone managed to misconfigure something (on the PBX, not in Grafana), resulting in 20 simultaneous calls for a few minutes. Naturally enough, this was displayed on all three graphs, and all three auto-scaled themselves to 0 to 25 on the Y-axis.
However, after 2 hours, the “past hour” graph was still stuck with a Y-axis of 0 to 25, even though the spike of 20 calls had long scrolled off the left side of the graph and no value was higher than 3.
Today, the “past hour” graph and the “past 24 hours” graph are both still stuck with a Y-axis of 0 to 25, even though both are showing data values in the range 0 to 3 (the 20-call spike has disappeared from both).
Only the “past 7 days” graph still has the spike showing, and as expected has a Y-axis scale of 0 to 25 to accommodate it.
So, why does my “past hour” graph still have a Y-axis scale of 0 to 25, even though the data values shown in it do not exceed 3?
For reference, Grafana 5.4.4 (6.x still being unusable due to the logout bug) and InfluxDB 1.7.6.
Today, the two graphs whose Y-axes are still 0-25, even though no value in the graphs is higher than 4, have started to flicker, or fluctuate.
“Flicker” suggests something faster than what’s happening, but basically one of the graphs will show a Y-axis of 0-25 for several minutes, then shrink to 0-5 (making the graph itself much more visible and easier to read), stay like that for several minutes, and then go back to 0-25 again.
It always fluctuates between “something sensible” (which might be 0-2 or 0-5, depending on what values are in the graph at the time) and 0-25.
In case it helps, here’s the query I’m using for the 24-hour graph:
select sum(max) from (select max(CIP) from mqtt_consumer where(topic='Live') and $timeFilter and server!='' group by server,time(60s)) group by time(60s) fill(linear)
“Override relative time” is “24h” and “Add time shift” is blank.
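To make the query’s semantics concrete, here is a rough in-memory approximation in Python: the per-server max(CIP) in each 60 s bucket, summed across servers, with linear interpolation for empty buckets. It is a sketch under stated assumptions (timestamps in whole seconds, buckets aligned to the Unix epoch as InfluxDB does by default, and invented sample data and server names), not what InfluxDB executes internally.

```python
from collections import defaultdict

BUCKET = 60  # seconds; mirrors GROUP BY time(60s)


def bucket_start(ts):
    """Align a Unix timestamp (seconds) to the start of its 60 s bucket."""
    return ts - ts % BUCKET


def concurrent_calls(points, start, end):
    """Approximate the nested InfluxQL query on in-memory data.

    points: iterable of (unix_seconds, server, cip) tuples (hypothetical data).
    Returns [(bucket_start, value or None), ...] for buckets covering [start, end).
    """
    # Inner query: max(CIP) per server per bucket, keeping only rows inside
    # the time filter and with a non-empty server tag.
    inner = {}
    for ts, server, cip in points:
        if server == "" or not (start <= ts < end):
            continue
        key = (server, bucket_start(ts))
        inner[key] = max(inner.get(key, cip), cip)

    # Outer query: sum the per-server maxima that share a bucket.
    outer = defaultdict(int)
    for (_server, bucket), value in inner.items():
        outer[bucket] += value

    # fill(linear): interpolate empty buckets between their nearest known
    # neighbours; buckets before the first or after the last known value stay None.
    known = sorted(outer)
    result = []
    for bucket in range(bucket_start(start), end, BUCKET):
        if bucket in outer:
            result.append((bucket, outer[bucket]))
            continue
        before = [k for k in known if k < bucket]
        after = [k for k in known if k > bucket]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (bucket - lo) / (hi - lo)
            result.append((bucket, outer[lo] + frac * (outer[hi] - outer[lo])))
        else:
            result.append((bucket, None))
    return result


# Two hypothetical servers; nothing is recorded in the 60-120 s bucket.
samples = [(0, "pbx-a", 1), (30, "pbx-a", 2), (10, "pbx-b", 1),
           (130, "pbx-a", 3), (140, "pbx-b", 1)]
print(concurrent_calls(samples, 0, 180))  # [(0, 3), (60, 3.5), (120, 4)]
```

In this model a point outside [start, end) is dropped by the inner filter, so by construction it cannot raise any value in the result.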
I do have another graph in the same dashboard showing another value over the same 24 hours, and that’s adjusting its Y-axis perfectly sensibly.
Anyone got any clues (to what the problem is, or how to find out more)?
As the day has gone on, this appears to have stabilised. Both graphs are now steadily showing me a sensible Y-axis, with no flickering or fluctuation to 0-25.
It seems as though the Y-axis auto-scaling mechanism is paying attention to something in the background data that lies outside the time filter set for the graph (and no, it can’t be the default dashboard time period setting, as that is 6 hours, which wouldn’t account for the 24-hour graph behaving weirdly).
Sooner or later, though, the auto-scaling mechanism “loses sight” of that extra data and settles on what really is within the graph’s time period.
If anyone can tell me how to capture information that would be helpful in tracking down how or why this behaviour occurs, I’ll be happy to do so, in order to help the developers fix this quirk (bug?).
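The suspicion described above can be stated precisely with a small hypothetical model: if whatever computes the Y-axis range considered every point in the returned series, including a stray point from before the panel’s time range, one old spike would pin the axis; if it clipped to the visible range, it would not. This is not Grafana’s or flot’s code; the function name and the clip_to_range flag are invented for illustration.

```python
def y_axis_max(series, range_from, range_to, clip_to_range):
    """Largest non-null value a (hypothetical) autoscaler would see.

    series: list of (timestamp, value) pairs as returned by a query.
    clip_to_range: if True, ignore points outside [range_from, range_to].
    """
    values = [
        value
        for ts, value in series
        if value is not None and (not clip_to_range or range_from <= ts <= range_to)
    ]
    return max(values, default=0)


# Hypothetical 24-hour panel (timestamps in hours): one stray point carrying
# the 20-call spike sits just before the start of the range.
series = [(-1, 20), (0, 2), (6, 3), (12, None), (23, 1)]
print(y_axis_max(series, 0, 24, clip_to_range=False))  # 20
print(y_axis_max(series, 0, 24, clip_to_range=True))   # 3
```

For the first case to apply, the stray point would have to be present in the query response itself, which is something that can be checked directly.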
On Thursday 30 May 2019 at 20:49:12, Pooh via Grafana Community wrote:
> For reference, Grafana 5.4.4 (6.x still being unusable due to the logout bug) and InfluxDB 1.7.6.
On Friday 31 May 2019 at 15:38:55, Pooh via Grafana Community wrote:
> In case it helps, here’s the query I’m using for the 24-hour graph:
> select sum(max) from (select max(CIP) from mqtt_consumer where(topic='Live') and $timeFilter and server!='' group by server,time(60s)) group by time(60s) fill(linear)
> “Override relative time” is “24h” and “Add time shift” is blank.
> I do have another graph in the same dashboard showing another value over the same 24 hours, and that’s adjusting its Y-axis perfectly sensibly.
Over the last few days I have been following your post, but it seems I did not follow it well; I am a fool, sorry.
But I really do not know what to say about the problem you have run into: everything looks good, and for anything I cannot see, I trust that you have it set up correctly.
I think that should only be the cause if the problematic data lies less than one GROUP BY interval before the start of the graph. Does that fit your observations?
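The one-interval bound comes from how InfluxDB 1.x places GROUP BY time() buckets: they sit on fixed boundaries counted from the Unix epoch (unless an offset is given to time()), not on the start of the WHERE time range, so the first bucket can begin up to one interval before the left edge of the graph. A minimal sketch, assuming UTC and a hypothetical panel start time, of how far that reaches for time(60s):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def first_bucket(range_start, interval):
    """Start of the GROUP BY time() bucket containing range_start.

    Assumes buckets are aligned to the Unix epoch (no offset argument to time()).
    """
    return range_start - (range_start - EPOCH) % interval


start = datetime(2019, 5, 30, 14, 30, 45, tzinfo=timezone.utc)  # hypothetical panel start
bucket = first_bucket(start, timedelta(seconds=60))
print(bucket)          # 2019-05-30 14:30:00+00:00
print(start - bucket)  # 0:00:45
```

Whatever the start time, the gap stays below 60 seconds, so any effect tied to that first bucket is limited to less than a minute before the graph’s start.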
The problem was present continuously for the first 24 hours (by which I mean the 24 hours after the data that would cause a scale of 0-25 had disappeared off the left of the time axis, i.e. until that data was 48 hours before the “now” data point), and then came and went intermittently over the following 24 hours.
In summary:
The rogue data point (value 20, whereas all other values over the past month have been 4 or less) appeared at 14:30 on May 29th, and the 24-hour graph correctly auto-scaled itself to 0-25.
By 14:31 on May 30th, the rogue data point had scrolled off the left of the 24-hour graph, but the Y-axis scale remained at 0-25.
The scale remained at 0-25 for the next 24 hours (to 14:30 May 31st); all graph values during this time were less than 5.
Between afternoon and evening on May 31st, graph scaling was intermittently 0-5 (sensible) and 0-25 (wrong).
Since the evening of May 31st, the scale has been 0-5 (correct).
I still have the value-20 data point on my 7-day and 28-day graphs, both of which remain correctly scaled at 0-25. I shall be interested to see what happens to the 7-day graph in 4 days’ time.
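The timeline above can be checked with plain date arithmetic. A minimal sketch, assuming the reported times are local and ignoring DST: it computes when the spike should leave each panel (spike + W) and when one further panel width has passed (spike + 2W), which on the 24-hour graph is close to when the wrong 0-25 scale actually started to clear.

```python
from datetime import datetime, timedelta

# Time of the rogue 20-call data point, as reported (local time, DST ignored).
SPIKE = datetime(2019, 5, 29, 14, 30)

WINDOWS = {
    "24-hour": timedelta(hours=24),
    "7-day": timedelta(days=7),
    "28-day": timedelta(days=28),
}


def scroll_off_times(spike, windows):
    """For each panel width W: (spike + W, spike + 2W).

    spike + W is when the point leaves the panel; spike + 2W is one further
    span, roughly when the 24-hour graph's 0-25 scale was observed to stop.
    """
    return {name: (spike + w, spike + 2 * w) for name, w in windows.items()}


for name, (leaves, plus_one_span) in scroll_off_times(SPIKE, WINDOWS).items():
    print(f"{name}: leaves at {leaves:%Y-%m-%d %H:%M}, "
          f"one more span at {plus_one_span:%Y-%m-%d %H:%M}")
```

For the 7-day graph this puts the spike leaving the panel at 14:30 on June 5th, and at 14:30 on June 12th if the same one-span lag seen on the 24-hour graph were to repeat.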
Perhaps I am misinterpreting the issue you linked, but I thought that should include at most 60 seconds’ worth of samples before the start of the chart, whereas here it seems to be including data from a full chart span before the start. Also, since the effect of that issue is that InfluxDB returns apparently wrong values, you would then see the wrong value as the first point on the graph.
If you can get it to fail again, then it might be worth looking at the query results in the Query Inspector. The chart’s Y-axis autoscaling should not be able to account for anything that does not appear in the query results.
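Alongside the Query Inspector, the same check can be made outside Grafana by sending the panel’s query straight to InfluxDB’s HTTP /query endpoint and looking for the largest value and for any rows timestamped before the range start. A minimal standard-library sketch; the host, port, database name and the explicit “time > now() - 24h” substitution for $timeFilter are assumptions, not taken from the thread.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical connection details: adjust host, port and database name.
INFLUX_QUERY_URL = "http://localhost:8086/query"
DATABASE = "telegraf"  # assumption: a Telegraf mqtt_consumer setup

# The panel's query with $timeFilter replaced by an explicit 24-hour range.
QUERY = (
    "SELECT sum(max) FROM (SELECT max(CIP) FROM mqtt_consumer "
    "WHERE topic = 'Live' AND time > now() - 24h AND server != '' "
    "GROUP BY server, time(60s)) GROUP BY time(60s) fill(linear)"
)


def build_url(base, database, query):
    # epoch=s asks InfluxDB 1.x to return timestamps as Unix seconds.
    params = urllib.parse.urlencode({"db": database, "q": query, "epoch": "s"})
    return f"{base}?{params}"


def fetch(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def rows(response):
    """Yield (timestamp, value) from every series in a /query response."""
    for result in response.get("results", []):
        for series in result.get("series", []):
            columns = series["columns"]
            t_idx = columns.index("time")
            v_idx = next(i for i, c in enumerate(columns) if c != "time")
            for row in series.get("values", []):
                yield row[t_idx], row[v_idx]


def max_point(response):
    """(timestamp, value) of the largest non-null value, or None."""
    points = [(t, v) for t, v in rows(response) if v is not None]
    return max(points, key=lambda p: p[1], default=None)


def points_before(response, range_start):
    """Rows whose timestamp is earlier than the panel's range start."""
    return [(t, v) for t, v in rows(response) if t < range_start]


# Usage against a live server (not run here):
#   resp = fetch(build_url(INFLUX_QUERY_URL, DATABASE, QUERY))
#   print(max_point(resp))
#   print(points_before(resp, int(time.time()) - 24 * 3600))
```

If the Y-axis shows 0-25 while max_point reports nothing near 20 and points_before is empty, the extra data is not coming from InfluxDB’s response.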