I’m looking to learn how to help myself with Grafana. I recently tried this great tool out to monitor some of our servers, tested some dashboards, built one by trial and error, and have now decided to dive deeper into it.
I recently moved from a test server environment to production and also adopted an nginx reverse proxy with HTTPS via certbot. The server runs with ufw enabled. The migration is done; the last issue is with my custom dashboard: some panels don’t show data anymore. These are mostly checks for RAM and HDD space usage, plus one uptime panel.
All but one of these panels worked before, and I would also like to know how to fix the one that never worked. I have read about using the inspector several times but couldn’t figure out how to do it properly. For now, here is one example to help you understand my panel issue:
This relay shows no RAM usage anymore. The alias is correct according to the Prometheus targets. This might be one of the easier fixes, since I can only think of a few possible reasons:
The alias is not defined yet and I don’t know how or where to define it properly. I remember that when I entered the alias name in the code query it appeared/autocompleted, but in the new environment nothing happens, so this is my best guess.
The firewall is blocking something that was not blocked in the test environment, which had no firewall.
Maybe you have a good hint, or even better a tutorial, so I can figure out how to fix this with the inspector. It would be great to be able to fix this kind of thing myself in the future.
For learning purposes, I would suggest using the Explore tab (separate from the dashboard): run the same PromQL query there and inspect the results. See this screenshot for a simple example:
Note that when I click “Inspect” I get tabs underneath it (Stats, Query, JSON, Data), so we can see the actual data coming back from the store when we run that particular query.
The mental model to have here is that a query runs against a data source, and that data goes into the visualization/panel.
This can go wrong if the data store doesn’t deliver any data, but it can also go wrong if it delivers the wrong kind or shape of data. For example, if you tried to make a histogram out of a set of strings, it wouldn’t work.
I suspect in your case the query is running fine and returning no data because of your time window (top bar: 5 minutes). If metric reporting stopped before that window began, there’s no data in the last 5 minutes, so the empty result you’re seeing is accurate and the problem may be elsewhere (why the data is not being reported, for example).
Using “Inspect” in the “Explore” bar will let you work out the mechanics of the query separate from the panel. Debug from the bottom up.
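If you want to run the same bottom-up check outside Grafana, here is a minimal Python sketch. It assumes Prometheus’s HTTP API endpoint /api/v1/query and its standard instant-vector response format; the helper names run_instant_query and summarize are made up for illustration, and the base URL you pass in depends on your setup.

```python
import json
import urllib.parse
import urllib.request


def run_instant_query(base_url, promql):
    """Send a PromQL instant query to Prometheus and return the parsed JSON."""
    params = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(payload):
    """Return (labels, value) pairs from an instant-vector response.

    An empty list means the query ran fine but matched no series,
    which is a different failure from a query error.
    """
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error", "unknown error")))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "string value"]}
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]
```

Calling summarize(run_instant_query("http://localhost:9090", "up")) would list every scrape target with 1 (up) or 0 (down), assuming Prometheus listens on its default port on the same machine. If the list is empty for your panel’s query, the problem is upstream of the panel.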
@davidallen5 thanks David, this already helped a lot. I ran that query and could also select the alias from the dropdown menu, and data was coming in.
Next I checked whether the shape of the data (which is in bytes) fits what I want. I copied this formula from another dashboard back then, so it shouldn’t be a problem, but I double-checked that the final unit is percent (the panel shows RAM usage %), and the units were set correctly.
I already fell for the time window with another problem (nodes showing up twice), so I was aware of this one. Unfortunately, no matter what time window I select, the problem remains. Plenty of data should have been collected in the last 24h.
I guess that if data shows up in the inspector, the problem has to be in the panel settings, but when I edit the panel I can’t see what could cause it…
So this sounds like the issue may be in the panel config and have to do with the shape of the data. The best I can recommend here is to look up the docs for the visualization you’re using, take careful note of what shape of data it needs, then check that against your settings and iterate. This applies once you’ve proven that the query returns data.
There are many cases where you might need to transform something; the visualization panels expect data in a very precise format. As an example, you can’t geomap data where latitude and longitude come in under the wrong names and as “string” data types, so ultimately you need to check into what the visualization panel expects and make sure you’re providing it.
One “trick” is to switch the panel to a table viz (since it will take anything) and inspect what the panel sees.
Thanks @davidallen5, but meanwhile I figured it out: I had simply forgotten to configure the host metrics port 9100 in prometheus.yml in addition to the ‘application’ port 12798. With that added, these panels work now.
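For anyone hitting the same thing: a quick way to spot a missing exporter port is to compare what Prometheus actually scrapes against what you expect. This is only a rough Python sketch, assuming the JSON returned by Prometheus’s /api/v1/targets endpoint (the activeTargets list with a scrapeUrl per target). The helper names and the host name in the usage note are hypothetical; the two ports are the ones from my setup.

```python
from urllib.parse import urlsplit


def scraped_ports(targets_payload):
    """Map each host to the set of ports Prometheus is actively scraping on it."""
    ports = {}
    for target in targets_payload["data"]["activeTargets"]:
        parts = urlsplit(target["scrapeUrl"])
        # parts.port is None if the scrape URL carries no explicit port
        ports.setdefault(parts.hostname, set()).add(parts.port)
    return ports


def missing_ports(targets_payload, host, expected=(12798, 9100)):
    """Return the expected ports that are NOT being scraped on the given host."""
    found = scraped_ports(targets_payload).get(host, set())
    return sorted(set(expected) - found)
```

With a targets response that only lists the application port for a host called relay1.example, missing_ports(payload, "relay1.example") would point at 9100, which was exactly my situation.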
But hold on, there is one last panel that does not work and never did on the test lab either, so this one might be a special case: I have mounted a separate cloud volume on one of my instances and migrated a DB there.
I check partition usage of / here successfully like this:
The mountpoint is the same as shown in df -h. I already tried chown -R on the partition, but without success. The helpful thing is the inspector: it shows that only the mountpoints / and /boot/efi are available from the node. How can I make the added cloud volume available as well?
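In case it helps to reproduce what the inspector shows outside Grafana, here is a minimal Python sketch. It assumes the node’s filesystem metrics come from node_exporter (node_filesystem_size_bytes and node_filesystem_avail_bytes, each carrying a mountpoint label) and that the input is the JSON from a Prometheus instant query on one of them. The function names are made up for illustration, and the usage formula is a common one, not necessarily the one in my panel.

```python
def reported_mountpoints(payload):
    """List the distinct mountpoint labels present in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error", "unknown error")))
    samples = payload["data"]["result"]
    # Series without a mountpoint label are skipped
    return sorted({s["metric"]["mountpoint"] for s in samples if "mountpoint" in s["metric"]})


def used_percent(size_bytes, avail_bytes):
    """Disk usage in percent, computed from total size and available bytes."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100.0 * (1.0 - avail_bytes / size_bytes)
```

If the mountpoint of the cloud volume is missing from reported_mountpoints(...), the panel cannot show it no matter how the panel is configured, so the exporter side is where to look.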