Approaches for templating large clusters

I am setting up a Grafana + Prometheus install for ~2000 servers, and I am running into client-side performance issues with some of my template-generated pages.

The basic scheme right now gives a visitor a top-level dashboard that shows aggregate stats for a cluster, with drilldown links on each graph that allow viewing that metric on a per-node basis. This works fine until a cluster has hundreds of servers in it. At that point the per-node pages get very sluggish as the user’s web browser struggles to render all the graphs.

This isn’t a surprise to me, but I would like to come up with a way to deal with it, and searching hasn’t turned up anything on the topic.

Some kind of “result limit” might work, but I think a pagination system would be ideal. Are there any tricks out there to do something like this?

Or is there a smarter way to do this sort of thing?



Add the panels to rows and set the rows to be collapsed by default. That way the panels in a row are only loaded when the row is expanded.
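
If the dashboards are generated by a script anyway, the collapsed-row idea can double as a crude form of pagination: put a fixed number of nodes in each collapsed row. Below is a rough sketch of that, not a tested recipe. It assumes the newer dashboard JSON layout where a row is a panel with "type": "row", "collapsed": true and a nested "panels" list. The function names, the metric name, the node names, the "timeseries" panel type (older Grafana versions use "graph") and the gridPos values are all made up for illustration, so check them against the dashboard JSON your Grafana version exports.

```python
import json


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_collapsed_rows(nodes, metric, per_row=20, panel_type="timeseries"):
    """Return a list of collapsed row panels, one graph per node inside each row.

    Hypothetical helper: `metric` is a Prometheus metric name and each node is
    matched through its `instance` label, which is an assumption about labeling.
    """
    rows = []
    panel_id = 1
    for row_index, group in enumerate(chunk(nodes, per_row)):
        children = []
        for i, node in enumerate(group):
            children.append({
                "id": panel_id,
                "type": panel_type,
                "title": node,
                # Illustrative layout only: four panels per line, 6 units wide.
                "gridPos": {"h": 6, "w": 6, "x": (i % 4) * 6, "y": (i // 4) * 6},
                "targets": [{"expr": '%s{instance="%s"}' % (metric, node)}],
            })
            panel_id += 1
        rows.append({
            "id": panel_id,
            "type": "row",
            "title": "%s .. %s" % (group[0], group[-1]),
            "collapsed": True,  # child panels stay nested until the row is expanded
            "gridPos": {"h": 1, "w": 24, "x": 0, "y": row_index},
            "panels": children,
        })
        panel_id += 1
    return rows


if __name__ == "__main__":
    # Made-up node names and metric, purely for demonstration.
    nodes = ["node%03d:9100" % n for n in range(45)]
    dashboard = {
        "title": "Per-node view (generated)",
        "panels": build_collapsed_rows(nodes, "node_load1", per_row=20),
    }
    print(json.dumps(dashboard, indent=2)[:400])
```

With 45 nodes and per_row=20 this produces three collapsed rows (20, 20 and 5 graphs), so the browser only has to render the graphs in whichever rows the user opens.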