What's the point of Grafana Cloud Prometheus?

As far as I’m aware, Prometheus scrapes HTTP endpoints for metrics, and Grafana then queries Prometheus for data to graph. But on Grafana Cloud, what is the point of a cloud-hosted Prometheus if it can’t scrape metrics?

According to this page (Prometheus | Grafana Labs), we are supposed to install a local version of Prometheus and then push the data straight to Grafana.

That doesn’t make sense to me: what’s the point of the cloud-based Prometheus, then?

Hope someone can enlighten me.

Hi,

Included in your Grafana Cloud stack is a massively scalable, high-performance, and highly available Prometheus instance.

Typically, Prometheus pulls metrics. The Grafana Cloud Agent reverses this into a push model: the agent, installed on a monitoring target, scrapes metrics and pushes them to the remote monitoring system. With non-agent Prometheus, by contrast, the monitoring system polls (pulls) metrics from a set of defined targets.
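To make the pull vs. push difference concrete, here is a toy sketch in Python. It is not how Prometheus or the agent are actually implemented; `Target`, `PullServer`, `Agent` and `RemoteStore` are hypothetical names invented for illustration, and a real setup would use HTTP scrapes and the remote write protocol instead of method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Target:
    """A monitored process; scrape() stands in for an HTTP /metrics endpoint."""
    name: str
    requests_total: int = 0

    def scrape(self) -> dict[str, int]:
        return {"requests_total": self.requests_total}


@dataclass
class RemoteStore:
    """Stands in for the hosted Prometheus: it never scrapes, it only receives."""
    samples: list[tuple[str, str, int]] = field(default_factory=list)

    def receive(self, batch: list[tuple[str, str, int]]) -> None:
        self.samples.extend(batch)


class PullServer:
    """Pull model: the server knows the targets and scrapes each of them."""

    def __init__(self, targets: list[Target]) -> None:
        self.targets = targets
        self.samples: list[tuple[str, str, int]] = []

    def scrape_all(self) -> None:
        for target in self.targets:
            for metric, value in target.scrape().items():
                self.samples.append((target.name, metric, value))


class Agent:
    """Push model: scrape close to the targets, then push to the remote store."""

    def __init__(self, targets: list[Target], remote: RemoteStore) -> None:
        self.targets = targets
        self.remote = remote

    def scrape_and_push(self) -> None:
        batch = [
            (target.name, metric, value)
            for target in self.targets
            for metric, value in target.scrape().items()
        ]
        self.remote.receive(batch)


targets = [Target("web-1", 5), Target("web-2", 7)]

server = PullServer(targets)
server.scrape_all()

remote = RemoteStore()
Agent(targets, remote).scrape_and_push()

# Both models end up with the same samples; only the direction differs.
print(server.samples == remote.samples)
```

Either way something still has to scrape the targets; in the push model that something runs next to your workloads, and the hosted side only needs to accept incoming writes.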

You can read about the cloud agent here:

Hope it helps

Good Luck


Hi,

This is a good question. I am a Prometheus and Cortex maintainer and speak to a lot of our users regularly. There are many reasons people use our hosted Prometheus service, but the main ones are:

  1. Scalable querying and storage: We heavily parallelise and cache our queries, and we store 13 months of data. We can answer hundreds of queries at sub-second latency, many of which can span months of data. While you can store years of data in Prometheus, you’re limited to a single node and would need to set up backups, restores, and a proxy for caching and proper limits, to make sure Prometheus doesn’t fall over at scale and that you can recover in case of issues. We do that for you, and we are powered by Cortex (GitHub - cortexproject/cortex: A horizontally scalable, highly available, multi-tenant, long term Prometheus).

  2. Global view: Another issue with the single-node model of Prometheus is how you combine data between different Prometheus servers. The most obvious case is when you run a Prometheus per datacenter (as recommended) and need to perform aggregations across Prometheus servers. While this can be achieved with Federation in OSS, it is tricky to get right and can easily end up being a hassle at scale. We do that for you and provide you with a single endpoint to query all your data (however many hundreds of millions of series it is :))

  3. Generally more reliable and makes running Prometheus easier: We have strict SLAs and are built in a fully HA manner. Some people prefer this over running beefy Prometheus servers locally. Using hosted Prometheus lets them run their Prometheus with only a few hours of retention which makes it easier to keep the local Prometheus happy too.
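As a rough illustration of the "global view" point, here is a toy sketch in Python of what a cross-datacenter aggregation amounts to. The datacenter names, the `http_requests_total` metric, the `job` values and the `global_sum` helper are all made up for the example. With a single endpoint holding every series, the same result would come from one PromQL query along the lines of `sum by (job) (http_requests_total)`.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical snapshot: one Prometheus per datacenter, each holding only its
# own series, keyed by (metric name, job label).
per_datacenter: dict[str, dict[tuple[str, str], float]] = {
    "dc-east": {("http_requests_total", "api"): 120.0,
                ("http_requests_total", "web"): 40.0},
    "dc-west": {("http_requests_total", "api"): 80.0,
                ("http_requests_total", "web"): 10.0},
}


def global_sum(
    per_dc: dict[str, dict[tuple[str, str], float]], metric: str
) -> dict[str, float]:
    """Sum one metric across every datacenter, grouped by job."""
    totals: dict[str, float] = defaultdict(float)
    for series in per_dc.values():
        for (name, job), value in series.items():
            if name == metric:
                totals[job] += value
    return dict(totals)


print(global_sum(per_datacenter, "http_requests_total"))
```

With separate servers, you have to do this merging yourself (or set up federation to do it); with all series in one place, it is just a query.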

These are the three main reasons, but it is a fair question to ask why you would even run Prometheus in the first place if there is a hosted service. You still need something to collect the data, and before we built the agent (great recommendation @wlargou!), Prometheus was the only way. Now you can install the agent, which collects the data and sends it to Grafana Cloud.

Having said that, there are still some reasons people choose to run a Prometheus locally, mainly as a backup if the cloud service has issues. Local alerting is in general more reliable than alerting that depends on the network (cloud alerting), but that level of reliability is a requirement for only a few customers. These local Prometheus servers are also typically run with very low retention to reduce costs.

I hope this answers your questions and if you have more, feel free to ask here!

Thanks,
Goutham.
