Is there a recommended Loki configuration for high availability, similar to Grafana’s https://grafana.com/docs/grafana/latest/tutorials/ha_setup/? There is a “query frontend” feature mentioned in the documentation: https://github.com/grafana/loki/blob/v1.5.0/docs/configuration/query-frontend.md, but this assumes the presence of Kubernetes.
Hi @gbrener - I checked in with the Loki team here at Grafana Labs. We don’t have official docs on this just yet (they’re on the way), however what the community is currently doing to achieve HA is deploying Loki to Kubernetes and referencing our Ksonnet config - this helps break Loki into microservices and it’s how our team runs HA for Loki: https://github.com/grafana/loki/tree/master/production/ksonnet
Thanks @samcoren, that link is useful. It would be nice if there was a guide for running Loki in production without k8s. Any idea when the official docs might arrive?
@gbrener not entirely certain, but I’ll put a request in to see if we can get it prioritized. The team knows it’s important and there are quite a few different approaches to take even without Kubernetes - they want to make sure what we’re recommending in the documentation makes sense with upcoming release changes.
We’re really interested in Loki HA. Is there any new information/timeline on this?
Kubernetes isn’t an option for us at this time. Hopefully, eventually, but not as of today.
It’s a little frustrating that this hasn’t been addressed. Loki looks like exactly what we want, and is simple to get running in a non-HA configuration, but the documentation is, at best, incomplete. Assuming k8s presupposes its availability, as well as expertise, neither of which is available here. I’d really like to know if I can run a distributor, ingester, and querier as separate processes on a single instance, with multiple copies for resilience.
The docs imply that this should be possible - I have a consul cluster running, and several VMs running consul agents joined to the cluster. Each VM should (I think) be able to run a trio of an ingester, distributor, and querier, scaling independently as necessary.
I’m falling at the basic hurdle of configuring the distributor. The docs seem to suggest that I can use consul as the ring store, but specifying consul then requires a consul_config section. Or does it? Using “consul_config” produces an error:
failed parsing config: config/distributor.yaml: yaml: unmarshal errors:
  line 13: field consul_config not found in type main.Config
Changing it to “consul” produces the same error. Using “memberlist” as the ring store seems to require not a “memberlist_config” section, as the docs suggest, but a “memberlist” section. The same approach doesn’t work for consul.
So any updated (and complete!) documentation would be really gratefully received. I realise that it’s a moving target - and I hate documenting - but doing anything outside of what appears to be the only deployment case seems impossible.
HA config.
I have now got (I believe) separate instances of queriers, distributors, and ingesters running on 3 VMs (so each VM runs all 3 processes for the moment), with Consul providing the ring storage. The config for that is not obvious: it seems to work only if the ring storage is set to consul, but with no consul config block.
I’d love to know if this is a reasonable approach to HA - ie to run separate processes for each component (with autoscaling in the mix somehow), using S3 for chunk (and index, using the boltdb-shipper) storage, and with consul providing the ring storage.
Thanks.
PS for all that I’m frustrated with the docs, Loki seems like a fantastic product, and I’ve long been a fan of Grafana.
Grahamn,
Could you share the instructions to set up Loki in HA? I am struggling with the official documentation.
Regards
My email is graham@rockcons.co.uk - continue a conversation there?
Cheers,
Graham
Hi @samcoren, congrats on the recent Loki 2.0 release. Has there been any progress on HA documentation?
Apologies, as previously we weren’t paying a lot of attention to this platform for support. That has changed, and moving forward we intend to use these forums as the primary support mechanism for the community.
The ksonnet configs @samcoren linked are what we use to run Loki in an HA fashion.
There is no requirement to use Kubernetes to run Loki as either a single binary or HA, but it is how our infrastructure is run and where most of our users (at least initially) exist so this is why the documentation trends this way.
The only initial requirement for running Loki HA is to set up some mechanism for communicating the “ring” information. This can be done via Consul, Etcd, or memberlist. Memberlist is nice because it doesn’t require any external components; however, you do have to specify a few seed nodes in join_members, or have a DNS service that can return multiple records, similar to how a Kubernetes headless service behaves.
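As a rough illustration of the memberlist option described above, a shared config fragment might look like the following. This is a minimal sketch rather than a tested production config: the three hostnames are hypothetical seed nodes, 7946 is the default memberlist port, and the replication factor is only an example value.

```yaml
# Sketch only: the hostnames below are hypothetical placeholders.
memberlist:
  # Seed nodes that a starting instance contacts to join the cluster.
  join_members:
    - loki-1.example.internal:7946
    - loki-2.example.internal:7946
    - loki-3.example.internal:7946

ingester:
  lifecycler:
    ring:
      kvstore:
        # Store ring state via memberlist gossip instead of Consul or Etcd.
        store: memberlist
      # Example value; each log stream is written to this many ingesters.
      replication_factor: 3
```

A single DNS name that resolves to all instances could stand in for the list of seed nodes, which is the headless-service pattern mentioned above.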
The docs seem to suggest that I can use consul as the ring store, but specifying consul then requires a consul_config section - or does it - using “consul_config” produces an error: failed parsing config:
The documentation structure can be confusing here, and recently we merged a PR for the Promtail docs to remove this convention. However, it’s still the case in the Loki docs that when you see <consul_config> or similar <xxxx_config>, it’s a reference to another section in the documentation and not the name of an object.
An example working distributor config:
    distributor:
      ring:
        kvstore:
          consul:
            consistent_reads: false
            host: consul.loki-prod.svc.cluster.local:8500
            http_client_timeout: 20s
            watch_burst_size: 1
            watch_rate_limit: 1
          store: consul
As a quick aside, configuring the distributors with the ring config is not strictly necessary; it’s used for calculating limits properly when running multiple distributors.
The distributors actually use the configuration section for the ingesters to get the ring information for writing data.
I’d love to know if this is a reasonable approach to HA - ie to run separate processes for each component (with autoscaling in the mix somehow), using S3 for chunk (and index, using the boltdb-shipper) storage, and with consul providing the ring storage.
There are several ways you can run Loki HA. The simplest is to run the binary multiple times and specify a shared ring config in the ingester->lifecycler->ring section.
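To make the ingester->lifecycler->ring path concrete, here is a hedged sketch of what a shared ring section backed by Consul could look like. The Consul address is a hypothetical placeholder and the replication factor is an example value. Every Loki instance would be started with the same fragment so that they all join the same ring.

```yaml
# Sketch only: consul.example.internal is a hypothetical address.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          host: consul.example.internal:8500
      # Example value; adjust to the number of ingesters you run.
      replication_factor: 3
```

Note that the Consul settings sit under a key named consul, not consul_config, which matches the working distributor config earlier in this reply.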
If you would like, you can also break Loki apart into “microservices” by launching the process with a -target flag, e.g. -target=distributor.
This is how the Kubernetes example linked previously is set up. We share the same config file between all components regardless of the -target flag to simplify things.
I know the documentation for Loki needs improvement, but we are doing our best. Contributing working configs and examples in the form of PRs to the project, or even just back in this forum, is hugely helpful.
We need as much help as we can get from the community and appreciate your help and understanding.
Is there a doc for this now?
I am also looking for better docs on this. Having a dedicated k8s cluster for running monitoring workloads is not feasible for us at this moment and we want Loki/Mimir to be able to ingest logs and metrics from our infrastructure even if the k8s clusters we are monitoring should experience problems.