Hello!
Been trying to figure this out for days and we’re frankly getting nowhere. Context: we’re using Grafana Cloud to monitor a bunch of EKS clusters. All clusters run v15.5.3 of the prometheus-community/prometheus Helm chart, and we define a `remote_write` block inside Prometheus’ config to ship certain metrics, with certain label values, to Grafana Cloud. So for instance:
- We’d ship `kube_node_status_condition` as is.
- We’d like to ship `kube_deployment_status_replicas_ready`, but only when `deployment=coredns` and `exported_namespace=kube-system`. Similarly, to monitor the state of Prometheus on the cluster itself, we’d only like to ship this metric when `deployment=prometheus-server` and `exported_namespace=monitoring`.
- Much in the same vein, we’d only like to ship those instances of `container_memory_usage_bytes` where `container=~prometheus-server`.
The list goes on - there are several other metrics we’d like to selectively ship. The problem is making this selective: we’ve tried a bunch of approaches, but all we’ve managed so far is to ship the metrics we want with all of their series, not just the label combinations we want to keep.
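To make “selective” concrete, here’s the shape of what we want to end up shipping, written as PromQL-style selectors (plus a few more metrics that follow the same pattern):

kube_node_status_condition
kube_deployment_status_replicas_ready{exported_namespace="kube-system", deployment="coredns"}
kube_deployment_status_replicas_ready{exported_namespace="monitoring", deployment="prometheus-server"}
container_memory_usage_bytes{container=~"prometheus-server"}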
Here’s what our existing `remote_write` block looks like:
serverFiles:
  prometheus.yml:
    remote_write:
      - basic_auth:
          password: XXXXX
          username: XXXXXX
        remote_timeout: 120s
        url: https://XXXXXXX
        write_relabel_configs:
          - action: keep
            regex: >-
              kube_node_status_condition|kube_deployment_status_replicas_ready|kube_daemonset_status_desired_number_scheduled|kube_statefulset_status_replicas_available|kube_pod_status_ready|container_cpu_usage_seconds_total|container_memory_usage_bytes|kube_pod_container_resource_requests|kube_pod_container_resource_limits|kubelet_volume_stats_used_bytes|kubelet_volume_stats_capacity_bytes
            source_labels:
              - __name__
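To be clear, this rule does behave as expected on its own: it’s effectively an allow-list on metric names only, so every series of every listed metric ships. For example, both of the series below reach Grafana Cloud, even though we only want the first one (the second deployment name is a made-up illustration):

kube_deployment_status_replicas_ready{exported_namespace="kube-system", deployment="coredns"}
kube_deployment_status_replicas_ready{exported_namespace="default", deployment="some-other-app"}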
Here are some versions of what we’ve tried inside `write_relabel_configs`:
# Version 1 - one keep per deployment, over joined source labels:
- action: keep
  regex: kube_deployment_status_replicas_ready;kube-system;coredns
  source_labels:
    - __name__
    - exported_namespace
    - deployment
- action: keep
  regex: kube_deployment_status_replicas_ready;monitoring;prometheus-server
  source_labels:
    - __name__
    - exported_namespace
    - deployment

# Version 2 - PromQL-style selector syntax inside the regex:
- action: keep
  regex: kube_node_status_condition
  source_labels:
    - "__name__"
- action: keep
  regex: kube_deployment_status_replicas_ready{exported_namespace="^kube-system$", deployment="^coredns$"}
  source_labels:
    - "__name__"
    - "exported_namespace"
    - "deployment"
- action: keep
  regex: kube_deployment_status_replicas_ready{exported_namespace="^monitoring$", deployment="^prometheus-server"}
  source_labels:
    - "__name__"
    - "exported_namespace"
    - "deployment"

# Version 3 - separate keep/drop stages per label, plus a labelmap:
- action: keep
  regex: kube_deployment_status_replicas_ready.*
  source_labels:
    - __name__
- action: drop
  regex: .+
  source_labels:
    - exported_namespace
    - deployment
- action: keep
  regex: kube_deployment_status_replicas_ready.*
  source_labels:
    - exported_namespace
    - deployment
    - __name__
- action: keep
  regex: ^(kube-system|monitoring)$
  source_labels:
    - exported_namespace
- action: keep
  regex: ^(coredns|prometheus-server)$
  source_labels:
    - deployment
- action: labelmap
  regex: __name__|exported_namespace|deployment
None of these have worked - the result with all three versions is the same: no metrics are pushed out to Grafana at all. Incidentally, we also don’t see any errors in the cluster’s Prometheus server logs - if anything, those logs indicate that writes are succeeding.
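Our best guess at this point, from re-reading the relabel_config docs, is that `write_relabel_configs` rules run sequentially, so multiple `keep` rules effectively AND together: a series has to match every single `keep` to survive. That would explain why two disjoint keeps - or a `drop` on regex `.+` against labels that every remaining series carries - end up shipping nothing. If that’s right, the fix would presumably be a single `keep` whose regex is an alternation over the ;-joined source labels. The sketch below is what we have in mind; it only covers the example metrics above, is untested, and the `.*` segments are there to match series where `exported_namespace`, `deployment`, or `container` are absent (the joined value is empty at that position):

write_relabel_configs:
  - action: keep
    # ";" is the default separator; shown here for clarity
    separator: ";"
    source_labels:
      - __name__
      - exported_namespace
      - deployment
      - container
    regex: >-
      kube_node_status_condition;.*;.*;.*|kube_deployment_status_replicas_ready;kube-system;coredns;.*|kube_deployment_status_replicas_ready;monitoring;prometheus-server;.*|container_memory_usage_bytes;.*;.*;prometheus-server

Does this look like the right direction, or is there a cleaner way to express per-metric label filters?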
Would definitely love some feedback/help on this - thanks so much in advance!