Displaying Mimir metrics with $__rate_interval always shows "No data"

  • What Grafana version and what operating system are you using?

The standard images from docker.io:

Grafana 9.5.2
Grafana Loki 2.8.0
Grafana Mimir 2.8.0

Host is Fedora Server 38. The images are being run under podman.

Linux services02 6.2.15-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 11 17:37:39 UTC 2023 x86_64 GNU/Linux

  • What are you trying to achieve?

I’m trying to display the rate of various metrics. This works when the data source is Prometheus,
but does not work when the data source is Mimir.

  • How are you trying to achieve it?

Using the standard queries suggested by the Time Series panel. For example, here’s a simple
query picking a counter metric from Prometheus at random:

And here’s that same query wrapped in rate():

This is as expected.
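
For reference, the queries behind those screenshots were of roughly this shape; the metric name here is an illustrative placeholder rather than the one I actually picked:

# Plain counter:
some_requests_total

# The same counter wrapped in rate(), using the interval Grafana recommends:
rate(some_requests_total[$__rate_interval])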

Now, here’s a counter metric picked at random from a Mimir instance that’s exposing a Prometheus
API:

This already looks kind of weird to me, to be honest. The metric in question comes from an HTTP server
under a mild load test, so the total request count is increasing at a rate of between 1 and 10 per
second. The graph shows an odd shelf effect that I’m sure is not present in the actual data.

Wrapping this in a rate() fails completely:

It only really starts to work when I increase the range in the range vector selector to 120s:

Going back down to 110s produces odd gaps in the data. I would include an image, but I’ve hit the limit for embedded media in posts.
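
To make the comparison concrete, the queries involved had these shapes (placeholder metric name again):

# What works against Prometheus but returns "No data" against Mimir:
rate(some_requests_total[$__rate_interval])

# Against Mimir, the graph only looks sensible once the range is
# widened to two minutes:
rate(some_requests_total[120s])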

  • What did you expect to happen?

I expect rate queries to work as well as they do with a real Prometheus instance.

  • Can you copy/paste the configuration(s) that you are having problems with?

In this configuration, I’m using a Java application manually instrumented with
the OpenTelemetry Java SDK, with the OpenTelemetry Collector remote-writing the
metrics to Mimir.

Otel collector:

#----------------------------------------------------------------------
# Receivers.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

#----------------------------------------------------------------------
# Processors.

processors:
  batch:
    send_batch_max_size: 10000
    timeout: 0s
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 10

#----------------------------------------------------------------------
# Exporters.

exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  logging:
    verbosity: basic

#----------------------------------------------------------------------
# The service pipelines connecting receivers -> processors -> exporters.

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, loki]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheusremotewrite]

Mimir:

multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  bucket_store:
    sync_dir: /mimir/data/tsdb-sync
  filesystem:
    dir: /mimir/data/tsdb/fs
  tsdb:
    dir: /mimir/data/tsdb/db

compactor:
  data_dir: /mimir/data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist
    replication_factor: 1

ruler:
  rule_path: /tmp/ruler

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /mimir/data/rules

server:
  http_listen_port: 9009
  log_level: error

store_gateway:
  sharding_ring:
    replication_factor: 1

activity_tracker:
  filepath: "/tmp/metrics-activity.log"

For Grafana itself, I’m using an empty grafana.ini. All servers are running in podman containers on the
same physical host.
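
For completeness, the Mimir data source is just an ordinary Prometheus data source pointed at Mimir’s Prometheus-compatible API. As a provisioning file it would look roughly like this (the name and URL simply reflect the container setup above):

apiVersion: 1

datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    # Mimir serves its Prometheus-compatible query API under /prometheus.
    url: http://mimir:9009/prometheus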

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.

None that I could see.

  • Did you follow any online instructions? If so, what is the URL?

Only the official Grafana documentation for the setup of the various containers.

And of course, it took actually asking the question for me to find the answer myself almost immediately afterwards. :roll_eyes:

The issue was that the OpenTelemetry Java SDK uses a very conservative default and only exports metrics once per minute. With a single sample per minute, any range shorter than two export intervals rarely contains the two samples that rate() needs, which is exactly why the queries only started behaving at around 120s. The example code changes the export interval to one second, but I managed to omit that part.
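
For anyone who finds this later: the fix is the export interval on the SDK’s PeriodicMetricReader. A minimal sketch of what it looks like, with an illustrative endpoint and class name rather than my actual code:

import java.time.Duration;

import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;

public final class MetricsSetup {
  public static SdkMeterProvider meterProvider() {
    // Exporter pointing at the collector's OTLP/gRPC receiver.
    OtlpGrpcMetricExporter exporter =
        OtlpGrpcMetricExporter.builder()
            .setEndpoint("http://otel-collector:4317")
            .build();

    // PeriodicMetricReader exports once every 60 seconds by default;
    // lowering the interval to one second is what made rate() behave.
    return SdkMeterProvider.builder()
        .registerMetricReader(
            PeriodicMetricReader.builder(exporter)
                .setInterval(Duration.ofSeconds(1))
                .build())
        .build();
  }
}

If you’re using the SDK autoconfiguration instead, the OTEL_METRIC_EXPORT_INTERVAL environment variable (in milliseconds) controls the same setting.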

Sorry for the noise!