Hi there, we have a singlestat whose jsonnet source looks like:
local maxLatencySLO =
  singlestat.new(
    'Maximum latency 150ms',
    datasource=datasource,
    span=4,
    decimals=1,
    sparklineShow=true,
    format='percent',
    valueFontSize='150%',
    sparklineFull=true,
    colorBackground=true,
    colors=[
      '#d44a3a',
      'rgba(237, 129, 40, 0.89)',
      '#299c46',
    ],
    thresholds='99,99',
  )
  .addTarget(
    prometheus.target(
      'sum(rate(http_request_duration_seconds_bucket{cluster_name="$cluster_name",le="0.15"}[5m])) / sum(rate(http_request_duration_seconds_count{cluster_name="$cluster_name"}[5m])) * 100',
    )
  );
And it looks great.
However, when the SLO fails (the panel turns red), the intuitive next step is to look at the full graph over time to debug. Naturally I would click the singlestat to get there, but that’s not the way it works.
So do people just create ANOTHER panel or dashboard to drill into this sort of “Service Level Objective” metric?
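For reference, the "another panel" option I have in mind would be roughly the following. This is only a sketch: it assumes grafonnet-lib's graphPanel is importable as shown, and the name maxLatencyGraph, the panel title, the legend text, the span, and the placeholder datasource value are all made up for illustration.

```jsonnet
// Sketch only: assumes grafonnet-lib is available on the jsonnet path.
local grafana = import 'grafonnet/grafana.libsonnet';
local graphPanel = grafana.graphPanel;
local prometheus = grafana.prometheus;

// Placeholder value; in the real dashboard this is the same `datasource`
// that the singlestat uses.
local datasource = 'Prometheus';

// Hypothetical companion panel: the same SLO ratio as the singlestat,
// plotted over time instead of collapsed to a single number.
local maxLatencyGraph =
  graphPanel.new(
    'Requests under 150ms (%)',
    datasource=datasource,
    span=8,
    format='percent',
    min=0,
    max=100,
  )
  .addTarget(
    prometheus.target(
      'sum(rate(http_request_duration_seconds_bucket{cluster_name="$cluster_name",le="0.15"}[5m])) / sum(rate(http_request_duration_seconds_count{cluster_name="$cluster_name"}[5m])) * 100',
      legendFormat='under 150ms',
    )
  );

// Evaluate to the panel so the file renders on its own.
maxLatencyGraph
```

That works, but it means maintaining the same query in two places, which is why I'm asking whether there is a better way.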