401 and 404 API errors with Docker and Traefik

I’m running Grafana in a Docker swarm. The swarm uses Traefik to route incoming traffic to the appropriate containers, and Traefik provides SSL termination for incoming HTTPS. Grafana inside the swarm serves only HTTP, not HTTPS. Everything works fine until Grafana makes an API call to itself, at which point it gets either a 404 or a 401 error.

For instance, the logs I get when trying to update a datasource:

t=2018-10-25T21:23:25+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query status=404 remote_addr=10.255.0.35 time_ms=59 size=19 referer=https://grafana-qa.mydomain.com/datasources/edit/1,
t=2018-10-25T21:23:25+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/frontend/settings status=200 remote_addr=10.255.0.35 time_ms=9 size=14490 referer=https://grafana-qa.mydomain.com/datasources/edit/1,
t=2018-10-25T21:23:25+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=PUT path=/api/datasources/1 status=200 remote_addr=10.255.0.35 time_ms=20 size=462 referer=https://grafana-qa.mydomain.com/datasources/edit/1,
t=2018-10-25T21:23:22+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/plugins/prometheus/settings status=200 remote_addr=10.255.0.35 time_ms=6 size=1237 referer=https://grafana-qa.mydomain.com/datasources/edit/1
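
To see at a glance which requests fail, log lines like the ones above can be filtered by status. This is only a rough sketch: the helper names (parse_line, failed_requests) are made up for illustration, the sample lines are shortened copies of the log above, and the regex assumes simple key=value pairs with optional double-quoted values.

```python
import re

# Matches key=value pairs where the value is either double-quoted or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_line(line):
    """Turn one key=value log line into a dict, stripping quotes from values."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}

def failed_requests(lines):
    """Return (status, method, path) for every request whose status is not 2xx."""
    failures = []
    for line in lines:
        fields = parse_line(line)
        status = fields.get("status", "")
        if status and not status.startswith("2"):
            failures.append((status, fields.get("method"), fields.get("path")))
    return failures

# Shortened copies of two of the log lines above (referer field dropped).
sample = [
    't=2018-10-25T21:23:25+0000 lvl=info msg="Request Completed" logger=context '
    'userId=1 orgId=1 uname=admin method=GET '
    'path=/api/datasources/proxy/1/api/v1/query status=404 '
    'remote_addr=10.255.0.35 time_ms=59 size=19',
    't=2018-10-25T21:23:25+0000 lvl=info msg="Request Completed" logger=context '
    'userId=1 orgId=1 uname=admin method=PUT path=/api/datasources/1 status=200 '
    'remote_addr=10.255.0.35 time_ms=20 size=462',
]

for status, method, path in failed_requests(sample):
    print(status, method, path)
```

With the sample above, only the GET to /api/datasources/proxy/1/api/v1/query is reported; the PUT to /api/datasources/1 succeeds.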

I’m getting the same sorts of errors in dashboards. What was supposed to be a graph of container memory usage by image is instead a blank graph with a red triangle in the corner, and the error details show this:

xhrStatus:"complete"

request:Object
method:"GET"

url:"api/datasources/proxy/2/api/v1/query_range?query=sum%20(%20container_memory_usage_bytes%20%7Bid%3D~%22%2Fdocker%2F.%22%2Ccontainer_label_com_docker_swarm_service_name%3D~%22ourservice_uat.%22%7D)%20by%20(container_label_com_docker_swarm_service_name)%0A&start=1540501980&end=1540503795&step=15"

response:"404 page not found "

It worked fine when I set up the container on my laptop; the problem only appears when I run the container in our swarm environments. It seems pretty clear that this is a configuration problem with respect to the proxying, but the documentation around proxy configuration is not helpful.

I’ve tried any number of settings for the [server] config block. Here is what I am currently using:

[server]
; The public facing domain name used to access grafana from a browser
domain = grafana-qa.mydomain.com

; Redirect to correct domain if host header does not match domain
; Prevents DNS rebinding attacks
enforce_domain = false

protocol = http
; http_port = 443
root_url = https://%(domain)s/
router_logging = true

Part of my confusion is that the proxy and Grafana are using different ports (80 vs 3000) and protocols (HTTPS vs HTTP), but the documentation offers no suggestion as to what to do in those cases.

Does anyone have any ideas?

This is using the Grafana 5.3.1 container image with my own grafana.ini added, along with some certs for SSL to the database and three plugins pre-installed (grafana-azure-monitor-datasource, grafana-clock-panel, grafana-simple-json-datasource). The swarms are all Docker CE 18.0.3.1. Traefik is running their latest container.

I’ve found the issue with the 404 errors. Looking more closely at the logs, only the /api/datasources/proxy/ calls were generating the 404 errors. I tried running the API query that was being sent to the proxy directly against Prometheus and got a 404 and a weird URL, so I started looking at Prometheus.
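
For anyone wanting to repeat that check, here is a minimal sketch of querying Prometheus directly, bypassing Grafana and Traefik. The function names are hypothetical, the trivial query "up" is an assumption chosen for illustration, and the base URL is the one from this thread; /api/v1/query is the path that shows up after /proxy/1/ in the Grafana logs.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

def direct_query_url(base_url, query):
    """Build the /api/v1/query URL on the Prometheus server itself."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})

def query_status(base_url, query="up"):
    """Send the query straight to Prometheus and return the HTTP status code.

    Connection failures (urllib.error.URLError) are deliberately not caught,
    so an unreachable host is not mistaken for an HTTP error.
    """
    try:
        with urlopen(direct_query_url(base_url, query), timeout=5) as resp:
            return resp.status
    except HTTPError as err:
        return err.code

if __name__ == "__main__":
    # Assumed base URL; replace with the address of your own Prometheus.
    print(direct_query_url("http://prometheus.mydomain.com:9090", "up"))
```

If query_status returns 404 for the same base URL that is configured in the Grafana datasource, the problem is on the Prometheus side rather than in the reverse proxy.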

My Prometheus web.external-url flag was missing the http:// prefix. As a result, it was not answering API calls at http://prometheus.mydomain.com:9090/api/; instead it was answering them at http://prometheus.mydomain.com:9090/prometheus.mydomain.com:9090/api/.
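
A small pre-deployment check could catch this kind of mistake. The following is only a sketch under my own assumptions: has_http_scheme is a hypothetical helper, not part of Prometheus or Grafana, and it only looks for an explicit http:// or https:// prefix on the value you intend to pass as the external URL.

```python
def has_http_scheme(url):
    """Return True if the value starts with an explicit http:// or https:// scheme."""
    return url.strip().lower().startswith(("http://", "https://"))

# The two values from this thread: the broken one and the fixed one.
candidates = [
    "prometheus.mydomain.com:9090",
    "http://prometheus.mydomain.com:9090/",
]

for value in candidates:
    verdict = "ok" if has_http_scheme(value) else "missing http:// or https://"
    print(f"{value}: {verdict}")
```

A plain prefix test is used here on purpose; generic URL parsers can disagree on how to interpret a host:port string that has no scheme.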

Traefik and the root_url config were just a red herring.