[Issue] Grafana sometimes shows "NetworkError when attempting to fetch resource." or "Failed to Fetch" error popups

Grafana version:
Helm chart version 6.56.1

  • What are you trying to achieve?
    Run Grafana over HTTPS inside a k8s cluster, behind an NGINX Ingress Controller, using a self-signed certificate.

  • How are you trying to achieve it?

Grafana helm values:

grafana:
  replicas: 3
  resources:
    limits:
      cpu: 4000m
      memory: 8Gi
    requests:
      cpu: 1000m
      memory: 4Gi
  enabled: true
  env:
    GF_DATABASE_TYPE: mysql
    GF_DATABASE_HOST: [redacted]
    GF_DATABASE_NAME: [redacted]
    GF_DATABASE_USER: [redacted]
    GF_DATABASE_PASSWORD: [redacted]
    GF_AUTH_GENERIC_OAUTH_EMAIL_ATTRIBUTE_NAME: mail:primary
    GF_AUTH_GENERIC_OAUTH_EMAIL_ATTRIBUTE_PATH: mail
  grafana.ini:
    server:
      domain: "grafana.[domain2].com"
      root_url: "https://grafana.[domain2].com"
    live:
      max_connections: 0
    auth.generic_oauth:
      name: LDAP
      icon: signin
      enabled: true
      allow_sign_up: true
      auto_login: false
      client_id: [redacted]
      client_secret: [redacted]
      scopes: [redacted]
      empty_scopes: false
      auth_url: [redacted]
      token_url: [redacted]
      api_url: [redacted]
      tls_skip_verify_insecure: true

NGINX Ingress Controller Ingress resource
(we have two domains: domain1 points directly to the IP of our load balancer, while domain2 points to domain1; Grafana’s “root_url” and “domain” settings use domain2):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    nginx.org/client-max-body-size: "200m"
    nginx.org/client-proxy-body-size: "200m"
    nginx.org/proxy-read-timeout: "36000"
    nginx.org/proxy-send-timeout: "36000"
    nginx.org/server-snippets: |
      location /api/live {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $http_host;
        proxy_pass http://grafana;
      }
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - grafana.[domain1].com
        - grafana.[domain2].com
  rules:
    - host: grafana.[domain2].com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana-service
                port:
                  number: 80
    - host: grafana.[domain1].com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana-service
                port:
                  number: 80

Nginx ingress controller values:

controller:
  replicaCount: 3
  service:
    annotations:
      service.beta.kubernetes.io/oci-load-balancer-internal: "true"
      service.beta.kubernetes.io/oci-load-balancer-shape: "flexible"
      service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "100"
      service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "100"
  enableSnippets: true
  config:
    client-body-buffer-size: 32k
    ssl-buffer-size: 32k
    large-client-header-buffers: 8 64k
    proxy-body-size: 100m
    proxy-buffer-size: 32k

  • What happened?
    When using Grafana in any browser, a “NetworkError when attempting to fetch resource.” or “Failed to Fetch” error popup sometimes appears, and the dashboard metrics disappear for a few seconds before reappearing. There are no gaps in the metrics, and we have verified that it is not a data source problem.

For example, in Firefox, the console shows this when it happens:
runRequest.catchError {"data":{"message":"Unexpected error"}}
and in the Network tab, the queries fail with this error:
NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_SECURITY, MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT)

I did accept the certificate when I opened Grafana in my browser, and the error appears only from time to time, not always.

I disabled Grafana Live (“max_connections: 0”) because I saw that browsers refused to accept the self-signed certificate used for the WebSocket connection. I expected that to stop the error, but it keeps happening.

I do not see errors in the Grafana log when these popups show up.

Is there anything else that needs to be configured in the NGINX Ingress Controller to serve Grafana over HTTPS? Without Grafana Live there shouldn’t be any need, right?


I don’t have a firm answer to this, but I have a couple of hints / things to check.

The WebSocket SSL issue you observed is important. Browsers differ in their policies for accepting self-signed certificates and in whether an acceptance carries over across ports and protocols; each one is a little different, and I haven’t looked at this in a while. One general piece of advice: fetch those certificates in your browser and manually mark them as trusted, so that they won’t get blocked no matter how Grafana uses them. You should do this. Depending on the browser you’re using, you can also check its acceptance policy; you may need to re-trust the certificate per port or protocol. I have vague memories of this being particularly painful in Firefox, but it’s been a while since I had to sift through it.
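If you would rather grab the certificate outside the browser, here is a minimal Python sketch (standard library only). It connects to the ingress on port 443, sends SNI for the hostname you pass in, saves whatever certificate comes back as a PEM file, and prints its SHA-256 fingerprint. The script name `fetch_cert.py`, the `<host>.pem` output filename, and port 443 are assumptions for illustration. You would then import the PEM file into the browser’s or operating system’s trust store by hand. Comparing the printed fingerprint with the one in the browser’s certificate viewer confirms that you are trusting the same certificate the browser was shown.

```python
import hashlib
import socket
import ssl
import sys


def fetch_cert_der(host, port=443, timeout=5.0):
    """Return the DER bytes of the leaf certificate the server presents for `host`.

    Verification is disabled on purpose: the goal is to retrieve a
    self-signed certificate so it can be inspected and trusted by hand.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sends SNI, so the ingress picks the cert for this host
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


def sha256_fingerprint(der):
    """Upper-case, colon-separated SHA-256 fingerprint of a DER certificate."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def main(host):
    der = fetch_cert_der(host)
    out = f"{host}.pem"  # hypothetical output filename
    with open(out, "w") as f:
        f.write(ssl.DER_cert_to_PEM_cert(der))
    print(f"saved {out}")
    print(f"SHA-256 {sha256_fingerprint(der)}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: fetch_cert.py HOST")
    else:
        main(sys.argv[1])
```

For example, run `python fetch_cert.py grafana.[domain2].com`. Certificate verification is turned off here deliberately because the certificate is untrusted at this point. Only trust the saved file after checking its fingerprint.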

The concrete errors you’re reporting suggest to me that your browser (in combination with Grafana’s asynchronous fetch patterns) is what’s going wrong here, so a deeper dive into how that particular browser trusts self-signed certificates would be worthwhile.

Secondarily, intermittent network issues are often quite hard to debug. I don’t see anything obviously wrong with your config, but you do have several layers of indirection in the middle, and it’s important to prove that the network works reliably end to end. If something neither of us spotted were wrong with the intermediate software-defined network, you would often see the same symptoms, and it wouldn’t be the data source’s fault.
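One way to prove the network works end to end is to repeatedly hit the same path the browser uses, from a machine outside the cluster, and count the failures. The Python sketch below (standard library only) calls Grafana’s `/api/health` endpoint through the ingress over and over, opening a fresh TLS connection each time. It verifies each connection against a local PEM copy of the self-signed certificate, passed as the `CAFILE` argument; the file name is up to you. The script name `probe_grafana.py`, the 200-request count, and the 0.5 s interval are arbitrary assumptions.

The script also assumes the certificate’s subjectAltName covers the hostname you probe. If it does not, every request fails hostname verification, and that would appear as a consistent failure rather than an intermittent one.

```python
import collections
import http.client
import ssl
import sys
import time


def probe_once(host, cafile, path="/api/health", timeout=5.0):
    """One HTTPS GET to `host`, trusting only the certificate(s) in `cafile`.

    Returns (ok, detail): ok is True only for an HTTP 200 response.
    """
    ctx = ssl.create_default_context(cafile=cafile)  # hostname checking stays on
    conn = http.client.HTTPSConnection(host, 443, timeout=timeout, context=ctx)
    try:
        conn.request("GET", path)  # opens a fresh TCP + TLS connection
        resp = conn.getresponse()
        resp.read()
        return resp.status == 200, f"HTTP {resp.status}"
    except (OSError, http.client.HTTPException) as exc:
        # ssl.SSLCertVerificationError and socket timeouts are OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()


def summarize(results):
    """Return (total, failed, Counter of failure details) for (ok, detail) pairs."""
    failures = collections.Counter(detail for ok, detail in results if not ok)
    return len(results), sum(failures.values()), failures


def main(host, cafile, count=200, interval=0.5):
    results = []
    for _ in range(count):
        results.append(probe_once(host, cafile))
        time.sleep(interval)
    total, failed, reasons = summarize(results)
    print(f"{failed}/{total} requests failed")
    for reason, n in reasons.most_common():
        print(f"  {n} x {reason}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: probe_grafana.py HOST CAFILE")
    else:
        main(sys.argv[1], sys.argv[2])
```

Because every probe opens a new connection, different probes may be answered by different ingress controller replicas behind the load balancer. A certificate-verification error that shows up on only some requests would roughly mirror the Firefox symptom. A clean run, by contrast, would lend weight to the browser-side explanation.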