Problems retrieving Loki logs in Grafana on OpenShift

Hi

Not sure if this is the right forum, but since it relates to the user guides for the Loki Operator, I will give it a try.

I have been trying to set up the Loki data source in Grafana on OpenShift (version 4.12) by following the examples found here: Connect Grafana to an in-cluster LokiStack - Loki Operator.
(There is a dead link in the documentation, but the YAML file can be found here: loki/addon_grafana_gateway_ocp_oauth.yaml at main · grafana/loki · GitHub.)

This works very well with the version of Grafana given in the YAML file (image: docker.io/grafana/grafana:8.5.6), but I have not been able to make it work with any 9.x version of Grafana, so something seems to break between the two major versions.

With version 9.5.3 of Grafana, I am able to retrieve the labels when I explore the Loki data source, but no data is shown.

I have looked through the release notes for Grafana and enabled more logging in Grafana, without being able to figure out what goes wrong. That said, I'm a beginner with both Loki and Grafana, so I might have missed something.

Has anybody managed to get it working with a 9.x version of Grafana?

If so, any pointers that could lead to a solution would be appreciated.

That sounds rather strange. If you are able to get labels, then the data source is working. Do you have some screenshots, perhaps?

Hi tonyswumac

Thanks for the reply. Here are some screenshots. The first two are from Grafana 9.5.3, where the labels are retrieved but no data is found. The last one is from Grafana 8.5.6 (the image from the example), where the same query returns data.



Hi again

I noticed that if I press the “live” button I get data (I don't know why I haven't tried that before). It therefore seems to be a problem related to the time range when I do a search.

What’s the version of Loki?

If you suspect the time range is the problem, have you tried specifying a different time range to see if you get data? When you press “live”, what's the latest timestamp on the last log? Do new logs continue to come in?

Thanks for the reply.

I tried changing the time range without any luck, including a time range a day in the past to see if it's the time offset that is causing the problem. I still don't get data this way.

The entries retrieved in live mode look fine. At 7:42 local time I got the following entry:

2023-06-13 07:42:29 {"@timestamp":"2023-06-13T05:42:28.624755492Z","file":"/var/log/pods/infrastructure-tooling_cluster-configurator-bff-7f75d47cd5-pvmd5_1e38df5a-85c9-44fc-ad4d-aeb931e49b50/cluster-configurator-bff/0.log", …

As you can see, my local time is offset 2 hours from UTC/Zulu time. The entries keep coming as they should.

What’s the version of Loki you are running?

We updated to the latest version of Red Hat's Loki Operator yesterday (v5.7.2), which didn't make a difference.

As far as we can tell from the images, this should result in Loki version 2.8.x being installed on the cluster. Since we are not sure, we have asked Red Hat, and I will get back when we have an answer.

A couple of things I can think of to try in the meantime:

  1. Try doing an API call directly to the Loki endpoint as a sanity check.
  2. Create a new data source for the same Loki endpoint in Grafana 9 as a sanity check (a minimal sketch follows this list).
  3. Are you using any proxy in front of Loki? If so, I'd check the logs to see if anything is being routed incorrectly.
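
For (2), a minimal extra entry in your provisioning file should be enough for the sanity check; something like this, where the name is arbitrary and the URL is a placeholder for your existing Loki endpoint:

    # hypothetical test entry; reuse the URL and auth settings of the
    # data source you already have
    - name: Loki - Sanity Check
      type: loki
      access: proxy
      url: https://<your-loki-endpoint>/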

Can you share a screenshot of what your data source looks like?

Hi again

Direct API call:

We tried doing direct API calls, and they seem to work. The only differences between our API calls and the way Grafana makes them are a) that we go through an OpenShift route, and b) that I authenticate using a token, while Grafana is currently configured to use CA-cert authentication (see screenshot below). Given that we get the labels, and that things work in Grafana 8.5.6, this is what I would have expected. For example, we tried:

https://logging-loki-openshift-logging.apps.c03x.paas.corp.jyskebank.net/api/logs/v1/application/loki/api/v1/query_range?query={+log_type%3D"application"+}+|+json&start=1686811511525000000&end=1686815111525000000&limit=100&direction=backward

New datasource

I create the data source using data source provisioning (as done in the example in the Loki Operator documentation; see the link in my first post). My current data source definition looks like this:

    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
      access: proxy
      basicAuth: false
      withCredentials: false
      isDefault: true
      jsonData:
        timeInterval: 5s
        tlsSkipVerify: true
        # authenticates with a bearer token sent in the Authorization header
        httpHeaderName1: "Authorization"
      secureJsonData:
        httpHeaderValue1: "Bearer ${PROMETHEUS_ACCESS_TOKEN}"
      editable: false
    - name: Loki - Application
      isDefault: false
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        # authenticates with a CA cert only; no token is sent
        tlsAuthWithCACert: true
      secureJsonData:
        tlsCACert: ${GATEWAY_SERVICE_CA}

The environment variable is set in the deployment. Here I made a discovery: at some point while trying to get things to work, I added an extra certificate to this environment variable. I did this because I got the following error when retrieving data (but not when retrieving labels):

Get "https://oauth-openshift.apps.c03x.paas.corp.jyskebank.net/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Aopenshift-logging%3Alogging-loki-gateway&redirect_uri=https%3A%2F%2Flogging-loki-openshift-logging.apps.c03x.paas.corp.jyskebank.net%2Fopenshift%2Fapplication%2Fcallback%3Froute%3D%2Floki%2Fapi%2Fv1%2Fquery_range&response_type=code&scope=user%3Ainfo+user%3Acheck-access+user%3Alist-projects&state=I+love+Observatorium": tls: failed to verify certificate: x509: certificate signed by unknown authority

It seems that an OAuth flow is triggered when retrieving data. The strange thing is that the CA for the failing certificate is present among the container's CAs. As I understand it, Go should use these certificates.

In the 8.5.6 container it hasn't been necessary to add the extra certificate. Could my problems be related to this?

To be honest, I have had trouble finding good, detailed information about the different authentication methods for data sources, so any links would be appreciated.

Proxy
We are accessing Loki through a service internally on the OpenShift cluster, so there shouldn't be any proxy. But given the error above, I suspect that Red Hat might be using Observatorium.

Screenshot of the datasource:

Do you know if there is any way to configure Grafana to log all the requests that the data sources make? I have already enabled data source logging and plugin logging, and changed the log level to trace.
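
For reference, this is roughly how I have applied those settings, using environment variables on the Grafana deployment (GF_* variables map to grafana.ini settings; the logger names in the filter are my guesses, not a confirmed list):

    # excerpt from the Grafana container spec in the deployment
    env:
    - name: GF_LOG_LEVEL            # [log] level in grafana.ini
      value: trace
    - name: GF_LOG_FILTERS          # per-logger overrides; logger names are guesses
      value: "tsdb.loki:debug plugin:debug"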

You can check in your browser's developer tools (for Chrome, under the Network section); sometimes that gives you some information.

I haven't used Loki on OpenShift before, but since you were able to authenticate to Loki using a token, can you try configuring the data source with a token as well?

I was more after logging of the requests between the Grafana backend and the Loki backend.

I tried using the token as you suggested (configured like the Prometheus data source in my previous post). This works and I get data. My problem therefore seems to be related to authentication or authorization when retrieving data from the backend (strange that I can still get the labels).
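
For reference, a minimal sketch of that token-based test configuration, mirroring the Prometheus entry in my previous post (the ${LOKI_ACCESS_TOKEN} variable name is just for illustration):

    - name: Loki - Application (token test)
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        tlsSkipVerify: true
        # send a fixed bearer token, as the Prometheus entry does
        httpHeaderName1: "Authorization"
      secureJsonData:
        httpHeaderValue1: "Bearer ${LOKI_ACCESS_TOKEN}"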

Unfortunately I can't configure the data source with a fixed token, since I need the multi-tenancy that the other method should provide. But at least we now know what the problem relates to, which in itself is progress.

Yeah, you might want to ask someone with more experience with OpenShift. This could also be a question for the Grafana forum, since it's less likely to be directly related to Loki.

Hi tonyswumac

Thanks for trying to help - it’s appreciated.

I have gotten a lead through another channel and am working on it now. The trick seems to be to configure Grafana to use the OpenShift OAuth server (without using the proxy), and to set the oauthPassThru property in the data source definition.
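
In case it helps anyone finding this later, a minimal sketch of what I am aiming for, assuming Grafana itself is configured to log users in through the OpenShift OAuth server so that it has a user token to pass along:

    - name: Loki - Application
      type: loki
      access: proxy
      url: https://${GATEWAY_ADDRESS}/api/logs/v1/application/
      jsonData:
        # forward the logged-in user's OAuth token to Loki, so the
        # gateway can authorize per user/tenant
        oauthPassThru: true
        tlsAuthWithCACert: true
      secureJsonData:
        tlsCACert: ${GATEWAY_SERVICE_CA}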


Hi erikjb,

Did you make any progress here with OAuth, as you hinted in your last update?
I am struggling with similar issues myself on OpenShift 4.10, connecting an external Grafana to OpenShift logging / Loki.

Thanks for your reply,
Thomas
