Apologies in advance if this is a super basic question, but I’m still new to the whole logging stack ecosystem and grappling with a lot of new vocabulary and concepts.
Here’s a summary of the environment I’ve set up so far:
Nginx configured so the access logs are JSON formatted with the details I care about, most notably user-agent strings.
Loki and Promtail (both 2.0.0) configured to index(?) those access logs
Grafana 7.3.1 with a data source pointing to the Loki instance
In the Explore view, I can run a basic query like {job="myjob"} and see all of the log entries for the selected time period. If I expand an entry, there are sections for Log labels: and Parsed fields:. I’m trying to figure out how to filter on those parsed fields rather than just running a regex against the whole line.
So something like this works to filter against the whole line:
{job="myjob"} |~ "searchvalue"
And I figured out that I can use the json parser expression(?) to turn the parsed fields into labels and query them like this:
{job="myjob"} | json | http_user_agent =~ ".*searchvalue.*"
But that feels wrong because it seems like I’m having to double parse the json. Is there a way to reference the “Parsed fields:” directly and filter on them? If so, is there a practical/performance difference between using regex on the whole line versus a parsed field?
Bonus question: can I further parse the user-agent string so that I can query/report on specific elements within it? Everything hitting this web instance is from a custom app sending customized user-agent strings rather than your typical messy browser user-agent, and should have a consistent “key/value key/value key/value” style format.
What happens if you just remove the | json filter and try to use the label filter (the | http_user_agent ... bit) without json parsing? I am not exactly sure but in my setup the parsed fields are already filterable just fine without having to do extra | json-ing on logs already in JSON!
Can I further parse the user-agent string such that I can query/report on specific elements within it?
You could run a regex pipeline stage in your promtail config if the format is super static, or use a regexp expression in your LogQL! See the Parser Expression section of the LogQL documentation (“LogQL: Log query language” in the Grafana Loki docs), which explains the | regexp <re> syntax.
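For what it’s worth, both the promtail regex stage and the LogQL regexp parser take Go RE2 patterns, where each named capture group, written (?P<name>...), becomes an extracted field. Python’s re module accepts the same named-group syntax, so a pattern can be prototyped offline before it goes into a config or query. A minimal sketch: the user-agent layout and the names app_version and os_name are made up for illustration and are not taken from the real logs.

```python
import re

# Hypothetical user-agent in "key/value key/value" form; the real keys will differ.
SAMPLE_UA = "myapp/1.2.3 os/ios device/iphone12"

# Named groups use the (?P<name>...) form that RE2 also accepts.
# Only plain character classes are used, avoiding features RE2 lacks
# (lookarounds, backreferences).
UA_PATTERN = re.compile(r"myapp/(?P<app_version>[^ ]+) os/(?P<os_name>[^ ]+)")


def extract_fields(user_agent: str) -> dict:
    """Return the named groups as a dict, or an empty dict when nothing matches."""
    match = UA_PATTERN.search(user_agent)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_fields(SAMPLE_UA))
```

If the pattern behaves as expected here, the same expression should be a reasonable starting point for | regexp or a promtail regex stage, though it is worth re-testing there since the engines are not identical.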
Removing the | json filter unfortunately just gives me no results anymore. In your setup, do you have promtail doing a json parsing step that turns the json fields into real labels first? I’m trying to avoid that because the Loki docs seem to advise against having too many dynamic labels.
I ended up figuring out the regex stuff, but thanks for the suggestion anyway. I ended up adding the regex processing in the promtail config and using the template functionality to re-write the json output that gets sent to Loki to include those regex processed fields as individual json fields. I still need to do the | json step on the Grafana side, but at least it’s less work after that.
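In case it helps anyone picture it, the effect of that rewrite is roughly the following. This is sketched in Python so it can be run standalone; it is not the actual promtail config. The field name http_user_agent is the one from the queries in this thread, while the ua_ prefix, the function name, and the assumption that keys and values contain no spaces or slashes are purely illustrative.

```python
import json
import re

# Matches each "key/value" token; assumes neither part contains spaces or slashes.
KV_TOKEN = re.compile(r"([^ /]+)/([^ /]+)")


def flatten_user_agent(line: str, field: str = "http_user_agent", prefix: str = "ua_") -> str:
    """Re-emit a JSON log line with the user-agent pairs promoted to top-level fields."""
    entry = json.loads(line)
    # A missing user-agent field yields no pairs, so the entry passes through unchanged.
    for key, value in KV_TOKEN.findall(entry.get(field, "")):
        entry[prefix + key] = value
    return json.dumps(entry)
```

After a rewrite like this, a single | json step on the query side exposes the user-agent elements as separate fields, with no further regex needed at query time.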
Apologies, the UX around this is problematic right now.
The Parsed fields are entirely a Grafana-side (browser-side) interpretation of the received logs. This functionality existed prior to Loki 2.0’s support for parsing data, and currently the Loki APIs don’t give Grafana any way to differentiate what’s parsed with LogQL | json from what it can interpret itself.
Likewise any attempt to use the parsed fields fails because again that communication doesn’t exist yet back to Loki.
We are having discussions on the best path forward here. The Parsed fields support in Grafana is used by other datasources besides Loki, which makes it a little tricky to improve, and like I mentioned, the Loki APIs need to be extended to allow Grafana to make better decisions on how to display this.
For now the Parsed fields are really not very useful with Loki, and are entirely a browser-side interpretation. You should use the newer Loki 2.0 parsing features instead: | json | http_user_agent =~ ".*searchvalue.*"
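On the practical difference between the two approaches: a line filter such as |~ can match anywhere in the raw line, while a label filter after | json only tests the one field. A small Python sketch of those two semantics follows. This is not Loki code; the sample entries and the request_uri field are invented, and the anchored matching is an assumption based on Prometheus-style matchers, which the .* wrapping in the query suggests.

```python
import json
import re

# Invented sample entries; only http_user_agent is a field name from this thread.
LINES = [
    json.dumps({"request_uri": "/searchvalue", "http_user_agent": "myapp/1.0"}),
    json.dumps({"request_uri": "/home", "http_user_agent": "searchvalue/2.0"}),
]


def line_filter(lines, pattern):
    """Mimics `|~ "pattern"`: the regex may match anywhere in the raw line."""
    return [ln for ln in lines if re.search(pattern, ln)]


def field_filter(lines, field, pattern):
    """Mimics `| json | field =~ "pattern"`: only the parsed field is tested."""
    # Assumption: the matcher is fully anchored, so fullmatch is used here.
    return [ln for ln in lines if re.fullmatch(pattern, json.loads(ln).get(field, ""))]
```

With these samples, line_filter(LINES, "searchvalue") keeps both entries because the first one mentions the value in its URI, while field_filter(LINES, "http_user_agent", ".*searchvalue.*") keeps only the second.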