Group by a regex field

Grafana8 / FreeBSD 13 / Elasticsearch

In my datasource (ES) I have an email field.
I would like to only grab the TLD part of the email and group by this term (and then create panels accordingly)

How can this be achieved?
I have played around with different metric types (count + group by term) and “raw data” - and various transformation types - but nothing seems to work as I want.

So how can I grab the TLD (email domain) from an email field, and use this value to group results?

Ideally I would like to create a pie chart showing how many of the emails are from different TLDs (google.com, yahoo.com etc etc etc)

Thanks,

Please post a sample json document minus confidential stuff

Thanks for the reply.
You mean from the query inspector?

A ssmple document from elastic search

Here you are:

  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "myindex",
        "_type" : "_doc",
        "_id" : "IrhX1YIBVxDQ-osWFV1i",
        "_version" : 1,
        "_score" : 0.0,
        "_source" : {
          "message" : "[2022-08-25 16:10:26] [app_env:env] [env:www.mysite.com] mychannel.INFO: [grok-my-channel-1][user@gmail.com][][xx.yy.zz.qq][10.506483078003] my string text. [] []",
          "log" : {
            "file" : {
              "path" : "/mylogfile"
            },
            "offset" : 113716088
          },
          "host" : {
            "name" : "my_agent_host"
          },
          "ecs" : {
            "version" : "1.11.0"
          },
          "channel" : "mychannel",
          "email" : "user@gmail.com",
          "type" : "filebeat",
          "ip1" : "xx.yy.zz.qq",
          "app_env" : "env",
          "execdelay" : 10.506483078003,
          "time" : "16:10:26",
          "tags" : [
            "_geoip_lookup_failure"
          ],
          "loglevel" : "INFO",
          "fields" : {
            "document_type" : "mylog"
          },
          "input" : {
            "type" : "log"
          },
          "env" : "www.mysite.com",
          "@timestamp" : "2022-08-25T14:10:29.203Z",
          "agent" : {
            "ephemeral_id" : "8536db90-80bf-4f2a-b5a3-135a39552d61",
            "version" : "7.15.1",
            "type" : "filebeat",
            "name" : "my_agent_host",
            "hostname" : "my_agent_host",
            "id" : "a6f6c10f-3a20-4b1d-b9fe-70349b19920f"
          },
          "month" : "08",
          "year" : "2022",
          "@version" : "1",
          "geoip" : {
            "country_code3" : "ZZ",
            "latitude" : zz.zzzz,
            "coordinates" : [
              z.zzzz,
              zz.zzzz
            ],
            "postal_code" : "xxxx",
            "country_name" : "zzzzz",
            "longitude" : z.zzzz,
            "continent_code" : "XX",
            "location" : {
              "lon" : z.zzzz,
              "lat" : zz.zzzz
            },
            "timezone" : "zz/zz",
            "ip" : "xx.yy.zz.qq",
            "region_code" : "xx",
            "city_name" : "city",
            "region_name" : "region",
            "country_code2" : "XX"
          },
          "monthday" : "25"
        }
      }
    ]

I simply want to create a pie chart that will show the top email TLDs … so in this example gmail.com

1 Like

lots of issues with this blob of data, please vet? this looks more like a result of a es search

If you click on the expand icon and then after expanding click json

Here’s another example (from kibana):

{
  "_index": "myindex",
  "_type": "_doc",
  "_id": "Ev821oIBVxDQ-osWn8T4",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "monthday": "25",
    "channel": "mychannel",
    "email": "user@gmail.com",
    "time": "18:14:36",
    "message": "[2022-08-25 18:14:36] [app_env:env] [env:env_app] mychannel.INFO: [grok-mychannel-1][user@gmail.com][] my notice string. [] []",
    "input": {
      "type": "log"
    },
    "app_env": "env",
    "type": "filebeat",
    "ecs": {
      "version": "1.12.0"
    },
    "agent": {
      "ephemeral_id": "1407a246-185c-4162-8d23-302e8f75d928",
      "hostname": "my_agent",
      "id": "637d6eaf-9f82-432d-963c-25b0146c1e73",
      "name": "my_agent",
      "type": "filebeat",
      "version": "7.17.3"
    },
    "fields": {
      "document_type": "mylog"
    },
    "month": "08",
    "loglevel": "INFO",
    "tags": [
      "_geoip_lookup_failure"
    ],
    "env": "env_app",
    "host": {
      "name": "my_agent"
    },
    "@timestamp": "2022-08-25T18:14:39.939Z",
    "log": {
      "offset": 140590125,
      "file": {
        "path": "mylog.log"
      }
    },
    "year": "2022"
  },
  "fields": {
    "@timestamp": [
      "2022-08-25T18:14:39.939Z"
    ]
  },
  "sort": [
    1661451279939
  ]
}

In the meantime, I use logstash mutations to add another field (where I split email field, and add a new one with the email TLD.

But I would still like to know how I could have achieved this without manipulating data in ES

1 Like