- What Grafana version and what operating system are you using?
This is a Kubernetes cluster in Google Cloud, so I’m using the Docker image grafana/grafana-oss:latest.
- What are you trying to achieve?
I want to have a Grafana K8s cluster up in GCP to connect to some custom Prometheus metrics.
- How are you trying to achieve it?
Using Terraform scripts to automate the deployment. We have a Kubernetes cluster, a load balancer, an Ingress, and health checks for the Kubernetes deployment as well as for the Ingress.
- What happened?
My Ingress reports as unhealthy because it cannot reach the health check endpoint /api/health. Investigating, I found that my Grafana deployment restarts as soon as it comes up. I switched the database from SQLite to Postgres, which got me right back where I was: Grafana restarts whenever it comes up. Other than the message about the missing image renderer, the logs show no errors. My question is: is the image renderer actually required to run Grafana? If so, is the “best practice” (or accepted practice) to create an entirely separate cluster just for the image renderer? If not, why would my instance keep restarting?
- What did you expect to happen?
The deployment to be successful, my ingress to be healthy, and to be able to connect to it.
- Can you copy/paste the configuration(s) that you are having problems with?
There’s some more configuration in the tf scripts, but this is the Kubernetes deployment as well as the accompanying load balancer:
resource "kubernetes_deployment" "grafana" {
  provider = kubernetes
  metadata {
    name = local.environment
    labels = {
      environment = local.environment
      app         = local.environment
    }
  }
  spec {
    replicas = var.replicas
    selector {
      match_labels = {
        environment = local.environment
        app         = local.environment
      }
    }
    template {
      metadata {
        labels = {
          environment = local.environment
          app         = local.environment
        }
      }
      spec {
        container {
          image             = "grafana/grafana-oss:latest"
          image_pull_policy = var.image_pull_policy
          name              = local.environment
          port {
            name           = "http"
            container_port = 3000
            host_port      = 80
          }
          env {
            name  = "GF_LOG_LEVEL"
            value = "debug"
          }
          env {
            name  = "GF_DATABASE_TYPE"
            value = "postgres"
          }
          env {
            name  = "GF_DATABASE_HOST"
            value = "<HOST>"
          }
          env {
            name  = "GF_DATABASE_NAME"
            value = "grafana"
          }
          env {
            name  = "GF_DATABASE_USER"
            value = "<USER>"
          }
          env {
            name  = "GF_DATABASE_PASSWORD"
            value = "<PASSWORD>"
          }
          readiness_probe {
            http_get {
              path = "/api/health"
              port = 80
            }
          }
          liveness_probe {
            http_get {
              path = "/api/health"
              port = 80
            }
          }
        }
      }
    }
  }
}
resource "kubernetes_service" "load_balancer" {
  provider = kubernetes
  metadata {
    name = "${local.environment}-load-balancer"
    annotations = {
      "cloud.google.com/neg" = "{\"ingress\": true}"
    }
    labels = {
      environment = local.environment
    }
  }
  spec {
    selector = {
      environment = local.environment
      app         = local.environment
    }
    port {
      name      = "http"
      port      = 80
      node_port = 31337
    }
    port {
      name      = "https"
      port      = 443
      node_port = 31338
    }
    external_traffic_policy = "Local"
    type                    = "NodePort"
  }
}
I also want to mention that I am editing my load balancer with gcloud to set the health check request path to /api/health.
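For reference, the manual gcloud edit could probably be kept in Terraform instead, so it is not lost when the load balancer is recreated. The following is only a minimal sketch, assuming a GKE cluster where the BackendConfig CRD (cloud.google.com/v1) is available and a version of the hashicorp/kubernetes provider that includes kubernetes_manifest. The resource name grafana_backend_config, the "-backend-config" suffix, the "default" namespace, and the choice of port 3000 for the health check are all assumptions, not part of the configuration above.

```hcl
# Hypothetical resource: declares the load balancer health check for GKE Ingress.
resource "kubernetes_manifest" "grafana_backend_config" {
  provider = kubernetes
  manifest = {
    apiVersion = "cloud.google.com/v1"
    kind       = "BackendConfig"
    metadata = {
      name      = "${local.environment}-backend-config"
      namespace = "default" # assumption: the Service lives in the default namespace
    }
    spec = {
      healthCheck = {
        type        = "HTTP"
        requestPath = "/api/health"
        port        = 3000 # assumption: with NEGs the check targets the pod's serving port
      }
    }
  }
}

# The Service would then reference it through an extra annotation next to the
# existing "cloud.google.com/neg" one (fragment of metadata.annotations):
#
#   "cloud.google.com/backend-config" = jsonencode({
#     default = "${local.environment}-backend-config"
#   })
```

Whether the port should be 3000 or the Service port depends on how the backend is wired (NEG versus instance groups), so that value is worth checking against the GKE Ingress documentation.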
- Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
level=debug msg="No image renderer found/installed. For image rendering support please install the grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/"
level=debug msg="Starting background service" service=*provisioning.ProvisioningServiceImpl
level=debug msg="Starting background service" service=*manager.PluginManager
level=debug msg="Starting background service" service=*updatechecker.GrafanaService
level=debug msg="Starting background service" service=*live.GrafanaLive
level=debug msg="Starting background service" service=*pushhttp.Gateway
level=debug msg="Starting background service" service=*thumbs.dummyService
level=debug msg="Stopped background service" service=*thumbs.dummyService reason=null
level=debug msg="Starting background service" service=*service.UsageStats
level=debug msg="Starting background service" service=*store.dummyEntityEventsService
level=debug msg="Starting background service" service=*tracing.Opentelemetry
level=debug msg="Starting background service" service=*remotecache.RemoteCache
level=debug msg="Stopped background service" service=*store.dummyEntityEventsService reason=null
level=debug msg="Starting background service" service=*manager.SecretsService
level= msg="storage starting"
level=debug msg="Stopped background service" service=*store.standardStorageService reason=null
level=debug msg="Starting background service" service=*manager.ServiceAccountsService
level=debug msg="Started Service Account Metrics collection service"
level=debug msg="Starting background service" service=*cleanup.CleanUpService
level=debug msg="Found old rendered file to delete" folder=/var/lib/grafana/png deleted=0 kept=0
level=debug msg="Found old rendered file to delete" folder=/var/lib/grafana/csv deleted=0 kept=0
level=debug msg="Starting background service" service=*statscollector.Service
level=debug msg="Starting background service" service=*api.HTTPServer
level= msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
level= msg="starting MultiOrg Alertmanager"
level=debug msg="kvstore value not found" orgId=0 namespace=infra.usagestats key=last_sent
level=debug msg="alert rules fetched" count=0
level=debug msg="alert rules fetched" count=0
level=debug msg="recording state cache metrics" now=2022-08-29T20:40:52.837593796Z
level=debug msg="alert rules fetched" count=0
level=info msg="Shutdown started" reason="System signal: terminated"
level=debug msg="Grafana is shutting down; stopping..."
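One detail in this log may be worth checking: the HTTP server reports listening on [::]:3000, and the shutdown reason is "System signal: terminated", which would be consistent with (though not proof of) the kubelet restarting the container after failed liveness probes. HTTP probes are sent to the pod IP on the given port, so host_port does not affect them. A minimal sketch of probes that target the container port instead, with timing values that are purely illustrative assumptions:

```hcl
# Fragment for the container block of kubernetes_deployment.grafana.
readiness_probe {
  http_get {
    path = "/api/health"
    port = 3000 # the port from the "HTTP Server Listen" log line
  }
  initial_delay_seconds = 10 # illustrative value
  period_seconds        = 10 # illustrative value
}

liveness_probe {
  http_get {
    path = "/api/health"
    port = 3000
  }
  initial_delay_seconds = 30 # illustrative value, gives migrations time to finish
  period_seconds        = 10 # illustrative value
  failure_threshold     = 3  # illustrative value
}

# Fragment for the "http" port block of kubernetes_service.load_balancer.
# Without target_port, the Service forwards to the same number as port (80).
port {
  name        = "http"
  port        = 80
  target_port = 3000 # assumption: traffic should reach Grafana's listen port
  node_port   = 31337
}
```

If the restarts stop with the probes pointed at 3000, the image renderer message can likely be ignored, since it is logged at debug level and Grafana continues starting its services after it.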
- Did you follow any online instructions? If so, what is the URL?
No. This is based on previous deployments that I’ve done for other services, so I was a bit surprised to see my Ingress reporting an Unhealthy status, given that the same setup works for my other services.