Hello, I tried to deploy Tempo & Tempo-vulture on Kubernetes but it seems like a high number of Tempo Vulture queries are failing (404), even though ingester seems OK.
Here is a bit of context:
Custom Helm Chart Templates:
storage:
trace:
backend: gcs
gcs:
bucket_name: grafana-tempo-storage
traces:
jaeger:
# -- Enable Tempo to ingest Jaeger GRPC traces
grpc: true
# -- Enable Tempo to ingest Jaeger Thrift Binary traces
thriftBinary: false
# -- Enable Tempo to ingest Jaeger Thrift Compact traces
thriftCompact: false
# -- Enable Tempo to ingest Jaeger Thrift HTTP traces
thriftHttp: false
# -- Enable Tempo to ingest Zipkin traces
zipkin: false
otlp:
# -- Enable Tempo to ingest Open Telementry HTTP traces
http: false
# -- Enable Tempo to ingest Open Telementry GRPC traces
grpc: true
# -- Enable Tempo to ingest Open Census traces
opencensus: false
For tempo Vulture:
tempoAddress:
# -- the url towards your Tempo distributor, e.g. http://distributor
push: http://tempo-tempo-distributed-distributor
# -- the url towards your Tempo query-frontend, e.g. http://query-frontend:3100
query: http://tempo-tempo-distributed-query-frontend:3100
Ingester metrics:
Tempo Vulture Logs:
E0827 09:37:13.720721 1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/7f77b5ead87c970307da776e48f53d01
E0827 09:37:43.720688 1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/312c83960a99aba1078f439f1339691f
E0827 09:37:43.733165 1 main.go:220] trace not found 312c83960a99aba1078f439f1339691f
E0827 09:37:43.733193 1 main.go:115] error querying Tempo trace not found
E0827 09:38:13.720435 1 main.go:217] tempo url http://tempo-tempo-distributed-query-frontend:3100/api/traces/2ec0d3ff939e977d5f3d81178d499b0b
E0827 09:38:13.738969 1 main.go:220] trace not found 2ec0d3ff939e977d5f3d81178d499b0b
E0827 09:38:13.738997 1 main.go:115] error querying Tempo trace not found
Query Frontend Logs:
level=info ts=2021-08-27T09:36:43.72913336Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=5c085d48f9b70c94 url=/api/traces/6a4128219711c2d03bd036a55c15bf4a duration=2.174074ms response_size=0 status=404
level=info ts=2021-08-27T09:37:13.748972787Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6ff7dfa7785bc705 url=/api/traces/7f77b5ead87c970307da776e48f53d01 duration=19.823453ms response_size=31175 status=200
level=info ts=2021-08-27T09:37:43.730438653Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=7dfd9078215a3d0a url=/api/traces/312c83960a99aba1078f439f1339691f duration=4.114527ms response_size=0 status=404
level=info ts=2021-08-27T09:38:13.736302197Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=1f2c2de172362a17 url=/api/traces/2ec0d3ff939e977d5f3d81178d499b0b duration=3.890153ms response_size=0 status=404
level=info ts=2021-08-27T09:38:43.731232202Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=430b8396d46bc714 url=/api/traces/1d1d191169694e534c925012e2057f8b duration=2.919309ms response_size=0 status=404
But once in a while, the querier succeeds to query it:
level=info ts=2021-08-27T07:59:43.74124059Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=273151edb55a0b2a url=/api/traces/15004a4f0a6d3ff47b504d2aad6719ea duration=12.496063ms response_size=4221 status=200
level=info ts=2021-08-27T08:21:13.745419617Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=58d9faad6d056415 url=/api/traces/40c7db016c97f94c6b3440cc84d1276b duration=16.035543ms response_size=19685 status=200
level=info ts=2021-08-27T08:21:43.730745961Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=723bbd08e52005e8 url=/api/traces/740c1be16a79090b6fd7658a9d6cc9eb duration=3.221735ms response_size=0 status=404
level=info ts=2021-08-27T08:23:13.733750734Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=4369729f9260c327 url=/api/traces/20041be668d2413169a69085b0af1dbe duration=3.215038ms response_size=0 status=404
level=info ts=2021-08-27T08:29:13.732850912Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=64d61ed72a7e4ee5 url=/api/traces/4ed8edabad630fbb70acc146420076f4 duration=4.500166ms response_size=0 status=404
level=info ts=2021-08-27T08:46:13.731772424Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=5326badbda65e07a url=/api/traces/30a3b3cd751cbc6f3bd64eaf11028f74 duration=3.12009ms response_size=0 status=404
level=info ts=2021-08-27T08:48:43.73743803Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6217c14e4d43bc40 url=/api/traces/2c95635565f805fe26c24d03880def8a duration=11.208868ms response_size=12860 status=200
level=info ts=2021-08-27T08:52:13.732720859Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6200951b87e460ef url=/api/traces/086cecfed1ce09b622f022967fa5cf2f duration=3.246412ms response_size=0 status=404
level=info ts=2021-08-27T09:01:44.836208631Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=636faee5ae8ac280 url=/api/traces/66ccbf1cff7118945ecd5f0f2a4e8758 duration=15.751986ms response_size=19375 status=200
level=info ts=2021-08-27T09:10:43.784799736Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=7a08465341a3ecb1 url=/api/traces/716e343c8d14ee9011712534887cae51 duration=57.297725ms response_size=37153 status=200
level=info ts=2021-08-27T09:11:43.837460553Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=75ea476fe1f90c53 url=/api/traces/6f36ca7e6561cc7138bae5eaa601a4d6 duration=109.769497ms response_size=11489 status=200
level=info ts=2021-08-27T09:14:13.731200855Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=3d23f042ae6fcdb3 url=/api/traces/33d9be3f677c8b1d066cdfc75ac999b4 duration=3.028284ms response_size=0 status=404
level=info ts=2021-08-27T09:21:43.732868039Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=17f6c643bf11a2d7 url=/api/traces/6207fca5ac776c64020011603cf47c4a duration=3.197556ms response_size=0 status=404
level=info ts=2021-08-27T09:37:13.748972787Z caller=frontend.go:108 tenant=single-tenant method=GET traceID=6ff7dfa7785bc705 url=/api/traces/7f77b5ead87c970307da776e48f53d01 duration=19.823453ms response_size=31175 status=200
I have no clue where the issue could be, it you can provide some investigation tips, it’s be appreciated!