I’d like to solicit the community’s thoughts on a migration that we’re considering. To cut a long story short, for the past few years we’ve been using InfluxDB to store and process IoT energy data (visualized in Grafana, obviously ). More or more often these days, we find ourselves coming up against various limitations inherent to Influx, especially with respect to queries that involve metadata; more on this below. In this context, we’re considering migrating to a relational database, with Timescale (or even vanilla Postgres) being the main contender.
Is this the kind of migration anyone here has done, and if so, would you mind sharing some experiences/watch-outs/gotchas? Searching online for info on this mostly turns up content published by Timescale, and I’d love to get a more independent view. In terms of gotchas, I’m primarily concerned about query performance. We’ve been particularly impressed with Influx’s ability to quickly run aggregation/selector queries on millions of datapoints, and would like to retain that with any future solution.
To give some more context on our use case and challenges, we have about a thousand energy systems around the world from which we collect operational data. Each of these systems – and corresponding datapoints – has a range of associated metadata (name, location, owner, operator, component types, operating parameters such as setpoints, etc). In order to enable us to run Influx queries that incorporate this metadata, we basically apply most of it as tags to each timeseries datapoint. This means that we usually have 20+ tags applied – and as any Influx user knows, tags get pretty cumbersome pretty quickly.
A better solution would be to keep the metadata separate from the timeseries data. By the way, our metadata is all stored in Postgres already. The following options appear to currently be on the table:
- Embrace the Influx universe fully, and use Flux to do joins on Influx data (with minimal tags) together with metadata from Postgres.
- Leverage Grafana 7’s ability to query and combine data from multiple datasources (Influx + Postgres)
- Switch to a purely relational setup, by migrating the timeseries data to Postgres/Timescale, where we can query everything with SQL
We’ve experimented with option 1, and while it mostly works, there are various kinks to be worked out – e.g. with respect to performance. Plus the Flux learning curve is still rather steep. We’ve not tried option 2 yet, but I suspect it will suffer some of the same drawbacks. Option 3 seems like the cleanest solution, but we’d like to get smart on what we’re getting into before going too deep on it. Hence this post.
Thanks in advance for any pointers!