Experiences migrating between Influx and Timescale/Postgres?

svetb · May 28, 2020, 4:11pm

I’d like to solicit the community’s thoughts on a migration that we’re considering. To cut a long story short, for the past few years we’ve been using InfluxDB to store and process IoT energy data (visualized in Grafana, obviously ). More or more often these days, we find ourselves coming up against various limitations inherent to Influx, especially with respect to queries that involve metadata; more on this below. In this context, we’re considering migrating to a relational database, with Timescale (or even vanilla Postgres) being the main contender.

Is this the kind of migration anyone here has done, and if so, would you mind sharing some experiences/watch-outs/gotchas? Searching online for info on this mostly turns up content published by Timescale, and I’d love to get a more independent view. In terms of gotchas, I’m primarily concerned about query performance. We’ve been particularly impressed with Influx’s ability to quickly run aggregation/selector queries on millions of datapoints, and would like to retain that with any future solution.

To give some more context on our use case and challenges, we have about a thousand energy systems around the world from which we collect operational data. Each of these systems – and corresponding datapoints – has a range of associated metadata (name, location, owner, operator, component types, operating parameters such as setpoints, etc). In order to enable us to run Influx queries that incorporate this metadata, we basically apply most of it as tags to each timeseries datapoint. This means that we usually have 20+ tags applied – and as any Influx user knows, tags get pretty cumbersome pretty quickly.

A better solution would be to keep the metadata separate from the timeseries data. By the way, our metadata is all stored in Postgres already. The following options appear to currently be on the table:

Embrace the Influx universe fully, and use Flux to do joins on Influx data (with minimal tags) together with metadata from Postgres.
Leverage Grafana 7’s ability to query and combine data from multiple datasources (Influx + Postgres)
Switch to a purely relational setup, by migrating the timeseries data to Postgres/Timescale, where we can query everything with SQL

We’ve experimented with option 1, and while it mostly works, there are various kinks to be worked out – e.g. with respect to performance. Plus the Flux learning curve is still rather steep. We’ve not tried option 2 yet, but I suspect it will suffer some of the same drawbacks. Option 3 seems like the cleanest solution, but we’d like to get smart on what we’re getting into before going too deep on it. Hence this post.

Thanks in advance for any pointers!

amotl · May 28, 2020, 5:55pm

Dear @svetb,

your questions and thoughts resonate very much with us. Up until know, we took approach 1.+2. with Luftdatenpumpe, for the very same reasons you have been outlining within your post. Actually, we are using PostGIS, because the metadata contains location information, upon which we wanted to obviously use geospatial querying features which is also something InfluxDB doesn’t provide at all.

While we are pretty much satisfied with the current setup to deliver the best of both worlds to us, we have always looked into the direction of TimescaleDB since its advent as it promises to deliver all of these features out of the box. Obviously, it will become cumbersome to hold the line of keeping everything synchronized between two different DBMS on the data ingestion and population side and we actually ran into different issues regarding these very aspects.

However, we haven’t had the chance to get into this topic and will be all ears about anything which we might gain from this topic here (bookmarked and subscribed just now ;]).

Actually, we would like to use/unlock/bring together all of these features from PostgreSQL within the same datastore to get the best of all worlds:

Relational through vanilla PostgresSQL
Geospatial through PostGIS
Timeseries through TimescaleDB
Document store through JSON types, functions and operators of vanilla PostgresSQL
Full-text search through GIN and GiST indexes of vanilla PostgresSQL

While not everything might be available within Grafana from the start, we are thinking of many options how this actually can make sense. There has been a reason we’ve put quite some efforts into Grafana Map Panel, a fork of the original Grafana Worldmap Panel, in order to stretch the boundaries here and would like to see Grafana become an universal tool bringing everything together to efficiently navigate within space and time on behalf of different kinds of data and metadata.

Thanks again for bringing up that topic and sorry that we can’t provide any specific answers to your questions yet. Nevertheless, we found it appropriate to attach our thoughts here. Please let us know about any outcome of your journey.

Good luck and with kind regards,
Andreas.

nwitham · June 1, 2020, 9:40pm

I am super curious as well.

Our organization is using Influx for the past 2-3 years. We ingest 250k points per minute of IoT data (or intended to be used there eventually) and no one is sure if Influx is right for us or if there is better.

As you said, major upside is the quickly running aggregation/selector queries on millions (or billions) of datapoints.

Downside, no one here really wants to learn Flux unless we have to, and everyone knows SQL.
The idea behind having the DB structured has its plusses a minuses itself… such as restricting data ingestion and providing a logical balance. For us almost all our data is related in some fashion to a piece of equipment, so everything is in one big table which makes things REALLY EASY but bad for other reasons.

We have recently moved to the clustered Influx which short term will solve a lot of problems, but not sure if TimeScale or other DB will solve the problems better.

We’ve also found the analytical tools for Influx severely lacking. Want to know what’s going on? Well you better have been watching the DB when it happened. You cannot analyze quite as well as SQL lets you either. Something I very much want to do to troubleshoot issues without having to depend on grafana development and tools for a third party DB.

Edit-
I should also add, I was speaking privately with someone named David from Grafana and he mentioned if you needed such small measurement data (millisecond/second) then one of the few viable options is Postgres.
There might be more out there, but I personally have not found any to be contenders short term.

EditEdit -

I have reached out to TimeScaleDB and hopefully once they get back to me they can provide some insight on the migration process if they have any other clients that have done so. I am not sure if we even will persue Timescale at this point, but i’ll update you with what I hear if its anything valuable.

endersonmaia · June 2, 2020, 12:55pm

are you aware of https://www.outfluxdata.com ?

svetb · June 3, 2020, 10:14am

Thanks everyone for the input! A few quick thoughts from my side:

Yes, that’s a very pertinent point too! I am all in favor of having a query language fit for the time-series era, but on the other hand my experience with Flux so far is that even simple things require verbose and not always intuitive queries.

Would be interested to hear what concrete benefits Influx Enterprise has had for you, beyond performance (which for us is OK, even with a single - beefy - node). Is it mostly around fine-grained authorization? Or are there other potential Influx tricks I’ve missed?

Hm interesting…I wonder what the context for that comment is. Influx in principle supports nanosecond granularity, and I’ve seen successful millisecond-scale ingestion also. So this isn’t an area where I’ve seen Postgres as having advantages, but again maybe I’ve missed something.

I suppose that was primarily a response to @nwitham’s last point. I personally haven’t had the chance to check it out properly - but hope to soon. Either way, my hope is to make an informed longer-term choice ahead of a potential migration.

nwitham · June 3, 2020, 2:32pm

No, we got it mainly for stability. I know a little crazy to throw that money but we were on a single node hosting grafana, influx and a bunch of other things, it was reaching a breaking point with users. We are now on a 4 data node cluster. We have yet to transition users as we want to get a backlog of data first without spending time writing a complex data migration process

Sorry I misquoted him, nansecond is correct. That said he seemed to only suggest postgresql as a viable alternative that supports this level of ingestion and data granularity. Do you know of any others you are considering or is it only postgresql in the running?

svetb · June 5, 2020, 1:46pm

Thanks for the clarifications @nwitham, that makes a lot of sense!

Regarding the potential alternatives, it is primarily Postgres that we’re looking at currently - potentially with Timescale on top. The Timescale “Community” licensing also makes a lot of sense to us, given that it would allows us to run it in production and scale with it for the foreseeable future (with HA, etc), without sinking a lot of $$.

One other alternative that’s been on the radar is CrateDB. Haven’t looked into it sufficiently to know if it’s right for us in terms of functionality (and/or better than Postgres/Timescale). But its licensing/functionality is structured in a way such that that we’d need to go for Enterprise off the bat in order to use it in production, which is a bit of a hurdle.

Would be interested to hear of others that we should maybe be looking at?

nwitham · June 12, 2020, 3:20pm

ShookIOT/Fusion, Inmation (MongoDB), TimeScaleDB.

Basically we are currently doing 500gb of data per day and need to really know what is right for us, we don’t want to transition for a few years anyways.

Met with Timescale folks yesterday and they think they can meet a lot of our requirements, the big plus for us being better tracing tools and native SQL. Biggest downside is “unknowns” in regards to performance and that we must have a schema since we are technically schema-less in Influx right now.

akulk · June 15, 2020, 10:33pm

Hi all,

Ajay from TimescaleDB here. One of our community members recently brought this post to our attention.

@svetb - We see a lot of migrations from InfluxDB. So many in fact that we built Outflux as a general tool to make this easier (as another poster references).

amotl. This list is entirely possible within TimescaleDB, as it supports everything inherent in PostgreSQL:

Relational through vanilla PostgresSQL
Geospatial through PostGIS
Timeseries through TimescaleDB
Document store through JSON types, functions and operators of vanilla PostgresSQL
Full-text search through GIN and GiST indexes of vanilla PostgresSQL

In particular TimescaleDB + PostGIS is used quite a bit for geo-temporal data.

@nwitham Yes - SQL is much more powerful than Flux, especially when you consider that there is 0 learning curve for most developers. I hope you were able to get in touch with someone from TimescaleDB, but please let me know if not.

akulk · June 15, 2020, 10:41pm

A couple more points that might be helpful:

For migrating from InfluxDB, Outflux offers offline migration from InfluxDB to TimescaleDB.

For live migration you can use Telegraf along with the output plugin we built (https://docs.timescale.com/latest/tutorials/telegraf-output-plugin). You can configure Telegraf to act as a InfluxDB endpoint which will then forward the metrics to TimescaleDB.

Hope this help!

Topic		Replies	Views
Postgres schema for IoT sensors PostgreSQL postgres	1	1778	February 17, 2019
Question about a DB for Grafana Grafana	14	1055	July 19, 2022
Joining two queries in Grafana Grafana	0	1037	January 28, 2020
Sensor data for industrial IoT - Architectural question InfluxDB	0	301	March 21, 2021
Transitioning from InfluxDB 1.x InfluxQL to 2.x Flux query language Grafana	4	7821	July 26, 2020

Experiences migrating between Influx and Timescale/Postgres?

Related topics