Timescaledb scaleback data saved for larger tests

mattyboz · November 10, 2022, 4:34pm

Hey all,

I’m currently playing around with the TimescaleDB extension and saving the test data to report back in Grafana. For now I’m limiting these runs to a small number of users and short durations, to cut down on the large amount of data that has to be saved and processed.

I’m wondering whether it’s possible to cut down the data saved while maintaining the same generated load. For example, run 250 VUs for 30 minutes, generating 500K requests, but submit only 10% of the data to TimescaleDB: if a VU sent 100 requests over a time period, only 10 of those requests would be submitted for storage. We would trade some accuracy and clarity of data for the ability to have broad views of large tests. Essentially, generating lots of test data is easy, while limiting the data saved is difficult. Perhaps this is easier to do via the data summary output than through the plugin.

I hope I’m explaining this well and would be interested to hear what others think.

mattyboz · November 10, 2022, 4:53pm

To be a little clearer.

Does anyone know of a way to accomplish what I want to do here? If so, what is the best approach?
If not, would this ability at least be useful to others?

imiric · November 14, 2022, 3:48pm

Hi Matt, welcome to the forum!

This is interesting, and I can’t say I’ve heard anyone mention it before.

There’s currently no native way in k6 to limit the data sent to outputs. It’s an all-or-nothing approach.

There’s a long-standing open issue about avoiding sending some data, but besides being slightly problematic to implement, as you can see from the discussion, it’s meant more for restricting certain metrics, not a percentage of the data as you mention here.

The only way I can currently think of to do what you want is to implement a custom “wrapper” output that would conditionally submit only a fraction of the data received from k6 to the specific backend, depending on whatever logic you need. This is not quite trivial, and would require knowledge of Go and familiarity with our extension system, but it should be possible. If you want to give this a shot, see our guide for creating output extensions. Other users might find this useful, so I encourage you to share it with the community in our Extensions forum.
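To make the “wrapper” idea a bit more concrete, here is a minimal, self-contained Go sketch of just the sampling logic. It does not use the real k6 output extension API: `Sample`, `Sink`, `SamplingSink` and `MemorySink` are hypothetical stand-ins invented for illustration, and a real extension would have to implement the interface described in the output extensions guide instead. It assumes a simple “keep 1 in N per VU” rule, which matches the 10-out-of-100 example from the first post.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for one metric data point.
// It is NOT the k6 sample type.
type Sample struct {
	Metric string
	VU     int
	Value  float64
}

// Sink is a hypothetical stand-in for a backend writer,
// e.g. whatever ends up inserting rows into TimescaleDB.
type Sink interface {
	AddSamples(samples []Sample)
}

// SamplingSink wraps another Sink and forwards only 1 out of every
// keepOne samples per VU. It is not safe for concurrent use; a real
// output would need a mutex or a single flushing goroutine.
type SamplingSink struct {
	next    Sink
	keepOne int
	seen    map[int]int // samples seen so far, keyed by VU
}

// NewSamplingSink treats keepOne < 1 as "keep everything".
func NewSamplingSink(next Sink, keepOne int) *SamplingSink {
	if keepOne < 1 {
		keepOne = 1
	}
	return &SamplingSink{next: next, keepOne: keepOne, seen: make(map[int]int)}
}

// AddSamples keeps the 1st, (keepOne+1)th, ... sample of each VU
// and drops the rest before handing the batch to the wrapped Sink.
func (s *SamplingSink) AddSamples(samples []Sample) {
	kept := make([]Sample, 0, len(samples)/s.keepOne+1)
	for _, sm := range samples {
		if s.seen[sm.VU]%s.keepOne == 0 {
			kept = append(kept, sm)
		}
		s.seen[sm.VU]++
	}
	if len(kept) > 0 {
		s.next.AddSamples(kept)
	}
}

// MemorySink collects samples in memory so the sketch can run
// without a database.
type MemorySink struct {
	Stored []Sample
}

func (m *MemorySink) AddSamples(samples []Sample) {
	m.Stored = append(m.Stored, samples...)
}

func main() {
	backend := &MemorySink{}
	out := NewSamplingSink(backend, 10) // keep roughly 10% per VU

	batch := make([]Sample, 0, 100)
	for i := 0; i < 100; i++ {
		batch = append(batch, Sample{Metric: "http_req_duration", VU: 1, Value: float64(i)})
	}
	out.AddSamples(batch)

	fmt.Printf("received %d samples, stored %d\n", len(batch), len(backend.Stored))
}
```

One caveat with any scheme like this: dropping samples uniformly is reasonable for timing-style metrics, but counters would be undercounted in the backend unless the stored values are scaled back up or counter metrics are exempted from sampling.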

The data summary output wouldn’t be of much help, since it’s essentially a built-in aggregator of the same data that outputs see. It might help if you tag only certain requests, since you could then get aggregated stats for just those tags. That still wouldn’t prevent the data from being processed, though, and everything would still be sent to Timescale.

Good luck!

mattyboz · November 14, 2022, 9:05pm

Thank you for the welcome! I’m somewhat surprised no one has mentioned it before, as control of data granularity has been a concern in many other tools, though I guess less so now that unlimited resources are kinda assumed.

Thank you for your suggestions. I’ll have a look at the guide, though it’s doubtful I’ll have bandwidth for this project in the near future.

Thanks again!
