I’m currently playing around with the timescaledb extension and saving the test data for reporting in Grafana. At the moment I’m limiting these runs to a smaller number of users and shorter durations to cut down on the amount of data that needs to be saved and processed.
I’m wondering about the possibility of cutting down the data saved while still generating the same load. For example: run 250 VUs for 30 minutes, producing 500K requests, but only submit 10% of the data to TimescaleDB, so if a VU sent 100 requests over a time period, only 10 of those requests would be submitted for storage. We trade off some accuracy and clarity of the data for the ability to get broad views of large tests. Essentially, we can generate lots of test data (easy) while limiting the data saved (difficult). Perhaps this could be done more easily via a data summary output than through the plugin.
I hope I’m explaining this well and would be interested to hear what others think.
This is interesting, and I can’t say I’ve heard anyone mention it before.
There’s currently no native way in k6 to limit the data sent to outputs; it’s an all-or-nothing approach.
There’s a long-open issue about avoiding sending some data, but besides being slightly problematic to implement, as you can see from the discussion, it’s aimed at restricting certain metrics, not a percentage of the data as you mention here.
The only way I can currently think of to do what you want is to implement a custom “wrapper” output that conditionally submits only a fraction of the data received from k6 to the specific backend, depending on whatever logic you need. This isn’t quite trivial, and it would require knowledge of Go and familiarity with our extension system, but it should be possible. If you want to give this a shot, see our guide for creating output extensions. Other users might find this useful too, so I encourage you to share it with the community in our Extensions forum.
The data summary output wouldn’t be of much help, since it’s essentially a built-in aggregator over the same data that outputs see. Tagging only certain requests might help, since you could then get aggregated stats for just those tags, but that still wouldn’t avoid the data being processed, and everything would still be sent to Timescale.
Thank you for the welcome! I’m somewhat surprised no one has mentioned it before, as control of data granularity has been a concern in many other tools, though I suppose less so now that unlimited resources are kind of assumed.
Thank you for your suggestions. I’ll have a look at the guide, though it’s doubtful I’ll have bandwidth for this project in the near future.