Questions related to parameterizing data to not use same data

jeevananthank · April 27, 2020, 11:03am

This post refers to a solution described in: When parameterizing data, how do I not use the same data more than once in a test?

I have an array of data, e.g. of length 1000, and I run the test with this command:
k6 run -u 5 -i 1000 test.js

var maxIter = 200;
var data = [/* 1000 elements */];
var uniqueNum = ((__VU * maxIter) - (maxIter) + (__ITER));
console.log(data[uniqueNum]);

qn 1: How likely is it that all 1000 elements in the array will be used?
qn 2: How likely is it that uniqueNum will never exceed 999?
qn 3: Does the command mean that each user will execute 200 iterations, or does that depend on the iteration duration? E.g., if VU1 has completed 200 iterations while VU2 has completed only 150, will VU1 go on to run further iterations until the maximum number of iterations is reached?

imiric · April 27, 2020, 2:08pm

Hi,

to answer your third question: the number of iterations is currently shared across VUs, so if one VU completes iterations quicker than others, it will “steal” i.e. run more iterations than the 200 per VU you’d like to run here, which would lead to either collisions or undefined data lookups where uniqueNum > 999.

Answering “how likely” this will happen is more difficult, as it will depend on the iteration duration, as you mention. If all iterations took the same amount of time, it would be less likely to happen, but given that you can’t precisely control this, I’d say this method is not a reliable way of sharing data equally between VUs. Someone please correct me if I’m wrong.
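To make the failure mode concrete, here is a minimal sketch of the index formula from the question written as a plain JavaScript function, so it can be run outside k6. The name `uniqueIndex` and its parameters are illustrative only: `vu` stands in for `__VU` (1-based) and `iter` for `__ITER` (0-based).

```javascript
// Same arithmetic as the question's uniqueNum, with __VU and __ITER passed in
// as arguments. uniqueIndex is a hypothetical helper name, not a k6 API.
function uniqueIndex(vu, iter, maxIter) {
  return vu * maxIter - maxIter + iter;
}

const maxIter = 200;

// While every VU stays within its own 200 iterations, the slices do not overlap:
console.log(uniqueIndex(1, 0, maxIter));   // 0    (first row of VU 1)
console.log(uniqueIndex(1, 199, maxIter)); // 199  (last row of VU 1)
console.log(uniqueIndex(5, 199, maxIter)); // 999  (last row of VU 5)

// If a fast VU "steals" a 201st iteration (iter === 200):
console.log(uniqueIndex(1, 200, maxIter)); // 200  -> collides with VU 2, iter 0
console.log(uniqueIndex(2, 0, maxIter));   // 200
console.log(uniqueIndex(5, 200, maxIter)); // 1000 -> past the end of a 1000-element array
```

So a single extra iteration on any VU is enough to produce either a duplicate row or an undefined lookup.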

In an upcoming release a per-VU-iterations executor will be introduced that allows executing an exact number of iterations per VU, see #1007. This is still a few weeks away from a public release, but you can compile and run on the new-schedulers branch to give it a try.
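For readers arriving after that release, here is a rough sketch of what such a configuration might look like. It assumes the scenarios API with a `per-vu-iterations` executor taking `vus`, `iterations` and `maxDuration`; the scenario name `unique_data` and the helper `rowFor` are made up for illustration. In an actual k6 script the object would be exported (`export const options = ...`); it is a plain constant here so the snippet also runs outside k6.

```javascript
// Sketch only: assumes the scenarios API with the 'per-vu-iterations' executor.
// In a k6 script this object would be exported as `export const options`.
const options = {
  scenarios: {
    unique_data: {                 // hypothetical scenario name
      executor: 'per-vu-iterations',
      vus: 5,
      iterations: 200,             // iterations per VU, not shared between VUs
      maxDuration: '1h',           // assumed upper bound; adjust to your test
    },
  },
};

// With exactly 200 iterations per VU, the question's formula stays in range.
// rowFor is a hypothetical helper; in k6 it would be called as rowFor(__VU, __ITER).
function rowFor(vu, iter) {
  const perVu = options.scenarios.unique_data.iterations;
  return (vu - 1) * perVu + iter;
}

console.log(rowFor(1, 0));   // 0
console.log(rowFor(5, 199)); // 999
```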

Hope this helps,

Ivan

mark · April 27, 2020, 4:26pm

Another workaround to consider with the current implementation is an if statement at the end of your script that prevents VUs from starting iteration 201, something like:

// requires: import { sleep } from 'k6';
if (__ITER == 199) { // __ITER is zero-based, so this is the end of iteration 200
  sleep(360); // Just some really long time that's longer than your test should take to finish
}

Or I guess you could get fancier and put it at the beginning of your default function:

if (__ITER >= 200) { // __ITER is zero-based, so 200 means the 201st iteration
  console.log("All iterations used up for VU " + __VU);
  sleep(360);
}

#1007 makes this easier, as @imiric mentions, so only consider this as a short-term workaround.
Hope this helps!

jeevananthank · April 27, 2020, 5:12pm

Thanks for the reply.
@imiric I get the point that a VU may “steal” iterations. In my scenario, the code in the default function sends API calls that download files of different sizes. So if I run the test as mentioned (k6 run -u 5 -i 1000 test.js), some VUs might download smaller files, and the iteration durations will not be the same. In my scenario, then, a VU stealing iterations and reusing data that has already been used is very likely to happen.

@mark thanks for the solution, but doesn't an if statement that pauses a VU after a certain number of iterations contradict the point of load testing the system with 5 parallel users? Once the condition is satisfied for one VU, only 4 VUs will be putting load on my system.

Given these points, can you suggest any other reasonable way to run a load test that uses all values in the data[] array and runs in parallel?

The only solution I can think of is to build the data[] array so that the total size of the files to be downloaded is distributed equally among the VUs. For example, if VU1 is supposed to download 1 GB of files in data[0-199], VU2 should also download 1 GB of files in data[200-399].
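One way to sketch the balancing idea from this post: sort the files by size and deal them out to the VUs in alternating ("snake") order, then concatenate the per-VU buckets so each VU still owns one contiguous slice. Everything here is hypothetical, including the `balanceSlices` helper, the `{ name, size }` record shape and the sample sizes. Equal byte totals also do not guarantee equal durations, so this only reduces the imbalance.

```javascript
// Hypothetical helper: reorder files so that each VU's contiguous slice has the
// same number of entries and a similar total size.
function balanceSlices(files, vus) {
  const sorted = [...files].sort((a, b) => b.size - a.size); // largest first
  const buckets = Array.from({ length: vus }, () => []);
  sorted.forEach((file, i) => {
    const round = Math.floor(i / vus);
    const pos = i % vus;
    // Reverse direction every other round so no VU always gets the largest file.
    const bucket = round % 2 === 0 ? pos : vus - 1 - pos;
    buckets[bucket].push(file);
  });
  // Concatenate so VU 1 owns indexes 0..n-1, VU 2 owns n..2n-1, and so on.
  return [].concat(...buckets);
}

// Example with made-up sizes (in MB): 10 files, 2 VUs.
const files = [900, 10, 500, 20, 300, 40, 700, 60, 100, 80].map((size, i) => ({
  name: `file-${i}`, // hypothetical file names
  size,
}));
const data = balanceSlices(files, 2);
const total = (slice) => slice.reduce((sum, f) => sum + f.size, 0);
console.log(total(data.slice(0, 5)), total(data.slice(5))); // 1360 1350
```

The slices only come out the same length when the number of files is a multiple of the number of VUs, as with 1000 files and 5 VUs.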

jeevananthank · May 8, 2020, 3:27pm

@mark is it not possible to kill a VU when a condition is satisfied?
If I use sleep, this will affect the average iteration duration, right?

mark · May 8, 2020, 10:08pm

@jeevananthank No, not currently. I believe #1007 makes this a little more straightforward though.

Yes, it would impact that one metric. If you need that metric, you can work around it by creating a custom metric that measures the iteration time between two points in your script and rely on that instead: e.g. get the start time, then the finish time, calculate the difference, and add it to the custom metric.
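A minimal sketch of that timing idea, written as plain JavaScript so it runs outside k6. The helper name `timed` and the `record` callback are assumptions for illustration; in a k6 script, `record` could be something like `(ms) => trend.add(ms)`, where `trend` is a Trend custom metric from the `k6/metrics` module.

```javascript
// Hypothetical helper: run `work`, then hand the elapsed milliseconds to `record`.
// The measurement is recorded even if `work` throws.
function timed(work, record) {
  const start = Date.now();
  try {
    return work();
  } finally {
    record(Date.now() - start);
  }
}

// Stand-alone usage: collect samples in an array instead of a k6 metric.
const samples = [];
const result = timed(() => 21 * 2, (ms) => samples.push(ms));
console.log(result, samples.length); // 42 1
```

Keeping the long sleep() outside the timed region means the pause does not show up in the custom measurement, even though it still inflates the built-in iteration duration.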
