Currently all VUs create a copy of the data file used in the request parameters, which eventually causes an out-of-memory exception when I use many VUs. Can I share the same data file between all VUs?
Can I pass the full CSV file via an environment variable and access the fields inside the default function through that variable?
Hi @sanjay.bansal57, sorry for the slow response.
There is currently no way to share memory between VUs. This is planned to change, but we currently have other priorities, and it will also require some thought, as we need to consider distributed execution and how it will work in combination with everything else.
For now, my proposals are:
- Lower the amount of test data you use. I have seen people with 20+ MB of CSV files, and while there are probably good reasons for such things, if you can reduce the amount of data you need, that will be the fastest way to solve it.
- Don’t use CSV, use JSON:
JSON.parse(open("data.json"))
This doesn’t require papaparse, and papaparse adds additional memory usage, as it needs to be loaded by all VUs, which also increases your load times.
- Check that you don’t have a global variable keeping data you don’t need. For example:
let rawData = open("data.json");
let data = JSON.parse(rawData);
In this code, rawData will not be garbage collected and will be kept around until the end of the test. If you need to do something with rawData, or with any other big objects that will not be needed later, I recommend putting them in a lambda (see the full-script sketch after this list), like:
let data = (function () {
    let rawData = open("data.json");
    // do something to rawData
    let parsedData = JSON.parse(rawData);
    // do something to parsedData
    return parsedData; // only the final result stays referenced
})();
- I recently (this Monday) tried to split the data in the init context between VUs, so that each VU only takes 1/VUNUMBER (or at least some smaller part) of the data and keeps it in its own memory. The problem turned out to be that there is no __VU in the init context, so while the approach works, I only managed to do it with a random part. That still lowered the memory usage from 13GB to 1GB, so it is a real possibility: you can do something like getting only 1/10th of the data in each VU at random, which should lower the memory usage by a lot as well.
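To make the lambda approach from the third point concrete, here is a minimal, self-contained sketch (the data.json file, its array-of-objects shape, and the target URL are assumptions for illustration):
import http from "k6/http";

// Wrapping the parsing in an immediately invoked function means the raw
// string goes out of scope as soon as parsing finishes, so it can be
// garbage collected instead of living for the whole test.
let data = (function () {
    let rawData = open("data.json"); // assumed to be a big JSON array
    return JSON.parse(rawData);
})();

export default function () {
    // pick a record for this iteration; how it is used is an assumption
    let record = data[Math.floor(Math.random() * data.length)];
    http.post("https://example.com/endpoint", JSON.stringify(record));
}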
Thanks @mstoykov for your reply.
Could you please help me understand your point 4? How did you do this? Do you have any code sample for it?
I want to achieve 60K TPS with only 4 load generator machines.
The code should’ve been (but isn’t, because of a technical limitation):
const maxVUs = 200;
var data;
if (typeof __VU === "undefined") { // the init context is also executed once on its own, just so k6 knows which files will be needed
    open("data.json"); // the result is unused; the call only marks the file as needed
} else { // we have __VU
    data = (function () {
        var rawData = JSON.parse(open("data.json")); // we read and parse data.json, which is just a big array
        let partSize = Math.floor(rawData.length / maxVUs); // how many items each VU gets (with floor, a few items at the end may go unused; ceil would cover them, at the cost of a shorter last part)
        return rawData.slice(partSize * (__VU - 1), partSize * __VU); // __VU is 1-based, so shift it down to get only this VU's part
    })();
}
// do stuff with data
Unfortunately, __VU is not defined in the init context even when we are actually in a VU, which IMO is a bug. But, as previously stated, there are other priorities currently, and they will have an effect on this, so we will fix it when #1007 is merged :).
So we need to come up with some random number instead, and this is what I propose:
const maxVUs = 200;
// we don't check for __VU, as it is never defined in the init context
var data = (function () {
    var rawData = JSON.parse(open("data.json")); // we read and parse data.json, which is just a big array
    let partSize = Math.floor(rawData.length / maxVUs); // how many items each part gets (with floor, a few items at the end may go unused; ceil would cover them, at the cost of a shorter last part)
    let part = Math.floor(Math.random() * maxVUs); // just pick a random part number instead of __VU
    return rawData.slice(partSize * part, partSize * part + partSize); // we get only that part of the data
})();
// do stuff with data
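For completeness, a sketch of how the sliced data might then be consumed per iteration (cycling via __ITER; the exact access pattern is an assumption about your data's shape):
export default function () {
    // cycle through this VU's slice of the data, one item per iteration
    let item = data[__ITER % data.length];
    // ... use item to build the request
}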
In both cases, maxVUs needs to be defined by you. Given that only the second example currently works, if you have 200 VUs on a machine, I would recommend setting maxVUs to something like 20, so every VU gets 1/20 of the raw data. Obviously, in this case maxVUs is not correctly named, so maybe rename it to dataParts?
If you are going to split the test between 4 machines, and if it is applicable, you can also divide the data into 4 parts between the machines.
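A sketch of one way to do that, using an environment variable to tell each machine which part is its own (the MACHINE_INDEX name is my own invention, passed with something like k6 run -e MACHINE_INDEX=0 script.js):
const machines = 4;
// MACHINE_INDEX is a hypothetical per-machine env var with values 0..3
const machineIndex = parseInt(__ENV.MACHINE_INDEX || "0", 10);

var data = (function () {
    var rawData = JSON.parse(open("data.json"));
    let partSize = Math.floor(rawData.length / machines);
    return rawData.slice(partSize * machineIndex, partSize * machineIndex + partSize);
})();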
Something that I didn’t mention, as it is usually less of a problem when you have big data arrays that need to be loaded, is that from k6 v0.26.0 there is a compatibility mode option, which disables some syntax niceties but also lowers memory usage significantly for scripts that don’t use that much data.
Our benchmarks show a considerable drop in memory usage: around 80% for simple scripts, and around 50% in the case of a 2MB script with a lot of static data in it.
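For example, it can be enabled like this (assuming your script sticks to the syntax the base mode supports):
k6 run --compatibility-mode=base script.js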
Hope this helps you!