K6 test run got killed prematurely

lito · August 30, 2019, 6:18pm

I want to run a sustained 3-hour test with 16,000 VUs. I have a very large EC2 instance (32 CPUs) that hosts k6 along with InfluxDB and Grafana.

k6 run --out influxdb=http://localhost:8086/testset1 --discard-response-bodies --stage 10s:10,25s:100,30s:500,1m:1000,1m:2000,2m:4000,3m:6000,5m:8000,5m:10000,5m:12000,5m:14000,180m:16000,10s:1000,10s:500,10s:10 k6script2-multihosts.js

I have repeated this test twice, and both runs failed after about 50 minutes with the output below. I checked all the app servers that my k6 script calls and found no 5XX errors.

What could cause k6 to get killed prematurely? And where can I find k6 log information?

execution: local
output: influxdb=http://localhost:8086/testset1 (http://localhost:8086)
script: k6script2-multihosts.js

duration: -,  iterations: -
     vus: 1, max: 16000

Killedng [=============>--------------------------------------------] 50m24.9s / 3h28m35s
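For reference, the --stage list above adds up to 208m35s, which is the 3h28m35s shown in the progress bar. k6 ramps linearly from one stage's target to the next, so the 180m:16000 stage is a slow climb from 14,000 to 16,000 VUs rather than a 3-hour plateau at 16,000 (a plateau needs a stage whose target equals the previous one, e.g. 5m:16000 followed by 180m:16000). The Python sketch below checks that arithmetic and estimates the VU target when the process was killed at 50m24.9s. It assumes the run started from 1 VU, as in the output above, and only handles the h/m/s units used in this thread.

```python
# Sketch: parse a k6 --stage string ("DURATION:TARGET,...") and derive the
# total run length and the VU target at a given elapsed time.
# Assumptions: only h/m/s duration units are handled, the run starts from
# 1 VU, and k6 ramps linearly between consecutive stage targets.
import re

STAGES = ("10s:10,25s:100,30s:500,1m:1000,1m:2000,2m:4000,3m:6000,5m:8000,"
          "5m:10000,5m:12000,5m:14000,180m:16000,10s:1000,10s:500,10s:10")

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> float:
    """Convert a duration such as '1m30s' or '180m' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in parts)


def parse_stages(spec: str) -> list[tuple[float, int]]:
    """Return (duration_seconds, target_vus) pairs in stage order."""
    stages = []
    for item in spec.split(","):
        duration, target = item.split(":")
        stages.append((parse_duration(duration), int(target)))
    return stages


def vus_at(stages: list[tuple[float, int]], elapsed: float, start_vus: int = 1) -> float:
    """Linearly interpolate the VU target `elapsed` seconds into the run."""
    previous = start_vus
    for duration, target in stages:
        if elapsed <= duration:
            return previous + (target - previous) * elapsed / duration
        elapsed -= duration
        previous = target
    return float(previous)  # past the last stage: hold the final target


stages = parse_stages(STAGES)
total = sum(duration for duration, _ in stages)
print(f"total: {total:.0f}s")  # 12515s, i.e. 3h28m35s
print(f"VUs at 50m24.9s: {vus_at(stages, 50 * 60 + 24.9):.0f}")  # about 14248
```

By this estimate k6 was running roughly 14,250 VUs when it was killed, so it never reached the planned 16,000.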

imiric · September 2, 2019, 8:14am

Hi, welcome to the forum!

Which k6 version are you using (check with k6 version)? Also, try running with the verbose option enabled (-v), which should give you more information when the issue occurs.

At first glance this seems like an out-of-memory issue, where the process is killed by the OOM killer. Make sure swap is enabled and configured correctly on your system.
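One way to confirm the OOM theory is to look in the kernel log (for example the output of dmesg, which may need root, or journalctl -k) for an OOM-killer entry naming k6. The Python sketch below scans such log text. The message wording differs between kernel versions, so the pattern is an assumption you may need to adjust; the sample log lines are hypothetical.

```python
# Sketch: find OOM-killer entries for a given process name in kernel log text
# (for example the output of `dmesg` or `journalctl -k`).
# Assumption: wording varies by kernel version ("Kill process" on older
# kernels, "Killed process" on newer ones, "Memory cgroup out of memory" for
# cgroup limits), so the pattern is deliberately loose.
import re

OOM_PATTERN = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(kernel_log: str, name: str = "k6") -> list[int]:
    """Return PIDs of processes named `name` that the OOM killer terminated."""
    return [int(pid) for pid, proc in OOM_PATTERN.findall(kernel_log) if proc == name]


# Hypothetical log excerpt, for illustration only.
sample = (
    "[3027.118] Out of memory: Killed process 4242 (k6) total-vm:...\n"
    "[3027.120] oom_reaper: reaped process 4242 (k6)\n"
)
print(find_oom_kills(sample))  # [4242]
```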

Unfortunately, there are currently a few known issues regarding memory usage in k6 for long-running tests: #1068 and #1113. Since you’re outputting to InfluxDB, you can try running with --no-thresholds --no-summary, as suggested in #1068.
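Combining the original command with the flags suggested in #1068 might look like the Python sketch below. It only builds the argument list, using flags already mentioned in this thread, and interprets the return code. Python reports a process killed by a signal as a negative return code, so -9 means SIGKILL, which is what the OOM killer sends; a shell such as bash reports the same situation as exit status 137 (128 + 9) and prints "Killed". The shortened stage string is just for illustration, and the code assumes a POSIX system.

```python
# Sketch: assemble the k6 command from the original post plus the
# memory-saving flags suggested in #1068, and explain how the run ended.
# Assumption: k6 is on PATH; only flags already mentioned in this thread are used.
import signal


def build_k6_argv(script: str, influx_url: str, stages: str, low_memory: bool = True) -> list[str]:
    """Build the argument list for `k6 run`."""
    argv = ["k6", "run", "--out", f"influxdb={influx_url}",
            "--discard-response-bodies", "--stage", stages]
    if low_memory:
        # Skip threshold evaluation and the end-of-test summary; the raw
        # metrics still go to InfluxDB.
        argv += ["--no-thresholds", "--no-summary"]
    argv.append(script)
    return argv


def describe_exit(returncode: int) -> str:
    """Interpret a subprocess return code (POSIX: negative means killed by a signal)."""
    if returncode == -signal.SIGKILL:
        return "killed by SIGKILL (check the kernel log for the OOM killer)"
    if returncode < 0:
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


argv = build_k6_argv("k6script2-multihosts.js", "http://localhost:8086/testset1",
                     "10s:10,25s:100,30s:500")
print(" ".join(argv))
print(describe_exit(-9))
```

To actually run it, you could call subprocess.run(argv) and pass result.returncode to describe_exit().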

Keep in mind that even if these issues didn’t exist, running a 16,000 VU test on a single machine is very memory intensive. From my personal tests, a single VU can take up roughly 5 MB of RAM, which would mean about 80 GB for 16K VUs, so make sure you have plenty of RAM on your EC2 instance.
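The arithmetic behind that estimate, as a small Python sketch. The 5 MB per VU figure is the rough number quoted above, not a constant; scripts that keep more state, or long runs that accumulate metrics, can use considerably more. The 64 GB instance size and the 20% headroom in the example are hypothetical values.

```python
# Back-of-the-envelope RAM estimate for k6 VUs.
# Assumption: ~5 MB per VU (the rough figure above); decimal units, 1 GB = 1000 MB.
import math


def estimate_ram_gb(vus: int, mb_per_vu: float = 5.0) -> float:
    """Estimated RAM in GB needed for `vus` virtual users."""
    return vus * mb_per_vu / 1000


def max_vus_for_ram(ram_gb: float, mb_per_vu: float = 5.0, headroom: float = 0.2) -> int:
    """VUs that fit in `ram_gb`, keeping a fraction free for the OS and other processes."""
    usable_mb = ram_gb * 1000 * (1 - headroom)
    return math.floor(usable_mb / mb_per_vu)


print(estimate_ram_gb(16_000))  # 80.0
print(max_vus_for_ram(64))      # 10240 on a hypothetical 64 GB instance
```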

Besides the above, I would advise:

- Running InfluxDB on a separate EC2 instance from the one running k6, since InfluxDB uses a lot of RAM by itself.
- Splitting the workload across several EC2 instances and running a smaller chunk of the test on each. In upcoming k6 versions this type of clustering will be easier to achieve, but currently you’ll have to manage it manually. All instances can write to the same InfluxDB database, so you’d still have aggregated results.
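A minimal sketch of splitting the workload manually: divide every stage target of the original --stage profile across N instances so that the per-instance targets still add up to the totals. It assumes every instance runs the same script with its own --stage string and writes to the same InfluxDB database; starting the instances at the same moment is left to you. split_stages and split_target are hypothetical helper names, not part of k6.

```python
# Sketch: split one --stage profile across N k6 instances so that the
# per-instance targets sum to the original targets.
# Assumption: each instance runs the same script with its own --stage string.


def split_target(target: int, instances: int, index: int) -> int:
    """Share of `target` VUs for instance `index` (0-based); shares sum to `target`."""
    base, remainder = divmod(target, instances)
    return base + (1 if index < remainder else 0)


def split_stages(spec: str, instances: int) -> list[str]:
    """Return one --stage string per instance, keeping the stage durations."""
    stages = [item.split(":") for item in spec.split(",")]
    return [
        ",".join(f"{duration}:{split_target(int(target), instances, i)}"
                 for duration, target in stages)
        for i in range(instances)
    ]


for line in split_stages("10s:10,5m:14000,180m:16000", 4):
    print(line)
```

Each printed line is meant for the --stage flag of one instance; with 4 instances, the 16,000 VU peak becomes 4,000 VUs per machine.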

Hope this helps and let me know if you have further questions.

Ivan

lito · September 3, 2019, 9:52pm

Hi Ivan,

I will follow your advice and refactor our k6 setup to run on a separate EC2 instance that is large enough for 16,000 users, and see how it goes.

lito · September 4, 2019, 6:16pm

Long-running tests with trends are a real memory hog. My test shows k6 easily using up 300 GB of memory with 16,000 VUs and trends enabled. The only way to do a really large-scale test is to split it across multiple k6 instances without trends, and rely on external means such as InfluxDB to collect the metrics.

However, I also encountered quite a few InfluxDB errors: ERRO[2750] InfluxDB: Couldn't write stats error="{"error":"timeout"}\n". I have already set max-body-size = 0 in influxdb.conf.

imiric · September 5, 2019, 9:12am

The only way to do a really large-scale test is to split it across multiple k6 instances without trends, and rely on external means such as InfluxDB to collect the metrics.

You don’t necessarily have to turn trends off, but it’s definitely recommended to split a resource-intensive test across several k6 nodes and, yes, rely on InfluxDB for metric collection.

Regarding the InfluxDB errors, please try compiling and running k6 from the latest master (instructions here), since some optimizations were recently introduced to address issues writing to InfluxDB and to lower memory consumption. See #1113. In particular, you can try tweaking the new K6_INFLUXDB_PUSH_INTERVAL and K6_INFLUXDB_CONCURRENT_WRITES options; see the PR for details.
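Since these options are environment variables, one way to experiment with them is to set them when launching k6, as in the Python sketch below. The values are illustrative assumptions to start from, not recommendations or defaults. A longer push interval should mean fewer, larger write requests, and fewer concurrent writes should put less simultaneous load on InfluxDB, but check the PR for the exact semantics.

```python
# Sketch: environment for a k6 run with the InfluxDB output tuning options
# from the PR referenced above.
# Assumption: "5s" and 4 are illustrative values to experiment with, not
# recommended or default settings.
import os


def influx_tuning_env(push_interval: str = "5s", concurrent_writes: int = 4) -> dict[str, str]:
    """Copy the current environment and add the two InfluxDB output options."""
    env = dict(os.environ)
    env["K6_INFLUXDB_PUSH_INTERVAL"] = push_interval
    env["K6_INFLUXDB_CONCURRENT_WRITES"] = str(concurrent_writes)
    return env


env = influx_tuning_env()
print(env["K6_INFLUXDB_PUSH_INTERVAL"], env["K6_INFLUXDB_CONCURRENT_WRITES"])
```

You would pass it as subprocess.run(argv, env=influx_tuning_env()), where argv is your usual k6 run command; in a shell, the equivalent is prefixing the command with K6_INFLUXDB_PUSH_INTERVAL=5s K6_INFLUXDB_CONCURRENT_WRITES=4.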

If you’re still having issues, consider evaluating Load Impact’s Cloud Execution service. With an upgraded subscription (contact sales@loadimpact.com) it can handle your load of 16K VUs without any issues.

Hope this helps,

Ivan
