k6 test run got killed prematurely

I want to run a sustained 3-hour test with 16,000 VUs. I have a very large EC2 instance (32 CPUs) that hosts k6 along with InfluxDB and Grafana.

k6 run --out influxdb=http://localhost:8086/testset1 --discard-response-bodies --stage 10s:10,25s:100,30s:500,1m:1000,1m:2000,2m:4000,3m:6000,5m:8000,5m:10000,5m:12000,5m:14000,180m:16000,10s:1000,10s:500,10s:10 k6script2-multihosts.js

I have repeated this test twice, and both runs failed after about 50 minutes with the output below. I checked all the app servers that my k6 script calls and found no 5XX errors.

What could cause k6 to be killed prematurely? And where can I find k6 log information?

execution: local
output: influxdb=http://localhost:8086/testset1 (http://localhost:8086)
script: k6script2-multihosts.js

duration: -,  iterations: -
     vus: 1, max: 16000

running [=============>--------------------------------------------] 50m24.9s / 3h28m35s
Killed

Hi, welcome to the forum!

Which k6 version are you using (k6 version)? Also, try running with the verbose option enabled (-v), which should give you more information when the issue occurs.

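At first glance this looks like an out-of-memory issue, where the process is killed by the Linux OOM killer. The bare “Killed” line is what the shell prints when a process receives SIGKILL, which is the signal the OOM killer sends. Make sure swap is enabled and configured correctly on your system.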
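To confirm the OOM theory, and to answer where to look for logs: OOM kills are recorded in the kernel log, not in k6’s own output. Below is a minimal sketch, assuming a Linux host. k6_oom_kills is a hypothetical helper written for illustration; dmesg, journalctl and swapon are standard Linux tools.

```shell
#!/usr/bin/env bash
# k6_oom_kills (hypothetical helper): read kernel log lines on stdin and print
# those where the OOM killer terminated a process named "k6".
# Exit status is 0 if at least one line matched, 1 otherwise (grep's status).
k6_oom_kills() {
  # The kernel's OOM report contains "Killed process <pid> (<process name>)"
  grep -E 'Killed process [0-9]+ \(k6\)'
}

# Usage on the load generator (reading the kernel log may require sudo):
#   sudo dmesg -T | k6_oom_kills
#   sudo journalctl -k | k6_oom_kills
# List active swap devices/files (no output means no swap is configured):
#   swapon --show
```

If the filter prints a line whose timestamp matches the moment the test died, memory is the culprit.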

Unfortunately, there are currently a few known issues regarding memory usage in k6 for long-running tests: #1068 and #1113. Since you’re outputting to InfluxDB, you can try running with --no-thresholds --no-summary, as suggested in #1068.
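For example, the original command with verbose logging and those two flags added could look like the sketch below. run_long_test and the k6-run.log file name are hypothetical; the output is piped through tee so that whatever k6 prints before being killed is kept on disk. As I understand #1068, thresholds and the end-of-test summary make k6 keep metric samples in memory for the whole run, which is why turning them off helps when InfluxDB already stores the results.

```shell
#!/usr/bin/env bash
# run_long_test (hypothetical wrapper) STAGES [LOGFILE]
# Same command as in the question, plus:
#   -v                              verbose logging
#   --no-thresholds --no-summary    workaround suggested in #1068
# Everything k6 prints is also copied to LOGFILE (default: k6-run.log),
# so the last messages before a kill survive on disk.
run_long_test() {
  local stages=$1 log=${2:-k6-run.log}
  k6 run -v \
    --out influxdb=http://localhost:8086/testset1 \
    --discard-response-bodies \
    --no-thresholds --no-summary \
    --stage "$stages" \
    k6script2-multihosts.js 2>&1 | tee "$log"
}
```

Call it with the full stage list from the original command as the first argument.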

Keep in mind that even if these issues didn’t exist, running a 16,000 VU test on a single machine is very memory intensive. From my personal tests, a single VU can take up ~5MB of RAM, which works out to ~80GB for 16K VUs (16,000 × 5MB), so make sure you have plenty of RAM on your EC2 instance.
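The arithmetic above, plus a quick check of how many VUs a given host could hold, as a rough sketch. It reuses the ~5MB-per-VU estimate (real usage depends heavily on the script and on whether trends are kept), and vus_that_fit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope RAM check using the ~5MB-per-VU estimate
MB_PER_VU=5

# vus_that_fit (hypothetical helper) AVAILABLE_MB MB_PER_VU
vus_that_fit() {
  echo $(( $1 / $2 ))
}

# RAM the planned test needs (decimal GB: 1 GB = 1000 MB)
echo "needed for 16000 VUs: $(( 16000 * MB_PER_VU / 1000 )) GB"   # prints: needed for 16000 VUs: 80 GB

# MemAvailable in /proc/meminfo (Linux only) is in kB; divide by 1024 for MB
avail_mb=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null || echo 0)
echo "this host fits roughly $(vus_that_fit "${avail_mb:-0}" "$MB_PER_VU") VUs"
```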

Besides the above, I would advise:

  • running InfluxDB on a separate EC2 instance from k6, since InfluxDB itself uses a lot of RAM.
  • splitting the workload across several EC2 instances, each running a smaller share of the test. Upcoming k6 versions should make this kind of clustering easier, but for now you’ll have to manage it manually. All instances can write to the same InfluxDB database, so you’d still get aggregated results.
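A minimal sketch of the manual split, assuming N identical instances that each run the same script with every stage target divided by N. scale_stages is a hypothetical helper written for illustration, not part of k6.

```shell
#!/usr/bin/env bash
# scale_stages (hypothetical helper) STAGES N
# Divide every "duration:target" stage's VU target by N, rounding up so
# small stages (e.g. 10 VUs) don't round down to 0.
scale_stages() {
  local stages=$1 n=$2 out="" stage dur target
  local -a parts
  IFS=',' read -ra parts <<< "$stages"
  for stage in "${parts[@]}"; do
    dur=${stage%%:*}                       # e.g. "5m"
    target=${stage##*:}                    # e.g. "12000"
    target=$(( (target + n - 1) / n ))     # ceiling division
    out+="${out:+,}${dur}:${target}"
  done
  printf '%s\n' "$out"
}

# Stage list from the original command
STAGES="10s:10,25s:100,30s:500,1m:1000,1m:2000,2m:4000,3m:6000,5m:8000,5m:10000,5m:12000,5m:14000,180m:16000,10s:1000,10s:500,10s:10"

# Per-instance stages for a 4-way split; the peak stage becomes 180m:4000
scale_stages "$STAGES" 4
```

On each of the 4 instances you would then run something like k6 run --out influxdb=http://INFLUXDB_HOST:8086/testset1 --stage "$(scale_stages "$STAGES" 4)" k6script2-multihosts.js, where INFLUXDB_HOST is a placeholder for the shared InfluxDB machine. Rounding up means the smallest stages slightly overshoot in total (4 × 3 = 12 VUs instead of 10), which is usually preferable to instances sitting at 0 VUs.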

Hope this helps and let me know if you have further questions.

Ivan

Hi Ivan,

I will follow your advice and move k6 to a separate EC2 instance large enough for 16,000 users, and see how it goes.

Long-running tests with trends really are a memory hog: in my test, 16,000 VUs with trends easily used up 300GB of memory. The only way to run a really large-scale test is to split it across multiple k6 instances without trends, and rely on an external system such as InfluxDB to collect the metrics.

However, I also encountered quite a few InfluxDB errors:

ERRO[2750] InfluxDB: Couldn't write stats error="{"error":"timeout"}\n"

I have already set max-body-size = 0 in influxdb.conf.

> The only way to do really large scale test is to split up into multiple k6 without trends, and rely on external means such as influxdb to collect the metrics.

You don’t necessarily have to turn trends off, but it’s definitely recommended to split a resource-intensive test across several k6 nodes and, yes, rely on InfluxDB for metric collection.

Regarding the InfluxDB errors, please try compiling and running k6 from the latest master (see the build-from-source instructions), since some optimizations were recently introduced to address issues writing to InfluxDB and to lower memory consumption; see #1113. In particular, try tweaking the new K6_INFLUXDB_PUSH_INTERVAL and K6_INFLUXDB_CONCURRENT_WRITES options; see the PR for details.
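A sketch of setting those options as environment variables before starting k6. The values 5s and 4 are arbitrary examples, not recommendations. As I understand #1113, the first controls how often buffered samples are flushed to InfluxDB and the second how many write requests can be in flight at once; fewer, larger writes is an assumption worth measuring against your own InfluxDB, since the right settings depend on its capacity.

```shell
#!/usr/bin/env bash
# Example values to experiment with, not tuned recommendations.
# Flush buffered samples to InfluxDB less often, in larger batches:
export K6_INFLUXDB_PUSH_INTERVAL=5s
# Allow fewer simultaneous write requests to InfluxDB:
export K6_INFLUXDB_CONCURRENT_WRITES=4

# Then start the test as before, for example:
# k6 run --out influxdb=http://localhost:8086/testset1 k6script2-multihosts.js
```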

If you’re still having issues, consider evaluating Load Impact’s Cloud Execution service. With an upgraded subscription (contact sales@loadimpact.com), it can handle a 16K VU load without problems.

Hope this helps,

Ivan
