I have an open-ended question about network congestion I am seeing on the client side, where the k6 scripts are executed.
I am running a scenario for 10 minutes with 4500 users. Initially, a bunch of requests hit the server and the server responds properly. After some time, even though the server responds within seconds, the client takes more than a few minutes to get the response back. Because of this, I am not getting good RPS on the server side. There is a lot of congestion on the client side, and it can’t get the responses quickly even though the server is fast. I had to increase the request timeout from 1 minute to 7 minutes to avoid request timeout errors.
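A rough way to see why slow responses on the client side cap RPS: with a fixed pool of VUs (looping, or allocated by an arrival-rate executor), each VU handles one iteration at a time, so the sustainable rate is at most VUs divided by iteration duration (Little’s law). This is a minimal sketch assuming one request per iteration and no sleep(); the 4500 VUs come from the post, the durations are illustrative.

```javascript
// Upper bound on requests/second for a fixed pool of VUs:
// each VU runs one iteration at a time, so rate <= VUs / seconds per iteration.
// Assumes exactly one request per iteration and no sleep() (illustrative only).
function maxSustainableRps(vus, iterationSeconds) {
  if (iterationSeconds <= 0) {
    throw new RangeError('iterationSeconds must be positive');
  }
  return vus / iterationSeconds;
}

// 4500 VUs with 2 s responses vs. 5 minute (300 s) responses.
console.log(maxSustainableRps(4500, 2));   // 2250
console.log(maxSustainableRps(4500, 300)); // 15
```

Under these assumptions, the same 4500 VUs drop from roughly 2250 RPS to 15 RPS once each response takes minutes instead of seconds, which matches the shape of the drop described above.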
The client is an Amazon r5.2xlarge instance, which supports up to 10 Gbit/s of network bandwidth.
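One sanity check on whether raw bandwidth could be the limit is to compare link capacity with the response size. This is a minimal sketch that ignores protocol overhead, TLS, and outgoing request traffic; the 50 KB response size is a hypothetical value, not something reported in this thread.

```javascript
// Upper bound on responses/second a link can carry.
// Ignores protocol/TLS overhead and request traffic (simplifying assumption).
function maxResponsesPerSecond(linkGbit, responseBytes) {
  const bitsPerSecond = linkGbit * 1e9;
  const bitsPerResponse = responseBytes * 8;
  return bitsPerSecond / bitsPerResponse;
}

// Hypothetical 50 KB response body on a 10 Gbit/s link.
console.log(maxResponsesPerSecond(10, 50000)); // 25000
```

If the observed RPS multiplied by the real response size is far below the link capacity, raw bandwidth is unlikely to be the bottleneck.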
You can see the number of requests drop once the test hits its peak, and it’s because of client-side network request throttling.
Do you have any thoughts on how I can improve network performance? I have applied all the recommended settings and implemented everything in your network optimization document, “Running large tests”.
Hi, this might be difficult to troubleshoot, but I’ll try to help you out.
If you followed the “Running large tests” guide, then we can assume that both your client and server machines are set up correctly for these large-scale tests. That leads me to believe that one of the following is happening: your k6 script is doing additional logic that interferes with the request timing, something on your network (e.g. a proxy) is limiting the overall throughput you’re expecting, or you’re reaching the limits of what your service can deliver.
Could you confirm the following?
When you say “after some time, even though the server is responding within seconds”, how are you measuring this? If it comes from the HTTP server logs, those response times can differ from the full client-server request duration, depending on your service infrastructure and, as mentioned above, whether any intermediary proxies are in use. Make sure that you’re connecting directly to the service rather than through a proxy or CDN.
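To compare client-side and server-side numbers, k6 exposes per-request timings on the response (`res.timings`: `blocked`, `connecting`, `tls_handshaking`, `sending`, `waiting`, `receiving`, `duration`, all in milliseconds), and `duration` covers only sending + waiting + receiving. The helper below is a hypothetical sketch, written as plain JavaScript so it runs outside k6; `serverLoggedMs` stands in for whatever processing time your server logs report, and the sample numbers are illustrative.

```javascript
// Hypothetical helper: splits a k6-style `res.timings` object (ms) into
// time spent before the request is sent and time not explained by the server log.
function splitTimings(t, serverLoggedMs) {
  // Approximate time waiting for a connection slot and setting up TCP/TLS.
  const clientQueueMs = t.blocked + t.connecting + t.tls_handshaking;
  // `waiting` (time to first byte) includes network transit, any proxy, and server work,
  // so subtracting the server-side time leaves network/proxy overhead.
  const unexplainedMs = t.waiting - serverLoggedMs;
  // `duration` is sending + waiting + receiving, so add the pre-request time back.
  return { clientQueueMs, unexplainedMs, totalMs: clientQueueMs + t.duration };
}

// Illustrative values: 2 minutes blocked on the client, 8 s time to first byte.
const sample = {
  blocked: 120000, connecting: 5, tls_handshaking: 20,
  sending: 1, waiting: 8000, receiving: 50, duration: 8051,
};
console.log(splitTimings(sample, 7500));
// { clientQueueMs: 120025, unexplainedMs: 500, totalMs: 128076 }
```

Under these assumptions, a large `clientQueueMs` with a small `unexplainedMs` would point at the load generator itself, while a large `unexplainedMs` would point at the network path or an intermediary proxy.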
Can you post a complete or redacted sample of the script you’re using? I’m particularly interested in the default function your open_model scenario is using. With arrival-rate executors, the executed function should be simple and preferably make only a single request. If you have additional logic or use sleep(), that will impact the overall request rate.
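With an arrival-rate executor, the VU pool must be large enough to cover rate × iteration duration, or k6 cannot start iterations on time (it reports these as `dropped_iterations`). The helper below is a hypothetical sketch that builds a `constant-arrival-rate` scenario named `open_model` and sizes `preAllocatedVUs` with Little’s law; the rate, the expected iteration time, the 1.5× headroom, and doubling for `maxVUs` are all assumptions for illustration.

```javascript
// Hypothetical helper: k6 options for a constant-arrival-rate scenario,
// sizing the VU pool as arrival rate × iteration duration × headroom.
function openModelOptions(ratePerSecond, expectedIterationSeconds, headroom = 1.5) {
  const neededVUs = Math.ceil(ratePerSecond * expectedIterationSeconds * headroom);
  return {
    scenarios: {
      open_model: {
        executor: 'constant-arrival-rate',
        rate: ratePerSecond,   // iterations started per timeUnit
        timeUnit: '1s',
        duration: '10m',
        preAllocatedVUs: neededVUs,
        maxVUs: neededVUs * 2, // extra room if iterations slow down
      },
    },
  };
}

// Illustrative: 1000 iterations/s, each expected to take about 2 s.
console.log(openModelOptions(1000, 2).scenarios.open_model.preAllocatedVUs); // 3000
```

In an actual k6 script, the returned object would be assigned to `export const options`. Any sleep() or extra logic in the default function raises `expectedIterationSeconds` and therefore the number of VUs needed to hold the same rate.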
For completeness, post the output of k6 version and the exact error messages you’re getting.
I had to increase the request timeout from 1min to 7min to support this and not get request timeout errors.
Hmm, if you’re getting request timeout errors, that implies the server is unable to handle the amount of traffic you’re testing with, i.e. you’ve reached the limits of what the server can handle. In that case you’d have to optimize and scale the service itself, or test with less traffic (fewer VUs, a shorter duration, etc.).
@imiric The server is responding within seconds (max < 10 seconds), but the client sometimes takes more than 4-5 minutes to receive the response. Let me check whether there is a proxy or CDN that might be throttling requests.