Originally published at deepu.tech.
This is a multi-part series where I'll discuss concurrency in modern programming languages. I will be building...
Interesting results! I didn't expect everyone to perform about the same. This is great!
I would hypothesize that this is due to the fact that network requests are mostly I/O-bound. That is to say, the CPU remains idle most of the time as it waits for the network to respond.
Therefore, the underlying runtime—namely Node.js for JavaScript, Tokio for Rust and Deno, etc.—is experimentally irrelevant for this use case, as you have shown in your data. It seems that under the hood, all runtimes manage to process the requests faster than the network/hardware can provide the bytes, hence the insignificant differences in the various time-based metrics. TL;DR: the network may be the bottleneck, not the languages.
With that said, I would be very interested in a follow-up post where you go beyond time-based metrics since they don't paint the full picture. Namely, I would like to cite Discord's case study on why they switched from Go to Rust.
In their article, the major performance gains mostly came from the absence of garbage collection, which you also briefly mentioned in your conclusion. Go's garbage-collected runtime caused large spikes in latency and CPU usage every two minutes or so, which ultimately proved to be unacceptable at Discord's scale. I highly recommend reading their thoughts on it. 👌
Anyway, what I'm trying to say is that I look forward to an investigation into other metrics beyond "requests-per-second". As Discord's engineering team has shown, this does not always paint the full picture. Data on CPU and memory usage would definitely make your series more comprehensive.
Nevertheless, this is an excellent write-up!
Aye, the findings about Go match my experience as well. It's very useful for static caches (e.g. ZIP code and address data), but horrible at LRU caches and the like. If you have an upper bound on your memory usage and know you can keep that in memory on your instance, it's great and super quick. If you need to free up memory and dynamically replace cache entries, it falls apart.
Yes, I fully agree, and that's why I added a disclaimer. This is a very simple benchmark; for a real-world use case there are considerations beyond this, and Rust has way more benefits than concurrency to win over Go. I would choose Rust over Go any day. And thanks for the Discord article, I hadn't seen that before; it's very interesting.
There are two things that look fishy to me in those results:
This suggests there was a common bottleneck outside of your server implementations, and you've measured the performance of that bottleneck, not the servers. Which also means the results are probably inconclusive and you can't interpret them as "Rust has won".
I looked quickly at your code and it seems you're opening a new connection for each request. This typically adds a large amount of latency and system load to each request and might become a problem, particularly at low concurrency levels like 100.
A few suggestions for better benchmarking:
For a throughput comparison you need to verify if the servers are really working at their full speed, so you should capture CPU load. It is also good to capture other system metrics like system CPU time, cache misses, context switches and syscalls, which are often a good indicator of how efficiently the server app interacts with the system.
Cache connections and leverage HTTP keep-alive. That makes a tremendous difference in throughput.
Play with different concurrency levels. If concurrency is too low and latency is too high, you won't get the max throughput. The server would simply wait for requests, handle them quickly and go idle waiting for more. Also switching between idle and active is costly (context switch).
In latency tests, the latency median is not as interesting as the full histogram. I'd expect large differences in P99 between GCed and non-GCed servers. So even if medians are very close, it doesn't mean the servers would work equally well in production. Obviously you should do latency tests at lower throughput than max, so those should be separate experiments.
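The suggestions above can be sketched as a couple of wrk invocations (the port, duration, and concurrency numbers here are placeholders, not values from the article):

```shell
# wrk keeps HTTP connections alive by default (unlike `ab` without -k),
# and --latency prints the full percentile histogram (P50/P90/P99),
# which is where GCed and non-GCed servers tend to diverge.
wrk -t8 -c128 -d60s --latency http://127.0.0.1:8080/

# Repeat at higher concurrency levels to find the saturation point.
wrk -t8 -c1024 -d60s --latency http://127.0.0.1:8080/
```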
Anyway I'd love to see updated results, because you seem to have put a lot of work into multiple implementations and it would be a pity if you stopped now ;)
Never run client/server benchmarks on the same computer.
The process generating the load will inevitably impact the process serving the requests.
The best infra for benchmarking is two independent physical machines. Not even VMs, as they also compete for CPU resources.
Depends on how efficient the load generation tool is vs how much work is required on the server side to handle the request. You can also pin those two processes to different CPU core sets. This way one computer is enough to get meaningful results. Obviously, if you don't know what you're doing, it is better to use two separate machines.
Ya, in this case the server is quite simple and doesn't need many resources, which might explain why I got similar results from both. I would be interested in learning more about pinning processes to cores. Do you have any resources you can recommend?
`man taskset`
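A minimal sketch of what that looks like (the core ranges and the server binary name are made up; adjust to your machine):

```shell
# Pin the server under test to cores 0-3 ...
taskset -c 0-3 ./myserver &

# ... and the load generator to the disjoint set 4-7, so the two
# processes never compete for the same cores on one machine.
taskset -c 4-7 wrk -t4 -c100 -d30s http://127.0.0.1:8080/
```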
Honestly, I didn't expect people to take this so seriously, or even for the post to do well. I was just wrapping up a series that was taking a lot of effort without much interest in terms of views. But man, this blew up. Now I think I have to rework this into something better 😂
When you add a 2-second delay every 10 requests you make the comparison totally meaningless. You are measuring delays, not the code.
Reading the file in every loop measures reading from disk, not actual program performance.
Also, ab is not a good tool for measuring. Usually you end up measuring the performance of ab itself, not the system, which can be 20 times faster than what ab can measure. Use github.com/wg/wrk instead.
When I remove the delay, the Go program crashes when testing with wrk: `Error reading: EOF`.
I was about to comment on the topic, thank you for pointing this out and not blindly trusting the internet.
The massive hint is the extremely similar performance between all languages. I'm sure the intent of this article was to help the community, but I hope the author will understand their mistake and update the results accordingly.
I have updated the benchmarks with more data. WDYT now?
I did add a disclaimer that this is a simple concurrency benchmark. I don't agree that it's meaningless, as I'm comparing the exact same implementation across languages to see if the language/platform makes any difference; the sleep was added to introduce a concurrency bottleneck. This is not an HTTP performance comparison, it's a concurrency comparison, and for that I think ab is as good as any other tool. I'll try wrk and post the results.
Instead of sleeping 2 sec or even 200 ms, you should sleep 3-8 ms to simulate access to a fast SQL server. Then you would get a meaningful request rate.
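For example, a rough Rust sketch of that idea (the 3-8 ms range is the suggestion above; the jitter scheme is just one way to stay inside it without pulling in a rand crate):

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for one fast SQL round trip: a 3-8 ms sleep instead of 2 s.
/// `jitter_ms` can be any number; it is folded into the 3..=8 ms range.
fn simulate_query(jitter_ms: u64) -> Duration {
    let d = Duration::from_millis(3 + jitter_ms % 6);
    thread::sleep(d);
    d
}

fn main() {
    let start = Instant::now();
    simulate_query(4); // requests a 7 ms sleep
    // The elapsed wall time is at least the requested sleep.
    assert!(start.elapsed() >= Duration::from_millis(3));
    println!("simulated query took {:?}", start.elapsed());
}
```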
I have updated the benchmarks with more data. WDYT now?
I'm sorry, but this is once again a biased argument for Rust. Talking about threading control is fine, but what do we see in reality? You have no native access to an HTTP library in Rust, you have more lines of code to type, and it's not as simple to read as the Go TCP version.
Moreover, you say that you don't have as much control over threading with Go... But goroutines are made to give you concurrency or threading without having to write the switching yourself. And if you want threads and control, you can use C inside Go, or there are packages for this, so you can avoid goroutines.
Rust is cool for memory management and, why not, low-level development like the Linux kernel. But it's way more complicated than Go for developing an HTTP service like this. Instead of switching to Rust, I'd rather ask Go's creators to help improve garbage collection control,
and leave Go to manage concurrency and threading with ease and efficiency.
Sorry for my answer, but I see too many pro-Rust articles with too much criticism of Go.
The biggest selling point of Rust is, IMHO, fearless concurrency with a guarantee of no data races. So while Go (and JS and Java) programs may appear initially simpler to write, because they give a bit more freedom to the programmer, at the end of the day they are often not as easy to reason about. It is trivial to guarantee that a piece of code won't be called concurrently in Rust; I can see explicitly what is allowed to run concurrently and what is not, and if I accidentally try to invoke non-thread-safe code in a multithreaded context, it simply won't compile. Fixing a compile-time error vs. fixing code that fails in production once a week, only under heavy load: the choice is pretty obvious to me.
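As a minimal illustration of that point (a toy counter, not code from the article): shared mutable state has to go through something like `Arc<Mutex<_>>`, and the compiler rejects anything less.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `n` threads that each bump a shared counter. The compiler forces
// the Arc<Mutex<_>> wrapper: sharing a plain `&mut i32` across threads
// would be rejected at compile time, not at 3 a.m. in production.
fn concurrent_count(n: usize) -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || *c.lock().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *counter.lock().unwrap();
    result
}

fn main() {
    println!("count = {}", concurrent_count(8)); // prints "count = 8"
}
```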
Exactly, and that's why I said Rust is better for multi-threading. Performance is just an added bonus.
There are tools in Go to check for race conditions.
The fact is that there are two paradigms, not a better one over the other.
Rust is not suitable for developing REST APIs, at least not as easily as Go or even Python. Rust is very cool for developing low-level tools with increased control over memory management, in the case of low-CPU-cost applications.
But when we develop an HTTP application with very complex management of coroutines to handle SSEs, with messages coming from different routines, Rust becomes quite simply infernal.
Rust has its advantages, but you have to keep in mind that other languages are not left behind. I don't see myself developing machine learning in Go or Rust; I don't develop kernel modules in JS, and I definitely don't do REST APIs in Rust or C.
And for so many reasons, I'd need a multi-step article to demonstrate that Rust doesn't fit so well in many areas.
So, having made a few REST services with Rust, I can say that it's fine for it. I just generated the stub code with the OpenAPI CLI generator from a spec file, and then implemented the business logic as I would do in most languages.
The main disadvantage of Rust is that it's more difficult to learn. Of course, that is just my opinion. But being more difficult, it is also more expensive to hire competent devs to maintain your application once you, the master programmer, have finished it.
I think I could have made the same services with Node or Python in 25% of the time, with no fear of data races, due to the nature of the services. Also, I/O to the cloud provider would be the bottleneck in most of the applications, not time spent in logic.
So my takeaway, after an enjoyable 16 months of exclusively programming Rust, is that it is not the tool for everything. If you are writing an MPEG encoder or a scientific calculation library, it would be great; but if you are writing wrappers for other services, there are better languages with cheaper development costs.
I agree. It's definitely not suitable for everything
It might be interesting to also measure resource use while the test is running. I did something like that earlier comparing Java and Rust, where CPU was pretty comparable, but memory use with Rust was much lower.
Rust uses way fewer resources. Actually, that would be an interesting metric to look at. I know from experience that Rust uses way less memory than all the others for the same stuff. I wrote KDash in Rust, which is way more graphically intensive than kubectl, but it still uses 6-7 times less memory than kubectl. For memory usage my bet would be Rust < Go < Deno < Node.js < Java.
Not mentioned, but Elixir's parallelism is something else, thanks to it being based on the Erlang VM, BEAM.
Referenced from this article
The concurrency system has been tested with a single really buff machine handling 2 million concurrent WebSockets.
Also, in the Erlang VM there's the OTP system: you can run cron jobs internally, have caching without Redis, and have processes restart when their parent process notices they crashed.
How did you compile each program?
They were compiled using their native compilers in production mode, if available.
When publishing benchmarks you have to give exact steps to reproduce them, so the exact command-line parameters and compiler flags used should be given.
Also this:
Suggests that something is way off with either your setup or your code. Async Rust and async Go are capable of handling hundreds of thousands of concurrent connections.
All the commands can be found in the code repository mentioned. And for breakdown at 2000 concurrency, yes it's possible that the code is a problem. Do you see anything obvious?
At first I thought it could also be the tool itself failing at those rates, but then Node.js with multiple workers seems to work better, so I'm not sure anymore.
Updates to benchmark testing were great!
I would also recommend testing with one (or more) computers as the client and another as the server. You were testing localhost connections only, which doesn't represent real-world performance with real sockets that well. In addition, wrk was running on 8 CPU cores, so unless you had reserved an additional identical number of physical cores for all the test servers, asynchronous servers would get an extra boost compared to multi-threaded servers due to not over-booking the CPU as badly.
With real sockets I'd expect the server with lowest latency (Rust with async + multi-threaded) to get the best results.
If your benchmark software supports it, usually a better way to test servers is to decide on a timeout for a request (say 50 ms) and then test how many requests/s you can execute until you start to get timeouts. Some server software is really unfair and fails to serve older requests first to keep worst-case latency sensible. This kind of testing would preferably ramp the request rate slowly until the timeout is triggered for a request. The best output for this kind of test would be a graph with requests/s on the horizontal axis and worst-case latency on the vertical axis.
If you end the test on the first timeout, I'd expect Java servers to fail early because they often stall during GC, and if your timeout is pretty small, a single stop-the-world GC may be enough to ruin the run. It's possible to create Java servers that do not exhibit stalls, but the simplest implementation often fails on that.
What was the actual process doing? It seems that every request had a 200 ms baseline delay, and for example Rust took 0.7 ms over that vs Node.js taking 4-7 ms. If you get rid of that 200 ms latency, Rust should be 5-10x faster than Node.js in this test.
I'm gonna look into that
I hope next time you will also include V-lang in this comparison challenge. Perhaps the results will be similar, but this lang is really worth a look.
V-lang is indeed interesting, but last time I checked, the automatic memory freeing was really buggy (check the issues on GitHub for details), and if you don't free memory, RAM usage is obviously going to explode pretty fast if you handle e.g. 100k requests.
Well, many of you are missing Swoole, a concurrent async extension for PHP. Or Workerman. It also shows impressive results, close to Golang's.
Great post! I really like how easy it is to write concurrent code in Go!
What about the memory footprint? It seems like it would be an important feature to consider
I have updated the benchmarks with more data. WDYT now?
These results are very interesting! I wonder how they will look when the full version of Zig comes out. I'm pretty sure it'll knock out Rust.
We'll be publishing a post soon about comparing Rust and Go in 2024, stay tuned here - packagemain.tech/
Rust for Gophers with John Arundel packagemain.tech/p/rust-for-gophers
Now redo this with Java 24 and virtual threads.