It's not a secret that I'm a big fan of Elixir, so when I started doing Rust development I tried to bring some ideas from Elixir to the world of Rust. This post describes some of the tools I'm building to bring the power of Elixir to Rust.
What makes Elixir so great?
It's hard to pick just a few of its qualities, but I believe that Elixir's biggest advantage comes from using Erlang as the underlying virtual machine, especially from these two properties:
- Massive concurrency
- Fault tolerance
Massive concurrency
This is something hard to explain until you experience it yourself. I learned early in my career that you should never create a thread while handling a request. Threads are heavy and expensive, and too many of them can bring your whole machine down. In most cases it's enough to use a thread pool, but this approach fails once the number of concurrent tasks outgrows the number of threads in the pool.
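To make the thread-pool approach concrete, here is a minimal sketch using nothing but the standard library: a fixed set of worker threads pulling boxed jobs off a shared channel. The function name `pool_sum` and the pool size are illustrative, not from the article.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// A fixed-size pool: `n_workers` threads pull boxed jobs off a shared
// channel, so concurrency is capped at `n_workers` no matter how many
// jobs are queued.
fn pool_sum(n_workers: usize, n_jobs: u64) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<u64>();

    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            thread::spawn(move || loop {
                // Take the next job, or stop once the queue is closed.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break,
                };
                job();
            })
        })
        .collect();

    // Queue up the jobs; they wait in line behind the workers.
    for i in 0..n_jobs {
        let done_tx = done_tx.clone();
        job_tx
            .send(Box::new(move || done_tx.send(i * 2).unwrap()))
            .unwrap();
    }
    drop(job_tx); // close the queue so idle workers shut down

    let total: u64 = done_rx.iter().take(n_jobs as usize).sum();
    for worker in workers {
        worker.join().unwrap();
    }
    total
}

fn main() {
    // 100 jobs share only 4 OS threads; with 100 plain `thread::spawn`
    // calls we would have paid for 100 full OS threads instead.
    println!("{}", pool_sum(4, 100));
}
```

This works fine while the job count stays reasonable, but as the paragraph above notes, it breaks down once the number of in-flight tasks (each possibly blocked on I/O) exceeds the pool size.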
Let's look at an example: imagine a Rust application that just creates 2,000 threads that wake up every 100 ms and go right back to sleep.
use std::thread;
use std::time::Duration;

fn main() {
    for _ in 0..2_000 {
        thread::spawn(|| loop {
            thread::sleep(Duration::from_millis(100));
        });
    }
    thread::sleep(Duration::from_secs(1_000));
}
Even though the threads don't do anything, just running this on my MacBook forces it to reboot after a few seconds. This makes it impractical to have massive concurrency with threads. There are many solutions to this problem. The one chosen by Elixir is to abstract concurrent tasks with something called Processes. They are extremely lightweight, so even running 2 million of them doesn't present a challenge.
Massive concurrency in Rust
You can achieve amazing concurrency and performance using async Rust, but working with async Rust is not as simple as writing regular Rust code, and it doesn't provide the same features as Elixir Processes do.
After thinking for a long time about how I could build something that resembles Elixir Processes in Rust, I came up with the idea of introducing an intermediate step: WebAssembly. WebAssembly is a low-level bytecode format that Rust can target. The idea was simple: instead of compiling Rust for x86-64, you would compile it to the WASM target. From there I would build a set of libraries and a WebAssembly runtime that exposes the concept of Processes. Contrary to operating system processes or threads, they are lightweight, with small memory footprints, fast to create and terminate, and low scheduling overhead. In other languages they are also known as green threads or goroutines, but I will call them processes to stay close to Elixir's naming convention.
That was the first step towards Lunatic.
Let's look at the same Rust example, but now implemented with Lunatic. At the same time we will crank up the number of concurrent processes to 20k.
use lunatic::{Channel, Process};

fn main() {
    let channel: Channel<()> = Channel::new(0);
    for _ in 0..20_000 {
        Process::spawn((), process).unwrap();
    }
    channel.receive();
}

fn process(_: ()) {
    loop {
        Process::sleep(100);
    }
}
To run this you will need to compile the Rust code to a .wasm file first:
$ cargo build --release --target=wasm32-wasi
Then run it with:
$ lunaticvm example.wasm
Contrary to the previous example, this runs without hiccups on my Late 2013 MacBook and CPU utilisation is minimal, even though we are using 10x more concurrent tasks. Let's examine what exactly is happening here.
The processes spawned by Lunatic actually take full advantage of the power provided by async Rust. They are scheduled on top of a work-stealing async executor, the same one used by async-std. Calling Process::sleep(100) will actually invoke smol's at function under the hood.
Wait a second! How does this work without the .await keyword, you may ask yourself. Lunatic takes the same approach as Go, Erlang and the early green-thread implementation of Rust. It creates a tiny stack for executing the process and grows it when your application needs more. This is a bit less efficient than calculating the exact stack size at compile time, as async Rust does, but a reasonable tradeoff I would say.
Now you can write regular blocking code, but the executor will take care of moving your process off the execution thread if you are waiting, so you never block a thread.
As we saw earlier, scheduling threads is a hard task for the operating system. To replace one thread that's being executed with another one, a lot of work needs to be done (including saving all the registers and some thread state). However, switching between Lunatic processes does only the minimum amount of work possible. With an idea pioneered by the libfringe library and some asm! macro magic, Lunatic lets the Rust compiler figure out the minimal set of registers to preserve during context switches. This makes scheduling Lunatic processes almost free: on my machine a switch usually takes around 1 ns, equivalent to a function call.
Another benefit of scheduling the Processes in user space instead of using threads is that other applications will continue running normally on your machine, even if your app misbehaves.
Now that we saw how Lunatic allows you to create applications with massive concurrency, let's look at fault tolerance.
Fault tolerance
Maybe the best-known Erlang/Elixir philosophy is "let it crash". If you are building complex systems, it's impossible to predict all failure scenarios. Inevitably something is going to fail in your application, but this failure should not bring down the whole thing.
Elixir Processes are completely isolated and can only communicate with each other through messages. This allows you to design your application so that a failure stays contained inside one process and doesn't affect the rest of them.
Lunatic provides even stronger guarantees than Erlang here.
Each Lunatic process gets its own heap, stack and syscalls.
Let's look at an example of a simple TCP echo server in Lunatic:
// Once WASI gets networking support you will be able to use Rust's
// `std::net::TcpStream` instead.
use lunatic::{net, Process};
use std::io::{BufRead, BufReader, Write};

fn main() {
    let listener = net::TcpListener::bind("127.0.0.1:1337").unwrap();
    while let Ok(tcp_stream) = listener.accept() {
        Process::spawn(tcp_stream, handle).unwrap();
    }
}

fn handle(mut tcp_stream: net::TcpStream) {
    let mut buf_reader = BufReader::new(tcp_stream.clone());
    loop {
        let mut buffer = String::new();
        buf_reader.read_line(&mut buffer).unwrap();
        tcp_stream.write(buffer.as_bytes()).unwrap();
    }
}
This application listens on localhost:1337 for TCP connections, spawns a process to handle each incoming connection and just echoes incoming lines. You can test it using telnet:
$ telnet 127.0.0.1 1337
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world
Hello world
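For comparison (this is not from the article), here is a sketch of the classic thread-per-connection version of the same echo server using plain `std::net`, which works when compiling natively but not yet under WASI. Every connection pins down a full OS thread, which is exactly what Lunatic's processes avoid.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle(tcp_stream: TcpStream) {
    // A second handle on the same socket: one for reading, one for writing.
    let mut writer = tcp_stream.try_clone().expect("clone failed");
    let mut reader = BufReader::new(tcp_stream);
    let mut line = String::new();
    // Echo lines back until the client disconnects (read_line returns 0).
    while let Ok(n) = reader.read_line(&mut line) {
        if n == 0 {
            break;
        }
        writer.write_all(line.as_bytes()).unwrap();
        line.clear();
    }
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:1337").unwrap();
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        // One OS thread per connection: fine for a handful of clients,
        // but this falls over at the connection counts Lunatic targets.
        thread::spawn(move || handle(stream));
    }
}
```

Structurally it is almost identical to the Lunatic version; the difference is only in the cost and isolation of each `spawn`.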
The first thing you will notice is that we don't use any async or .await keywords, even though this application fully utilises Rust's async IO under the hood.
Also, the TCP connection becomes fully encapsulated in the Process, even if we called into unsafe C code that crashes:
fn handle(mut tcp_stream: net::TcpStream) {
    // ...
    unsafe { crashing_c_function() };
    // ...
}
The crash stays contained to a single connection in this case. It's not possible to implement something like this in Elixir: if a call to a C function crashes, it takes the whole virtual machine down with it.
Another feature exclusive to Lunatic is the ability to limit a process's syscall access. If we replaced the previous spawn call with:
// Process::spawn_without_fs is not implemented yet.
Process::spawn_without_fs(tcp_stream, handle).unwrap();
any code called from inside the handle function would be forbidden from using syscalls for filesystem access. This also works for C dependencies, because the enforcement happens at such a low level. It allows you to express the sandboxing requirements of a Process and to use any dependency without fear. I'm not aware of any other runtime that allows you to do this.
The future
This is just a teaser of the capabilities that Lunatic will provide. There are many more features coming. Once you have this foundation, a new world of possibilities opens up. Some of the features I'm excited about:
- The ability to transparently move Processes from one machine to another. The programming model relies on processes communicating through messages, and it doesn't really matter whether these messages are sent locally or between different computers on a network.
- Hot reloading. Now that we have WASM bytecode as an in-between step, it becomes possible to generate new JIT machine code from it and swap it in while the whole system is still running.
- Running complete applications compiled to WASM as a process. One example would be redirecting the application's file reads/writes to TCP streams, as we are in complete control of syscalls. The advantage here is that you are modelling the execution environment with code.
Lunatic is still in its early days, so there is a lot of development left to do. If you are excited about it or have some ideas you would like to use Lunatic for, reach out to me over email me@kolobara.com or on twitter @bkolobara.
I also want to use this opportunity to say a big thank you to the teams working on Rust, Wasmer, Wasmtime, Lucet and waSCC. It would be impossible to build Lunatic without all the hard work put into these projects.
P.S. If you would like to learn more about the magic of Erlang and Elixir, this is one of my favorite talks about it by Saša Jurić: The Soul of Erlang and Elixir. Seriously, go and watch it!
Top comments (13)
While this code looks cool (I am a fan of Elixir + Erlang), the Rust reasons for not natively supporting green threads are valid (native interop, runtime complexities, not real threads, etc.).
Using green threads for parallel compute doesn't make sense as the OS can handle only a specific amount of work. “Massive” concurrency is only valuable if you are I/O bound. The mental demarcation between async/await and threads is very valuable when you consider this limitation.
Building a green thread implementation might encourage naive implementers to use them for compute.
Is there anything in the implementation that discourages massive paralleled compute in these green threads?
Again, I think this project is cool and is some interesting code :)
I know this article uses Rust, but if you look closer at the Lunatic project, it's about WASM, not just Rust, so in the future it can support any language that compiles down to WASM. If you go to the Lunatic GitHub repo, the first thing written there is this: "It is heavily inspired by Erlang and can be targeted from any language that can compile to WebAssembly. Currently there are only bindings for Rust available."
I also find this project very interesting, and maybe in the future it can be an alternative to the Erlang VM.
You are right, I missed the "forest for the trees" :)
Personally I'm glad to see Erlang and Elixir ideas happening in Rust too, and even better, the within-processes-only crashes for example.
That sounds weird to me. If you look carefully at the article above: 2,000 real threads crash macOS, but 10x more lightweight threads work fine. That can be useful in real life when designing a web framework, for example.
2,000 real threads doesn't make sense for parallel compute. Intel i9 processors don't have 2,000 hardware threads.
Now, 2,000+ "threads", or better yet async code, DOES make sense for I/O, because there is no real "work" being done.
But if you are doing compute bound work, 2k+ CPU threads makes no sense on a personal computer.
100%, if your work is I/O bound, it is exceedingly useful. That is the point of my comment.
The example of the TCP echo server is a great example of I/O bound work.
Ok, thanks for replying. Agreed that 2k CPU threads sounds a bit weird
To me, this looks amazingly promising, B. Kolobara, and I'm looking forward to the day when these things are available in an Elixir/Erlang-style web framework. Such a framework could become the fastest and most scalable in the world (1st place shared with Erlang)? And the most robust too? Because any crashes happen only within the lightweight processes.
It looks like the Erlang baseline per-process memory usage (stack + heap) is pretty low. Do you know what the Lunatic process size is / what is the future goal?
Lunatic's process size is a bit higher than Erlang's when a process is spawned. It's around 4 KiB for the stack, if you don't use any heap data. On modern 64-bit CPUs Lunatic will rely mostly on cheap virtual memory. The actual memory consumption during runtime should be lower than Erlang's in most cases, just from the fact that Rust's data structures are more compact and memory efficient. This is something that can be optimised further if it ever becomes a bottleneck. Right now the development is focused on stability and correctness before performance.
If I had money to invest in things, then you would've just made an angel investor out of me. As I'm quick to say, I have never told someone, "I don't want it done right, I want it done fast!"
Also, this post has made me realize I've been too liberal in rating things as unicorns, because you, sir, are building a flerfing unicorn. My hats — all of my hats — off to you. 🎩🧢👒🎓⛑🪖👑
I love this! I have dabbled with Rust for a few years and have high hopes, and was disappointed when I saw Rust looking more like C# with the async/await stuff - so much complexity. And I'm a big Elixir fan and have used for years.
The discussion that the “OS” can handle only a specific amount of work - I think this means that the underlying HW can only handle a specific amount of work. The problem here is that HW is a moving target, so how do you know what you are running on? With the explosion of multi-core computing it seems every year we are running on computers with more cores - I don’t want to change my code every year.
Using “process” or “threads” is not just about performance optimizations - it’s also about the ease of programming. It’s much easier to write sequential code (do a, do b, do c) rather than break it up into async, await, etc., which brings in lots of complexities. I think Go understands this too.
I don’t understand the idea that massive concurrency only makes sense if you are I/O bound. Concurrency makes sense from 1) programming is easier, and 2) performance when you have more parallelism (which is becoming quite usual).
Just to add to this, I think Rust / Lunatic can do a better job at this than Java Akka. Java Akka and Play, etc., show the limitations and difficulties in isolation.
I'd consider using Pony if I need something of Rust+BEAM caliber.
Cool stuff though!