Hey, how's it going?
Recently, I've been studying concurrency and parallelism and came across a nice thing called Green Threads.
In this post, I'll explain what they are, and how I tried implementing them on Node.js and failed miserably, hope you enjoy it!
Prelude
Before explaining what green threads are, we must first understand what a thread is.
And to be honest, to explain that, I'll need to talk about processes first.
Processes
What is a process?
A process is the live version of a computer program.
Think of the code you write as a recipe, and the process as the act of actually cooking it.
So... What is a process?
It's the execution of a program. Your browser, text editor, image preview, and file explorer are all processes. More often than not, they start not one but multiple processes.
How to start a process?
If you're a Windows user, you start one by double-clicking an application or running it from the CMD (Command Prompt) or Windows Terminal.
If you're a Linux/macOS user, you can also use the slow-but-common double-click, or you can pretend to be a TV-series hacker and use the terminal.
Under the hood, whenever you start a process:
The OS finds an unused section of main memory that is large enough for the application.
The OS makes a copy of the application and its data in that section of the main memory.
The OS sets up resources for the application.
Finally, the OS starts the application.
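To make that concrete, here's a small sketch of asking the OS to do all of the above from Node.js, using the built-in child_process module (the command is just an example; use dir on Windows):

// Ask the OS to start a new process: it allocates memory, loads the
// program, sets up resources (like the stdout pipe below) and starts it.
const { spawn } = require('node:child_process');

const child = spawn('ls', ['-la']);

child.stdout.on('data', (data) => {
  console.log(`Child output: ${data}`);
});

child.on('exit', (code) => {
  console.log(`Child process exited with code ${code}`);
});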
Good to know: A process consumes memory and has instructions that'll be executed by the, oh no, the Processor.
That's right, the Processor (aka CPU) processes instructions from a process and that requires memory.
Each process has its own space in memory.
A process cannot access another process's memory, though processes can sometimes communicate with each other by exchanging messages.
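In Node.js, that message-passing style is exposed through child_process.fork. Here's a hedged sketch (the file name child.js is just an assumption for the example):

// parent.js — two processes talking through messages, never through shared memory
const { fork } = require('node:child_process');

const child = fork('./child.js');        // starts a separate Node.js process
child.on('message', (msg) => {
  console.log('Parent received:', msg);  // the only way to "see" the child's data
});
child.send({ hello: 'from the parent' });

// child.js
process.on('message', (msg) => {
  process.send({ reply: 'got your message', original: msg });
  process.exit(0);
});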
Threads
What about threads?
Threads are lightweight processes.
A process can start many threads, and each thread shares its parent process's memory.
Meaning that two threads started by the same process can access the same data.
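You can see that in Node.js with the worker_threads module. A minimal sketch (assuming it's saved as a single file, e.g. shared.js) where a worker writes into memory that the main thread then reads:

const { Worker, isMainThread, workerData } = require('node:worker_threads');

if (isMainThread) {
  // The main thread allocates a buffer both threads can see
  const shared = new SharedArrayBuffer(4);
  const view = new Int32Array(shared);

  const worker = new Worker(__filename, { workerData: shared });
  worker.on('exit', () => console.log('Main thread reads:', view[0])); // prints 42
} else {
  // The worker writes into that very same memory
  const view = new Int32Array(workerData);
  view[0] = 42;
}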
OS Threads have a cost: memory.
They're lighter than a new process but still require memory to be created. It may not seem like much, but take a look at this chart:
You can see that one approach (Apache) consumes a lot more memory than another (Nginx).
This slide was presented at the first Node.js talk ever.
I strongly recommend you watch it, it's awesome!
Apache consumes more memory because it uses a one-thread-per-request approach while Nginx uses a non-blocking event-loop approach to handle new requests, i.e. no threads.
Ok then: threads are lightweight processes, but they still consume memory, and when you need to scale to many threads they start to get costly.
How to solve that?
Well, if you don't want to use an event loop with non-blocking I/O, there is a way: Green Threads.
Green/Virtual Threads
Green threads are threads, but not OS threads.
They're threads managed by the application or runtime, so creating one doesn't involve creating an OS thread.
Therefore, they use less memory, as a single OS thread can host multiple virtual threads.
How to create a Green Thread?
Well, you need to replicate what the OS does inside your application/runtime, meaning that you'll need to create an orchestrator (or scheduler) to switch between your virtual threads.
There are two types of schedulers: Preemptive and Cooperative.
Cooperative: It will never block any thread/process. It's the job of the process/thread to give the control back to the scheduler.
Preemptive: It will handle blocking and switching between threads/processes, it's not the job of the application to know when to return the control.
Cooperative schedulers are harder to use for the end user, who needs to properly handle the stops and switches.
Preemptive is easier to use but harder to create, as it's the job of the scheduler to persist the state of the threads between switches and ensure consistency.
Golang implemented goroutines in its runtime by creating a [preemptive scheduler](https://go.dev/src/runtime/preempt.go).
What about Node.js? We're getting there, hang tight!
Preemptive Scheduler
Ok, let's try to create a preemptive scheduler in Node.js.
First, we need a way to add new virtual threads to be called, that's easy!
Let's create a class (I know, JS devs usually hate classes, but I think they're useful sometimes).
class PreemptiveScheduler {
  #virtualThreads;

  constructor() {
    this.#virtualThreads = [];
  }

  addThread(func) {
    this.#virtualThreads.push(func);
  }
}

const longRunningTask = (taskId) => {
  console.time(taskId);
  console.log(`Started running task: ${taskId}`);
  for (let i = 0; i < 100_000_000; i++); // busy loop simulating CPU-bound work
  console.log(`Finished running task: ${taskId}`);
  console.timeEnd(taskId);
};

const scheduler = new PreemptiveScheduler();
scheduler.addThread(() => longRunningTask(1));
scheduler.addThread(() => longRunningTask(2));
Now, we need a way to start the threads, but more importantly, we need a way to stop the running thread and switch to another one.
And that's where JavaScript (Node.js) cannot help us.
We could add an interval of, let's say, 10ms to switch between the functions inside the #virtualThreads array, but once a function has started, there is no way to stop it from the outside.
JavaScript does not provide this functionality.
Meaning that it's impossible to create a preemptive scheduler, as that would require being able to interrupt a running function.
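Here's a tiny sketch of the problem: even a timer scheduled every 10ms never gets a chance to run while a synchronous loop keeps the single JavaScript thread busy.

// The interval callback can only run when the event loop is free,
// and the busy loop below never gives control back to it.
const timer = setInterval(() => {
  console.log('Trying to interrupt...'); // never printed while the loop runs
}, 10);

console.log('Loop started');
for (let i = 0; i < 1_000_000_000; i++); // blocks the only JS thread
console.log('Loop finished');

clearInterval(timer); // cleared in the same tick, so the callback never fires at all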
Cooperative Scheduler
While we can't create a preemptive scheduler using Node.js or JavaScript, because there is no way to stop a function's execution from outside of it, there is a way to cooperatively pause a function: Generator Functions.
The Node.js Event Loop itself is considered a cooperative scheduler for multitasking: through callbacks/promises, the user defines when to return control to the scheduler (the Event Loop).
Generator Functions in JavaScript allow a function to return control to its caller, which can then decide when to call next() to resume the paused task.
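Before the full scheduler, here's a minimal illustration of that mechanism:

function* task() {
  console.log('step 1');
  yield; // pause here; control goes back to the caller
  console.log('step 2');
}

const gen = task(); // nothing runs yet
gen.next();         // prints "step 1", then pauses at the first yield
gen.next();         // resumes, prints "step 2", returns { value: undefined, done: true }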
Let's take a look at this code:
class CooperativeScheduler {
  #taskQueue;
  #running;
  #completionResolver;

  constructor() {
    this.#taskQueue = [];
    this.#running = false;
    this.#completionResolver = null;
  }

  addTask(taskGenerator) {
    this.#taskQueue.push(taskGenerator());
  }

  runNextTask() {
    if (!this.#taskQueue.length) {
      this.#running = false;
      if (this.#completionResolver) {
        this.#completionResolver();
      }
      return;
    }

    const currentTask = this.#taskQueue.shift(); // Take the first task from the queue
    const { done } = currentTask.next(); // Execute its next step

    if (!done) {
      // Push the unfinished task to the end of the queue
      this.#taskQueue.push(currentTask);
    }

    setImmediate(() => this.runNextTask());
  }

  start() {
    if (!this.#running && this.#taskQueue.length > 0) {
      this.#running = true;
      this.runNextTask();
    }
  }

  waitForCompletion() {
    if (this.#taskQueue.length === 0 && !this.#running) {
      return Promise.resolve(); // If no tasks are running or pending, resolve immediately
    }
    return new Promise((resolve) => {
      this.#completionResolver = resolve;
    });
  }
}
function* cooperativeFunction(taskId) {
  console.log(`Task ${taskId} started`);
  yield;
  console.log(`Task ${taskId} is processing...`);
  yield;
  console.log(`Task ${taskId} finished!`);
}

// Using the Cooperative Scheduler
(async () => {
  const scheduler = new CooperativeScheduler();
  const times = 10;

  for (let i = 1; i <= times; i++) {
    scheduler.addTask(() => cooperativeFunction(i));
  }

  scheduler.start();
  await scheduler.waitForCompletion();
})();
I start by creating function* cooperativeFunction(taskId), a generator function with two yield operators.
The yield operator means: pause the function and return control to the caller.
With the CooperativeScheduler class, I've created a mechanism to add tasks, start them all, and wait for their completion; then I add 10 example tasks.
The scheduler cooperatively pauses and switches between tasks.
This is the main result after running the code:
node cooperative.js
Task 1 started
Task 2 started
Task 3 started
Task 4 started
Task 5 started
Task 6 started
Task 7 started
Task 8 started
Task 9 started
Task 10 started
Task 1 is processing...
Task 2 is processing...
Task 3 is processing...
Task 4 is processing...
Task 5 is processing...
Task 6 is processing...
Task 7 is processing...
Task 8 is processing...
Task 9 is processing...
Task 10 is processing...
Task 1 finished!
Task 2 finished!
Task 3 finished!
Task 4 finished!
Task 5 finished!
Task 6 finished!
Task 7 finished!
Task 8 finished!
Task 9 finished!
Task 10 finished!
And that's a wrap!
Conclusion
Throughout this blog post, we've navigated the intricate landscape of concurrency, shedding light on the distinctions between OS Threads and Virtual/Green Threads. We also delved into the realms of Preemptive and Cooperative Schedulers, exploring their unique characteristics and applications.
This exploration not only highlighted the versatility and challenges of implementing different types of threading models in Node.js but also provided a glimpse into the broader world of concurrent programming.
I hope this post has been enlightening and engaging, offering you valuable insights into the complexities and beauty of threading and concurrency.
Thank you for joining me on this adventure!
Top comments (5)
Nice one, Caio, really like it! As I know you like doing benchmarks and got impressed with Go a few days ago: Go has green threads as a native tool called goroutines, in case you'd like to check it out, take a look here
Hey man, thanks a lot for the recommendation. I've actually been studying Golang for the past few weeks and I find it impressive, it's complete and focused on concurrency from day 0.
It has both preemptive and cooperative schedulers for multitasking and uses all CPU cores by default. I'll definitely learn more and write a couple of articles about it in the near future.
Nice! I'll stay tuned :)
Nice @ocodista! One question: have you ever had a real use case for generators, or any kind of expensive computation where the Cooperative Scheduler + Generators approach would fit?
Hey @lukas8219 I used generators once when parsing and sending emails for trampardecasa.com.br
The benefit of generators is that you can stop the iteration whenever you want, leaving it to the parent caller to resume or wait for something else to happen.
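A rough sketch of that pattern (the function names and batch size here are hypothetical, just to show the shape of it):

// Hypothetical sketch: yield emails in batches so the caller decides
// when (and whether) to continue the iteration.
function* emailBatches(emails, batchSize = 50) {
  for (let i = 0; i < emails.length; i += batchSize) {
    yield emails.slice(i, i + batchSize); // pause after handing out a batch
  }
}

// The caller resumes the generator only after the current batch is done.
async function sendAll(emails, send) {
  for (const batch of emailBatches(emails)) {
    await Promise.all(batch.map(send)); // `send` is whatever delivers one email
  }
}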