You may be "Yeah, big deal, another article about Fibers", but NO! I mean, yes, you are partly right, I am also going to approach Fibers, but also other features you may not be aware of.
Before we dive into how to achieve concurrency and parallelism with PHP, we need to talk about what these two things are, and no, they are not exactly the same thing. If you are a computer science sorcerer who knows everything about this topic, you can skip this first part, you'll be forgiven 😛
Concurrency and parallelism are terms that are sometimes used interchangeably, however, they're conceptually different. Let's go to the kitchen and use an analogy to understand these terms.
First, a bit of theory
Suppose we are functional adults and we're going to prepare hamburgers for dinner, but you are like me, who can only do one thing well at a time. Our simplified "main process" would be something like this:
- Get the minced meat
- Get the buns
- Shape the hamburgers
- Cut the buns
- Separate the cheese
- Fry the hamburgers
- Add the cheese to melt
- Assemble our hamburgers
Following this sequential flow and having a single worker, namely "you", each task can only start once the previous one has finished.
This is basically how our PHP scripts work when they do not use parallelism and concurrency features - you can only iterate over entries you obtained from the database after the script has connected to the database, executed the query, waited for the results and closed the connection. You are blocked, or I/O-blocked, as some literature puts it.
Back to the kitchen, now suppose you are not like me 🙂
- You shape the hamburgers
- Get the buns
- Separate the cheese
- Do the following until all the hamburgers are assembled:
- Fry a hamburger
- While one is frying, you can cut one or more buns
- Assemble the hamburgers as they are fried and the buns are cut
- Repeat until all hamburgers are done
Now we are capable of dealing with multiple things at once, checking visually if parts of our process have completed so we can start the next one and, more importantly, we are not blocked while the hamburgers fry!
This workflow demonstrates exactly what concurrency is. In concurrency we are not necessarily doing multiple things at once: multiple things are indeed happening at the same time, but YOU are doing only one thing at a time, either cutting buns or taking care of the hamburgers while they fry.
Using concurrency features in your PHP script you would be capable of, for example, executing multiple queries against a database at once, doing other processing and, when these queries return, processing their results and keeping your usual flow. We are not executing each and every query sequentially and waiting for each one to return results before being able to execute the next one. We are not that blocked anymore!
Now back once again to the kitchen, and I promise this will be the last time. This time you are not cooking alone anymore, you have a kitchen partner. I'll be generic and call you and your partner worker 1 and worker 2, and both of you are working at the same time.
Worker 1
- Get the minced meat
- Shape one hamburger
- Start frying. Once one hamburger is in our frying pan:
- We can shape other hamburgers
- Take care of our hamburgers being fried
- Add the cheese that hopefully our worker 2 has separated
Worker 2
- Get the buns
- Cut the buns
- Separate the cheese
- Assemble the hamburgers as they are fried
Now we are effectively doing multiple things at once, thanks to the help of our second worker!
Notice that now there is some coordination work to do, such as checking if the other worker has finished one step before proceeding with your own task. In a similar fashion, our PHP script could have two workers, for example, one performing one task while the other performs another. They can (actually they need to) communicate with one another and synchronise their tasks.
Parallelism and concurrency actually have two distinct objectives.
Concurrency aims at dealing with multiple things at once, not getting blocked whenever we need to wait on external outcomes. Parallelism, on the other hand, focuses on doing multiple things at once, potentially maximising the performance and use of our resources without (hopefully) compromising the end result.
Now going back to PHP
In this article I am only going to cover concurrency, parallelism will be explored in the second part. First, we will see what building blocks PHP itself provides to us and afterwards we will check out some libraries that can make our lives much easier.
Generators (and coroutines)
According to the docs, Generators "provide an easy way to implement iterators without the overhead of creating a class that implements the Iterator interface" (and the overall complexity of managing its state). Generators are not necessarily something new; they were added back in 2013, in version 5.5.
Conceptually, a Generator function is any function that contains a `yield` statement. The `yield` statement is similar to `return`, except that, instead of stopping the execution of the function and returning a result to the caller, it provides a value and pauses the function's execution.
Since Generator objects implement the Iterator interface, we can do things like this, for example (extracted from the Generators RFC):
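Here is that example, closely adapted from the RFC's snippet and laid out so that the line numbers referenced below line up:

```php
function getLinesFromFile(string $fileName): Generator {
    if (!$fileHandle = fopen($fileName, 'r')) {
        return;
    }

    while (false !== $line = fgets($fileHandle)) {
        yield $line;                        // line 7: pauses here
    }

    fclose($fileHandle);
}

$lines = getLinesFromFile('some_file.txt'); // line 13: nothing runs yet
foreach ($lines as $line) {                 // line 14: iterates the generator
    echo $line;
}
```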
As `getLinesFromFile` has a `yield` statement, it is therefore a Generator function. `$lines`, on the other hand, is a Generator object and, since the `Generator` class implements the `Iterator` interface, it can be iterated like we did on line 14.
Generators work by passing control back and forth between the generator function and the caller, in our example the `foreach`. It works this way: when the Generator is created on line 13, it is not executed - it is only executed when we call the method `next`, something that our loop does implicitly. The execution happens as usual until it reaches the `yield` statement, on line 7. At this point, it pauses the execution and its current value is handed over to the caller. In the next `foreach` iteration, the method `next` is called again, resuming the generator at the point it had been paused.
At this point you may be thinking: so, what do Generators have to do with concurrency? It turns out that, even though most articles focus on this particular aspect of Generators, they can do much more! “Hidden” in the RFC, there is the following paragraph:
Generators can also be used the other way around, i.e. instead of producing values they can also consume them. When used in this way they are often referred to as enhanced generators, reverse generators or coroutines.
That's right. Since generator functions are interruptible, we can use them as part of cooperative multitasking, also known as non-preemptive multitasking. In this approach, each executing unit (the generator or coroutine, in our case) voluntarily yields control whenever it is blocked or idle. It is called cooperative because every generator must cooperate in order for the whole thing to work. With this approach, we need something that executes our generators and to which they voluntarily return control.
Enough of chit-chat, let's check some code.
You can also check out this and all the next examples in this repository.
First let’s clone the repo and run a container with PHP and Apache:
docker run -p 9000:80 -v "$(pwd)"/src:/var/www/html -d php:8.2-apache
The class `Task` is merely a wrapper for our tasks; it accepts an ID and a generator. The ID will be set by the `Scheduler`, as follows.
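A minimal sketch of that wrapper (adapted from Nikita Popov's article mentioned further below, so the repository version may differ slightly):

```php
class Task
{
    protected bool $beforeFirstYield = true;

    public function __construct(
        protected int $taskId,
        protected Generator $coroutine
    ) {
    }

    public function getTaskId(): int
    {
        return $this->taskId;
    }

    public function run(): mixed
    {
        // The first call returns the value the generator is paused on;
        // subsequent calls resume it from where it left off.
        if ($this->beforeFirstYield) {
            $this->beforeFirstYield = false;
            return $this->coroutine->current();
        }

        return $this->coroutine->send(null);
    }

    public function isFinished(): bool
    {
        return !$this->coroutine->valid();
    }
}
```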
The class `Scheduler` is responsible for executing the tasks. We use its `newTask` method to create the task, which stores it in a queue and in a map, to control which tasks to run. In each iteration, the method `run` obtains one task from the queue, executes it and, if the task is terminated - that is, if the generator is not valid anymore - the task is removed from the queue.
The function `fetchResource` uses `curl_multi_*` functions to fetch multiple resources at the same time. The function `curl_multi_exec` does not block our execution while it waits for a response; if the function `curl_exec` had been used instead, this code simply would not have worked. This is a very important point, which we will come back to shortly.
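Here is a hedged sketch of `fetchResource`, together with how the scheduler runs it (the slow.php/fast.php URLs are placeholders for the repository's endpoints):

```php
function fetchResource(string $url): Generator
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

    $multiHandle = curl_multi_init();
    curl_multi_add_handle($multiHandle, $handle);

    do {
        // Progress the transfer without blocking...
        curl_multi_exec($multiHandle, $stillRunning);

        // ...and hand control back to the scheduler while it runs.
        yield;
    } while ($stillRunning > 0);

    echo sprintf("Fetched %s (%d bytes)\n", $url, strlen((string) curl_multi_getcontent($handle)));

    curl_multi_remove_handle($multiHandle, $handle);
    curl_multi_close($multiHandle);
}

$scheduler = new Scheduler();
$scheduler->newTask(fetchResource('http://localhost:9000/slow.php'));
$scheduler->newTask(fetchResource('http://localhost:9000/fast.php'));
$scheduler->run();
```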
We are using bare curl functions here on purpose, in order not to hide anything behind libraries. As we will see later in this article, libraries such as Guzzle allow sending multiple concurrent requests, returning Promises similar to those we find in other languages.
We execute our do/while loop while our request is still running, but we interrupt the function's execution and return control to our scheduler once execution reaches the `yield` statement. This interruption means that the next loop iteration only happens after we have given another task a chance to execute. The next time the first task is given a chance to run, it will merely check if the resource has been returned; if so, it will terminate, otherwise it will pause itself once again and return control to the scheduler.
Executing this code you'd see something like this (shortened for brevity):
It is beyond the scope of this article to cover all the use cases of Generators, but at this point you may have noticed that they can do much more than allow you to iterate on stuff, right?
If you are interested in going deeper into this subject, I strongly recommend this article by Nikita Popov. He is the author of the Generators RFC and the proponent of many other very cool features we all use every day. Part of the code we've just seen was borrowed from the aforementioned article.
Even though this side of Generators is quite obscure to most PHP developers, it has been extensively explored by well-known libraries, such as ReactPHP and AMPHP, which we will see later on.
A word about blocking and non-blocking functions
Unfortunately, most PHP functions are blocking, with some rare exceptions. Besides the `curl_multi_*` functions we've just seen, we can set streams as non-blocking with the function `stream_set_blocking`, and then use `stream_select` to wait, for some amount of time, for them to change status.

We can also work with sockets in a similar way, setting the socket as non-blocking with `socket_set_nonblock`, and then using `socket_select` to wait for the sockets to change status.
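As a quick illustration, here is a minimal sketch of reading from two streams without blocking (the hosts are just placeholders):

```php
$hosts = ['example.com', 'example.org'];
$streams = [];

foreach ($hosts as $host) {
    $stream = stream_socket_client("tcp://{$host}:80");
    stream_set_blocking($stream, false);          // reads will no longer block
    fwrite($stream, "GET / HTTP/1.0\r\nHost: {$host}\r\n\r\n");
    $streams[(int) $stream] = $stream;
}

while ($streams) {
    $read = array_values($streams);
    $write = $except = null;

    // Wait up to 1 second for any stream to become readable.
    if (stream_select($read, $write, $except, 1) > 0) {
        foreach ($read as $stream) {
            $chunk = fread($stream, 8192);

            if ($chunk !== false && $chunk !== '') {
                echo $chunk;
            } elseif (feof($stream)) {            // remote side is done
                fclose($stream);
                unset($streams[(int) $stream]);
            }
        }
    }
}
```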
Fibers
As we’ve just seen, it is already possible to achieve concurrency in our applications via Generators, so… what’s the real utility of Fibers, and what’s the difference between Fibers and Generators?
The Fibers RFC was proposed in order to address one very common problem that other languages also suffer from, known as the “What colour is your function” problem. Summarising, this problem is characterised by the distinction between synchronous and asynchronous functions:
- Asynchronous functions need to be called in a different way (such as with the await keyword, in JavaScript)
- Synchronous functions cannot call asynchronous ones (even though the other way around is possible)
If you have ever worked with Promises/async/await in JavaScript, you surely noticed this effect. Whenever we use `await` in a function, this function also needs to be asynchronous and, therefore, needs to be marked as `async`.
Fibers represent full-stack, interruptible functions, which can be suspended from anywhere in the call stack, having their execution paused from within the fiber until they are resumed at a later time. In a sense, they are very similar to Generators, but there are a few important distinctions:
- Fibers pause the entire execution stack, so the direct caller of the function does not need to change how it invokes the function
- Fibers have their own call-stack, whereas Generators do not. This allows them to be paused within deeply nested function calls
It is important to say that, in the same way as Generators, we can use Fibers to achieve concurrency, but not parallelism - that is, even though you may have multiple fibers, only one is actively executing at any given moment. Fibers are also known as green threads or coroutines in other languages, but don't be misled by the name - fibers exist within a single process thread, and that is the reason only one Fiber is executed at a time.
Example 1 - our hello world.
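Here it is, laid out so that the line references in the explanation below match:

```php
$fiber = new Fiber(function (): void {
    $value = Fiber::suspend('Hello');

    echo "Value used to resume fiber: {$value}\n";
});

$value = $fiber->start();

echo "Value from fiber suspending: {$value}\n";

$fiber->resume('World');
```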
Exactly, this is our hello world using Fibers. I admit, nothing fancy, and definitely no concurrency, but we are close, I promise. Here's what's happening:
Fibers are created by passing a callable to the constructor - the callable is not executed at this point, though, only the Fiber object is created. Inside the fiber, during its execution, we suspend it with a value - the string "Hello". On line 4 we print the value we have passed as an argument to the `resume` method, which we use to resume the fiber's execution on line 11.

On line 7 the fiber is actually executed, until we suspend it on line 2 with "Hello". The instruction on line 9 prints "Value from fiber suspending: Hello". On line 11 we resume the fiber's execution, which will execute the instruction on line 4, completing our Hello World.
The whole output of our example is the following:
Value from fiber suspending: Hello
Value used to resume fiber: World
Fibers can be started, suspended from anywhere in the call stack, throw an exception/error and terminate. We can also check their current state, using the methods `isSuspended()`, `isStarted()`, `isRunning()` and `isTerminated()` - and that is basically the whole API.
Example 2 - sending concurrent requests
Let's modify our classes `Task` and `Scheduler` a bit to use Fibers instead of Generators.
The task now accepts a callable (not a `Generator` object anymore) and, internally, it creates a Fiber out of the given callable. In the `run` method, we first check if the fiber has already been started; if not, we start it, otherwise the fiber is just resumed.
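A hedged sketch of the Fiber-based `Task` (the repository version may differ slightly):

```php
class Task
{
    protected Fiber $fiber;

    public function __construct(
        protected int $taskId,
        callable $callable
    ) {
        $this->fiber = new Fiber($callable);
    }

    public function getTaskId(): int
    {
        return $this->taskId;
    }

    public function run(): mixed
    {
        // Start the fiber on the first call; resume it afterwards.
        return $this->fiber->isStarted()
            ? $this->fiber->resume()
            : $this->fiber->start();
    }

    public function isFinished(): bool
    {
        return $this->fiber->isTerminated();
    }
}
```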
Our simple scheduler now also accepts a callable rather than a Generator function as a parameter to the `newTask` method. Other than that, it remains unchanged.
The function `fetchResource` is also basically the same as previously presented but, instead of yielding control to the caller, the fiber suspends itself, on line 17.
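A sketch of the Fiber-based version, laid out so that the suspension lands on line 17 as referenced above (details may differ from the repository):

```php
function fetchResource(string $url): void
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

    $multiHandle = curl_multi_init();
    curl_multi_add_handle($multiHandle, $handle);

    do {
        // Progress the transfer without blocking the script...
        curl_multi_exec($multiHandle, $stillRunning);

        // ...and, instead of yielding as the generator version did,
        // suspend the fiber we are currently running inside of,
        // handing control back to the scheduler until we are
        // resumed for another pass.
        Fiber::suspend();
    } while ($stillRunning > 0);

    echo sprintf("Fetched %s (%d bytes)\n", $url, strlen((string) curl_multi_getcontent($handle)));

    curl_multi_remove_handle($multiHandle, $handle);
    curl_multi_close($multiHandle);
}
```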
As a result, we should see the same result as we’ve seen with the Generators example:
A final word on Fibers
You might have noticed that working with Fibers and Generators can add a good level of complexity to our code base if we use them directly.
Even though they are the building blocks that PHP provides to us, as you can see in the Fibers RFC, the author was quite explicit about their purpose and audience:
Fibers are an advanced feature that most users will not use directly. This feature is primarily targeted at library and framework authors to provide an event loop and an asynchronous programming API.
The Fiber API is not expected to be used directly in application-level code. Fibers provide a basic, low-level flow-control API to create higher-level abstractions that are then used in application code.
Having said that, let’s have a look at some libraries to make our lives a bit easier.
Guzzle
As mentioned before, Guzzle provides a very convenient way to send concurrent requests which, under the hood, as this article is being written, is powered by Generators. If this is your only need for concurrency, go for it. Let's see an example:
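A hedged sketch of such an example, reusing our slow.php/fast.php endpoints as placeholders:

```php
use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

require __DIR__ . '/vendor/autoload.php';

$client = new Client(['base_uri' => 'http://localhost:9000']);

// Both requests are sent concurrently; each *Async call returns a promise.
$promises = [
    'slow' => $client->getAsync('/slow.php'),
    'fast' => $client->getAsync('/fast.php'),
];

// Wait for all promises to complete (throws if any of them fails).
$responses = Utils::unwrap($promises);

foreach ($responses as $name => $response) {
    echo $name . ': ' . $response->getBody() . PHP_EOL;
}
```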
The method `getAsync`, as well as its siblings `postAsync`, `putAsync`, `deleteAsync` and so on, returns a Promise, which implements the Promises/A+ spec. You can chain their calls or wait for all of them concurrently, as we've done. Executing this script would result in something like:
AMPHP
AMPHP is a collection of libraries to deal with concurrency and, as of version 3.0, is powered by Fibers.
One of the nicest things about this project is that it also provides non-blocking alternatives for many PHP functions, such as for handling files, executing MySQL queries or communicating with Redis, without requiring the installation of extensions, like Swoole does. Even though this approach may have some performance limitations, it can fit well in many scenarios.
Let's have a look at an example that uses the non-blocking MySQL connector:
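A hedged sketch of that example, assuming amphp/mysql v3's API and placeholder credentials (the line numbers referenced below belong to the repository's version, which this sketch only approximates):

```php
use Amp\Mysql\MysqlConfig;
use Amp\Mysql\MysqlConnectionPool;
use function Amp\async;
use function Amp\Future\await;

require __DIR__ . '/vendor/autoload.php';

// Placeholder credentials.
$pool = new MysqlConnectionPool(
    MysqlConfig::fromString('host=localhost user=root password=secret db=test')
);

// The tables are created synchronously.
$pool->execute('CREATE TABLE IF NOT EXISTS tmp1 (id INT)');
$pool->execute('CREATE TABLE IF NOT EXISTS tmp2 (id INT)');

// Queue 10 INSERTs per table; each async() call returns a Future.
$futures = [];
foreach (range(1, 10) as $i) {
    $futures[] = async(fn () => $pool->execute('INSERT INTO tmp1 VALUES (?)', [$i]));
    $futures[] = async(fn () => $pool->execute('INSERT INTO tmp2 VALUES (?)', [$i]));
}

// Wait for all INSERTs to complete.
await($futures);

// Query both tables concurrently; await() preserves the iterable's keys.
$results = await([
    'tmp1' => async(fn () => $pool->query('SELECT COUNT(*) AS total FROM tmp1')),
    'tmp2' => async(fn () => $pool->query('SELECT COUNT(*) AS total FROM tmp2')),
]);

foreach ($results as $table => $result) {
    echo $table . ': ' . $result->fetchRow()['total'] . ' rows' . PHP_EOL;
}
```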
We start by defining the connection pool and setting up our example database. Both tables, tmp1 and tmp2, are created synchronously.
In the `foreach` defined in lines 17 to 20 we execute our `INSERT` statements using the `async` function. It returns a `Future` object, which represents the eventual result of an asynchronous operation. A `Future` can be pending, successfully completed or errored. This is basically the same concept we find in other languages, such as JavaScript.
In line 22 we use the `await` combinator to wait for all `Future`s to complete; then we do the same to query the results, from lines 24 to 27. In line 27 we can see that `await` returns an array whose keys match the ones from the iterable provided to it, holding their completion values. Later on we use the `async` function and `fetchRow` to get the obtained results.
The example is quite simple, and it probably would not make sense to execute these statements asynchronously when we have only 10 insertions into each table, but this approach can be useful when dealing with multiple databases, for example.
Similar to the `await` combinator, we've got:

- `awaitAll`, which awaits all the `Future` objects and returns their results as `[$errors, $values]`
- `awaitFirst`, which returns the first completed `Future` object, whether successfully completed or errored
- `awaitAny`, which returns the first successfully completed `Future`
- `awaitAnyN`, which is similar to `await`, but tolerates individual failures. It accepts a `$count` parameter, with which we can set the number of `Future` objects that must be successfully completed before it returns
The next example will make use of the HttpClient library, which is similar to Guzzle in the sense that it allows us to make asynchronous requests, but with some special powers. We will try to obtain an HTTP resource and, if it is too slow for our standards, we fall back on a secondary resource:
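A hedged sketch of that example, again assuming AMPHP v3's API and our placeholder endpoints:

```php
use Amp\CancelledException;
use Amp\Http\Client\HttpClientBuilder;
use Amp\Http\Client\Request;
use Amp\TimeoutCancellation;
use function Amp\async;

require __DIR__ . '/vendor/autoload.php';

$client = HttpClientBuilder::buildDefault();

try {
    // Request the primary (slow) resource in the background...
    $future = async(fn () => $client->request(new Request('http://localhost:9000/slow.php')));

    // ...but only tolerate waiting for it for 2 seconds.
    $response = $future->await(new TimeoutCancellation(2));
} catch (CancelledException) {
    // Too slow: fall back on the secondary resource.
    $response = $client->request(new Request('http://localhost:9000/fast.php'));
}

echo $response->getBody()->buffer() . PHP_EOL;
```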
As we can see in line 12, we are providing a `TimeoutCancellation` to the function `await`. This causes a `CancelledException` once we reach 2 seconds of execution, making us fall back on the secondary resource, which could be a possibly outdated read-only database replica, for example. Executing this script would print the "I am waaay faster" message echoed by the script fast.php, instead of taking the 10 seconds it would if we had insisted on the main resource - slow.php.
There are other `Cancellation` implementations we can use, such as the `SignalCancellation`, which cancels the `Future` after receiving a specified signal, and the `DeferredCancellation`, which allows manual cancellation by calling `DeferredCancellation::cancel()`.
It is clear that it is way easier to work with Fibers by consuming AMPHP than by wiring them up manually, right?
ReactPHP
ReactPHP is a low-level library for event-driven programming in PHP. It provides an event loop and utilities to work with streams, a DNS resolver, network clients/servers, HTTP clients/servers and interaction with processes.
Like AMPHP, it is highly modular, so we can install only the packages we need. Let's check our first example, similar to what we've done with AMPHP and Guzzle, consuming multiple resources simultaneously:
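A hedged sketch using the `Browser` class from the HTTP package (the endpoints are placeholders):

```php
use Psr\Http\Message\ResponseInterface;
use React\Http\Browser;

require __DIR__ . '/vendor/autoload.php';

$browser = new Browser();

// Both requests go out immediately; the event loop resolves whichever
// promise completes first.
$browser->get('http://localhost:9000/slow.php')
    ->then(fn (ResponseInterface $response) => print('slow: ' . $response->getBody() . PHP_EOL));

$browser->get('http://localhost:9000/fast.php')
    ->then(fn (ResponseInterface $response) => print('fast: ' . $response->getBody() . PHP_EOL));
```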
This example uses the HTTP package, which provides the ability to consume and serve HTTP resources. The method `get` returns a `Promise`, an implementation of CommonJS Promises/A for PHP, which is provided by one of the ReactPHP packages - Promise.
Executing this file in our terminal, we can see this output:
That is, even though we have sent the request to the slow resource first, it was the promise associated with the second request that was resolved first.
We can use the combinator `all`, similarly to what we've done with AMPHP:
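A sketch of how that could look, assuming the same `Browser` setup as above:

```php
use function React\Promise\all;

all([
    'slow' => $browser->get('http://localhost:9000/slow.php'),
    'fast' => $browser->get('http://localhost:9000/fast.php'),
])->then(function (array $responses) {
    // Runs only once every promise in the array has resolved.
    foreach ($responses as $name => $response) {
        echo $name . ': ' . $response->getBody() . PHP_EOL;
    }
});
```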
Or use the `race` function, and guess who's the winner 😛:
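A sketch with `race`, again reusing the `$browser` from the first example:

```php
use function React\Promise\race;

// Resolves with the value of the first promise to settle - here,
// unsurprisingly, the fast endpoint wins.
race([
    $browser->get('http://localhost:9000/slow.php'),
    $browser->get('http://localhost:9000/fast.php'),
])->then(fn ($response) => print($response->getBody() . PHP_EOL));
```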
Besides enabling us to send concurrent requests, ReactPHP's HTTP package enables us to write HTTP and HTTPS servers that can handle multiple concurrent requests without blocking. Let's see an example:
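A hedged sketch of such a server, assuming the Async and PromiseTimer packages mentioned below:

```php
use Psr\Http\Message\ServerRequestInterface;
use React\Http\HttpServer;
use React\Http\Message\Response;
use React\Socket\SocketServer;
use function React\Async\await;
use function React\Promise\Timer\sleep;

require __DIR__ . '/vendor/autoload.php';

$server = new HttpServer(function (ServerRequestInterface $request) {
    if ($request->getMethod() === 'GET') {
        // GET requests are answered immediately.
        return Response::plaintext("Right away!\n");
    }

    // Any other method: simulate a 2-second asynchronous operation
    // without blocking the event loop.
    await(sleep(2));

    return Response::plaintext("Sorry for the delay!\n");
});

$server->listen(new SocketServer('0.0.0.0:9090'));

echo 'Server running at http://0.0.0.0:9090' . PHP_EOL;
```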
In this example we create an `HttpServer` object, which responds immediately to GET requests and, for any other HTTP method, defers the response for 2 whole seconds using the `await` function, provided by the package Async, along with the function `sleep`, provided by the package PromiseTimer. This way we simulate an asynchronous operation, such as obtaining external HTTP resources or executing database queries.
This function `sleep`, unlike the one provided natively by PHP, is non-blocking. Executing this PHP script will launch an HTTP server that listens on port 9090. With the server running, we can send multiple GET and POST requests and check that, even though there is a single process running, our script is capable of dealing with multiple requests at the same time.
ReactPHP is a great library and it can do many more things than I could cover in this introductory article, nor was that my intention. Should you need to process streams of any type, launch HTTP or socket servers, consume HTTP resources, interact with caches, resolve DNS or perform any other asynchronous operation, you'll be in good hands with ReactPHP.
Conclusion
As we’ve seen, it may not be as easy as with JavaScript or Python to achieve concurrency, but it is possible, even though it requires some external packages and additional effort. This means you don’t need to rewrite your entire application if you need it to deal with concurrency, isn’t that great?
There is still a lot of ground to cover, as it is not possible to explore all the use cases and libraries available out in the wild in this introductory article. The aforementioned Swoole alone deserves its own article!
In the next part of this article we will explore parallelism with PHP. Are you excited? Cause I am.