When using Python web frameworks, it's important to understand how they deal with asynchrony; otherwise you will surely run into massive performance road bumps as your application scales up. In this blog, I explore some of the biggest differences you need to be aware of in frameworks like Flask, FastAPI and Django. In particular, Flask and FastAPI are compared by means of small example applications.
The basics of concurrency and parallelism in Python
CPython, the most common implementation of Python, inherently doesn't support running code in parallel. This is prevented by the GIL (global interpreter lock), which allows only one thread to run Python code at a time. This is a rather big topic on its own; more information can be found here.
This is not to say that nothing can run in parallel. Some packages like NumPy are mostly written in C, and while running they may release the GIL until they need to call the Python C API or update Python objects. IO-bound operations that involve system calls will also release the GIL (because in this case, the kernel is responsible for handling the request safely).
All in all, keep in mind that when writing pure Python code, a single process can run your code concurrently, but not in parallel. Note the emphasis on process: you can create multiple processes to handle requests in parallel. Web servers do this by spawning multiple worker processes for incoming requests. Typically you want a stateless application to support this deployment pattern.
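To make the distinction concrete, here is a minimal sketch (my own example, not tied to any web framework) showing both effects: the IO-bound tasks overlap in threads because the blocking call releases the GIL, while the pure-Python CPU-bound tasks only speed up across processes.

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_bound(_):
    time.sleep(1)  # blocking syscall: the GIL is released while waiting

def cpu_bound(_):
    return sum(i * i for i in range(5_000_000))  # pure Python: holds the GIL

if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(io_bound, range(4)))
    print(f"4 IO tasks in threads: {time.perf_counter() - start:.1f}s")  # ~1s, not ~4s

    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(cpu_bound, range(4)))
    print(f"4 CPU tasks in processes: {time.perf_counter() - start:.1f}s")  # roughly one task's duration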
WSGI and ASGI
The Web Server Gateway Interface (WSGI) is a standard that decouples the choice of web framework (like Flask or Django) from the choice of web server (like Gunicorn or Waitress). Without standardization, the best-case scenario is shown below on the left, where each framework implements a compatibility layer for each web server. Of course, in reality there would be missing links and some combinations would simply be incompatible. On the right, the WSGI standard is introduced: developers of web frameworks only have to implement this interface to automatically support all web servers, and vice versa.
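The interface itself is tiny. For reference, here is a complete, minimal WSGI application (a sketch; the names are mine) that any WSGI server can serve directly, e.g. with gunicorn hello_wsgi:app:

def app(environ, start_response):
    # environ is a dict describing the request (path, headers, ...);
    # start_response is a callback to set the status line and response headers
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI\n"]  # the response body: an iterable of bytes

Note that the callable is plainly synchronous: it must return the full response before the worker can move on.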
This kind of indirection through standardization is a pretty common way to make compatibility between different kinds of software components easier. Some other good examples are the LSP project from Microsoft and ONNX for representing machine learning models. The first provides a standard so that IDEs don't have to reinvent the wheel for every programming language; the latter decouples training frameworks from inference frameworks. Going back to WSGI, you can find a pretty extensive rationale for the WSGI standard here if interested.
The Asynchronous Server Gateway Interface (ASGI) is also a standardization effort, focusing on... asynchronous behaviour. One shortcoming of WSGI is that applications must offer a single synchronous callable that takes an HTTP request and returns the response. A WSGI worker (typically multiple worker processes are available, each handling requests) is blocked until the response is ready. In ASGI, the worker may be freed up for other requests rather than actively waiting until it can continue. This allows for higher throughput, especially when the number of IO-bound requests is high. The following figure illustrates this key difference: with WSGI, request 1 must be finished before the worker can take on request 2.
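For comparison with the WSGI snippet above, here is the ASGI equivalent (again a minimal sketch with names of my own choosing), servable with e.g. uvicorn hello_asgi:app. The application is a coroutine and the response is sent as messages; every await is a point where the server may switch to another request:

async def app(scope, receive, send):
    # scope describes the connection; receive and send are awaitable message channels
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI\n"})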
Note that in the explanation above, we have left out workers with multiple threads, as supported by many WSGI web servers. We will get back to that later on.
Async behavior in WSGI frameworks
It may be somewhat confusing, but Flask, which is a WSGI framework, also supports async request handlers. This works very differently from async request handlers in an ASGI framework! I'll illustrate this by implementing the same example in Flask (WSGI) and FastAPI (ASGI).
Consider the following web app in Flask (note that async views require the async extra: pip install flask[async]). Two routes are offered to retrieve some resources. Let's assume the resources live on a different machine and may take a while to retrieve (simulated with a sleep call).
from flask import Flask, jsonify
import asyncio

app = Flask(__name__)

resource_ids = [1, 2, 3, 4, 5]

async def download_resource(id):
    # Dummy method to simulate long-running IO
    await asyncio.sleep(5)
    return f"dummy_result_id_{id}"

@app.route("/resource/<int:id>")
async def retrieve_resource(id: int):
    """Get a specific resource"""
    result = await download_resource(id)
    return jsonify(result)

@app.route("/resources")
async def retrieve_resources():
    """Get all available resources"""
    async with asyncio.TaskGroup() as tg:
        dl_tasks = [tg.create_task(download_resource(id)) for id in resource_ids]
    # Once the TaskGroup block exits, all downloads have completed
    return jsonify([task.result() for task in dl_tasks])
Let's spin this one up using the WSGI web server Gunicorn (these options are the defaults, but they are given explicitly to emphasize that both values are 1):
gunicorn flask_async:app --workers 1 --threads 1
Now we will send 3 requests at the same time and measure the total response time:
$ time curl --parallel --parallel-immediate --parallel-max 3 -X GET http://localhost:8000/resource/1 http://localhost:8000/resource/2 http://localhost:8000/resource/3 -H 'accept: application/json'
"dummy_result_id_3"
"dummy_result_id_1"
"dummy_result_id_2"
real 0m15,038s
user 0m0,006s
sys 0m0,000s
Notice how the async route didn't really buy us anything in this case: the three requests were still handled one after another. However, if we use the other route to get all resources in a single request:
$ time curl -X GET http://localhost:8000/resources -H 'accept: application/json'
["dummy_result_id_1","dummy_result_id_2","dummy_result_id_3","dummy_result_id_4","dummy_result_id_5"]
real 0m5,014s
user 0m0,005s
sys 0m0,000s
Here we see what we'd like: the downloads are performed concurrently within the request. Before explaining this, let's look at the same program in FastAPI, which is an ASGI application:
from fastapi import FastAPI, Path
import asyncio

app = FastAPI()

resource_ids = [1, 2, 3, 4, 5]

async def download_resource(id):
    # Dummy method to simulate long-running IO
    await asyncio.sleep(5)
    return f"dummy_result_id_{id}"

@app.get("/resource/{id}")
async def retrieve_resource(id: int = Path(ge=1)):
    """Get a specific resource"""
    result = await download_resource(id)
    return result

@app.get("/resources")
async def retrieve_resources():
    """Get all available resources"""
    async with asyncio.TaskGroup() as tg:
        dl_tasks = [tg.create_task(download_resource(id)) for id in resource_ids]
    # Once the TaskGroup block exits, all downloads have completed
    return [task.result() for task in dl_tasks]
To spin this one up, we use an ASGI web server named Uvicorn.
uvicorn fastapi_async:app --workers 1
For this one, we again run 3 simultaneous requests:
$ time curl --parallel --parallel-immediate --parallel-max 3 -X GET http://localhost:8000/resource/1 http://localhost:8000/resource/2 http://localhost:8000/resource/3 -H 'accept: application/json'
"dummy_result_id_3""dummy_result_id_1""dummy_result_id_2"
real 0m5,018s
user 0m0,003s
sys 0m0,003s
Note that, while the route is the same as in Flask (putting aside framework differences that are irrelevant to async behaviour), the response time here is only 5 seconds, compared to the 15 seconds from before!
This difference can be explained by where the async event loop is set up. For Flask, being WSGI, it is set up in the request handler itself. This means that if we await a single result in the request handler, control is given back to the event loop, but there is nothing else for it to do than wait. The requests have to be processed one by one, as the only worker is blocked for 5 seconds each time. For the single route that requests all resources, the downloads can all be started immediately, as they live within the same request and thus the same event loop. In contrast, FastAPI, being ASGI, sets up the event loop one level higher. When a request awaits a result, the worker is immediately free to take on a new request.
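Conceptually, a WSGI framework has to drive each async view to completion inside its synchronous handler, roughly like the sketch below (a simplification: Flask actually delegates to asgiref's async_to_sync rather than calling asyncio.run directly):

import asyncio

def run_view_synchronously(async_view, *args, **kwargs):
    # A fresh event loop is created per request and torn down afterwards;
    # the WSGI worker blocks here until the coroutine has fully finished.
    return asyncio.run(async_view(*args, **kwargs))

Awaits inside a single request can still interleave with each other, which is why /resources finishes in 5 seconds, but the loop only lives as long as that one request, so nothing interleaves across requests.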
Understanding this example is essential when working with different web frameworks, especially on a larger project that mixes WSGI and ASGI applications (for instance Dash, which is built on Flask, as a frontend and FastAPI as a backend).
A note on threads
Flask can run with multiple worker threads; this is even the default when running the development server with flask --app flask_async run. When serving Flask with threads under a server like Gunicorn, there are two common thread backends:
- Gthread: threads in the regular sense, managed and scheduled by the OS. These are comparatively expensive, and they directly cap the number of concurrent requests you can handle at n_worker_processes * n_threads.
- Greenlet: pseudo-threads implemented without any OS involvement. The user can decide when control is handed over to a different greenlet, and/or blocking operations may be made cooperative automatically (via monkey patching). You could say that with ASGI, async behaviour is explicit, while WSGI with greenlets is implicit: code may look exactly the same as synchronous code while still gaining async benefits. Performance-wise it is difficult to say which one is better, as both boil down to cooperative multitasking within a single thread. More info can be found here; example commands for both backends follow below.
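With Gunicorn, the backend is chosen through the worker class. As a sketch (the flags are standard Gunicorn options; gevent must be installed separately, e.g. pip install gevent, and the gevent worker applies the monkey patching for you):

# OS threads: concurrency is capped at workers * threads
gunicorn flask_async:app --workers 2 --threads 4 --worker-class gthread

# Greenlets: cooperative pseudo-threads, one per connection
gunicorn flask_async:app --workers 2 --worker-class gevent --worker-connections 100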
Conclusion
It is fair to say there are a lot of possibilities when it comes to running async code in web servers. If you are having issues with responsiveness, make sure you know the differences between your development setup and your production setup (how many workers and threads are used, and what kind of worker). Also be aware of the underlying standard (WSGI or ASGI) used by your framework when deciding on async routes.