Recently I dug into the FastAPI docs and some other resources to understand how each route is processed and what can be done to optimize FastAPI at scale. This post collects what I learned.
A little refresher
Before we go into optimizing FastAPI, I'd like to give a short tour of a few technical concepts.
Threads & Processes
Threads share a memory space and are cheap to create.
Processes have separate memory spaces and therefore carry more creation overhead.
Multi-threading & Multi-processing
Multi-threading uses multiple threads within a single process.
Multi-processing utilizes multiple processes, leveraging multiple CPU cores.
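A minimal sketch of the difference: both a thread and a process run the same function below, but the thread shares the parent's memory while the process gets its own copy (so results are passed back over a queue).

```python
import threading
import multiprocessing


def square(n: int, out: multiprocessing.Queue) -> None:
    # Put the result on a queue so both a thread and a process can report back.
    out.put(n * n)


if __name__ == "__main__":
    q: multiprocessing.Queue = multiprocessing.Queue()

    # A thread shares the parent's memory space and is cheap to start.
    t = threading.Thread(target=square, args=(3, q))
    t.start()
    t.join()

    # A process gets its own memory space, so creating it costs more,
    # but it can run on a separate CPU core.
    p = multiprocessing.Process(target=square, args=(4, q))
    p.start()
    p.join()

    print(sorted(q.get() for _ in range(2)))  # [9, 16]
```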
Concurrency & Parallelism
Concurrency is managing multiple tasks at once by interleaving them, typically via an event loop.
Parallelism is executing multiple tasks simultaneously, which requires multiple CPU cores.
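Concurrency on a single thread looks like this with `asyncio`: three simulated I/O waits are interleaved by the event loop, so they overlap instead of running back to back.

```python
import asyncio
import time


async def fetch(delay: float) -> float:
    # Simulate an I/O wait; control returns to the event loop here.
    await asyncio.sleep(delay)
    return delay


async def main() -> None:
    start = time.perf_counter()
    # Three tasks are managed concurrently on one thread, so the
    # total time is roughly 0.1s, not 0.3s.
    await asyncio.gather(fetch(0.1), fetch(0.1), fetch(0.1))
    print(f"done in {time.perf_counter() - start:.2f}s")


asyncio.run(main())
```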
Python & the GIL
Python's Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. However, for I/O-bound operations, the GIL is released, allowing other threads to run while waiting for I/O completion. This makes Python particularly effective for I/O-heavy applications.
Quick FYI: the GIL is on its way to becoming optional. PEP 703 was accepted, and Python 3.13 ships an experimental free-threaded build.
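You can see the GIL being released during I/O with plain threads: `time.sleep` releases the GIL just like real blocking I/O, so four 0.1-second waits overlap instead of adding up.

```python
import threading
import time


def io_task() -> None:
    # time.sleep releases the GIL, like blocking network or disk I/O does.
    time.sleep(0.1)


start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The four waits overlap, so this finishes in ~0.1s, not 0.4s.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```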
FastAPI
How route handlers are processed
- Regular route handlers run in an external threadpool
- Async route handlers run in the main thread
- This only applies to route handlers: FastAPI doesn't change how the other (utility) functions you call from them run
Optimizing FastAPI
Choose based on your task:
- I/O tasks with minimal or no CPU work: use an async route handler and await the I/O
- Non-async (blocking) I/O tasks: use a regular (def) route handler
- I/O tasks with significant CPU work: use a regular route handler, or an async route handler that hands the task off to an external worker (multi-processing)
- High-compute tasks: use multi-processing, same as above
The reason I'm suggesting regular route handlers for most cases is that we want to keep the main thread free to receive and manage requests. Blocking code in an async handler runs on the main thread and would hold up all incoming requests.
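One way to sketch the "async handler that hands CPU work to an external worker" option is with a `ProcessPoolExecutor` and `run_in_executor`. The names `heavy` and `handler` are hypothetical; inside a real app, `handler` would be a route handler and you'd typically create the pool on startup.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def heavy(n: int) -> int:
    # CPU-bound work; awaiting this directly in an async handler
    # would block the event loop.
    return sum(i * i for i in range(n))


# One pool, created once; workers start lazily on first submit.
pool = ProcessPoolExecutor()


async def handler(n: int) -> int:
    # The event loop stays free while the work runs in another process.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, heavy, n)


if __name__ == "__main__":
    print(asyncio.run(handler(10)))  # 285
```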
Using multiple processes
- For containerized environments: Use Docker Swarm/Kubernetes to create workers and use a load balancer
- For traditional setups or plain Docker: use Gunicorn as the process manager with Uvicorn workers
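For the Gunicorn-with-Uvicorn-workers setup, the invocation looks like this. The module path `main:app` is hypothetical (your module and app variable may differ), and the worker count is only a starting point, often tuned around the number of CPU cores.

```shell
# Gunicorn manages 4 Uvicorn worker processes serving the ASGI app.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```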
Some resources I found to be of great help
- Building large APIs with FastAPI - PyCon SK
- Threading vs multiprocessing in python
- https://fastapi.tiangolo.com/async/ (official FastAPI docs, they're amazing)