In the previous posts of the series I recounted why the need for MockingBird arose and what we did to implement it.
To recap, it was implemented using Flask and Python and was started on the machine using the following command:
python mockingbird.py
This meant that Flask was running in its default development mode, on its own built-in server, as a single process. I had assumed it was also single-threaded, but since Flask 1.0 the development server handles requests in threads by default (i.e. threaded=True is the default now).
If only a single thread is available, Flask serves one request at a time. Combined with the 0.5s sleep we applied, that would cap us at 2 requests per second, which is not ideal.
My target was 1200 requests per minute at the bare minimum.
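To put numbers on that gap, here is a quick back-of-the-envelope sketch. It only uses the two figures from this post (the 0.5s sleep and the 1200 requests per minute target); the function names are made up for illustration, and it ignores everything except the sleep, so treat the results as upper and lower bounds rather than measurements.

```python
import math

SLEEP_SECONDS = 0.5        # forced delay per request (from the post)
TARGET_PER_MINUTE = 1200   # minimum target load (from the post)


def max_requests_per_minute(service_time_s: float, concurrency: int = 1) -> float:
    """Upper bound on throughput when each request holds a slot for service_time_s."""
    return concurrency * 60.0 / service_time_s


def min_concurrency(target_per_minute: float, service_time_s: float) -> int:
    """Fewest requests that must be in flight at once to sustain the target rate."""
    return math.ceil(target_per_minute * service_time_s / 60.0)


if __name__ == "__main__":
    print("one request at a time:", max_requests_per_minute(SLEEP_SECONDS), "req/min")
    print("needed in flight:", min_concurrency(TARGET_PER_MINUTE, SLEEP_SECONDS))
```

So a strictly one-at-a-time server tops out at 120 requests per minute, and the target needs at least 10 requests being served concurrently, before counting any real processing time.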
My tests initially went well. For short test durations and low load, MockingBird handled what we threw at it pretty well (probably because of the aforementioned threading). But I noticed something weird in the logs of my tests when I applied the full load.
Timeouts, lots of them.
The timeout in the originating MS was set at 60s, and MockingBird was unable to respond to a majority of the requests (>60%) within that time.
And that was how I killed the MockingBird.
Looking at the logs I started cursing at myself for not thinking the solution through. Redoing the whole thing would mean loss of time. And time is something we cannot afford at my current organization.
So off I went to read the Flask documentation (great resource btw), which mentions that the built-in web server is not meant for production loads (which is exactly what I had thrown at it). It might have coped if I didn't have the blocking sleep call, but that was needed. So what else could work?
I decided to take a look at some recommended servers. One I liked in particular was Gunicorn, with its support for different types of workers.
A worker is, to put it simply, a copy of your code that runs in parallel with the other workers. Read the design doc for more info.
To install gunicorn on your system:
pip install gunicorn
I also decided to use the gevent worker for gunicorn so that each worker can handle multiple requests at once. We needed an async worker (gevent is greenlet-based, not to be confused with Python's asyncio) because our requests spend most of their time in a forced wait rather than doing CPU work. For more information I suggest reading the documentation linked above.
To install gevent:
pip install gevent
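Why does overlapping the waits matter so much? Here is a rough, standard-library-only sketch. To be clear, this is not how gevent works internally: gevent uses greenlets, and as far as I understand it gunicorn's gevent worker monkey-patches the standard library so that calls like time.sleep yield to other requests. I am using a plain thread pool here purely as a stand-in, with a shortened delay and a made-up request count, to show the effect of waiting concurrently instead of one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY_S = 0.05   # shortened stand-in for the 0.5s sleep, to keep the demo quick
REQUESTS = 8     # hypothetical number of simultaneous requests


def handle_request(i: int) -> int:
    """Pretend handler: all of its time is spent waiting, none on CPU work."""
    time.sleep(DELAY_S)
    return i


def run_serial() -> float:
    """Handle the requests one at a time and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(REQUESTS):
        handle_request(i)
    return time.perf_counter() - start


def run_overlapped() -> float:
    """Handle the requests concurrently so that their waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=REQUESTS) as pool:
        list(pool.map(handle_request, range(REQUESTS)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial:     {run_serial():.2f}s")
    print(f"overlapped: {run_overlapped():.2f}s")
```

The serial run takes roughly the sum of all the delays, while the overlapped run takes roughly one delay. That is the behaviour we were after for MockingBird's sleep.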
After doing these two things, it is a matter of just issuing the proper command (no code changes needed to existing files!) to start the application.
All that is needed is one extra file in the root directory of your project:
from flaskapi import api

if __name__ == "__main__":
    api.run()
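Name it whatever you want. I named it wsgi.py. Note that the command below points gunicorn at flaskapi:api directly; to go through this file instead, you would pass wsgi:api.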
The command:
gunicorn --worker-class gevent -w 4 flaskapi:api --bind 0.0.0.0:5000 --daemon
- 0.0.0.0 binds to all network interfaces, which is important if you want intranet access to your API/server
- -w 4 means 4 workers : the gunicorn docs suggest (2 x number of cores) + 1 as a starting value
- --daemon means that processes will be spawned in the background
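If the command line gets unwieldy, the same options can live in a gunicorn config file, which is just a Python module. This is a sketch of what I believe is the equivalent of the flags above; the file name gunicorn.conf.py and its location in the project root are my own choice here, not something we had in the original setup.

```python
# gunicorn.conf.py (hypothetical file in the project root)
# Mirrors the flags from the command above, one setting per flag.

bind = "0.0.0.0:5000"     # --bind 0.0.0.0:5000 : listen on all interfaces, port 5000
workers = 4               # -w 4 : number of worker processes
worker_class = "gevent"   # --worker-class gevent : async worker (needs gevent installed)
daemon = True             # --daemon : detach and run in the background
```

It would then be started with: gunicorn -c gunicorn.conf.py flaskapi:api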
Running the command starts 5 processes on the server: one master and 4 workers, each serving multiple requests per second. This worked sorta fine for the load we were throwing at it, but we eventually had to upgrade our puny server (CPU was hitting 100% at peak load) to get consistent results, even under prolonged peak load.
And thus the MockingBird lives to sing another day.
So this brings to an end the MockingBird series of posts. One hopes it has been informative!