Two days ago we launched 10k+ code changes to production. It was intense but, unlike some of our other deployments, problem-free. The deployment took 4.5 hours and followed a 10-step process:
- We put everyone mission-critical on standby and were prepared for the deployment to take much longer than anticipated.
- We did it overnight to disturb as few users as possible. For us, this meant fewer variables to worry about.
- We created a Google spreadsheet with all the scenarios that had to work perfectly. For every scenario, we defined 3 phases: development, dev QA, product QA.
- We wrote a deployment guide, simulated the deployment locally multiple times, and iterated on the guide after each run.
- Before launch, we deployed a sandbox, restored production databases there, and verified that everything worked.
- We then deployed a new production environment next to the active one. We had to set up many things and wanted to verify everything was done correctly.
- Once verified, we took fresh backups of the old production environment and restored them to the new one.
- We rerouted the traffic to the new production.
- We kept the old production on standby in case things went terribly wrong (you never know). Since everything worked, we turned it off after a day.
- We all went to sleep with our ringtones on maximum in case of any errors.
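
The scenario spreadsheet from the list above is essentially a go/no-go gate: a scenario counts as done only once all three phases have signed off. Here is a minimal Python sketch of that idea. It is a hypothetical illustration, not the tooling used for this launch: the `Scenario` class, the `blockers` and `ready_to_launch` helpers, and the example scenario names are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

# The three phases every scenario had to pass, in order.
PHASES = ("development", "dev QA", "product QA")


@dataclass
class Scenario:
    """One row of the (hypothetical) launch checklist."""
    name: str
    passed: set = field(default_factory=set)

    def sign_off(self, phase: str) -> None:
        # Reject typos so a misspelled phase cannot silently count as done.
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.passed.add(phase)

    def pending(self) -> list:
        # Phases still missing a sign-off, in checklist order.
        return [p for p in PHASES if p not in self.passed]


def blockers(scenarios) -> dict:
    """Map each unfinished scenario name to its pending phases."""
    return {s.name: s.pending() for s in scenarios if s.pending()}


def ready_to_launch(scenarios) -> bool:
    """True only when every scenario has passed every phase."""
    return not blockers(scenarios)


# Example data (made up): one finished scenario, one still in product QA.
checkout = Scenario("checkout with saved card")
login = Scenario("login via SSO")
for phase in PHASES:
    checkout.sign_off(phase)
login.sign_off("development")
login.sign_off("dev QA")

print(blockers([checkout, login]))
print(ready_to_launch([checkout, login]))
```

The point of listing blockers by name instead of returning a bare yes/no is that the answer to "can we launch?" also tells you who to chase.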
If you find my content interesting, follow me on Twitter. We can share half-baked ideas and discuss engineering challenges.