In a series of upcoming blog articles, I will take you through some of the design decisions we have made and the challenges we’re facing in the current and future architecture of our platform. This article outlines the main reasons why we’re moving away from our monolithic application setup.
At Sendcloud we run a shipping platform built with Python and Django, with PostgreSQL as our main database, Redis for caching, and Dramatiq for background tasks.
For many years we embraced the monolith pattern at Sendcloud. For a small start-up with a team of fewer than 20 developers, this was a great way to develop an application. We carefully designed our monolith-first strategy. The simplicity of our stack allowed for fast iterations while we searched for product-market fit at a rapid pace. The majority of our platform’s code is kept in a single repository, with different domains divided into Django apps so that developers don’t step on each other’s toes. For a long time, a developer could keep the entire codebase in their head.
After all these years we’ve decided to move away from our monolith setup, breaking our platform into smaller services.
Our main drivers to do so are:
- Decrease complexity and cognitive load for developers,
- Reduce deployment times and risks,
- Allow for different scaling and robustness requirements per service,
- Introduce new technologies.
Our codebase has grown quite big. Although it is well documented and has extensive test coverage, it’s hard to comprehend in its entirety. Especially for newcomers to our team, the project is getting overwhelmingly big. It’s hard to oversee the impact of the changes you’re making or to keep up with the changes your colleagues are making. Moving to smaller services allows our teams to move faster and keep an overview of the services they are working on. Implementing strict contracts (APIs) between services allows teams to work on them efficiently in parallel.
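To make the idea of a strict contract a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not code from our platform: the `ParcelCreated` message, its fields, and the `encode`/`decode` helpers are all hypothetical names chosen for this example. The point is that both sides agree on an explicit, validated shape instead of reaching into each other’s internals.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ParcelCreated:
    """Hypothetical message that one service publishes and another consumes."""

    parcel_id: int
    carrier: str
    weight_grams: int

    def __post_init__(self) -> None:
        # Reject values that violate the contract as early as possible.
        if self.weight_grams <= 0:
            raise ValueError("weight_grams must be positive")


def encode(message: ParcelCreated) -> str:
    """Serialize the message to JSON for the wire."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(payload: str) -> ParcelCreated:
    """Parse a JSON payload, refusing missing or unknown fields."""
    data = json.loads(payload)
    expected = {f.name for f in fields(ParcelCreated)}
    if set(data) != expected:
        mismatch = sorted(set(data) ^ expected)
        raise ValueError(f"payload does not match contract: {mismatch}")
    return ParcelCreated(**data)
```

As long as the producing and consuming teams both respect this shape, each can change the code behind it without coordinating every change. Note that this sketch only checks field names and one value constraint; a real contract would typically also validate types and be versioned.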
Each deployment carries a risk of disrupting our service. We have an extensive continuous integration and deployment setup, currently making 10–15 deployments per day, most of them containing a wide range of code changes. Each time, however, we build and deploy our entire platform, even for small changes. With a growing team, new releases will contain more changes, increasing the risk of each deployment and making it harder to relate an error in production to a specific merge request. All in all, this wastes valuable computational resources and makes deployments unnecessarily long. It is also slowly creating a bottleneck, since more developers want to join the merge train and more time is spent on triaging errors.
Sendcloud is also expanding rapidly internationally: we launched our product in Spain and Italy this year and will launch in the UK later this year. As a result, our customers are active at extended hours, which makes maintenance without customer impact harder. Having separate services, databases, and infrastructure for various purposes will allow us to perform maintenance on one service with reduced customer impact. If these separate services are still tightly coupled, however, the impact of such maintenance will easily cascade through the platform. To achieve this robustness, it’s key that the services are loosely coupled.
Once the APIs between connecting services are clearly defined, new technologies can be introduced into our stack as well. Sendcloud depends heavily on external services and makes a lot of API calls to them. Python isn’t necessarily the most efficient tool for handling a lot of blocking IO operations in parallel. Having strictly defined contracts allows teams to experiment with new technologies; in our case, we’re gradually introducing more Go into our stack.
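For readers less familiar with this kind of workload, the sketch below shows one common way to run many blocking calls in parallel in Python, using a thread pool from the standard library. It is an illustration under stated assumptions, not our production code: `fetch_rate` is a hypothetical stand-in that sleeps instead of calling a real carrier API, and the value it returns is a dummy.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def fetch_rate(carrier: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a blocking HTTP call to an external API."""
    time.sleep(0.05)  # simulate waiting on the network
    return carrier, float(len(carrier))  # dummy value, not a real rate


def fetch_all(carriers: List[str], max_workers: int = 8) -> Dict[str, float]:
    """Run the blocking calls concurrently, one worker thread per call in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input list.
        return dict(pool.map(fetch_rate, carriers))
```

Calling `fetch_all(["postnl", "dhl", "ups"])` waits on the three calls at the same time rather than one after another. The trade-off is that every call in flight occupies an operating system thread for the whole time it is waiting, so the number of simultaneous calls is bounded by `max_workers`.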
In upcoming articles, I will share more details on the technical decisions we’ve made. A good read in the meantime is Sam Newman’s book Monolith to Microservices, which outlines various patterns for moving from a monolith to a microservices architecture.