Backstory
I originally wanted to write about how I used Vercel to deploy multiple parts of the same website behind a reverse proxy, because it was a true "Eureka" moment for me: I was able to split the website into multiple parts and deploy them separately, without authentication issues. Then I remembered that I had solved a similar problem a few years ago while working at Delivery Hero in Germany, so I decided to combine the two stories into one post. In rapidly changing start-up and scale-up environments, big architectural changes are not uncommon. This post is about one such occasion and the technical decisions I had to make while delivering the project. I will also discuss how I approached the same problem differently for a recent client, using newer technologies like NextJS and Vercel.
The problem statement
As the story goes, I was tasked with an intriguing challenge. The business was expanding from food delivery to grocery delivery, and I had to create a new website under a strict deadline without impacting the existing business. We had a successful e-commerce website running; let's call it www.example.com. All the developers were working on this website. The ask was to create another business vertical in parallel, with zero to minimal disruption to the existing business, but as part of the same website. That was it. That was how it was framed in the beginning. We knew what we wanted to build, but not how it would look or what it would take. We also knew that downstream decisions depended on the successful completion of this project in the given time. The CTO, Rawad Traboulsi, pulled me into a meeting and asked for my opinion. It was the first time I was put on the spot like that. I don't remember how the meeting went, but I remember coming out blank and needing time to estimate what it would take to build. What followed was 3 months of tackling what would become my proudest technical achievement so far.
Challenge #1 - Scoping
The first challenge was to understand the scope of the problem. The problem statement was ambiguous; nobody knew what the actual requirement was. All there was was an idea and a deadline. I figured that the only way to develop the idea from that state was by asking questions, so I probed as many people involved in the business as possible to understand the true nature of the challenge at a high level. Soon enough, I had a reasonably good picture of what I had to deliver and the time I had to do it in. To give you an idea, it was like Amazon moving from their shopping-only business to shopping + groceries. The shopping website was running fine and was successful, and a whole new groceries business was to be developed in 3 months, of which I was tasked with creating the website. Over the next few days, I had a plethora of decisions to make, and I had to make it work.
Challenge #2 - Decisions, Decisions!
The following questions needed answers before jumping into implementation; the success of the project depended on them. What never gets said is that some seemingly less important questions are the hardest to answer. Naming conventions are a good example, because every option feels right and wrong at the same time and everyone has an opinion about them. Anyway, the other most important questions were:
- Will it be implemented in the same codebase or a new one?
- Will it be a subdomain like grocery.example.com or a path on the base domain, like example.com/grocery?
- If it's a subdomain, will the existing authentication system work?
- Will it be deployed as a microservice or on-premise like the existing one?
If it was a microservice, the following questions needed answering.
- How can data be shared between the 2 services?
- How can code be shared between the 2 codebases?
- How can features like server-side rendering and hydration be built in a short time?
- What about logging and monitoring?
- What will the CI/CD pipeline look like?
All these questions wreaked havoc in my mind, and I went to the drawing board many a time. As a side note, I can't emphasize enough how some problems need a whiteboard to be solved. Architectural decisions need a proper mental model, and I have always found that drawing them up on a whiteboard helps both in making the decisions and in explaining the problem to others.
Codebase - new or existing?
Based on the decisions above, the microservice route posed a bigger challenge than the on-premise one. But the fact also remained that I didn't want to touch the existing codebase, given the state of its affairs. It was from the pre-cloud era, there was a lot of tech debt to fix, and I didn't want to burden the other developers, who were already having a hard time working on it. Moreover, making drastic changes to a codebase serving thousands of customers is like operating on a beating heart: if not done carefully, the results can be catastrophic. That meant a new codebase, developed and deployed in isolation but working in conjunction with the existing one.
Subdomain or path?
I chose to go the subdomain way here, for many reasons. I didn't want any conflicts with the base domain, and I didn't want someone to deploy a new page on the base domain and have it collide with the microservice in production. So I went the subdomain way, completely isolated from the main website, and it was deployed as grocery.example.com. Thinking back now, though, it wasn't the best decision. A path would have been the better solution because it gives the user a feeling of continuity and... read ahead.
Authentication problems
The user's authentication status did not carry over from the main domain to the subdomain. The session cookie was scoped to persist only on the main domain and its paths, and hence it got dropped when the user moved from www to grocery.
To fix this, I had to change the way the server set the cookie once the user logged in, widening its Domain attribute to the parent domain so it would be sent to all subdomains. This had an unexpected side effect for users who had never logged out: they were logged in, but not really logged in. So we had to force a re-login for those users. It wasn't the best of user experiences.
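Our stack was different, but as a rough illustration, here is what that change might look like on an Express-style server (a minimal sketch; the route, cookie name, and token are hypothetical):

import express from "express";

const app = express();

app.post("/login", (req, res) => {
  // ... authenticate the user here ...
  const token = "opaque-session-token"; // placeholder

  // Setting Domain to the parent domain makes the cookie available on
  // www.example.com, grocery.example.com, and any other subdomain.
  res.cookie("session", token, {
    domain: ".example.com",
    httpOnly: true, // not readable from client-side JavaScript
    secure: true, // only sent over HTTPS
    sameSite: "lax",
  });
  res.sendStatus(204);
});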
Playing with sessions is dangerous. A lot of things can go wrong if it is not done right.
On-premise or Cloud?
At the time, I had some experience deploying websites in the cloud. The world was moving towards microservices, and it was clear to me that they would solve my problem of decoupling the 2 businesses. Only, I didn't know how to wire the two applications together. The problem wasn't a lack of technical expertise; it was not having a clear mental model of the whole architecture, and not knowing what we wanted the end product to look like. I did a spike to check the feasibility of deploying an app in the cloud and wiring it to the base domain, and with the data I had in hand I eventually decided to go the microservices way, even though it meant solving more challenges to get the whole thing working. It was a dangerous decision given the amount of time I had to build it all, but looking back now, it was the right one.
Data sharing between services
Moving to a new web app meant that neither the session cookie nor the user's global state persisted between the apps, and that was a major problem. The session cookie required the server-side change I explained above. For global state, we serialized it to localStorage in one app and read it back in the other. Naive, maybe, but it worked for a first-time implementation. We had to settle for this tradeoff in the first iteration of the website.
Note: Using localStorage for this purpose was not wrong per se, but it was not the right solution either, because localStorage is synchronous (it blocks the main thread) and heavy use can have performance implications.
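To illustrate the pattern, here is a minimal sketch of the write/read cycle (the storage key and state shape are hypothetical):

// Hypothetical shape of the global state shared between the apps.
interface CartState {
  items: { sku: string; quantity: number }[];
}

const STORAGE_KEY = "shared-cart-state";

// One app writes the state whenever it changes...
function persistState(state: CartState): void {
  localStorage.setItem(STORAGE_KEY, JSON.stringify(state));
}

// ...and the other reads it back when it boots.
function loadState(): CartState | null {
  const raw = localStorage.getItem(STORAGE_KEY);
  return raw ? (JSON.parse(raw) as CartState) : null;
}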
Code sharing between services
We had a design system that we maintained in parallel, where all the frontend components were published to npm separately. It was a simple web-only design system, nothing complex. There were some hurdles in making the build systems work together, so I decided to publish the components as source rather than as built versions, letting each app compile them with its own toolchain.
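In a modern Next.js app, consuming a source-published package might look like this (a sketch; the package name is hypothetical, and the transpilePackages option requires Next.js 13.1 or later, while older setups used the next-transpile-modules plugin):

// next.config.js
/** @type {import('next').NextConfig} */
module.exports = {
  // Compile this source-published package with the app's own build
  // pipeline instead of expecting a pre-built artifact from npm.
  transpilePackages: ["@acme/design-system"],
};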
Even though sharing code through a component library was possible, features like server-side rendering and client-side hydration couldn't be shared and had to be rebuilt.
Rebuilding basic features like server-side rendering and client hydration
Luckily for me, NextJS was available to solve exactly these problems. It was a lifesaver. With NextJS's Static Site Generation and server-side rendering out of the box, I didn't have to reinvent the wheel. I gained a lot of time to focus on building the product rather than implementing features like tooling and SSR, which would have forced me to compromise on time or quality.
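To show how little code this takes, here is a sketch of a server-rendered page in the Next.js pages router (the route, API URL, and data shape are hypothetical):

// pages/products/[id].tsx
import type { GetServerSideProps } from "next";

interface Product {
  id: string;
  name: string;
  price: number;
}

interface Props {
  product: Product;
}

// Next.js runs this on the server for every request and passes the
// result to the page component, which arrives fully rendered as HTML.
export const getServerSideProps: GetServerSideProps<Props> = async (ctx) => {
  const res = await fetch(`https://api.example.com/products/${ctx.params?.id}`);
  const product: Product = await res.json();
  return { props: { product } };
};

// Hydration on the client happens automatically; no custom plumbing needed.
export default function ProductPage({ product }: Props) {
  return <h1>{product.name}: {product.price}</h1>;
}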
Logging and Monitoring
I used GCP's internal monitoring tools for server-side monitoring and Sentry for client-side logging.
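Client-side error reporting is mostly a one-time setup. Today, a minimal Sentry initialization might look like this (the DSN is a placeholder):

import * as Sentry from "@sentry/nextjs";

Sentry.init({
  // The DSN identifies your Sentry project; this one is a placeholder.
  dsn: "https://examplePublicKey@o0.ingest.sentry.io/0",
  // Sample a fraction of transactions to keep performance overhead low.
  tracesSampleRate: 0.1,
});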
CI/CD pipeline
Other engineers in the organization already had the Terraform know-how, so it was reasonably easy to dockerize the application and use Terraform to deploy it to GCP.
Challenge #3 - Dividing the problem and conquering
With all the questions answered, I divided the work into pages and delegated them to my team, who implemented all the features.
How would I do it differently now?
The world of microservices has moved dramatically over the past few years. Beyond the major cloud providers like GCP and AWS, we now have many solutions for deploying frontend applications, like Vercel, Netlify, Fly, and Render, which have been paving the future of the web. Especially Vercel, which I believe has changed the game for the better.
And so when I faced the same problem recently while working on a website for a client of mine, I knew exactly what I wanted to build and how. It had a homepage, a blog, a dashboard, an API service, a subscription page, etc. I wanted to:
- Keep complex routes isolated as Microservices
- Share as much code as possible
- Implement a reverse proxy to route requests
I used Turborepo, Vercel, and NextJS to solve all these problems. Turborepo is the awesome new monorepo solution by Vercel. It lets me keep all the shared code, like the design system, TSConfig, and ESLint config, in one place to be shared with any app I build. I deployed each microservice as an individual app on Vercel, then created a reverse proxy that points the base domain and each route to the respective app, as shown in the next section. Easy peasy!
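For reference, the monorepo was shaped roughly like this (the names are illustrative):

apps/
  home/           # homepage (Next.js)
  blog/           # blog
  dashboard/      # dashboard
  api/            # API service
packages/
  design-system/  # shared UI components
  tsconfig/       # shared TypeScript configs
  eslint-config/  # shared lint rules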
How I use Vercel as a reverse proxy
First of all, Vercel does not recommend putting third-party reverse proxies like Cloudflare or Fastly in front of apps that are already on Vercel, because those proxies might not be as globally available as the Vercel apps are. So I had to create the proxy rules in Vercel itself. Luckily, Vercel supports rewrite rules, which effectively emulate a reverse proxy. An example vercel.json is given below:
"rewrites": [
{
"source": "/route1",
"destination": "https://route1.vercel.app/"
},
{
"source": "/route1/:match*",
"destination": "https://route1.vercel.app/:match*"
},
...
{
"source": "/",
"destination": "https://root.vercel.app/"
},
{
"source": "/:match*",
"destination": "https://root.vercel.app/:match*"
},
]
}
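One thing to keep in mind: rewrites are evaluated in order and the first match wins, so the specific /route1 rules have to come before the /:match* catch-all that sends everything else to the root app.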
Repository: https://github.com/sreeramofficial/turborepo-starter
Conclusion
You don't get challenges like these every day. I took away a lot of learning in those 3 months, both technical and professional. It is safe to say this was the toughest project I have ever delivered, for many reasons: it was the first time I was assigned a task that big and asked to take it to completion, it was the first time I led people at both a technical and a managerial level, and it was during the peak of the pandemic, when I was personally struggling socially in a country I had only begun to integrate into before the pandemic struck. But I am so glad I got to do it, and with great success.
Also, note that this isn't a post glorifying microservices. Microservices may not be the answer to all problems; it depends on the use case. And just because Amazon decided to go with monoliths doesn't necessarily mean that's the best solution for us. As with any question in computer science, it all depends on the requirements. I have explained why I took the decisions I took.
Finally, a special thanks to my team, Nikita Barabanov and Gladson Robinson, without whom this wouldn't have been possible.