This talk was presented at AWS Community Day 2024 - Bacolod on November 16, 2024
Scenario
I love taking photographs and would like to create a website to sell prints of those photographs to my friends and family. I started out selling them on Facebook, and business has since taken off. I'm now selling 100 prints per day, and it's hard to keep track of each transaction. So I decided to build my own custom website with the following features:
- Create Product: upload the image for preview, and indicate the initial number of prints available for sale.
- View Products: so buyers can see the photos for sale.
- Checkout: customers can only buy one print design at a time. They can indicate how many of those designs they want to buy, their email, and address.
- View Orders: so I can view all the orders that have been placed.
To simplify things, we don't have any login for now. All users can see everyone's orders.
For this POC, we use the following tech stack:
- Python and Flask: for developing the backend and frontend
- MongoDB: to store metadata about the uploaded images
- PostgreSQL: to store data about orders
To give you a head start, I have already developed this application. You can access the GitLab repository here
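If you just want a feel for the shape of the code before opening the repository, here is a minimal sketch of the Create Product endpoint. The route, database and collection names, and environment variable are illustrative assumptions; the actual repository may organize things differently.

```python
# Minimal sketch of the Create Product endpoint.
# Names (route, database, collection, env vars) are illustrative only.
import os
from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)
mongo = MongoClient(os.environ.get("MONGO_URI", "mongodb://localhost:27017"))
products = mongo["shop"]["products"]   # image metadata lives in MongoDB
UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/products", methods=["POST"])
def create_product():
    image = request.files["image"]                      # preview image
    path = os.path.join(UPLOAD_DIR, image.filename)
    image.save(path)                                    # stored on local disk for now
    products.insert_one({
        "name": request.form["name"],
        "image_path": path,
        "prints_available": int(request.form["prints_available"]),
    })
    # Orders, in contrast, are written to PostgreSQL at checkout time.
    return jsonify({"status": "created"}), 201
```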
Lab 1: Simple deployment via Lightsail
The simplest way to get this up and running is to deploy everything inside one server using Amazon Lightsail. On that one server, we run:
- eCommerce Application (Python/Flask)
- MongoDB database
- PostgreSQL database
The problem with this setup is that it is a single point of failure. If the server goes down, our app goes down, and all of our data goes with it. Let's fix that by creating 3 servers, one for each of the components:
This solves our problem of losing everything when our eCommerce application server goes down. But now we have to think about the servers that hold the MongoDB and PostgreSQL databases. We have to ensure that they don't go down, or at least that if they do, our data stays intact. In this case, we use the native replication features of MongoDB and PostgreSQL to have a secondary DB instance that holds a copy of the data. Data written to the primary server has to be written to the secondary as well before we return success. This way, we have two copies of our data in case the primary goes down.
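On the MongoDB side, this "don't return success until the secondary has the write" behavior is expressed as a write concern. Here is a minimal sketch, assuming a replica set named rs0 and placeholder hostnames; PostgreSQL offers synchronous replication for the same guarantee.

```python
# Sketch: require acknowledgement from a majority of replica set members
# before a write counts as successful. Hosts and replica set name are placeholders.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://mongo-primary:27017,mongo-secondary:27017/?replicaSet=rs0"
)
products = client["shop"].get_collection(
    "products", write_concern=WriteConcern(w="majority")
)

# This insert only returns once the primary and at least one secondary have
# the write, so losing the primary does not lose the data.
products.insert_one({"name": "sunset-print", "prints_available": 100})
```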
Lab 2: Deploy to Amazon VPC
The number of Lightsail servers we are managing is starting to grow, and they are all publicly accessible. The problem with this is that we have to harden every one of these servers, since any attacker can reach them directly. We have to ensure there are no gaps in security. We can harden each instance, but when our 5 servers become 500, we would have to repeat that process over and over. And I don't like doing that; there's just too much to miss.
To solve this, we will create our own virtual network in AWS using a service called Amazon VPC. The VPC has a private and a public subnet. Servers inside the private subnet cannot be reached directly from the internet. We add a server inside the public subnet that routes traffic to the instances in the private subnet. This is the only server that needs to be hardened, since it is the only one facing the public. Let's call this server our Proxy Server.
We will then move our Lightsail servers to a parallel AWS service called Amazon EC2 and deploy them in the private subnet. When a customer visits our website, they first connect to the Proxy Server, which forwards their request to the application server. The app server then connects to the MongoDB and PostgreSQL servers for the data it needs.
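In the lab we do this through the console, but a boto3 sketch makes the moving parts explicit. The region, CIDR blocks, and the idea of a single public and single private subnet are assumptions for illustration:

```python
# Sketch: carving out a VPC with one public and one private subnet using boto3.
# Region and CIDR ranges are illustrative.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-1")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
public_subnet = ec2.create_subnet(
    VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24"
)["Subnet"]
private_subnet = ec2.create_subnet(      # app and database servers live here
    VpcId=vpc["VpcId"], CidrBlock="10.0.2.0/24"
)["Subnet"]

# Only the public subnet gets a route to the internet gateway; instances in
# the private subnet are not directly reachable from the outside.
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(
    InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc["VpcId"]
)
route_table = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
ec2.create_route(
    RouteTableId=route_table["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw["InternetGatewayId"],
)
ec2.associate_route_table(
    RouteTableId=route_table["RouteTableId"], SubnetId=public_subnet["SubnetId"]
)
```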
Lab 3: Use Managed Services to make your life easier
An improvement we can make here is replacing the Proxy Server with an AWS service called the Application Load Balancer. It is deployed in the public subnet and forwards traffic to the application server. The beauty of this is that we don't have to configure and maintain a Proxy Server ourselves; we get the same functionality at the press of a button.
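As a rough illustration of what that button does, here is a boto3 sketch; the subnet IDs, VPC ID, and names are placeholders, and the target group points at port 5000, where Flask listens by default:

```python
# Sketch: an internet-facing Application Load Balancer in the public subnets
# forwarding traffic to the Flask app on port 5000. IDs are placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="ap-southeast-1")

alb = elbv2.create_load_balancer(
    Name="prints-alb",
    Subnets=["subnet-public-a", "subnet-public-b"],  # public subnets in two AZs
    Scheme="internet-facing",
)["LoadBalancers"][0]

target_group = elbv2.create_target_group(
    Name="prints-app",
    Protocol="HTTP",
    Port=5000,                     # the Flask app server
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
    HealthCheckPath="/",
)["TargetGroups"][0]

elbv2.create_listener(
    LoadBalancerArn=alb["LoadBalancerArn"],
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group["TargetGroupArn"]}],
)
```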
The problem with Lab 2 is that we have to set up the database replication manually. And if you're an application developer, you'd have to take time to learn how replication works for PostgreSQL and MongoDB and then set it up. This may take days or even weeks, pushing deadlines even further back.
Thankfully, AWS has recognized that their users usually do this replication themselves, and created a simple way to deploy PostgreSQL and MongoDB-compatible database servers. Using Amazon RDS and Amazon DocumentDB, we can provision a database server, and with a few additional clicks, configure it to be highly available and have its data encrypted with AWS KMS.
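For a sense of what those "few additional clicks" amount to, here is a hedged boto3 sketch; the identifiers, instance classes, and passwords are placeholders:

```python
# Sketch: a Multi-AZ PostgreSQL instance on RDS and an encrypted DocumentDB
# cluster. Identifiers and credentials are placeholders.
import boto3

rds = boto3.client("rds", region_name="ap-southeast-1")
rds.create_db_instance(
    DBInstanceIdentifier="prints-orders",
    Engine="postgres",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,
    MasterUsername="shopadmin",
    MasterUserPassword="change-me",
    MultiAZ=True,            # standby replica in another AZ, managed by AWS
    StorageEncrypted=True,   # data at rest encrypted with a KMS key
)

docdb = boto3.client("docdb", region_name="ap-southeast-1")
docdb.create_db_cluster(
    DBClusterIdentifier="prints-metadata",
    Engine="docdb",
    MasterUsername="shopadmin",
    MasterUserPassword="change-me",
    StorageEncrypted=True,
)
docdb.create_db_instance(
    DBClusterIdentifier="prints-metadata",
    DBInstanceIdentifier="prints-metadata-1",
    DBInstanceClass="db.t3.medium",
    Engine="docdb",
)
```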
Lab 4: Prepare our Application for Scaling - Remove State from your Application Servers
Weeks pass and everything is going well with your application. Your users are buying your prints like crazy, and you can focus on being out in the field taking more photos. To sell even more, you run a buy-one-take-one (B1T1) promo on the 15th of the month. Your customers all flock to your website at the same time, your application server gets overloaded, and it goes down. To keep the application up, you increase the size of your EC2 instance by 8x, from t2.micro to t3.large.
Your customers are happy again and your website works. You become busy fulfilling orders for the next two weeks. Then your AWS bill arrives and you have a small heart attack: you forgot that you left your EC2 instance at t3.large, 8x the cost! For the next sale, you spend energy reminding yourself to scale the instance up beforehand and back down afterward. But when customers flock to your website outside of a sale, you have to react and increase the size again, and there is always a gap between when your customers first feel the slowdown and when you finish scaling the server. There must be a better way.
Enter Auto Scaling groups. An Auto Scaling group adds more EC2 instances when they are needed and removes them when the demand is no longer there.
But before we get to that, we need to think about whether our application is ready for it. Currently, our application stores its data on separate servers: image metadata is stored in DocumentDB, and order data is stored in Amazon RDS. For this data, having two application servers would be fine, because each one can simply contact the separate database servers for the data it needs.
However, the images themselves are still stored inside the first application server. If we create a second application server, the images originally uploaded to the 1st server will not be available to users whose requests happen to be served by the 2nd server. Hence, we need to separate the storage of our images. For this, we use Amazon EFS. We mount Amazon EFS on both application servers, and when a user uploads a file, it is written to the directory backed by the EFS drive. It acts like a shared drive: as soon as the upload is complete, the image is immediately available to the other application server.
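The only application change needed is to write uploads into the EFS-backed directory instead of a local folder. A minimal sketch, assuming the file system is mounted at /mnt/efs on every app server:

```python
# Sketch: saving uploads to a directory backed by the EFS mount so every
# app server sees the same files. The mount path is an assumption.
import os
from flask import Flask, request

app = Flask(__name__)
UPLOAD_DIR = "/mnt/efs/uploads"   # EFS mounted here on every app server
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route("/products", methods=["POST"])
def create_product():
    image = request.files["image"]
    # Written once, immediately visible to the other app servers
    image.save(os.path.join(UPLOAD_DIR, image.filename))
    return "created", 201
```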
Lab 5: Auto Scaling your EC2 instances (so you don't have to)
Our application servers are now completely stateless, meaning they don't hold any data themselves. The data is delegated to EFS for the images, RDS for the order data, and DocumentDB for the image metadata. Having our app servers be stateless is critical for being able to add more app servers to serve our customers: it ensures that a customer's data is available to them regardless of which app server serves their traffic.
Now, let's go back to the original problem. We don't want to be the ones manually adding and removing application servers. For this, we use an Amazon EC2 Auto Scaling group. First, we create an image of our application server, called an AMI. We can use this image to recreate the exact state of our application server: the code, the libraries, and the installed OS packages. This lets us create an identical twin of our first application server. Second, we configure the condition under which the Auto Scaling group will add more application servers. The simplest is to have the ASG watch the average CPU utilization across all the application servers currently running. Once it exceeds, say, 80% average CPU utilization, the ASG uses the AMI to create more copies of the application server to serve the traffic.
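Expressed with boto3, the two steps look roughly like this; the AMI ID, subnet IDs, target group ARN, and sizing are placeholders, and the scaling condition is modeled as a target-tracking policy on average CPU utilization:

```python
# Sketch: an Auto Scaling group built from the AMI, scaling on average CPU.
# AMI ID, subnets, ARNs, and thresholds are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-1")
autoscaling = boto3.client("autoscaling", region_name="ap-southeast-1")

# Step 1: a launch template that points at the AMI baked from the app server
ec2.create_launch_template(
    LaunchTemplateName="prints-app",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t2.micro",
    },
)

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="prints-app-asg",
    LaunchTemplate={"LaunchTemplateName": "prints-app", "Version": "$Latest"},
    MinSize=2,
    MaxSize=6,
    VPCZoneIdentifier="subnet-private-a,subnet-private-b",
    TargetGroupARNs=["arn:aws:elasticloadbalancing:ap-southeast-1:123456789012:targetgroup/prints-app/abc123"],
)

# Step 2: add instances as average CPU across the group climbs toward 80%,
# and remove them again when demand drops.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="prints-app-asg",
    PolicyName="cpu-80",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 80.0,
    },
)
```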
Lab 6: Deploy your changes quickly with CI/CD pipelines
With this setup, you'd quickly run into the problem of updating your application servers. Every time there is an update, you would have to rebuild the image used by your Auto Scaling group, then terminate all existing app servers so the ASG can provision new ones with the updated code. This process is tedious and introduces downtime to your application.
In this final lab, we will create a CI/CD pipeline that deploys to the current set of EC2 instances in the Auto Scaling group. We will also update the settings of the Auto Scaling group so that every new application server it provisions pulls the latest code.
We will create the CI/CD pipeline with the AWS CodePipeline service. Our code will be stored on a public GitHub repository, and we will configure the pipeline to trigger every time there is an update on the main branch of our repository. The second stage of our pipeline will be using CodeDeploy to deploy our code to the Auto Scaling Group.
The simplest setting in CodeDeploy is to deploy the code change to all servers simultaneously, using a configuration called all-at-once. The problem with this technique is that application servers usually go down briefly while new code is being loaded, so with all-at-once there may be a few seconds to a minute of downtime while all the app servers receive and load the update.
An improvement is to use the half-at-a-time or one-at-a-time deployment configurations in CodeDeploy. With half-at-a-time, CodeDeploy picks half of the app servers and deploys the update to them, leaving the rest of the app servers available to serve your customers' traffic.
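A sketch of what that deployment group could look like in boto3; the application name, Auto Scaling group name, and service role ARN are placeholders:

```python
# Sketch: a CodeDeploy deployment group that targets the Auto Scaling group
# and rolls out to half of the instances at a time. ARN and names are placeholders.
import boto3

codedeploy = boto3.client("codedeploy", region_name="ap-southeast-1")

codedeploy.create_application(
    applicationName="prints-app", computePlatform="Server"
)

codedeploy.create_deployment_group(
    applicationName="prints-app",
    deploymentGroupName="prints-app-dg",
    serviceRoleArn="arn:aws:iam::123456789012:role/CodeDeployServiceRole",
    autoScalingGroups=["prints-app-asg"],
    # Half of the fleet keeps serving traffic while the other half updates;
    # CodeDeployDefault.OneAtATime is even more conservative.
    deploymentConfigName="CodeDeployDefault.HalfAtATime",
)
```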
Another problem you'd probably encounter is a deployment that goes wrong. As much as we try to guarantee that poor code doesn't reach our production environment, there will still be times when it does. In the current setting, CodeDeploy simply deploys to the first half, so we must configure health checks in CodeDeploy to determine whether the deployment was a success. We define success as having our app servers pass a series of checks called a health check. An example health check might be to ensure our website returns a successful response when calling the homepage and the products page.
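Such a health check can be as simple as a small script that runs after the new code is loaded (for example, wired into CodeDeploy's ValidateService lifecycle hook) and fails the deployment if either page stops responding. A minimal sketch, with the base URL as a placeholder:

```python
# Sketch: fail the deployment if the homepage or products page stops
# returning HTTP 200. The base URL is a placeholder.
import sys
import requests

BASE_URL = "http://localhost:5000"

def healthy() -> bool:
    for path in ("/", "/products"):
        response = requests.get(BASE_URL + path, timeout=5)
        if response.status_code != 200:
            print(f"health check failed for {path}: {response.status_code}")
            return False
    return True

if __name__ == "__main__":
    # A non-zero exit code tells CodeDeploy the lifecycle hook failed.
    sys.exit(0 if healthy() else 1)
```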
Hints
You will provision the EC2, RDS, and DocumentDB instances. Then, you will need to ensure the Flask application inside the EC2 instance can talk to RDS and DocumentDB. You'll most likely encounter connectivity problems. Here are our tips:
- Ensure you deployed all those components in the same region and the same VPC
- Check the security groups of each of these components. Ensure the inbound rules allow communication on the specific ports needed by each component (RDS - 5432, DocumentDB - 27017, EC2 - 5000); see the sketch after this list
- What helps me learn is to take a screenshot of every step when I'm using an AWS service for the first time, so I can get back to them later on.
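For the security group hint above, here is a rough boto3 sketch of the inbound rules; the security group IDs are placeholders, and your setup may reference CIDR ranges instead of source security groups:

```python
# Sketch: opening the inbound ports mentioned above with boto3.
# Security group IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-1")

rules = [
    ("sg-rds", 5432, "sg-app"),     # RDS accepts PostgreSQL traffic from the app
    ("sg-docdb", 27017, "sg-app"),  # DocumentDB accepts MongoDB traffic from the app
    ("sg-app", 5000, "sg-alb"),     # the app accepts Flask traffic from the load balancer
]

for group_id, port, source_group in rules:
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_group}],
        }],
    )
```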