While working on Datalynx I found myself needing to create multiple environments, including Production. I started building the proof-of-concept infrastructure through the AWS console. That was less than ideal, since I had to mentally keep track of every configuration I set up for each resource. At the beginning the app was simple, the deployment process was manual, and we didn't put much effort into security and scaling. But the more complex the application became, the harder and more time-consuming it was to replicate the infrastructure for new environments. It was also a process very much prone to errors.
Since I like to solve most of my life's problems with code, I wanted to solve this issue with code as well.
So how do I “code” my AWS infrastructure, and make it easy to edit and build different clones of environments and resources without suffering?
💡 THE MAGICAL AWS CDK
AWS CDK is a collection of tools that lets you treat AWS resources as objects. You can use your preferred programming language and run everything from your local machine, much as you would with the AWS CLI. For the sake of simplicity, I'll use Python for this guide.
CDK defines configurations in a Stack: a collection of resources and their configuration. In this guide we'll use a Stack as an environment, so each Stack will be a separate environment for the same application. This means that if you want to spawn a new environment you simply copy the existing Stack, change the names, and deploy.
Let’s talk about resources and a real-life example.
USE CASE: I have a Python API and I want to host it on AWS.
Generally speaking, the most common use case is to get our app running and make it available on the internet using HTTPS.
To make this happen we will need to create the following resources on AWS:
- VPC: networking for your app
- Application Load Balancer: routes traffic to your application
- IAM Roles: manage permissions to execute and run the service’s tasks
- CloudWatch log group: contains logs of your app
- ECR repository: contains the Docker image of your app
- ECS cluster: manages your app instances
- ECS Service: the process that keeps your app up and running
- EC2 Auto Scaling group capacity provider: takes care of scaling your app
- Route 53 record: connects your domain to your app
Now, from my experience, some resources are straightforward to manage using CDK and others are not. The tricky ones are usually resources that contain data uploaded by a user or by a machine. I found myself having issues with ECR and S3, since editing those sometimes requires CDK to recreate them or ignore them. That means either losing data or not getting your change out at all. CDK also lets you import resources that have already been created using the AWS Console, letting you essentially manage an existing infrastructure and add things to it. This hybrid approach is the one I'll be using here.
Another thing to consider is that some resources can be shared between Stacks and don't have to be specific to one environment.
VPC
AWS comes with an existing VPC called the 'default' VPC. We'll be using this one, and this resource WILL NOT be managed by the instance Stack; it will be imported into it.
ECR
The Elastic Container Registry will contain the Docker images of your app. Managing this resource using CDK gave me all sorts of trouble, and I don't want you to go through that. So, for simplicity, this is another resource that WILL NOT be managed by the instance Stack; you'll create it once and import it.
IAM ROLES
Each Stack will be sharing the same roles since the permissions will be the same. Therefore those WILL NOT be managed by the instance Stack.
ROUTE 53 HOSTED ZONE
Another resource that is shared between Stacks is a hosted zone. It needs to be configured for an existing domain that you purchased elsewhere. If you bought your domain on GoDaddy and haven't pointed its name servers to Route 53 yet, here is a guide that will help you with that. Again, this is a shared resource, so it WILL NOT be managed by your instance Stack.
HTTP CERTIFICATE
To make sure your Application Load Balancer can support HTTPS connections you need a certificate. AWS offers a service called AWS Certificate Manager: you can create a certificate in minutes and attach it to your load balancer in code. This too will be a shared resource and WILL NOT be managed by your instance Stack.
You are probably thinking “There are a lot of shared resources between stacks, this does not solve my problem entirely if I still have to log into the console and make those myself!”.
And you are right. A solution to this dilemma is to create a separate Stack that manages ONLY the shared resources. This way you never have to log into the console, and your entire infrastructure is managed with CDK in one project.
How do I start?
Setting up CDK and a project is trivial. I'll leave this task to the official AWS guide here to install CDK on your machine and this one to set up the CDK project. Once that is done, come back here. When creating my Stacks I like to stick to this naming convention: [name of your project][name of your environment]Stack.
⏩ Importing existing resources
As stated before we need to import existing resources first.
# Your default VPC
default_vpc = ec2.Vpc.from_lookup(self, 'vpc-xxxx', is_default=True)
# ECR repository
ecr_dl_backend = ecr.Repository.from_repository_name(
self, "ECRBackend",
repository_name="my_app_repository_name"
)
# IAM Role for the Docker instance
instance_role = iam.Role.from_role_arn(
self, 'EcsInstanceRole',
role_arn='arn:aws:iam::xxxx:role/ecsInstanceRole'
)
# IAM Role for the task execution
task_execution_role = iam.Role.from_role_arn(
self, 'EcsTaskExecutionRole',
role_arn='arn:aws:iam::xxxx:role/ecsTaskExecutionRole'
)
# Certificate imported from ACM
certificate = acm.Certificate.from_certificate_arn(self, "MyDomainCert", "arn:aws:acm:us-xxx:xxxxx")
# Route 53 Hosted zone
hosted_zone = route53.HostedZone.from_lookup(
self, "Route53HostedZoned",
domain_name="myapp.com"
)
You can see from here that we need two roles for ECS. These can be found in IAM; grab each role's ARN from there and insert it in the code. In most accounts the ecsInstanceRole and the ecsTaskExecutionRole are already included in the list of roles (ECS creates them the first time you use it through the console).
⏩ Create Cluster, Service, Auto Scaling Group
# Create Cluster
ecs_cluster = ecs.Cluster(
self, "MyCluster",
cluster_name="MyAppClusterName",
vpc=default_vpc,
)
# Security group for the Ec2 instance
ec2_security_group = ec2.SecurityGroup(
self,
"EC2SecurityGroup",
vpc=default_vpc,
allow_all_outbound=True,
description="Accepts all ELB traffic",
)
# Ec2 instance launch template
launch_template = ec2.LaunchTemplate(
self,
"MyAppLaunchTemplate",
instance_type=ec2.InstanceType.of(
ec2.InstanceClass.T3A, ec2.InstanceSize.SMALL),
machine_image=ecs.EcsOptimizedImage.amazon_linux2023(),
launch_template_name="MyAppLaunchTemplate",
user_data=ec2.UserData.for_linux(),
role=instance_role,
security_group=ec2_security_group
)
# Create Auto Scaling Group
asg = autoscaling.AutoScalingGroup(
self, "MyAsg",
launch_template=launch_template,
vpc=default_vpc,
min_capacity=1,
max_capacity=3,
ssm_session_permissions=True,
vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC)
)
# Create Capacity Provider for Auto Scaling Group
asg_capacity_provider = ecs.AsgCapacityProvider(self, 'MyAsgCapacityProvider', auto_scaling_group=asg)
# Adding capacity provider to Cluster
ecs_cluster.add_asg_capacity_provider(asg_capacity_provider)
# Create Task definition for app
task_definition = ecs.Ec2TaskDefinition(self, "TaskDef", execution_role=task_execution_role)
# Add log group
log_group = logs.LogGroup(
self, "MyAppLogGroup"
)
# Add container info, attaches the ECR image to the container
container = task_definition.add_container(
"BackendContainer",
image=ecs.ContainerImage.from_ecr_repository(ecr_dl_backend),
memory_reservation_mib=600, # change needed RAM depending on how much memory your app uses on IDLE
essential=True,
health_check=ecs.HealthCheck(
command=["CMD-SHELL", "curl -f http://localhost/health-check/ || exit 1"],
interval=Duration.seconds(30),
timeout=Duration.seconds(3),
retries=3,
start_period=Duration.seconds(5)
),
logging=ecs.AwsLogDriver(
log_group=log_group,
stream_prefix="AppLogGroup",
mode=ecs.AwsLogDriverMode.NON_BLOCKING
)
)
# Use container port 80
container.add_port_mappings(
ecs.PortMapping(container_port=80)
)
# Create the service
service = ecs.Ec2Service(
self, "BackendService",
service_name="BackendService",
cluster=ecs_cluster,
task_definition=task_definition
)
- We chose to run one instance of your app for now (min_capacity=1). You can scale as you wish depending on the load of your app.
- You could also edit your Auto Scaling Group to change the number of instances based on your own parameters (here is a good guide for it).
- ECS needs a health check URL to verify the app is alive. Make sure to edit that; in this case it is /health-check/.
- We are only opening port 80 because the ALB will be connecting to it, not users from the internet directly.
⏩ Create Application Load Balancer
Now we have created the system that keeps the app running and can auto scale based on your needs. Let's see how we can connect to our API by attaching our ECS service to an Application Load Balancer (ALB).
# Create ALB Security group
alb_security_group = ec2.SecurityGroup(
self,
"ALBSecurityGroup",
vpc=default_vpc,
allow_all_outbound=True,
)
alb_security_group.add_ingress_rule(ec2.Peer.any_ipv4(), ec2.Port.tcp(80))
alb_security_group.add_ingress_rule(ec2.Peer.any_ipv4(), ec2.Port.tcp(443))
# Create ALB
lb = elbv2.ApplicationLoadBalancer(
self, "ALB",
vpc=default_vpc,
internet_facing=True,
security_group=alb_security_group
)
# Add HTTP and HTTPS listener using the certificate we imported
http_listener = lb.add_listener(
"HTTPListener",
port=80,
open=True
)
https_listener = lb.add_listener(
"HTTPSListener",
port=443,
protocol=elbv2.ApplicationProtocol.HTTPS,
open=True,
certificates=[certificate]
)
# ALB Health check
health_check = elbv2.HealthCheck(
interval=Duration.seconds(60),
path="/health-check/",
timeout=Duration.seconds(5)
)
# Connects to the ECS Service
target_group = elbv2.ApplicationTargetGroup(
self, "TargetGroup",
vpc=default_vpc,
port=80,
targets=[service],
health_check=health_check
)
# Add target group to both listeners
http_listener.add_target_groups(
"HTTPListenerTargetGroup",
target_groups=[target_group]
)
https_listener.add_target_groups(
"HTTPSListenerTargetGroup",
target_groups=[target_group]
)
# Attaching the ALB URL to route 53 CNAME record
route53.CnameRecord(
self, "ALBHostedURLRecord",
zone=hosted_zone,
record_name='api.myapp.com',
domain_name=lb.load_balancer_dns_name
)
- The security group allows connection on ports 80 (HTTP) and 443 (HTTPS)
- The ALB also needs a health check. We’ll be using the same URL.
- Note that api.myapp.com is going to be our API URL. Change it as you wish, but make sure that the record name contains the hosted zone name.
- api + your hosted zone name is a very common convention when hosting an API.
⏩ Deploy our app and make it available on the internet
Once all the infrastructure is done and deployed we are ready to push our app to ECR and trigger a deployment on ECS. Let's say you have your nice Python Flask app and a Dockerfile that builds your application and runs it on port 80 when started in a container.
Our deployment system will look something like this:
- Build your Docker image:
docker build -t my-backend-app .
- Tag the image for your repository and upload it to ECR using the AWS CLI (you'll need to log in first with aws ecr get-login-password piped into docker login):
docker tag my-backend-app [aws_account_id].dkr.ecr.[region].amazonaws.com/[my_app_repository_name]
docker push [aws_account_id].dkr.ecr.[region].amazonaws.com/[my_app_repository_name]
- Force a new ECS deployment using the latest task definition:
aws ecs update-service --cluster [cluster name] --service [service name] --force-new-deployment --region [region]
The deployment steps seem short, but you'll want to automate them at some point. Depending on your preferred software development life cycle you can get this done in one click by setting up a CI pipeline on GitHub, GitLab or Bitbucket. This way you get to the results faster!
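As a first step toward that automation, here is a small Python sketch of the three deployment steps (account id, region, and resource names are placeholders matching the bracketed values above, and it assumes you are already logged in to ECR):

```python
import subprocess

# Placeholders: replace with your own values
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
REPO = "my_app_repository_name"
CLUSTER = "MyAppClusterName"
SERVICE = "BackendService"


def deploy_commands(account_id: str, region: str, repo: str,
                    cluster: str, service: str) -> list[list[str]]:
    """Build the three shell commands for a deployment: build, push, redeploy."""
    image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:latest"
    return [
        # Build the image tagged directly with its ECR name
        ["docker", "build", "-t", image, "."],
        # Push it to ECR (requires a prior `aws ecr get-login-password | docker login`)
        ["docker", "push", image],
        # Force ECS to start new tasks from the freshly pushed image
        ["aws", "ecs", "update-service", "--cluster", cluster,
         "--service", service, "--force-new-deployment", "--region", region],
    ]


if __name__ == "__main__":
    for cmd in deploy_commands(ACCOUNT_ID, REGION, REPO, CLUSTER, SERVICE):
        print(" ".join(cmd))  # swap print for subprocess.run(cmd, check=True) to execute
```

A CI pipeline would run essentially the same three commands on every merge to your main branch.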
We like getting results fast at Datalynx, where we are building a platform that helps businesses get their insights instantly by cutting out the middleman and using the latest LLM tech.