This is the 7th part of the "Deploying Django Application on AWS with Terraform" guide. You can check out the previous steps here:
- Part 1: Minimal Working Setup
- Part 2: Connecting PostgreSQL RDS
- Part 3: GitLab CI/CD
- Part 4: Namecheap Domain + SSL
- Part 5: Celery and SQS
- Part 6: Connecting to Amazon S3
In this part, we'll make our Django web application scalable using ECS Autoscaling.
Autoscaling is the ability to automatically increase or decrease the number of running instances. It allows you to handle traffic spikes and save money during low-traffic periods.
When you enable autoscaling for an ECS service, AWS creates CloudWatch alarms that determine when to add a new instance or remove a redundant one.
Let's see how it works in practice.
ECS Autoscaling configuration
First, create a new autoscale.tf with the following content:
resource "aws_appautoscaling_target" "prod_backend_web" {
max_capacity = 5
min_capacity = 1
resource_id = "service/${aws_ecs_cluster.prod.name}/${aws_ecs_service.prod_backend_web.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "prod_backend_web_cpu" {
name = "prod-backend-web-cpu"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.prod_backend_web.resource_id
scalable_dimension = aws_appautoscaling_target.prod_backend_web.scalable_dimension
service_namespace = aws_appautoscaling_target.prod_backend_web.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 80
}
depends_on = [aws_appautoscaling_target.prod_backend_web]
}
Here we defined:
- AWS AppAutoscaling Target. We want to scale the prod_backend_web service by the instance count dimension from 1 to 5.
- AWS AppAutoscaling Policy. We want to scale prod_backend_web up when the ECSServiceAverageCPUUtilization metric exceeds 80%.
We are ready to apply changes, but first, let's think about load balancer health checks.
Load Balancer Health Checks
Right now we have quite aggressive health checks. If a container fails to respond twice with a timeout of 2 seconds, the Load Balancer considers it unhealthy and removes it.
This could be acceptable for a small amount of traffic. But if many requests reach the container and CPU usage climbs to 100%, the container will fail to respond to health checks. The Load Balancer then kills it, and we end up in an even worse situation: there are no containers left to handle the traffic at all.
A possible solution is to increase the health check timeout and unhealthy_threshold. That gives overloaded containers a better chance to survive.
I think it's not a perfect solution, but it will work for this test. If you know a more elegant way to keep overloaded containers running, feel free to leave a comment.
Go to load_balancer.tf and increase the unhealthy_threshold, timeout, and interval parameters:
# Target group for backend web application
resource "aws_lb_target_group" "prod_backend" {
  ...
  health_check {
    ...
    unhealthy_threshold = 5
    timeout             = 29
    interval            = 30
    ...
  }
}
Let's apply our changes and check them in the AWS console.
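As in the previous parts, this is the usual Terraform workflow from the infrastructure project (review the plan before applying):
$ terraform plan
$ terraform apply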
CloudWatch Alarms
First, go to the ECS console and check the autoscaling policy for the prod_backend_web ECS service. Select the prod ECS cluster, select the prod-backend-web service, and click "Update". Proceed to the "Set Auto Scaling" step and click on the prod-backend-web-cpu autoscaling policy.
Here we see that autoscaling kicks in when average CPU utilization reaches 80%. But what is the condition for scaling down? Let's check the CloudWatch alarms associated with this autoscaling policy.
Go to the CloudWatch console and look at the alarms.
Here we see that we scale up when the average CPU load exceeds 80% for 3 consecutive minutes, and we scale down when the average CPU load stays below 72% for 15 minutes.
These are rather specific numbers, so how can we adjust them to our case? To do that, you need to define a custom metric for the alarms with the customized_metric_specification block in aws_appautoscaling_policy.
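For illustration, here is a minimal sketch of what such a policy could look like. The resource name, target value, and cooldowns below are assumptions for this example, not values from the guide; with target tracking, AWS still derives the exact alarm thresholds and evaluation periods itself.

# Hypothetical example: a target tracking policy that tracks a custom
# CloudWatch metric instead of the predefined ECS CPU metric.
resource "aws_appautoscaling_policy" "prod_backend_web_cpu_custom" {
  name               = "prod-backend-web-cpu-custom"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.prod_backend_web.resource_id
  scalable_dimension = aws_appautoscaling_target.prod_backend_web.scalable_dimension
  service_namespace  = aws_appautoscaling_target.prod_backend_web.service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "CPUUtilization"
      namespace   = "AWS/ECS"
      statistic   = "Average"

      dimensions {
        name  = "ClusterName"
        value = aws_ecs_cluster.prod.name
      }

      dimensions {
        name  = "ServiceName"
        value = aws_ecs_service.prod_backend_web.name
      }
    }

    # Assumed values for this sketch: keep average CPU around 50% and
    # give scale-in a longer cooldown than scale-out.
    target_value       = 50
    scale_in_cooldown  = 600
    scale_out_cooldown = 180
  }

  depends_on = [aws_appautoscaling_target.prod_backend_web]
}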
Also, you can change the AlarmHigh and AlarmLow alarms manually in the console. It's not a good way to build a repeatable setup, but it's okay for our test. So, I'll change the AlarmLow alarm to 50% and 10 minutes.
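If you prefer the terminal to clicking through the console, you can locate these alarms with the AWS CLI; the name prefix below follows the TargetTracking-service/... naming pattern Application Auto Scaling typically uses, so adjust it to match your alarms:
$ aws cloudwatch describe-alarms --alarm-name-prefix "TargetTracking-service/prod"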
Stress Testing
Let's move on to the tests. I'll use Apache Benchmark (ab) for stress testing. This tool can send a lot of requests to our service, so the CPU load goes up.
First, ensure that the web service currently has only one container running.
Also, you need to increase the limit of open files with ulimit -n 10000.
Now we are ready to run the benchmark. We'll use the health-check URL for this test:
$ ab -n 100000 -c 1000 https://api.example53.xyz/health/
Here -c 1000 is the number of concurrent requests and -n 100000 is the total number of requests.
Check the CloudWatch metrics and ECS Service for the next 10-15 minutes.
First, you should see the CPU spike in the charts. After about 3 minutes, ECS autoscaling starts to spawn new instances.
Then the average CPU drops below 80%. At this point, there were 3 ECS tasks running.
After some time, the CPU load exceeds 80% again, and ECS autoscaling creates a 4th instance. You can see it in the ECS console.
So, scale-up works; let's check scale-down. Stop Apache Benchmark and wait 10-15 minutes for the scale-down to happen.
You'll see how CPU load drops to zero and ECS scales down the web service to 1 instance.
Recheck the ECS console to ensure that we have only one web task running:
So, scale-down works too. Let's commit and push our changes to the infrastructure repository.
The end
Congratulations! In this part, we added ECS autoscaling for the web service. We increased the health check timeout and interval to prevent killing overloaded containers. Then we ran a stress test and verified that the number of instances increases when the CPU load goes up and decreases when it goes down.
You can find the source code of backend and infrastructure projects here and here.
If you need technical consulting on your project, check out our website or connect with me directly on LinkedIn.