Hey everyone, I'd like to share my experience with Terraform and AWS. In this post I'll describe the resources I used to build an infrastructure on AWS and deploy a NodeJS application on it.
Resources
The application I needed to deploy is a monolithic NodeJS application. To deploy it and make it scalable, I decided to use containers with an autoscaling tool that scales the application based on CPU and memory usage. To build this environment on AWS I used the services listed below:
- VPC and Networking (Subnets, Internet Gateway, Route Tables...)
- Elastic Container Registry
- Elastic Container Service
- Application Load Balancer
- Auto Scaling
- CloudWatch
Terraform Initial Configuration
The Terraform configuration I used was quite simple. The first step is to create a bucket on AWS S3 to store the Terraform state. It's not required, but it'll make our lives easier if someone else needs to maintain this infrastructure. This is the main.tf file with this configuration.
# main.tf | Main Configuration
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "2.70.0"
}
}
backend "s3" {
bucket = "terraform-state-bucket"
key = "state/terraform_state.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = var.aws_region
access_key = var.aws_access_key
secret_key = var.aws_secret_key
}
The provider section uses some variables. We can define their values in a tfvars file, which I'll explain later in this post.
UPDATE: With this initial configuration, just run terraform init.
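One detail about the S3 backend: Terraform does not create the state bucket for you, so it has to exist before terraform init runs. Below is a minimal sketch of how the bucket could be bootstrapped from a separate configuration (the resource shown is just an illustration; you can also create the bucket manually in the console):

# bootstrap.tf | State bucket bootstrap (sketch)
resource "aws_s3_bucket" "terraform_state" {
  # Bucket names are globally unique, so replace this with your own name
  # (it must match the bucket referenced in the backend "s3" block).
  bucket = "terraform-state-bucket"

  # Versioning makes it easier to recover an older state file if needed.
  versioning {
    enabled = true
  }
}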
VPC and Networking
Let's create a VPC and configure some networking resources we're going to use later. The sample code below will create a VPC.
# vpc.tf | VPC Configuration
resource "aws_vpc" "aws-vpc" {
cidr_block = "10.10.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.app_name}-vpc"
Environment = var.app_environment
}
}
For networking, it is necessary to create Public and Private Subnets within the VPC, as well as an Internet Gateway and a Route Table for the Public Subnets. The sample below will create these resources:
# networking.tf | Network Configuration
resource "aws_internet_gateway" "aws-igw" {
vpc_id = aws_vpc.aws-vpc.id
tags = {
Name = "${var.app_name}-igw"
Environment = var.app_environment
}
}
resource "aws_subnet" "private" {
vpc_id = aws_vpc.aws-vpc.id
count = length(var.private_subnets)
cidr_block = element(var.private_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
tags = {
Name = "${var.app_name}-private-subnet-${count.index + 1}"
Environment = var.app_environment
}
}
resource "aws_subnet" "public" {
vpc_id = aws_vpc.aws-vpc.id
cidr_block = element(var.public_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.public_subnets)
map_public_ip_on_launch = true
tags = {
Name = "${var.app_name}-public-subnet-${count.index + 1}"
Environment = var.app_environment
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.aws-vpc.id
tags = {
Name = "${var.app_name}-routing-table-public"
Environment = var.app_environment
}
}
resource "aws_route" "public" {
route_table_id = aws_route_table.public.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.aws-igw.id
}
resource "aws_route_table_association" "public" {
count = length(var.public_subnets)
subnet_id = element(aws_subnet.public.*.id, count.index)
route_table_id = aws_route_table.public.id
}
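One caveat, also raised in the comments below: the ECS tasks will run in the private subnets, which have no route to the internet, so they cannot reach ECR to pull images. A minimal sketch of a NAT Gateway that would provide that access, assuming a single NAT Gateway in the first public subnet is enough for your availability needs:

# nat.tf | NAT Gateway for the private subnets (sketch)
resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = element(aws_subnet.public.*.id, 0)
  depends_on    = [aws_internet_gateway.aws-igw]
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.aws-vpc.id
}

# Send all outbound traffic from the private subnets through the NAT Gateway
resource "aws_route" "private" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat.id
}

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnets)
  subnet_id      = element(aws_subnet.private.*.id, count.index)
  route_table_id = aws_route_table.private.id
}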
Container Registry and ECS Cluster
Now, it's time to create the Container Registry and the ECS Cluster. First, let's create the Container Registry with the code below:
# ecr.tf | Elastic Container Repository
resource "aws_ecr_repository" "aws-ecr" {
name = "${var.app_name}-${var.app_environment}-ecr"
tags = {
Name = "${var.app_name}-ecr"
Environment = var.app_environment
}
}
The ECR is a repository where we're going to store the Docker images of the application we want to deploy. It works like Docker Hub, if you're familiar with Docker. You can build the Docker image locally and push it to the ECR, or use a CI/CD platform to do it.
Now we're going to create the ECS Cluster, Service and Task Definition.
A service is a configuration that enables us to run and maintain a number of tasks simultaneously in a cluster. The containers are defined by a Task Definition, which is used to run tasks in the service.
Before we create the ECS Cluster, we need to create an IAM role and policy that enable the tasks to pull the image from ECR and write logs.
# iam.tf | IAM Role Policies
resource "aws_iam_role" "ecsTaskExecutionRole" {
name = "${var.app_name}-execution-task-role"
assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
tags = {
Name = "${var.app_name}-iam-role"
Environment = var.app_environment
}
}
data "aws_iam_policy_document" "assume_role_policy" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
resource "aws_iam_role_policy_attachment" "ecsTaskExecutionRole_policy" {
role = aws_iam_role.ecsTaskExecutionRole.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
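A refinement suggested in the comments: further below, the same role is reused for both execution_role_arn and task_role_arn in the Task Definition. To follow the least-privilege principle, you could define a separate, more restricted task role and reference it in task_role_arn instead. A hypothetical sketch (the application itself needs no AWS permissions here, so no policies are attached):

resource "aws_iam_role" "ecsTaskRole" {
  # Hypothetical task role, assumed by the containers at runtime
  name               = "${var.app_name}-task-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
}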
Now let's create what we need for ECS. First we create the ECS Cluster:
resource "aws_ecs_cluster" "aws-ecs-cluster" {
name = "${var.app_name}-${var.app_environment}-cluster"
tags = {
Name = "${var.app_name}-ecs"
Environment = var.app_environment
}
}
I also created a Log Group on CloudWatch to collect the container logs.
resource "aws_cloudwatch_log_group" "log-group" {
name = "${var.app_name}-${var.app_environment}-logs"
tags = {
Application = var.app_name
Environment = var.app_environment
}
}
I created a Task Definition compatible with AWS Fargate; I preferred it in order to keep the cost of this infrastructure lower.
data "template_file" "env_vars" {
template = file("env_vars.json")
}
resource "aws_ecs_task_definition" "aws-ecs-task" {
family = "${var.app_name}-task"
container_definitions = <<DEFINITION
[
{
"name": "${var.app_name}-${var.app_environment}-container",
"image": "${aws_ecr_repository.aws-ecr.repository_url}:latest",
"entryPoint": [],
"environment": ${data.template_file.env_vars.rendered},
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "${aws_cloudwatch_log_group.log-group.id}",
"awslogs-region": "${var.aws_region}",
"awslogs-stream-prefix": "${var.app_name}-${var.app_environment}"
}
},
"portMappings": [
{
"containerPort": 8080,
"hostPort": 8080
}
],
"cpu": 256,
"memory": 512,
"networkMode": "awsvpc"
}
]
DEFINITION
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
memory = "512"
cpu = "256"
execution_role_arn = aws_iam_role.ecsTaskExecutionRole.arn
task_role_arn = aws_iam_role.ecsTaskExecutionRole.arn
tags = {
Name = "${var.app_name}-ecs-td"
Environment = var.app_environment
}
}
data "aws_ecs_task_definition" "main" {
task_definition = aws_ecs_task_definition.aws-ecs-task.family
}
An observation about the Task Definition: I'm using a Terraform data source to set some environment variables that I defined in a JSON file (it still needs an improvement to use AWS Secrets Manager or some other way to store secrets).
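The env_vars.json file itself is not shown here (a few commenters below ask about it). Since its content is rendered directly into the environment key of the container definition, it has to be a JSON array of name/value objects. A purely illustrative example with placeholder variables could look like this:

[
  {
    "name": "NODE_ENV",
    "value": "production"
  },
  {
    "name": "PORT",
    "value": "8080"
  }
]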
Ok, now let's create the ECS Service.
resource "aws_ecs_service" "aws-ecs-service" {
name = "${var.app_name}-${var.app_environment}-ecs-service"
cluster = aws_ecs_cluster.aws-ecs-cluster.id
task_definition = "${aws_ecs_task_definition.aws-ecs-task.family}:${max(aws_ecs_task_definition.aws-ecs-task.revision, data.aws_ecs_task_definition.main.revision)}"
launch_type = "FARGATE"
scheduling_strategy = "REPLICA"
desired_count = 1
force_new_deployment = true
network_configuration {
subnets = aws_subnet.private.*.id
assign_public_ip = false
security_groups = [
aws_security_group.service_security_group.id,
aws_security_group.load_balancer_security_group.id
]
}
load_balancer {
target_group_arn = aws_lb_target_group.target_group.arn
container_name = "${var.app_name}-${var.app_environment}-container"
container_port = 8080
}
depends_on = [aws_lb_listener.listener]
}
I also defined a Security Group so the containers only accept connections coming from the Load Balancer, avoiding external connections to them.
resource "aws_security_group" "service_security_group" {
vpc_id = aws_vpc.aws-vpc.id
ingress {
from_port = 0
to_port = 0
protocol = "-1"
security_groups = [aws_security_group.load_balancer_security_group.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = {
Name = "${var.app_name}-service-sg"
Environment = var.app_environment
}
}
Application Load Balancer
The next step is to set up a Load Balancer. As you may have noticed, the ECS Service configuration above already has a reference to a load_balancer block.
resource "aws_alb" "application_load_balancer" {
name = "${var.app_name}-${var.app_environment}-alb"
internal = false
load_balancer_type = "application"
subnets = aws_subnet.public.*.id
security_groups = [aws_security_group.load_balancer_security_group.id]
tags = {
Name = "${var.app_name}-alb"
Environment = var.app_environment
}
}
Now let's add a Security Group for the Load Balancer:
resource "aws_security_group" "load_balancer_security_group" {
vpc_id = aws_vpc.aws-vpc.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
tags = {
Name = "${var.app_name}-sg"
Environment = var.app_environment
}
}
We also need to create a Load Balancer Target Group, which will connect the Load Balancer to the containers.
resource "aws_lb_target_group" "target_group" {
name = "${var.app_name}-${var.app_environment}-tg"
port = 80
protocol = "HTTP"
target_type = "ip"
vpc_id = aws_vpc.aws-vpc.id
health_check {
healthy_threshold = "3"
interval = "300"
protocol = "HTTP"
matcher = "200"
timeout = "3"
path = "/v1/status"
unhealthy_threshold = "2"
}
tags = {
Name = "${var.app_name}-lb-tg"
Environment = var.app_environment
}
}
One very important thing here is the path attribute within the health_check block. This is a route on the application that the Load Balancer will call to check the health of the application, so it must exist and respond with HTTP 200.
At last, let's create an HTTP listener for our Load Balancer.
resource "aws_lb_listener" "listener" {
load_balancer_arn = aws_alb.application_load_balancer.id
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.target_group.id
}
}
Autoscaling
Autoscaling is essential for the application I'm working on. To configure it on AWS, I just needed to create an Auto Scaling Target and two simple Auto Scaling Policies: one to scale by CPU usage and another by memory usage.
# autoscaling.tf | Auto Scaling Group
resource "aws_appautoscaling_target" "ecs_target" {
max_capacity = 2
min_capacity = 1
resource_id = "service/${aws_ecs_cluster.aws-ecs-cluster.name}/${aws_ecs_service.aws-ecs-service.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "ecs_policy_memory" {
name = "${var.app_name}-${var.app_environment}-memory-autoscaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs_target.resource_id
scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
service_namespace = aws_appautoscaling_target.ecs_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
}
target_value = 80
}
}
resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
name = "${var.app_name}-${var.app_environment}-cpu-autoscaling"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.ecs_target.resource_id
scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
service_namespace = aws_appautoscaling_target.ecs_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 80
}
}
So the service will scale out when memory or CPU usage reaches 80%, and it will scale back in when usage drops below that value.
Variables
I believe you noticed we used a lot of variables in the Terraform configuration files. To declare them, I created a file called variables.tf. This file only contains the variable definitions.
# variables.tf | Auth and Application variables
variable "aws_access_key" {
type = string
description = "AWS Access Key"
}
variable "aws_secret_key" {
type = string
description = "AWS Secret Key"
}
variable "aws_region" {
type = string
description = "AWS Region"
}
variable "aws_cloudwatch_retention_in_days" {
type = number
description = "AWS CloudWatch Logs Retention in Days"
default = 1
}
variable "app_name" {
type = string
description = "Application Name"
}
variable "app_environment" {
type = string
description = "Application Environment"
}
variable "cidr" {
description = "The CIDR block for the VPC."
default = "10.0.0.0/16"
}
variable "public_subnets" {
description = "List of public subnets"
}
variable "private_subnets" {
description = "List of private subnets"
}
variable "availability_zones" {
description = "List of availability zones"
}
The values for each variable are defined in a file called terraform.tfvars.
aws_region = "us-east-1"
aws_access_key = "your aws access key"
aws_secret_key = "your aws secret key"
# these are zones and subnets examples
availability_zones = ["us-east-1a", "us-east-1b"]
public_subnets = ["10.10.100.0/24", "10.10.101.0/24"]
private_subnets = ["10.10.0.0/24", "10.10.1.0/24"]
# these are used for tags
app_name = "node-js-app"
app_environment = "production"
This file is not committed to my repository. I created it locally and use S3 to manage access to it and control its versions. It still needs some improvements, which I'll make later.
UPDATE: Now, with all the configuration files properly written, run terraform plan to check which changes will be made, and terraform apply to review and apply the changes.
You may also be wondering about the database. Well, in this project I created a cluster on MongoDB Cloud and put the credentials in the environment variables.
The full code can be found on my [GitHub](https://github.com/thnery/terraform-aws-template).
Thanks
Thank you for reading this post. I hope it could be useful. If you have any feedback, please, let me know.
Top comments (11)
Thank you very much for this article. I got most things working, except I am getting an error when the task tries to pull the ECR image:
service call has been retried 3 time(s): RequestError: send request failed caused by: Post https://api.ecr.ap-southeast-2.amazonaws.com/: dial tcp 99.82.184.189:443: i/o timeout
Anyone got the same issue?
The problem is that your service/task is in a private subnet and is not able to access the ECR registry. To solve this, you need to create a NAT Gateway to provide such access... stackoverflow.com/a/66802973
You must use us-east-1 to upload the image, while pulling the image doesn't necessarily have to happen in us-east-1.
I found this article very helpful when setting up my ECS cluster.
A few improvements I'd like to suggest:
- Use separate roles for execution_role_arn and task_role_arn (in order to follow the least-permissions principle)
- Avoid keeping aws_access_key and aws_secret_key in .tfvars
Hi brother.
data "template_file" "env_vars" {
template = file("env_vars.json")
I need the content of the env_vars file; I don't know what goes in it. Also, could you share your GitHub with me at minhphuong49th@gmail.com? Thanks. I just started learning Terraform about three months ago.
GitHub link is dead.
Any idea on how to simplify your approach by creating the basics for aws.amazon.com/blogs/containers/au...?
The repo no longer exists.
Hi,
Any way to get the env_vars.json file?
Can't access the repository either; it's no longer available.
Cheers
Can someone please tell me where the best tutorial for learning Terraform is?
Very nice article. Thanks for sharing.