An on-demand preview environment is a temporary, isolated environment, with its own infrastructure, that we spin up on the fly. It lets us open a discussion with other teams such as Product and QA at an early stage of the release process and improves cross-team visibility. In this article, we'll see how we can achieve this with AWS ECS and Cloudflare.
All the code is available in this repository
Why do we need it?
Let's see how this can bring benefits to our release and team workflow processes. This is an example from my personal experience.
Usual Workflow
Currently, QA and product reviews are tightly coupled to releases and it's often hard to roll back changes once they're in the release itself.
New Workflow
This will provide a huge benefit to the QA and Product teams, as they will be able to do a soft review of the changes. The Product team will no longer need to wait until the changes have reached the staging environment to review them. The same goes for the QA team: they can test your changes right at the pull request level.
Challenges
Let's look at some challenges I faced while architecting this, and how offerings from Cloudflare helped.
SSL
One of the big challenges was setting up SSL, because we cannot use a certificate generated by AWS ACM with our own custom Nginx proxy: ACM certificates only work with integrated AWS services such as CloudFront, ALB, and API Gateway.
There are a few approaches I saw online while researching this:
One approach is to use Let's Encrypt to generate temporary SSL certs. Here's a good implementation for this. But this brings another problem: managing all the certs we generate through Let's Encrypt.
Another approach is to just add a new Route 53 record and forward it to an ALB. The issue is not that we'd need to provision these resources, but that we'd have to provision and destroy them quite frequently!
Cloudflare Argo Tunnel to the rescue! With this, we can simply close all the ingress and instead expose our traffic through the tunnels. After that, we can create a proxied DNS record and let Cloudflare handle SSL for us!
Here's my previous article where I cover Argo Tunnels in detail.
Security
The most important part of this is security: what's stopping me from exposing a backdoor into our AWS infrastructure to the internet, intentionally or by mistake, simply by including risky changes in my pull request? We need a secure way to expose these temporary environments.
One solution that initially came to mind was AWS VPN or something similar, so that only people on our VPN could access the environments. Sounds good? But ultimately this would've required us to onboard every team member to set up and use the VPN.
Cloudflare Access is a game changer, and it's free for up to 50 users! This is just what I needed to create secure, fast, zero-trust access to the temporary environments without a VPN.
We'll talk about the Cloudflare usage in more detail in the Access section.
Architecture
Our architecture is pretty simple and intuitive. On the left, we can see how we build our app and provision our infrastructure when the developer opens a new pull request and labels it. An interesting component is the custom script which we'll implement. In the middle, we go into some detail about our infrastructure setup with AWS ECS. On the right, we see how we leverage Cloudflare Argo Tunnel and Cloudflare Access for securing access to our temporary environments endpoint.
Implementation
I've divided the whole thing into three sections:
- Setup
- Infrastructure
- Access
Note: Grep the repository for todo- to find all the values you need to provide (i.e. keys, tokens).
Setup
In this step, we'll see how we can use GitHub Actions together with our custom provisioning script.
GitHub Action
We basically need to listen to the pull_request event with the following types: labeled, unlabeled, synchronize, and closed. We have the following steps in our GitHub action:
- Provision: We will create our preview environment once the pull request is labeled, and synchronize it once new commits are pushed.
  - Create a Cloudflare Argo tunnel, access policies, and access application.
  - Store the credentials in config.yml.
  - Copy the credentials to our docker image during the build, so we can create an outbound connection to Cloudflare at runtime.
  - Fill and register the task definition.
  - A script to process events from GitHub and provision temporary AWS and Cloudflare infrastructure.
- Destroy: We will destroy our preview environment once the pull request is closed or unlabeled.
  - Destroy temporary AWS and Cloudflare resources (i.e. Argo tunnel, access policy, access apps).
The GitHub action is already included in the repository at .github/workflows/preview-environment.yml. Here's a snippet.
name: Preview Environment

on:
  pull_request:
    types: [labeled, unlabeled, synchronize, closed]
    branches:
      - develop

env: ...

jobs:
  provision:
    name: Provision
    if: ${{ github.event.action == 'labeled' && github.event.label.name == 'preview' && github.event.pull_request.state == 'open' || github.event.action == 'synchronize' && contains(github.event.pull_request.labels.*.name, 'preview') }}
    steps: ...

  destroy:
    name: Destroy
    if: ${{ github.event.action == 'unlabeled' && github.event.label.name == 'preview' || github.event.action == 'closed' }}
    steps: ...
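The two `if` expressions are dense, so here is a minimal sketch of the same decision logic as a plain TypeScript function. The `PullRequestEvent` shape and the `selectJob` name are hypothetical and only mirror the fields the workflow reads; this is not code from the repository.

```typescript
// Hypothetical, minimal shape of the pull_request event fields the workflow reads.
interface PullRequestEvent {
  action: 'labeled' | 'unlabeled' | 'synchronize' | 'closed';
  label?: string; // the label that triggered a labeled/unlabeled event
  labels: string[]; // all labels currently on the pull request
  state: 'open' | 'closed';
}

type Job = 'provision' | 'destroy' | 'none';

const PREVIEW_LABEL = 'preview';

// Mirrors the workflow's conditions: which job, if any, should run for an event.
function selectJob(event: PullRequestEvent): Job {
  const labeledForPreview =
    event.action === 'labeled' && event.label === PREVIEW_LABEL && event.state === 'open';
  const newCommitsOnPreview =
    event.action === 'synchronize' && event.labels.some((l) => l === PREVIEW_LABEL);
  if (labeledForPreview || newCommitsOnPreview) {
    return 'provision';
  }

  // Like the workflow, any closed pull request triggers destroy, labeled or not.
  const previewLabelRemoved = event.action === 'unlabeled' && event.label === PREVIEW_LABEL;
  if (previewLabelRemoved || event.action === 'closed') {
    return 'destroy';
  }
  return 'none';
}
```

Note that a closed pull request always maps to destroy, even if it never had the preview label, so the destroy step has to tolerate resources that don't exist.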
Provisioning Script
This script helps us provision or destroy our temporary infrastructure; it is located in scripts/preview. Since we don't have any way to maintain state about the provisioned infrastructure, we simply use the branch name as our slug, a unique id used throughout the process. This script is configurable via config.ts as shown below.
import * as env from 'env-var';
const config = {
// Domain for Cloudflare access policy
domain: '<todo_your_domain>',
aws: {
region: 'us-east-1',
},
github: {
// Token and Pull request no. will be available in Github Action
token: env.get('GITHUB_TOKEN').required().asString(),
pull_number: env.get('PULL_NUMBER').required().asInt(),
},
vpc: {
securityGroups: {
filter: '<todo_your_security_group_tag>',
},
subnets: {
filter: '<todo_your_subnet_tag>',
},
},
ecs: {
cluster: '<todo_your_ecs_cluster_name>',
},
cloudflare: {
path: './outputs/tunnel',
auth_email: '<todo_your_cloudflare_email>',
api_key: env.get('CLOUDFLARE_API_KEY').required().asString(),
token: env.get('CLOUDFLARE_API_TOKEN').required().asString(),
accountId: env.get('CLOUDFLARE_ACCOUNT_ID').required().asString(),
zoneId: env.get('CLOUDFLARE_ZONE_ID').required().asString(),
domain: '<todo_your_cloudflare_domain>',
},
};
export default config;
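Because the branch-derived slug also ends up as a subdomain, it helps to keep it a valid DNS label (lowercase letters, digits and hyphens, at most 63 characters). The sketch below is a dependency-free illustration of that constraint. The `toDnsLabel` helper and its `branch` fallback are hypothetical; the repository itself uses slugify, whose exact output may differ.

```typescript
// Hypothetical helper: turn a branch name into a DNS-safe label ending in "-preview".
function toDnsLabel(branch: string, suffix: string = 'preview'): string {
  const maxLength = 63; // maximum length of a single DNS label
  const tail = `-${suffix}`;

  const cleaned = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything that isn't a letter or digit
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens

  // Truncate the branch part so the suffix always fits, then re-trim hyphens.
  const head = cleaned.slice(0, maxLength - tail.length).replace(/-+$/g, '');

  // Fall back to a fixed word if nothing usable is left (an assumption of this sketch).
  return `${head || 'branch'}${tail}`;
}
```

For example, `toDnsLabel('feature/Add_Login')` gives `feature-add-login-preview`. Truncation can make two long branch names collide, so a real implementation might append a short hash.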
It all comes together in preview.ts:
import * as github from '@actions/github';
import slugify from 'slugify';
import CloudflareUtils from './utils/cloudflare';
import ECSUtils from './utils/ecs';
import * as GithubUtils from './utils/github';
import * as VPCUtils from './utils/vpc';
import log from './utils/log';
interface PreviewInterface {
provision(taskDefArn: string): Promise<void>;
destroy(): Promise<void>;
tunnel(): Promise<void>;
}
class Preview implements PreviewInterface {
private slug: string;
constructor(branch: string) {
const options = {
lower: true,
};
const suffix = `${branch}-preview`;
this.slug = slugify(suffix, options);
log.info(`Using slug "${this.slug}" for branch "${branch}"`);
}
async provision(taskDefArn: string): Promise<void> {
try {
log.info(`Provisioning resources for task definition arn: ${taskDefArn}`);
const subnets = await VPCUtils.getSubnets();
const securityGroups = await VPCUtils.getSecurityGroups();
const ecs = new ECSUtils(this.slug);
const cloudflare = new CloudflareUtils(this.slug);
await ecs.runTask(taskDefArn, subnets, securityGroups);
const comment = `Your preview environment should be up at https://${cloudflare.domain} in few moments! 🎉`;
if (github.context.payload.action === 'labeled') {
await GithubUtils.commentOnPR(comment);
}
log.success(comment);
} catch (error) {
log.error(error);
log.warn('Performing rollback!');
// Await the rollback so it can finish before the process exits
await this.destroy();
process.exit(1);
}
}
async destroy(): Promise<void> {
try {
log.info(`Destroying resources`);
const ecs = new ECSUtils(this.slug);
const cloudflare = new CloudflareUtils(this.slug);
await ecs.stopTask();
await cloudflare.removeDNSRecord();
await cloudflare.deleteTunnels();
await cloudflare.removeAccess();
log.success('Resources destroyed');
} catch (error) {
log.error(error);
process.exit(1);
}
}
async tunnel(): Promise<void> {
try {
const cloudflare = new CloudflareUtils(this.slug);
const tunnelId = await cloudflare.createTunnel();
cloudflare.createConfigFile(tunnelId);
await cloudflare.addDNSRecord(tunnelId);
await cloudflare.createAccess();
log.success('Tunnel setup complete');
} catch (error) {
log.error(error);
process.exit(1);
}
}
}
export default Preview;
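One design choice worth considering: in destroy() above, the first failing call skips every later cleanup step, which can leave a tunnel or DNS record behind. The sketch below shows a best-effort alternative that runs every step and reports the failures at the end. The `Step`, `Failure` and `teardown` names are hypothetical and not part of the repository.

```typescript
// Hypothetical types for a named cleanup step and a recorded failure.
interface Step {
  name: string;
  run: () => Promise<void>;
}

interface Failure {
  name: string;
  message: string;
}

// Run every step in order, even if an earlier one throws, and collect the failures.
async function teardown(steps: Step[]): Promise<Failure[]> {
  const failures: Failure[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push({ name: step.name, message });
    }
  }
  return failures;
}
```

With something like this, destroy() could pass `ecs.stopTask`, `cloudflare.removeDNSRecord`, `cloudflare.deleteTunnels` and `cloudflare.removeAccess` as steps, and exit non-zero only after all of them have been attempted.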
Here's how we use it:
preview/commands/tunnel.ts:
import Preview from '../preview';
import * as GithubUtils from '../utils/github';
async function run(): Promise<void> {
const branch = await GithubUtils.getCurrentBranch();
const preview = new Preview(branch);
await preview.tunnel();
}
run();
Usage:
$ yarn tunnel
This creates a Cloudflare tunnel config.yml like the one below.
tunnel: <tunnel-id>
credentials-file: /root/.cloudflared/<tunnel-id>.json

ingress:
  - hostname: subdomain.domain.com
    service: http://localhost:4000
  - service: http_status:404
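To make the file above concrete, here is a minimal sketch of how such a config could be rendered from a tunnel id. The `renderTunnelConfig` function and its options are hypothetical; the repository's createConfigFile may be implemented differently (for instance with a YAML library).

```typescript
// Hypothetical options for rendering the cloudflared config.
interface TunnelConfigOptions {
  tunnelId: string;
  hostname: string; // e.g. branch-slug.your-domain.com
  port: number; // local port the app listens on inside the container
}

// Render the config.yml content; the last ingress rule is the required catch-all.
function renderTunnelConfig(options: TunnelConfigOptions): string {
  const { tunnelId, hostname, port } = options;
  return [
    `tunnel: ${tunnelId}`,
    `credentials-file: /root/.cloudflared/${tunnelId}.json`,
    '',
    'ingress:',
    `  - hostname: ${hostname}`,
    `    service: http://localhost:${port}`,
    '  - service: http_status:404',
    '',
  ].join('\n');
}
```

Writing the returned string to the path configured in config.ts is left out to keep the sketch free of file-system calls.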
preview/commands/provision.ts:
import Preview from '../preview';
import { ArgumentParser } from 'argparse';
import * as GithubUtils from '../utils/github';
const parser = new ArgumentParser({
description: 'Provision preview environment',
});
parser.add_argument('-td', '--task-def-arn', {
required: true,
help: 'Task definition arn',
});
async function run(): Promise<void> {
const { task_def_arn } = parser.parse_args();
const branch = await GithubUtils.getCurrentBranch();
const preview = new Preview(branch);
await preview.provision(task_def_arn);
}
run();
Usage:
$ yarn provision --task-def-arn $TASK_DEFINITION
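The provision command ends in ecs.runTask, whose implementation isn't shown here. As a rough sketch, the input for the ECS RunTask API on Fargate could be assembled as below. The field names follow the RunTask request shape, but the `buildRunTaskInput` helper, the use of `startedBy` to tag the task with the slug, and the public IP default are assumptions of this example.

```typescript
// Subset of the ECS RunTask request used in this sketch.
interface RunTaskInput {
  cluster: string;
  taskDefinition: string;
  launchType: 'FARGATE';
  count: number;
  startedBy: string;
  networkConfiguration: {
    awsvpcConfiguration: {
      subnets: string[];
      securityGroups: string[];
      assignPublicIp: 'ENABLED' | 'DISABLED';
    };
  };
}

// Hypothetical helper: build the request for one preview task.
function buildRunTaskInput(
  cluster: string,
  slug: string,
  taskDefArn: string,
  subnets: string[],
  securityGroups: string[],
  assignPublicIp: 'ENABLED' | 'DISABLED' = 'ENABLED',
): RunTaskInput {
  return {
    cluster,
    taskDefinition: taskDefArn,
    launchType: 'FARGATE',
    count: 1,
    // Assumption: tagging the task with the slug lets a later step find and stop it.
    // ECS restricts the length and characters of startedBy, so check the RunTask docs.
    startedBy: slug,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets,
        securityGroups,
        // A task in a public subnet without NAT needs a public IP for outbound traffic
        // (pulling the image, opening the tunnel); use DISABLED behind a NAT gateway.
        assignPublicIp,
      },
    },
  };
}
```

The resulting object can be passed to the AWS SDK's RunTask call; how the repository actually tracks and stops tasks may differ.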
preview/commands/destroy.ts:
import Preview from '../preview';
import * as GithubUtils from '../utils/github';
async function run(): Promise<void> {
const branch = await GithubUtils.getCurrentBranch();
const preview = new Preview(branch);
await preview.destroy();
}
run();
Usage:
$ yarn destroy
Infrastructure
Here's the infrastructure we need before we start running our temporary tasks. I've added a snippet here; for the full implementation, check the infrastructure folder in the repository. I'm using Terraform to provision this:
Note: If you're not familiar with it, you can learn more about Terraform here.
# ECR repository
resource "aws_ecr_repository" "ecr_repository" {
name = "app-repository"
image_tag_mutability = "IMMUTABLE"
image_scanning_configuration {
scan_on_push = true
}
}
# ECS task definition used by ECS service
resource "aws_ecs_task_definition" "task_definition" {
family = "app-task-definition"
network_mode = "awsvpc"
cpu = 4096
memory = 8192
requires_compatibilities = ["FARGATE"]
container_definitions = jsonencode([
{
"name": "app",
"image": "nginx:latest",
"essential": true,
"portMappings": [
{
"containerPort": 4000,
"hostPort": 4000
}
]
}
])
task_role_arn = aws_iam_role.task_execution_role.arn
execution_role_arn = aws_iam_role.task_execution_role.arn
}
# Security group
resource "aws_security_group" "security_group" {
name = "app-security-group"
vpc_id = var.vpc_id
# No ingress rules: the task is reached only through its outbound Cloudflare tunnel,
# so all inbound traffic stays closed.
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# ECS cluster
resource "aws_ecs_cluster" "cluster" {
name = "ecs-cluster"
capacity_providers = ["FARGATE"]
}
resource "aws_cloudwatch_log_group" "log_group" {
name = "/ecs/app-log-group"
}
Access
Now that we have our application and infrastructure running, let's talk about access, and more specifically how we can take advantage of Cloudflare Access.
As we discussed earlier, after we create a tunnel, we create a proxied CNAME DNS record through the Cloudflare SDK.
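As an illustration of that record, here is a minimal sketch of the request that creates a proxied CNAME pointing at the tunnel. It targets the Cloudflare REST endpoint for DNS records rather than the SDK wrapper used in the repository, and the `buildTunnelDnsRecord` helper and `DnsRecordRequest` type are hypothetical.

```typescript
// Hypothetical description of the HTTP request to send to the Cloudflare API.
interface DnsRecordRequest {
  method: 'POST';
  url: string;
  body: {
    type: 'CNAME';
    name: string;
    content: string;
    proxied: boolean;
    ttl: number;
  };
}

// Build the request for a proxied CNAME that routes a subdomain to a tunnel.
function buildTunnelDnsRecord(zoneId: string, subdomain: string, tunnelId: string): DnsRecordRequest {
  return {
    method: 'POST',
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records`,
    body: {
      type: 'CNAME',
      name: subdomain,
      // Tunnels are addressed as <tunnel-id>.cfargotunnel.com
      content: `${tunnelId}.cfargotunnel.com`,
      // Proxying is what lets Cloudflare terminate SSL and enforce Access
      proxied: true,
      ttl: 1, // 1 means "automatic"
    },
  };
}
```

Sending the request (with an `Authorization: Bearer <token>` header) is left out so the sketch stays self-contained.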
Access Policy
Then, we can create an Access Policy to control who can access our secure endpoint. We can even enforce MFA!
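For a rough idea of what this involves, the sketch below builds the two requests I'd expect for a self-hosted Access application and an allow policy restricted to one email domain. The endpoint paths and field names reflect my understanding of the Cloudflare Access API and should be verified against the current API reference; the helper names and the `24h` session duration are hypothetical.

```typescript
// Hypothetical description of an HTTP request to the Cloudflare API.
interface AccessRequest<T> {
  method: 'POST';
  url: string;
  body: T;
}

interface AccessAppBody {
  name: string;
  domain: string;
  type: 'self_hosted';
  session_duration: string;
}

interface AccessPolicyBody {
  name: string;
  decision: 'allow';
  include: Array<{ email_domain: { domain: string } }>;
}

const API_BASE = 'https://api.cloudflare.com/client/v4';

// Access application that protects the preview hostname.
function buildAccessApp(accountId: string, slug: string, hostname: string): AccessRequest<AccessAppBody> {
  return {
    method: 'POST',
    url: `${API_BASE}/accounts/${accountId}/access/apps`,
    body: { name: slug, domain: hostname, type: 'self_hosted', session_duration: '24h' },
  };
}

// Policy that only allows users whose email belongs to the given domain.
function buildAccessPolicy(accountId: string, appId: string, emailDomain: string): AccessRequest<AccessPolicyBody> {
  return {
    method: 'POST',
    url: `${API_BASE}/accounts/${accountId}/access/apps/${appId}/policies`,
    body: {
      name: `Allow ${emailDomain}`,
      decision: 'allow',
      include: [{ email_domain: { domain: emailDomain } }],
    },
  };
}
```

The `emailDomain` argument corresponds to the `domain` value in config.ts, and `appId` would come from the response to the first request.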
Access Groups
This is more of a fine-tuning step, but using access groups we can create teams such as Engineering, Product, and QA, and use these groups while configuring our access policies and much more. I'll leave this up to you.
Usage
Here's how we can use our preview environments:
Provision
- A developer labels the pull request with the preview label. Once labeled, our GitHub action builds our application and provisions the infrastructure.
- When the GitHub action completes, it leaves a comment on the pull request, and the environment becomes available at https://branch-slug.your-domain.com.
- The Product or QA team uses the new environment to evaluate the pull request. Anyone who has been granted access in Cloudflare (e.g. person@your_domain.com) can log in with the identity provider (in my case, Okta) and access the preview environment.
Destroy
- To destroy the environment, we can either close the pull request or remove the preview label; both start our destroy step.
Improvements
For improvements, one idea is to migrate the provisioning script to Go and turn it into a Terraform provider.
Cost Estimations
The cost essentially comes down to AWS ECS pricing (with Fargate), as we are using Cloudflare's free tier.
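For a back-of-the-envelope figure, the sketch below multiplies the task size from the Terraform snippet (4096 CPU units, 8192 MiB) by per-hour Fargate rates. The rates in `ASSUMED_RATES` are assumptions based on us-east-1 Linux/x86 on-demand prices I've seen published; they change over time, so check the AWS pricing page before relying on them. Data transfer, logs and ECR storage are not included.

```typescript
interface FargateRates {
  perVcpuHour: number; // USD per vCPU per hour
  perGbHour: number; // USD per GB of memory per hour
}

// ASSUMPTION: illustrative rates only, not authoritative pricing.
const ASSUMED_RATES: FargateRates = { perVcpuHour: 0.04048, perGbHour: 0.004445 };

// Estimate the compute cost in USD of one task running for a number of hours.
function estimateTaskCost(
  cpuUnits: number,
  memoryMiB: number,
  hours: number,
  rates: FargateRates = ASSUMED_RATES,
): number {
  const vcpus = cpuUnits / 1024; // 1024 CPU units = 1 vCPU
  const memoryGb = memoryMiB / 1024;
  return (vcpus * rates.perVcpuHour + memoryGb * rates.perGbHour) * hours;
}
```

Under these assumed rates, a 4 vCPU / 8 GB preview task comes to roughly $0.20 per hour, so an environment left up for an 8-hour working day would cost on the order of $1.60.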
Conclusion
I hope this article was helpful, as always if you face any issues feel free to reach out.
Hopefully, this will bring some collaboration with the Product, QA, and Solutions teams at the early stages of the release process at your organization.