In a recent project for our customer Skillgym, I faced the challenge of serving content from different origins based on specific conditions, like the host subdomain. The goal was to leverage AWS CloudFront to distribute content efficiently while ensuring the flexibility to choose between multiple origins based on the host header and request path. This post will walk you through the solution, highlighting the differences between CloudFront Functions and Lambda@Edge, and how we used Lambda@Edge to dynamically select origins based on subdomains.
The Challenge
The primary challenge was to configure CloudFront to serve content from different origins based on:
- Specific domains or subdomains
- Different paths within each domain/subdomain
For example, we needed to route traffic to an S3 bucket for static assets on a specific path for a particular domain. This means that requests to www.example.com/images/test.png should be directed to the S3 object at s3://my-bucket/example.com-content/public/images/test.png. At the same time, we used an Application Load Balancer (ALB) to handle dynamic content. So, requests to www.company-website.com/images/dynamic-image.png should be directed to the API served by the ALB.
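The routing rule described above can be sketched as a small pure function. This is only an illustration: the `ROUTES` table, the `resolveTarget` name, and the shape of the return value are hypothetical, and only the hostnames and the S3 key prefix come from the example above.

```javascript
// Hypothetical routing table: which hosts serve which path prefixes from S3.
// Hostnames and the key prefix are taken from the example in the text.
const ROUTES = {
  "www.example.com": {
    s3Prefix: "example.com-content/public", // prefix inside my-bucket
    staticPaths: ["/images/"],
  },
};

// Decide where a request should go.
// Anything that does not match a static path falls through to the ALB.
function resolveTarget(host, path) {
  const route = ROUTES[host];
  if (route && route.staticPaths.some((prefix) => path.startsWith(prefix))) {
    return { origin: "s3", key: route.s3Prefix + path };
  }
  return { origin: "alb", path: path };
}

console.log(resolveTarget("www.example.com", "/images/test.png"));
console.log(resolveTarget("www.company-website.com", "/images/dynamic-image.png"));
```

The first call resolves to the S3 key `example.com-content/public/images/test.png`, while the second falls through to the ALB.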
Solution Overview
The solution involves:
- Configuring an S3 bucket and CloudFront distribution.
- Writing Lambda@Edge functions to dynamically choose the origin.
- Setting up CloudFront Functions for lightweight request manipulation.
CloudFront Functions vs. Lambda@Edge
CloudFront Functions are designed for lightweight, short-duration tasks such as simple HTTP header manipulation. They execute very quickly and are cost-effective for simple tasks. CloudFront Functions are priced based on the number of function invocations.
At the time of writing:
- First 2,000,000 invocations each month are free.
- $0.10 per 1,000,000 invocations thereafter.
They are limited to basic transformations and do not support accessing external services or executing complex logic.
Lambda@Edge, on the other hand, offers more power and flexibility. It allows for more complex logic and can interact with external services, making it suitable for tasks such as dynamically choosing origins based on request attributes.
Lambda@Edge functions are priced based on:
- Invocations: $0.60 per 1,000,000 invocations.
- Duration: $0.00005001 per GB-second, which works out to $0.00000625125 per 128 MB-second.
If the requested content is already cached and the cache entry is still valid, the origin-request Lambda@Edge function is not invoked. An origin-request function executes only when CloudFront needs to go to the origin to fetch the content.
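To get a feel for how caching affects the bill, here is a rough cost sketch. It assumes the rates quoted above, that the Lambda@Edge duration figure of $0.00000625125 applies per 128 MB-second (which is how the AWS pricing page expresses it), and a hypothetical traffic profile. The function names are mine, and prices change, so treat this as an estimate only.

```javascript
// Rates quoted in the text; check the AWS pricing page before relying on them.
const CFF_FREE_INVOCATIONS = 2000000;
const CFF_PRICE_PER_MILLION = 0.10;
const EDGE_PRICE_PER_MILLION = 0.60;
// Assumption: the duration rate is billed per 128 MB-second of execution.
const EDGE_PRICE_PER_128MB_SECOND = 0.00000625125;

// Viewer-request CloudFront Function: runs on every request.
function cloudFrontFunctionCost(requests) {
  const billable = Math.max(0, requests - CFF_FREE_INVOCATIONS);
  return (billable / 1000000) * CFF_PRICE_PER_MILLION;
}

// Origin-request Lambda@Edge: runs only on cache misses.
function lambdaEdgeCost(requests, cacheHitRatio, avgDurationSeconds, memoryMb = 128) {
  const invocations = requests * (1 - cacheHitRatio);
  const requestCost = (invocations / 1000000) * EDGE_PRICE_PER_MILLION;
  const durationCost =
    invocations * avgDurationSeconds * (memoryMb / 128) * EDGE_PRICE_PER_128MB_SECOND;
  return requestCost + durationCost;
}

// Hypothetical month: 10M requests, 90% cache hit ratio, 50 ms per invocation.
const monthlyRequests = 10000000;
console.log("CloudFront Functions: $" + cloudFrontFunctionCost(monthlyRequests).toFixed(2));
console.log("Lambda@Edge:          $" + lambdaEdgeCost(monthlyRequests, 0.9, 0.05).toFixed(2));
```

The higher the cache hit ratio, the fewer origin-request invocations you pay for, whereas the viewer-request function is charged on every request.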
Implementation Details
We will define two origins, but only one cache behavior. In the function associated with the viewer-request trigger, we will manipulate the incoming request. Then, in the function associated with the origin-request trigger, we will change the origin based on the host header. This setup allows us to dynamically select the appropriate origin for each request while maintaining a consistent caching strategy.
Terraform Configuration
Here's the Terraform configuration to set up the necessary AWS resources.
Note that the Terraform code below is far from complete. A working, secure setup also needs the full S3 bucket definition, the complete Origin Access Control (OAC) settings, and the remaining CloudFront distribution parameters. These govern permissions, access, and the performance of the distribution.
Because the origin-request function runs on the S3 cache behavior, CloudFront replaces the host header with the S3 origin's host before the function sees it. To preserve the original host, the viewer-request function saves it in a custom header, such as x-original-host. When the origin-request function decides that the origin should be the ALB, it uses the value stored in x-original-host to restore the original host header. The ALB can then keep using the host header in its rules to select the target group. For the custom header to reach the origin-request function, it generally has to be forwarded by the cache policy or origin request policy attached to the behavior.
resource "aws_cloudfront_distribution" "default" {
  depends_on = [aws_s3_bucket.distribution_bucket]

  origin {
    domain_name              = aws_s3_bucket.distribution_bucket.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.default.id
    origin_id                = local.s3_origin_id
  }

  origin {
    domain_name = var.default_target_origin
    origin_id   = local.main_origin

    custom_origin_config {
      origin_protocol_policy = "match-viewer"
      http_port              = 80
      https_port             = 443
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  enabled         = true
  is_ipv6_enabled = true
  aliases         = var.host_subdomains

  default_cache_behavior {
    // ...
    target_origin_id       = local.s3_origin_id
    cache_policy_id        = var.default_cache_policy
    viewer_protocol_policy = "redirect-to-https"

    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.replace-s3-view-path.arn
    }

    lambda_function_association {
      event_type   = "origin-request"
      lambda_arn   = aws_lambda_function.select-origin-function.qualified_arn
      include_body = false
    }
  }
}
Lambda@Edge and CloudFront Functions
- Viewer Request Function: This CloudFront Function rewrites the URL path based on subdomain mappings and saves the original host header in the x-original-host header.
// CloudFront Functions use a handler(event) entry point and expose the
// request as event.request (not event.Records[0].cf.request).
function handler(event) {
  var request = event.request;
  var uri = request.uri;
  // In CloudFront Functions each header is an object of the form { value: '...' }
  var host = request.headers.host.value;
  // Save the viewer's host so the origin-request function can read it later
  request.headers['x-original-host'] = {
    value: host
  };
  // Add logic to manipulate URI based on subdomain
  if (uri.startsWith('/old-path')) {
    request.uri = uri.replace('/old-path', '/new-path');
  }
  return request;
}
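One detail that is easy to trip over is that the two runtimes represent headers differently: CloudFront Functions expose each header as a `{ value }` object, while Lambda@Edge exposes an array of `{ key, value }` entries. The sketch below shows a defensive way to read a header in each shape. The helper names and the sample requests are hypothetical.

```javascript
// CloudFront Functions shape: headers[name] = { value: "..." }
function getCloudFrontFunctionHeader(request, name) {
  const header = request.headers[name.toLowerCase()];
  return header ? header.value : undefined;
}

// Lambda@Edge shape: headers[name] = [{ key: "Name", value: "..." }]
function getLambdaEdgeHeader(request, name) {
  const entries = request.headers[name.toLowerCase()];
  return entries && entries.length > 0 ? entries[0].value : undefined;
}

// The same request as each stage might see it (sample data).
const viewerStageRequest = {
  headers: { host: { value: "static.example.com" } },
};
const originStageRequest = {
  headers: {
    host: [{ key: "Host", value: "my-bucket.s3.us-east-1.amazonaws.com" }],
    "x-original-host": [{ key: "x-original-host", value: "static.example.com" }],
  },
};

console.log(getCloudFrontFunctionHeader(viewerStageRequest, "Host"));
console.log(getLambdaEdgeHeader(originStageRequest, "X-Original-Host"));
console.log(getLambdaEdgeHeader(originStageRequest, "x-missing")); // undefined, no exception
```

Returning `undefined` for a missing header avoids the runtime error you would get from indexing `[0]` on a header that was never forwarded.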
- Origin Request Function: This Lambda@Edge function routes requests to different origins based on the subdomain and path.
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  // At this stage the Host header already points at the S3 origin, so read
  // the viewer's host from the custom header set by the viewer-request function.
  const saved = request.headers['x-original-host'];
  if (!saved || saved.length === 0) {
    // Header missing: leave the request on the default S3 origin
    return request;
  }
  const originalHost = saved[0].value;
  const subdomain = originalHost.split('.')[0];
  // You can add your logic here
  if (subdomain === 'static') {
    // Keep the default S3 origin; Host must match the bucket endpoint
    request.headers['host'] = [{ key: 'Host', value: 'your-s3-bucket.s3.us-east-1.amazonaws.com' }];
  } else {
    // Switch the request to the ALB origin
    request.origin = {
      custom: {
        domainName: 'your-alb.amazonaws.com',
        port: 443,
        protocol: 'https',
        path: '',
        sslProtocols: ['TLSv1.2'],
        readTimeout: 5,
        keepaliveTimeout: 5,
        customHeaders: {}
      }
    };
    // Restore the original host so the ALB can route on the host header
    request.headers['host'] = [{ key: 'Host', value: originalHost }];
  }
  return request;
};
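Deploying Lambda@Edge takes a while, so it can help to exercise the routing decision locally first. The sketch below is a simplified stand-in for the handler, not the deployed code: `buildOriginRequestEvent` and `selectOrigin` are hypothetical names, the event contains only the fields the logic reads, and the hostnames are placeholders.

```javascript
// Hypothetical helper: builds the minimal part of a Lambda@Edge
// origin-request event that the routing logic reads.
function buildOriginRequestEvent(originalHost, uri) {
  const headers = {
    host: [{ key: "Host", value: "your-s3-bucket.s3.us-east-1.amazonaws.com" }],
  };
  if (originalHost) {
    headers["x-original-host"] = [{ key: "x-original-host", value: originalHost }];
  }
  return { Records: [{ cf: { request: { uri: uri, headers: headers, origin: { s3: {} } } } }] };
}

// Simplified stand-in for the origin-request decision logic.
function selectOrigin(event) {
  const request = event.Records[0].cf.request;
  const saved = request.headers["x-original-host"];
  if (!saved || saved.length === 0) {
    return request; // header missing: keep the default S3 origin
  }
  const originalHost = saved[0].value;
  if (originalHost.split(".")[0] !== "static") {
    // Origin object trimmed to the essentials for this local check
    request.origin = {
      custom: { domainName: "your-alb.amazonaws.com", port: 443, protocol: "https" },
    };
    request.headers.host = [{ key: "Host", value: originalHost }];
  }
  return request;
}

const albRequest = selectOrigin(buildOriginRequestEvent("www.example.com", "/api/users"));
console.log(Object.keys(albRequest.origin)[0], albRequest.headers.host[0].value);

const s3Request = selectOrigin(buildOriginRequestEvent("static.example.com", "/images/test.png"));
console.log(Object.keys(s3Request.origin)[0], s3Request.headers.host[0].value);
```

The first request is switched to the custom (ALB) origin with its original host restored, and the second stays on the S3 origin.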
Bucket security
This approach relies on Origin Access Control (OAC). In my tests it was the only method that worked with this configuration: I also tried Origin Access Identity (OAI) during the implementation, without success.
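For reference, OAC depends on a bucket policy that lets the CloudFront service principal read objects only on behalf of a specific distribution. The sketch below builds such a policy document. The function name, the account ID, and the distribution ID are hypothetical placeholders, and your real policy may need additional statements.

```javascript
// Build an S3 bucket policy that allows reads from one CloudFront distribution via OAC.
function buildOacBucketPolicy(bucketName, accountId, distributionId) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowCloudFrontServicePrincipalReadOnly",
        Effect: "Allow",
        Principal: { Service: "cloudfront.amazonaws.com" },
        Action: "s3:GetObject",
        Resource: "arn:aws:s3:::" + bucketName + "/*",
        // Restrict access to requests made on behalf of this distribution only
        Condition: {
          StringEquals: {
            "AWS:SourceArn":
              "arn:aws:cloudfront::" + accountId + ":distribution/" + distributionId,
          },
        },
      },
    ],
  };
}

// Placeholder values for illustration only.
const oacPolicy = buildOacBucketPolicy("my-bucket", "111122223333", "EDFDVBD6EXAMPLE");
console.log(JSON.stringify(oacPolicy, null, 2));
```

The printed JSON can be attached to the bucket, for example through the policy argument of an aws_s3_bucket_policy resource in Terraform.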
Conclusion
Combining Lambda@Edge with CloudFront Functions lets you choose the CDN origin per request, using logic as complex as you need. Lambda@Edge runs the advanced logic close to users, while CloudFront Functions handle lightweight tasks like URL rewrites efficiently.