In this post I will show how I’ve added ‘tracking’ to my website using Lambda@Edge.
Although I’ve not written many posts yet, there are a few that contain links to (hopefully) useful websites with more information or examples. Inspired by how newsletter emails track how often people click on their links, I thought I’d implement the same for my website using Lambda@Edge.
I could of course include a client-side tracking library such as Google Analytics. However, many visitors have some sort of ad blocker installed, so that solution would only count those who don’t. Doing it server-side works around that.
Context
In the end the solution looks more or less like this:
Explanation
To make this work I’ve implemented a /goto endpoint that gets handled in the origin-request Lambda@Edge function. This function checks the given URL against registered URLs in a DynamoDB table. If the URL is registered, it returns a 302 response with the Location header set to the target URL. Otherwise it passes the request on to the origin (S3), which responds with a 404, for which CloudFront returns a Custom Error Page.
We set the Cache-Control header to no-cache so that CloudFront does not cache the response. A cached response would prevent us from tracking clicks, because the origin-request event would then not trigger.
The function looks like this:
from urllib.parse import parse_qs


def handle_event(event, context):
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    if uri == "/goto":
        # parse_qs already percent-decodes the values, so the link needs no extra unquote
        params = {k: v[0] for k, v in parse_qs(request["querystring"]).items()}
        if "link" not in params:
            return request
        url = params["link"]
        if not is_registered_url(url):
            # Pass the request on to the origin, which will result in a 404
            return request
        store_click(url)
        response = {
            "status": 302,
            "statusDescription": "Found",
            "headers": {
                "location": [{
                    "key": "Location",
                    "value": url
                }],
                "cache-control": [{
                    "key": "Cache-Control",
                    "value": "no-cache"
                }]
            }
        }
        return response
    # Any other URI is passed on to the origin untouched
    return request
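To try the handler locally it helps to have an event to feed it. The sketch below builds only the slice of a CloudFront origin-request event that the function above reads. The helper name make_origin_request_event is hypothetical, and a real event carries more fields (client IP, headers, origin details) than this trimmed version.

```python
def make_origin_request_event(uri, querystring=""):
    """Build a trimmed origin-request event (hypothetical helper for local testing)."""
    return {
        "Records": [
            {
                "cf": {
                    "config": {"eventType": "origin-request"},
                    "request": {
                        "method": "GET",
                        "uri": uri,
                        # CloudFront passes the query string still percent-encoded
                        "querystring": querystring,
                        "headers": {},
                    },
                }
            }
        ]
    }


event = make_origin_request_event("/goto", "link=https%3A%2F%2Fdev.to")
request = event["Records"][0]["cf"]["request"]
print(request["uri"], request["querystring"])
```

Passing such an event to handle_event, with is_registered_url and store_click stubbed out, is one way to check the redirect logic without deploying.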
The is_registered_url(url) and store_click(url) functions perform get_item and put_item calls on the DynamoDB table.
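The post does not show these two helpers, so the following is only a rough sketch of what they might look like. The table name "urls", the pk/sk key layout, the item attributes, and the optional table parameter are all assumptions made for illustration; only the get_item and put_item calls and the eu-west-1 region come from the text above.

```python
import time
import uuid


def get_table(table_name="urls"):
    """Return a boto3 Table resource. The table name is an assumption."""
    import boto3  # imported lazily so the helpers can be tested with a stand-in table
    return boto3.resource("dynamodb", region_name="eu-west-1").Table(table_name)


def is_registered_url(url, table=None):
    """Check for a registration item (assumed key layout: pk=url, sk='REGISTERED')."""
    if table is None:
        table = get_table()
    response = table.get_item(Key={"pk": url, "sk": "REGISTERED"})
    # get_item leaves out the "Item" key when nothing matches
    return "Item" in response


def store_click(url, table=None):
    """Write one item per click (assumed layout: sk='CLICK#<uuid>')."""
    if table is None:
        table = get_table()
    table.put_item(
        Item={
            "pk": url,
            "sk": f"CLICK#{uuid.uuid4()}",
            "clicked_at": int(time.time()),
        }
    )
```

Accepting the table as an argument keeps the handler's one-argument calls working while allowing a fake table in tests.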
In a previous post I’ve mentioned that I provision my CloudFront distribution using the Serverless framework, so adding the above function to be triggered on origin-request events is as easy as adding the following to my serverless.yaml.
functions:
  ...
  originRequest:
    handler: src/functions/origin_request/handler.handle_event
    events:
      - cloudFront:
          eventType: origin-request
          origin: http://${self:custom.domainName}.s3-website-eu-west-1.amazonaws.com
    iamRoleStatements:
      - Effect: Allow
        Action:
          - dynamodb:GetItem
          - dynamodb:PutItem
        Resource: "arn:aws:dynamodb:eu-west-1:#{AWS::AccountId}:table/${self:custom.urlsTableName}"
Usage
To simplify adding external URLs to blog posts that go through this /goto endpoint I’ve written a custom Liquid tag that transforms a given text + URL into a proper anchor tag.
module Jekyll
  class ExtUrlTag < Liquid::Tag
    include Liquid::StandardFilters

    def initialize(tag_name, input, tokens)
      super
      @input = input
    end

    def render(context)
      input_split = split_params(@input)
      text = input_split[0].strip
      url = input_split[1].strip
      url_encoded = url_encode(url)
      "<a href=\"https://jurriaan.cloud/goto?link=#{url_encoded}\" target=\"_blank\">#{text}</a>"
    end

    def split_params(params)
      params.split("|")
    end
  end
end

Liquid::Template.register_tag('ext_url', Jekyll::ExtUrlTag)
Whenever I want to add a link to an external website, all I have to do is write
{% ext_url this|https://dev.to %}
which the custom Liquid tag turns into an anchor tag with the text “this” that points at https://jurriaan.cloud/goto?link=https%3A%2F%2Fdev.to.
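To see how the encoded link lines up with what the Lambda@Edge function receives, here is a small Python sketch of the round trip. The helper goto_href is hypothetical, and quote_plus is used as a rough stand-in for Liquid's url_encode filter; the two should agree for ordinary URLs, but that equivalence is an assumption.

```python
from urllib.parse import parse_qs, quote_plus


def goto_href(url, base="https://jurriaan.cloud/goto"):
    """Build the tracked link for an external URL (hypothetical helper)."""
    return f"{base}?link={quote_plus(url)}"


href = goto_href("https://dev.to")
print(href)  # https://jurriaan.cloud/goto?link=https%3A%2F%2Fdev.to

# On the Lambda side, parse_qs decodes the value back to the original URL
query = href.split("?", 1)[1]
print(parse_qs(query)["link"][0])  # https://dev.to
```

Since parse_qs already returns the decoded URL, the value can be compared directly against the URLs registered in DynamoDB.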
To register the URLs I use, I’ve written a simple Python script that identifies all ext_url occurrences in my posts and adds these to DynamoDB.
import os
import re

REGEX_PATTERN = re.compile(r"{% ext_url (.*?) %}")


def main():
    path = os.path.realpath("../website/src/_posts")
    for item in os.listdir(path):
        full_path_to_item = os.path.join(path, item)
        if not os.path.isfile(full_path_to_item):
            continue
        with open(full_path_to_item) as file:
            file_contents = file.read()
        for match in re.findall(REGEX_PATTERN, file_contents):
            ext_url_split = match.split("|")
            # Skip tags that are not of the form text|url
            if len(ext_url_split) != 2:
                continue
            # Strip whitespace, like the Liquid tag does
            url = ext_url_split[1].strip()
            # Register the URL together with the post name (file name without extension)
            add_url(url, os.path.splitext(item)[0])
The add_url function performs a put_item on the DynamoDB table.
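To check what the script's pattern picks up before anything is written to DynamoDB, the matching step can be run on a sample string. This is a minimal sketch: the function name extract_ext_urls and the sample text are made up for illustration, while the pattern and the text|url split follow the script above.

```python
import re

EXT_URL_PATTERN = re.compile(r"{% ext_url (.*?) %}")


def extract_ext_urls(text):
    """Return the URL part of every well-formed ext_url tag in the text."""
    urls = []
    for match in EXT_URL_PATTERN.findall(text):
        parts = match.split("|")
        # Tags without exactly one "|" separator are skipped, as in the script
        if len(parts) != 2:
            continue
        urls.append(parts[1].strip())
    return urls


sample = "Read {% ext_url this|https://dev.to %} and {% ext_url broken %}."
print(extract_ext_urls(sample))  # ['https://dev.to']
```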
Trying it out
After registering the URL in the DynamoDB table and running curl a few times, I can see the entries coming in.
Conclusion
Using Lambda@Edge is a simple way to do this kind of ‘server-side’ tracking of clicks to external websites.
In the future I’m planning to extend this by building an API on top of the DynamoDB table so that I can show a list of URLs and the click count at the bottom of a post.