Jurriaan Proos

Posted on Feb 27, 2021 • Originally published at jurriaan.cloud on Jan 25, 2021

Track clicks on links using Lambda@Edge

#aws #serverless #python #lambdaedge

In this post I will show how I’ve added ‘tracking’ to my website using Lambda@Edge.

Although I haven’t written many posts yet, there are a few that contain links to (hopefully) useful websites with more information, examples and the like. Inspired by the way various email newsletters track how often people click on the links in them, I thought I’d implement the same for my website using Lambda@Edge.

I could of course include a client-side tracking library such as Google Analytics. However, many people have some sort of ad-blocker installed, so that solution would only work for visitors who don’t. Doing it server-side works around that.

Context

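In the end the solution looks more or less like this:

[Architecture diagram: CloudFront distribution with an S3 origin, an origin-request Lambda@Edge function, and a DynamoDB table]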

Explanation

To make this work I’ve implemented a /goto endpoint that is handled by the origin-request Lambda@Edge function. This function checks the given URL against the URLs registered in a DynamoDB table. If the URL is registered, the function returns a 302 response with the Location header set to the target URL. If it isn’t, the function passes the request on to the origin (S3), which results in a 404 from S3, for which CloudFront then returns a custom error page.

I set the Cache-Control header of the redirect to no-cache so that CloudFront does not cache it. If the response were cached, subsequent clicks would be served straight from the cache, the origin-request event would not trigger, and those clicks would not be counted.

The function looks like this:

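from urllib.parse import parse_qs


def handle_event(event, context):
    # CloudFront passes the request in the first record of the event
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    if uri == "/goto":
        # "querystring" is the raw query string without the leading "?".
        # parse_qs already percent-decodes the values, so the link must
        # not be unquoted a second time.
        params = {k: v[0] for k, v in parse_qs(request["querystring"]).items()}

        if "link" not in params:
            return request

        url = params["link"]

        # is_registered_url and store_click wrap DynamoDB calls (not shown)
        if not is_registered_url(url):
            # Forwarding the request to S3 will result in a 404
            return request

        store_click(url)

        response = {
            "status": 302,
            "statusDescription": "Found",
            "headers": {
                "location": [{
                    "key": "Location",
                    "value": url
                }],
                # Prevent CloudFront from caching the redirect, so every
                # click triggers the origin-request event
                "cache-control": [{
                    "key": "Cache-Control",
                    "value": "no-cache"
                }]
            }
        }

        return response

    # Any other path goes to the origin unchanged
    return request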
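One detail worth checking in a handler like this: parse_qs already percent-decodes query-string values, so passing its result through unquote decodes the link a second time. For most links that is harmless, but a target URL that itself contains an encoded character (the example URL below) gets changed. The sketch below uses a hypothetical extract_link helper to show the difference.

```python
from urllib.parse import parse_qs, quote_plus, unquote


def extract_link(querystring):
    """Return the decoded value of the "link" parameter, or None.

    Hypothetical helper for illustration: parse_qs does the
    percent-decoding, so no further unquoting is needed.
    """
    values = parse_qs(querystring).get("link")
    return values[0] if values else None


# A target URL whose own query string contains an encoded "&"
target = "https://example.com/search?q=a%26b"

# Encode it as a single query parameter (quote_plus also escapes "/", ":", "?" and "%")
querystring = "link=" + quote_plus(target)

decoded_once = extract_link(querystring)
decoded_twice = unquote(decoded_once)

print(decoded_once)   # https://example.com/search?q=a%26b
print(decoded_twice)  # https://example.com/search?q=a&b  (no longer the original URL)
```

Decoding once gives back exactly the URL that was encoded, which is also the form that would be looked up in DynamoDB.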

The is_registered_url(url) and store_click(url) functions perform get_item and put_item calls on the DynamoDB table.
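Since these helpers aren’t shown, here is a minimal sketch of what they could look like with boto3’s DynamoDB Table resource. The table layout is a hypothetical assumption: a partition key url and a sort key sk, where a registered URL has an item with sk set to "registration" and every click is written as its own item with a timestamped sk. Lambda@Edge functions don’t support environment variables and run in an AWS Region close to the CloudFront location serving the request, so the table name is hard-coded (URLS_TABLE_NAME is a placeholder) and the region is pinned to eu-west-1 to match the table ARN in the serverless.yaml below. The helpers keep the one-argument signature the handler uses, with an optional table argument so they can be tried without AWS.

```python
import time
import uuid
from functools import lru_cache

# Placeholders (assumptions): hard-coded because Lambda@Edge has no
# environment variables; the region matches the table ARN in serverless.yaml
URLS_TABLE_NAME = "urls-table-name"  # hypothetical table name
TABLE_REGION = "eu-west-1"


@lru_cache(maxsize=1)
def get_table():
    """Create the DynamoDB Table resource once per container."""
    import boto3  # imported lazily so the helpers can be tested without AWS

    # Pin the region: the function does not necessarily run where the table lives
    dynamodb = boto3.resource("dynamodb", region_name=TABLE_REGION)
    return dynamodb.Table(URLS_TABLE_NAME)


def is_registered_url(url, table=None):
    """Return True if the URL has a registration item (hypothetical schema)."""
    if table is None:
        table = get_table()
    result = table.get_item(Key={"url": url, "sk": "registration"})
    # get_item omits "Item" from the result when no item matches the key
    return "Item" in result


def store_click(url, table=None):
    """Write one item per click; the random suffix keeps sort keys unique."""
    if table is None:
        table = get_table()
    timestamp = int(time.time())
    table.put_item(
        Item={
            "url": url,
            "sk": f"click#{timestamp}#{uuid.uuid4().hex}",
            "clicked_at": timestamp,
        }
    )
```

Writing each click as a separate item only needs the GetItem and PutItem permissions granted in the configuration below; counting clicks later means querying the items whose sort key starts with "click#".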

In a previous post I mentioned that I provision my CloudFront distribution using the Serverless Framework, so having the above function triggered on origin-request events is as easy as adding the following to my serverless.yaml:

functions:
  ...
  originRequest:
    handler: src/functions/origin_request/handler.handle_event
    events:
      - cloudFront:
          eventType: origin-request
          origin: http://${self:custom.domainName}.s3-website-eu-west-1.amazonaws.com
    iamRoleStatements:
      - Effect: Allow
        Action:
          - dynamodb:GetItem
          - dynamodb:PutItem
        Resource: "arn:aws:dynamodb:eu-west-1:#{AWS::AccountId}:table/${self:custom.urlsTableName}"

Usage

To make it easy to link to external websites through this /goto endpoint from my blog posts, I’ve written a custom Liquid tag that transforms a given text and URL into a proper anchor tag.

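module Jekyll
  # Liquid tag used as {% ext_url link text|https://example.com %}; it renders
  # an anchor tag that sends the click through the /goto endpoint
  class ExtUrlTag < Liquid::Tag
    # Provides the url_encode filter
    include Liquid::StandardFilters

    def initialize(tag_name, input, tokens)
      super
      # Raw tag markup, i.e. "text|url"
      @input = input
    end

    def render(context)
      # Split "text|url" into its parts and trim surrounding whitespace
      input_split = split_params(@input)
      text = input_split[0].strip
      url = input_split[1].strip

      # Percent-encode the target so it is passed as a single query parameter
      url_encoded = url_encode(url)

      "<a href=\"https://jurriaan.cloud/goto?link=#{url_encoded}\" target=\"_blank\">#{text}</a>"
    end

    def split_params(params)
      params.split("|")
    end
  end
end

Liquid::Template.register_tag('ext_url', Jekyll::ExtUrlTag)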

Whenever I want to add a link to an external website, all I have to do is write

{% ext_url this|https://dev.to %}

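which the custom Liquid tag turns into the following anchor tag:

<a href="https://jurriaan.cloud/goto?link=https%3A%2F%2Fdev.to" target="_blank">this</a>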
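For illustration only, here is a rough Python equivalent of what the tag does, using a hypothetical ext_url function (it is not the code the site runs). It uses urllib.parse.quote_plus, which for typical URLs escapes them much like Liquid’s url_encode filter, and checks that the link survives the parse_qs round trip that the origin-request handler performs.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

GOTO_BASE = "https://jurriaan.cloud/goto"


def ext_url(markup):
    """Turn "text|url" markup into an anchor tag pointing at /goto.

    Rough Python analogue of the Liquid tag; raises ValueError if the
    markup does not contain exactly one "|".
    """
    text, url = (part.strip() for part in markup.split("|"))
    return f'<a href="{GOTO_BASE}?link={quote_plus(url)}" target="_blank">{text}</a>'


html = ext_url("this|https://dev.to")
print(html)
# <a href="https://jurriaan.cloud/goto?link=https%3A%2F%2Fdev.to" target="_blank">this</a>

# Decoding the query string the way the handler does recovers the target URL
href = html.split('"')[1]
link = parse_qs(urlsplit(href).query)["link"][0]
print(link)  # https://dev.to
```

Because the decoded link is compared against the registered URLs, the encoding on the page and the decoding at the edge have to agree; a single encode/decode pair keeps them in sync.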

To register the URLs I use, I’ve written a simple Python script that identifies all ext_url occurrences in my posts and adds these to DynamoDB.

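import os
import re

# Matches the tag markup, e.g. "{% ext_url this|https://dev.to %}"
REGEX_PATTERN = re.compile(r"\{% ext_url (.*?) %\}")


def main():
    # Resolved relative to the current working directory
    path = os.path.realpath("../website/src/_posts")

    for item in os.listdir(path):
        full_path_to_item = os.path.join(path, item)

        if not os.path.isfile(full_path_to_item):
            continue

        with open(full_path_to_item, encoding="utf-8") as file:
            file_contents = file.read()

        for match in REGEX_PATTERN.findall(file_contents):
            ext_url_split = match.split("|")

            # Skip malformed tags that are not exactly "text|url"
            if len(ext_url_split) != 2:
                continue

            # Strip whitespace like the Liquid tag does, so the registered
            # URL matches the one in the generated link
            url = ext_url_split[1].strip()

            # add_url (not shown) registers the URL with the post's
            # filename, without extension, as its identifier
            add_url(url, os.path.splitext(item)[0])


if __name__ == "__main__":
    main()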

The add_url function performs a put_item on the DynamoDB table.
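A minimal sketch of such an add_url, assuming a hypothetical table layout with partition key url and sort key sk, where a registered URL is stored with sk set to "registration" together with the post it appears in. URLS_TABLE_NAME is a placeholder, the region follows the table ARN in the serverless.yaml above, and the optional table argument exists only so the function can be exercised without AWS.

```python
from functools import lru_cache

URLS_TABLE_NAME = "urls-table-name"  # hypothetical table name


@lru_cache(maxsize=1)
def get_table():
    """Create the DynamoDB Table resource once."""
    import boto3  # lazy import so add_url can be exercised with a stand-in table

    return boto3.resource("dynamodb", region_name="eu-west-1").Table(URLS_TABLE_NAME)


def add_url(url, post, table=None):
    """Register a URL, recording the post (filename without extension) it appears in."""
    if table is None:
        table = get_table()
    # put_item replaces any existing item with the same key, so re-running
    # the script does not create duplicates
    table.put_item(Item={"url": url, "sk": "registration", "post": post})
```

Because put_item overwrites an item with the same primary key, a URL that appears in several posts ends up recorded against whichever post the script processes last; storing the post in the sort key instead would keep one item per post, at the cost of a different lookup in the handler.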

Trying it out

After registering the URL in the DynamoDB table and running curl a few times, I can see the entries coming in.
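To repeat the curl check from Python, a minimal sketch with the standard library’s http.client could look like this. goto_path and check_redirect are hypothetical helpers, and your-domain.example stands in for your own domain. http.client does not follow redirects, so, like curl without -L, it shows the 302 and its Location header directly.

```python
import http.client
from urllib.parse import quote_plus


def goto_path(target_url):
    """Build the /goto path for a target URL, encoded as a single query parameter."""
    return "/goto?link=" + quote_plus(target_url)


def check_redirect(host, target_url, timeout=10):
    """Request /goto once over HTTPS and return (status, Location header)."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", goto_path(target_url))
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling check_redirect("your-domain.example", "https://dev.to") a few times for a registered URL should return (302, "https://dev.to") each time and add one click per request to the table; an unregistered URL should come back as a 404 with Location set to None.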

Conclusion

Lambda@Edge offers a simple way to do this kind of ‘server-side’ tracking of clicks on links to external websites.

In the future I’m planning to extend this by building an API on top of the DynamoDB table so that I can show a list of URLs and the click count at the bottom of a post.
