DEV Community

Seth M


Real-time S3 File zipping With Lambda and WebSockets in NodeJS

It has always amazed me that AWS hasn't offered a bulk download option in the S3 console. For the systems I'm working on, downloading a set of files at once is a common activity.

Our original design for bulk downloading was an EC2 machine with the AWS CLI, RabbitMQ, and a small Node.js service installed. The Node service consumed messages from RabbitMQ, used the AWS CLI to retrieve the files from S3, zipped them, and stored the resulting .zip file in an S3 location. This solution worked OK, but what if there were a way to do it more reliably and more cost-effectively?

AWS Lambda and API Gateway to the rescue

We settled on the following strategy:

  • AWS Lambda would be responsible for zipping
  • AWS API Gateway, using WebSockets, would trigger and manage the zipping process
  • Once the .zip file was created, the Lambda function would generate a pre-signed URL for it and pass that URL back to the WebSocket client

There are some terrific articles and examples on this topic, but we ended up using the example from s3-zip as our baseline.

To wrap the zip Lambda function in a WebSocket API, we used the AWS simple-websockets-chat-app as a blueprint.

Finally, to make deployment easier, we created SAM YAML templates for deploying our code with AWS CloudFormation.

S3 Zip Lambda Function

The payload to the Lambda function is the set of files to zip, plus the destination instructions for the resulting .zip file:

Source code

const payload = {
  region: 'us-east-1',
  bucket: 'demobucket', // all assets in the same bucket
  folder: 'audio/', // must have a trailing slash
  files: [
    'file1.mp3', // files in the above folder
    'file2.mp3'
  ],
  zipBucket: 'demobucket', // bucket for the resulting .zip file
  zipFolder: 'temp/', // must have a trailing slash
  zipFileName: 'demo1.zip',
  signedUrlExpireSeconds: 60 * 60 * 10 // expiration (10 hours) of the signed URL for the .zip object
}
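The trailing-slash rules matter because keys are built by plain string concatenation. The handler below computes the archive key as zipFolder + zipFileName, and the source keys are presumably joined from folder and file in the same way. Below is a minimal sketch of checking a payload before sending it, assuming the field names shown above. validatePayload is a hypothetical helper, not part of the demo repo:

```javascript
// Hypothetical helper: check a zip payload before invoking the Lambda.
// Returns a list of problems; an empty list means the payload looks usable.
function validatePayload (payload) {
  const problems = []
  for (const field of ['region', 'bucket', 'zipBucket', 'zipFileName']) {
    if (typeof payload[field] !== 'string' || payload[field] === '') {
      problems.push(field + ' is required')
    }
  }
  // keys are built by concatenation (e.g. zipFolder + zipFileName),
  // so both folders need a trailing slash
  for (const field of ['folder', 'zipFolder']) {
    if (typeof payload[field] !== 'string' || !payload[field].endsWith('/')) {
      problems.push(field + ' must end with "/"')
    }
  }
  if (!Array.isArray(payload.files) || payload.files.length === 0) {
    problems.push('files must be a non-empty array')
  }
  return problems
}
```

For the payload above, validatePayload({ ...payload, folder: 'audio' }) would return ['folder must end with "/"']. Note that this strict check also rejects files at the bucket root, which matches the "must have trailing slash" rule as written.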

API Gateway WebSocket

I've set up a simple API Gateway WebSocket API using CloudFormation:

  DemoDevS3ZipWebSocket:
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: DemoDevS3ZipWebSocket
      ProtocolType: WEBSOCKET
      RouteSelectionExpression: "$request.body.action"

  Deployment:
    Type: AWS::ApiGatewayV2::Deployment
    DependsOn:
    - ZipRoute
    Properties:
      ApiId: !Ref DemoDevS3ZipWebSocket

  Stage:
    Type: AWS::ApiGatewayV2::Stage
    Properties:
      StageName: demo
      Description: Demo Deployment
      DeploymentId: !Ref Deployment
      ApiId: !Ref DemoDevS3ZipWebSocket

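The important thing to note here is the RouteSelectionExpression value, "$request.body.action". API Gateway reads the "action" property of each incoming JSON message and uses its value to choose a route, so a message whose action is "onzip" triggers our "onzip" route.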
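To make the routing concrete, here is a simplified model of how a selection expression of "$request.body.action" picks a route. It parses the message as JSON, reads action, and uses the matching route key if one exists. Otherwise it falls back to $default if the API defines that route. This illustrates the documented behavior and is not API Gateway's actual code. selectRoute and the route set are hypothetical, and the 'onzip' key assumes the route key matches the name used in this article:

```javascript
// Simplified model (not AWS code) of route selection for
// RouteSelectionExpression "$request.body.action".
function selectRoute (rawMessage, routeKeys) {
  let action
  try {
    action = JSON.parse(rawMessage).action
  } catch (err) {
    action = undefined // non-JSON messages cannot be matched on body.action
  }
  if (typeof action === 'string' && routeKeys.has(action)) return action
  // unmatched messages go to $default, if the API defines one
  return routeKeys.has('$default') ? '$default' : null
}

// A message a client could send to reach the zip route; params carries the payload.
const message = JSON.stringify({
  action: 'onzip',
  params: { bucket: 'demobucket', folder: 'audio/', files: ['file1.mp3'] }
})

console.log(selectRoute(message, new Set(['onzip']))) // onzip
```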

To do the actual zipping, we use s3-zip.

When the 'onzip' route is invoked, the params object is parsed out of the JSON string in event.body, and its fields are unpacked for the steps that follow:

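const params = JSON.parse(event.body).params
// unpack the payload fields used in the rest of the handler
const { region, bucket, folder, files, zipBucket, zipFolder, zipFileName, signedUrlExpireSeconds } = params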
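JSON.parse throws on a malformed message, and a message without params would make the destructuring fail, so the handler may want to guard this step. Below is a minimal sketch, assuming the { action, params } message shape used in this article. parseZipParams is a hypothetical helper:

```javascript
// Hypothetical helper: extract the zip parameters from a WebSocket event.
// Returns { params } on success, or { error } with a client-friendly message.
function parseZipParams (event) {
  let body
  try {
    body = JSON.parse(event.body)
  } catch (err) {
    return { error: 'Message body is not valid JSON' }
  }
  if (!body || typeof body.params !== 'object' || body.params === null) {
    return { error: 'Message body has no params object' }
  }
  return { params: body.params }
}
```

In the handler, an { error } result could be returned with the same statusCode/message shape as the "No files to zip" response.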

Then a quick validation makes sure there are actually files to zip:

  // reject a missing or empty file list (files.length alone throws if files is undefined)
  if (!Array.isArray(files) || files.length === 0) {
    return {
      statusCode: 500,
      body: JSON.stringify({
        statusCode: 'error',
        message: 'No files to zip'
      })
    }
  }

To communicate back through the WebSocket, we use the ApiGatewayManagementApi client from the AWS SDK. Its constructor requires the deployed WebSocket API's address (domain name plus stage) as the endpoint:

// at module scope: AWS SDK for JavaScript v2
// (bundle it yourself on Node.js 18+ Lambda runtimes, which ship only SDK v3)
const AWS = require('aws-sdk')

// inside the handler:
  const stage = event.requestContext.stage
  const domainName = event.requestContext.domainName

  // Allows the Lambda function to communicate with the WebSocket client
  const api = new AWS.ApiGatewayManagementApi({
    endpoint: 'https://' + domainName + '/' + stage
  })

  // event.requestContext.connectionId contains the WebSocket connection id
  const apiParams = {
    ConnectionId: event.requestContext.connectionId,
    Data: null
  }

Once the ApiGatewayManagementApi object is available, we can communicate back through the WebSocket by calling:

apiParams.Data = 'some msg' // or JSON.stringify(obj) to send an object
await api.postToConnection(apiParams).promise()

Each call triggers an 'onmessage' event on the client's WebSocket.
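Every progress update and the final result go through postToConnection, so a small wrapper can keep those calls consistent. This sketch builds a fresh params object per message, so overlapping posts never share a Data field. It also treats HTTP 410 (GoneException, meaning the client has already disconnected) as non-fatal; the simple-websockets-chat-app blueprint uses the same statusCode === 410 check. sendToClient is a hypothetical name, and api is assumed to be an SDK v2 ApiGatewayManagementApi instance like the one created above:

```javascript
// Hypothetical wrapper around ApiGatewayManagementApi.postToConnection (SDK v2).
// Resolves to true if the message was delivered, false if the client is gone.
async function sendToClient (api, connectionId, message) {
  const params = {
    ConnectionId: connectionId,
    // strings are sent as-is; anything else is serialized as JSON
    Data: typeof message === 'string' ? message : JSON.stringify(message)
  }
  try {
    await api.postToConnection(params).promise()
    return true
  } catch (err) {
    if (err.statusCode === 410) return false // stale connection: client disconnected
    throw err
  }
}
```

Returning false instead of throwing lets the zip finish even if the browser tab was closed mid-upload. Any other error still propagates to the handler's catch block.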

Next, the main try/catch block uses s3-zip's archive method together with AWS.S3.upload to stream the content into a zip file. Because archive streams the source objects straight from S3 into the upload, the process never has to download or store the target files in a temporary location.

  // s3Zip = require('s3-zip') and AWS = require('aws-sdk') are loaded at module scope
  try {
    // stream the source objects from S3 straight into a zip archive
    const body = s3Zip.archive({ region: region, bucket: bucket }, folder, files)
    const zipKey = zipFolder + zipFileName
    const zipParams = { params: { Bucket: zipBucket, Key: zipKey } }
    const zipFile = new AWS.S3(zipParams)
    // client used below to presign the finished .zip file
    const s3 = new AWS.S3({ region: region })

    const promise = new Promise((resolve, reject) => {
      zipFile.upload({ Body: body })
        .on('httpUploadProgress', function (evt) {
          evt.statusCode = 'progress'
          // totalBytes: combined size of the source files, computed earlier (not shown)
          evt.pctComplete = Math.min(100, 100 * evt.loaded / totalBytes)
          // report pctComplete to the WebSocket client; a fresh params object keeps
          // overlapping posts from overwriting each other's Data, and .catch keeps a
          // failed progress post from becoming an unhandled rejection
          api.postToConnection({ ConnectionId: apiParams.ConnectionId, Data: JSON.stringify(evt) })
            .promise()
            .catch(console.error)
        })
        .send(function (e, r) {
          if (e) {
            e.statusCode = 'error'
            reject(e)
          } else {
            r.statusCode = 'success'
            r.Files = files
            r.Folder = folder
            r.SignedUrl = s3.getSignedUrl('getObject', {
              Bucket: zipBucket,
              Key: r.Key,
              Expires: signedUrlExpireSeconds
            })
            resolve(r)
          }
        })
    })

    // wait for the s3-zip upload to complete
    const res = await promise

    // push the zip results back through the WebSocket
    apiParams.Data = JSON.stringify(res)
    await api.postToConnection(apiParams).promise()

    return {
      statusCode: 200,
      body: JSON.stringify(res)
    }
  } catch (e) {
    // plain Error objects don't serialize their message, so copy the useful fields
    const err = { statusCode: 'error', message: e.message, code: e.code }
    // send the error back to the WebSocket client
    apiParams.Data = JSON.stringify(err)
    await api.postToConnection(apiParams).promise()
    return {
      statusCode: 500,
      body: JSON.stringify(err)
    }
  }
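The excerpt divides evt.loaded by totalBytes but does not show where that value comes from. One plausible approach is to sum the ContentLength that S3 reports for each source object via headObject. This is an assumption, not necessarily what the demo repo does. getTotalBytes is hypothetical, and s3 is assumed to be an SDK v2 S3 client:

```javascript
// Hypothetical helper: sum the sizes of the source objects so upload progress
// can be expressed as a percentage. `s3` is assumed to be an AWS SDK v2 S3 client
// (anything exposing headObject(params).promise() resolving to { ContentLength }).
async function getTotalBytes (s3, bucket, folder, files) {
  const heads = await Promise.all(
    files.map((file) =>
      s3.headObject({ Bucket: bucket, Key: folder + file }).promise()
    )
  )
  return heads.reduce((sum, head) => sum + head.ContentLength, 0)
}
```

This costs one extra HEAD request per file before zipping starts. The archive size only approximates this sum, which is why the percentage in the handler is capped at 100.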

As the upload to S3 proceeds, the upload's 'httpUploadProgress' event fires repeatedly, each time reporting the number of bytes uploaded so far in evt.loaded. We push that event object back through the WebSocket with api.postToConnection(). This lets our web frontend show the user a real-time progress display of the zipping process.
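During a large upload, httpUploadProgress can fire many times, and each postToConnection call is a separate message sent through API Gateway. Because the frontend rounds the percentage anyway, one option is to post only when the whole-number percentage changes. Below is a hedged sketch; createProgressReporter is hypothetical, and whole-percent granularity is an arbitrary choice:

```javascript
// Hypothetical throttle for upload progress messages. The returned function
// takes the byte count from each httpUploadProgress event and returns a whole
// percentage only when it has advanced since the last report, otherwise null.
function createProgressReporter (totalBytes) {
  let lastReported = -1
  return function report (loaded) {
    // cap at 100 in case the archive ends up larger than the sum of its inputs
    const pct = Math.min(100, Math.floor((100 * loaded) / totalBytes))
    if (!(pct > lastReported)) return null // also rejects NaN
    lastReported = pct
    return pct
  }
}

// Possible use inside the handler (posting code omitted):
//   const report = createProgressReporter(totalBytes)
//   upload.on('httpUploadProgress', (evt) => {
//     const pct = report(evt.loaded)
//     if (pct !== null) { /* post { statusCode: 'progress', pctComplete: pct } */ }
//   })
```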

So far, we haven't seen any issues, including in tests with 5GB worth of target files.

When the zip file is complete, the callback passed to send() is invoked. In that callback, we make a quick call to getSignedUrl():

r.SignedUrl = s3.getSignedUrl('getObject', {
  Bucket: zipBucket,
  Key: r.Key,
  Expires: signedUrlExpireSeconds
})
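The expiration comes straight from the client's signedUrlExpireSeconds (10 hours in the sample payload). Signature Version 4 presigned URLs are limited to seven days (604,800 seconds). Also, a URL signed with temporary credentials, such as a Lambda execution role's, stops working when those credentials expire, even if Expires is longer. Below is a minimal sketch of sanitizing the value before calling getSignedUrl. clampExpireSeconds and its one-hour fallback are assumptions, not part of the demo:

```javascript
// Signature Version 4 presigned URLs can be valid for at most 7 days.
const MAX_PRESIGN_SECONDS = 7 * 24 * 60 * 60 // 604800

// Hypothetical helper: turn a client-supplied expiry into a safe Expires value,
// falling back to an assumed default of one hour for missing or invalid input.
function clampExpireSeconds (seconds, fallbackSeconds = 60 * 60) {
  if (!Number.isFinite(seconds) || seconds <= 0) return fallbackSeconds
  return Math.min(Math.floor(seconds), MAX_PRESIGN_SECONDS)
}

console.log(clampExpireSeconds(60 * 60 * 10)) // 36000
console.log(clampExpireSeconds(30 * 24 * 60 * 60)) // 604800
```

The result would then be passed as Expires: clampExpireSeconds(signedUrlExpireSeconds).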

On a successful zip, the signed URL is generated and the Lambda function sends a message through the WebSocket. On the calling side, a simple JSON parse extracts the SignedUrl. Pointing the browser at it starts the download, just as if the user had clicked a link to the .zip file.

The JavaScript on the calling .html page looks something like this:

socket.onmessage = function (event) {
  if (event.data) {
    var data = JSON.parse(event.data)

    // zip progress
    if (data.statusCode === 'progress') {
      var pct = Math.round(data.pctComplete)
      progressElement.text('Processing ... ' + pct + '%')
    }

    // zip completed
    if (data.statusCode === 'success') {
      if (Object.prototype.hasOwnProperty.call(data, 'SignedUrl')) {
        window.location.assign(data.SignedUrl)
      } else {
        console.error('API did not return SignedUrl')
      }
      socket.close()
    }
  }
}
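The handler above only reacts to 'progress' and 'success', but the Lambda function's catch block also sends messages whose statusCode is 'error'. One way to cover all three cases, and to test the logic outside the browser, is to move the parsing into a pure function. describeMessage is a hypothetical name, and the returned shapes are an assumption:

```javascript
// Hypothetical pure helper: turn a raw WebSocket message from the zip Lambda
// into a small description the page can act on.
function describeMessage (raw) {
  let data
  try {
    data = JSON.parse(raw)
  } catch (err) {
    return { kind: 'unknown', raw: raw }
  }
  if (data === null || typeof data !== 'object') return { kind: 'unknown', raw: raw }
  switch (data.statusCode) {
    case 'progress':
      return { kind: 'progress', pct: Math.round(data.pctComplete) }
    case 'success':
      return typeof data.SignedUrl === 'string'
        ? { kind: 'download', url: data.SignedUrl }
        : { kind: 'error', message: 'API did not return SignedUrl' }
    case 'error':
      return { kind: 'error', message: data.message || 'Zip failed' }
    default:
      return { kind: 'unknown', raw: raw }
  }
}

// In the page: 'progress' updates the text, 'download' calls
// window.location.assign(url), and 'error' shows the message and closes the socket.
```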

So that's it. This satisfied our requirement for a Lambda-based zip function, written in Node.js, with a progress meter.

Have a look at the README in the source. It gives instructions for deploying both the API Gateway WebSocket API and the Lambda function to AWS from the command line using SAM and CloudFormation.

There is also a test script that connects to the WebSocket and confirms that the SignedUrl for the zipped assets is returned correctly.

Plus, there is a simple web page with the JavaScript showing how to connect to the WebSocket and invoke the 'onzip' route.

Thank you for reading!

The source code is available on GitHub.

https://github.com/openstepmedia/aws-s3zip-lambda-demo
