We should change how we process getObject requests in our Node.js AWS applications. Readable stream techniques come in handy when we handle the S3 response.
1. The classic problem
Say we are facing the classic problem: we have a Lambda function that programmatically receives objects from S3 with the AWS SDK in Node.js.
The application uses the getObject method to receive the object from the bucket.
2. Changes
But when we upgrade to version 3 of the SDK (or write a new application with that version), we will experience some changes in the method signature.
Version 3 is modular, so we only have to install what we need in the application. This reduces the package size and improves deployment time, so everything sounds good.
We should only install the @aws-sdk/client-s3 module instead of the whole aws-sdk package. The module contains the getObject method that helps us receive objects from the bucket.
The S3 constructor is still available in the module, so there is nothing new up to this point.
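A minimal sketch of the modular setup; the region value here is only an example:
// After installing the modular package: npm install @aws-sdk/client-s3
import { S3 } from '@aws-sdk/client-s3'

// The constructor accepts the same kind of configuration as in v2
const s3 = new S3({ region: 'us-east-1' })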
2.1. No promise() method
The first change is that the getObject method now returns a Promise.
In version 2, the getObject method returned an object, and we had to call its promise() method, which resolves to the S3 response. Because we always want to use the async/await syntax instead of callbacks, the promise() method has been part of our development life.
The good news is that AWS has simplified the signature in version 3: the getObject method already returns a Promise. Therefore, we don't have to call the promise() method; we can simply await the call to get the resolved value.
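Side by side, the difference looks roughly like this. It's a sketch: s3v2 and s3v3 stand for clients created with the respective SDK versions, and params is the usual object with Bucket and Key, used inside an async function:
// SDK v2 (aws-sdk): getObject returns a Request, promise() resolves it
const v2Response = await s3v2.getObject(params).promise()

// SDK v3 (@aws-sdk/client-s3): getObject itself returns a Promise
const v3Response = await s3v3.getObject(params)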
2.2. Readable streams instead of Buffer
The promise the S3 getObject method returns resolves to an object that extends the GetObjectOutput type. This object has the same properties as in SDK v2 but contains a breaking change.
In version 3, the Body property of the resolved S3 response object is a readable stream instead of a Buffer. The modification implies that we should change how the application handles the object.
3. Some JavaScript code
Readable streams implement the Symbol.asyncIterator method, so they are also async iterables. This means we can use the for await...of construct to iterate over the readable stream and get the chunks it provides.
In the following example, we will return the object we have downloaded from S3. The code that handles the getObject request can look like this:
async function getObject(params) {
  // In SDK v3, Body is a readable stream instead of a Buffer
  const s3ResponseStream = (await s3.getObject(params)).Body
  const chunks = []
  // Collect the Buffer chunks as the stream provides them
  for await (const chunk of s3ResponseStream) {
    chunks.push(chunk)
  }
  // Concatenate the chunks and parse the JSON content of the object
  const responseBuffer = Buffer.concat(chunks)
  return JSON.parse(responseBuffer.toString())
}
Each chunk is a Buffer. After we have received the last chunk of the S3 object, we can concatenate the chunks, convert them to a string, and finally parse the string into a JavaScript object.
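If we are on Node.js 16.7 or newer, the built-in stream consumers can replace the manual chunk collection. This is an alternative sketch, not part of the original example; the function name is made up, and s3 is the same client as above:
import { buffer } from 'node:stream/consumers'

// buffer() drains the readable stream and resolves to a single Buffer
async function getObjectWithConsumers(params) {
  const responseBuffer = await buffer((await s3.getObject(params)).Body)
  return JSON.parse(responseBuffer.toString())
}
The handler below works with either variant.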
The Lambda handler can look like this:
import { S3 } from '@aws-sdk/client-s3'

const s3 = new S3({ region: 'us-east-1' })

export async function handler(event) {
  try {
    const s3Object = await getObject({
      Bucket: 'NAME OF THE BUCKET',
      Key: 'NAME OF THE OBJECT TO FETCH',
    })
    return s3Object
  } catch (error) {
    console.error('Error while downloading object from S3', error.message)
    throw error
  }
}
We can wrap the stream handling logic in a function called getObject and use it in a try/catch block, as we usually do in the Lambda handler.
Please note that we still store the whole S3 object in memory in the above example. The real benefit of streams is that we can process the chunks as they arrive. Use cases like transforming the data, saving it to a database, or returning the response as a stream are not part of this post, and I might cover them another time.
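As a small taste of that streaming style, here is a sketch that processes each chunk as it arrives without keeping the whole object in memory. It only counts bytes, and the function name is made up for illustration:
async function measureObjectSize(params) {
  const s3ResponseStream = (await s3.getObject(params)).Body
  let totalBytes = 0
  // Each chunk is handled immediately and then discarded
  for await (const chunk of s3ResponseStream) {
    totalBytes += chunk.length
  }
  return totalBytes
}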
4. Summary
The getObject method's signature has changed in SDK version 3. The Body property of the response is now a readable stream instead of a Buffer.
We can use the core Node.js stream logic to handle the return value in our Lambda functions.