Recently, I ran into a simple yet interesting problem (well, not that interesting) about streaming a file from one S3 location to another S3 location. You might ask why you can't just use the aws-sdk's copyObject method. Well, that method takes a source bucket/key and a destination bucket/key, which is fine, but that information is not always available, for example when the file is coming from an external vendor.
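For contrast, here is roughly what the copyObject path looks like when you do have both locations (the bucket and key names below are made up):

```ts
import { S3 } from 'aws-sdk';

const s3 = new S3();

// Only works when the source is an S3 object you can name and are allowed to read.
const copyWithinS3 = async (): Promise<void> => {
  await s3
    .copyObject({
      Bucket: 'dest-bucket', // hypothetical destination bucket
      Key: 'copied/file.pdf', // hypothetical destination key
      CopySource: 'source-bucket/original.pdf', // source expressed as "bucket/key"
    })
    .promise();
};
```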
Consider a scenario where all you have is the url of the file. It could be anything: an endpoint serving an HTML page, a pre-signed url to another S3 location (a sketch of minting one for testing follows the list below), etc. How do you copy that file? A couple of approaches could be:
- Manually download the file and then upload it to your S3 bucket.
- Have a UI to download the file and then upload it via some backend service call.
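As an aside, if you want a pre-signed url to test with, a minimal sketch using SDK v2 could look like this (the bucket and key are made up, and getSignedUrlPromise needs a reasonably recent v2 release):

```ts
import { S3 } from 'aws-sdk';

const s3 = new S3();

// Mint a time-limited GET url for an existing object; this is the kind of
// "just a url" input the lambda below will receive.
const makeTestUrl = (): Promise<string> =>
  s3.getSignedUrlPromise('getObject', {
    Bucket: 'source-bucket', // hypothetical source bucket
    Key: 'original.pdf', // hypothetical source key
    Expires: 900, // url stays valid for 15 minutes
  });
```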
How about we write a simple lambda function that takes your url and destination S3 location and copies the file directly? Let's see how we can achieve that using the available resources.
Prerequisites
To try this out, you will need an AWS account (or you can use localstack; see this if you need to know how to work with localstack). For this post, we will use a Node environment; I am using Node 12.
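All the snippets below share the same imports and event shape. The CopyFileEvent field names are my own choice (they are what the handlers below assume), and the localstack endpoint override is optional:

```ts
import axios, { AxiosResponse } from 'axios';
import { S3 } from 'aws-sdk';
import { PassThrough } from 'stream';

// Payload our lambda receives; these field names are what the handlers below use.
interface CopyFileEvent {
  fileUrl: string; // url of the file to copy (page url, pre-signed url, ...)
  fileName: string; // destination key in the target bucket
}

// If you are on localstack, point the SDK at its edge port instead of real AWS:
// const s3 = new S3({ endpoint: 'http://localhost:4566', s3ForcePathStyle: true });
```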
Wrong Solution 1
Before arriving at the correct solution, we will walk through a couple of wrong approaches. These are wrong mainly because of incorrect usage of async/await. The first attempt goes like this:
```ts
// Kick off the download and hand back the axios response, whose data is a stream.
const downloadFile = async (downloadUrl: string): Promise<AxiosResponse> =>
  axios.get(downloadUrl, {
    responseType: 'stream',
  });

const uploadFromStream = (fileResponse: AxiosResponse, fileName: string, bucket: string): PassThrough => {
  const s3 = new S3();
  const passThrough = new PassThrough();
  // Fire-and-forget upload: the callback fires whenever the upload finishes,
  // but nothing waits for it.
  s3.upload(
    { Bucket: bucket, Key: fileName, ContentType: fileResponse.headers['content-type'], Body: passThrough },
    (err: Error, data: S3.ManagedUpload.SendData) => {
      if (err) {
        console.log(err);
        throw err;
      }
    },
  );
  return passThrough;
};

export const handler = async (event: CopyFileEvent): Promise<string> => {
  const response = await downloadFile(event.fileUrl);
  response.data.pipe(uploadFromStream(response, event.fileName, 'test-bucket'));
  return event.fileName;
};
```
So, now when you run this function, the function will end successfully. But when you go to your S3 bucket and check, the file is not there.
So, what went wrong? We didn't await on s3.upload. The lambda function returned before the upload could finish. Let's add an await and see what happens.
Wrong Solution 2
```ts
const uploadFromStream = async (fileResponse: AxiosResponse, fileName: string, bucket: string): Promise<PassThrough> => {
  const s3 = new S3();
  const passThrough = new PassThrough();
  // This await blocks until the upload completes, but no data has been piped
  // into passThrough yet, so the upload can never complete.
  await s3
    .upload({ Bucket: bucket, Key: fileName, ContentType: fileResponse.headers['content-type'], Body: passThrough })
    .promise();
  return passThrough;
};

export const handler = async (event: CopyFileEvent): Promise<string> => {
  const response = await downloadFile(event.fileUrl);
  const stream = await uploadFromStream(response, event.fileName, 'test-bucket');
  response.data.pipe(stream); // never reached: the line above never resolves
  return event.fileName;
};
```
So, now when you run this function, the file is still not in the bucket.
So, what went wrong, again? This time we had the await, but we awaited the upload before any data was piped into the stream: the upload sits waiting for data that never arrives (the response.data.pipe line is never reached), and the function eventually times out. So, how do we solve it? Thanks to @elthrasher for figuring out the missing pieces. Let's see what they are.
Correct Solution
```ts
const uploadFromStream = (
  fileResponse: AxiosResponse,
  fileName: string,
  bucket: string,
): { passThrough: PassThrough; promise: Promise<S3.ManagedUpload.SendData> } => {
  const s3 = new S3();
  const passThrough = new PassThrough();
  // Start the upload, but hand both the stream and the upload promise back to
  // the caller instead of awaiting here.
  const promise = s3
    .upload({
      Bucket: bucket,
      Key: fileName,
      ContentType: fileResponse.headers['content-type'],
      ContentLength: Number(fileResponse.headers['content-length']), // headers are strings
      Body: passThrough,
    })
    .promise();
  return { passThrough, promise };
};

export const handler = async (event: CopyFileEvent): Promise<string> => {
  const responseStream = await downloadFile(event.fileUrl);
  const { passThrough, promise } = uploadFromStream(responseStream, event.fileName, 'test-bucket');
  // Pipe first, then wait: the upload resolves once the piped data has been
  // fully written to S3.
  responseStream.data.pipe(passThrough);
  const result = await promise;
  return result.Location;
};
```
So, now when you run this function, it ends successfully, and when you go to your S3 bucket and check, the file is there.
So, what did we change? This time we returned both the upload promise and the pass-through stream from our function, piped the response stream into the pass-through, and then awaited the upload promise in the handler, so the lambda function waits for the upload to either succeed or fail before quitting.
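To sanity-check the whole thing, you can invoke the handler directly with a sample event (the url and key below are placeholders; point them at something real):

```ts
// Hypothetical smoke test; run it with ts-node against a bucket you own.
handler({
  fileUrl: 'https://example.com/sample.pdf', // placeholder source url
  fileName: 'sample.pdf', // destination key inside test-bucket
})
  .then((location) => console.log(`Copied to ${location}`))
  .catch((err) => console.error('Copy failed', err));
```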
Conclusion
This looks simple, but as you saw, it was a bit tricky: async/await is great but can become hairy and a dev's nightmare real soon. Anyway, correct usage of it helped us solve the problem at hand.
Full code can be found here.