Export DynamoDB to S3 and query with Athena using SQL, unlocking powerful, scalable, and serverless data analytics
This is the first part of a two-part series covering how to copy data from DynamoDB to S3 and query it with Athena.
- Part 1 (this article): Query DynamoDB with SQL using Athena - Leveraging DynamoDB Exports to S3
- Part 2: Query DynamoDB with SQL using Athena - Leveraging EventBridge Pipes and Firehose
We will build two solutions using AWS CDK and TypeScript, covering both single-table and multiple-table designs.
Unlocking New Possibilities
The DynamoDB is a NoSQL database, and that fact comes with both the good and the bad. Among others, the good is that it is ultimately scalable; the bad is that it has limited query capabilities. You can squeeze out some additional query capabilities with a single-table approach, but it is still not comparable to any SQL database. However, for reporting, the SQL query language is extremely useful or almost essential.
One option is to copy data to an SQL database and execute queries there. However, if the database is large, this can be impractical or too expensive. A more scalable solution is to transfer data to S3 and query it with Athena using SQL, which we will explore in this article.
This approach is ideal for serverless solutions where the amount of data or cost makes an SQL database impractical. Athena is a powerful alternative and can be very cost-effective. At $5 per terabyte of queried data, it is significantly cheaper than provisioning an SQL instance capable of handling that volume. However, if you're running many queries, Athena may become expensive. Athena is best for creating offline reports. It's usually too slow for direct synchronous responses to users.
There are several ways to copy data from DynamoDB to S3, but the most useful and serverless-friendly are:
- Using native full or incremental exports from DynamoDB to S3
- Using EventBridge Pipes and Firehose for a near real-time copy of the data
This article focuses on the first approach. The second approach will be covered in Part 2. We will look at both solutions in the context of single-table and multiple-table designs.
Top comments (0)