One of the reasons AWS CDK has me so intrigued is the promise of being able to spin up environments in minutes. If I can provision all my infrastructure, databases and applications with a single structure that is source controlled, I can do all kinds of things most engineering teams have only dreamed of:
- Run N test environments to avoid logjams/branch conflicts.
- Team or individual developer sandboxes spun up (and down) in minutes.
- Isolated environments for CI/CD and test automation strategies.
- Staging/demo/eval/load test environments on demand and discarded after use.
- Customer isolation into separate accounts or VPCs.
Managing data can be tricky when trying to pull off something like this, so I really wanted to find out whether I could use CDK to load the database I've just provisioned. A fresh developer account with all infrastructure and apps provisioned but NO DATA AT ALL is probably not going to deliver the smooth experience I'm striving for. So how can CDK help me meet this goal?
Table of Contents
- CDK and Tools Review
- tl;dr
- DynamoDB
- Create a Table
- AWS Custom Resource
- Fake Friends via Faker
- Call the API
- Make it Go Faster!
- And Faster!
- Unlimited Data!
- Next Steps
CDK and Tools Review
I explained my thoughts on how to set up CDK projects in my last article. If you want to know why I've changed some of the project setup or my ideas about how linting should be done, it's all there.
tl;dr
Skip the article and check out the code, if you prefer.
DynamoDB
DynamoDB is the managed NoSQL solution from AWS. I'm not going to do a deep dive into DynamoDB here. I chose it for this example because it's serverless and fully managed, which makes it cheap to play around with and fast to provision. I haven't tried it yet, but I'm confident similar techniques could be applied to RDS.
Create a Table
There's no need to create schemas or define columns with DynamoDB. I only need to create a Table and specify its partition key attribute.
Naturally this is simple to do in CDK.
import { AttributeType, Table } from '@aws-cdk/aws-dynamodb';
import { Construct, RemovalPolicy, Stack, StackProps } from '@aws-cdk/core';

export class CdkDynamoCustomLoaderStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const tableName = 'friends';

    new Table(this, 'FriendsTable', {
      tableName,
      partitionKey: { name: 'id', type: AttributeType.STRING },
      // Delete the table (and its data) when the stack is destroyed.
      removalPolicy: RemovalPolicy.DESTROY,
    });
  }
}
I'm creating a table called friends. Since the life of a developer is lonely, I will use an AWS Custom Resource to generate some friends.
AWS Custom Resource
It's a bit daunting to think that I'm just learning CDK and already want to start creating custom resources, but they're actually simple and straightforward to use. CDK supports two strategies: the Provider Framework and Custom Resources for AWS APIs.
The Provider Framework lets me write my own Lambda handler for resource lifecycle events, while Custom Resources for AWS APIs let me call AWS APIs during my deployment. The latter is the simpler option, so it's what I'll use in this article.
Fake Friends via Faker
I like using Faker to generate fake data. It has a lot of great options and is almost always good for a laugh. My plan is to use an AWS API to insert a fake friend record into the database I've just provisioned, so I need a way to generate that data. To keep things simple, I'll add a private method to my stack that knows how to do this.
import { commerce, name, random } from 'faker';

// now inside my stack class
private generateItem = () => {
  return {
    id: { S: random.uuid() },
    firstName: { S: name.firstName() },
    lastName: { S: name.lastName() },
    // The DynamoDB API expects number (N) values to be sent as strings.
    shoeSize: { N: random.number({ max: 25, min: 1, precision: 0.1 }).toString() },
    favoriteColor: { S: commerce.color() },
  };
};
Each attribute specifies its type, in this case S for string and N for number. Even number values are sent as strings, which is why shoeSize is converted with toString(). If I were using MySQL instead of DynamoDB, this would probably be a SQL string.
My linter doesn't like the fact that the above method doesn't specify a return type, and I like the idea of defining my data types, so I'm going to create an interface.
interface IFriend {
  id: { S: string };
  firstName: { S: string };
  lastName: { S: string };
  shoeSize: { N: string };
  favoriteColor: { S: string };
}
Note that the official TypeScript style guide says not to prefix your interface, but my linting rule expects it. I'm just not going to get into it right now.
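Writing the S and N wrappers by hand works for five attributes but gets tedious quickly. As a rough sketch, a small helper can convert a plain object into DynamoDB's typed attribute format. toAttributeMap is a hypothetical name, not part of CDK or Faker, and it only handles strings, numbers and booleans. For real projects, AWS.DynamoDB.Converter.marshall in the v2 JavaScript SDK or marshall from @aws-sdk/util-dynamodb in v3 cover the full set of types.

```typescript
// Hypothetical helper: converts a flat object of strings, numbers and booleans
// into DynamoDB's low-level attribute value format.
type AttributeValue = { S: string } | { N: string } | { BOOL: boolean };

function toAttributeMap(
  item: Record<string, string | number | boolean>,
): Record<string, AttributeValue> {
  const result: Record<string, AttributeValue> = {};
  for (const key of Object.keys(item)) {
    const value = item[key];
    if (typeof value === 'string') {
      result[key] = { S: value };
    } else if (typeof value === 'number') {
      // DynamoDB transmits numbers as strings, so convert explicitly.
      result[key] = { N: String(value) };
    } else {
      result[key] = { BOOL: value };
    }
  }
  return result;
}

const friend = toAttributeMap({
  id: 'f3b1c9e2',
  firstName: 'Ada',
  shoeSize: 9.5,
  likesCdk: true,
});
console.log(JSON.stringify(friend));
// {"id":{"S":"f3b1c9e2"},"firstName":{"S":"Ada"},"shoeSize":{"N":"9.5"},"likesCdk":{"BOOL":true}}
```

Converting numbers with String() mirrors what the DynamoDB API expects: N values travel as strings.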
Call the API
I'll use the AwsCustomResource construct to call the DynamoDB API. CDK will create a Lambda function that uses the AWS SDK for JavaScript to make the call.
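import {
  AwsCustomResource,
  AwsCustomResourcePolicy,
  PhysicalResourceId,
} from '@aws-cdk/custom-resources';

// inside constructor
new AwsCustomResource(this, 'initDBResource', {
  onCreate: {
    service: 'DynamoDB',
    action: 'putItem',
    parameters: {
      TableName: tableName,
      Item: this.generateItem(),
    },
    // Newer CDK versions expect a PhysicalResourceId object rather than a plain string.
    physicalResourceId: PhysicalResourceId.of('initDBData'),
  },
  // Newer CDK versions also need an IAM policy for the Lambda. ANY_RESOURCE is broad;
  // scoping it to the table's ARN would be tighter.
  policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: AwsCustomResourcePolicy.ANY_RESOURCE }),
});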
This code creates a Lambda function that invokes the AWS SDK for JavaScript. It calls putItem on the SDK's DynamoDB client and passes it my parameters. I can explore this API in the SDK docs, but unfortunately not in the CDK types, since they aren't narrow enough. Maybe some day.
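Conceptually, the Lambda function behind AwsCustomResource looks up the SDK client named by service and calls the method named by action with parameters. The following is a simplified, hypothetical model of that dispatch, not CDK's actual handler code; invokeSdkCall and the fake client are invented for illustration, and the real SDK call is asynchronous.

```typescript
// A simplified model of the SDK call described by AwsCustomResource props.
interface SdkCall {
  service: string;
  action: string;
  parameters: Record<string, unknown>;
}

type SdkMethod = (params: Record<string, unknown>) => unknown;
type SdkClient = Record<string, SdkMethod>;

// Hypothetical dispatcher: picks the client by service name, then the method by action name.
function invokeSdkCall(clients: Record<string, SdkClient>, call: SdkCall): unknown {
  const client = clients[call.service];
  if (!client) {
    throw new Error(`Unknown service: ${call.service}`);
  }
  const method = client[call.action];
  if (!method) {
    throw new Error(`Unknown action: ${call.service}.${call.action}`);
  }
  return method(call.parameters);
}

// Fake DynamoDB client that records putItem calls instead of calling AWS.
const written: unknown[] = [];
const fakeClients: Record<string, SdkClient> = {
  DynamoDB: {
    putItem: (params) => {
      written.push(params.Item);
      return {};
    },
  },
};

invokeSdkCall(fakeClients, {
  service: 'DynamoDB',
  action: 'putItem',
  parameters: { TableName: 'friends', Item: { id: { S: '1' } } },
});
console.log(written.length); // 1
```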
Note that this creates a resource with the given ID and makes this API call when the resource is created. There are onUpdate and onDelete calls available too.
With the above code, I can npm run build (or watch) and cdk deploy, and I'll find my table gets created with a single friend in it.
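Since I used onCreate, the API call is only made on my first deploy, when the Custom Resource is created. If I changed that to onUpdate, I'd get a new friend every time I deploy, because Faker generates new data each time the template is synthesized, which changes the resource's properties.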
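The lifecycle rules can be summarized as a small lookup. This hypothetical callFor function models the behavior described in the AwsCustomResource documentation: a Create event uses onCreate and falls back to onUpdate when onCreate is omitted, Update uses onUpdate, and Delete uses onDelete. CloudFormation only sends an Update when the resource's properties change.

```typescript
type RequestType = 'Create' | 'Update' | 'Delete';

interface LifecycleCalls<T> {
  onCreate?: T;
  onUpdate?: T;
  onDelete?: T;
}

// Hypothetical model of which SDK call runs for a given CloudFormation request type.
function callFor<T>(requestType: RequestType, calls: LifecycleCalls<T>): T | undefined {
  switch (requestType) {
    case 'Create':
      // onCreate defaults to the onUpdate call when it isn't provided.
      return calls.onCreate !== undefined ? calls.onCreate : calls.onUpdate;
    case 'Update':
      return calls.onUpdate;
    case 'Delete':
      return calls.onDelete;
  }
}

const createOnly: LifecycleCalls<string> = { onCreate: 'putItem' };
const updateOnly: LifecycleCalls<string> = { onUpdate: 'putItem' };

console.log(callFor('Create', createOnly)); // putItem
console.log(callFor('Update', createOnly)); // undefined: nothing runs on later deploys
console.log(callFor('Create', updateOnly)); // putItem: onUpdate doubles as the create call
console.log(callFor('Update', updateOnly)); // putItem: runs again whenever properties change
```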
To break that down a little more: npm run build transpiles the TypeScript code into JavaScript. That JavaScript calls some Faker methods and, when I run cdk synth or cdk deploy, produces a CloudFormation template. If I'm putting programming structures like conditional statements and loops into my CDK code, it's really important to understand when they are evaluated: when the template is generated, not when it is deployed.
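A quick way to see synth-time evaluation is to notice that random values are resolved once and frozen into the template as literals. In this sketch, synthesizeProperties is a hypothetical stand-in for the part of synth that bakes generated data into a resource, and a Park-Miller pseudo-random generator is swapped in for Faker purely so the example has no dependencies. With Math.random, two synths produce different properties, which is what would make an onUpdate call fire on every deploy; with a fixed seed, the output is identical. Faker itself can be seeded with faker.seed(), which is likely the simpler route inside a real stack.

```typescript
// Park-Miller "minimal standard" PRNG: the same seed always yields the same sequence.
function parkMiller(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) {
    state += 2147483646;
  }
  return () => {
    state = (state * 16807) % 2147483647;
    return (state - 1) / 2147483646; // in [0, 1)
  };
}

interface FriendProps {
  id: string;
  shoeSize: string;
}

// Hypothetical stand-in for synth: the generated values end up as literals in the output.
function synthesizeProperties(rand: () => number, count: number): string {
  const items: FriendProps[] = [];
  for (let i = 0; i < count; i++) {
    items.push({
      id: `friend-${Math.floor(rand() * 1e6)}`,
      shoeSize: (1 + rand() * 24).toFixed(1),
    });
  }
  // Nothing here runs again at deploy time; only this string is deployed.
  return JSON.stringify(items);
}

const unseededA = synthesizeProperties(Math.random, 3);
const unseededB = synthesizeProperties(Math.random, 3);
const seededA = synthesizeProperties(parkMiller(42), 3);
const seededB = synthesizeProperties(parkMiller(42), 3);

console.log(unseededA === unseededB); // false in practice: properties change, so onUpdate would run
console.log(seededA === seededB); // true: same template, no update
```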
Make it Go Faster!
Adding just one record on startup might work for some use cases, but what if that's just not enough data to be useful? DynamoDB has a batchWriteItem method that might help: it lets me put up to 25 items into my table in a single API call. I'm going to add another private method that generates data in batches of 25.
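Since batchWriteItem accepts at most 25 put or delete requests per call, any larger list of items has to be split up first. A generic chunking helper might look like this; chunk is a hypothetical name, not something CDK provides.

```typescript
// Splits an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size = 25): T[][] {
  if (size < 1) {
    throw new Error('size must be at least 1');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// 60 items become batches of 25, 25 and 10.
const ids: string[] = [];
for (let i = 0; i < 60; i++) {
  ids.push(`friend-${i}`);
}
const batches = chunk(ids);
console.log(batches.map((batch) => batch.length).join(', ')); // 25, 25, 10
```

Each chunk could then be mapped to PutRequest entries, which is what generateBatch does for a single batch.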
private generateBatch = (batchSize = 25): { PutRequest: { Item: IFriend } }[] => {
  return new Array(batchSize).fill(undefined).map(() => {
    return { PutRequest: { Item: this.generateItem() } };
  });
};
Now I just need to swap putItem for batchWriteItem and update my parameters block to look like this:
parameters: {
  RequestItems: {
    [tableName]: this.generateBatch(),
  },
},
batchWriteItem can write to multiple tables, so the payload is a little different: RequestItems is keyed by table name, and each key holds the list of put requests for that table.
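To make the payload shape concrete: RequestItems maps each table name to the list of requests for that table, and the 25-request cap applies to the whole call, across all tables. The builder below is a hypothetical sketch, and the second table, orders, is invented purely to show the keyed structure.

```typescript
type Item = Record<string, { S: string } | { N: string }>;

interface WriteRequest {
  PutRequest: { Item: Item };
}

type RequestItems = Record<string, WriteRequest[]>;

const MAX_REQUESTS_PER_CALL = 25;

// Hypothetical builder: groups put requests under their table name.
function buildRequestItems(itemsByTable: Record<string, Item[]>): RequestItems {
  const requestItems: RequestItems = {};
  let total = 0;
  for (const table of Object.keys(itemsByTable)) {
    requestItems[table] = itemsByTable[table].map((item) => ({ PutRequest: { Item: item } }));
    total += requestItems[table].length;
  }
  // The limit counts requests across all tables in a single call.
  if (total > MAX_REQUESTS_PER_CALL) {
    throw new Error(`batchWriteItem accepts at most ${MAX_REQUESTS_PER_CALL} requests, got ${total}`);
  }
  return requestItems;
}

const payload = buildRequestItems({
  friends: [{ id: { S: '1' }, firstName: { S: 'Ada' } }],
  orders: [{ id: { S: 'o-1' }, total: { N: '12.50' } }],
});
console.log(Object.keys(payload).join(', ')); // friends, orders
```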
And Faster!
Now what if 25 items still aren't enough? I could put my resource in a loop.
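// PhysicalResourceId and AwsCustomResourcePolicy come from '@aws-cdk/custom-resources'.
// This loop runs at synth time: each iteration adds another custom resource to the template.
for (let i = 0; i < 10; i++) {
  new AwsCustomResource(this, `initDBResourceBatch${i}`, {
    onCreate: {
      service: 'DynamoDB',
      action: 'batchWriteItem',
      parameters: {
        RequestItems: {
          [tableName]: this.generateBatch(),
        },
      },
      physicalResourceId: PhysicalResourceId.of(`initDBDataBatch${i}`),
    },
    policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: AwsCustomResourcePolicy.ANY_RESOURCE }),
  });
}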
This generates 250 items. I could loop even more times, but eventually I'll hit the limit on how large my CloudFormation template can be (and on how many resources a stack can hold, since every iteration adds more). This technique can write hundreds of items, but likely not thousands, and definitely not tens or hundreds of thousands.
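One rough way to see the ceiling coming is to measure how many bytes each batch adds to the template. This back-of-the-envelope sketch treats the byte budget as a parameter to be filled in from the current CloudFormation quotas (the 400000 below is an arbitrary placeholder, not an AWS limit), and it ignores per-resource overhead such as logical IDs and IAM policies, so the real number of batches will be lower. byteLength and batchesThatFit are hypothetical helpers.

```typescript
// UTF-8 byte length of a string using only standard JavaScript functions:
// encodeURIComponent escapes bytes as %XX, and each escape stands for exactly one byte.
function byteLength(text: string): number {
  return encodeURIComponent(text).replace(/%[0-9A-F]{2}/g, 'x').length;
}

// Rough estimate of how many copies of a batch fit into a byte budget.
// Ignores per-resource overhead, so it overestimates.
function batchesThatFit(sampleBatch: unknown, budgetBytes: number): number {
  const bytesPerBatch = byteLength(JSON.stringify(sampleBatch));
  return Math.floor(budgetBytes / bytesPerBatch);
}

// A representative batch of 25 friend records with plausible field sizes.
const requests: object[] = [];
for (let i = 0; i < 25; i++) {
  requests.push({
    PutRequest: {
      Item: {
        id: { S: `friend-${i}` },
        firstName: { S: 'Christopher' },
        lastName: { S: 'Montgomery' },
        shoeSize: { N: '11.5' },
        favoriteColor: { S: 'turquoise' },
      },
    },
  });
}
const sample = { RequestItems: { friends: requests } };

// Placeholder budget: replace with the limit that applies to how you deploy.
const budgetBytes = 400000;
console.log(`~${byteLength(JSON.stringify(sample))} bytes per batch`);
console.log(`at most ${batchesThatFit(sample, budgetBytes)} batches before overhead`);
```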
Unlimited Data!
If I need to generate more than a few hundred items, I can use the Provider Framework and write my own Lambda function to do exactly what I want. Maybe I'll give that a shot in a future post. For truly large amounts of data, I might need to start looking at Data Pipeline.
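To get a feel for the Provider Framework route, here is a hedged sketch of what an onEvent handler might do: on Create, generate Count items inside the Lambda instead of in the template, write them in batches of 25, and resubmit anything batchWriteItem reports in UnprocessedItems. The event shape is simplified, BatchWriter is a hypothetical stand-in for the SDK client (kept synchronous so the sketch runs on its own), and a production handler would add backoff between retries.

```typescript
type Item = Record<string, { S: string } | { N: string }>;
type RequestItems = Record<string, { PutRequest: { Item: Item } }[]>;

// Simplified event. CloudFormation delivers resource properties as strings, hence Count: string.
interface LoaderEvent {
  RequestType: 'Create' | 'Update' | 'Delete';
  ResourceProperties: { TableName: string; Count: string };
}

// Hypothetical stand-in for batchWriteItem; the real SDK call returns a Promise.
interface BatchWriter {
  batchWriteItem(requestItems: RequestItems): { UnprocessedItems?: RequestItems };
}

const BATCH_SIZE = 25;
const MAX_ATTEMPTS = 5;

function hasRequests(items: RequestItems): boolean {
  return Object.keys(items).some((table) => items[table].length > 0);
}

function onEvent(
  event: LoaderEvent,
  writer: BatchWriter,
  makeItem: (index: number) => Item,
): { PhysicalResourceId: string } {
  const tableName = event.ResourceProperties.TableName;
  // Only seed on Create; Update and Delete leave the data alone.
  if (event.RequestType === 'Create') {
    const count = Number(event.ResourceProperties.Count);
    for (let start = 0; start < count; start += BATCH_SIZE) {
      const requests: { PutRequest: { Item: Item } }[] = [];
      for (let i = start; i < Math.min(start + BATCH_SIZE, count); i++) {
        requests.push({ PutRequest: { Item: makeItem(i) } });
      }
      let pending: RequestItems = { [tableName]: requests };
      // Resubmit whatever DynamoDB reports as unprocessed, up to MAX_ATTEMPTS calls.
      for (let attempt = 0; hasRequests(pending); attempt++) {
        if (attempt >= MAX_ATTEMPTS) {
          throw new Error(`Items still unprocessed after ${MAX_ATTEMPTS} attempts`);
        }
        pending = writer.batchWriteItem(pending).UnprocessedItems || {};
      }
    }
  }
  return { PhysicalResourceId: `seed-${tableName}` };
}

// Fake writer that leaves one request unprocessed the first time, to exercise the retry.
const stored: Item[] = [];
let failedOnce = false;
const fakeWriter: BatchWriter = {
  batchWriteItem(requestItems) {
    const unprocessed: RequestItems = {};
    for (const table of Object.keys(requestItems)) {
      let accepted = requestItems[table];
      if (!failedOnce && accepted.length > 1) {
        failedOnce = true;
        unprocessed[table] = accepted.slice(-1);
        accepted = accepted.slice(0, -1);
      }
      accepted.forEach((request) => stored.push(request.PutRequest.Item));
    }
    return { UnprocessedItems: unprocessed };
  },
};

const response = onEvent(
  { RequestType: 'Create', ResourceProperties: { TableName: 'friends', Count: '60' } },
  fakeWriter,
  (n) => ({ id: { S: `friend-${n}` } }),
);
console.log(stored.length, response.PhysicalResourceId); // 60 seed-friends
```

Because the data is generated at deploy time, the template only carries a table name and a count, so its size no longer grows with the number of items.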
Next Steps
I wouldn't consider this example ready for wide use yet, but I've gained a pretty good understanding of Custom Resources and how to use them. To get around template size limits, what I'd really want to do is upload some kind of CSV or JSON payload to S3 and ingest it via Lambda when I create my resources. I'd also want to separate my concerns by publishing this as a separate construct, or at least importing it into my main stack, rather than adding private members to the stack class.
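For the S3 idea, a checked-in CSV describing a test scenario could be turned into items inside the loading Lambda. A minimal sketch, assuming a simple CSV with a header row and no quoted fields or embedded commas (a real loader should use a proper CSV parser); csvToItems and its numericColumns parameter are hypothetical, and empty cells are skipped rather than written as empty attributes.

```typescript
type Item = Record<string, { S: string } | { N: string }>;

// Converts simple CSV text (header row, no quoting) into DynamoDB items.
function csvToItems(csv: string, numericColumns: string[] = []): Item[] {
  const lines = csv.trim().split(/\r?\n/);
  const headers = lines[0].split(',').map((header) => header.trim());
  return lines
    .slice(1)
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const values = line.split(',');
      const item: Item = {};
      headers.forEach((header, i) => {
        const value = (values[i] || '').trim();
        if (value === '') {
          return; // skip empty cells instead of writing empty attributes
        }
        // Numeric columns become N attributes, but the value is still sent as a string.
        item[header] = numericColumns.indexOf(header) >= 0 ? { N: value } : { S: value };
      });
      return item;
    });
}

const scenario = `id,firstName,lastName,shoeSize,favoriteColor
1,Ada,Lovelace,7.5,teal
2,Grace,Hopper,8,navy`;

const items = csvToItems(scenario, ['shoeSize']);
console.log(JSON.stringify(items[1]));
// {"id":{"S":"2"},"firstName":{"S":"Grace"},"lastName":{"S":"Hopper"},"shoeSize":{"N":"8"},"favoriteColor":{"S":"navy"}}
```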
Hope this was helpful and informative. I'd be glad to hear about others' experiences loading data via CDK or CloudFormation (or even other means) in the comments!