I was reading Trilochn Parida's article on How to Schedule Backup of MySQL DB and Store it in S3 using Cron Job, and it got me thinking that I should show how Silbo handles our own database backups. We use AWS RDS, which can be configured to take automated backups of our database, keeping snapshots for the last 35 days.
A rolling 35 days of snapshots is great for quickly restoring a short-term backup, but there are a number of limitations with this system. The 35-day cap is the major one from an audit perspective. The automated backups are also not SQL exports that you can access directly - they are usable only within AWS RDS. And when you plan for disaster recovery, you need to account for an entire region going down; we want to be able to recover in another region.
To achieve this, I will walk through some AWS CDK code to demonstrate how to schedule such a backup policy.
The code for this is up on GitHub.
Overview
Because AWS Lambda has predefined quotas (a 15-minute timeout limit, memory limits, etc.), we know we will want to perform our backup on an Amazon EC2 instance. To reduce our costs, we only want this instance to start up and shut down just for the period of the backup. So we will have a Lambda create an EC2 instance and configure its commands via its Instance User Data.
EC2 User Data
The most important aspect of this workflow is contained within the EC2 Instance User Data. Here we need to install any dependencies we have, execute the backup commands, and shut down the instance afterwards.
For my example here, I created a PostgreSQL AWS RDS instance with the default version at the time of writing, PostgreSQL 11.6-R1. PostgreSQL can export a database using pg_dump, so our first few commands will build and install it on Amazon Linux 2:
yum install -y wget tar gcc make
wget https://ftp.postgresql.org/pub/source/v11.6/postgresql-11.6.tar.gz
tar -zxvf postgresql-11.6.tar.gz
cd postgresql-11.6/
./configure --without-readline --without-zlib
make
make install
We want to configure the username and password for pg_dump, and PostgreSQL supports the Password File for this:
echo "hostname:port:database:username:password" > ~/.pgpass
chmod 600 ~/.pgpass
Finally we want to run pg_dump and upload the output to S3:
/usr/local/pgsql/bin/pg_dump -h hostname \
-U username \
-w \
-c \
-f output.pgsql \
databaseName
S3_KEY=S3BucketName/hostname/$(date "+%Y-%m-%d")-databaseName-backup.tar.gz
tar -cvzf output.tar.gz output.pgsql
aws s3 cp output.tar.gz s3://$S3_KEY --sse AES256
# Power the instance off once the upload completes; assuming the launching Lambda
# sets InstanceInitiatedShutdownBehavior to 'terminate', this also terminates it.
shutdown -h now
That is all the basic logic we need here. You could add more commands, or change this to use pg_dumpall if you want every database on the instance - whatever is needed!
AWS CDK
With the main logic complete, let's use the CDK to stand this all up!
Our first step is to create a bucket where our exports will live:
const rdsBackupsBucket = new s3.Bucket(this, 'rdsBackupsBucket', {
blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
encryption: s3.BucketEncryption.S3_MANAGED
});
This is just a standalone bucket, but you could add cross-region replication, lifecycle rules to archive into Glacier, etc.
Now that we have an Amazon S3 bucket, we need to think about how the EC2 instance will be able to access it. We want to ensure the EC2 instance has an Instance Profile with the appropriate permissions. This is broken into three parts:
- Create an IAM Role that can be used by an EC2 instance
- Create an IAM Policy for the role
- Create an Instance Profile and attach the role
const backupInstanceRole = new iam.Role(this, 'backupInstanceRole', {
assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com')
});
backupInstanceRole.addToPolicy(
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
resources: [rdsBackupsBucket.bucketArn + '/*'],
actions: [
's3:PutObject',
's3:PutObjectAcl'
]
})
);
new iam.CfnInstanceProfile(
this,
'backupInstanceProfile',
{
roles: [
backupInstanceRole.roleName,
],
}
);
Now, the User Data we reviewed above was pretty static, with hard-coded values. We want this stack to be more configurable with regard to the database settings. For the purpose of this simple setup, I'll configure them as parameters within the AWS CDK and pass them as environment variables to the AWS Lambda.
<IMPORTANT SEGUE>
When dealing with actual environments, these values should be stored in AWS Secrets Manager. We would then pass the secret name as a parameter via the AWS CDK and AWS Lambda, down into the User Data for the EC2 Instance. By granting the EC2 Instance Profile access to the secret, the instance would retrieve the credentials directly and we wouldn't have usernames and passwords bouncing around.
</IMPORTANT SEGUE>
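As a rough illustration of that approach (not part of the stack in this post), the instance could fetch the credentials itself with a small boto3 call; the secret name and key names here are hypothetical:
import json
import boto3

# Hypothetical secret name, passed down via CDK -> Lambda -> User Data instead of raw credentials.
SECRET_NAME = 'prod/rds/backup-credentials'

secrets = boto3.client('secretsmanager')
response = secrets.get_secret_value(SecretId=SECRET_NAME)
credentials = json.loads(response['SecretString'])

# credentials['username'] and credentials['password'] can then be written into ~/.pgpass.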
Back to the CDK! With our variables being passed in via the CDK Context, we will want to retrieve them and set them as environment variables for the Lambda. Importantly, we will also need to grant the Lambda the permissions required to launch an EC2 instance!
const ec2_region = this.node.tryGetContext('ec2_region');
const ec2_type = this.node.tryGetContext('ec2_type');
const db_host = this.node.tryGetContext('db_host');
const db_user = this.node.tryGetContext('db_user');
const db_pass = this.node.tryGetContext('db_pass');
const db_database = this.node.tryGetContext('db_database');
const launchingLambda = new lambda.Function(this, 'Lambda', {
runtime: lambda.Runtime.PYTHON_3_7,
handler: 'function.lambda_to_ec2',
code: lambda.Code.asset('./resources'),
description: 'Backup Database to S3',
timeout: core.Duration.seconds(30),
environment: {
INSTANCE_REGION: ec2_region,
INSTANCE_TYPE: ec2_type,
INSTANCE_ROLE: backupInstanceRole.roleName,
DB_HOST: db_host,
DB_USER: db_user,
DB_PASS: db_pass,
DB_DATABASE: db_database,
S3_BUCKET: rdsBackupsBucket.bucketName
}
});
launchingLambda.addToRolePolicy(
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
resources: ['*'],
actions: ['ec2:*']
})
);
launchingLambda.addToRolePolicy(
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
resources: [backupInstanceRole.roleArn],
actions: ['iam:PassRole']
})
);
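The Lambda handler itself (function.lambda_to_ec2, shipped in the ./resources directory and included in the GitHub repo) isn't reproduced here. As a minimal sketch of what it needs to do - resolve an AMI, template the User Data from the environment variables, and launch a short-lived instance - it might look something like the following; the AMI lookup via the public SSM parameter and the instance-profile lookup are my assumptions, and the handler in the repo may differ:
import os
import boto3

def lambda_to_ec2(event, context):
    """Launch a short-lived EC2 instance that runs the backup and shuts itself down."""
    region = os.environ['INSTANCE_REGION']
    ec2 = boto3.client('ec2', region_name=region)
    ssm = boto3.client('ssm', region_name=region)
    iam = boto3.client('iam')

    # Latest Amazon Linux 2 AMI from the public SSM parameter (assumption).
    ami_id = ssm.get_parameter(
        Name='/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2'
    )['Parameter']['Value']

    # Find the instance profile the CDK stack attached to the backup role.
    profile_arn = iam.list_instance_profiles_for_role(
        RoleName=os.environ['INSTANCE_ROLE']
    )['InstanceProfiles'][0]['Arn']

    # User Data: the script from the "EC2 User Data" section, templated with our settings.
    user_data = '\n'.join([
        '#!/bin/bash',
        f'echo "{os.environ["DB_HOST"]}:5432:{os.environ["DB_DATABASE"]}:'
        f'{os.environ["DB_USER"]}:{os.environ["DB_PASS"]}" > ~/.pgpass',
        'chmod 600 ~/.pgpass',
        '# ... install PostgreSQL, run pg_dump, tar and aws s3 cp as shown above ...',
        'shutdown -h now',
    ])

    ec2.run_instances(
        ImageId=ami_id,
        InstanceType=os.environ['INSTANCE_TYPE'],
        MinCount=1,
        MaxCount=1,
        IamInstanceProfile={'Arn': profile_arn},
        InstanceInitiatedShutdownBehavior='terminate',  # so "shutdown -h now" terminates it
        UserData=user_data,
    )
If you go down this route, note that the SSM and IAM lookups would need ssm:GetParameter and iam:ListInstanceProfilesForRole added to the Lambda's policy on top of the statements above.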
Finally we want to schedule our Lambda. In a real environment, a lot of consideration needs to go into the frequency of your backups. What level of reliability do you require? Have you worked out your Disaster Recovery strategy and defined your RPOs (Recovery Point Objectives) and RTOs (Recovery Time Objectives)?
Again, to make this configurable, let's set the cron settings via our CDK Context, pulling out the required values and passing them into the CloudWatch Events Rule:
const lambdaTarget = new eventstargets.LambdaFunction(launchingLambda);
const cron_minute = this.node.tryGetContext('cron_minute');
const cron_hour = this.node.tryGetContext('cron_hour');
const cron_day = this.node.tryGetContext('cron_day');
const cron_month = this.node.tryGetContext('cron_month');
const cron_year = this.node.tryGetContext('cron_year');
new events.Rule(this, 'ScheduleRule', {
schedule: events.Schedule.cron({
minute: cron_minute,
hour: cron_hour,
day: cron_day,
month: cron_month,
year: cron_year
}),
targets: [lambdaTarget],
});
We can then configure all of these settings within cdk.json:
{
"app": "npx ts-node bin/aws-rds-nightly-backup.ts",
"context": {
"@aws-cdk/core:enableStackNameDuplicates": "true",
"aws-cdk:enableDiffNoFail": "true",
"@aws-cdk/core:stackRelativeExports": "true",
"ec2_region": "us-east-1",
"ec2_type": "t2.large",
"db_host": "database-1.ct5iuxjgrvl6.us-east-1.rds.amazonaws.com",
"db_user": "postgres",
"db_pass": "exampledb",
"db_database": "testdb",
"cron_minute": "0",
"cron_hour": "8",
"cron_day": "1",
"cron_month": "*",
"cron_year": "*"
}
}
Enjoy some happy backups!