Amazon RDS is great. It does some truly incredible things with almost nothing for the developer to worry about. However, like most good things in life :) RDS is not very cheap. There are also a number of other good reasons to set up your own database inside a compute instance (like EC2) instead of using RDS. Yes, if you use RDS, AWS takes full responsibility for the administration, availability, scalability and backups of your database, but you do lose some manual control over it. If you are the kind of person who prefers manual control over everything and wants to explore the idea of setting up your own database, the first important issue you need to deal with is making sure your data survives any potential disasters :) . In this article, we will first set up our own database backups and then automate the process with bash and Python scripting. We will use a MySQL Docker container for our database, but the process is generic and you should be able to adapt it to any database you prefer.
Prerequisites
- docker installed on system
- docker-compose installed on system
- python3 installed on system
Steps
1. Set up a MySQL Docker container
If we have docker and docker-compose installed on the system, we can quickly spin up a MySQL container using the following docker-compose.yml file.
Docker-Compose
version: '3.7'
services:
  db:
    image: mysql:5.7
    ports:
      - "3306:3306"
    restart: always
    volumes:
      - mysql_data_volume:/var/lib/mysql
    env_file:
      - .env
volumes:
  mysql_data_volume:
Now, to start the container:
docker-compose up --build
Now, note down the container name from:
sudo docker ps
In my case the command outputs:
CONTAINER ID   IMAGE       COMMAND                  CREATED              STATUS              PORTS                                                  NAMES
d2f5b2941c93   mysql:5.7   "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   db-backup_db_1
So, our container name is db-backup_db_1. This follows the convention {folder-name}_{docker-compose-service-name}_{count-of-containers-of-this-service}.
Now, our database is ready. We assume this database is connected to some application that generates data, and our job is to periodically make backups of that data so that, if necessary, we can restore the database to its state at a specific point in time.
Notice that the docker-compose file references an environment variables file called .env; we will get to that soon.
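If you are following along on a fresh container and just want some sample rows to back up, here is a minimal sketch using the pymysql package (my own choice here, not part of the original setup; any MySQL client works). It assumes the MYSQL_* values from the .env file are exported in your shell.
seed_sample_data.py
import os

import pymysql  # pip install pymysql

# connect through the published port 3306 using the same secrets as the .env file
connection = pymysql.connect(
    host="127.0.0.1",
    port=3306,
    user=os.getenv("MYSQL_USER"),
    password=os.getenv("MYSQL_PASSWORD"),
    database=os.getenv("MYSQL_DATABASE"),
)

with connection:
    with connection.cursor() as cursor:
        # a throwaway table just so the dump has something in it
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INT AUTO_INCREMENT PRIMARY KEY, body TEXT)"
        )
        cursor.execute(
            "INSERT INTO notes (body) VALUES (%s)",
            ("hello from the backup tutorial",),
        )
    connection.commit()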
2. Setting up an S3 bucket
We cannot just leave our generated data dumps lying around in the machine's file storage, because if the machine goes down we lose all our backups along with it. So we need to store the backups on persistent file storage like Amazon S3. S3 is widely considered one of the best file storage services out there and it is very cheap. In this article we will not go through the process of creating S3 buckets, but in case you don't already know, it is very easy and can be done from the AWS console with just a couple of clicks. You can also get an access_key_id and secret_access_key by setting up programmatic access from the IAM console.
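If you prefer doing it from code instead of the console, a rough sketch of creating the bucket with boto3 (the same library we use later for uploads) could look like the following; it assumes the credentials, region and bucket name are exported as environment variables (we put them in a .env file right below).
create_bucket.py
import os

import boto3

region = os.getenv("AWS_S3_REGION_NAME")
bucket_name = os.getenv("AWS_STORAGE_BUCKET_NAME")

s3 = boto3.client(
    "s3",
    region_name=region,
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)

if region == "us-east-1":
    # us-east-1 is the default region and must not be passed as a LocationConstraint
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )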
Now, we keep our secrets in the .env file, like so,
AWS_ACCESS_KEY_ID=*********
AWS_SECRET_ACCESS_KEY=******************
AWS_S3_REGION_NAME=*********
AWS_STORAGE_BUCKET_NAME=*********
MYSQL_DATABASE=*********
MYSQL_ROOT_PASSWORD=*********
MYSQL_USER=*********
MYSQL_PASSWORD=*********
These include both the AWS credentials and the database credentials.
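Before wiring these into the backup script, you might want to sanity-check that the AWS keys actually work. This is not part of the main flow, but a quick check with boto3's STS client could be:
check_aws_credentials.py
import os

import boto3

sts = boto3.client(
    "sts",
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)

# raises an error if the credentials are invalid, otherwise prints who we are
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])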
3. Generating Backups/Dumps
In order to generate MySQL data dumps, we first have to exec into our database container and then run the mysqldump command.
We can do this using the following one liner:
sudo docker exec db-backup_db_1 sh -c 'mysqldump -u root -p${MYSQL_ROOT_PASSWORD} ${MYSQL_DATABASE} > dump.sql'
This will create a data dump called 'dump.sql' inside the database container. Now, we have to copy the dump from inside the container.
sudo docker cp db-backup_db_1:dump.sql .
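As an aside, restoring such a dump later (which is the whole point of keeping backups) is just the reverse of this. Here is a small sketch, with a script name of my own choosing, that mirrors the mysqldump one-liner above:
restore_dump.py
import subprocess

CONTAINER = "db-backup_db_1"

# feed a local dump.sql into the mysql client inside the container;
# the MYSQL_* variables are expanded inside the container, just like in the backup command
restore_cmd = (
    f"sudo docker exec -i {CONTAINER} "
    "sh -c 'mysql -u root -p${MYSQL_ROOT_PASSWORD} ${MYSQL_DATABASE}'"
)

with open("dump.sql", "rb") as dump_file:
    subprocess.run(restore_cmd, shell=True, stdin=dump_file, check=True)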
Now, we just have to upload the file to our S3 bucket. We will do this using the boto3 python package.
4. Uploading generated dumps to S3 Bucket
We create a python script called upload_to_s3.py like so,
upload_to_s3.py
import os
import sys
from datetime import datetime

import boto3
from botocore.exceptions import ClientError

S3_FOLDER = 'dumps'


def upload_s3(local_file_path, s3_key):
    s3 = boto3.client(
        's3',
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
    )
    bucket_name = os.getenv("AWS_STORAGE_BUCKET_NAME")
    try:
        s3.upload_file(local_file_path, bucket_name, s3_key)
    except ClientError as e:
        print(f"failed uploading to s3 {e}")
        return False
    return True


def main():
    # sys.argv always contains the script name, so a missing argument means fewer than 2 entries
    if len(sys.argv) < 2:
        print("Error: No File Name Specified !")
        return
    if not os.getenv("AWS_ACCESS_KEY_ID") or not os.getenv("AWS_SECRET_ACCESS_KEY") or not os.getenv("AWS_STORAGE_BUCKET_NAME"):
        print("Error: Could not Find AWS S3 Secrets in Environment")
        return
    # timestamp the S3 key so every backup is kept separately
    upload_s3(sys.argv[1] + ".sql", S3_FOLDER + "/" + sys.argv[1] + "-" + str(datetime.now()) + ".sql")


if __name__ == '__main__':
    main()
To run the script,
# make sure you have boto3 installed in your python venv
pip install boto3
and then,
python3 upload_to_s3.py dump
This script expects a command line argument with the name of the dump file (without the '.sql' extension) and the AWS secrets in the environment variables. It then uploads the dump file to the S3 bucket under a folder called 'dumps', with a timestamp appended to the file name.
Final Bash Script
backup_db.sh
# export every KEY=VALUE pair from the .env file passed as $1 (skipping comments and blank lines)
while read -r l; do export "$(sed 's/=.*$//' <<<$l)"="$(sed -E 's/^[^=]+=//' <<<$l)"; done < <(grep -E -v '^\s*(#|$)' $1)
# generate the dump inside the container, then copy it out
sudo docker exec db-backup_db_1 sh -c 'mysqldump -u root -p${MYSQL_ROOT_PASSWORD} ${MYSQL_DATABASE} > dump.sql'
sudo docker cp db-backup_db_1:dump.sql .
# upload to S3 and clean up the local copy
python3 upload_to_s3.py dump
sudo rm dump.sql
The bash script expects the name of the .env file as a command line argument.
The first line is a handy little one-liner that parses the .env file and exports the environment variables into the current shell. (P.S: I didn't come up with it myself, obviously o.O)
Then the script generates the dump and uploads it to the S3 bucket as we discussed. Finally, we remove the local copy of the dump, since we don't need it anymore.
Now each time we run the script,
bash backup_db.sh .env
We would see a new data dump in our S3 bucket.
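For completeness, here is a rough sketch of how you could pull the most recent dump back out of the bucket with boto3 when you need it for a restore; the script name and the local file name are my own choices, not part of the article's setup.
download_latest_dump.py
import os

import boto3

S3_FOLDER = 'dumps'

s3 = boto3.client(
    's3',
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)
bucket_name = os.getenv("AWS_STORAGE_BUCKET_NAME")

# list everything under the dumps/ folder and pick the most recently modified object
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=S3_FOLDER + "/")
objects = response.get("Contents", [])
if not objects:
    raise SystemExit("No dumps found in the bucket")

latest = max(objects, key=lambda obj: obj["LastModified"])
s3.download_file(bucket_name, latest["Key"], "dump.sql")
print(f"Downloaded {latest['Key']} to dump.sql")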
5. Doing it periodically
We can easily automate this with a cron job. We can set whatever schedule we want using the following syntax,
sudo crontab -e
1 2 3 4 5 /path/to/script # add this line to crontab file
where,
1: Minutes (0-59)
2: Hours (0-23)
3: Day of the month (1-31)
4: Month (1-12)
5: Day of the week (0-7, where both 0 and 7 mean Sunday; names like Sun also work)
/path/to/script - path to our script
e.g., we can generate a data dump every Sunday at 8:05 AM using the following,
5 8 * * Sun /path/to/script
Since backup_db.sh expects the path of the .env file as an argument, the actual entry would look something like:
5 8 * * Sun bash /path/to/backup_db.sh /path/to/.env