Intro
At my company, we build a platform that helps developers easily deploy their apps on AWS. One major feature is the Preview Environment, which lets any developer create a full replica of the production environment for every pull request. It's convenient, but it meant we had to find a way to clone the apps and the databases with their data included. That's why I created RepliByte, an open-source tool written in Rust to synchronize cloud databases and hide sensitive data 🔥
Back up your prod Postgres DB to S3
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
To run the backup
replibyte -c prod-conf.yaml backup run
To list your backups
replibyte -c prod-conf.yaml backup list
type          name                    size    when                     compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am    true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am   true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am   true        true
Clean sensitive data
RepliByte provides transformers to clean up the sensitive data in your database.
# Transformers
Here is a list of all the transformers available.
| id | description | available |
| --------------- | -------------------------------------------------------------------------------------------------- | --------- |
| transient | Does not modify the value | yes |
| random | Randomize value but keep the same length (string only). [AAA]->[BBB] | yes |
| first-name | Replace the string value by a first name | yes |
| email | Replace the string value by an email address | yes |
| keep-first-char | Keep only the first char for strings and digit for numbers | yes |
| phone-number | Replace the string value by a phone number | yes |
| credit-card | Replace the string value by a credit card number | yes |
| redacted | Obfuscate your sensitive data (>3 characters strings only). [4242 4242 4242 4242]->[424**********] | yes |
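To make the table more concrete, here is a rough Python sketch of how a few of these transformers could behave. It is not RepliByte's actual Rust code: the function names are hypothetical, the use of ASCII letters for `random` is an assumption, and the fixed mask of ten characters in `redacted` is only inferred from the example in the table.

```python
import random
import string


def transient(value):
    """Return the value unchanged."""
    return value


def random_same_length(value: str, rng: random.Random) -> str:
    """Replace a string with random letters of the same length.

    Assumption: only ASCII letters are used as replacement characters.
    """
    return "".join(rng.choice(string.ascii_letters) for _ in value)


def keep_first_char(value):
    """Keep the first character of a string, or the first digit of an integer."""
    if isinstance(value, int):
        sign = -1 if value < 0 else 1
        return sign * int(str(abs(value))[0])
    return value[:1]


def redacted(value: str, visible: int = 3, width: int = 10, mask: str = "*") -> str:
    """Keep the first `visible` characters and append `width` mask characters.

    Strings of `visible` characters or fewer are returned unchanged, following
    the ">3 characters strings only" note in the table. The fixed width of 10
    is an assumption taken from the table's example.
    """
    if len(value) <= visible:
        return value
    return value[:visible] + mask * width


if __name__ == "__main__":
    rng = random.Random()
    print(random_same_length("Smith", rng))   # five random letters
    print(keep_first_char("jdoe"))
    print(keep_first_char(4242))
    print(redacted("4242 4242 4242 4242"))
```

Each function takes one column value and returns its replacement, which is what lets a transformer be attached to a single column in the configuration.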
To use transformers, edit your configuration file and add them:
source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
Then your sensitive data will be hidden while seeding your dev Postgres DB 👌
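If you are wondering how a per-column configuration like this plays out on a single row, here is a minimal Python sketch. It is an illustration under assumptions, not how RepliByte is implemented: `RULES`, `TRANSFORMERS`, and `transform_row` are hypothetical names, only two transformers are modeled, and I assume that columns without a rule pass through unchanged.

```python
from typing import Callable, Dict, Tuple

Transformer = Callable[[object], object]

# Hypothetical registry: transformer id -> function applied to one value.
TRANSFORMERS: Dict[str, Transformer] = {
    "transient": lambda value: value,
    "keep-first-char": lambda value: value[:1],
}

# Hypothetical in-memory mirror of part of the `transformers` section:
# (database, table) -> {column name: transformer id}
RULES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("public", "employees"): {"username": "keep-first-char"},
}


def transform_row(database: str, table: str, row: Dict[str, object]) -> Dict[str, object]:
    """Apply the configured transformer to each column of one row.

    Assumption: columns with no rule are left untouched ("transient").
    """
    column_rules = RULES.get((database, table), {})
    transformed = {}
    for column, value in row.items():
        transformer_id = column_rules.get(column, "transient")
        transformed[column] = TRANSFORMERS[transformer_id](value)
    return transformed


if __name__ == "__main__":
    row = {"id": 1, "username": "jdoe"}
    print(transform_row("public", "employees", row))
```

Keying the rules by database and table keeps the lookup cheap for every row, and tables that are not listed are copied as they are.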
Seed your dev Postgres DB
To restore a backup, you first need to declare a destination in your YAML config file:
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
  decryption_key: $MY_PUBLIC_DEC_KEY # optional
Then run replibyte backup list to list all the available backups:
replibyte -c prod-conf.yaml backup list
type          name                    size    when                     compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am    true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am   true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am   true        true
Finally, run replibyte restore to seed your dev database:
replibyte -c prod-conf.yaml restore -v latest
OR
replibyte -c prod-conf.yaml restore -v backup-1647706359405
What else?
RepliByte is written in Rust, and all operations are performed on the fly. This means no extra disk space is consumed and no intermediate file is left around that could leak data. ⚡️
RepliByte also supports MongoDB (Thanks to Benny - contributor) 🔥
Complete data synchronization 💪🏼
Works on any cloud provider 🌍
You can use multiple transformers to hide your sensitive data 🙈
Designed to back up terabytes of data 🏆
Skip data sync for specific tables 👌
On-the-fly data compression/decompression (Zlib) and encryption/decryption (AES-256) 🛡
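To give an idea of what "on the fly" means for compression, here is a small Python sketch that compresses a dump chunk by chunk with the standard `zlib` module, so the full dump is never written to an intermediate file. It only illustrates the streaming idea: the function names and chunk size are hypothetical, encryption is left out, and RepliByte's own Rust implementation may be organized differently.

```python
import io
import zlib
from typing import BinaryIO, Iterable, Iterator


def compress_stream(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Read `source` in chunks and yield zlib-compressed bytes as they are produced."""
    compressor = zlib.compressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer input and return nothing yet
            yield data
    yield compressor.flush()  # emit whatever is still buffered


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes for each compressed chunk received."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Stand-in for a database dump; a real source would be a pipe or socket.
    dump = io.BytesIO(b"INSERT INTO employees VALUES (1, 'jdoe');\n" * 10_000)
    restored = b"".join(decompress_stream(compress_stream(dump, chunk_size=4096)))
    print(len(restored))
```

Because both functions are generators, each compressed chunk can be uploaded (or decompressed and replayed) as soon as it exists, and memory use stays bounded by the chunk size rather than by the size of the dump.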
Conclusion
RepliByte is a command-line tool that makes database seeding super easy and convenient. I am working on a way to restore a database locally with Docker in one command. More is coming, so stay tuned and feel free to share your feedback.
RepliByte GitHub: https://github.com/Qovery/replibyte