What is the problem we are trying to solve here?
Often you may want to simulate the transactions from production environment in your PreProd/Dev environment to see if things like database upgrades, configuration changes, etc won't cause any regression. Essentially, We want to record the behavior in one environment and replay it in another environment.
What are we doing here?
Step by Step guide to capture statements from one YugabyteDB database cluster and replay them on another YugabyteDB database cluster. We are going to use pgreplay. This guide uses YugabyteDB. But similar steps can be utilized for Postgres database as well.
- I will run a workload simulator to create 3 tables and seed them with 10K rows. You can run your App to carry bunch of database transactions.
- Then we will capture postgres log file which will have all the statements (select, inserts, etc)
- We will install a pgreplay on an app server and will push this file to that server.
- Spin a new cluster. Run the pgreplay with the postgres file and replay the transactions on the new database cluster.
High level flow
Capture statement logs from database Cluster 1
- We need to ensure that all statements are logged.
- Create a tserver.flagfile file like this:
--ysql_pg_conf_csv=log_destination=csvlog,log_statement=all,log_min_messages=error,log_min_error_statement=log,log_connections=on,log_disconnections=on
- Start the yugabyted with the flag file (You can set these as gflags from YBA UI)
yugabyted start --tserver_flags=flagfile=/Users/amitchauhan/Downloads/pgreplay/tserver.flagfile
- Now logs in this directory /var/logs/tserver/postgres* will start showing/capturing the statements.
- Run some transactions. I ran a workload simulator which created 3 tables and added 10K records.
- For simplicity, I combined couple of postgresql-*.csv file into single amit.csv file.
Create a new database cluster and use pgreplay to replay the logs from the previous cluster.
- I will go ahead and create a new cluster in AWS.
- Spin a new machine (or use existing VM) and install pgreplay (you can build the pgreplay from source. I am going to use their docker image):
git clone https://github.com/laurenz/pgreplay.git
cd pgreplay
docker build -t laurenz/pgreplay -f Dockerfile .
- I have copied over my postgres log file to this machine and will now run the pgreplay container, pass in the new database connection details.
sudo docker run --rm -ti -v /home/centos:/app -w /app laurenz/pgreplay pgreplay -h DATABASE-IP-ADDRESS -p 5433 -W XXXXPASSXXXX -d 3 -c /app/amit.csv
- Pgreplay will replay all the statements in my log file.
- Verify that same number of rows are inserted in table:
Top comments (0)