When working with database transactions, one crucial thing we must do is choose the appropriate isolation level for our application. Although there's a well-defined standard, each database engine might implement it in a different way, and thus may behave differently at each isolation level.
Today we will explore in depth how each isolation level works in MySQL and Postgres by running some concrete SQL queries. We will also learn how each isolation level prevents read phenomena such as dirty read, non-repeatable read, phantom read, and serialization anomaly.
Here are some useful links:
- The full series playlist on Youtube
- Its Github repository
Table of contents:
- Transaction isolation and read phenomena
- Isolation levels in MySQL
- Isolation levels in Postgres
- Get current isolation level in Postgres
- Change isolation level in Postgres
- Read uncommitted isolation level in Postgres
- Read committed isolation level in Postgres
- Repeatable read isolation level in Postgres
- Serialization anomaly in Postgres
- Serializable isolation level in Postgres
- How MySQL handles serialization anomaly
- Summary of the relationship between isolation levels and read phenomena
Transaction isolation and read phenomena
First, let's talk a bit about the theory.
ACID property
As we've already learned in the previous lecture, a database transaction must satisfy the ACID property, which stands for Atomicity, Consistency, Isolation, and Durability.
Isolation is one of the four properties of a database transaction: at its highest level, perfect isolation ensures that concurrent transactions do not affect each other.
There are several ways a transaction can be interfered with by other transactions running simultaneously. This interference causes what we call read phenomena.
4 read phenomena
Here are some read phenomena that might occur if a database is running at a low level of transaction isolation:
- First, the dirty read phenomenon. It happens when a transaction reads data written by another concurrent transaction that has not been committed yet. This is terribly bad, because we don't know whether that other transaction will eventually be committed or rolled back, so we might end up using incorrect data if a rollback occurs.
- The second phenomenon we might encounter is non-repeatable read: a transaction reads the same record twice and sees different values, because the row has been modified by another transaction that committed after the first read.
- Phantom read is a similar phenomenon, but it affects queries that search for multiple rows instead of one. In this case, the same query is re-executed but returns a different set of rows, due to changes made by other recently committed transactions, such as inserting new rows or deleting existing rows that happen to satisfy the search condition of the current transaction's query.
- Another phenomenon, which involves the separation of a group of transactions, is serialization anomaly: the result of a group of concurrent committed transactions cannot be achieved by running them sequentially in any order without overlapping.
Don’t worry if you don’t fully understand these phenomena right now. We’re going to replicate each and every one of them in MySQL and Postgres in a few minutes.
4 isolation levels
Now, in order to deal with these phenomena, 4 standard isolation levels were defined by the American National Standards Institute (ANSI).
- The lowest isolation level is read uncommitted. Transactions at this level can see data written by other uncommitted transactions, thus allowing the dirty read phenomenon to happen.
- The next isolation level is read committed, where transactions can only see data that has been committed by other transactions. Because of this, dirty read is no longer possible.
- A bit stricter is the repeatable read isolation level. It ensures that the same select query always returns the same result, no matter how many times it is executed, even if other concurrent transactions have committed new changes that satisfy the query.
- Finally, the highest isolation level is serializable. Concurrent transactions running at this level are guaranteed to yield the same result as if they were executed sequentially in some order, one after another without overlapping. In other words, there exists at least one way to order these concurrent transactions so that running them one by one produces the same final result.
Relationship between isolation levels and read phenomena
Alright, now it's time to find the connection between isolation levels and read phenomena. We're gonna run some transactions with different levels of isolation in MySQL and Postgres to figure out which phenomena are possible at each level, and then fill that information into a summary chart.
Here I have 2 running docker containers: one is Postgres version 12 and the other is MySQL version 8. Inside those containers, I have also prepared the simple bank database schema with some initial data, just like in previous lectures.
❯ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
35f16aed1206 mysql:8 "docker-entrypoint.s…" 8 days ago Up 50 seconds 0.0.0.0:3306->3306/tcp, 33060/tcp mysql8
f9cdf61fcb0a postgres:12-alpine "docker-entrypoint.s…" 3 weeks ago Up 54 seconds 0.0.0.0:5432->5432/tcp postgres12
Isolation levels in MySQL
Let's connect to the MySQL console and access the simple_bank database.
❯ docker exec -it mysql8 mysql -uroot -psecret simple_bank
mysql>
Get current isolation level in MySQL
To get the transaction isolation level of the current session, we can run:
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ |
+-------------------------+
1 row in set (0.00 sec)
By default, it is repeatable read, as we can see here. This level applies only to this specific MySQL console session.
There's also a global isolation level, which is applied to all sessions when they first start. We can get its value by adding the global keyword to the previous select query.
mysql> select @@global.transaction_isolation;
+--------------------------------+
| @@global.transaction_isolation |
+--------------------------------+
| REPEATABLE-READ |
+--------------------------------+
1 row in set (0.00 sec)
By default, it is also repeatable read.
Change isolation level in MySQL
Now to change the isolation level of current session, we can use this query:
-- Tx1:
mysql> set session transaction isolation level read uncommitted;
Query OK, 0 rows affected (0.00 sec)
You can replace read uncommitted with the name of the isolation level you want to set.
After this, if we select the transaction isolation level again, we will see that it has been changed to read uncommitted.
-- Tx1:
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| READ-UNCOMMITTED |
+-------------------------+
1 row in set (0.00 sec)
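From application code, you would typically run this statement right after opening a connection. Here is a minimal sketch of a hypothetical helper that validates the level name and builds the statement to send to your driver (the function name and structure are my own, not part of any library):

```python
# Valid MySQL isolation level names, lowercased.
VALID_LEVELS = ("read uncommitted", "read committed", "repeatable read", "serializable")

def set_isolation_sql(level: str, session: bool = True) -> str:
    """Build a SET [SESSION|GLOBAL] TRANSACTION ISOLATION LEVEL statement for MySQL."""
    level = level.strip().lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown isolation level: {level!r}")
    scope = "SESSION " if session else "GLOBAL "
    return f"SET {scope}TRANSACTION ISOLATION LEVEL {level.upper()}"

print(set_isolation_sql("read uncommitted"))
# SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
```

Validating the name up front is useful because isolation levels cannot be passed as ordinary query parameters, so the statement has to be assembled as a string.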
Note that this change will only affect future transactions of the current session, not transactions running in other MySQL console sessions.
Read uncommitted isolation level in MySQL
Alright, now in order to demonstrate the interference between 2 concurrent transactions, I'm gonna open another terminal window, put it side by side with this one, and start a new MySQL console inside it. Then let's set the isolation level of this session to read uncommitted as well.
-- Tx2:
mysql> set session transaction isolation level read uncommitted;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| READ-UNCOMMITTED |
+-------------------------+
1 row in set (0.00 sec)
OK, now both sessions are running at the read uncommitted isolation level. We can now start a new transaction.
In MySQL, we can either use the start transaction statement, or simply use the begin statement as an alternative.
-- Tx1
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
-- Tx2
mysql> begin;
Query OK, 0 rows affected (0.01 sec)
OK, 2 transactions have started. Let's run a simple select from accounts query in transaction 1.
-- Tx1
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 100 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
At the moment, there are 3 accounts with the same balance of 100 dollars. Then in transaction 2, let's select the first account, with id 1.
-- Tx2
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
OK, we've got that account with a 100 dollar balance. Now let's go back to transaction 1 and run this update statement to subtract 10 dollars from account 1.
-- Tx1
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
The query succeeded, so if we select account 1 in transaction 1, we will see that the balance has been changed to 90 dollars.
-- Tx1
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
But what if we run this same select statement in transaction 2?
-- Tx2
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
It also sees the modified balance: 90 dollars. Note that transaction 1 has not committed yet, but transaction 2 still sees the change made by transaction 1.
So this is a dirty read, and it happens because we're using the read uncommitted isolation level.
Alright, now let’s commit these 2 transactions and try a higher isolation level.
-- Tx1:
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
-- Tx2:
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
Read committed isolation level in MySQL
This time, we will set the isolation level to read committed for both sessions:
-- Tx1 + Tx2
mysql> set session transaction isolation level read committed;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| READ-COMMITTED |
+-------------------------+
1 row in set (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
OK, now let's select all records to see the current state of the accounts table in transaction 1:
-- Tx1:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.01 sec)
Account 1’s balance is now 90 dollars, and the other 2 accounts both have 100 dollars.
Just like before, in transaction 2, let's select the account with ID 1.
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
Then in transaction 1, we update that account's balance by subtracting 10 dollars from it.
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
OK, the balance has successfully been changed to 80 dollars in this transaction. Let's see whether this change is visible to transaction 2 or not.
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
Now you can see that when we select account 1 in transaction 2, its balance is still the same 90 dollars as before.
This is because we're using the read committed isolation level, and since transaction 1 hasn't been committed yet, the data it has written cannot be seen by other transactions.
So the read committed isolation level prevents the dirty read phenomenon. How about non-repeatable read and phantom read?
In transaction 2, let's run another select for accounts whose balance is greater than or equal to 90 dollars. Then go back to transaction 1 and commit it.
-- Tx2:
mysql> select * from accounts where balance >= 90;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 90 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
-- Tx1:
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
OK, now if we read account 1 again in transaction 2, we can see that the balance has been changed to 80 dollars.
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
So the same query that gets account 1 returns a different value. This is the non-repeatable read phenomenon.
Also, if we rerun the query to get all accounts with a balance of at least 90 dollars:
-- Tx2:
mysql> select * from accounts where balance >= 90;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
2 rows in set (0.00 sec)
This time we only get 2 records instead of 3 as before, because the balance of account 1 decreased to 80 after transaction 1 was committed.
The same query was executed, but a different set of rows was returned. One row has disappeared due to another committed transaction. This is called the phantom read phenomenon.
So now we know that the read committed isolation level only prevents dirty read, but still allows the non-repeatable read and phantom read phenomena.
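The two phenomena we just saw can be mimicked with a toy model (a sketch of the observable behavior, not of how InnoDB actually works): under read committed, every select simply reads the latest committed state.

```python
# Toy model of read committed: each select reads the latest committed state.
committed = {1: 90, 2: 100, 3: 100}  # account id -> balance

def select_balance(account_id):
    return committed[account_id]

def ids_with_balance_at_least(minimum):
    return sorted(i for i, b in committed.items() if b >= minimum)

first_read = select_balance(1)              # 90
first_set = ids_with_balance_at_least(90)   # [1, 2, 3]

committed[1] = 80  # another transaction commits an update to account 1

second_read = select_balance(1)             # 80 -> non-repeatable read
second_set = ids_with_balance_at_least(90)  # [2, 3] -> phantom read

print(first_read, second_read)  # 90 80
print(first_set, second_set)    # [1, 2, 3] [2, 3]
```

The same select returns a different single value (non-repeatable read) and a different row set (phantom read) once the concurrent commit lands, exactly as in the console session above.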
Let’s commit this transaction and move to a higher level to see what will happen.
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
Repeatable read isolation level in MySQL
Now I'm gonna set both sessions' transaction isolation level to repeatable read, and begin 2 new transactions.
-- Tx1 + Tx2
mysql> set session transaction isolation level repeatable read;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ |
+-------------------------+
1 row in set (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
Now let's select all accounts in transaction 1. Then, in transaction 2, select the account with ID 1, and also select all accounts with a balance of at least 80 dollars. This second query will be used to verify whether phantom read still occurs or not.
-- Tx1:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
mysql> select * from accounts where balance >= 80;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
OK, now back in transaction 1, let's subtract 10 dollars from account 1's balance, then get all accounts to see their current state.
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 70 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
Now we can see the balance of account 1 has decreased to 70 dollars.
We know that dirty read has already been stopped at a lower isolation level (read committed), so we don't need to check it again here, because of this rule:
Any phenomena that have been prevented at lower isolation level won’t have a chance to occur at higher level.
So let's commit transaction 1, then move to transaction 2 to see whether it can read the new changes made by transaction 1 or not.
-- Tx1:
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.01 sec)
Now this select query returns the old version of account 1, with 80 dollars in balance, even though transaction 1 changed it to 70 and committed successfully.
That's because the repeatable read isolation level ensures that all read queries are repeatable: they always return the same result, even if changes have been committed by other transactions.
With that in mind, let's rerun the query that selects accounts with at least 80 dollars:
-- Tx2:
mysql> select * from accounts where balance >= 80;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 80 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
As you can see, it still returns the same 3 records as before. So the phantom read phenomenon is also prevented at the repeatable read isolation level. That's great!
However, what will happen if we also run the update query to subtract 10 dollars from account 1's balance in transaction 2? Will it change the balance to 70, to 60, or throw an error? Let's try it!
-- Tx2:
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
There's no error, and the account balance is now 60 dollars, which is the correct value, because transaction 1 had already committed the change that set the balance to 70 dollars.
However, from transaction 2's point of view, this doesn't make sense: in its last select query, it saw a balance of 80 dollars, but after subtracting 10 dollars from the account, it now gets 60 dollars. The math doesn't add up, because this transaction is still being interfered with by concurrent updates from other transactions.
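To see why the numbers look inconsistent, here is a toy model of what we just observed (a sketch of the visible behavior only, not of InnoDB's internals): plain selects in a repeatable read transaction keep reading the snapshot taken earlier, while an update applies on top of the latest committed value and then becomes visible to the transaction.

```python
class ToyRepeatableRead:
    """Toy model: selects read a fixed snapshot; updates apply to the latest committed value."""

    def __init__(self, committed_balance: int):
        self.committed = committed_balance  # latest committed balance (shared state)
        self.snapshot = committed_balance   # snapshot seen by this transaction's selects

    def select(self) -> int:
        return self.snapshot

    def other_tx_commits(self, new_balance: int):
        self.committed = new_balance        # a concurrent transaction commits a change

    def update(self, delta: int) -> int:
        # The update is applied on top of the latest committed value,
        # and the transaction then sees its own written value.
        self.committed += delta
        self.snapshot = self.committed
        return self.snapshot

tx2 = ToyRepeatableRead(80)   # tx2's earlier select saw 80
tx2.other_tx_commits(70)      # tx1 commits: the balance is now 70
print(tx2.select())           # 80  (repeatable read: still the snapshot)
print(tx2.update(-10))        # 60  (70 - 10, not 80 - 10)
```

So the "80 then 60" sequence is exactly what falls out of mixing snapshot reads with updates that act on current data.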
I don't know why MySQL chose to implement the repeatable read isolation level this way, but it would make more sense to refuse the change and raise an error in this case, to ensure the consistency of the transaction's data. Later we will see that this is exactly how Postgres handles this type of concurrent update at this isolation level.
For now, let's just roll back this transaction and move on to the highest isolation level to see whether this issue can be prevented or not.
-- Tx2:
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
Serializable isolation level in MySQL
Alright, let's set both sessions' isolation level to serializable and begin the transactions.
-- Tx1 + Tx2:
mysql> set session transaction isolation level serializable;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| SERIALIZABLE |
+-------------------------+
1 row in set (0.01 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
OK, both transactions have started. Now let's select all accounts in transaction 1, and select just account 1 in transaction 2.
-- Tx1:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 70 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
-- Tx2:
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 70 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
Next, go back to transaction 1 and subtract 10 more dollars from account 1's balance.
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
_
This time, interestingly, the update query is blocked. Basically, the select query in transaction 2 is blocking this update query in transaction 1.
The reason is that at the serializable isolation level, MySQL implicitly converts every plain SELECT query into SELECT FOR SHARE. A transaction holding a SELECT FOR SHARE lock allows other transactions to read the locked rows, but not to update or delete them.
So with this locking mechanism, the inconsistent data scenario that we’ve seen before is no longer possible.
However, this lock has a timeout duration. If the second transaction doesn't commit or roll back to release the lock within that duration, we will see a lock wait timeout exceeded error like this:
mysql> update accounts set balance = balance - 10 where id = 1;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
So when you use the serializable isolation level in your application, make sure you implement a transaction retry strategy in case a timeout occurs.
OK, now I'm gonna restart transaction 1, run the select query, then update account 1's balance.
-- Tx1:
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 70 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
mysql> update accounts set balance = balance - 10 where id = 1;
_
But this time, I'm not gonna let the lock wait timeout occur. Let's see what happens if transaction 2 also tries to update the balance of the same account 1.
-- Tx2:
mysql> update accounts set balance = balance - 10 where id = 1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (23.59 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Here we go, a deadlock has occurred: transaction 1's update is waiting for the share lock held by transaction 2's select, while transaction 2's update is waiting for the share lock held by transaction 1's select.
So be aware that, besides lock wait timeout, you also need to handle possible deadlock situations.
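Both errors are transient, so the usual fix is to re-run the whole transaction. Here is a minimal sketch of such a retry loop in Python; the two exception classes are hypothetical stand-ins for whatever your database driver raises for MySQL errors 1205 and 1213:

```python
import time

class LockWaitTimeout(Exception):  # stand-in for MySQL error 1205
    pass

class DeadlockFound(Exception):    # stand-in for MySQL error 1213
    pass

def run_with_retry(run_transaction, max_attempts=3, backoff=0.05):
    """Re-run the whole transaction on transient lock errors, with a simple backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except (LockWaitTimeout, DeadlockFound):
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(backoff * attempt)  # wait a bit longer before each retry

# Usage: a fake transaction that deadlocks once, then succeeds on retry.
attempts = []
def fake_tx():
    attempts.append(1)
    if len(attempts) < 2:
        raise DeadlockFound()
    return "committed"

print(run_with_retry(fake_tx))  # committed
```

The key point is that the retry must restart the entire transaction from begin, not just the failed statement, because the engine has already rolled the transaction back when it reports these errors.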
Now let's try restarting both transactions, then select account 1.
-- Tx1 + Tx2:
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 70 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
Now let's update account 1's balance in transaction 1, then commit transaction 2.
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
_
-- Tx2:
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
As you can see, after we commit transaction 2, the lock is released right away, and account 1's balance is updated successfully:
-- Tx1:
mysql> update accounts set balance = balance - 10 where id = 1;
Query OK, 1 row affected (3.34 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from accounts where id = 1;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
1 row in set (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
Alright, so far we have experienced all 4 isolation levels in MySQL and seen how they help prevent some read phenomena.
Now let's see how they work in Postgres! The effects will be quite similar, but there will also be some differences.
Isolation levels in Postgres
First, let’s start 2 PostgreSQL consoles on these 2 terminal windows.
# Tx1 + Tx2
❯ docker exec -it postgres12 psql -U root -d simple_bank
psql (12.3)
Type "help" for help.
simple_bank>
Get current isolation level in Postgres
In Postgres, to get the current isolation level, we run this command:
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
read committed
(1 row)
By default, it is read committed, which is 1 level lower than MySQL's default isolation level.
Change isolation level in Postgres
The way we change the isolation level is also different. In MySQL, we set the whole session isolation level before starting the transactions.
But in Postgres, we set the isolation level inside the transaction, and it only affects that 1 specific transaction.
So let's begin transaction 1 and set its isolation level to read uncommitted.
-- Tx1:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level read uncommitted;
SET
Now if we show the transaction isolation level, we can see that it has been changed to read uncommitted.
-- Tx1:
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
read uncommitted
(1 row)
Read uncommitted isolation level in Postgres
Let's do the same thing in the other console for transaction 2:
-- Tx2:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level read uncommitted;
SET
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
read uncommitted
(1 row)
OK, now in transaction 1, let's select all accounts.
-- Tx1:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 100 | USD | 2020-09-06 15:06:44.666424+00
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
At the moment, there are 3 accounts with the same balance of 100 dollars. In transaction 2, let's select only the account with ID 1.
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 100 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
Then go back to transaction 1 and update account 1's balance.
-- Tx1:
simple_bank> update accounts set balance = balance - 10 where id = 1 returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
UPDATE 1
The balance of account 1 has been changed to 90 dollars here. Now we select that account again in transaction 2:
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 100 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
Strangely, it's still 100 dollars! This is unexpected, because we're using the read uncommitted level, so transaction 2 should be able to see the uncommitted data of transaction 1, right?
Well, in fact, if we look at the Postgres documentation, we can see that read uncommitted in Postgres behaves exactly the same as read committed.
So basically, we can say that Postgres only has 3 isolation levels, with read committed being the lowest. That makes sense, because we would almost never want to use read uncommitted anyway.
OK, so let's go ahead and commit transaction 1, then select account 1 in transaction 2 one more time.
-- Tx1:
simple_bank> commit;
COMMIT
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
Now it sees the committed balance: 90 dollars, as expected. Alright, let’s commit this transaction and move to the next isolation level.
-- Tx2:
simple_bank> commit;
COMMIT
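The visibility rule we just observed can be modeled with a tiny in-memory sketch (a deliberate simplification, not how Postgres actually implements MVCC): a transaction sees its own writes plus data committed by others, but never another transaction's uncommitted writes.

```python
# Toy model of committed-only visibility: a reader sees the last
# committed value, never another transaction's in-flight write.
class Store:
    def __init__(self):
        self.committed = {}   # key -> committed value
        self.pending = {}     # txid -> {key: value} uncommitted writes

    def begin(self, txid):
        self.pending[txid] = {}

    def write(self, txid, key, value):
        self.pending[txid][key] = value

    def read(self, txid, key):
        # A transaction sees its own writes, otherwise committed data only.
        if key in self.pending.get(txid, {}):
            return self.pending[txid][key]
        return self.committed[key]

    def commit(self, txid):
        self.committed.update(self.pending.pop(txid))

store = Store()
store.committed["account_1"] = 100

store.begin("tx1")
store.begin("tx2")
store.write("tx1", "account_1", 90)     # Tx1 subtracts 10, not committed yet

print(store.read("tx2", "account_1"))   # 100: no dirty read
store.commit("tx1")
print(store.read("tx2", "account_1"))   # 90: committed data is now visible
```

This is only meant to make the visibility rule concrete; real MVCC keeps multiple row versions and transaction IDs rather than a single committed map.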
Read committed isolation level in Postgres
I'm gonna start 2 new transactions, and set their isolation level to read committed:
-- Tx1 + Tx2:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level read committed;
SET
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
read committed
(1 row)
Now, just like before, let's select all accounts in transaction 1, then select just account 1 in transaction 2.
-- Tx1:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
simple_bank> select * from accounts where balance >= 90;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
Besides the dirty read phenomenon, we also want to see how it handles phantom read, so let's find all accounts whose balance is greater than or equal to 90 dollars. At the moment, all 3 records satisfy this search condition.
Now let's go back to transaction 1 and subtract 10 dollars from account 1's balance.
-- Tx1:
simple_bank> update accounts set balance = balance - 10 where id = 1 returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 90 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
If we select account 1 in transaction 2, it will still be 90 dollars, because transaction 1 has not committed yet. So dirty read is not possible at the read committed isolation level.
Let's see what happens if we commit transaction 1.
-- Tx1:
simple_bank> commit;
COMMIT
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
This time transaction 2 can see the updated balance of 80 dollars. Now, if we run the query to search for accounts with at least 90 dollars again, we will only see 2 records instead of 3 as before.
-- Tx2:
simple_bank> select * from accounts where balance >= 90;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
(2 rows)
The updated account 1's balance no longer satisfies the search condition, so it has disappeared from the result set. So phantom read has occurred at this read committed isolation level.
That’s the same behaviour as in MySQL. Let’s commit this transaction and move up 1 level.
-- Tx2:
simple_bank> commit;
COMMIT
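Conceptually, read committed means every statement is evaluated against the latest committed data, which is exactly why the row vanished from the predicate query between the two selects. A toy sketch of this (not Postgres internals):

```python
# Toy model of read committed: each statement re-reads the current
# committed state, so a predicate query can return fewer rows after
# another transaction commits -- a phantom read.
committed = {1: 90, 2: 100, 3: 100}   # account id -> committed balance

def select_at_least(minimum):
    # every statement evaluates against the latest committed data
    return sorted(i for i, bal in committed.items() if bal >= minimum)

print(select_at_least(90))   # [1, 2, 3]
committed[1] = 80            # concurrent Tx1 commits balance 80
print(select_at_least(90))   # [2, 3]: row 1 vanished from the result
```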
Repeatable read isolation level in Postgres
I'm gonna begin 2 new transactions, then set their transaction isolation level to repeatable read.
-- Tx1 + Tx2:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level repeatable read;
SET
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
repeatable read
(1 row)
Alright, now let's select all accounts in transaction 1, then select just the account with ID 1 in transaction 2. Also search for the accounts with a balance of at least 80 dollars.
-- Tx1:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
simple_bank> select * from accounts where balance >= 80;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
Now go back to transaction 1 and subtract 10 more dollars from its balance.
-- Tx1:
simple_bank> update accounts set balance = balance - 10 where id = 1 returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
UPDATE 1
The balance has been updated to 70 dollars in this transaction. Let's commit it and see what will happen in transaction 2.
-- Tx1:
simple_bank> commit;
COMMIT
-- Tx2:
simple_bank> select * from accounts where id = 1;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(1 row)
Now if we select account 1 in transaction 2, it's still 80 dollars as before, even though transaction 1 has committed its change.
That's because we're using the repeatable read isolation level, so the same select query should always return the same result. The non-repeatable read phenomenon cannot happen in this case.
Also, if we rerun the query to search for accounts with at least 80 dollars:
-- Tx2:
simple_bank> select * from accounts where balance >= 80;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 80 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
We still get the same 3 records as before. So phantom read is also prevented at this repeatable read isolation level.
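A toy way to picture this behaviour (a deliberate simplification, not Postgres's actual MVCC implementation) is a store where each repeatable read transaction reads from a frozen copy of the committed data taken when it begins:

```python
import copy

# Toy snapshot model: under repeatable read, a transaction reads from
# a snapshot of committed data taken when the transaction starts, so
# concurrent commits by other sessions stay invisible to it.
class SnapshotStore:
    def __init__(self, rows):
        self.rows = rows        # committed rows: id -> balance
        self.snapshots = {}     # txid -> frozen copy taken at begin

    def begin(self, txid):
        self.snapshots[txid] = copy.deepcopy(self.rows)

    def select_at_least(self, txid, minimum):
        snap = self.snapshots[txid]
        return sorted(i for i, bal in snap.items() if bal >= minimum)

    def commit_update(self, row_id, new_balance):
        self.rows[row_id] = new_balance   # another session commits a change

db = SnapshotStore({1: 80, 2: 100, 3: 100})
db.begin("tx2")
print(db.select_at_least("tx2", 80))   # [1, 2, 3]
db.commit_update(1, 70)                # concurrent Tx1 commits balance 70
print(db.select_at_least("tx2", 80))   # still [1, 2, 3]: no phantom read
```

The snapshot explains both results above: repeated reads and repeated predicate queries keep returning the same rows for the life of the transaction.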
Now I’m gonna try to run this update account balance query to see how it behaves:
-- Tx2:
simple_bank> update accounts set balance = balance - 10 where id = 1 returning *;
ERROR: could not serialize access due to concurrent update
In MySQL's repeatable read isolation level, we saw that it allowed the balance to be updated to 60 dollars. But here in Postgres, we got an error:
ERROR: could not serialize access due to concurrent update
I think throwing an error like this is much better than allowing the balance to be modified, because it avoids a confusing state, where the transaction reads a balance of 80, subtracts 10, and ends up with 60. So kudos to Postgres!
-- Tx2:
simple_bank> rollback;
ROLLBACK
Serialization anomaly in Postgres
Until now we have encountered 3 types of phenomena: dirty read, non-repeatable read, and phantom read. But we haven't run into serialization anomaly yet. So this time, let's see what it looks like.
Let's start 2 new transactions, and set their isolation level to repeatable read.
-- Tx1 + Tx2:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level repeatable read;
SET
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
repeatable read
(1 row)
Then in transaction 1, let's select all account records.
-- Tx1:
simple_bank=# select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
Now imagine that we have a use case where we have to compute the sum of all accounts' balances, then create a new account with that total balance.
So let's run this query in transaction 1:
-- Tx1:
simple_bank> select sum(balance) from accounts;
sum
-----
270
(1 row)
It's 270 dollars. Then we insert a new record into the accounts table, where owner is "sum", balance is 270, and currency is "USD".
-- Tx1:
simple_bank=# insert into accounts(owner, balance, currency) values ('sum', 270, 'USD') returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
(1 row)
INSERT 0 1
simple_bank=# select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
(4 rows)
OK, now we can see the new record in transaction 1. However, what if transaction 2 also wants to perform this operation?
Since we're using the repeatable read isolation level, the select query in transaction 2 will only see the original list of accounts, without the new record that transaction 1 has just inserted.
-- Tx2:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
(3 rows)
simple_bank> select sum(balance) from accounts;
sum
-----
270
(1 row)
simple_bank> insert into accounts(owner, balance, currency) values ('sum', 270, 'USD') returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
(1 row)
INSERT 0 1
Therefore, it will get the same value for the sum of the accounts' balances, 270 dollars, and thus end up inserting the same record into the accounts table.
OK now let’s commit both transactions to see what will happen.
-- Tx1:
simple_bank> commit;
COMMIT
-- Tx2:
simple_bank> commit;
COMMIT
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
(5 rows)
They were both committed successfully. And there are 2 duplicate sum records with the same balance of 270 dollars.
This is a serialization anomaly!
Why?
Because if these 2 transactions were run serially, one after another, there would be no way to end up with 2 records with the same sum of 270.
No matter whether transaction 1 or transaction 2 runs first, we should have one record of 270 dollars, and another record of 540 dollars.
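We can verify this arithmetic with a quick sketch that simulates the two "insert the sum" transactions running serially, so the second one sees the row the first one inserted:

```python
# Serial execution of the "insert the sum of all balances" operation:
# the second run sees the first run's inserted row, so the two
# inserted sums must differ -- they can never both be 270.
def insert_sum(balances):
    balances.append(sum(balances))

balances = [100, 100, 70]   # the three original account balances
insert_sum(balances)        # first transaction inserts 100+100+70 = 270
insert_sum(balances)        # second one sees it and inserts 270+270 = 540
print(balances[-2:])        # [270, 540]
```

Getting two 270 rows, as we did above, is therefore a result that no serial ordering of the transactions could produce, which is the definition of a serialization anomaly.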
OK, so that's how serialization anomaly occurs at the repeatable read isolation level.
Now let's try the highest level, serializable, to see whether this anomaly can be stopped or not.
Serializable isolation level in Postgres
I'm gonna start 2 new transactions, then set their isolation level to serializable.
-- Tx1 + Tx2:
simple_bank> begin;
BEGIN
simple_bank> set transaction isolation level serializable;
SET
simple_bank> show transaction isolation level;
transaction_isolation
-----------------------
serializable
(1 row)
Now let's select all accounts in transaction 1, calculate the sum of all balances, and insert a new account with a balance equal to this sum.
-- Tx1:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
(5 rows)
simple_bank> select sum(balance) from accounts;
sum
-----
810
(1 row)
simple_bank> insert into accounts(owner, balance, currency) values ('sum', 810, 'USD') returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
7 | sum | 810 | USD | 2020-09-15 14:25:20.091212+00
(1 row)
INSERT 0 1
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
7 | sum | 810 | USD | 2020-09-15 14:25:20.091212+00
(6 rows)
Now, as you can see, a new sum record of 810 dollars has been inserted in transaction 1. Let's go to transaction 2 and run the same series of queries.
-- Tx2:
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
(5 rows)
simple_bank> select sum(balance) from accounts;
sum
-----
810
(1 row)
simple_bank> insert into accounts(owner, balance, currency) values ('sum', 810, 'USD') returning *;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
8 | sum | 810 | USD | 2020-09-15 14:25:33.060027+00
(1 row)
INSERT 0 1
simple_bank> select * from accounts;
id | owner | balance | currency | created_at
----+-------+---------+----------+-------------------------------
2 | two | 100 | USD | 2020-09-06 15:06:44.666424+00
3 | three | 100 | USD | 2020-09-06 15:06:44.666424+00
1 | one | 70 | USD | 2020-09-06 15:06:44.666424+00
5 | sum | 270 | USD | 2020-09-15 14:18:31.612735+00
6 | sum | 270 | USD | 2020-09-15 14:14:15.677416+00
8 | sum | 810 | USD | 2020-09-15 14:25:33.060027+00
(6 rows)
After this select query, we can see that the lists of accounts in both transactions are almost identical (except for the IDs of the inserted rows).
Let’s try to commit both of them.
-- Tx1:
simple_bank=# commit;
COMMIT
-- Tx2:
simple_bank> commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
Transaction 1 is committed successfully. However, transaction 2 throws an error:
ERROR: could not serialize access due to read/write dependencies among transactions
And Postgres gives us a hint that the transaction might succeed if we retry it.
So this is good! The serialization anomaly is completely prevented. The 2 concurrent transactions no longer create duplicate records like they did before.
We can conclude that Postgres uses a dependency detection mechanism to detect potential read phenomena and stops them by throwing an error.
How MySQL handles serialization anomaly
MySQL, on the other hand, chooses to use a locking mechanism to achieve a similar result. Let's see how it handles serialization anomaly!
Let's open the 2 MySQL console sessions that we were working on, and set their transaction isolation level to serializable.
-- Tx1 + Tx2:
mysql> set session transaction isolation level serializable;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| SERIALIZABLE |
+-------------------------+
1 row in set (0.01 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
Now in transaction 1, let's select all account records, compute the sum of all accounts' balances, and insert a new record with that sum into the accounts table.
-- Tx1:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
+----+-------+---------+----------+---------------------+
3 rows in set (0.00 sec)
mysql> select sum(balance) from accounts;
+--------------+
| sum(balance) |
+--------------+
| 260 |
+--------------+
1 row in set (0.00 sec)
mysql> insert into accounts (owner, balance, currency) values ('sum', 260, 'USD');
Query OK, 1 row affected (0.00 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
+----+-------+---------+----------+---------------------+
4 rows in set (0.00 sec)
Then switch to transaction 2 and run the query to select all accounts.
-- Tx2
mysql> select * from accounts;
_
As you can see, this query is blocked, and it needs to wait for transaction 1 to release the lock before continuing. But as soon as we commit transaction 1,
-- Tx1:
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
-- Tx2:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
+----+-------+---------+----------+---------------------+
4 rows in set (46.29 sec)
The lock is released, and transaction 2 gets the result of its query immediately.
Now we can continue running the sum and insert query in this transaction, and finally commit it.
-- Tx2:
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
+----+-------+---------+----------+---------------------+
4 rows in set (46.29 sec)
mysql> select sum(balance) from accounts;
+--------------+
| sum(balance) |
+--------------+
| 520 |
+--------------+
1 row in set (0.00 sec)
mysql> insert into accounts (owner, balance, currency) values ('sum', 520, 'USD');
Query OK, 1 row affected (0.00 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
| 5 | sum | 520 | USD | 2020-09-15 14:39:21 |
+----+-------+---------+----------+---------------------+
5 rows in set (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
There are no duplicate sum records. So MySQL has also successfully prevented the serialization anomaly with its locking mechanism.
Now I want to try a different order of the queries in these 2 transactions.
First, begin transaction 1, select all account records, and select the sum of all accounts' balances.
-- Tx1:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
| 5 | sum | 520 | USD | 2020-09-15 14:39:21 |
+----+-------+---------+----------+---------------------+
5 rows in set (0.00 sec)
mysql> select sum(balance) from accounts;
+--------------+
| sum(balance) |
+--------------+
| 1040 |
+--------------+
1 row in set (0.00 sec)
Then begin transaction 2, also select all accounts, and calculate the sum of all accounts' balances.
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
| 5 | sum | 520 | USD | 2020-09-15 14:39:21 |
+----+-------+---------+----------+---------------------+
5 rows in set (0.00 sec)
mysql> select sum(balance) from accounts;
+--------------+
| sum(balance) |
+--------------+
| 1040 |
+--------------+
1 row in set (0.00 sec)
This time, both transactions get the same sum of 1040 dollars. Let's insert a new account with this sum in transaction 1.
-- Tx1:
mysql> insert into accounts (owner, balance, currency) values ('sum', 1040, 'USD');
_
It's blocked, because transaction 2 is holding a shared lock that prevents other transactions from updating the table. Now, if we try to insert a new sum account in transaction 2,
-- Tx2:
mysql> insert into accounts (owner, balance, currency) values ('sum', 1040, 'USD');
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
We will get a deadlock, because in this case, both transactions have to wait for each other.
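This mutual wait can be pictured as a cycle in a wait-for graph, which is roughly how database engines detect deadlocks. The sketch below is a toy model of that idea, not InnoDB's actual algorithm:

```python
# Toy wait-for graph: under serializable, each transaction's plain
# SELECT holds a shared lock, so each INSERT must wait for the other
# transaction -- a cycle that the engine resolves by aborting one side.
def has_cycle(wait_for, start):
    """Follow wait-for edges from `start`; a revisited node is a cycle."""
    seen, node = set(), start
    while node in wait_for:
        if node in seen:
            return True
        seen.add(node)
        node = wait_for[node]
    return False

# Tx1's INSERT waits on Tx2's shared lock, and vice versa.
wait_for = {"tx1": "tx2", "tx2": "tx1"}
print(has_cycle(wait_for, "tx2"))   # True: deadlock, one tx gets rolled back
```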
And since transaction 2 failed due to the deadlock, its lock is immediately released, which allows transaction 1 to complete its insert query.
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from accounts;
+----+-------+---------+----------+---------------------+
| id | owner | balance | currency | created_at |
+----+-------+---------+----------+---------------------+
| 1 | one | 60 | USD | 2020-09-06 15:09:38 |
| 2 | two | 100 | USD | 2020-09-06 15:09:38 |
| 3 | three | 100 | USD | 2020-09-06 15:09:38 |
| 4 | sum | 260 | USD | 2020-09-15 14:36:20 |
| 5 | sum | 520 | USD | 2020-09-15 14:39:21 |
| 6 | sum | 1040 | USD | 2020-09-15 14:41:26 |
+----+-------+---------+----------+---------------------+
6 rows in set (0.00 sec)
So after committing transaction 1, we can see that a new sum account was successfully inserted. The database stays consistent, with no serialization anomaly.
Summary of the relationship between isolation levels and read phenomena
Now before we finish, let’s do a quick summary of the relationship between isolation levels and read phenomena in MySQL and Postgres.
In MySQL
In MySQL, the lowest isolation level, read uncommitted, allows all 4 phenomena to occur.
The next level, read committed, only prevents dirty read. The other 3 phenomena are still possible.
The repeatable read level in MySQL stops the first 3 phenomena: dirty read, non-repeatable read, and phantom read. But it still allows serialization anomaly, and even some inconsistent concurrent updates.
The highest isolation level, serializable, is the strictest. It prevents all 4 phenomena, thanks to its locking mechanism.
In Postgres
The isolation levels in Postgres produce quite similar results. However, there are still some major differences.
First, the read uncommitted isolation level behaves the same as read committed. So basically, Postgres only has 3 isolation levels instead of 4 as in MySQL.
And second, Postgres doesn't use a locking mechanism like MySQL does, but a dependency detection technique to stop non-repeatable read, inconsistent concurrent updates, and serialization anomaly.
Also, the default isolation level in Postgres is only read committed, while it is repeatable read in MySQL.
Keep in mind
The most important thing you should keep in mind when using a high isolation level is that there might be errors, timeouts, or even deadlocks. Thus, we should carefully implement a retry mechanism for our transactions.
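As a minimal sketch of such a retry mechanism (the exception class and the transaction body below are placeholders, not a real database driver's API), we can wrap the transaction in a loop that retries on serialization failures or deadlocks, with a small backoff between attempts:

```python
import random
import time

# Placeholder for a driver error such as SQLSTATE 40001 in Postgres
# or MySQL error 1213 (deadlock); a real application would catch the
# driver's own exception type here.
class SerializationError(Exception):
    pass

def run_with_retry(txn_body, max_attempts=3, base_delay=0.01):
    """Run `txn_body`, retrying on serialization failures with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationError:
            if attempt == max_attempts:
                raise
            # back off a little (with jitter) before retrying
            time.sleep(base_delay * attempt * random.random())

attempts = []
def flaky_transfer():
    # a stand-in transaction that fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationError("could not serialize access")
    return "committed"

print(run_with_retry(flaky_transfer))   # "committed" on the third attempt
```

The key design point is that the whole transaction body is re-executed, not just the failing statement, since the database has rolled the entire transaction back.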
Also, each database engine might implement isolation levels differently. So make sure you read its documentation carefully, and try things out on your own before jumping into coding.
References
Here are links to the official documentation about isolation levels in MySQL and Postgres, in case you want a reference:
And that’s all for today’s article. I really hope it’s helpful for you.
Thanks a lot for reading and see you in the next lecture!
If you like the article, please subscribe to our Youtube channel and follow us on Twitter for more tutorials in the future.
If you want to join me on my current amazing team at Voodoo, check out our job openings here. Remote or onsite in Paris/Amsterdam/London/Berlin/Barcelona with visa sponsorship.