One of the hardest thing when working with database transaction is locking and handling deadlock.
From my experience, the best way to deal with deadlock is to avoid it. By that I mean we should fine-tune our queries in the transaction so that deadlock won’t have a chance to occur, or at least minimize its chance of occurrence.
And that’s exactly what I’m gonna show you in this article.
Here's:
- Link to the full series playlist on Youtube
- And its Github repository
Alright, let’s dive in!
A potential deadlock scenario
This is the money transfer transaction code that we’ve implemented in the previous lecture.
func (store *Store) TransferTx(ctx context.Context, arg TransferTxParams) (TransferTxResult, error) {
var result TransferTxResult
err := store.execTx(ctx, func(q *Queries) error {
var err error
result.Transfer, err = q.CreateTransfer(ctx, CreateTransferParams{
FromAccountID: arg.FromAccountID,
ToAccountID: arg.ToAccountID,
Amount: arg.Amount,
})
if err != nil {
return err
}
result.FromEntry, err = q.CreateEntry(ctx, CreateEntryParams{
AccountID: arg.FromAccountID,
Amount: -arg.Amount,
})
if err != nil {
return err
}
result.ToEntry, err = q.CreateEntry(ctx, CreateEntryParams{
AccountID: arg.ToAccountID,
Amount: arg.Amount,
})
if err != nil {
return err
}
result.FromAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.FromAccountID,
Amount: -arg.Amount,
})
if err != nil {
return err
}
result.ToAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.ToAccountID,
Amount: arg.Amount,
})
if err != nil {
return err
}
return nil
})
return result, err
}
Basically we’ve fixed the deadlock issue caused by the foreign key constraints. However, if we look at the code carefully, we can see a potential deadlock scenario.
In this transaction, we’re updating the balance of the fromAccount
and the toAccount
. And we know that they both require an exclusive lock to perform the operation. So if there are 2 concurrent transactions involving the same pair of accounts, there might be a potential deadlock.
But we already have a test that runs 5 concurrent transfer transactions with the same pair of accounts, but deadlock doesn’t occur, right?
func TestTransferTx(t *testing.T) {
store := NewStore(testDB)
account1 := createRandomAccount(t)
account2 := createRandomAccount(t)
fmt.Println(">> before:", account1.Balance, account2.Balance)
n := 5
amount := int64(10)
errs := make(chan error)
results := make(chan TransferTxResult)
// run n concurrent transfer transaction
for i := 0; i < n; i++ {
go func() {
result, err := store.TransferTx(context.Background(), TransferTxParams{
FromAccountID: account1.ID,
ToAccountID: account2.ID,
Amount: amount,
})
errs <- err
results <- result
}()
}
// check results
existed := make(map[int]bool)
for i := 0; i < n; i++ {
err := <-errs
require.NoError(t, err)
result := <-results
require.NotEmpty(t, result)
// check transfer
transfer := result.Transfer
require.NotEmpty(t, transfer)
require.Equal(t, account1.ID, transfer.FromAccountID)
require.Equal(t, account2.ID, transfer.ToAccountID)
require.Equal(t, amount, transfer.Amount)
require.NotZero(t, transfer.ID)
require.NotZero(t, transfer.CreatedAt)
_, err = store.GetTransfer(context.Background(), transfer.ID)
require.NoError(t, err)
// check entries
fromEntry := result.FromEntry
require.NotEmpty(t, fromEntry)
require.Equal(t, account1.ID, fromEntry.AccountID)
require.Equal(t, -amount, fromEntry.Amount)
require.NotZero(t, fromEntry.ID)
require.NotZero(t, fromEntry.CreatedAt)
_, err = store.GetEntry(context.Background(), fromEntry.ID)
require.NoError(t, err)
toEntry := result.ToEntry
require.NotEmpty(t, toEntry)
require.Equal(t, account2.ID, toEntry.AccountID)
require.Equal(t, amount, toEntry.Amount)
require.NotZero(t, toEntry.ID)
require.NotZero(t, toEntry.CreatedAt)
_, err = store.GetEntry(context.Background(), toEntry.ID)
require.NoError(t, err)
// check accounts
fromAccount := result.FromAccount
require.NotEmpty(t, fromAccount)
require.Equal(t, account1.ID, fromAccount.ID)
toAccount := result.ToAccount
require.NotEmpty(t, toAccount)
require.Equal(t, account2.ID, toAccount.ID)
// check balances
fmt.Println(">> tx:", fromAccount.Balance, toAccount.Balance)
diff1 := account1.Balance - fromAccount.Balance
diff2 := toAccount.Balance - account2.Balance
require.Equal(t, diff1, diff2)
require.True(t, diff1 > 0)
require.True(t, diff1%amount == 0) // 1 * amount, 2 * amount, 3 * amount, ..., n * amount
k := int(diff1 / amount)
require.True(t, k >= 1 && k <= n)
require.NotContains(t, existed, k)
existed[k] = true
}
// check the final updated balance
updatedAccount1, err := store.GetAccount(context.Background(), account1.ID)
require.NoError(t, err)
updatedAccount2, err := store.GetAccount(context.Background(), account2.ID)
require.NoError(t, err)
fmt.Println(">> after:", updatedAccount1.Balance, updatedAccount2.Balance)
require.Equal(t, account1.Balance-int64(n)*amount, updatedAccount1.Balance)
require.Equal(t, account2.Balance+int64(n)*amount, updatedAccount2.Balance)
}
That’s correct! However, the transactions in our existing test all do the same thing: transfer money from account 1
to account 2
. What if some of them transfer money from account 2
to account 1
?
To illustrate how deadlock might occur in this scenario, I have prepared 2 transactions’ in TablePlus:
-- Tx1: transfer $10 from account 1 to account 2
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1 RETURNING *;
UPDATE accounts SET balance = balance + 10 WHERE id = 2 RETURNING *;
ROLLBACK;
-- Tx2: transfer $10 from account 2 to account 1
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2 RETURNING *;
UPDATE accounts SET balance = balance + 10 WHERE id = 1 RETURNING *;
ROLLBACK;
The 1st transaction
will transfer 10
dollars from account 1
to account 2
, by first subtracting 10
from the the balance of account 1
, and then adding 10
to the balance of account 2
.
The 2nd transaction
will do the reverse work: transfer 10
dollars from account 2
to account 1
. First it subtracts 10
from the balance of account 2
. Then it adds 10
to the balance of account 1
.
Now let’s open the terminal to run these transactions in 2 parallel psql console.
First, I will start the first psql console and BEGIN
the 1st transaction
. Then I’m gonna run its 1st query to subtract 10
from account 1
’s balance.
The account is updated instantly. Now let’s open another tab, start a new psql console, and BEGIN
the 2nd transaction
. Then let’s run its 1st query to subtract 10
from account 2
’s balance.
This query also returns immediately. Now back to the 1st transaction
and run its 2nd query to update account 2
’s balance.
This time, the query is blocked because the 2nd transaction
is also updating the same account 2
.
If we go back to TablePlus and run this query to list all the locks:
We can see that this update account 2
query of transaction 1
is trying to acquire a ShareLock
on transaction ID 911
, but it is not granted yet
That's because transaction 2
is already holding an ExclusiveLock
on the same transaction ID. Therefore, transaction 1
must wait for transaction 2
to finish before continue.
Now if we continue running the 2nd query of transaction 2
to update account 1
’s balance:
We will get a deadlock because this account 1
is being updated by transaction 1
, thus transaction 2
also needs to wait for transaction 1
to finish before getting the result of this query. Deadlock occurs because these 2 concurrent transactions both need to wait for the other.
OK now let’s rollback these 2 transactions then go back to our simple bank project to replicate this scenario in a test.
Replicate deadlock scenario in a test
It’s gonna be very similar to the test that we’ve written in the last lecture, so I will just duplicate that TestTransferTx
function, and change its name to TestTransferTxDeadlock
.
Here, let’s say we’re gonna run n = 10
concurrent transactions. The idea is to have 5
transactions that send money from account 1
to account 2
, and another 5
transactions that send money in reverse direction, from account 2
to account 1
.
func TestTransferTxDeadlock(t *testing.T) {
store := NewStore(testDB)
account1 := createRandomAccount(t)
account2 := createRandomAccount(t)
fmt.Println(">> before:", account1.Balance, account2.Balance)
n := 10
amount := int64(10)
errs := make(chan error)
...
}
In this scenario, we only need to check for deadlock error, we don’t need to care about the result because it has already been checked in the other test. So I have removed the results
channel, and just keep the errs
channel.
Now inside the for loop, let’s define 2 new variables: fromAccountID
will be account1.ID
, and toAccountID
will be account2.ID
.
But since we want half of the transaction to send money from account 2
to account 1
, I will check if the counter i
is an odd number (i % 2 = 1
), then fromAccountID
should be account2.ID
and toAccountID
should be account1.ID
instead.
func TestTransferTxDeadlock(t *testing.T) {
...
for i := 0; i < n; i++ {
fromAccountID := account1.ID
toAccountID := account2.ID
if i%2 == 1 {
fromAccountID = account2.ID
toAccountID = account1.ID
}
go func() {
_, err := store.TransferTx(context.Background(), TransferTxParams{
FromAccountID: fromAccountID,
ToAccountID: toAccountID,
Amount: amount,
})
errs <- err
}()
}
}
Now inside the go routine, we should set the fields of TransferTxParams
to fromAccountID
and toAccountID
. Then remove the results <- result
statement because we don’t care about the result anymore.
OK, now the check errors part. Let’s delete the existed
map, and everything inside the for loop, except the error checking statements.
func TestTransferTxDeadlock(t *testing.T) {
...
for i := 0; i < n; i++ {
err := <-errs
require.NoError(t, err)
}
...
}
We also want to check the final updated balance of the 2 accounts. In this case, there are 10
transactions that move the same amount of money between account 1
and account 2
. But because of this if i%2 == 1
condition, 5
of them will move money from account 1
to account 2
, and the other 5
will move money from account 2
back to account 1
.
Therefore, we expect that in the end, the balance of both account 1
and account 2
should be the same as they were before the transactions.
func TestTransferTxDeadlock(t *testing.T) {
...
// check the final updated balance
updatedAccount1, err := store.GetAccount(context.Background(), account1.ID)
require.NoError(t, err)
updatedAccount2, err := store.GetAccount(context.Background(), account2.ID)
require.NoError(t, err)
fmt.Println(">> after:", updatedAccount1.Balance, updatedAccount2.Balance)
require.Equal(t, account1.Balance, updatedAccount1.Balance)
require.Equal(t, account2.Balance, updatedAccount2.Balance)
}
So here, updatedAccount1.Balance
should equal to account1.Balance
, and updatedAccount2.Balance
should equal to account2.Balance
.
OK let’s run this test!
We’ve got a deadlock error as expected. Let’s learn how to fix it!
Fix the deadlock issue
As you’ve already seen in the example that we ran in psql console, the reason deadlock occurs is because of the different order in which 2 concurrent transactions update the accounts’ balance, where transaction 1
update account 1
before account 2
, while the other transaction 2
update account 2
before account 1
.
So this gives us an idea of how deadlock can be avoided by making both transactions update the accounts balance in the same order. Let’s say in this transaction 2
, we just move the update account 1
query up, and keep everything else the same.
-- Tx1: transfer $10 from account 1 to account 2
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1 RETURNING *;
UPDATE accounts SET balance = balance + 10 WHERE id = 2 RETURNING *;
ROLLBACK;
-- Tx2: transfer $10 from account 2 to account 1
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE id = 1 RETURNING *; -- moved up
UPDATE accounts SET balance = balance - 10 WHERE id = 2 RETURNING *;
ROLLBACK;
So now both transaction 1
and transaction 2
will always update account 1
before account 2
. Let’s try to run them in the psql console to see what will happen!
First, begin transaction 1
and run its 1st query to update account 1
.
Then switch to the other console and begin transaction 2
. Also run its 1st query to update account 1
.
Now unlike before, this time the query is blocked right away, because transaction 1
is already holding an exclusive lock to update the same account 1
. So let’s go back to transaction 1
and run its 2nd query to update account 2
.
The result is returned immediately, and transaction 2
is still blocked. So we just COMMIT
this transaction 1
to release the lock. Then go to transaction 2.
We can see that it is unblocked instantly, and the balance has been updated to the new value.
We can run the 2nd query to update account 2
, then COMMIT transaction 2
successfully with no deadlocks.
Alright, so now we understand that the best defense against deadlocks is to avoid them by making sure that our application always acquire locks in a consistent order.
For example, in our case, we can easily change our code so that it always updates the account with smaller ID first.
Here we check if arg.FromAccountID
is less than arg.ToAccountID
then the fromAccount
should be updated before the toAccount
. Else, the toAccount
should be updated before the fromAccount
.
func (store *Store) TransferTx(ctx context.Context, arg TransferTxParams) (TransferTxResult, error) {
var result TransferTxResult
err := store.execTx(ctx, func(q *Queries) error {
...
if arg.FromAccountID < arg.ToAccountID {
result.FromAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.FromAccountID,
Amount: -arg.Amount,
})
if err != nil {
return err
}
result.ToAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.ToAccountID,
Amount: arg.Amount,
})
if err != nil {
return err
}
} else {
result.ToAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.ToAccountID,
Amount: arg.Amount,
})
if err != nil {
return err
}
result.FromAccount, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: arg.FromAccountID,
Amount: -arg.Amount,
})
if err != nil {
return err
}
}
return nil
})
return result, err
}
OK, now after this change, we expect that the deadlock should be gone. Let’s rerun our test!
It passed! In the logs, we can see the balances are the same before and after the transactions. Perfect!
Refactor the code
Before we finish, let’s refactor the code a bit, because now it looks quite long and somewhat duplicated. To do this, I’m gonna define a new addMoney()
function to add money to 2 accounts.
It will takes several inputs: the context, the queries object, the ID of the 1st account, the amount of money that should be added to that 1st account, the ID of the 2nd account, and the amount of money that should be added to that 2nd account.
The function will return 3 values: the 1st account object, the 2nd account object after updated, and a potential error.
func addMoney(
ctx context.Context,
q *Queries,
accountID1 int64,
amount1 int64,
accountID2 int64,
amount2 int64,
) (account1 Account, account2 Account, err error) {
account1, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: accountID1,
Amount: amount1,
})
if err != nil {
return
}
account2, err = q.AddAccountBalance(ctx, AddAccountBalanceParams{
ID: accountID2,
Amount: amount2,
})
return
}
Inside this function, we first call q.AddAcountBalance()
to add amount1
to account1
’s balance. So the ID
should be accountID1
, and Amount
should be amount1
. We save the results to the output account1
and err
variables.
Then we check if err
is not nil, simply return. Here because we’re using named return variables, so this return with no parameters is basically the same as if we write return account1, account2, err
. This is a cool syntax feature of Go that makes the code more concise.
We do similar thing to add amount2
to account2
. Now with this addMoney function in hand, we can refactor our transfer transaction:
func (store *Store) TransferTx(ctx context.Context, arg TransferTxParams) (TransferTxResult, error) {
var result TransferTxResult
err := store.execTx(ctx, func(q *Queries) error {
...
if arg.FromAccountID < arg.ToAccountID {
result.FromAccount, result.ToAccount, err = addMoney(ctx, q, arg.FromAccountID, -arg.Amount, arg.ToAccountID, arg.Amount)
} else {
result.ToAccount, result.FromAccount, err = addMoney(ctx, q, arg.ToAccountID, arg.Amount, arg.FromAccountID, -arg.Amount)
}
return err
})
return result, err
}
If fromAccountID
is less than toAccountID
, we want to update fromAccount
before toAccount
. So here, we call addMoney()
, pass in the context ctx
, query q
, arg.FromAccountID
, -arg.Amount
because money is moving out, then arg.ToAccountID
, and finally arg.Amount
because money is moving in.
The output of this function call should be assigned to result.FromAccount
, result.ToAccount
, and err
.
Else, in case toAccountID
is smaller, we want to make sure that toAccount
is updated before fromAccount
. So we just duplicate the previous command, but change it a bit to reverse the accounts order.
And that’s it! The refactoring is done. Let’s rerun the TestTransferTxDeadlock
!
It passed! Excellent! Let’s also run the normal TestTransferTx
:
Also passed! And finally rerun the whole package test:
All passed!
So everything is now working properly. Deadlock is no longer a threat to our application.
And that’s the end of today’s lecture. I hope it’s useful for you.
Thanks a lot for reading! Happy coding and see you in the next one!
If you like the article, please subscribe to our Youtube channel and follow us on Twitter for more tutorials in the future.
If you want to join me on my current amazing team at Voodoo, check out our job openings here. Remote or onsite in Paris/Amsterdam/London/Berlin/Barcelona with visa sponsorship.
Top comments (4)
Maybe we can refactor the "named return variables" definition as "naked return" statement, as per discussed in ardanlabs.com/blog/2013/10/functio...
How would you resolve the deadlock if primary keys are UUIDs instead?
(thank you so much for the content btw ❤️)
In this case, have you tried using the created_at? Could come in handy.
Isn’t it better to use optimistic locking / advisory lock than dealing with deadlocks? Could you reveal this?