In software engineering, correctness means an application behaves as intended for its expected use cases. Correctness is always important, but never more so than when dealing with accounting data. I have been lucky enough to build software that has processed millions of pounds in payments, and these are some of the patterns and principles I used to calculate and track account balances.
When building financial software, one of the tasks you undertake will be modelling how to calculate and store balances. An initial approach might include an attribute on an Account model, updated when a transaction occurs. Similar to the below:
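A minimal sketch of that initial approach might look like the following. The `Account` type, its field names, and the `AddFunds` method are hypothetical, and storing amounts as integer pence is an assumption made to keep the arithmetic exact.

```go
package main

import "fmt"

// Account is a hypothetical model that stores its balance as an attribute.
// Amounts are in pence (an assumption) to avoid floating-point rounding.
type Account struct {
	ID             string
	AccountBalance int64
}

// AddFunds updates the stored balance with a read-modify-write sequence.
func (a *Account) AddFunds(amount int64) {
	current := a.AccountBalance         // read
	a.AccountBalance = current + amount // modify and write
}

func main() {
	acc := &Account{ID: "acc-1", AccountBalance: 10000}
	acc.AddFunds(5000)
	fmt.Println(acc.AccountBalance) // 15000
}
```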
The above is not a good solution because it breaks data normalisation. The balance of an account should equal the sum of its transactions. Storing the calculated value as the source of truth causes problems once it drifts out of sync with those transactions, and in an environment with many concurrent requests it eventually will.
A race condition occurs when the outcome of a program depends on the relative timing of operations running concurrently against shared state. In our case, we have a read-modify-write race condition: two processes read the same value, each computes a new value from it, and each writes its result back, so the later write overwrites the earlier one. If you imagine multiple processes updating the balance attribute at once, you should be able to see the problem.
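The sketch below is one way to reproduce that read-modify-write sequence with two goroutines. All names are hypothetical. To make the lost update repeatable rather than occasional, it uses a barrier to force both reads to finish before either write; real traffic would hit the same interleaving only some of the time.

```go
package main

import (
	"fmt"
	"sync"
)

// Account is a hypothetical model holding a stored balance in pence.
type Account struct {
	mu             sync.Mutex // guards each individual read or write, not the whole update
	accountBalance int64
}

func (a *Account) read() int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.accountBalance
}

func (a *Account) save(v int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.accountBalance = v
}

// runLostUpdate has two executors each add 5000 to the same account.
// The barrier forces both reads to complete before either write.
func runLostUpdate() int64 {
	acc := &Account{accountBalance: 10000}
	var readsDone, finished sync.WaitGroup
	readsDone.Add(2)
	finished.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer finished.Done()
			balance := acc.read() // read
			readsDone.Done()
			readsDone.Wait()         // wait until the other executor has read too
			acc.save(balance + 5000) // modify and write, based on a stale read
		}()
	}
	finished.Wait()
	return acc.read()
}

func main() {
	fmt.Println(runLostUpdate()) // 15000, not the expected 20000
}
```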
When this update runs concurrently, two processes can read the same account balance before either has written its change back; once both finish, the account_balance is incorrect. For example, with two £50 deposits:
- Executor A reads account balance = £100
- Executor B reads account balance = £100
- Executor B adds £50 funds. account_balance = £150
- Executor A adds £50 funds. account_balance = £150
- Executor A saves the account model with attribute account_balance = £150
- Executor B saves the account model with attribute account_balance = £150
As you can see, with concurrent requests, the order of execution cannot be guaranteed. Executor B read the account’s current balance as £100 before Executor A had saved its update. The result is that account_balance is short by £50. This issue only gets worse as the system scales.
The first step in solving this problem will be to adjust how we check the current balance of an account. We will need to maintain a transaction log. Every time a credit or debit happens, we will write a record to this log with the details. We sum all transactions from the log for any given account to calculate the current balance.
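As a rough sketch of the idea, the transaction log can be modelled as an append-only list whose balance is always derived by summing. The `Ledger` and `Transaction` types are hypothetical, and the in-memory slice stands in for whatever database table would hold the log.

```go
package main

import (
	"fmt"
	"sync"
)

// Transaction is a hypothetical log entry. Amount is in pence:
// positive for a credit, negative for a debit.
type Transaction struct {
	AccountID string
	Amount    int64
}

// Ledger is an in-memory stand-in for an append-only transaction table.
type Ledger struct {
	mu      sync.Mutex
	entries []Transaction
}

// Append records a transaction; existing entries are never modified.
func (l *Ledger) Append(tx Transaction) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, tx)
}

// CalculateBalance sums every entry recorded for the account.
func (l *Ledger) CalculateBalance(accountID string) int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	var total int64
	for _, tx := range l.entries {
		if tx.AccountID == accountID {
			total += tx.Amount
		}
	}
	return total
}

// creditConcurrently seeds an account with 10000 and then applies
// two concurrent credits of 5000 each.
func creditConcurrently() int64 {
	ledger := &Ledger{}
	ledger.Append(Transaction{AccountID: "acc-1", Amount: 10000})
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ledger.Append(Transaction{AccountID: "acc-1", Amount: 5000})
		}()
	}
	wg.Wait()
	return ledger.CalculateBalance("acc-1")
}

func main() {
	fmt.Println(creditConcurrently()) // 20000
}
```

Because each executor only appends its own record, neither can overwrite the other's work, whatever order they run in.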
This process does have a scaling issue: every transaction added to the log increases the compute required to calculate the balance. You can alleviate this with a statement system, in which you store the calculated balance at intervals (monthly works). To calculate the current balance, you take the latest stored balance and add the transactions recorded since.
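One possible shape for such a statement system is sketched below. The `Statement` type, the `Seq` field used to order the log, and both function names are assumptions for illustration. The sketch scans an in-memory slice; a real store would query only the rows newer than the statement.

```go
package main

import "fmt"

// Transaction is a hypothetical log entry for a single account.
type Transaction struct {
	Seq    int   // position in the log, strictly increasing
	Amount int64 // pence; positive for credit, negative for debit
}

// Statement is a stored balance covering every transaction up to and including UpToSeq.
type Statement struct {
	UpToSeq int
	Balance int64
}

// CloseStatement builds the next statement from the previous one
// plus the transactions recorded after it, up to upToSeq.
func CloseStatement(prev Statement, log []Transaction, upToSeq int) Statement {
	next := Statement{UpToSeq: upToSeq, Balance: prev.Balance}
	for _, tx := range log {
		if tx.Seq > prev.UpToSeq && tx.Seq <= upToSeq {
			next.Balance += tx.Amount
		}
	}
	return next
}

// CurrentBalance adds the transactions recorded since the latest statement
// to that statement's stored balance.
func CurrentBalance(latest Statement, log []Transaction) int64 {
	balance := latest.Balance
	for _, tx := range log {
		if tx.Seq > latest.UpToSeq {
			balance += tx.Amount
		}
	}
	return balance
}

func main() {
	log := []Transaction{{1, 10000}, {2, -2500}, {3, 5000}, {4, -1000}}
	stmt := CloseStatement(Statement{}, log, 2)
	fmt.Println(stmt.Balance)              // 7500
	fmt.Println(CurrentBalance(stmt, log)) // 11500
}
```

The transaction log remains the source of truth here; a statement is a cached subtotal that can always be rebuilt from the log.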
The above code removes the possibility of concurrent requests causing the system to under-calculate the balance of an account. Because each request only appends its own record, a query made part-way through the execution flow returns the correct balance at that moment, and after both requests finish it returns £200.
However, it does not solve all of our problems. A classic problem in accounting is preventing overspend. Overspend happens when a debit is accepted for more than the account has available. When a request comes in to debit an account, the application must first check that funds are available. Suppose concurrent requests to debit the same account enter the system. We then face a familiar situation: both checks may pass before either debit is written to the log, causing an overspend.
The overspend problem applies whenever there is a balance constraint that should cause a debit to be rejected. That includes systems that allow negative balances (overdrafts), because an overdraft limit is still a limit.
- The account has a balance of £100.
- Executor A requests to debit the account for £70.
- Executor B requests to debit the account for £50.
- Executor B checks the balance of the account. The sum of transactions equals £100 credit. It can proceed.
- Executor A checks the balance of the account. The sum of transactions equals £100 credit. It can proceed.
- Executor B writes a debit transaction to the log.
- Executor A writes a debit transaction to the log.
After the above processes have exited, the account balance may be -£20, an unauthorised overspend despite the explicit check meant to prevent it. Each goroutine calls svc.CalculateBalance(), but both may complete this check before either has written its debit to the transaction log.
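The sequence above can be sketched as follows. The `Service` type is a hypothetical in-memory stand-in for `svc`, and, as an assumption made purely for repeatability, a barrier forces both balance checks to complete before either debit is written.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a hypothetical stand-in for svc: an in-memory transaction
// log for a single account, with amounts in pence.
type Service struct {
	mu      sync.Mutex
	entries []int64
}

// Append writes one transaction to the log.
func (s *Service) Append(amount int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, amount)
}

// CalculateBalance sums the log.
func (s *Service) CalculateBalance() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	var total int64
	for _, amount := range s.entries {
		total += amount
	}
	return total
}

// runOverspend debits 7000 and 5000 concurrently from a balance of 10000.
// Each executor checks the balance, then writes its debit; the barrier makes
// both checks happen before either write.
func runOverspend() int64 {
	svc := &Service{}
	svc.Append(10000)
	debits := []int64{7000, 5000}
	var checksDone, finished sync.WaitGroup
	checksDone.Add(len(debits))
	finished.Add(len(debits))
	for _, amount := range debits {
		go func(amount int64) {
			defer finished.Done()
			approved := svc.CalculateBalance() >= amount // check
			checksDone.Done()
			checksDone.Wait() // the other executor has now checked as well
			if approved {
				svc.Append(-amount) // act on a check that is already stale
			}
		}(amount)
	}
	finished.Wait()
	return svc.CalculateBalance()
}

func main() {
	fmt.Println(runOverspend()) // -2000
}
```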
Solution
There are several layers to solving the above (including constraints at the database level); however, I will discuss one we can implement within the application.
Distributed locks
A distributed lock ensures only one goroutine at a time can access a resource (like a mutex, but across processes and machines). We must be careful about where we place the lock so that only a single transaction is processed per account at any time.
One way to acquire and hold a distributed lock is via Redis, as in the example below.
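What follows is a sketch of the locking flow rather than a Redis client integration. `memStore` is a hypothetical in-memory stand-in so the example runs on its own: its `SetNX` method mimics the Redis command `SET key token NX PX ttl`, and `ReleaseIfOwner` mimics deleting the key only if it still holds our token. The key format, timeouts, and all type and function names are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

type lockEntry struct {
	token   string
	expires time.Time
}

// memStore is an in-memory stand-in for Redis so the sketch runs on its own.
type memStore struct {
	mu    sync.Mutex
	locks map[string]lockEntry
}

// SetNX mimics SET key token NX PX ttl: set only if no unexpired lock exists.
func (s *memStore) SetNX(key, token string, ttl time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, held := s.locks[key]; held && time.Now().Before(e.expires) {
		return false
	}
	s.locks[key] = lockEntry{token: token, expires: time.Now().Add(ttl)}
	return true
}

// ReleaseIfOwner deletes the key only if it still holds our token.
func (s *memStore) ReleaseIfOwner(key, token string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, held := s.locks[key]; held && e.token == token {
		delete(s.locks, key)
	}
}

// Ledger is one account's transaction log in pence (in-memory stand-in).
type Ledger struct{ entries []int64 }

func (l *Ledger) Append(amount int64) { l.entries = append(l.entries, amount) }

func (l *Ledger) CalculateBalance() int64 {
	var total int64
	for _, amount := range l.entries {
		total += amount
	}
	return total
}

var ErrInsufficientFunds = errors.New("insufficient funds")

// Withdraw holds a per-account lock around both the balance check and the debit.
func Withdraw(ctx context.Context, store *memStore, ledger *Ledger, accountID, token string, amount int64) error {
	key := "account-transaction:" + accountID // scoped to a single account
	for !store.SetNX(key, token, 5*time.Second) { // the TTL frees the lock if we crash
		if err := ctx.Err(); err != nil {
			return err // gave up waiting for the lock
		}
		time.Sleep(5 * time.Millisecond) // retry shortly
	}
	defer store.ReleaseIfOwner(key, token) // released on every return path

	if ledger.CalculateBalance() < amount {
		return ErrInsufficientFunds
	}
	ledger.Append(-amount)
	return nil
}

// runWithdrawals races debits of 7000 and 5000 against a balance of 10000.
func runWithdrawals() (balance int64, failures int) {
	store := &memStore{locks: map[string]lockEntry{}}
	ledger := &Ledger{entries: []int64{10000}}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	errs := make(chan error, 2)
	for token, amount := range map[string]int64{"executor-A": 7000, "executor-B": 5000} {
		go func(token string, amount int64) {
			errs <- Withdraw(ctx, store, ledger, "acc-1", token, amount)
		}(token, amount)
	}
	for i := 0; i < 2; i++ {
		if <-errs != nil {
			failures++
		}
	}
	return ledger.CalculateBalance(), failures
}

func main() {
	fmt.Println(runWithdrawals()) // one withdrawal is rejected; the balance never goes negative
}
```

Whichever executor acquires the lock first succeeds, and the other then sees the reduced balance and is rejected. The token matters because of the timeout: if a lock expires and another executor acquires it, a late release from the original holder must not delete the new holder's lock.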
In the code above, we first define a key. We can use this key whenever we wish to limit an aspect of our account execution flow to a single execution at a time. Once the key is defined, we attempt to acquire a lock on it. If another goroutine already holds the lock, we wait for up to the specified duration, retrying until we acquire it or the wait expires.
Limiting the scope of our lock key by injecting the account ID ensures we maintain good throughput across the system. If, for example, we had used a generic lock key like ‘account-transaction’, we would only be able to process a single withdrawal at any time for ALL accounts.
Once we have acquired the lock, we can safely continue with the withdrawal request because no other goroutine can access funds in the account.
We have also deferred the lock’s release to guarantee it is released, even in the event of an application error in our function. There is also a timeout set on the lock, ensuring that even if our node was killed (and could not execute the defer function), we will still release the lock after a set amount of time.
Conclusion
Distributed locks are only a single aspect of the controls you need when designing a system that calculates balances. Still, by implementing the above, you will be developing safer, more scalable software.
As all of the example code was executed in a single process, I could have used a Mutex; however, this would not work in a distributed application, where requests run in separate processes or on separate nodes.