(Cover image source: http://www.juimg.com/shiliang/201904/anniutubiao_1768500.html)
As a backend developer, I have been working with various APIs, and most of us pretty much know how to design them. Designing an efficient API comes with experience and requires diligent effort and rigorous testing. Quite recently, while integrating an external API with our internal API, I came across concurrent callback requests for the same resource with the same payload. In the application, we ended up having multiple records for the same resource; basically, they were duplicates. We cannot control external or third-party APIs and how they are designed: they might have retries, use a queue mechanism, or accidentally repost the resources.
Handling such cases properly is crucial if we are dealing with payments or mission-critical applications.
Let's take the example of an application integrating with a payment gateway. Each customer who comes to our app selects the payment gateway of their choice to pay for a subscription. Internally, our API sends a request to the payment gateway to create a new payment (POST /api/payments/payment). If the transaction is successful, we get a successful status from the payment gateway along with the transaction details, and the transaction is recorded in our DB. This is the sweet, ideal scenario, as shown in the illustration below.
On the other hand, if we encounter any network issues while waiting for the payment gateway to acknowledge us, there is no way to find out whether the payment was successful or not, as illustrated below.
After a while, the payment gateway decides to call back our API to update the broken payment transaction. These calls could come in many ways: a manual request for the payment status, retries from the payment gateway itself, or automatic callbacks. And if we do not have a proper method to check for duplicate transaction records in our database, we might end up recording the customer's payment as many times as we got the request from the gateway. How do we make sure we do not capture multiple payment records for the same transaction?
Ideally, most of us already know how to deal with such operations. I will try to elaborate on the methods I applied, the outcome of each method, and why I chose to go with a certain one.
Method 1: Validation
Customer payment transactions are recorded in a table called payments. The structure of the table looks like this:
CREATE TABLE `payments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`order_id` varchar(50) NOT NULL DEFAULT '',
`status` varchar(10) NOT NULL DEFAULT '',
`amount` decimal(10,2) DEFAULT NULL,
`transaction_id` varchar(50) DEFAULT NULL,
`gateway_response` json DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Each time we get a response or callback from the payment gateway, we query the DB to check whether there is an existing transaction for the same user with the same order_id:
SELECT * FROM payments WHERE user_id = ? AND order_id = ?;
Pseudocode for the payment validation:
// look up an existing payment for this user and order
var payment = getPayment(userId, orderId)
if (payment) {
  print('Payment already exists! Must be a duplicate payment.')
  throw Error('duplicate') // or simply return
}
// no existing record found, create a new payment
var newPayment = createPayment(userId, reqParams)
The problem with this method is that when we are swamped with concurrent requests carrying the same payload within a matter of seconds, we still cannot avoid duplication. The check and the insert are two separate steps: while one request is still waiting for the SELECT to come back, another request with the same payload passes the same check, and both end up creating a payment record.
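To make the race concrete, here is a minimal sketch of two concurrent callbacks; the IDs and amount are made up, and getPayment/createPayment are the hypothetical helpers from the pseudocode above.
// Both SELECTs complete before either INSERT commits, so both requests
// see "no existing payment" and both insert a row.
async function handleGatewayCallback(userId, orderId, reqParams) {
  const payment = await getPayment(userId, orderId); // SELECT ... WHERE user_id = ? AND order_id = ?
  if (payment) {
    throw Error('duplicate');
  }
  return createPayment(userId, reqParams); // INSERT INTO payments ...
}

// two callbacks for the same order arrive at (almost) the same time
Promise.all([
  handleGatewayCallback(42, 'ORD-1001', { orderId: 'ORD-1001', amount: 9.99 }),
  handleGatewayCallback(42, 'ORD-1001', { orderId: 'ORD-1001', amount: 9.99 }),
]); // both can resolve successfully, leaving two rows for the same order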
Method 2: Locking
Locking is a very interesting mechanism that can be applied in many cases, not just when dealing with concurrent requests. The implementation is quite similar to the method above; however, instead of the DB, we can keep the check in an in-memory data store such as Redis.
Pseudocode for the payment validation:
// make a unique reference key for each payment transaction
var paymentKey = 'PAYMENT' + user_id + order_id

// assume this call reads the key from Redis or another in-memory store
var payment = getPayment(paymentKey)
if (payment) {
  print('Payment already exists! Must be a duplicate payment.')
  throw Error('duplicate') // or simply return
}

// assume this call writes the new payment reference to the in-memory store
setPayment(paymentKey)
var newPayment = createPayment(userId, reqParams)
In the pseudocode above, when we get the first request we set the reference in the in-memory store and create the payment record; subsequent requests for the same transaction are ignored.
Unfortunately, even with this method I could not avoid duplicates: the separate read and write to the in-memory store still leave a window in which concurrent requests all pass the check and create multiple payment records.
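If you do go down this road, the existence check and the write have to be a single atomic operation rather than a separate GET and SET. Here is a minimal sketch, assuming the node-redis (v4) client and the hypothetical createPayment helper from the pseudocode above; it relies on Redis's atomic SET ... NX EX, so only the first request for a given key proceeds.
// minimal sketch assuming node-redis v4
const { createClient } = require('redis');

const redis = createClient();
redis.connect(); // in real code, await this once at application startup

async function handleGatewayCallback(userId, orderId, reqParams) {
  const paymentKey = 'PAYMENT' + userId + orderId;

  // SET ... NX EX is one atomic command: it succeeds only if the key does
  // not exist yet, and the key expires after 5 minutes as a safety net.
  const acquired = await redis.set(paymentKey, 'processing', { NX: true, EX: 300 });

  if (acquired === null) {
    // another request already created the key -> treat as a duplicate
    throw Error('duplicate');
  }

  return createPayment(userId, reqParams);
}
Because the check and the write happen in one command, there is no window for a second request to slip through; the remaining question is what to do with the key if createPayment fails afterwards.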
Here are a few interesting articles on locking mechanisms to explore:
https://stackoverflow.com/questions/129329/optimistic-vs-pessimistic-locking
https://blog.couchbase.com/optimistic-or-pessimistic-locking-which-one-should-you-pick/
Method 3: Queuing
I did not try this approach since I did not have any queues implemented in the application. However, it could be a more reliable method, and it gives the application more flexibility in dealing with concurrent requests.
The idea here is to push all incoming requests onto a queue and have a consumer process them one at a time, validating each request so that only one payment record is captured per transaction, as sketched below.
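To make that concrete, here is a minimal in-process sketch (not something I ran in production): incoming callbacks are pushed onto a FIFO array and a single consumer drains it, so the duplicate check and the insert never run concurrently. getPayment and createPayment are the hypothetical helpers from earlier, and a real setup would use a proper broker such as RabbitMQ or SQS instead of an in-memory array.
// a single consumer serialises the callbacks, so the check-then-insert for
// one payment never races with another
const queue = [];
let draining = false;

function enqueueCallback(job) {
  queue.push(job);
  drain(); // kick the consumer if it is idle
}

async function drain() {
  if (draining) return;
  draining = true;
  while (queue.length > 0) {
    const { userId, orderId, reqParams } = queue.shift();
    const existing = await getPayment(userId, orderId);
    if (existing) {
      console.log('Skipping duplicate payment for order', orderId);
      continue;
    }
    await createPayment(userId, reqParams);
  }
  draining = false;
}

// every incoming gateway callback just enqueues its payload
enqueueCallback({ userId: 42, orderId: 'ORD-1001', reqParams: { amount: 9.99 } });
The catch is that a single consumer becomes the throughput bottleneck, and once you run multiple consumers (or multiple nodes) you are back to needing an atomic check such as method 2 or method 4.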
Method 4: Database table with a composite UNIQUE constraint
In this method, we design the payments table with a composite unique key that guarantees a single record for each payment. The table design is as follows:
CREATE TABLE `payments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`order_id` varchar(50) NOT NULL DEFAULT '',
`status` varchar(10) NOT NULL DEFAULT '',
`amount` decimal(10,2) DEFAULT NULL,
`transaction_id` varchar(50) DEFAULT NULL,
`gateway_response` json DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_payment_transaction` (`user_id`,`order_id`,`status`),
KEY `user_id` (`user_id`),
KEY `order_id` (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Note: The DB I'm using here is MySQL; the syntax may differ for other RDBMS systems.
From the above table schema, I made a composite unique key out of (user_id, order_id, status).
If we get the same payment request for the same order from a user, MySQL records only one row; for any subsequent insertion it throws a Duplicate entry error. In the application, we can either let that error propagate or explicitly catch it from the DB engine and show a meaningful error message to the user, as sketched below.
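As a sketch of what catching that error looks like in Node (assuming the mysql2/promise driver and the table defined above; the connection settings are placeholders), the application simply attempts the INSERT and maps MySQL's ER_DUP_ENTRY error to a friendly duplicate message.
// minimal sketch assuming the mysql2/promise driver
const mysql = require('mysql2/promise');

const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'app' });

async function createPayment(userId, orderId, status, amount, transactionId, gatewayResponse) {
  try {
    const [result] = await pool.execute(
      `INSERT INTO payments
         (user_id, order_id, status, amount, transaction_id, gateway_response)
       VALUES (?, ?, ?, ?, ?, ?)`,
      [userId, orderId, status, amount, transactionId, JSON.stringify(gatewayResponse)]
    );
    return result.insertId;
  } catch (err) {
    if (err.code === 'ER_DUP_ENTRY') {
      // the composite UNIQUE key rejected a second row for the same
      // (user_id, order_id, status) combination
      throw Error('Duplicate payment for this order');
    }
    throw err;
  }
}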
I chose to go with this approach since it is more efficient than methods 1 and 2: it guarantees there will be no duplicates, requires no extra validation, and does not need any in-memory data store.
Those who use NoSQL systems might need to look at the first three methods. Methods 1 and 2 are not very efficient, and queuing may be the best option if your infrastructure supports it. I personally prefer not to use NoSQL databases as my primary data stores :-) but they are fine choices as secondary storage.
Here is the GitHub repo that demonstrates the method 4 implementation.
I appreciate any feedback.
Top comments (4)
Hi, and thanks for this article. I have a small question: in method 2 you check if the payment exists in-memory, but why not just store the payment in-memory and handle the exception?
// make a unique reference key for each payment transaction
var paymentKey = 'PAYMENT' + user_id + order_id
Hey, you could store the payment in-memory; however, if your application is scaled across multiple nodes, that approach makes it hard to enforce the duplicate constraint. Let me know your thoughts if you could elaborate more on your approach.
Hi, thanks for the explanation above. However, may I know whether your backend uses Node instead of PHP? In my experience, using method 4 with PHP still faces the issue due to slow reads/writes in MySQL.
My backend was Node.
However, even if reads/writes are slow, with method 4 you are guaranteed there will not be a duplicate, because the DB constraint enforces it.