ChunTing Wu
Explain Redlock in Depth

Previously, I introduced two types of locks, mutex locks and barriers, and I used Redis as an example to explain the differences between the two types of locks. Shortly after that, I received a reply saying that if you want to implement distributed locks, the Redis approach is not enough and you should refer to Redlock.

Well, his opinion is basically right.

Although Redlock is implemented through Redis, it can achieve a very high level of consistency and is one of the best paradigms for implementing distributed locking. However, Redlock has a very high cost behind it and is not suitable for all organizations and services.

Nevertheless, I will introduce Redlock and present my thoughts on why Redlock is impractical.

Redis is not reliable

Before getting started, I should emphasize that Redis persistence is unreliable, even with the strictest settings.

However, in a cluster environment, a Master may have several Slaves as redundant copies. Is Redis still unreliable under these conditions? The answer is yes.

In the Redis implementation, replication is asynchronous: the Master acknowledges a write to the client without waiting for its Slaves to receive it, so a successful write does not mean the data has been replicated. One sequence of events that leads to the problem is as follows.

When Client 1 successfully locks via the original Master, but the Master dies before the data is replicated, then Client 2 can still successfully lock via the original Slave (the new Master).

This is the reason why Redis clusters are still unreliable.
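
The failover scenario above can be reproduced with a toy model. This is only a sketch: `Node` and `failover_scenario` are hypothetical names, the dictionary stands in for a Redis keyspace, and replication is modeled as a single copy step that either happens before the crash or does not.

```python
class Node:
    """Toy stand-in for one Redis instance: just a key-value dict."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        # Mimics "set if not exists": write only when the key is absent.
        if key in self.data:
            return False
        self.data[key] = value
        return True


def failover_scenario(replicated_before_crash):
    """Return (client1_locked, client2_locked) for one Master/Slave pair."""
    master, replica = Node(), Node()

    # Client 1 locks on the Master and is acknowledged immediately.
    client1_locked = master.set_nx("lock:job", "client-1")

    # Replication happens later, if the Master survives long enough.
    if replicated_before_crash:
        replica.data.update(master.data)

    # The Master dies and the Slave is promoted.
    new_master = replica

    # Client 2 asks the new Master for the same lock.
    client2_locked = new_master.set_nx("lock:job", "client-2")
    return client1_locked, client2_locked


print(failover_scenario(replicated_before_crash=False))  # (True, True)
print(failover_scenario(replicated_before_crash=True))   # (True, False)
```

When the crash comes before replication, both clients believe they hold the lock, which is exactly the failure described above.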

Redlock Concept

As we have seen, a single Redis is not reliable, even in a cluster of multiple Redis. So how do we use Redis to implement a reliable Redlock?

The answer is majority consensus. Since one Redis is not reliable, we form a committee of multiple Redis. The lock takes effect if and only if more than half of the committee members agree; otherwise, the lock is invalid. Each member can be a single instance, a master-slave pair, or even a cluster, but the members must be independent of each other: they are not replicas of one another, let alone parts of the same cluster.

According to the majority consensus algorithm, the committee should have an odd number of members, and a request is approved when a majority of the members, floor(N/2) + 1, agree, where N is the total number of members.
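
As a quick illustration of the majority rule, the sketch below computes the number of agreeing members needed for a few committee sizes. The function name `quorum` is made up for this example.

```python
def quorum(n: int) -> int:
    """Smallest number of members that is strictly more than half of n."""
    return n // 2 + 1


for n in (3, 4, 5):
    # Columns: committee size, votes needed, member failures tolerated.
    print(n, quorum(n), n - quorum(n))
```

A committee of 3 and a committee of 4 both tolerate only one failed member, which is why an even-sized committee adds cost without adding fault tolerance.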

The detailed process is written in the official Redis documentation on distributed locking, so I'll briefly describe the process below.

Suppose our committee is composed of three Redis.

The following is the process of acquiring the lock, doing the task, and releasing the lock successfully.

  1. The client that wants to obtain the lock generates a globally unique ID. The official document recommends a random value and mentions a timestamp combined with a client ID as a simpler but less safe alternative.
  2. Try to use this ID to get the consent of all committee members. The official document does this with the command SET and its NX and PX options, the atomic form of SETNX with an expiry.
  3. If 2 members agree, the lock is successfully in place.
  4. After getting the lock, the client can do whatever it needs to do.
  5. The final step is to unlock on every member, whether or not the lock was successfully acquired there.

The process for unsuccessful locking is similar. If Redis1 or Redis2 also fails to respond, the lock cannot be acquired, which means you cannot do anything, but you still have to unlock on all the members.
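
The steps above can be sketched in a few dozen lines. This is a simplified illustration, not the official algorithm or any library's API: `FakeRedis` is a hypothetical in-memory stand-in for an independent Redis member, and `acquire` and `release` are made-up helper names. Members are contacted sequentially, and the clock-drift allowance is left out.

```python
import secrets
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for one independent Redis member."""

    def __init__(self, up=True):
        self.up = up
        self.store = {}  # key -> (value, expiry on the monotonic clock)

    def set_nx_px(self, key, value, ttl_ms):
        # Mimics SET key value NX PX ttl_ms.
        if not self.up:
            raise ConnectionError("member unreachable")
        now = time.monotonic()
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False  # an unexpired lock already exists
        self.store[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equal(self, key, value):
        # Compare-and-delete: remove the key only if we wrote it.
        if not self.up:
            raise ConnectionError("member unreachable")
        current = self.store.get(key)
        if current is not None and current[0] == value:
            del self.store[key]


def release(nodes, key, token):
    """Step 5: unlock on every member, reachable or not."""
    for node in nodes:
        try:
            node.delete_if_equal(key, token)
        except ConnectionError:
            pass  # the key on that member will simply expire


def acquire(nodes, key, ttl_ms):
    """Return the lock token on success, or None on failure."""
    token = secrets.token_hex(16)  # step 1: unique ID
    start = time.monotonic()
    votes = 0
    for node in nodes:  # step 2: ask every member
        try:
            if node.set_nx_px(key, token, ttl_ms):
                votes += 1
        except ConnectionError:
            pass
    elapsed_ms = (time.monotonic() - start) * 1000
    if votes >= len(nodes) // 2 + 1 and elapsed_ms < ttl_ms:  # step 3
        return token
    release(nodes, key, token)  # failed: clean up partial locks
    return None


nodes = [FakeRedis(), FakeRedis(), FakeRedis(up=False)]
token = acquire(nodes, "lock:job", ttl_ms=3000)
print(token is not None)  # True: 2 of 3 members agreed
release(nodes, "lock:job", token)
```

With one member down the lock is still granted. With two members down, `acquire` returns `None` and still cleans up the single member that did agree.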

Redlock Issues

After describing the Redlock process above, I'd like to explain why I rarely consider such an approach.

Firstly, the entire Redlock process described in the previous section is time-consuming. If you want to hold a lock for 3 seconds, only 2 seconds or less may actually be left once the locking process finishes. The request must be sent to every Redis first, and even when this is done in parallel, network delay and packet loss make the communication slow and complicated.
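
The arithmetic behind the shrinking lock can be written down directly. The official document computes the remaining validity as the requested lifetime minus the time spent acquiring the lock, minus a clock-drift allowance. In the sketch below, the function name is made up and the drift allowance (1% of the lifetime plus 2 ms) is an assumption for illustration, not a required constant.

```python
def remaining_validity_ms(ttl_ms: int, elapsed_ms: int) -> int:
    """Time the lock can still be trusted after acquisition finishes."""
    # Assumed drift allowance: 1% of the lifetime plus 2 ms.
    drift_ms = ttl_ms // 100 + 2
    return ttl_ms - elapsed_ms - drift_ms


# A 3-second lock that took 1 second to acquire across all members.
print(remaining_validity_ms(ttl_ms=3000, elapsed_ms=1000))  # 1968
```

If the result is zero or negative, the lock has to be treated as not acquired, even when a majority agreed.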

Secondly, to make unreliable Redis reliable, many independent Redis must be launched. From a site reliability engineering perspective, the maintenance effort for so many Redis is very high, and making every participating client aware of all of them is a problem in itself. Such an approach is impractical in terms of cost, maintenance effort, and implementation complexity.

Furthermore, the core of this approach is the GUID. If IDs are duplicated, both locking and unlocking may produce false positives and unpredictable results, and such failures are hard to detect. The system time, as officially documented, is a very weak guarantee of uniqueness. In a distributed system, it is difficult to ensure that all instances have the same time, a problem known as clock skew in system design.
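
To make the ID problem concrete, here is a small sketch comparing three ways of generating the lock value. All three function names are hypothetical, and the timestamp is passed in explicitly so the collision is easy to see.

```python
import secrets


def timestamp_id(now_ms: int) -> str:
    """ID built from the system time only."""
    return str(now_ms)


def timestamp_client_id(now_ms: int, client_id: str) -> str:
    """ID built from the system time plus a per-client identifier."""
    return f"{now_ms}-{client_id}"


def random_id() -> str:
    """ID built from 20 random bytes, independent of any clock."""
    return secrets.token_hex(20)


# Two clients whose clocks read the same millisecond.
same_instant = 1_700_000_000_000

# Time-only IDs collide, so one client could release the other's lock.
print(timestamp_id(same_instant) == timestamp_id(same_instant))  # True

# Adding a client identifier separates them as long as the identifiers differ.
a = timestamp_client_id(same_instant, "client-1")
b = timestamp_client_id(same_instant, "client-2")
print(a == b)  # False

# Random IDs do not depend on the clock at all.
print(len(random_id()))  # 40
```

The time-based variants are only as unique as the clocks and client identifiers behind them, which is the weak guarantee discussed above.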

To sum up, Redlock is an expensive approach with a lot of technical depth. Although many people have implemented packages in various programming languages based on official documents, does each user understand the potential risks behind the simple use of the packages?

Conclusion

The main reason I don't use Redlock is that making unreliable Redis reliable is putting the cart before the horse. I always tell my team members, "Be aware that data in Redis can disappear without warning". If you want to keep the data persistent, you should consider a more persistent database rather than a cache.

When implementing distributed locking, instead of using Redis, we should use a more reliable database, such as MySQL, which is strongly consistent, or MongoDB, which is my personal preference. But even if we use a database, we should pay attention to its implementation details. Take MongoDB as an example: if we want to implement a lock, we need to be aware of read-after-write consistency.
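
One common way to build a database-backed lock is to rely on a unique key: whoever inserts the row first holds the lock. The sketch below uses Python's built-in `sqlite3` purely as a stand-in so it runs without a server. It is not MySQL or MongoDB, the table and function names are made up, and lock expiry is left out for brevity.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a one-row-per-lock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locks (name TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    return conn


def try_lock(conn: sqlite3.Connection, name: str, owner: str) -> bool:
    """Acquire the lock by inserting its row; the primary key rejects a second holder."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (name, owner) VALUES (?, ?)", (name, owner)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def unlock(conn: sqlite3.Connection, name: str, owner: str) -> bool:
    """Release the lock only if this owner holds it."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locks WHERE name = ? AND owner = ?", (name, owner)
        )
    return cur.rowcount == 1


conn = make_db()
print(try_lock(conn, "job", "client-1"))  # True
print(try_lock(conn, "job", "client-2"))  # False
print(unlock(conn, "job", "client-1"))    # True
```

The database's uniqueness constraint does the arbitration here, so there is no committee to coordinate. A real deployment would still need an expiry for holders that crash.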

There are many aspects to consider behind the system design, and it is not enough to just make the function work. How to control the budget? How to allocate manpower? How to maintain day-to-day? How to troubleshoot? All of these factors involve the capacity of the team, which I believe is far more important than the actual functionality.

When considering the use of distributed locks, I first ask myself, "Do we really need locks? Is there a way to avoid possible race conditions through architecture design?" It is far more effective to avoid locking by improving the architecture than to seek synchronization in a distributed system. If we really have to use a distributed lock, and we need to keep the usage to a minimum, then we don't need to use Redis but can use a relatively slow database to implement it.

Complexity is killing software developers

And Redlock is one of the most complicated approaches. In my opinion, it should be avoided.

Top comments (3)

Jose Milán • Edited

I have to disagree. The point of Redlock is especially about availability, which is the main issue if you use a consistent DB like MySQL or Mongo for distributed locks. I do use Mongo for this as well, but on the assumption that it becomes a single point of failure. Persistence in Redis is not a problem in this case. The solution has some pitfalls, especially related to network partitions, but not persistence. Regarding speed, on average our implementation takes 3 ms per operation with 5 nodes while performing about 5k operations per minute, so I think it's quite nice.
Maintaining it is also quite simple if you use Kubernetes, for example.

Advising against it for the reasons above is a bit too much, in my opinion.

ChunTing Wu

That's fair.

My concern has always been the complexity, and if your organization can afford it, then of course you can go with a relatively complex solution.

In fact, I do use Redlock on a limited number of occasions.

Jose Milán

Indeed. As usual, there are no silver bullets for the problems we have, and solutions really depend on the particularities. Redlock is indeed a bit more complex than a solution based only on Mongo.

Have a good day