In a distributed system, a process may attempt to connect to several other systems over a network, and these connections can fail in various ways. What happens if the server being connected to takes too long to respond? After waiting for some time, the process might conclude that the server is dead or unavailable. If the wait time is too short, the process may wrongly conclude that a slow but healthy server is unavailable. Conversely, if the wait time is too long, it wastes resources waiting for a response that may never come. This trade-off makes it almost impossible to build a perfect failure detector.
What we want is a way for processes to check the availability of others without sending real application messages and waiting for them to fail. Pings and heartbeats serve this purpose: they are small, dedicated messages whose only job is to confirm that the other side is still reachable.
Ping
A ping is a periodic request that a process sends to another to check whether it is still available.
Pings are often used by load balancers for periodic health checks performed on servers in their pool to ensure they are available and responsive. The load balancer pings each server by sending requests to measure response time and overall server health. If a server fails to respond within a specified timeframe, it is marked as unhealthy and temporarily removed from the pool to prevent user traffic from being directed to it. If an unhealthy server begins responding correctly again, it is automatically reintroduced to the pool.
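The health-check loop described above can be sketched roughly as follows. This is a minimal illustration, not how any particular load balancer is implemented: the names `ServerPool`, `tcp_probe`, and `run_health_checks` are made up for this example, and the "ping" is assumed to be a plain TCP connection attempt with a timeout.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(address: str, timeout_s: float) -> bool:
    """Assumed ping: True if a TCP connection to "host:port" opens in time."""
    host, port = address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


class ServerPool:
    """Tracks which servers are healthy based on periodic ping results."""

    def __init__(self, servers: List[str],
                 probe: Callable[[str, float], bool] = tcp_probe,
                 timeout_s: float = 1.0) -> None:
        self.probe = probe
        self.timeout_s = timeout_s
        # Every server starts out in the pool.
        self.healthy: Dict[str, bool] = {server: True for server in servers}

    def run_health_checks(self) -> None:
        """Ping every server once; failures leave the pool, recoveries rejoin."""
        for server in self.healthy:
            self.healthy[server] = self.probe(server, self.timeout_s)

    def available(self) -> List[str]:
        """Servers that may currently receive user traffic."""
        return [server for server, ok in self.healthy.items() if ok]
```

In practice `run_health_checks` would be called on a timer, for example every few seconds. Because the status is recomputed on each round, a server that starts responding again is put back into the pool automatically.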
If you have ever played a multiplayer game, you are likely familiar with the concept of ping. The lower the ping, the better the gameplay experience. Last night, while playing Valorant with my friends, I noticed a ping of 24 on my PC. This means it took 24 milliseconds for a request to travel from my PC to the game server and for the response to come back.
Heartbeat
A heartbeat refers to a message that a process sends at regular intervals to another process. If the receiving end does not receive a heartbeat within a designated time limit, it triggers a timeout and marks the process as unavailable. However, if the process resumes and begins sending heartbeats again, it will eventually be recognized as available once more.
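On the receiving side, the timeout logic above comes down to remembering when each process was last heard from. Here is a minimal sketch; `HeartbeatMonitor` and its method names are hypothetical, and the clock is passed in only so the behaviour is easy to try out.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Marks a process unavailable when its heartbeats stop arriving."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, process_id: str) -> None:
        """Call this whenever a heartbeat from process_id is received."""
        self.last_seen[process_id] = self.clock()

    def is_available(self, process_id: str) -> bool:
        """True if a heartbeat arrived within the last timeout_s seconds."""
        last = self.last_seen.get(process_id)
        if last is None:
            return False  # never heard from this process
        return self.clock() - last <= self.timeout_s
```

A process that goes silent is reported as unavailable once `timeout_s` has passed, and it becomes available again as soon as its next heartbeat is recorded.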
In a real-time chat application using WebSockets, a heartbeat mechanism is crucial for maintaining a stable connection between the client and server. For instance, the client sends a heartbeat message every 30 seconds, typically containing a timestamp to indicate activity, while the server acknowledges receipt with a corresponding heartbeat response. This exchange helps ensure that the connection remains active and allows the client to quickly detect any disconnections. The server, in turn, considers the client online for as long as it keeps receiving heartbeats. If the client does not receive a heartbeat response within a specified timeframe, it assumes the connection is lost and can then attempt to reconnect.
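The client side of this exchange could look something like the sketch below. It is deliberately independent of any WebSocket library: `ClientHeartbeat`, the 10-second acknowledgement timeout, and the JSON message shape are all assumptions made for illustration, and the actual sending and reconnecting are left to the caller.

```python
import json
from typing import Optional


class ClientHeartbeat:
    """Decides when the client should send a heartbeat or give up and reconnect."""

    def __init__(self, interval_s: float = 30.0, ack_timeout_s: float = 10.0) -> None:
        self.interval_s = interval_s
        self.ack_timeout_s = ack_timeout_s
        self.last_sent: Optional[float] = None
        self.awaiting_ack = False

    @staticmethod
    def heartbeat_message(now: float) -> str:
        """Hypothetical wire format: a JSON object carrying a timestamp."""
        return json.dumps({"type": "heartbeat", "timestamp": now})

    def on_ack(self) -> None:
        """Call when the server's heartbeat response arrives."""
        self.awaiting_ack = False

    def next_action(self, now: float) -> str:
        """Return "send", "reconnect", or "wait" for the current time."""
        if self.awaiting_ack and self.last_sent is not None:
            if now - self.last_sent > self.ack_timeout_s:
                # No response in time: treat the connection as lost.
                self.awaiting_ack = False
                self.last_sent = None
                return "reconnect"
            return "wait"
        if self.last_sent is None or now - self.last_sent >= self.interval_s:
            self.last_sent = now
            self.awaiting_ack = True
            return "send"
        return "wait"
```

The caller would check `next_action` on a short timer, send `heartbeat_message(now)` over the socket when it returns `"send"`, and open a new connection when it returns `"reconnect"`.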
Are heartbeats and pings always used?
Not always. Pings and heartbeats are used in systems where processes interact frequently and an unreachable component must trigger immediate action. Where interactions are less frequent, it is usually enough to detect a failure at the moment a real request goes unanswered.
For example, a payment gateway communicates with the merchant's server without using heartbeats or pings. If the server doesn't respond during a transaction, the gateway alerts the user or retries the payment. A chat app, on the other hand, needs a heartbeat mechanism, as we saw above: it lets you see whether a person is online without sending them a message, just as WhatsApp does.
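For the payment gateway case, detecting failure during communication can be as simple as wrapping the request in a retry loop. The sketch below is only an illustration: `charge_with_retries` and the `send_request` callback are hypothetical names, and it assumes an unresponsive merchant server shows up as a `TimeoutError` or `ConnectionError`.

```python
from typing import Callable, Optional


def charge_with_retries(send_request: Callable[[], str], max_attempts: int = 3) -> str:
    """Try the request up to max_attempts times, then report the failure."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return send_request()
        except (TimeoutError, ConnectionError) as exc:
            # The failure is noticed only now, while doing real work.
            last_error = exc
    # All attempts failed: the caller can alert the user at this point.
    raise RuntimeError(
        f"merchant server unreachable after {max_attempts} attempts"
    ) from last_error
```

No background pings are involved here. A real gateway would also need to make sure a retried request cannot charge the customer twice, which this sketch does not address.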
Conclusion
In distributed systems, failure detection mechanisms like pings and heartbeats are essential for maintaining connectivity and using resources efficiently. Because they rely on small, dedicated messages rather than waiting for real requests to fail, these techniques let a system react to failures quickly and improve its overall reliability.
In the next blog, I’ll dive into another interesting topic related to distributed systems. Stay tuned!
Here are links to my previous posts on distributed systems:
- Building Resilient Applications: Insights into Scalability and Distributed Systems
- Understanding Server Connections in Distributed Systems
- How are your connections with web secure and integral?
- Understanding System Models in Distributed system
Feel free to check them out and share your thoughts!
Happy Diwali to all my readers!