Amogh Jalan for Middleware

Overcoming Hard Rate Limits: Efficient Rate Limiting with Token Bucketing and Redis

As a developer, dealing with external integrations often presents challenges, especially when those integrations impose strict rate limits.

Recently, I faced a significant issue where our application was repeatedly hitting hard rate limits from an external API, resulting in frequent request failures and breaking our data ingestion pipeline. 

Initial attempts to manage the situation, such as naive retry mechanisms and manual rate limiting, proved inadequate. This led me to explore more sophisticated solutions: the token bucketing algorithm and exponential backoff, leveraging Redis to ensure scalability in our distributed system.

The Challenge: Rate Limits and Failed Requests

Meme with Status 429 as hurricane

When interacting with third-party APIs, you often encounter rate limits—restrictions on the number of requests you can make within a given time frame. Exceeding these limits can lead to denied requests, service disruptions, and ultimately a poor user experience. 

In our case, the challenge was compounded by our traffic patterns and the distributed nature of our system. This issue arose when one of our customers linked all of their 400 repositories to be synced with Middleware, our developer productivity analytics platform.

To give you an idea, the data to sync included over 50,000 pull requests, each requiring about 4 API calls, totalling roughly 200,000 requests. Meanwhile, the rate limit for the account allowed just 2,000 requests per hour, meaning the full sync would take at least 100 hours even with perfectly paced traffic.

Simple solutions like fixed delays and single-server rate limiting were insufficient, prompting the need for a more robust approach.

High-Level Solution: Traffic Shaping with Token Bucketing and Exponential Backoff

To address the rate limit issue effectively, I aimed to implement a traffic-shaping strategy combining the token bucketing algorithm and exponential backoff. Here's a high-level overview of these concepts:

  1. Traffic Shaping: Managing the flow of requests to ensure smooth interaction with the external API without exceeding rate limits.
    • Token Bucketing: A rate-limiting algorithm that allows bursts of traffic while enforcing a steady rate over time.
  2. Exponential Backoff: A retry mechanism that increases the wait time between retries exponentially, reducing the likelihood of overwhelming the external API.

Understanding Traffic Shaping

Traffic shaping involves controlling the rate at which requests are sent to an external service. By implementing a token bucketing system, we can allow short bursts of traffic while maintaining a consistent request rate over time. This prevents overwhelming the external service and reduces the risk of hitting rate limits.

The Token Bucketing Algorithm

token bucketing algorithm visualization

Source: GeeksforGeeks

The token bucket algorithm is a simple yet effective method for rate limiting. Here's how it works (a minimal code sketch follows the list):

  • A bucket holds a certain number of tokens, each token representing permission to make one request.
  • Tokens are added to the bucket at a fixed rate.
  • When a request is made, a token is removed from the bucket. If the bucket is empty, the request is denied or delayed.
  • The bucket has a maximum capacity, preventing it from holding more tokens than a predefined limit.
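
To make the mechanics concrete, here is a minimal single-process token bucket in Python. The class and method names are illustrative rather than taken from our codebase; the distributed, Redis-backed version appears later in the post.

import time

class TokenBucket:
    """A minimal in-memory token bucket (single process, illustrative only)."""

    def __init__(self, max_tokens, refill_rate):
        self.max_tokens = max_tokens        # bucket capacity
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = max_tokens            # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Credit tokens for the elapsed time, never exceeding capacity
        self.tokens = min(self.tokens + elapsed * self.refill_rate, self.max_tokens)
        self.last_refill = now

    def allow_request(self):
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: an API that allows 100 requests per minute
bucket = TokenBucket(max_tokens=100, refill_rate=100 / 60)
if bucket.allow_request():
    pass  # safe to call the external API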

Finding the Right Max Tokens and Refill Rate

Determining the optimal values for the maximum tokens and refill rate requires understanding your application's traffic patterns and the external API's rate limits; a short worked example follows the list. For example, if an API allows 100 requests per minute:

  • Max Tokens: Set to 100, allowing a burst of up to 100 requests.
  • Refill Rate: Set to approximately 1.67 tokens per second (100 tokens/60 seconds).
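
Translating numbers like these into code is a one-liner; the helper below is hypothetical and simply makes the relationship explicit, including the 2,000-requests-per-hour limit from the scenario described earlier.

def bucket_params(limit, window_seconds):
    """Derive token bucket parameters from an API rate limit."""
    max_tokens = limit                      # allow bursts up to the full window quota
    refill_rate = limit / window_seconds    # steady-state tokens per second
    return max_tokens, refill_rate

print(bucket_params(100, 60))       # (100, 1.666...) -- the 100 requests/minute example
print(bucket_params(2000, 3600))    # (2000, 0.555...) -- the 2,000 requests/hour account limit

In practice you may want a capacity smaller than the full window quota so that a single burst cannot consume the entire hour's budget at once.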

Exponential Backoff Algorithm

Exponential Backoff Algorithm Graph

Exponential backoff is a retry strategy where the wait time between retries increases exponentially. This approach reduces the load on the external API by spacing out retries and avoiding a flood of requests after an initial failure.

Here's a simple implementation in Python (make_request stands in for the actual API call):

import time

initial_delay = 1     # seconds
max_delay = 32        # seconds
max_attempts = 10     # example cap on retries
attempts = 0

while attempts < max_attempts:
    result = make_request()
    if result.success:
        break
    # Double the wait after each failure, but never wait longer than max_delay
    delay = min(initial_delay * 2 ** attempts, max_delay)
    time.sleep(delay)
    attempts += 1

Integrating Token Bucketing and Exponential Backoff with Redis

Redis, an in-memory data structure store, is an excellent choice for implementing rate limiting in a distributed system due to its speed and support for atomic operations. 

Why Redis?

Redis is a good fit here because its feature set maps directly onto the needs of a distributed rate-limiting solution:

  1. Scalability: Handles high throughput with low latency, making it suitable for distributed systems.
  2. Atomic Operations: Ensures that token bucket updates and checks are performed reliably across multiple instances.
  3. Persistence: Optionally persists data to disk, ensuring that the rate limiting state survives restarts.

Implementing Token Bucketing with Redis

import time

import redis

# decode_responses=True makes hget return strings instead of bytes
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


def get_tokens(bucket_key, refill_rate, max_tokens):
    """Refill the bucket based on elapsed time and return the current token count."""
    current_time = time.time()
    last_refill_time = float(redis_client.hget(bucket_key, 'last_refill_time') or current_time)
    tokens = float(redis_client.hget(bucket_key, 'tokens') or max_tokens)

    # Credit tokens for the time elapsed since the last refill, capped at the
    # bucket's capacity. Keeping the count as a float preserves fractional
    # tokens that would otherwise be lost between frequent calls.
    elapsed_time = current_time - last_refill_time
    tokens = min(tokens + elapsed_time * refill_rate, max_tokens)

    redis_client.hset(bucket_key, mapping={'tokens': tokens, 'last_refill_time': current_time})
    return tokens


def consume_token(bucket_key, refill_rate, max_tokens):
    """Try to take one token; return True if the request may proceed."""
    # Note: this read-modify-write is not atomic across processes; see the
    # Lua-script sketch below for one way to make it so.
    tokens = get_tokens(bucket_key, refill_rate, max_tokens)
    if tokens >= 1:
        redis_client.hincrbyfloat(bucket_key, 'tokens', -1)
        return True
    return False
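
One thing worth calling out: the get-then-set pattern above is easy to read, but two workers can read the same token count before either writes back, occasionally letting extra requests slip through. A common way to close that gap, sketched below as an illustration rather than a drop-in from our codebase, is to push the whole refill-and-consume step into a single Lua script, which Redis runs atomically (the variadic HSET used here needs Redis 4.0+):

import time

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

# The whole refill + consume step runs inside Redis, so no other client
# can interleave between the read and the write.
TOKEN_BUCKET_LUA = """
local refill_rate = tonumber(ARGV[1])
local max_tokens  = tonumber(ARGV[2])
local now         = tonumber(ARGV[3])

local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens')) or max_tokens
local last   = tonumber(redis.call('HGET', KEYS[1], 'last_refill_time')) or now

-- credit tokens for elapsed time, capped at capacity
tokens = math.min(tokens + (now - last) * refill_rate, max_tokens)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill_time', now)
return allowed
"""

consume_token_script = redis_client.register_script(TOKEN_BUCKET_LUA)

def consume_token_atomic(bucket_key, refill_rate, max_tokens):
    """Atomically refill the bucket and try to take one token."""
    allowed = consume_token_script(keys=[bucket_key],
                                   args=[refill_rate, max_tokens, time.time()])
    return allowed == 1

After the first call, redis-py invokes the script by its SHA1 hash, so the Lua source is not re-sent with every request.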

By combining token bucketing and exponential backoff, we can manage external API rate limits effectively. Redis provides the necessary infrastructure to implement these algorithms in a distributed system, ensuring scalability and reliability. 
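
To sketch how the two pieces fit together in practice (illustrative code, assuming the consume_token helper defined above and a make_request callable that reports success): the bucket decides whether a request may be sent now, and exponential backoff decides how long to wait when the API still pushes back.

import time

def rate_limited_request(bucket_key, refill_rate, max_tokens,
                         make_request, max_attempts=10, max_delay=32):
    """Wait for a token, send the request, and back off exponentially on failure."""
    for attempt in range(max_attempts):
        # Block until the token bucket lets this request through
        while not consume_token(bucket_key, refill_rate, max_tokens):
            time.sleep(1 / refill_rate)   # roughly the time until the next token

        result = make_request()
        if result.success:
            return result

        # The call was rejected or failed (e.g. HTTP 429): back off before retrying
        time.sleep(min(2 ** attempt, max_delay))

    raise RuntimeError(f"request failed after {max_attempts} attempts")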

This approach not only prevents service disruptions but also enhances the overall user experience by maintaining a smooth and consistent interaction with third-party services.

By the way, if you are looking to improve developer productivity in your organization, check out Middleware.

middlewarehq / middleware

✨ Open-source DORA metrics platform for engineering teams ✨


Open-source engineering management that unlocks developer potential


Introduction

Middleware is an open-source tool designed to help engineering leaders measure and analyze the effectiveness of their teams using the DORA metrics. The DORA metrics are a set of four key values that provide insights into software delivery performance and operational efficiency.

They are:

  • Deployment Frequency: The frequency of code deployments to production or an operational environment.
  • Lead Time for Changes: The time it takes for a commit to make it into production.
  • Mean Time to Restore: The time it takes to restore service after an incident or failure.
  • Change Failure Rate: The percentage of deployments that result in failures or require remediation.


PS: Do upvote the post for good karma 😇!!

Top comments (4)

Shivam Chhuneja

insightful piece Amogh!

Amogh Jalan

Thanks Shivam!

Samad Yar Khan

Great read!

Jayant Bhawal

I think being able to visualize exponential backoff in terms of pseudocode is pretty helpful. Thanks!