What is it?
A rate limiter controls the rate of traffic from a client or a server. In HTTP, it limits the number of client requests allowed to be sent over a specific time period.
Client-side vs Server-side
API requests can be limited on the client side or the server side. A server-side rate limiter gives you full control over the implementation. A client-side implementation, on the other hand, can be bypassed or manipulated by malicious users, so you have limited control over enforcement.
Alternatively, an API gateway (a fully managed cloud service that sits in front of your servers as middleware) can enforce rate limiting on the server side. Besides rate limiting, it typically supports SSL termination, authentication, IP whitelisting, static content serving, etc. No reinventing the wheel!
Benefits
- Prevents resource starvation (e.g., from a DoS attack).
- Reduces cost when your backend relies on external APIs billed per call (payments, health records, etc.).
- Prevents server overload.
Algorithms
- Token Bucket
- Leaking Bucket
- Fixed Window Counter
- Sliding Window Log
- Sliding Window Counter (hybrid of the fixed window counter and the sliding window log)
Token Bucket
This algorithm periodically adds a user-defined number of tokens to a bucket, up to the bucket's capacity, and each request consumes one token. When no token is available, the request is dropped.
- verdict: easy to implement; memory-efficient; tuning the bucket size and refill rate can be challenging.
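A minimal single-process sketch of the idea (class and parameter names are illustrative, not from any particular library). Instead of a background timer, tokens are refilled lazily from the elapsed time on each request:

```python
import time

class TokenBucket:
    """Refills `refill_rate` tokens/second up to `capacity`; each request costs one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Lazy refill: add the tokens that would have accumulated since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # bucket empty: drop the request
```

The tuning difficulty the verdict mentions is exactly these two knobs: `capacity` bounds the burst size, `refill_rate` bounds the sustained rate, and they have to be chosen together.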
Leaking Bucket
This algorithm processes requests at a fixed rate, which makes it a good fit when a stable outflow rate is needed. It is similar to the token bucket algorithm, but incoming requests are queued instead; when the queue is full, new requests are rejected.
- verdict: easy to implement; memory-efficient; not suitable for handling bursts of traffic, since a spike fills the queue with old requests and recent ones get dropped.
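A sketch of the queue-based variant (names are mine). Real implementations drain the queue with a worker at a fixed rate; here the "leak" is simulated lazily from elapsed time so the class stays self-contained:

```python
import time
from collections import deque

class LeakyBucket:
    """Requests join a bounded queue that drains at `leak_rate` requests/second."""

    def __init__(self, queue_size, leak_rate):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_rate = leak_rate
        self.last_leak = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Lazily remove ("leak") the requests a worker would have processed by now.
        leaked = int((now - self.last_leak) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) < self.queue_size:
            self.queue.append(now)      # enqueue for processing
            return True
        return False                    # queue full: reject
```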
Fixed Window Counter
This algorithm divides the timeline into fixed-size windows and assigns a counter to each window. Each request increments the current window's counter; once the counter reaches the threshold, further requests are dropped until a new window starts.
- verdict: a spike at the edge of two windows can let through more than the allowed quota. For example, with a threshold of 5 requests/minute, 5 requests between 2:00:30 and 2:01:00 (the last 50% of one window) plus another 5 between 2:01:00 and 2:01:30 (the first 50% of the next window) means 10 requests pass within a single minute.
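A sketch of the counter logic (names are illustrative; `now` is injectable so the behavior is easy to test). The window index is derived directly from the timestamp:

```python
import time

class FixedWindowCounter:
    """Counts requests per fixed window; drops requests once the threshold is hit."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window = int(now // self.window_seconds)  # which fixed window `now` falls in
        if window != self.current_window:
            self.current_window = window          # a new window starts: reset the counter
            self.count = 0
        if self.count < self.threshold:
            self.count += 1
            return True
        return False
```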
Sliding Window Log
This algorithm stores the timestamp of each request in a log. When a new request comes in, timestamps older than the start of the current rolling window are evicted; if the log size then exceeds the allowed size, the request is rejected.
- verdict: it needs a lot of memory, because even a rejected request's timestamp is kept in the log.
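A sketch of the log-based check (names are mine). Note that the timestamp is appended before the size check, so rejected requests are logged too, which is exactly the memory cost the verdict points out:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allows a request only if at most `threshold` timestamps fall in the trailing window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.log = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps outside the current rolling window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        self.log.append(now)            # even a rejected request is recorded
        return len(self.log) <= self.threshold
```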
Sliding Window Counter
This algorithm computes the number of requests in the rolling window to decide whether to accept or reject a request based on the following formula:
requests in the current window + (requests in the previous window * overlap % of the rolling window and previous window)
For example, suppose the threshold is 7 requests/minute, 5 requests were made in the previous minute window, and 3 requests have been made in the first 30% of the current minute window. The rolling window then overlaps the last 70% of the previous window, so the estimated count is 3 + 5 * 0.7 = 6.5 requests. Since 6.5 < 7, the current request can go through.
- verdict: it smooths out traffic spikes, but the count is only an approximation (the algorithm assumes requests in the previous window were evenly distributed), so it is not suitable when the look-back window must be strict.
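The formula above can be written as a small function (names are mine; the evenly-distributed assumption is baked into the multiplication):

```python
def sliding_window_count(curr_count, prev_count, window_seconds, elapsed_in_current):
    """Estimate the requests in the rolling window.

    Assumes the previous window's requests were evenly distributed, so the
    previous count is scaled by how much of that window the rolling window
    still overlaps.
    """
    overlap = (window_seconds - elapsed_in_current) / window_seconds
    return curr_count + prev_count * overlap

# The example from the text: 3 requests in the first 30% (18 s) of the
# current minute, 5 requests in the previous minute, threshold 7.
# 3 + 5 * 0.7 = 6.5 < 7, so the request is allowed.
```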
Design Decision
Because a system may need multiple rate limiters, the set of rules that configures them needs to be stored in a cache (for faster access than a database). A configuration rule can look something like this:
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
Rate Limiter in Distributed Systems
To handle concurrent requests, a locking mechanism can be used. However, two major issues remain:
- Race condition: two limiter instances read and update the same counter at the same time. This can be solved with a Lua script or sorted sets in Redis, which make the read-and-update step atomic.
- Synchronization: multiple limiter nodes need to share state, which can be solved with a centralized data store. Sticky sessions are not efficient and still leave the same problem.
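As an in-process analogue of the atomicity a Redis Lua script provides, a lock can guard the read-and-update of a shared counter so that two concurrent requests never both pass when only one slot remains (a hypothetical sketch, not a distributed solution by itself):

```python
import threading

class AtomicFixedWindow:
    """A fixed-window counter whose check-and-increment is guarded by a lock."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.lock = threading.Lock()

    def allow(self):
        # Without the lock, two threads could both see count == threshold - 1
        # and both pass: the race condition described above.
        with self.lock:
            if self.count < self.threshold:
                self.count += 1
                return True
            return False
```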
Graceful Rejection of Request
When a request is rejected, the server typically responds with HTTP 429 (Too Many Requests) and includes rate limiter properties in the response headers, so the client can handle the rejection gracefully and back off.
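A sketch of building such headers (the `X-RateLimit-*` names are a widely used convention rather than an official standard; `Retry-After` is a standard HTTP header):

```python
def rate_limit_headers(limit, remaining, retry_after_seconds=None):
    """Build response headers that tell the client its quota and when to retry."""
    headers = {
        "X-RateLimit-Limit": str(limit),        # allowed requests per window
        "X-RateLimit-Remaining": str(remaining) # requests left in the current window
    }
    if retry_after_seconds is not None:
        # Sent alongside a 429 Too Many Requests response.
        headers["Retry-After"] = str(retry_after_seconds)
    return headers
```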