Why and how you should rate-limit your API

Opening the gates

This is it. Your shiny new product is ready to be released to the public. You worked hard for it, surely it will be a smash hit! A horde of users are impatiently waiting to try it out.


With a confident hand, you click the button. The service is live! Your analytics show that more and more users are coming in. Your advertising campaign and marketing were very effective.

As the numbers grow, you feel overwhelmed, and so does your cloud infrastructure. There are simply too many of them. Your bug tracker starts spitting out thousands of reports. Your monitoring goes from green to yellow, then to red... You weren't ready for this much traffic. You thought you were, but that wasn't the case.

Solving the issue

No time to waste, you need to fix this. You can't let your users' first experience be this miserable. You decide to shut down the service temporarily while you work on it. But now nobody can use it at all. This doesn't feel right.


šŸ’” That's right! You need to open the gate just a little. This way you let in an amount of traffic that is manageable for your infrastructure.

The rest will at least be informed that there is no room left for them right now, which is better than a terribly slow and broken experience.


Rate limiting

What you just set up is called rate limiting. It is a crucial component of every system exposed to the internet.

The idea is simple: manage traffic by limiting the number of requests allowed within a given period, for example "100 requests per minute".

Use cases

Rate limiting solves several problems.

  • Stability: by limiting the load, your infrastructure's stress is alleviated.
  • Cost control: never auto-scale without limits, or you will inevitably receive an invoice from your cloud provider big enough to single-handedly bankrupt you. With clear limits on your service usage, you know what to expect.
  • User experience and security: only abusive users will ever hit the rate limit if it's configured properly. This way, honest users won't have to suffer for a handful of malicious ones.
  • Data control: it is not unusual for a service to be visited by bots or malicious actors trying to extract all the data they can access. Rate limiting is a great way to hinder scrapers.

Drawbacks

No solution is perfect. Rate limiting also has a fair share of drawbacks.

  • Complexity: behind this seemingly simple idea lies a ton of complexity. It is not that simple to set it up right. There are multiple policies you can use. You will have to carefully calculate and tweak the rate. Some applications also need to correctly handle request bursts.
  • User experience: It is a double-edged sword. If a user legitimately reaches the limit, they will get frustrated. It is never fun to have to stop and wait when you're very productive.
  • Scaling up: rate limiting needs to be constantly monitored and tweaked as you scale up. Limits may need to be increased when new features are rolled out. Do you prefer scaling your infrastructure or trying to squeeze as many users as possible until it degrades?

With that said, I would like to stress that there is no perfect way to do rate limiting. You will have to make your own choices depending on your service and your business.


Going down the rabbit-hole

Things are getting interesting, but more and more complicated. Let's dive into this rabbit hole and hopefully find out what will work best for you.


Proxy vs App rate limiting

Before anything, you should ask yourself at which level you need to set up rate limiting. There are two options here:

  • The proxy level allows you to rate limit users even before they actually hit your service. This is the most efficient approach when it comes to performance and security. Most cloud providers have built-in solutions to handle this for you.
  • The application level allows for more fine-grained control over the quotas. You can vary them depending on whether the user is authenticated or has special permissions. This even lets you monetize your API by granting paying customers a higher limit.

Why not both?

Opting for both solutions can be interesting. You would use the proxy level for DDoS protection and to avoid overloading your services. And you would use the application level alongside it where some business logic comes into play.

Policies

There are many different policies that can be used to calculate the users' quotas. All of them have their use cases, so it is again up to you to pick the one that works best for you. Without going too much into detail, we will look at the four most common rate limiting policies.

Fixed window

Fixed window diagram

This policy is the simplest: the rate limit is applied within a fixed time window. Every user has a request counter that is reset at a fixed interval. If the counter exceeds the allowed quota, the request is rejected.

Its main drawback is that it handles burst traffic poorly. Imagine you have a quota of 100 requests per minute: when the counters reset, all your users can potentially send their 100 requests at once.

It can also be very rigid. If the window is too long, users may have to wait a long time before being able to send requests again. If the window is too short, the benefits of rate limiting are reduced.
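
To make this more concrete, here is a minimal, single-instance sketch of a fixed window counter in Go. It is purely illustrative (the names are made up for this example); a real deployment would keep the counters in a shared store such as Redis, as we will do later.


package policy

import (
    "sync"
    "time"
)

// FixedWindow is a naive in-memory fixed window limiter: every counter
// is reset at the same time, when the current window expires.
type FixedWindow struct {
    mu       sync.Mutex
    limit    int
    window   time.Duration
    start    time.Time
    counters map[string]int // requests per client in the current window
}

func NewFixedWindow(limit int, window time.Duration) *FixedWindow {
    return &FixedWindow{
        limit:    limit,
        window:   window,
        start:    time.Now(),
        counters: map[string]int{},
    }
}

func (f *FixedWindow) Allow(clientID string) bool {
    f.mu.Lock()
    defer f.mu.Unlock()

    // Reset every counter when the window elapses.
    if time.Since(f.start) >= f.window {
        f.start = time.Now()
        f.counters = map[string]int{}
    }

    if f.counters[clientID] >= f.limit {
        return false // quota exhausted until the next reset
    }
    f.counters[clientID]++
    return true
}
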

Sliding window

Sliding window diagram

This policy is an improvement over the fixed window. In short, it tracks the requests made by each user over the most recent window of time rather than resetting all counters at once.

Let's say we have a window of 1 minute and a quota of 100 requests. If the user has performed fewer than 100 requests during the last 60 seconds, then the request is accepted. The window is therefore sliding continuously relative to the current time.

This policy is still very rigid but ensures smooth traffic.
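
A minimal in-memory sketch of the idea, again purely illustrative (a production implementation would use a shared store and usually a more memory-efficient approximation than storing every timestamp):


package policy

import (
    "sync"
    "time"
)

// SlidingWindow keeps the timestamps of each client's recent requests
// and only counts those that fall inside the last window.
type SlidingWindow struct {
    mu      sync.Mutex
    limit   int
    window  time.Duration
    history map[string][]time.Time
}

func NewSlidingWindow(limit int, window time.Duration) *SlidingWindow {
    return &SlidingWindow{limit: limit, window: window, history: map[string][]time.Time{}}
}

func (s *SlidingWindow) Allow(clientID string) bool {
    s.mu.Lock()
    defer s.mu.Unlock()

    now := time.Now()
    cutoff := now.Add(-s.window)

    // Keep only the timestamps that are still inside the sliding window.
    recent := s.history[clientID][:0]
    for _, t := range s.history[clientID] {
        if t.After(cutoff) {
            recent = append(recent, t)
        }
    }

    if len(recent) >= s.limit {
        s.history[clientID] = recent
        return false
    }
    s.history[clientID] = append(recent, now)
    return true
}
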

Token bucket

Token bucket diagram

The token bucket is a completely different approach. The idea is that every user has a bucket filled with a specified number of tokens. When performing a request, they take one token from the bucket. If the bucket is empty, the request is rejected. The bucket is refilled at a predefined rate until it's full again.

This policy is great at absorbing short bursts and traffic spikes while keeping the rate smooth over the long term.
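
Here is a rough single-instance sketch of the refill logic, with made-up names and purely for illustration:


package policy

import (
    "math"
    "sync"
    "time"
)

// TokenBucket refills each client's bucket at a fixed rate, up to a maximum
// capacity. A request consumes one token; an empty bucket means rejection.
type TokenBucket struct {
    mu         sync.Mutex
    capacity   float64 // maximum number of tokens (the burst size)
    refillRate float64 // tokens added per second
    tokens     map[string]float64
    lastSeen   map[string]time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
    return &TokenBucket{
        capacity:   capacity,
        refillRate: refillRate,
        tokens:     map[string]float64{},
        lastSeen:   map[string]time.Time{},
    }
}

func (b *TokenBucket) Allow(clientID string) bool {
    b.mu.Lock()
    defer b.mu.Unlock()

    now := time.Now()
    tokens, known := b.tokens[clientID]
    if !known {
        tokens = b.capacity // new clients start with a full bucket
    } else {
        // Refill proportionally to the time elapsed since the last request.
        elapsed := now.Sub(b.lastSeen[clientID]).Seconds()
        tokens = math.Min(b.capacity, tokens+elapsed*b.refillRate)
    }
    b.lastSeen[clientID] = now

    if tokens < 1 {
        b.tokens[clientID] = tokens
        return false // bucket is empty
    }
    b.tokens[clientID] = tokens - 1
    return true
}


Note that the capacity is what allows bursts: a full bucket can be drained all at once, after which requests are throttled to the refill rate.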

Leaky bucket

Leaky bucket diagram

The leaky bucket works a bit like a funnel. Every user starts with an empty bucket that has a hole in the bottom. The hole lets only a fixed number of requests flow out per unit of time. As requests come in, the bucket can fill up faster than it drains. Eventually, the bucket is full and overflows: all new requests are rejected.

In the analogy, the width of the opening at the bottom represents the rate. The depth of the bucket represents the burst.

This policy is the most flexible of all four. It can be adjusted easily depending on the traffic and also smooths out the traffic flow.
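
Below is a minimal illustrative sketch of the "leaky bucket as a meter" variant, where the bucket level drains at a constant rate and each incoming request adds one unit:


package policy

import (
    "math"
    "sync"
    "time"
)

// LeakyBucket drains at a constant rate (the hole) and rejects requests
// once the bucket (the burst capacity) would overflow.
type LeakyBucket struct {
    mu       sync.Mutex
    capacity float64 // bucket depth: how many requests can pile up (the burst)
    leakRate float64 // requests drained per second (the width of the hole)
    level    map[string]float64
    lastSeen map[string]time.Time
}

func NewLeakyBucket(capacity, leakRate float64) *LeakyBucket {
    return &LeakyBucket{
        capacity: capacity,
        leakRate: leakRate,
        level:    map[string]float64{},
        lastSeen: map[string]time.Time{},
    }
}

func (b *LeakyBucket) Allow(clientID string) bool {
    b.mu.Lock()
    defer b.mu.Unlock()

    now := time.Now()
    // Drain the bucket according to the time elapsed since the last request.
    elapsed := now.Sub(b.lastSeen[clientID]).Seconds()
    level := math.Max(0, b.level[clientID]-elapsed*b.leakRate)
    b.lastSeen[clientID] = now

    if level+1 > b.capacity {
        b.level[clientID] = level
        return false // the bucket would overflow
    }
    b.level[clientID] = level + 1
    return true
}
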

HTTP standard

At the time of writing, the closest we have to a standard for rate limiting with HTTP is this expired IETF draft.

In short, this document defines a set of HTTP headers that can be used to inform the clients of their quotas and the policy used.

Unfortunately, it is hard to know where this is going or if this has been dropped completely. This is the best we've got, so let's roll with it.


Implementation


For our example, we will work at the application level. We will use Go, Redis, and the leaky bucket policy. To avoid having to implement the algorithm ourselves, we will use the go-redis/redis_rate library.

Why do we need Redis?

Redis is a key/value store that we will use to store our users' counters. In a distributed system that scales across multiple instances, you don't want each instance to hold its own counters. That would mean the rate limiting is applied per instance rather than for your service as a whole, making it basically useless.

Rate limit service

Let's start by implementing an agnostic service. This way, we can use it with any framework or library easily.

Let's create a new ratelimit package, import our libraries, and set up the basis for our service:



// service/ratelimit/ratelimit.go
package ratelimit

import (
    //...
    rate "github.com/go-redis/redis_rate/v10"
    "github.com/redis/go-redis/v9"
)

type Service struct {
    limiter *rate.Limiter
    limit   rate.Limit
}

func NewService(redisClient redis.UniversalClient, limit rate.Limit) *Service {
    return &Service{
        limiter: rate.NewLimiter(redisClient),
        limit:   limit,
    }
}



Let's then expose a simple Allow() method.



// Allow checks if the given client ID should be rate limited.
// If an error is returned, it should not prevent the user from accessing the service
// (fail-open principle).
func (s *Service) Allow(ctx context.Context, clientID string) (*rate.Result, error) {
    return s.limiter.Allow(ctx, fmt.Sprintf("client-id:%s", clientID), s.limit)
}



We are using the fail-open principle. It is well suited for high-availability services, where it would be more detrimental to block all traffic than to potentially let it flow a bit too much.

For more resource-intensive operations, it would be smarter to use a fail-closed approach to ensure stability even if the rate limiting fails.
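
For illustration, a fail-closed variant of the same method could look like the following sketch. AllowFailClosed is a hypothetical helper, not part of the example project:


// AllowFailClosed rejects the request when the rate limiter itself fails,
// trading availability for protection of the resources behind it.
func (s *Service) AllowFailClosed(ctx context.Context, clientID string) (*rate.Result, error) {
    result, err := s.limiter.Allow(ctx, fmt.Sprintf("client-id:%s", clientID), s.limit)
    if err != nil {
        // Fail-close: treat a limiter error as "no remaining quota".
        return &rate.Result{Limit: s.limit, Allowed: 0, Remaining: 0}, err
    }
    return result, nil
}
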

Now, we can also implement a simple method that would update the response headers according to the IETF draft previously mentioned.



// UpdateHeaders of the HTTP response according to the given result.
// The headers are set following this IETF draft (not yet standard):
// https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers
func (s *Service) UpdateHeaders(headers http.Header, result *rate.Result) {
    headers.Set(
        "RateLimit-Limit",
        strconv.Itoa(result.Limit.Rate),
    )

    headers.Set(
        "RateLimit-Policy",
        fmt.Sprintf(`%d;w=%.f;burst=%d;policy="leaky bucket"`, result.Limit.Rate, math.Ceil(result.Limit.Period.Seconds()), result.Limit.Burst),
    )

    headers.Set(
        "RateLimit-Remaining",
        strconv.Itoa(result.Remaining),
    )

    headers.Set(
        "RateLimit-Reset",
        fmt.Sprintf("%.f", math.Ceil(result.ResetAfter.Seconds())),
    )
}



Finally, we need to identify our users. For unauthenticated users, this can be tricky: you usually fall back on the client's IP. It's not perfect, but it is sufficient most of the time.



// GetDefaultClientID returns the client IP retrieved from the X-Forwarded-For header.
func (s *Service) GetDefaultClientID(headers http.Header) string {
    // X-Forwarded-For: <client-ip>,<load-balancer-ip>
    // or
    // X-Forwarded-For: <supplied-value>,<client-ip>,<load-balancer-ip>
    // We only keep the client-ip.
    parts := strings.Split(headers.Get("X-Forwarded-For"), ",")
    clientIP := parts[0]

    if len(parts) > 2 {
        clientIP = parts[len(parts)-2]
    }

    return strings.TrimSpace(clientIP)
}



āš ļø This header format is the one used by Google Cloud load balancers. This can be different depending on your cloud provider.

We can now create an instance of our service like so:



opts := &redis.Options{
    Addr:       "127.0.0.1:6379",
    Password:   "",
    DB:         0,
    MaxRetries: -1, // Disable retry
}
redisClient := redis.NewClient(opts)
ratelimitService := ratelimit.NewService(redisClient, rate.PerMinute(200))



Of course, ideally we would not hardcode all those settings. Making them configurable with a config file or environment variables would be best. This is however out of the scope of this article.

Middleware

Now that we are done with the rate limit service, we need to put it to use in a new middleware.

For this example, we are going to use the Goyave framework. This REST API framework provides a ton of useful packages and encourages the use of a strong layered architecture. We'll take the blog example project as a starting point.

Registering our service

The first step is to add a name to our rate limit service.



// service/ratelimit/ratelimit.go
import "example-project/service"

func (*Service) Name() string {
    return service.Ratelimit
}




// service/service.go
package service

const (
    //...
    Ratelimit = "ratelimit"
)



Then let's register it in our server:



// main.go
func registerServices(server *goyave.Server) {
    server.Logger.Info("Registering services")

    opts := &redis.Options{
        Addr:       "127.0.0.1:6379",
        Password:   "",
        DB:         0,
        MaxRetries: -1, // Disable retry
    }
    redisClient := redis.NewClient(opts)
    ratelimitService := ratelimit.NewService(redisClient, rate.PerMinute(200))

    server.RegisterService(ratelimitService)
    //...
}




ā„¹ļø You can find the documentation explaining how services work in Goyave here.

Implementing the middleware

Let's set up the basis for our middleware. We'll first create a new interface that will be compatible with our rate limit service, and use it as a dependency of our middleware.



// http/middleware/ratelimit.go
package middleware

import (
    "context"
    "net/http"

    "goyave.dev/goyave/v5"

    "github.com/go-goyave/goyave-blog-example/service"
    rate "github.com/go-redis/redis_rate/v10"
)

type RatelimitService interface {
    Allow(ctx context.Context, clientID string) (*rate.Result, error)
    GetDefaultClientID(headers http.Header) string
    UpdateHeaders(headers http.Header, result *rate.Result)
}

type Ratelimit struct {
    goyave.Component
    RatelimitService RatelimitService
}

func NewRatelimit() *Ratelimit {
    // The rate limit service is resolved from the server registry in Init().
    return &Ratelimit{}
}

func (m *Ratelimit) Init(server *goyave.Server) {
    m.Component.Init(server)
    ratelimitService := server.Service(service.Ratelimit).(RatelimitService)
    m.RatelimitService = ratelimitService
}



Now let's implement the actual logic of our middleware. We want our authenticated users to have a quota of their own, and our guest users to be identified by their IP.



func (m *Ratelimit) getClientID(request *goyave.Request) string {
    if u, ok := request.User.(*dto.InternalUser); ok && u != nil {
        return strconv.FormatUint(uint64(u.ID), 10)
    }

    return m.RatelimitService.GetDefaultClientID(request.Header())
}



We just have the Handle() method left to implement:



import (
    //...
    "goyave.dev/goyave/v5/util/errors"
)

func (m *Ratelimit) Handle(next goyave.Handler) goyave.Handler {
    return func(response *goyave.Response, request *goyave.Request) {
        res, err := m.RatelimitService.Allow(request.Context(), m.getClientID(request))
        if err != nil {
            m.Logger().Error(errors.New(err))
            next(response, request)
            return // Fail-open
        }

        m.RatelimitService.UpdateHeaders(response.Header(), res)

        if res.Allowed == 0 {
            response.Status(http.StatusTooManyRequests)
            return
        }
        next(response, request)
    }
}




Finally, let's add it as a global middleware, just after the authentication middleware.



// http/route/route.go

func Register(server *goyave.Server, router *goyave.Router) {
    //...
    router.GlobalMiddleware(authMiddleware)

    router.GlobalMiddleware(middleware.NewRatelimit())
    //...
}



ā„¹ļø You can find the documentation explaining how middleware work in Goyave here.

One last thing

Wait! There is one problem with this. The rate limit middleware won't be executed if the authentication fails. Let's extend the auth.JWTAuthenticator to handle this case. We just have to make it implement auth.Unauthorizer. This interface allows custom authenticators to define a custom behavior when authentication fails. The idea is to execute the rate limit middleware even if the auth one blocks the request.

Let's create a new custom authenticator that will use composition with auth.JWTAuthenticator:



// http/auth/jwt.go
package auth

import (
    "net/http"

    "goyave.dev/goyave/v5"
    "goyave.dev/goyave/v5/auth"
)

type JWTAuthenticator[T any] struct {
    *auth.JWTAuthenticator[T]
    ratelimiter goyave.Middleware
}

func NewJWTAuthenticator[T any](userService auth.UserService[T], ratelimiter goyave.Middleware) *JWTAuthenticator[T] {
    return &JWTAuthenticator[T]{
        JWTAuthenticator: auth.NewJWTAuthenticator(userService),
        ratelimiter:      ratelimiter,
    }
}

func (a *JWTAuthenticator[T]) OnUnauthorized(response *goyave.Response, request *goyave.Request, err error) {
    a.ratelimiter.Handle(a.handleFailed(err))(response, request)
}

func (a *JWTAuthenticator[T]) handleFailed(err error) goyave.Handler {
    return func(response *goyave.Response, _ *goyave.Request) {
        response.JSON(http.StatusUnauthorized, map[string]string{"error": err.Error()})
    }
}



We now need to update our routes:



// http/route/route.go

import (
    //...
    customauth "github.com/go-goyave/goyave-blog-example/http/auth"
)

func Register(server *goyave.Server, router *goyave.Router) {
    //...
    ratelimiter := middleware.NewRatelimit()

    authenticator := customauth.NewJWTAuthenticator(userService, ratelimiter)
    authMiddleware := auth.Middleware(authenticator)
    router.GlobalMiddleware(authMiddleware)

    router.GlobalMiddleware(ratelimiter)
}



ā„¹ļø You can find the documentation explaining how authenticators work in Goyave here.

Rate limit in action

We are done! Let's test this.


Before that, we need to add a Redis container to the docker-compose.yml:



services:
  #...
  redis:
    image: redis:7
    ports:
      - '127.0.0.1:6379:6379'



Start the application as explained in the README:



docker compose up -d
dbmate -u postgres://dbuser:secret@127.0.0.1:5432/blog?sslmode=disable -d ./database/migrations --no-dump-schema migrate
go run main.go -seed



Let's query our server with our trusty friend curl:



curl -v http://localhost:8080/articles



Result:



HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Ratelimit-Limit: 200
Ratelimit-Policy: 200;w=60;burst=200;policy="leaky bucket"
Ratelimit-Remaining: 198
Ratelimit-Reset: 1
Date: Thu, 27 Jun 2024 12:47:08 GMT
Transfer-Encoding: chunked

{"records":[...



We can see our RateLimit headers. Success!


Conclusion

You can finally open the gates (a little) once more and let everyone enjoy your awesome new product without issues or slowdowns.

In the process, you learned all you need to know to get started with rate limiting. Despite not being perfect, this solution is highly effective! Don't forget to closely monitor your services from now on, and adjust your limits accordingly.

Check out the Goyave framework! It can help you build better APIs faster thanks to its many features such as routing, validation, localization, model mapping, and much more.

Let's talk! Was this article useful to you? Do you have anything to add or correct? Or maybe you have an interesting experience to share with us. I'll see you in the comments. Thank you for reading!

Top comments (8)

artydev

Thank you

Alex Pliutau

Great article. How robust is the Redis part? With a lot of unique requests we could make our Redis instance quite busy.

SystemGlitch

Redis holds up pretty well, but it can indeed be overloaded too. If you start seeing too many errors from the Allow method, you should probably scale up Redis a bit. If you have really high traffic, you can also start using a Redis cluster.
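
Since NewService accepts a redis.UniversalClient, a cluster client can be dropped in without changing the service. A quick sketch (the addresses are placeholders):


// A Redis Cluster client also satisfies redis.UniversalClient.
clusterClient := redis.NewClusterClient(&redis.ClusterOptions{
    Addrs: []string{"10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"},
})
ratelimitService := ratelimit.NewService(clusterClient, rate.PerMinute(200))
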

Abinay Kumar

Nice Work @systemglitch

Saurabh Rai

Wow, this is a comprehensive guide to Rate-Limiting APIs. Nice work @systemglitch

Vijay Venkatasubramani

I was actually lost coz I have no knowledge of Go. A diagram of how redis helps in rate-limiting would have been really helpful.

SystemGlitch

Redis is our key/value store for the users' request counters. A user is identified by a key: their IP address, or their user ID if authenticated. This allows us to share the counters between multiple instances of the program serving the API.

Don't hesitate to ask if this is still unclear.