Draft
The story starts with a write-up about a link checker that mentions the HTTP rate limit headers from an IETF draft (a proposed standard).
Ideally, we expect something like this in the HTTP response headers:
RateLimit-Limit: 10
RateLimit-Remaining: 1
RateLimit-Reset: 7
RateLimit-Reset specifies the number of seconds remaining in the current time window. It counts down as the window elapses, so do not treat it as a fixed value.
The response may also contain a Retry-After header, usually with a 429 status code.
ratelimit-headers has a test implementation of this draft.
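To make the headers above concrete, here is a minimal sketch of how a client might turn them into a wait time. The function name seconds_to_wait and the policy (honor Retry-After on a 429, otherwise wait only when RateLimit-Remaining reaches 0) are assumptions made for illustration, not something the draft prescribes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def seconds_to_wait(
    status_code: int,
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> float:
    """Return how long to pause before the next request (0.0 means go ahead)."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Retry-After holds either a number of seconds or an HTTP date.
    retry_after = lowered.get("retry-after")
    if status_code == 429 and retry_after is not None:
        if retry_after.isdigit():
            return float(retry_after)
        try:
            when = parsedate_to_datetime(retry_after)
        except (TypeError, ValueError):
            pass  # unparseable value: fall through to the RateLimit-* fields
        else:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())

    # Assumed policy: only wait once the quota for this window is used up.
    remaining = lowered.get("ratelimit-remaining")
    reset = lowered.get("ratelimit-reset")
    if remaining is not None and reset is not None:
        if remaining.isdigit() and reset.isdigit() and int(remaining) == 0:
            return float(reset)
    return 0.0


example = {"RateLimit-Limit": "10", "RateLimit-Remaining": "1", "RateLimit-Reset": "7"}
print(seconds_to_wait(200, example))
```

With the example headers one request is still left in the window, so the sketch does not ask for a pause. A stricter client could instead spread the remaining requests over the reset time.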
Sadly, some HTTP APIs do not strictly implement this draft, and others do not send these headers at all. You can find different names like X-RateLimit-Reset, X-RateLimit-Requests-Reset, X-RateLimit-Reset-After, etc. Some official SDKs account for these variants.
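One way to cope with the different names is to try a list of candidates in order. The sketch below is only an illustration: the helper name find_reset_header and the lookup order are assumptions, and it returns the raw value because the meaning of the value can differ between APIs (for example seconds remaining versus a timestamp), so check each API's documentation before using it.

```python
from typing import Mapping, Optional, Tuple

# Candidate names, checked in order: the draft's name first, then the variants.
RESET_HEADER_NAMES = (
    "RateLimit-Reset",
    "X-RateLimit-Reset",
    "X-RateLimit-Requests-Reset",
    "X-RateLimit-Reset-After",
)


def find_reset_header(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (canonical name, raw value) of the first reset header found."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    for candidate in RESET_HEADER_NAMES:
        value = lowered.get(candidate.lower())
        if value is not None:
            return candidate, value
    return None


print(find_reset_header({"x-ratelimit-reset-after": "2.5"}))
```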
Python httpx with rate limit
There are already some rate limit implementations for Python HTTP clients. One of them is aiometer, but it's not suitable for my use case. Since httpx already has an internal connection pool, it would be better to reuse that design.
BTW, my use case is a web crawler client. I want to query a URL directly in the code (with the rate limit applied), instead of gathering lots of URLs and using the map function.
Here is a simple implementation:
import asyncio
from typing import Any, Self  # Self needs Python 3.11+ (use typing_extensions on older versions)

import httpx


class RateLimitTransport(httpx.AsyncHTTPTransport):
    def __init__(self, max_per_second: float = 5, **kwargs) -> None:
        """
        Async HTTP transport with rate limit.

        Args:
            max_per_second: Maximum number of requests per second.

        Other args are passed to httpx.AsyncHTTPTransport.
        """
        self.interval = 1 / max_per_second
        self.next_start_time = 0
        super().__init__(**kwargs)

    async def notify_task_start(self):
        """
        https://github.com/florimondmanca/aiometer/blob/358976e0b60bce29b9fe8c59807fafbad3e62cbc/src/aiometer/_impl/meters.py#L57
        """
        loop = asyncio.get_running_loop()
        while True:
            now = loop.time()
            next_start_time = max(self.next_start_time, now)
            until_now = next_start_time - now
            # A request may start up to one interval ahead of its slot.
            if until_now <= self.interval:
                break
            await asyncio.sleep(max(0, until_now - self.interval))
        # Reserve the next slot for whichever request comes after this one.
        self.next_start_time = max(self.next_start_time, now) + self.interval

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        # Every request waits for its slot before it reaches the connection pool.
        await self.notify_task_start()
        return await super().handle_async_request(request)

    async def __aenter__(self) -> Self:
        await self.notify_task_start()
        return await super().__aenter__()

    async def __aexit__(self, *args: Any) -> None:
        await super().__aexit__(*args)
You can specify the rate limit when you initialize your HTTP client like:
client = httpx.AsyncClient(
    transport=RateLimitTransport(max_per_second=20),
)
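To see the pacing on its own, without any network traffic, the waiting logic can be copied into a small stand-alone class. This is only a sketch: the names Pacer, wait_turn and run_demo are hypothetical, and the demo uses asyncio tasks that record their start times in place of real requests.

```python
import asyncio
from typing import List


class Pacer:
    """Stand-alone copy of the transport's pacing logic, with no HTTP involved."""

    def __init__(self, max_per_second: float = 5) -> None:
        self.interval = 1 / max_per_second
        self.next_start_time = 0.0

    async def wait_turn(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            now = loop.time()
            until_start = max(self.next_start_time, now) - now
            # A task may start up to one interval ahead of its slot.
            if until_start <= self.interval:
                break
            await asyncio.sleep(until_start - self.interval)
        # Reserve the following slot for the next caller.
        self.next_start_time = max(self.next_start_time, now) + self.interval


async def run_demo(n_tasks: int = 6, max_per_second: float = 50) -> List[float]:
    """Start n_tasks at once and return when each one was allowed to proceed."""
    pacer = Pacer(max_per_second)
    loop = asyncio.get_running_loop()
    started = loop.time()
    offsets: List[float] = []

    async def fake_request() -> None:
        await pacer.wait_turn()
        offsets.append(loop.time() - started)

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return offsets


offsets = asyncio.run(run_demo())
print([round(offset, 3) for offset in offsets])
```

Because the check lets a task start up to one interval early, the first two tasks go through right away and the rest follow roughly one interval apart. All tasks are fired at once, which is the "query the URL directly" style from the crawler use case, and the pacing still spreads them out.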