Image (C) Tai Kedzierski
We just had a use case where we needed to POST a file over to a server. The naive implementation for posting with requests is to do
with open("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})
Job done! Well. If the file is reeeally big, that .read() operation will attempt to load the entire file into memory before passing the loaded bytes to requests.post(...).
Clearly, this is going to hurt. A lot.
Use mmap
A quick search yielded a solution using mmap to create a "memory mapped" object, which would behave like a string, whilst being backed by a file that only gets read in chunks as needed.
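To make the "behaves like a string" idea concrete, here is a minimal sketch using only the standard library. The temporary file and its contents are made up for illustration; the point is that a read-only mmap object supports len(), slicing and find() much like a bytes object, while the operating system pages the data in on demand.

```python
import mmap
import os
import tempfile

# Write a small throwaway file to map (a stand-in for a large upload).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header:" + b"x" * 10_000 + b":footer")
    path = tmp.name

try:
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            size = len(mapped)                   # total number of mapped bytes
            head = mapped[:7]                    # slicing returns a bytes object
            footer_at = mapped.find(b":footer")  # searching works as on bytes
finally:
    os.remove(path)
```

Note that slicing copies the requested range into a new bytes object, so only slices of a reasonable size keep memory use low.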
As ever, I like making things reusable and easy to slot in. I adapted the example into a context manager object that can be used in place of a normal call to open():
# It's a tiny snippet, but go on.
# Delivered to You under MIT Expat License, aka "Do what you want"
# I'm not even fussy about attribution.
import mmap

class StringyFileReader:
    def __init__(self, file_name, mode):
        if mode not in ("r", "rb"):
            raise ValueError(f"Invalid mode '{mode}'. Only read-modes are supported")
        self._fh = open(file_name, mode)
        # A length of 0 means "whatever the size of the file actually is".
        # Note that mmap cannot map an empty file, so a zero-byte file raises here.
        fsize = 0
        self._mmap = mmap.mmap(self._fh.fileno(), fsize, access=mmap.ACCESS_READ)

    def __enter__(self):
        return self

    def read(self):
        # Return the mapping itself rather than a copy of the file's bytes
        return self._mmap

    def __exit__(self, *args):
        # Close the mapping first, then the underlying file handle
        self._mmap.close()
        self._fh.close()
Which then lets us simply tweak the original naive example to:
with StringyFileReader("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})
Job. Done.
EDIT: we've discovered through further use that requests is pretty stupid. It still tries to read the entire file into memory, possibly by copying the "string" it receives during one of its internal operations. So this solution seems to only stand in limited cases...
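One possible way around this, which is not what the post above does, is to hand requests an iterator of chunks instead of a dict. The requests documentation describes sending a generator as the body, which it transmits with chunked transfer encoding. The sketch below assumes the server accepts the file as the raw request body (rather than as a form field named "bytes") and supports chunked uploads; iter_file_chunks and post_file_streaming are hypothetical helper names.

```python
try:
    import requests  # third-party; only needed for the actual upload
except ImportError:  # keeps the chunking helper usable on its own
    requests = None


def iter_file_chunks(path, chunk_size=1024 * 1024):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def post_file_streaming(url, path):
    """POST the file as the raw request body, one chunk at a time."""
    if requests is None:
        raise RuntimeError("the requests package is required for uploading")
    # A generator body makes requests use chunked transfer encoding,
    # so only one chunk needs to be held in memory at any moment.
    return requests.post(url, data=iter_file_chunks(path))
```

Whether this fits depends on the receiving end: a server that expects a form-encoded "bytes" field would need to be changed to read the body directly.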
Top comments (4)
That's so terribly simple!
I was anticipating some multipart chunked transferring, but this makes excellent use of the machinery offered by the operating system.
Ever considered using contextlib?
That does make it even more concise!
That said, I'm not sure how I feel about the enter/exit context being implicit behind this; as in, it reduces the amount of code, but reading it back feels unintuitive.
Mind, it's not implicit! It's extracted into the @contextmanager function. I respect that you phrase it as unintuitive, since intuition is learned. Indeed, to (very) many, the extracted form of the code sandwich is intuitive. You can observe the movement from explicit sandwiches to extracted in many languages (e.g. Scope.Exit, using in C#).
The power it brings is that the developer cannot possibly forget to clean up, so the reader doesn't have to wonder whether they did. Assuming code is read 10 times more than it is written, inner peace will be your part after growing this intuition.
Indeed. I guess it's something I just have to get used to - can be regarded as analogous to the with keyword which, unless you've learned and used it properly, can look oddly incomplete.
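For reference, the contextlib variant discussed in this thread is not reproduced here. A minimal sketch of what such a version might look like, assuming a hypothetical function name stringy_file and binary read-only access:

```python
import mmap
from contextlib import contextmanager


@contextmanager
def stringy_file(file_name):
    """Yield a read-only memory map of file_name, cleaning up on exit."""
    with open(file_name, "rb") as fh:
        # Both the file and the mapping are context managers themselves,
        # so leaving this function's with-block closes them in reverse order.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            yield mapped
```

It would be used as `with stringy_file("my_file.bin") as data:`, with `data` taking the place of `fh.read()` in the article's example. The enter/exit "sandwich" is still there: everything before the yield is the setup, and the nested with-blocks perform the cleanup.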