I recently wrote an article detailing the use of Python's urlopen() for performing HTTP calls. While researching and writing, I learned of the OpenerDirector class. This class offers the opportunity to streamline urlopen(), make it more secure, and provide custom error handling.
Security problems with the default urlopen()
You might try the following on a Linux system:
from urllib.request import urlopen

url = "file:///etc/passwd"
with urlopen(url) as response:
    print(response.read().decode())
A little disturbing, yes? Bandit agrees. Perhaps you want to consider scanning with that security tool or its related flake8 plugin.
The code above certainly communicates the lesson "sanitize your user inputs". Of course, if you control that url string, or can ensure that it starts with the correct https:// scheme, then things are looking better. If this url comes from user input, though, it would be good to check the scheme at least, and, better yet, ensure that the domain is as expected.
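As a minimal sketch of that kind of check, the helper below accepts a URL only if it uses https: and its host is on an allow-list. The name is_safe_url and the ALLOWED_HOSTS set (with its api.example.com entry) are hypothetical, made up for illustration; adjust them to your own situation.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list of hosts this application is expected to call.
ALLOWED_HOSTS = {"api.example.com"}


def is_safe_url(url: str) -> bool:
    """Return True only for https: URLs whose host is on the allow-list."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit() rejects some malformed URLs (e.g. a broken IPv6 host).
        return False
    # .hostname is lower-cased and excludes any userinfo or port.
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


for candidate in ("https://api.example.com/v1/users", "file:///etc/passwd"):
    print(candidate, "->", is_safe_url(candidate))
```

Comparing the parsed hostname, rather than checking how the string starts, also rejects tricks such as https://api.example.com@evil.example.net/.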
A suggested solution
The following code results in a urlopen() command that only opens https:// URLs by default:
import typing
import urllib.request


class SafeOpener(urllib.request.OpenerDirector):
    """An opener that only supports https: URLs unless given other handlers."""

    def __init__(self, handlers: typing.Optional[typing.Iterable] = None):
        super().__init__()
        handlers = handlers or (
            urllib.request.UnknownHandler,
            urllib.request.HTTPDefaultErrorHandler,
            urllib.request.HTTPRedirectHandler,
            urllib.request.HTTPSHandler,
            urllib.request.HTTPErrorProcessor,
        )
        # Handlers are passed in as classes and instantiated here.
        for handler_class in handlers:
            self.add_handler(handler_class())


opener = SafeOpener()
# Make urllib.request.urlopen() use this opener from now on.
urllib.request.install_opener(opener)
After running the above, urllib.request.urlopen() should fail with a URLError if attempting to open http:, ftp:, file:, data:, or any other URL that doesn't have https: at the beginning. It will still follow redirects automatically, just as urlopen() does, and raise an HTTPError for any other HTTP status code that isn't in the 200s.
By the way, if you prefer not to override the opener in the urlopen() function, you could remove the install_opener(opener) line. Then call opener.open() instead of using urlopen().
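Here is a small sketch of that opener.open() approach, assuming the same five handlers as above. It builds the opener without installing it globally, then shows which URL schemes get refused. The try_open helper is a hypothetical name used only for this illustration, and none of the URLs below should trigger a network call, since each is rejected before any connection is attempted.

```python
import urllib.error
import urllib.request

# Build an https-only opener without touching the global urlopen().
opener = urllib.request.OpenerDirector()
for handler_class in (
    urllib.request.UnknownHandler,
    urllib.request.HTTPDefaultErrorHandler,
    urllib.request.HTTPRedirectHandler,
    urllib.request.HTTPSHandler,
    urllib.request.HTTPErrorProcessor,
):
    opener.add_handler(handler_class())


def try_open(url: str) -> str:
    """Hypothetical helper: report whether the opener accepts the URL."""
    try:
        with opener.open(url, timeout=5) as response:
            return f"opened with status {response.status}"
    except urllib.error.URLError as exc:
        # UnknownHandler raises URLError for schemes without a handler.
        return f"blocked: {exc.reason}"


for url in ("file:///etc/passwd", "http://example.com/", "data:,hello"):
    print(url, "->", try_open(url))
```

Because nothing is installed globally, other code in the same process that calls urlopen() keeps the default behavior.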
The above code assumes that all HTTP calls will be encrypted with TLS (aka "SSL", using https:). That also means that testing will need to use https: URLs as well. Consider using the vcr library to mock-reproduce HTTP calls, or the trustme library to actually set up certs for testing. If you need to use unencrypted http: URLs, though, you can simply add urllib.request.HTTPHandler to handlers.
The handlers, defined
This is the chain of default handlers normally used by urlopen():
- ProxyHandler: searches system settings for proxies. If you are 100% sure that your tool or library will never be used with a proxy, then this is not necessary.
- UnknownHandler: raises a URLError if the protocol requested in the URL is not supported by a handler in this chain. Very helpful and recommended.
- HTTPHandler: handles unencrypted http:// connections. Only add this if you are sure you need URL support other than https:
- HTTPDefaultErrorHandler: the fallback that turns any HTTP error response not handled elsewhere into an HTTPError exception. This is necessary unless you plan on handling error statuses yourself.
- HTTPRedirectHandler: handles redirects (status codes 301, 302, 303, or 307 and, as of Python 3.11, 308) and is necessary if automatic following of redirects is desired.
- FTPHandler: handles ftp: URLs. Not necessary for HTTP calls.
- FileHandler: handles file: URLs, and poses security risks. Rarely should this be necessary, if ever.
- HTTPErrorProcessor: the final response handler. It passes any non-2xx response on to the error handlers above, which either follow a redirect or raise an exception.
- DataHandler: handles data: URLs. Hard to imagine why this would be necessary in normal use, and could pose potential security risks with user input.
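If you want to see the default chain on your own system, one way is to inspect the handlers attribute of the opener returned by urllib.request.build_opener(). This is only a quick sketch, and the exact list may vary with the Python version, build, and environment (for example, whether SSL support is available or proxies are configured).

```python
import urllib.request

# build_opener() with no arguments returns an opener with the default chain.
default_opener = urllib.request.build_opener()

# Collect the class name of each handler that was registered.
handler_names = [type(handler).__name__ for handler in default_opener.handlers]

for name in handler_names:
    print(name)
```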
As may be apparent, several of the above are rarely necessary, if ever, for HTTP API work. Instead, I recommend this list as a happy medium between security and usability:
ProxyHandler
UnknownHandler
HTTPHandler
HTTPDefaultErrorHandler
HTTPRedirectHandler
HTTPSHandler
HTTPErrorProcessor
Of the above, ProxyHandler could possibly be removed if you know you don't need it, and even HTTPHandler could be removed if you know that only https: URLs will be called. Actually, this is a pretty good combination: the point of https: is to keep anything from reading or tampering with the connection in transit, and leaving out ProxyHandler means system proxy settings are ignored as well. So a most-secure list is the same as what is in the example code above:
UnknownHandler
HTTPDefaultErrorHandler
HTTPRedirectHandler
HTTPSHandler
HTTPErrorProcessor
Five handlers, not the original nine.
Flexibility
The use cases for custom OpenerDirector instances go beyond just security and simplicity. By subclassing BaseHandler then adding custom status handling methods with names like http_error_401, you create your own handlers that then can be appended to the handler list. These can be used for authorization, retry cadence, and other goals.
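As a rough sketch of such a handler, the class below reacts to HTTP 401 responses by raising an HTTPError with a more descriptive message. The class name UnauthorizedHandler and the message text are hypothetical, chosen for illustration; a real handler might instead refresh a token and retry the request.

```python
import urllib.error
import urllib.request


class UnauthorizedHandler(urllib.request.BaseHandler):
    """Hypothetical handler: report HTTP 401 with a clearer message."""

    def http_error_401(self, req, fp, code, msg, hdrs):
        # Called by the opener for 401 responses, before the default
        # error handler gets a chance to raise its generic HTTPError.
        raise urllib.error.HTTPError(
            req.full_url, code, f"credentials rejected: {msg}", hdrs, fp
        )


# Add the custom handler to an https-only chain.
opener = urllib.request.OpenerDirector()
for handler_class in (
    urllib.request.UnknownHandler,
    urllib.request.HTTPDefaultErrorHandler,
    urllib.request.HTTPRedirectHandler,
    urllib.request.HTTPSHandler,
    urllib.request.HTTPErrorProcessor,
    UnauthorizedHandler,
):
    opener.add_handler(handler_class())
```

Since it is passed as a class like the others, the same handler could also be included in the handlers argument of the SafeOpener shown earlier.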
Curious how these suggestions work for you! Feel free to suggest improvements and share experiences in the comments.