mitmproxy is a versatile and powerful tool that does exactly what its name implies: it acts as a man in the middle, decrypting TLS connections between a browser and a target website.
By defining its own Certificate Authority, which you explicitly tell your browser to trust, mitmproxy can snoop on the traffic between your browser and the remote website, even when the connection is made over TLS. Furthermore, with its powerful addon system, you can even tamper with the data stream: for example, save resources automatically, block some from being downloaded, modify headers, or delete cookies. Addons are simple Python scripts that use the dedicated mitmproxy Python API.
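To give a feel for what such an addon looks like, here is a minimal sketch, not taken from the mitmproxy examples. It assumes that `flow.request.headers` behaves like a mutable mapping, which is how the mitmproxy API exposes it. The class name `TidyRequests` and the `x-via-proxy` header are made up for illustration.

```python
class TidyRequests:
    """Hypothetical addon: drops cookies and tags each outgoing request."""

    def request(self, flow):
        # mitmproxy calls this hook once the full client request has been read
        # remove the Cookie header if present (no error when it is absent)
        flow.request.headers.pop("cookie", None)
        # add a marker header; the name is purely illustrative
        flow.request.headers["x-via-proxy"] = "mitmproxy"


# mitmproxy looks for this module-level list when loading a script
addons = [TidyRequests()]
```

The class does not need to inherit from anything: mitmproxy simply calls the hook methods it finds by name.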
This gave me the idea of a straightforward way to implement a tiny personal firewall and get rid of all those pesky ad resources, such as the ones served by digital marketing services. Using one of the 3 mitmproxy commands, you can get an idea of how many of those resources are downloaded when you visit a website, just as you can with the Chrome developer tools.
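If you would rather have numbers than scroll through the log, a small addon can count requests per host. This is only a sketch: `HostCounter` is a hypothetical name, and it relies on the `request` and `done` hooks and on `flow.request.pretty_host` from the mitmproxy API.

```python
from collections import Counter


class HostCounter:
    """Hypothetical addon: counts how many requests go to each host."""

    def __init__(self):
        self.hits = Counter()

    def request(self, flow):
        # pretty_host prefers the Host header over the raw IP address
        self.hits[flow.request.pretty_host] += 1

    def done(self):
        # called when mitmproxy shuts down: print the 20 busiest hosts
        for host, count in self.hits.most_common(20):
            print(f"{count:6d}  {host}")


addons = [HostCounter()]
```

Load it like any other script, browse a site for a while, then stop the proxy to see the summary.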
Beware that, once started, mitmproxy can see and intercept all the HTTPS traffic between your browser and the remote site. You should therefore take all necessary security measures to prevent anyone other than you from using your configuration.
So let's elaborate a little bit.
First steps
First, download the mitmproxy executables for your platform. I personally use Linux, so I just downloaded the archive and extracted the 3 executables into a dedicated directory. For the differences between them, see https://docs.mitmproxy.org/stable/
Next, start your proxy with no parameters for the moment:
./mitmdump
which listens on TCP port 8080 by default.
Start your Chrome browser and tell it to use mitmproxy:
google-chrome --proxy-server="http://127.0.0.1:8080"
Head to http://mitm.it and follow the instructions. On Linux, I just clicked Other, which downloads the CA certificate that mitmproxy generated on its first run:
mitmproxy-ca-cert.pem
Now, open chrome://settings/certificates, click Authorities, then Import, and select the CA file you just downloaded. The CA certificate is now installed.
Restart Chrome with the same command-line arguments, and voilà!
Note: I didn't try to install the CA file in other browsers, but it seems a little bit trickier.
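If port 8080 is already taken on your machine, both commands have to agree on another port. The sketch below only builds the two command lines. It assumes the `--listen-port` option of mitmdump and the same relative path and Chrome binary name as above. The helper name `proxy_commands` is hypothetical.

```python
import shlex


def proxy_commands(port=8080, host="127.0.0.1"):
    """Return the mitmdump and Chrome command lines for a given port."""
    mitmdump = ["./mitmdump", "--listen-port", str(port)]
    chrome = ["google-chrome", f"--proxy-server=http://{host}:{port}"]
    return mitmdump, chrome


if __name__ == "__main__":
    # print both commands, ready to be pasted into two terminals
    for command in proxy_commands(port=8081):
        print(shlex.join(command))
```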
Tampering with data
Addons are simple Python scripts used to inspect or even modify HTTPS traffic. A list of sample addons is available here: https://docs.mitmproxy.org/stable/addons-examples/
This sample addon blocks resources whose URL matches one of a list of regexes. It can certainly be enhanced:
"""
Block URLs matching a regex by returning an HTTP 404 response. As addon scripts can't be called with an argument,
the file containing the URLs is hardcoded, but its path could be read from an environment variable, for example.
Unfortunately, Python's re module, unlike Rust's regex crate, has no regex set to match a string against several regexes at once.
"""
import re

from mitmproxy import ctx, http


class BlockResource:
    def __init__(self):
        # list holding all compiled regexes; compilation is done once, when the addon is loaded
        self.urls = []

        # read the configuration file, one regex per line
        with open('urls.txt') as url_file:
            for re_url in url_file:
                re_url = re_url.strip()
                # skip blank lines: an empty regex would match every URL
                if re_url:
                    self.urls.append(re.compile(re_url))

        # log how many regexes we have read
        ctx.log.info(f"{len(self.urls)} urls read")

    def response(self, flow):
        # test whether the request URL matches any of the regexes
        if any(url.search(flow.request.url) for url in self.urls):
            ctx.log.info(f"found match for {flow.request.url}")
            # note: mitmproxy 7 and later name this class http.Response
            flow.response = http.HTTPResponse.make(404)


addons = [
    BlockResource()
]
And a sample for urls.txt:
scorecardresearch\.com
geo\.yahoo\.com
advertising\.
ads\.
\.taboola\.
\.doubleclick\.
\.xiti\.
criteo
\.pubmatic\.
\.chartbeat\.net
\.twimg\.com
\.keywee\.co
\.\w+adserver\.com
outbrain
\.bouncex\.
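The addon's docstring mentions that the file path could come from an environment variable. A possible sketch of that idea follows. The variable name `BLOCK_URLS_FILE` and the helper `load_patterns` are hypothetical, and skipping lines that start with `#` is an extra convention the original file format does not define.

```python
import os
import re


def load_patterns(default_path="urls.txt", env_var="BLOCK_URLS_FILE"):
    """Compile one regex per line from the file named by env_var, if set."""
    # fall back to the hardcoded file name when the variable is not set
    path = os.environ.get(env_var, default_path)
    patterns = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # ignore blank lines and comment lines (assumed convention)
            if not line or line.startswith("#"):
                continue
            patterns.append(re.compile(line))
    return patterns
```

In the addon, `self.urls = load_patterns()` could then replace the reading loop in `__init__`.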
Now, just start mitmdump with the -s option:
./mitmdump -s ./block-urls.py
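One thing to keep in mind: the addon above acts in the `response` hook, so the proxy has already fetched the resource by the time it is replaced with a 404. A possible variation, sketched here and not the method used in this article, matches in the `request` hook and calls `flow.kill()`, so the request never leaves the proxy. The class name `BlockEarly` and the short inline pattern list are for illustration only.

```python
import re

# a few patterns taken from the urls.txt sample, inlined to keep the sketch short
BLOCKED = [re.compile(p) for p in (r"\.doubleclick\.", r"\.taboola\.", r"criteo")]


class BlockEarly:
    """Hypothetical addon: drops matching requests before they are sent."""

    def request(self, flow):
        if any(pattern.search(flow.request.url) for pattern in BLOCKED):
            # kill() aborts the flow: nothing is forwarded to the remote server
            flow.kill()


addons = [BlockEarly()]
```

The browser then sees a failed connection and not a 404, which may or may not suit every page.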
Admittedly, with a long list of URL regexes this might be sub-optimal, because Python has to try each regex in turn until one matches.
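One common workaround, shown here as a sketch and not as a measured optimization, is to join all patterns into a single alternation so that only one compiled regex is searched per URL. The helper names `combine` and `is_blocked` are hypothetical, and the approach assumes the individual patterns use no numbered backreferences or inline flags, which would not survive being merged.

```python
import re


def combine(patterns):
    """Merge several regex strings into one compiled alternation."""
    if not patterns:
        # an empty alternation would match everything; (?!) never matches
        return re.compile(r"(?!)")
    # wrap each pattern in a non-capturing group so that | binds correctly
    return re.compile("|".join(f"(?:{p})" for p in patterns))


BLOCKED = combine([r"scorecardresearch\.com", r"\.doubleclick\.", r"outbrain"])


def is_blocked(url):
    return BLOCKED.search(url) is not None
```

Python's engine still tries the alternatives one after another, so the gain comes mostly from avoiding the Python-level loop. It is worth timing with your own list.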
Kudos to the mitmproxy team for this awesome piece of code!
Hope this helps!
Photo by Blaz Erzetic on Unsplash
Top comments (2)
Hi Alain,
Great, your article is simple, sweet, easy to read and understand, and to the point.
I totally love the mitmproxy and any other solution that provides such filtering
But we can also use DNS filtering, as this requires minimal setup. A few personal viewpoints:
Yes, there are a few disadvantages to this approach.
Below are some free and open source solutions for this.
Set the DNS server IP as the system DNS server and the filtering starts.
Enjoy!!!!!!
They all have a dashboard, so you can add all your regexes there and things should work great.
Hi Ashish,
Thanks for your comment. Yes, I tried Pi-hole with a spare Raspberry Pi, and it worked great.