Selenium is an effective tool for web scraping with Python. If you're running it on a Mac, however, there may be extra setup steps involved.
I’ve been on a web scraping binge for a couple of weeks and thought I might try using Selenium, a framework that allows for web scraping and testing automation.
Many of the tutorials I found used Windows computers. Up until now, I’ve only had to make a few minor adjustments for working on a Mac, but setting up Selenium and Chromedriver diverged from the Windows tutorials quite a bit!
This tutorial assumes you've set up Python on your computer. I am running Catalina 10.15.3.
Feel free to try this out next time you want to run Selenium and the Chrome Webdriver on your Mac.
But first, a warning:
Web scraping is the process of extracting data from websites and other sources using scripts. Some websites prohibit web scraping, or ask that you receive written permission beforehand. I am using my own personal website for this tutorial. If you’re just starting out with web scraping, I also suggest using websites that are simply built, or ones that you coded yourself. Looking for a safe place to get started? Try this fictional bookstore website that's just begging to be scraped.
Download Selenium
To get started, you'll need to write some Python code and install Selenium.
First, create a directory and paste the following code into a file. I refer to this file as tutorial.py later in the tutorial.
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('your website here')
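Once the basic script runs, a slightly more defensive variant may be handy for real scraping. The sketch below is not part of the original tutorial.py: the function name `fetch_title` and the `make_browser` parameter are my own, added so the logic can be tried with a stand-in browser. It only assumes Selenium's `webdriver.Chrome`, `get`, `title`, and `quit`.

```python
def fetch_title(url, make_browser=None):
    """Open url in a browser, return the page title, and always quit."""
    if make_browser is None:
        # Imported here so the function can also be called with a stand-in
        # browser on a machine where Selenium is not installed.
        from selenium import webdriver
        make_browser = webdriver.Chrome

    browser = make_browser()
    try:
        browser.get(url)
        return browser.title
    finally:
        # Runs even if get() raises, so no browser window is left behind.
        browser.quit()
```

Calling `print(fetch_title('your website here'))` would open Chrome, print the page title, and then close the window, so the Chrome window disappears as soon as the call returns.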
After doing so, you’ll need to install Selenium from the terminal you’re working in, or the terminal you intend to run your Python script from.
To install Selenium, type the following command:
pip install selenium
Easy enough, right?
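If the script later fails with a ModuleNotFoundError, one possible cause is that pip installed Selenium for a different Python than the one running your script. Here is a minimal sketch for checking that, using only the standard library. The helper name `is_installed` is my own and not part of Selenium.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can find module_name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable is the interpreter running this file. Selenium has to
    # be installed for this same interpreter.
    print("Interpreter:", sys.executable)
    print("Selenium importable:", is_installed("selenium"))
```

If this prints False, installing with the same interpreter you use to run the script (for example `python3 -m pip install selenium` if you run `python3 tutorial.py`) is a common fix.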
Download Chromedriver
Navigate to the Chromedriver download site.
Make sure you download the correct driver for your Chrome version. You can find your Chrome version by clicking the vertical ellipsis (⋮) at the top right corner of the browser window. From there go to Help, and open About Google Chrome.
Once you've located your Chrome version, pay attention to the leading number, the part before the first dot.
Go back to the downloads page, and download the Chromedriver whose version number matches your Chrome version. If you're using an older version, you may need to scroll down.
In my experience the last two or so numbers did not matter, but the leading one (83 in my case) definitely did.
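The matching rule described above can be sketched in a few lines of Python. This is only an illustration of comparing the leading number: the function names and the version strings are made up for the example, and nothing here queries Chrome or Chromedriver.

```python
def major_version(version):
    """Return the number before the first dot of a version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and Chromedriver share the same leading number."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Illustrative version strings only.
    print(versions_match("83.0.4103.61", "83.0.4103.39"))   # True
    print(versions_match("83.0.4103.61", "81.0.4044.138"))  # False
```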
Put Chromedriver in /usr/local/bin
Once downloaded, find the Chromedriver in your downloads folder.
This next part can be a bit tricky. I chose to use the GUI to assist me with this.
Open Finder.
Press Command+Shift+G.
Type the following into the search bar that pops up:
/usr/local/bin
Drag and drop your Chromedriver into the bin folder.
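Selenium looks for an executable named chromedriver on your PATH, and /usr/local/bin is normally on the PATH on macOS. If you want to confirm the driver landed somewhere your terminal can see, here is a minimal sketch using the standard library's `shutil.which`. The helper name `find_on_path` is my own.

```python
import shutil


def find_on_path(program):
    """Return the full path to program if it is on PATH, otherwise None."""
    return shutil.which(program)


if __name__ == "__main__":
    location = find_on_path("chromedriver")
    if location is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", location)
```

`shutil.which` only reports files that are executable, so a "not found" result can also mean the file is there but lacks execute permission.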
Go back to your project directory and run the script from your terminal:
python tutorial.py
Did it work?
If yes, you’re done.
If not, never fear!
Take Chromedriver out of quarantine
Like many of us in 2020, your driver may be stuck in quarantine. When you try to run it, your Mac may throw a fit and tell you that the file you want to run is dangerous because it can’t verify the developer.
To resolve this, you’ll need to be in the bin directory where your Chromedriver is stored. From your terminal, cd into /usr/local:
cd /usr/local
Then write:
cd bin
Once there, type the following command:
xattr -d com.apple.quarantine chromedriver
This should take the driver out of quarantine.
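If you'd rather not cd around, the same xattr command can be run from Python with the driver's full path. This is a sketch under the assumption that Chromedriver lives at /usr/local/bin/chromedriver, as in this tutorial. The helper name `quarantine_removal_command` is my own.

```python
import subprocess
import sys


def quarantine_removal_command(driver_path):
    """Build the xattr command that deletes the quarantine attribute."""
    return ["xattr", "-d", "com.apple.quarantine", driver_path]


if __name__ == "__main__":
    # Assumed location; adjust if you stored Chromedriver elsewhere.
    command = quarantine_removal_command("/usr/local/bin/chromedriver")
    if sys.platform == "darwin":
        # No check=True here: xattr reports an error when the attribute is
        # not set, and we want to show that message instead of crashing.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            print("Quarantine attribute removed")
        else:
            print(result.stderr.strip())
    else:
        print("Not macOS; would have run:", " ".join(command))
```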
Run it again
Now, from your project directory, try running the script again:
python tutorial.py
If you get the screen telling you "Chrome is being automated by test software" then it worked! Hooray!
Summary
Although seemingly more popular on Windows, Selenium can be used on Macs as well. Now that you've installed Selenium and Chromedriver on your Mac (and they're working how they should), you're ready to get scraping!
Top comments (2)
Doesn't work 4 me tried the brew, pip manually rebooted,etc.etc. and nothing works always a fail message about the chromedriver my chrome is 103 tried 102 , 103,104 same error... Has this thing ever worked?
This is great! Thank you. I've been looking at getting selenium to work. I had to make sure I used python3 tutorial.py because of some alias/symlinking not done properly I guess?