🚁 Tutorial Update
Article updated on 18 November 2020 to reflect the macOS Catalina system update and a virtualenv installation issue.
Isolated Scrapy Installation
This article was born out of my surprise when I found out I could install Scrapy into a separate environment and keep the whole thing isolated from my macOS system. For this purpose we will use virtualenv. Don't worry, you'll pick up the concept quickly, just keep going.
Briefly What Is Scrapy About
Scrapy allows you to write custom functions for your crawling spider. The spider can then scrape data from the websites you choose: collecting it, cleaning it, and saving it to a database or to a file format such as CSV, XML, or JSON. Let's jump to it.
Install Homebrew
Let's start with the installation of Homebrew. If you're not sure whether Homebrew is on your system, check with the brew -v
command, and verify it is working properly with the brew doctor
command.
brew -v
brew doctor
If neither command gives a reply, that's a signal Homebrew is not on your system. You should receive version info after brew -v
and after brew doctor
a message like: "Your system is ready to brew."
If neither of these appears, just head to the website https://brew.sh/, copy the main install command, and paste it into your macOS terminal. The whole command you'll paste looks like the one hereunder.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
Install Python3
After this step the system may ask you to restart and install additional updates. I highly recommend doing so, as I ran into problems installing virtual environments
without it. Now go ahead with the Python 3 installation.
brew install python3
To check whether python3 is installed and which version you have on your system, just type this command. Notice that the flag is a capital -V,
so the whole command is:
python3 -V
Install Virtualenv
Now install the virtual environment tooling on your system. We'll install virtualenvwrapper as well, because with a plain pip3 install virtualenv
command you may face issues after the macOS Catalina update.
pip3 install virtualenv virtualenvwrapper
Then type this command to edit the zshrc
file.
nano ~/.zshrc
Nothing special; we'll simply add these lines to the file to tell the tools where virtual environments live.
# Configuration for virtualenv
WORKON_HOME="${HOME}/.virtualenvs"
export WORKON_HOME
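After saving the file, reload your shell configuration so the setting takes effect in your current terminal session (a small extra step; the path assumes the default zsh setup on macOS):

```shell
# Reload ~/.zshrc so WORKON_HOME is set in this session
source ~/.zshrc
```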
Check if the virtual environment is now correctly installed on the system and look for version info.
virtualenv --version
You should receive a message in your terminal like this.
virtualenv 20.1.0 from /usr/local/lib/python3.9/site-packages/virtualenv/__init__.p
Create Working Directories and Activating Virtualenv
Now we are going to create a working directory, enter it, activate a virtualenv,
and fill it with the libraries we need: Scrapy and the IPython shell (which makes the syntax in the Scrapy shell more readable, in other words "beautifully colorful"). Use these commands one by one.
Note that throughout this tutorial, Virtualenvs with a capital V refers to the directory (you can name this main directory whatever you want), while virtualenv
refers to the command.
mkdir Virtualenvs
cd Virtualenvs
virtualenv scrapyenv
source scrapyenv/bin/activate
The last command activates the virtual environment named scrapyenv (you can name it whatever you want as well). You will recognize it in the macOS terminal because your prompt will start with (scrapyenv), followed by your username and the name of the directory you are in, in this case Virtualenvs. It looks like the example hereunder.
(scrapyenv) yourusername@123 Virtualenvs %
One important thing about re-activating the scrapyenv environment: you need to be inside the Virtualenvs directory, so running cd Virtualenvs
first matters, because the source scrapyenv/bin/activate
command uses a relative path and won't work from any other directory. The scrapyenv environment directory lives inside the Virtualenvs directory, so enter Virtualenvs before activating.
Now you can let go of the fear of the unfamiliar. Our environment brings a bit of discipline to installing the main libraries.
Install Scrapy Into Environment
Let's install Scrapy with the pip
command into the already activated environment.
pip3 install scrapy
This kicks off a rather lengthy download of multiple dependencies. You can then check the version of Scrapy with this command.
scrapy -V
The beginning of the output rewards you with the version info and a note that there is no active project at the moment.
Scrapy 2.4.1 - no active project
Install IPython Into Environment
Now let's add IPython to make the Scrapy shell friendlier to us.
pip3 install ipython
Check Both Scrapy And IPython
Both installations live inside your activated environment named scrapyenv
and will stay there even after deactivation. You can verify that everything installed correctly, for both Scrapy and IPython, either by inspecting the directories or with a few commands. Let's check Scrapy first. Type python,
then import scrapy
. To locate the Scrapy module, type scrapy
.
python
>>> import scrapy
>>> scrapy
You will receive a response showing where the Scrapy module is located, like this.
<module 'scrapy' from '/Users/yourusername/Desktop/Virtualenvs/scrapyenv/lib/python3.9/site-packages/scrapy/__init__.py'>
Exit the Python command line with exit()
To check that the IPython installation is fine, just run these two commands in a terminal. Start with ipython
and then import this
. You will be greeted with Tim Peters' beautiful poem "The Zen of Python" in your terminal.
ipython
In [1]: import this
And exit it with the same exit()
command.
Deactivation And Activation Of Virtualenv
Now that's finished: the dream, the passion, the playground you have wished for, Scrapy inside its own environment, is here. You can deactivate the environment with the simple deactivate
command, or re-activate it by entering the Virtualenvs directory and running source scrapyenv/bin/activate
again.
deactivate
source scrapyenv/bin/activate
Ready To Start Scrapy Shell
Now you are ready to play around with your isolated Scrapy installation. Start your Scrapy shell simply with the scrapy shell
command.
scrapy shell
First Scrapy Command To Fetch URL
Now fetch your very first URL to test that crawling with Scrapy works fine.
fetch("https://dev.to")
This simple command replies in the prompt that everything is fine and the website returned server response 200.
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://dev.to> (referer: None)
Hope you found this installation introduction helpful. If you have any questions, feel free to leave a comment or send me a message here so we can discuss. Happy Scraping.
Thanks to Eric Krull for the cover image from Unsplash.