Data is vital for informed decision making because it provides reliable insights. However, we need to source the data before we can use it for analysis. Web scraping is one way to tap the large amount of data available on the web: it extracts specific information from a website using selectors.
For a better understanding of selectors you can check this resource.
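To make the idea of a selector concrete, here is a minimal sketch that uses only Python's standard library to pull out the text of elements by class name, the same kind of lookup the Selenium code performs later. The HTML snippet, the `ClassTextCollector` and `select_text` names, and the sample product are all hypothetical; they only illustrate the idea and are not taken from the PURPINK site.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given class name.

    Simplification: assumes matched elements contain no void tags
    such as <br>, since those have no closing tag to balance depth.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1            # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1             # entering a matching element
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


# Hypothetical markup, loosely shaped like a product listing
SAMPLE_HTML = """
<div class="product">
  <a class="product-thumbnail__title">Gift Box</a>
  <span class="money">KSh 1,500</span>
</div>
"""


def select_text(html, class_name):
    """Return the stripped text of each element with the given class."""
    collector = ClassTextCollector(class_name)
    collector.feed(html)
    return [text.strip() for text in collector.texts]


print(select_text(SAMPLE_HTML, "product-thumbnail__title"))
print(select_text(SAMPLE_HTML, "money"))
```

A class name is only one kind of selector; tag names, IDs, and full CSS or XPath expressions narrow the match in the same way.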
Selenium
Selenium is a tool for automating web browsers. It works across various browsers and operating systems, and it can be used from several programming languages.
For this project, I will use Selenium and the Chrome browser.
You need the Chrome WebDriver (ChromeDriver), which can be downloaded here; pick the release that matches your Chrome version.
We are going to scrape PURPINK, a Kenyan online gift shop.
Scraping
Import selenium for browser automation and for locating elements on the website.
Import Options to run Chrome in headless mode, meaning the browser launches without opening a window.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
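Create the options, specify the path of the ChromeDriver executable, and open the website URL.
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")  # run Chrome without opening a browser window
# Raw string so the backslashes in the Windows path are not treated as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object; pass options so headless mode applies
driver = webdriver.Chrome(service=Service(PATH), options=options)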
driver.get("https://www.purpink.co.ke/collections/her")
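Inspect elements on the website to retrieve the name and price of each product, then write the results to a CSV file.
import csv
from selenium.webdriver.common.by import By

# find_elements_by_class_name was removed in Selenium 4.3; use find_elements with By instead
product_names = driver.find_elements(By.CLASS_NAME, "product-thumbnail__title")
product_prices = driver.find_elements(By.CLASS_NAME, "money")
filename = "purpink.csv"
# newline="" stops the csv module from writing blank rows on Windows
with open(filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand", "Price(Ksh)"])
    for product, price in zip(product_names, product_prices):
        # Drop the "KSh" label and the thousands separator from the price text
        final_price = price.text.replace("KSh", "").replace(",", "").strip()
        writer.writerow([product.text, final_price])
driver.quit()  # close the headless browser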
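The price clean-up and CSV steps can be tried without a browser. The sketch below assumes price text shaped like `KSh 1,500`; the sample rows and the `clean_price` and `rows_to_csv` helper names are hypothetical. It relies on Python's `csv` module so that a product name containing a comma is quoted rather than split into two columns.

```python
import csv
import io


def clean_price(raw):
    """Drop the 'KSh' label and thousands separators from a price string."""
    return raw.replace("KSh", "").replace(",", "").strip()


def rows_to_csv(names, prices):
    """Return CSV text pairing each name with its cleaned price."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Brand", "Price(Ksh)"])
    # zip stops at the shorter list, so unmatched names or prices are dropped
    for name, price in zip(names, prices):
        writer.writerow([name, clean_price(price)])
    return buffer.getvalue()


# Made-up sample data standing in for the scraped element text
sample_names = ["Gift Box", "Mug, Large"]
sample_prices = ["KSh 1,500", "KSh 850"]
csv_text = rows_to_csv(sample_names, sample_prices)
print(csv_text)
```

Because `zip` silently truncates to the shorter list, it is worth comparing the number of names and prices found on the page before trusting the pairing.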
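The complete script:
import csv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run Chrome without opening a browser window
# Raw string so the backslashes in the Windows path are not treated as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object; pass options so headless mode applies
driver = webdriver.Chrome(service=Service(PATH), options=options)
driver.get("https://www.purpink.co.ke/collections/her")

# find_elements_by_class_name was removed in Selenium 4.3; use find_elements with By instead
product_names = driver.find_elements(By.CLASS_NAME, "product-thumbnail__title")
product_prices = driver.find_elements(By.CLASS_NAME, "money")
filename = "purpink.csv"
# newline="" stops the csv module from writing blank rows on Windows
with open(filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand", "Price(Ksh)"])
    for product, price in zip(product_names, product_prices):
        # Drop the "KSh" label and the thousands separator from the price text
        final_price = price.text.replace("KSh", "").replace(",", "").strip()
        writer.writerow([product.text, final_price])
driver.quit()  # close the headless browser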
The code and CSV results can be found on my GitHub.