Scrape the Flipkart website using BeautifulSoup and store the data in a CSV file.
IDE Used: PyCharm
First, install all the required packages one by one:
pip install bs4
pip install requests
pip install pandas
Import all modules
import requests
from bs4 import BeautifulSoup
import pandas as pd
requests: to send HTTP requests.
BeautifulSoup: to extract content from the web page (title, etc.).
pandas: to create the CSV file.
Step 1: Loading the web page with requests
The requests module allows you to send HTTP requests. The HTTP request returns a Response object with all the response data (content, encoding, status code, etc.).
url = "https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
req = requests.get(url)
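To see what that Response object contains, here is a minimal sketch; the User-Agent header is an assumption (not part of the original code), added because some sites reject requests that carry no browser-like headers.

import requests

url = "https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
headers = {"User-Agent": "Mozilla/5.0"}  # assumed header, may help if bare requests are blocked
req = requests.get(url, headers=headers)
print(req.status_code)                   # 200 means the page loaded successfully
print(req.encoding)                      # character encoding detected for the response
print(req.headers.get("Content-Type"))   # e.g. text/html; charset=utf-8
# req.content holds the raw HTML bytes that BeautifulSoup will parse next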
Step 2: Extracting the title with BeautifulSoup
BeautifulSoup provides a lot of simple methods for navigating, searching, and modifying the parse tree.
Extract the title
page = BeautifulSoup(req.content, "html.parser")
print(page.prettify())
title = page.title
print(title.text)
page.prettify()
prints all the content of the page in a nicely indented format.
title.text
prints the title of the page.
If you print title without calling .text, it will print the full markup of the tag.
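To illustrate what navigating, searching, and modifying the parse tree looks like, here is a small self-contained sketch on a made-up HTML snippet (the snippet and class names are illustrative, not from Flipkart):

from bs4 import BeautifulSoup

html = "<html><head><title>Demo</title></head><body><p class='msg'>Hello</p><p class='msg'>World</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)                     # navigating: prints "Demo"
print(soup.find("p").text)                 # searching: first <p>, prints "Hello"
print([p.text for p in soup.find_all("p", {"class": "msg"})])  # all matching tags
soup.find("p").string = "Hi"               # modifying: replace the tag's contents
print(soup.find("p").text)                 # prints "Hi"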
all_products = []
Add all the extracted data to this all_products
list. We will print this list at the end.
Step 3: Start scraping
products = page.findAll("div", {"class": "_3pLy-c row"})
findAll
is used to return all matches after scanning the entire document. Its general form is:
findAll(tag, attributes, recursive, text, limit, keywords)
In our case, it finds all div tags with the class name "_3pLy-c row".
.select()
returns a list of elements matching a CSS selector. The select()
method can locate all elements of a particular CSS class, and it can also find elements by attribute, ID, class, etc.
.strip()
method is used to remove any extra newlines/whitespace from the extracted text.
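A small sketch of select() and strip() together, again on an illustrative snippet:

from bs4 import BeautifulSoup

html = "<div class='card'><div class='name'>  Laptop A \n</div></div>"
soup = BeautifulSoup(html, "html.parser")
matches = soup.select("div > div.name")    # CSS selector: direct child div with class "name"
print(matches[0].text)                     # text with surrounding whitespace
print(matches[0].text.strip())             # "Laptop A"
print(soup.select("#missing"))             # [] when nothing matches the selector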
Step 4: Extract the Name and Price of the products
for product in products:
    lname = product.select("div > div._4rR01T")[0].text.strip()
    print(lname)
    lprice = product.select("div > div._30jeq3")[0].text.strip()
    print(lprice)
    all_products.append([lname, lprice])
    print("Record Inserted Successfully...")
all_products.append
adds the Name
and Price
of each product to the all_products
list.
print(all_products)
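If some product cards are missing a name or a price, indexing with [0] raises an IndexError. A slightly more defensive variant of the loop above (a sketch reusing the same class names, which depend on Flipkart's current markup) skips incomplete cards:

for product in products:
    name_tags = product.select("div > div._4rR01T")
    price_tags = product.select("div > div._30jeq3")
    # Only record products where both the name and the price were found
    if name_tags and price_tags:
        all_products.append([name_tags[0].text.strip(), price_tags[0].text.strip()])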
Step 5: Add the data to a CSV file.
col = ["Name", "Price"]
First, create a list of the column names.
data = pd.DataFrame(all_products, columns=col)
print(data)
DataFrame
is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns.
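A tiny sketch of building a DataFrame from a list of lists shaped like all_products (the sample rows are made up):

import pandas as pd

sample = [["Laptop A", "45,990"], ["Laptop B", "52,490"]]   # made-up rows
df = pd.DataFrame(sample, columns=["Name", "Price"])
print(df)          # prints a two-column table with a numeric row index
print(df.shape)    # (2, 2): 2 rows, 2 columns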
data.to_csv('extract_data.csv', index=False)
to_csv
is used to export a pandas DataFrame
to a CSV file.
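To verify the export, the file can be read back with pandas (a sketch; 'extract_data.csv' is the filename used above):

import pandas as pd

check = pd.read_csv("extract_data.csv")
print(check.head())   # first few rows of the saved data
print(len(check))     # number of product records written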
Code
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'
req = requests.get(url)
page = BeautifulSoup(req.content, "html.parser")
print(page.prettify())
title = page.title
print(title.text)
all_products = []
col = ['Name', 'Price']
products = page.findAll("div", {"class": "_3pLy-c row"})
print(products)
for product in products:
    lname = product.select("div > div._4rR01T")[0].text.strip()
    print(lname)
    lprice = product.select("div > div._30jeq3")[0].text.strip()
    print(lprice)
    all_products.append([lname, lprice])
    print("Record Inserted Successfully...")
data = pd.DataFrame(all_products, columns=col)
print(data)
data.to_csv('laptop_products.csv', index=False)
Thank You!