DEV Community

Saish JagTap

Scrape E-Commerce Site using BeautifulSoup and store Data in csv file.

Scrape the Flipkart website using BeautifulSoup and store the data in a CSV file.

IDE Used: PyCharm

First, install the packages one at a time:

pip install bs4
pip install requests
pip install pandas

Import all modules

import requests
from bs4 import BeautifulSoup
import pandas as pd

requests sends the HTTP request.
BeautifulSoup extracts content from the web page (title, etc.).
pandas builds the table and writes the CSV file.

Step 1: Loading the web page with 'requests'

The requests module lets you send HTTP requests. Each request returns a Response object containing all the response data (content, encoding, etc.).

url = "https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"

req = requests.get(url)
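
A request can fail or hang, so it may help to check the response before parsing it. The following is a minimal sketch, not part of the original script: the fetch_html helper, the User-Agent value, and the 10-second timeout are all assumptions chosen for illustration.

```python
import requests

# Hypothetical header value; some sites respond differently to the
# default python-requests user agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-scraper)"}


def fetch_html(url, session=None, timeout=10):
    """Return the raw response body, raising an exception on HTTP errors."""
    # Use the given session if there is one, else the requests module itself.
    client = session or requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.content
```

Calling fetch_html(url) with the Flipkart URL above would return the same bytes as req.content, but it stops early with an exception if the server answers with an error status.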

Step 2: Extracting title with BeautifulSoup

BeautifulSoup provides a lot of simple methods for navigating, searching, and modifying a DOM tree.

Extract the title

page = BeautifulSoup(req.content, "html.parser")
print(page.prettify())

title = page.title
print(title.text)

page.prettify() prints the whole page as nicely indented markup.

title.text prints the title of the page.

Printing title without .text prints the full markup of the element.
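
The difference between the element and its text is easiest to see on a tiny page. This is a small sketch using a made-up HTML string in place of the Flipkart response, and it assumes the title element is reached through page.title.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the downloaded page.
html = "<html><head><title>Laptop Store</title></head><body><p>Hi</p></body></html>"
page = BeautifulSoup(html, "html.parser")

title = page.title
print(title)       # the full <title> element, tags included
print(title.text)  # only the text inside the element
```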


all_products = []

All extracted data is added to this all_products list, which is printed at the end.

Step 3: Start scraping

products = page.findAll("div", {"class": "_3pLy-c row"})

findAll scans the entire document and returns every match. It is the older name for find_all, and both accept the same arguments.

find_all(name, attrs, recursive, string, limit, **kwargs)

In our case, it finds all div tags whose class attribute is "_3pLy-c row".

.select() takes a CSS selector and returns a list of matching elements. It can match elements by attribute, ID, class, and so on.

.strip() removes leading and trailing newlines and whitespace.
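
These three calls can be tried together on a small sample. The sketch below uses a made-up HTML fragment that only imitates the class names used in this post; the real page markup is larger and may differ.

```python
from bs4 import BeautifulSoup

# Made-up fragment: one product card and one unrelated div.
html = """
<div class="_3pLy-c row">
  <div><div class="_4rR01T">
      Laptop A
  </div></div>
</div>
<div class="banner row">Not a product card</div>
"""
page = BeautifulSoup(html, "html.parser")

# find_all matches the exact class string, so the banner div is skipped.
cards = page.find_all("div", {"class": "_3pLy-c row"})
print(len(cards))

# select takes a CSS selector and returns a list of matching elements.
names = cards[0].select("div > div._4rR01T")
raw = names[0].text
print(repr(raw))          # text with the surrounding newlines and spaces
print(repr(raw.strip()))  # same text with that whitespace removed
```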

Step 4: Extract Name and Price of products

for product in products:
    # Product name sits in the div with class _4rR01T
    lname = product.select("div > div._4rR01T")[0].text.strip()
    print(lname)
    # Price sits in the div with class _30jeq3
    lprice = product.select("div > div._30jeq3")[0].text.strip()
    print(lprice)

    all_products.append([lname, lprice])

print("Record Inserted Successfully...")

all_products.append adds each Name and Price pair to the all_products list.
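
Indexing with [0] raises an IndexError when a card has no matching element. One possible way to guard against that is sketched below, using select_one, which returns None when nothing matches. The extract_product helper and the sample markup are hypothetical and not part of the original script.

```python
from bs4 import BeautifulSoup


def extract_product(card):
    """Return [name, price] for one card, or None if either part is missing."""
    name_tag = card.select_one("div > div._4rR01T")
    price_tag = card.select_one("div > div._30jeq3")
    if name_tag is None or price_tag is None:
        return None
    return [name_tag.text.strip(), price_tag.text.strip()]


# Made-up markup standing in for the real page: the second card has no price.
html = """
<div class="_3pLy-c row">
  <div><div class="_4rR01T">Laptop A</div></div>
  <div><div class="_30jeq3">₹45,990</div></div>
</div>
<div class="_3pLy-c row">
  <div><div class="_4rR01T">Laptop B</div></div>
</div>
"""
page = BeautifulSoup(html, "html.parser")

all_products = []
for card in page.find_all("div", {"class": "_3pLy-c row"}):
    row = extract_product(card)
    if row is not None:  # skip incomplete cards instead of crashing
        all_products.append(row)

print(all_products)
```

With this approach an incomplete card is skipped, so a single missing price does not stop the whole run.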


print(all_products)

Step 5: Add Data in CSV file.

col = ["Name", "Price"]

First, create a list of column names.

data = pd.DataFrame(all_products, columns=col)
print(data)

A DataFrame is a two-dimensional data structure, like a 2D array or a table with rows and columns.
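
As a quick illustration of that table shape, here is a small sketch that builds a DataFrame from a list of [name, price] rows. The two rows are made-up sample values, not scraped results.

```python
import pandas as pd

# Made-up rows in the same [name, price] layout as all_products.
rows = [["Laptop A", "₹45,990"], ["Laptop B", "₹62,490"]]
data = pd.DataFrame(rows, columns=["Name", "Price"])

print(data.shape)           # (number of rows, number of columns)
print(list(data.columns))   # the column names
print(data.loc[1, "Name"])  # one cell, addressed by row label and column name
```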


data.to_csv('extract_data.csv', index=False)

to_csv exports a pandas DataFrame to a CSV file.
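
To see what index=False changes, the sketch below calls to_csv without a file path, in which case pandas returns the CSV text instead of writing a file. The single row is a made-up sample value.

```python
import pandas as pd

data = pd.DataFrame([["Laptop A", "₹45,990"]], columns=["Name", "Price"])

# With no path argument, to_csv returns the CSV text as a string.
without_index = data.to_csv(index=False)
with_index = data.to_csv()

print(without_index)  # header row is Name,Price
print(with_index)     # an extra unnamed first column holds the row index
```

The price is quoted in the output because it contains a comma, which keeps it in a single column when the file is read back.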


Code

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.flipkart.com/search?q=laptop&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'
req = requests.get(url)

page = BeautifulSoup(req.content, "html.parser")
print(page.prettify())

title = page.title
print(title.text)

all_products = []
col = ['Name', 'Price']

products = page.findAll("div", {"class": "_3pLy-c row"})
print(products)

for product in products:
    # Product name sits in the div with class _4rR01T
    lname = product.select("div > div._4rR01T")[0].text.strip()
    print(lname)
    # Price sits in the div with class _30jeq3
    lprice = product.select("div > div._30jeq3")[0].text.strip()
    print(lprice)

    all_products.append([lname, lprice])

print("Record Inserted Successfully...")

data = pd.DataFrame(all_products, columns=col)
print(data)
data.to_csv('laptop_products.csv', index=False)

Thank You 😊
