Introduction
In this article, you will learn how to scrape g2.com using the Bose Framework. Scraping g2.com is an excellent way to do competitor analysis.
Bose Framework is a Selenium-based bot development framework that provides a comprehensive set of tools aimed at making bot development easy for developers.
To make scraping g2.com easy, I have prepared a script that you can use. This article walks you through the steps of using it.
Installation
- Clone Starter Template
git clone https://github.com/omkarcloud/g2-scraper
cd g2-scraper
- Install dependencies
python -m pip install -r requirements.txt
Usage
- In extract_product_links.py, specify your Task.product_url
- Run Project
python main.py
The script will start running and print progress updates to the console. When the scraper finishes, it will generate a JSON file named pending.json in the output directory. The JSON file will contain the product links.
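Before moving on, it can help to confirm that the first run actually produced links. The following is a minimal sketch, not part of the g2-scraper repository: the helper name load_pending is hypothetical, and the exact schema of pending.json is an assumption (the code only relies on it being a JSON list or object).

```python
import json
from pathlib import Path


def load_pending(path="output/pending.json"):
    """Return the parsed contents of the pending file, or None if it is missing."""
    file = Path(path)
    if not file.exists():
        return None
    with file.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    pending = load_pending()
    if pending is None:
        print("output/pending.json not found; run python main.py first")
    else:
        # Assumption: the file holds a JSON list or object, so len() is meaningful.
        print(f"Loaded {len(pending)} top-level entries")
```

If the count is zero, check the product URL you configured before starting the second run.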
If Cloudflare detects the bot, the script will recognize this and prompt you to press the "Enter" key in the console once you have solved the Cloudflare captcha.
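The pause-and-resume behaviour described above follows a common pattern: block on console input until a person has cleared the challenge. The sketch below illustrates that pattern only; it is not the script's actual implementation. The function wait_for_human and its is_blocked parameter are hypothetical names, and how a challenge page is detected is left to the caller.

```python
def wait_for_human(is_blocked, prompt=input):
    """Pause until a person confirms the challenge is solved.

    is_blocked: callable returning True while the challenge page is still shown
                (hypothetical; the detection logic is supplied by the caller).
    prompt:     callable that blocks on the console; defaults to built-in input().
    Returns the number of times the user was prompted.
    """
    attempts = 0
    while is_blocked():
        attempts += 1
        prompt("Solve the Cloudflare captcha in the browser, then press Enter: ")
    return attempts
```

Re-checking is_blocked after each Enter keypress means the loop asks again if the challenge has not actually been cleared, rather than continuing into a blocked page.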
Additionally, you don't have to configure the Selenium driver, as the appropriate driver is downloaded automatically based on your Chrome browser version.
- In main.py, change the task variable to src.extract_product_links
- Rerun Project
python main.py
- After scraping, the products will be stored in the output/finished.csv and output/finished.json files.
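Once the second run has finished, the CSV output can be loaded with the standard library for a quick look at what was scraped. This is a rough sketch under assumptions: read_products is a hypothetical helper, and the column names are not documented here, so the code prints whatever header row it finds instead of relying on specific fields.

```python
import csv
from pathlib import Path


def read_products(path="output/finished.csv"):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    path = Path("output/finished.csv")
    if path.exists():
        rows = read_products(path)
        print(f"{len(rows)} products scraped")
        if rows:
            # Column names come from the file itself; none are assumed here.
            print("Columns:", ", ".join(rows[0].keys()))
    else:
        print("output/finished.csv not found; run the scraper first")
```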