Contents: intro, imports, what will be scraped, process, code, links, outro.
Intro
This blog post is a continuation of the Bing web scraping series. It shows how to scrape Related Questions from Bing search results using Python.
Imports
from bs4 import BeautifulSoup
import requests
import lxml
from serpapi import GoogleSearch
import os  # for reading the API key from an environment variable
What will be scraped
Process
The CSS selectors below were picked with the SelectorGadget Chrome extension.
Selecting the container CSS selector with the needed data
Selecting the question CSS selector
Selecting the snippet CSS selector
Selecting the title/URL CSS selector
Selecting the displayed URL CSS selector
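Before sending real requests, it can help to check the selectors picked with SelectorGadget against a saved copy of the page. The sketch below assumes a tiny, hand-written HTML fragment (`SAMPLE_HTML`) that only imitates the class names used in this post; Bing's real markup is larger and may change. `count_matches` is a hypothetical helper, not part of any library: it reports how many elements each selector matches, so a selector that suddenly matches 0 elements is easy to spot. It uses the built-in `"html.parser"` so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment that only imitates the class names used
# in this post. Real Bing markup is much larger and changes over time.
SAMPLE_HTML = """
<div id="relatedQnAListDisplay">
  <div class="df_topAlAs">
    <div class="b_1linetrunc">What kind of game is The Lion King?</div>
    <div class="rwrl_padref">The Lion King is a classic 1994 platformer video game.</div>
    <div class="b_algo">
      <p>The Lion King - Play Game Online - ArcadeSpot.com</p>
      <a href="https://arcadespot.com/game/the-lion-king/">link</a>
      <cite>arcadespot.com/game/the-lion-king/</cite>
    </div>
  </div>
</div>
"""

# Selectors from the SelectorGadget steps above (the inner ones are relative).
SELECTORS = {
    "container": "#relatedQnAListDisplay .df_topAlAs",
    "question": ".b_1linetrunc",
    "snippet": ".rwrl_padref",
    "title": ".b_algo p",
    "link": ".b_algo a",
    "displayed_link": "cite",
}


def count_matches(html, selectors):
    """Return how many elements each CSS selector matches in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.select(css)) for name, css in selectors.items()}


if __name__ == "__main__":
    for name, count in count_matches(SAMPLE_HTML, SELECTORS).items():
        # A count of 0 means the selector no longer matches this markup.
        print(f"{name}: {count}")
```

Swapping `SAMPLE_HTML` for a page saved from a real search is a quick way to see which selector broke when the scraper stops returning results.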
Code
from bs4 import BeautifulSoup
import requests, lxml  # lxml is the parser used below; importing it fails fast if it isn't installed

# a browser-like User-Agent; without one Bing may serve different markup
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# passing the query via params lets requests URL-encode the space in "lion king"
params = {'q': 'lion king', 'hl': 'en'}

html = requests.get('https://www.bing.com/search', params=params, headers=headers, timeout=30)
soup = BeautifulSoup(html.content, 'lxml')

# each related question sits in its own container inside #relatedQnAListDisplay,
# so the inner selectors can be relative to that container
for related_question in soup.select('#relatedQnAListDisplay .df_topAlAs'):
    question = related_question.select_one('.b_1linetrunc').text
    snippet = related_question.select_one('.rwrl_padref').text
    title = related_question.select_one('.b_algo p').text
    link = related_question.select_one('.b_algo a')['href']
    displayed_link = related_question.select_one('cite').text

    print(f'{question}\n{snippet}\n{title}\n{link}\n{displayed_link}\n')
# part of the output:
'''
What kind of game is The Lion King?
Jump on top of giraffe’s head and eat bugs in this awesome classic platformer game. The Lion King is a classic 1994 platformer video game based on the multi-award winning animated film of the same name. The game takes place after the death of Simba’s father where Simba was told a lie and forced to hide.
The Lion King - Play Game Online - ArcadeSpot.com
https://arcadespot.com/game/the-lion-king/
arcadespot.com/game/the-lion-king/
'''
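The loop above stops with an `AttributeError` (or a `TypeError` for `['href']`) as soon as one related question lacks one of the elements, because `select_one()` returns `None` when nothing matches. Below is a minimal sketch of a more defensive variant, assuming the same selectors. The helpers `text_or_none` and `parse_related_questions` are hypothetical names: they collect each question into a dict, store `None` for missing fields, and the result can be dumped as JSON. The sample markup, including the second question without a snippet, is made up for illustration.

```python
import json

from bs4 import BeautifulSoup

# Made-up markup for illustration; the second question has no snippet.
SAMPLE_HTML = """
<div id="relatedQnAListDisplay">
  <div class="df_topAlAs">
    <div class="b_1linetrunc">What kind of game is The Lion King?</div>
    <div class="rwrl_padref">A classic 1994 platformer video game.</div>
    <div class="b_algo">
      <p>The Lion King - Play Game Online - ArcadeSpot.com</p>
      <a href="https://arcadespot.com/game/the-lion-king/">link</a>
      <cite>arcadespot.com/game/the-lion-king/</cite>
    </div>
  </div>
  <div class="df_topAlAs">
    <div class="b_1linetrunc">Is the Lion King a circle of life?</div>
    <div class="b_algo">
      <p>Disney THE LION KING | Award-Winning Best Musical</p>
      <a href="https://www.lionking.com/">link</a>
      <cite>www.lionking.com/</cite>
    </div>
  </div>
</div>
"""


def text_or_none(parent, css):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.select_one(css)
    return element.get_text(strip=True) if element is not None else None


def parse_related_questions(html):
    """Collect every related question into a dict; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select("#relatedQnAListDisplay .df_topAlAs"):
        link = item.select_one(".b_algo a")
        results.append({
            "question": text_or_none(item, ".b_1linetrunc"),
            "snippet": text_or_none(item, ".rwrl_padref"),
            "title": text_or_none(item, ".b_algo p"),
            "link": link.get("href") if link is not None else None,
            "displayed_link": text_or_none(item, "cite"),
        })
    return results


if __name__ == "__main__":
    print(json.dumps(parse_related_questions(SAMPLE_HTML), indent=2, ensure_ascii=False))
```

In the live script, `html.text` from the `requests.get()` call would take the place of `SAMPLE_HTML`.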
Using the Bing Related Questions API
SerpApi is a paid API with a free trial of 5,000 searches.
import os
from serpapi import GoogleSearch

params = {
    # your SerpApi key, read from an environment variable (the name API_KEY is up to you)
    "api_key": os.getenv("API_KEY"),
    "engine": "bing",
    "q": "lion king"
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['related_questions']:
    question = result['question']
    snippet = result['snippet']
    title = result['title']
    link = result['link']
    displayed_link = result['displayed_link']

    print(f'{question}\n{title}\n{link}\n{displayed_link}\n{snippet}\n')
# part of the output:
'''
Is the Lion King a circle of life?
Disney THE LION KING | Award-Winning Best Musical
https://www.lionking.com/
www.lionking.com/
Circle of Life in 360 - Experience THE LION KING like never before - WATCH IT NOW Quite Simply, Stunning. -TimeOut New York A Deeply Felt Celebration of Life.
'''
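The loop above indexes `results['related_questions']` and each field directly, which raises a `KeyError` if a search comes back without a related-questions block or an entry lacks one of the fields. Whether SerpApi omits such keys for a given query is an assumption here, not something this post states. Below is a minimal, library-free sketch: `extract_related_questions` is a hypothetical helper that works on the dictionary returned by `search.get_dict()` and fills gaps with `None`. `SAMPLE_RESULTS` is hand-built from the output shown above, with the snippet left out on purpose.

```python
# Fields printed by the SerpApi example above.
FIELDS = ("question", "snippet", "title", "link", "displayed_link")

# Hand-built stand-in for search.get_dict(); the snippet is omitted on purpose.
SAMPLE_RESULTS = {
    "related_questions": [
        {
            "question": "Is the Lion King a circle of life?",
            "title": "Disney THE LION KING | Award-Winning Best Musical",
            "link": "https://www.lionking.com/",
            "displayed_link": "www.lionking.com/",
        }
    ]
}


def extract_related_questions(results):
    """Return one dict per related question, using None for missing fields."""
    return [
        {field: item.get(field) for field in FIELDS}
        for item in results.get("related_questions", [])
    ]


if __name__ == "__main__":
    for question in extract_related_questions(SAMPLE_RESULTS):
        print(question)
```

Using `dict.get()` keeps the loop running when a field is missing; the trade-off is that a `None` can slip through silently, so downstream code has to check for it.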
Links
Code in the online IDE • Bing Related Questions API
Outro
If you have any questions, something isn't working correctly, or you'd like to see something else covered, feel free to drop a comment in the comment section or reach out on Twitter at @serp_api.
Yours,
Dimitry, and the rest of the SerpApi Team.