Contents: intro, imports, what will be scraped, process, code, links, outro.
Intro
This blog post is a continuation of the Google web scraping series. It shows how to scrape Inline Videos from Google Search with Python, using the beautifulsoup, requests, and lxml libraries. An alternative API solution is also shown.
Imports
import requests, lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch
What will be scraped
Process
Selecting the container. The link sits directly on the container, in its href attribute.
Selecting the CSS selectors for the title, channel name, platform, date, and duration.
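The two steps above can be sketched against a static snippet, without sending a request to Google. The class names are the selectors used in this post, but the markup nesting in SAMPLE_HTML is an assumption made for illustration, and text_of and parse_inline_videos are hypothetical helper names. The sketch uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the class names come from the selectors in
# this post, but the nesting is an assumption, not Google's real layout.
SAMPLE_HTML = """
<div>
  <a class="WpKAof" href="https://www.youtube.com/watch?v=QwreMeXlFoY">
    <div class="p5AXld">The Last of Us 2 Review</div>
    <div class="hDeAhf">YouTube</div>
    <div class="YnLDzf"> · IGN</div>
    <div class="rjmdhd"><span>Jun 12, 2020</span></div>
    <div class="MyDQSe"><span>8:01</span></div>
  </a>
</div>
"""


def text_of(container, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = container.select_one(selector)
    return element.get_text(strip=True) if element else None


def parse_inline_videos(html):
    """Step 1: select each container. Step 2: select the fields inside it."""
    soup = BeautifulSoup(html, "html.parser")
    videos = []
    for container in soup.select(".WpKAof"):
        channel = text_of(container, ".YnLDzf")
        videos.append({
            "title": text_of(container, ".p5AXld"),
            # The link is an attribute of the container itself.
            "link": container.get("href"),
            "platform": text_of(container, ".hDeAhf"),
            # Drop the "·" separator that precedes the channel name.
            "channel": channel.replace("·", "").strip() if channel else None,
            "date": text_of(container, ".rjmdhd span"),
            "duration": text_of(container, ".MyDQSe span"),
        })
    return videos


videos = parse_inline_videos(SAMPLE_HTML)
print(videos[0])
```

Returning None for a missing element, instead of calling .text on it directly, keeps one absent field from stopping the whole loop with an AttributeError.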
Code
import requests, lxml
from bs4 import BeautifulSoup
headers = {
    "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
response = requests.get("https://www.google.com/search?q=the last of us 2 reviews", headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
for result in soup.select('.WpKAof'):
    title = result.select_one('.p5AXld').text
    link = result['href']
    channel = result.select_one('.YnLDzf').text.replace(' · ', '')
    video_platform = result.select_one('.hDeAhf').text
    date = result.select_one('.rjmdhd span').text
    duration = result.select_one('.MyDQSe span').text
    print(f'{title}\n{link}\n{video_platform}\n{channel}\n{date}\n{duration}\n')
---------------
'''
The Last of Us 2 Review
https://www.youtube.com/watch?v=QwreMeXlFoY
YouTube
IGN
Jun 12, 2020
8:01
'''
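The search URL in the code above contains raw spaces in the query. A minimal sketch of building the same URL with explicit encoding, using only the standard library, is below. The build_search_url helper is a hypothetical name, and the hl and gl defaults are assumptions borrowed from the API example in this post.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_url(query, language="en", country="us"):
    """Build an encoded Google Search URL (the defaults are assumptions)."""
    params = {
        "q": query,      # search query, spaces are encoded by urlencode
        "hl": language,  # interface language
        "gl": country,   # country to search from
    }
    return f"{GOOGLE_SEARCH_URL}?{urlencode(params)}"


url = build_search_url("the last of us 2 reviews")
print(url)
```

With requests, the same dictionary can be passed as requests.get(GOOGLE_SEARCH_URL, params=params, headers=headers), which encodes the query string for you.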
Using Google Inline Videos API
SerpApi is a paid API that provides a free trial of 5,000 searches.
The main difference is that you don't have to maintain the parser: if the layout or the selectors change, there's no need to debug anything, since that is already done for the end user. Doing it yourself can be annoying at times.
import json # used for pretty print output
from serpapi import GoogleSearch
params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "the last of us 2 review",
    "gl": "us",
    "hl": "en"
}
search = GoogleSearch(params)
results = search.get_dict()
# iterate over each inline video without shadowing the results dict
for video in results['inline_videos']:
    print(json.dumps(video, indent=2, ensure_ascii=False))
--------------------
'''
{
"position": 1,
"title": "The Last of Us 2 Review",
"link": "https://www.youtube.com/watch?v=QwreMeXlFoY",
"thumbnail": "https://serpapi.com/searches/60e144a7d737d7a357e568fc/images/b8492386da38ba88cc43d7cb6b9076998ce8d724281cad47c9ee2d1516f61052.jpeg",
"channel": "IGN",
"duration": "8:01",
"platform": "YouTube",
"date": "Jun 12, 2020"
}
...
'''
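Indexing the response with results['inline_videos'] raises a KeyError when that key is not in the dictionary. A hedged sketch of reading it defensively is below, assuming the key can be absent when a search returns no inline videos. The sample_results dictionary is trimmed from the output above, and extract_inline_videos is a hypothetical helper name.

```python
# Trimmed sample shaped like the output above; a real response has more keys.
sample_results = {
    "inline_videos": [
        {
            "position": 1,
            "title": "The Last of Us 2 Review",
            "link": "https://www.youtube.com/watch?v=QwreMeXlFoY",
            "channel": "IGN",
            "duration": "8:01",
            "platform": "YouTube",
            "date": "Jun 12, 2020",
        }
    ]
}


def extract_inline_videos(results):
    """Return selected fields per video, or an empty list if the key is missing."""
    return [
        {
            "title": video.get("title"),
            "link": video.get("link"),
            "channel": video.get("channel"),
            "duration": video.get("duration"),
        }
        for video in results.get("inline_videos", [])
    ]


for video in extract_inline_videos(sample_results):
    print(f"{video['title']} ({video['duration']}) by {video['channel']}")

# A response without the key yields an empty list instead of raising KeyError.
print(extract_inline_videos({}))
```

In the API example above, the dictionary returned by search.get_dict() would take the place of sample_results.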
Links
Code in the online IDE • Google Inline Videos API
Outro
If you have any questions, if something isn't working correctly, or if you want to write something else, feel free to drop a comment in the comment section or reach out via Twitter at @serp_api.
Yours,
Dimitry, and the rest of the SerpApi Team.