Intro
In this blog post, we'll go through the process of extracting data from Google Maps Locals results using Python. You can look at the complete code in the online IDE (Replit).
If you prefer video format, we have a dedicated video that shows how to do that: Web Scraping Google Maps Local Results with Python and SerpApi.
What will be scraped
Why use an API?
There are a couple of reasons to use an API, ours in particular:
- No need to create a parser from scratch and maintain it.
- No need to figure out how to bypass blocks from Google: solving CAPTCHAs or IP blocks.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend with fast response times, under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request, and without browser automation, which makes the whole process much faster. Response times and success rates are shown on the SerpApi Status page.
Full Code
If you just need to extract all available data about each place, you can create an empty list and then extend it with the extracted data:
```python
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import os, json

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your SerpApi API key
    'engine': 'google_maps',                # SerpApi search engine
    'q': 'coffee',                          # query
    'll': '@40.7455096,-74.0083012,15.1z',  # GPS coordinates
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination
}

search = GoogleSearch(params)  # where data extraction happens on the backend

local_results = []

# pagination
while True:
    results = search.get_dict()  # JSON -> Python dict

    local_results.extend(results['local_results'])

    if 'next' in results.get('serpapi_pagination', {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))
    else:
        break

print(json.dumps(local_results, indent=2, ensure_ascii=False))
```
Preparation
Install library:

```shell
pip install google-search-results
```

`google-search-results` is a SerpApi API package.
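The examples in this post read the API key from an environment variable named `API_KEY` (the name is just this post's convention; any name works). A minimal sketch of how `os.getenv` behaves when the variable is missing:

```python
import os

# os.getenv returns None (not an error) when the variable is unset,
# which would make SerpApi reject the request later on.
api_key = os.getenv('API_KEY')

if api_key is None:
    print('API_KEY is not set; export it before running the scraper.')
```

Passing a second argument to `os.getenv` lets you supply a fallback value instead of `None`.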
Code Explanation
Import libraries:
```python
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import os, json
```
| Library | Purpose |
|---|---|
| `GoogleSearch` | to scrape and parse Google results using the SerpApi web scraping library. |
| `urlsplit` | this should generally be used instead of `urlparse()` if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted. |
| `parse_qsl` | to parse a query string given as a string argument. |
| `os` | to return the value of an environment variable (the SerpApi API key). |
| `json` | to convert extracted data to a JSON object. |
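As a quick illustration of how `urlsplit` and `parse_qsl` work together, here is a sketch using a hypothetical next-page URL of the shape SerpApi returns (the exact URL is an assumption for the example):

```python
from urllib.parse import urlsplit, parse_qsl

# A hypothetical "next page" URL of the shape SerpApi returns
next_url = "https://serpapi.com/search.json?engine=google_maps&q=coffee&start=20"

# urlsplit() separates the URL into components; .query holds the raw query string
query = urlsplit(next_url).query  # "engine=google_maps&q=coffee&start=20"

# parse_qsl() turns the query string into a list of (key, value) pairs
params = dict(parse_qsl(query))
print(params)  # {'engine': 'google_maps', 'q': 'coffee', 'start': '20'}
```

This is exactly the transformation the pagination loop later uses to feed the next page's parameters back into the search object.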
At the beginning of the code, parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the `params` dictionary:
```python
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your SerpApi API key
    'engine': 'google_maps',                # SerpApi search engine
    'q': 'coffee',                          # query
    'll': '@40.7455096,-74.0083012,15.1z',  # GPS coordinates
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination
}
```
| Parameter | Explanation |
|---|---|
| `api_key` | Parameter defines the SerpApi private key to use. |
| `engine` | Set parameter to `google_maps` to use the Google Maps API engine. |
| `q` | Parameter defines the query you want to search. You can use anything that you would use in a regular Google Maps search. |
| `ll` | Parameter defines the GPS coordinates of the location where you want your `q` (query) to be applied. |
| `type` | Parameter defines the type of search you want to make. `search` returns a list of results for the set `q` parameter. |
| `hl` | Parameter defines the language to use for the Google Maps search. It's a two-letter language code (e.g., `en` for English, `es` for Spanish, or `fr` for French). Head to the Google languages page for a full list of supported Google languages. |
| `start` | Parameter defines the result offset. It skips the given number of results. It's used for pagination (e.g., `0` (default) is the first page of results, `20` is the 2nd page of results, `40` is the 3rd page of results, etc.). |
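Since each page holds 20 local results, the `start` offset for any 1-indexed page number can be computed directly. A small sketch (the helper name is mine, not part of the API):

```python
def start_offset(page: int, per_page: int = 20) -> int:
    """Return the `start` value for a 1-indexed page of Google Maps results."""
    return (page - 1) * per_page

print(start_offset(1))  # 0  -> first page
print(start_offset(2))  # 20 -> second page
print(start_offset(3))  # 40 -> third page
```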
📌Note: The `q` and `ll` parameters are only required if `type` is set to `search`.
Additionally, I want to talk about the value that is written to the `ll` parameter. The value must be built in the following sequence:

`@` + `latitude` + `,` + `longitude` + `,` + `zoom`

This will form a string that looks like this:

`@40.7455096,-74.0083012,15.1z`

The `zoom` parameter is optional but recommended for higher precision (it ranges from `3z`, map completely zoomed out, to `21z`, map completely zoomed in).

You can find the value of the `ll` parameter in the URL of the local results you need: https://www.google.com/maps/search/coffee/@40.7455096,-74.0083012,15z?hl=en
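The sequence above is easy to build with an f-string. A minimal sketch (the helper name is mine, not part of the SerpApi package):

```python
def build_ll(latitude: float, longitude: float, zoom: float = 15.1) -> str:
    """Build the `ll` parameter: @ + latitude + , + longitude + , + zoom + z."""
    return f"@{latitude},{longitude},{zoom}z"

print(build_ll(40.7455096, -74.0083012, 15.1))
# @40.7455096,-74.0083012,15.1z
```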
Then, we create a `search` object where the data is retrieved from the SerpApi backend:

```python
search = GoogleSearch(params)  # where data extraction happens on the backend
```
Declaring the `local_results` list where the extracted data will be added:

```python
local_results = []
```
We need to extract all local results, taking pagination into account. In the `while` loop, we fetch the data and write it to the `results` dictionary:

```python
while True:
    results = search.get_dict()  # JSON -> Python dict
```
We expand the `local_results` list with the new data from the current page:

```python
    local_results.extend(results['local_results'])
```

Then we check if a next page exists. If so, we update the search parameters in the `search` object for the next page. Otherwise, we `break` the loop:

```python
    if 'next' in results.get('serpapi_pagination', {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))
    else:
        break
```
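The reason for `extend` rather than `append` is that each page's `local_results` is itself a list, and we want one flat list of places. A small sketch with made-up page data:

```python
# Results already collected, plus a new page of results (made-up sample data)
local_results = [{"position": 1}]
page_results = [{"position": 2}, {"position": 3}]

# extend() adds each element of page_results to the list individually...
local_results.extend(page_results)
print(len(local_results))  # 3

# ...whereas append() would nest the whole page as a single element
nested = [{"position": 1}]
nested.append(page_results)
print(len(nested))  # 2
```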
After all the data is retrieved, it is output in JSON format:

```python
print(json.dumps(local_results, indent=2, ensure_ascii=False))
```
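`ensure_ascii=False` keeps non-ASCII characters (common in place names) readable instead of escaping them to `\uXXXX` sequences. The same arguments work with `json.dump` if you would rather save the results to a file; a sketch, where the filename is my own choice:

```python
import json

# Made-up sample of collected results
local_results = [{"title": "Stumptown Coffee Roasters", "rating": 4.5}]

# ensure_ascii=False keeps accented characters as-is
print(json.dumps({"title": "Café Grumpy"}, ensure_ascii=False))  # {"title": "Café Grumpy"}
print(json.dumps({"title": "Café Grumpy"}))                      # {"title": "Caf\u00e9 Grumpy"}

# Writing the results to a file instead of printing them
with open("local_results.json", "w", encoding="utf-8") as f:
    json.dump(local_results, f, indent=2, ensure_ascii=False)
```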
Output
```json
[
  {
    "position": 1,
    "title": "Stumptown Coffee Roasters",
    "place_id": "ChIJT2h1HKZZwokR0kgzEtsa03k",
    "data_id": "0x89c259a61c75684f:0x79d31adb123348d2",
    "data_cid": "8778389626880739538",
    "reviews_link": "https://serpapi.com/search.json?data_id=0x89c259a61c75684f%3A0x79d31adb123348d2&engine=google_maps_reviews&hl=en",
    "photos_link": "https://serpapi.com/search.json?data_id=0x89c259a61c75684f%3A0x79d31adb123348d2&engine=google_maps_photos&hl=en",
    "gps_coordinates": {
      "latitude": 40.7457399,
      "longitude": -73.9882272
    },
    "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x89c259a61c75684f%3A0x79d31adb123348d2%218m2%213d40.7457399%214d-73.9882272&engine=google_maps&google_domain=google.com&hl=en&start=0&type=place",
    "rating": 4.5,
    "reviews": 1371,
    "price": "$$",
    "type": "Coffee shop",
    "address": "18 W 29th St, New York, NY 10001",
    "open_state": "Closed ⋅ Opens 6:30AM Fri",
    "hours": "Closed ⋅ Opens 6:30AM Fri",
    "operating_hours": {
      "thursday": "7AM–2PM",
      "friday": "6:30AM–5PM",
      "saturday": "7AM–5PM",
      "sunday": "7AM–5PM",
      "monday": "6:30AM–5PM",
      "tuesday": "6:30AM–5PM",
      "wednesday": "6:30AM–5PM"
    },
    "phone": "(855) 711-3385",
    "website": "https://www.stumptowntogo.com/",
    "description": "Coffee bar serving direct-trade java. Coffee bar chain offering house-roasted direct-trade coffee, along with brewing gear & whole beans",
    "service_options": {
      "dine_in": true,
      "takeout": true,
      "delivery": false
    },
    "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipMg90Zc0FekBv9vaTYG4nVOf_RoSYAhVklxGxmg=w80-h106-k-no"
  },
  ... other local results
]
```
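Once you know which keys the structure exposes, plain dictionary access is enough to pull out the fields you care about. A sketch that filters well-rated places from a trimmed, made-up sample of the structure above (the 4.0 threshold is arbitrary):

```python
# Trimmed sample of local results (structure as in the output above)
local_results = [
    {"title": "Stumptown Coffee Roasters", "rating": 4.5, "reviews": 1371, "type": "Coffee shop"},
    {"title": "Another Cafe", "rating": 3.9, "reviews": 12, "type": "Coffee shop"},
]

# Keep only the fields you need, for places rated 4.0 or higher.
# .get() guards against results that lack a "rating" key.
top_rated = [
    {"title": r["title"], "rating": r["rating"]}
    for r in local_results
    if r.get("rating", 0) >= 4.0
]
print(top_rated)  # [{'title': 'Stumptown Coffee Roasters', 'rating': 4.5}]
```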
📌Note: You can view the playground or check the output above to understand which keys in this JSON structure you can use to get the data you need.
Links
Add a Feature Request💫 or a Bug🐞