DEV Community

Artur Chukhrai for SerpApi

Posted on • Edited on • Originally published at serpapi.com

Scraping Apple App Store Search with Python

What will be scraped

wwbs-apple-app-store-search

📌Note: This blog post shows how to scrape Apple App Store search and get exactly the same results you would see on an Apple iMac, because search results on a Mac are completely different from those on a PC. The screenshots below show the difference:

  • Mac results:

mac-results

  • PC results:

pc-results

Why use an API?

There are a couple of reasons to use an API, and ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks yourself, whether CAPTCHAs or IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times typically under ~2.6 seconds per request (~0.6 seconds with Ludicrous Speed). It does this without browser automation, which makes requests much faster. Response times and success rates are shown on the SerpApi Status page.

serpapi-status

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import json

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_app_store',    # SerpApi search engine 
    'term': 'image viewer',         # search query
    'device': 'desktop',            # device to get the results
    'country': 'us',                # country for the search
    'lang': 'en-us',                # language for the search
    'disallow_explicit': False,     # disallowing explicit apps
    'num': 20,                      # number of items per page
    'page': 0,                      # pagination
    # 'property': 'developer'       # developer of an app
}

app_store_results = []

while True:
    search = GoogleSearch(params)            # data extraction on the SerpApi backend
    new_page_results = search.get_dict()     # JSON -> Python dict

    app_store_results.extend(new_page_results['organic_results'])

    if 'next' in new_page_results.get('serpapi_pagination', {}):
        params['page'] += 1
    else:
        break

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))

Preparation

Install the library:

pip install google-search-results

google-search-results is the SerpApi package for Python.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import json

| Library | Purpose |
|---|---|
| GoogleSearch | Sends the request to SerpApi and returns the parsed results. Despite the name, the class works with any SerpApi engine, including apple_app_store. |
| json | Converts the extracted data to a JSON-formatted string. |

Next, the parameters used to generate the request URL are defined. If you want to pass other parameters, add them to the params dictionary:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_app_store',    # SerpApi search engine 
    'term': 'image viewer',         # search query
    'device': 'desktop',            # device to get the results
    'country': 'us',                # country for the search
    'lang': 'en-us',                # language for the search
    'disallow_explicit': False,     # disallowing explicit apps
    'num': 20,                      # number of items per page
    'page': 0,                      # pagination
    # 'property': 'developer'       # developer of an app
}

| Parameter | Explanation |
|---|---|
| api_key | Defines the SerpApi private key to use. You can find it under your account -> API key. |
| engine | Set to apple_app_store to use the App Store API engine. |
| term | Defines the query you want to search. You can use any search term that you would use in a regular App Store search. |
| device | Defines the device to use to get the results. It can be set to desktop to use the Mac App Store, tablet to use the iPad App Store, or mobile (default) to use the iPhone App Store. |
| country | Defines the country to use for the search. It's a two-letter country code. Head to Apple Regions for a full list of supported regions. |
| lang | Defines the language to use for the search. It's a four-letter language code (e.g., en-us). Head to Apple Languages for a full list of supported languages. |
| disallow_explicit | Defines the filter for disallowing explicit apps. It defaults to false. |
| num | Defines the number of results you want to get per page. It defaults to 10. The maximum number of results per page is 200. |
| page | Used to get the items on a specific page (e.g., 0 (default) is the first page of results, 1 is the second page, 2 is the third page, etc.). |
| property | Allows searching by a property of an app. developer searches the developer's name (e.g., property=developer and term=Coffee returns apps with "Coffee" in their developer's name, such as Coffee Inc.). |

📌Note: You can also add other API Parameters.
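
The rules from the parameter list can be captured in a small helper that assembles the dictionary and rejects values the list rules out. This is only a sketch: build_params and its developer_search flag are hypothetical names, not part of the SerpApi library, and the lower bound of 1 for num is an assumption (the list above only states the default and the maximum).

```python
ALLOWED_DEVICES = {'desktop', 'tablet', 'mobile'}   # values named in the parameter list


def build_params(api_key, term, device='mobile', country='us', lang='en-us',
                 num=10, page=0, developer_search=False):
    """Hypothetical helper: assemble and sanity-check the params dictionary."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(f'device must be one of {sorted(ALLOWED_DEVICES)}')
    if not 1 <= num <= 200:              # 200 is the documented maximum; 1 is assumed
        raise ValueError('num must be between 1 and 200')
    if page < 0:                         # page numbering starts at 0
        raise ValueError('page must be 0 or greater')

    params = {
        'api_key': api_key,
        'engine': 'apple_app_store',
        'term': term,
        'device': device,
        'country': country,
        'lang': lang,
        'num': num,
        'page': page,
    }
    if developer_search:                 # search by developer name instead of app title
        params['property'] = 'developer'
    return params


params = build_params('...', 'image viewer', device='desktop', num=20)
```

Validating locally like this catches a typo such as device='desktp' before a request is sent.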

Define the app_store_results list to which the retrieved data will be added:

app_store_results = []

A while loop is used to extract data from all pages:

while True:
    # data extraction will be here

Then we create a search object, which retrieves the data from the SerpApi backend. The get_dict() method converts the returned JSON into the new_page_results dictionary:

search = GoogleSearch(params)            # data extraction on the SerpApi backend
new_page_results = search.get_dict()     # JSON -> Python dict
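
Before reading data out of the returned dictionary, it can be worth checking whether the request succeeded. The sketch below assumes that a failed request comes back as a dictionary with a top-level error key; check_response is a hypothetical helper, not part of the SerpApi library, and the error message in the demo is made up.

```python
def check_response(page_results):
    """Return the response dict unchanged, or raise if it reports an error."""
    # Assumption: failures are reported under a top-level 'error' key.
    if 'error' in page_results:
        raise RuntimeError(f"SerpApi returned an error: {page_results['error']}")
    return page_results


# Demo with hand-written dictionaries instead of a live request.
ok = check_response({'organic_results': []})

try:
    check_response({'error': 'example failure message'})
except RuntimeError as exc:
    print(exc)
```

In the loop, this would wrap the call that produces new_page_results, so a bad API key or an empty search fails with a readable message.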

Add the new data from this page to the app_store_results list:

app_store_results.extend(new_page_results['organic_results'])

# title = new_page_results['organic_results'][0]['title']
# version = new_page_results['organic_results'][0]['version']
# description = new_page_results['organic_results'][0]['description']

📌Note: The comments above show how to extract specific fields. You may have noticed the new_page_results['organic_results'][0]. The [0] is the index of an app in the results, which means that we are extracting data from the first app. The new_page_results['organic_results'][1] is the second app, and so on.
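
Rather than indexing one app at a time, the same fields can be pulled from every app on the page. The following is a minimal sketch; pick_fields is a hypothetical helper, and sample_results is a hand-trimmed stand-in for new_page_results['organic_results'].

```python
def pick_fields(organic_results, fields=('title', 'version', 'description')):
    """Return one small dict per app, keeping only the requested fields."""
    # .get() leaves a missing field as None instead of raising KeyError.
    return [{field: app.get(field) for field in fields} for app in organic_results]


# Hand-trimmed stand-in for new_page_results['organic_results'].
sample_results = [
    {'position': 1, 'title': 'Pixea', 'version': '2.1', 'age_rating': '4+'},
]

for app in pick_fields(sample_results, fields=('title', 'version')):
    print(app)
```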

After the data is retrieved from the current page, a check is made to see whether a next page exists. If the serpapi_pagination dictionary contains a next key, the page parameter is incremented by 1. Otherwise, the loop stops.

if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
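
Since every page is a separate request, it may be useful to cap how many pages the loop is allowed to fetch. The sketch below is a variation on the loop above, not the original code: collect_all_pages, max_pages, and fake_fetch_page are hypothetical names, and the fake fetcher stands in for a live request so the example runs without an API key.

```python
def collect_all_pages(params, fetch_page, max_pages=50):
    """Follow pagination until there is no 'next' link or max_pages is reached."""
    params = dict(params)                # work on a copy; the caller's dict is untouched
    results = []
    for _ in range(max_pages):
        page_results = fetch_page(params)
        results.extend(page_results.get('organic_results', []))
        if 'next' not in page_results.get('serpapi_pagination', {}):
            break                        # last page reached
        params['page'] += 1
    return results


def fake_fetch_page(params):
    """Stand-in for a live request: serves three tiny hand-written pages."""
    pages = [
        {'organic_results': [{'title': 'A'}], 'serpapi_pagination': {'next': '...'}},
        {'organic_results': [{'title': 'B'}], 'serpapi_pagination': {'next': '...'}},
        {'organic_results': [{'title': 'C'}], 'serpapi_pagination': {}},
    ]
    return pages[params['page']]


all_apps = collect_all_pages({'page': 0}, fake_fetch_page)
first_two_pages = collect_all_pages({'page': 0}, fake_fetch_page, max_pages=2)
```

With the real client, fetch_page could be lambda p: GoogleSearch(p).get_dict().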

After all the data is retrieved, it is output in JSON format:

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
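
If the results need to outlive the terminal session, the same json.dumps call can write to a file instead. This is a small sketch using only the standard library; save_results, load_results, and the file name are hypothetical.

```python
import json
from pathlib import Path


def save_results(results, path):
    """Write the results list to a UTF-8 JSON file and return the path."""
    path = Path(path)
    path.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding='utf-8')
    return path


def load_results(path):
    """Read a results list back from a JSON file."""
    return json.loads(Path(path).read_text(encoding='utf-8'))


# Usage (hypothetical file name):
# save_results(app_store_results, 'app_store_results.json')
```

Keeping ensure_ascii=False together with an explicit UTF-8 encoding preserves non-ASCII app titles as readable text in the file.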

Output

[
  {
    "position": 1,
    "id": 1507782672,
    "title": "Pixea",
    "bundle_id": "imagetasks.Pixea",
    "version": "2.1",
    "vpp_license": true,
    "age_rating": "4+",
    "release_note": "- New \"Fixed Size and Position\" zoom mode - Fixed a bug causing crash when browsing ZIP-files - Bug fixes and improvements",
    "seller_link": "https://www.imagetasks.com",
    "minimum_os_version": "10.12",
    "description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them. Supported formats: JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives. Export formats: JPEG, JPEG-2000, PNG, TIFF, BMP. Found a bug? Have a suggestion? Please, send it to support@imagetasks.com Follow us on Twitter @imagetasks!",
    "link": "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
    "serpapi_product_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
    "serpapi_reviews_link": "https://serpapi.com/search.json?country=us&engine=apple_reviews&page=1&product_id=1507782672",
    "release_date": "2020-04-20 07:00:00 UTC",
    "price": {
      "type": "Free"
    },
    "rating": [
      {
        "type": "All Times",
        "rating": 0.0,
        "count": 0
      }
    ],
    "genres": [
      {
        "name": "Photo & Video",
        "id": 6008,
        "primary": true
      },
      {
        "name": "Graphics & Design",
        "id": 6027,
        "primary": false
      }
    ],
    "developer": {
      "name": "ImageTasks Inc",
      "id": 450316587,
      "link": "https://apps.apple.com/us/developer/id450316587"
    },
    "size_in_bytes": 7113871,
    "supported_languages": [
      "EN"
    ],
    "screenshots": {
      "general": [
        {
          "link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/800x500bb.jpg",
          "size": "800x500"
        }
      ]
    },
    "logos": [
      {
        "size": "60x60",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/60x60bb.png"
      },
      {
        "size": "512x512",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/512x512bb.png"
      },
      {
        "size": "100x100",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/100x100bb.png"
      }
    ]
  },
  ... other results
]
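
Several fields in this output are nested (price, rating, genres, developer), which is awkward for a spreadsheet or CSV export. The sketch below flattens one result into a single-level dictionary. flatten_app and the output key names are hypothetical, the field layout is taken only from the sample above, and sample_app is a trimmed copy of that first result.

```python
def flatten_app(app):
    """Reduce one organic result to a flat dict of commonly used fields."""
    # Pick the genre flagged as primary, if any.
    primary_genre = next(
        (genre['name'] for genre in app.get('genres', []) if genre.get('primary')),
        None,
    )
    # Pick the "All Times" rating entry, if present.
    all_time = next(
        (r for r in app.get('rating', []) if r.get('type') == 'All Times'),
        {},
    )
    size = app.get('size_in_bytes')
    return {
        'id': app.get('id'),
        'title': app.get('title'),
        'developer': app.get('developer', {}).get('name'),
        'price_type': app.get('price', {}).get('type'),
        'primary_genre': primary_genre,
        'rating': all_time.get('rating'),
        'rating_count': all_time.get('count'),
        'size_mb': round(size / 1_000_000, 2) if size is not None else None,
    }


# Trimmed copy of the first result shown above.
sample_app = {
    'id': 1507782672,
    'title': 'Pixea',
    'price': {'type': 'Free'},
    'rating': [{'type': 'All Times', 'rating': 0.0, 'count': 0}],
    'genres': [
        {'name': 'Photo & Video', 'id': 6008, 'primary': True},
        {'name': 'Graphics & Design', 'id': 6027, 'primary': False},
    ],
    'developer': {'name': 'ImageTasks Inc', 'id': 450316587},
    'size_in_bytes': 7113871,
}

print(flatten_app(sample_app))
```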

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
