
Percival Villalva

Originally published at blog.apify.com

ScrapingBee review: top web scraping API?

There are lots of web scraping services out there, but which is the right choice for you? We look at ScrapingBee to see what it offers the dev looking to get data.

Whether you're building an application, conducting market research, or analyzing trends, accessing timely and accurate data is essential. However, identifying the most efficient and reliable methods for obtaining this data can be a daunting task. Should you build your own web scrapers? Use an existing web scraping API? Or go for something in between?

If you've spent some time googling around for an answer to those questions, then you've probably come across ScrapingBee. But now a different question emerges: how do I know if this service is right for my use case? Well, that's precisely what we will try to answer in this article. We will review ScrapingBee's service and analyze the different kinds of tools it provides, along with the pros and cons of using the service.

So, let's get started and see if ScrapingBee is worth using for your web scraping project.

ScrapingBee: what are the pros and cons?

Benefits: user-friendly web scraping API

ScrapingBee provides a user-friendly web scraping API with the features needed for large-scale web scraping and for avoiding blocks, including proxies and JavaScript rendering. It is recommended for developers seeking a simple solution for extracting data that can be seamlessly integrated with their existing data processing code.

Limitations: limited control and no integrated cloud solution

ScrapingBee's straightforward approach may be limiting for developers with advanced web scraping knowledge, as they are required to follow the rules set by ScrapingBee's API and have restricted control over the entire data extraction process.

Additionally, ScrapingBee lacks an integrated solution for managing data extraction flows in the cloud. This can be inconvenient since you would need to find a separate cloud provider or set up your own infrastructure.

ScrapingBee Proxy and API credit consumption

When it comes to large-scale data extraction, proxies are essential for circumventing anti-bot systems used by modern websites. However, utilizing proxies can significantly increase the cost of your web scraping activities. ScrapingBee's API provides several proxy options: Rotating Proxy (default), Premium Proxy, Stealth Proxy, or the ability to use your own proxy. Here is an overview of how each proxy option affects your API credit consumption:

| Feature used | API credit cost/request |
| --- | --- |
| Rotating Proxy without JavaScript rendering | 1 |
| Rotating Proxy with JavaScript rendering (default) | 5 |
| Premium Proxy without JavaScript rendering | 10 |
| Premium Proxy with JavaScript rendering | 25 |
| Stealth Proxy with JavaScript rendering (only option available) | 75 |
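
To make the table easier to reason about, here is a minimal sketch that encodes it as a lookup and estimates the credits a batch of requests would consume. The function and constant names are hypothetical helpers written for this review, not part of ScrapingBee's client; only the credit values come from the table above.

```python
# Credit cost per request, copied from the table above.
# Keys are (proxy type, JavaScript rendering enabled).
CREDIT_COSTS = {
    ("rotating", False): 1,
    ("rotating", True): 5,
    ("premium", False): 10,
    ("premium", True): 25,
    ("stealth", True): 75,  # Stealth Proxy is only listed with JS rendering
}


def credits_per_request(proxy: str = "rotating", render_js: bool = True) -> int:
    """Return the credit cost of a single request (hypothetical helper)."""
    try:
        return CREDIT_COSTS[(proxy, render_js)]
    except KeyError:
        raise ValueError(
            f"combination not in the table: proxy={proxy!r}, render_js={render_js}"
        ) from None


def total_credits(num_requests: int, proxy: str = "rotating", render_js: bool = True) -> int:
    """Estimate the credits consumed by a batch of identical requests."""
    return num_requests * credits_per_request(proxy, render_js)


print(total_credits(1_000))                   # default settings: 5000
print(total_credits(1_000, proxy="premium"))  # Premium Proxy with JS: 25000
```

With the default settings, the 1,000 free trial credits mentioned below would cover 200 requests.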

ScrapingBee pricing

The pricing of a service often plays a crucial role in our decision-making process. Fortunately, ScrapingBee provides a freemium model that allows users to try their service for free with 1,000 API credits. Their paid plans range from $49/month to $599+/month for the business plan. The key distinction between these plans is the allocation of API credits, with the base plan offering 150,000 credits and the business plans providing 8,000,000+ credits, depending on your needs. Additionally, the more expensive plans offer higher limits for concurrent requests and improved support.
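
Because plans are sold in credits rather than requests, the effective price per request depends on which features you enable. The sketch below is a rough estimate under stated assumptions: it uses the lower bounds quoted above for the business plan ($599 and 8,000,000 credits), the 5 and 25 credit costs from the credit table, and hypothetical helper names.

```python
# Monthly price (USD) and credit allocation for the two plans quoted above.
# The business figures are the lower bounds ("$599+", "8,000,000+").
PLANS = {
    "base": (49, 150_000),
    "business": (599, 8_000_000),
}


def requests_per_month(credits: int, credits_per_request: int) -> int:
    """How many whole requests a credit allocation covers."""
    return credits // credits_per_request


def cost_per_thousand_requests(price: float, credits: int, credits_per_request: int) -> float:
    """Effective USD price of 1,000 requests on a given plan."""
    return price / requests_per_month(credits, credits_per_request) * 1000


for name, (price, credits) in PLANS.items():
    for cost in (5, 25):  # default settings vs. Premium Proxy with JS rendering
        n = requests_per_month(credits, cost)
        usd = cost_per_thousand_requests(price, credits, cost)
        print(f"{name}: {n:,} requests at {cost} credits each, ~${usd:.2f} per 1,000 requests")
```

On the base plan, for example, the default settings work out to 30,000 requests per month, while switching every request to the Premium Proxy with JavaScript rendering cuts that to 6,000.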

ScrapingBee scraping test

ScrapingBee offers a versatile data extraction API as one of its primary services, allowing users to extract data from a wide range of web pages. To evaluate its capabilities, I decided to scrape Amazon.com, a well-known website notorious for implementing sophisticated anti-bot systems.

Navigating through ScrapingBee's API was straightforward, and the ScrapingBee documentation provided clear, up-to-date information. With just a few lines of code, as shown in the example below, I successfully extracted the titles, prices, and links of the iPhones listed on the first page of Amazon.com search results:

from scrapingbee import ScrapingBeeClient  # Importing ScrapingBee's client

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    "https://www.amazon.com/s?k=iphone&crid=1BIGRK4NGFLDS&sprefix=ipho%2Caps%2C278&ref=nb_sb_noss_2",
    params={
        # CSS selectors telling the API which elements to extract
        'extract_rules': {
            "product-titles": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a > span",
                "type": "list",
            },
            "product-prices": {
                "selector": "div.a-section.a-spacing-none.a-spacing-top-micro.puis-price-instructions-style > div > a > span > span.a-offscreen",
                "type": "list",
            },
            "product-links": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a",
                "type": "list",
                "output": "@href",  # Return the link target instead of the text
            },
        },
    },
)

# Print the extracted data only if the request succeeded
if response.ok:
    print(response.json())


If you want to test the provided code yourself, follow these steps:

  1. Create a ScrapingBee account.

  2. Replace the placeholder text in the code with your own ScrapingBee API key.

Once you have completed these steps and run the code, you can expect to see results similar to the example below printed to your terminal:

{
   "product-titles":[
      "Apple iPhone 11, 64GB, Black - Unlocked (Renewed)",
      "Apple iPhone SE (2nd Generation), 64GB, Red - Unlocked (Renewed)",
      "Apple iPhone 12, 64GB, White - Fully Unlocked (Renewed)",
      "Apple iPhone 8, 64GB, Gold - Unlocked (Renewed)",
      "Apple iPhone 12 Mini, 64GB, Black - Unlocked (Renewed)",
      "Apple iPhone X, US Version, 64GB, Silver - Unlocked (Renewed)",
      "Apple iPhone XR, 64GB, Black - Unlocked (Renewed)",
      "Apple iPhone XS, US Version, 64GB, Space Gray - Unlocked (Renewed)",
      "Apple iPhone 8 Plus, US Version, 64GB, Gold - Unlocked (Renewed)",
      "Apple iPhone 14 Pro Max, 128GB, Space Black - Unlocked (Renewed)",
      "Apple iPhone 13, 256GB, Midnight - Unlocked (Renewed)",
      "Apple iPhone 11 Pro, 64GB, Midnight Green - Unlocked (Renewed)",
      "iPhone 13 Mini, 128GB, Pink - Unlocked (Renewed)",
      "Apple iPhone 12 Pro, 256GB, Gold - Fully Unlocked (Renewed)",
      "Apple iPhone SE 3rd Gen, 64GB, Midnight - Unlocked (Renewed)",
      "Apple iPhone 14, 512GB, Purple - Unlocked (Renewed Premium)"
   ],
   "product-prices":[
      "$305.55",
      "$147.00",
      "$394.95",
      "$137.99",
      "$308.99",
      "$223.00",
      "$214.75",
      "$232.00",
      "$189.99",
      "$1,019.99",
      "$629.99",
      "$388.00",
      "$494.99",
      "$584.99",
      "$257.99",
      "$875.00"
   ],
   "product-links":[
      "/Apple-iPhone-11-64GB-Black/dp/B07ZPKN6YR/ref=sr_1_1?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-1",
      "/Apple-iPhone-SE-64GB-Red/dp/B088N8TF64/ref=sr_1_2?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-2",
      "/Apple-iPhone-12-64GB-White/dp/B08PPBQM23/ref=sr_1_3?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-3",
      "/Apple-iPhone-Fully-Unlocked-64GB/dp/B0775717ZP/ref=sr_1_4?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-4",
      "/Apple-iPhone-12-Mini-Black/dp/B08PPDJWC8/ref=sr_1_5?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-5",
      "/Apple-iPhone-Fully-Unlocked-64GB/dp/B07C357FSJ/ref=sr_1_6?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-6",
      "/Apple-iPhone-XR-Fully-Unlocked/dp/B07P6Y7954/ref=sr_1_7?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-7",
      "/Apple-iPhone-64GB-Space-Gray/dp/B07SC58QBW/ref=sr_1_8?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-8",
      "/Apple-iPhone-Plus-Fully-Unlocked/dp/B07757LZ1J/ref=sr_1_9?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-9",
      "/Apple-iPhone-14-Pro-Max/dp/B0BN94DL3R/ref=sr_1_10?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-10",
      "/Apple-iPhone-13-256GB-Midnight/dp/B09LNCVCKW/ref=sr_1_11?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-11",
      "/Apple-iPhone-64GB-Midnight-Green/dp/B07ZQRMWVB/ref=sr_1_12?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-12",
      "/Apple-iPhone-13-Mini-128GB/dp/B09LKF2RPP/ref=sr_1_13?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-13",
      "/Apple-iPhone-Pro-256GB-Gold/dp/B08PN7R2MZ/ref=sr_1_14?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-14",
      "/Apple-iPhone-SE-3rd-Midnight/dp/B0BDY71GRG/ref=sr_1_15?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-15",
      "/Apple-iPhone-14-512GB-Purple/dp/B0BYKX35NT/ref=sr_1_16?keywords=iphone&qid=1688323279&sprefix=ipho%2Caps%2C278&sr=8-16"
   ]
}
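
The response holds three parallel lists, and the links are relative paths. As a minimal sketch of how this output could be turned into per-product records, the hypothetical `to_records` helper below pairs the lists by position, parses the prices, and resolves the links against `https://www.amazon.com`. It assumes the three lists line up one-to-one, which holds for the output above but is not guaranteed if a listing has no price.

```python
from decimal import Decimal
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.com"


def to_records(data: dict) -> list[dict]:
    """Combine the parallel lists from the API response into one dict per product."""
    titles = data.get("product-titles", [])
    prices = data.get("product-prices", [])
    links = data.get("product-links", [])
    # Pairing by position is only safe when every list has the same length.
    if not (len(titles) == len(prices) == len(links)):
        raise ValueError("lists have different lengths; cannot pair them by position")
    return [
        {
            "title": title,
            # "$1,019.99" -> Decimal("1019.99")
            "price": Decimal(price.replace("$", "").replace(",", "")),
            # Relative link -> absolute URL
            "url": urljoin(BASE_URL, link),
        }
        for title, price, link in zip(titles, prices, links)
    ]


# Two entries taken from the output above, with the link shortened for readability.
sample = {
    "product-titles": ["Apple iPhone 11, 64GB, Black - Unlocked (Renewed)",
                       "Apple iPhone 14 Pro Max, 128GB, Space Black - Unlocked (Renewed)"],
    "product-prices": ["$305.55", "$1,019.99"],
    "product-links": ["/Apple-iPhone-11-64GB-Black/dp/B07ZPKN6YR/",
                      "/Apple-iPhone-14-Pro-Max/dp/B0BN94DL3R/"],
}

for record in to_records(sample):
    print(record)
```

In practice you would pass `response.json()` to the helper instead of the hard-coded sample.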


In this specific request, using ScrapingBee's API with the default configuration (Rotating Proxy and JavaScript rendering), I was charged 5 API credits. Despite making multiple requests to Amazon.com, I did not encounter any blocking issues with the API's default settings, which is a good sign of the service's reliability.

However, as our operation scales up, it is reasonable to assume that we would require more reliable and costly proxies to sustain this level of performance. So, let's see how we can enable different proxy options using ScrapingBee's API.

Using proxies in ScrapingBee

Enabling proxies in ScrapingBee is straightforward. To use a specific proxy type, you just need to include the corresponding parameter and set it to "True". For instance, to utilize the Premium Proxy, you would add "premium_proxy=True" to your request parameters, as shown below:

from scrapingbee import ScrapingBeeClient  # Importing ScrapingBee's client

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

response = client.get(
    "https://www.amazon.com/s?k=iphone&crid=1BIGRK4NGFLDS&sprefix=ipho%2Caps%2C278&ref=nb_sb_noss_2",
    params={
        # Choose the proxy type you want by adding the premium_proxy,
        # stealth_proxy or own_proxy parameters
        'premium_proxy': 'True',
        # CSS selectors telling the API which elements to extract
        'extract_rules': {
            "product-titles": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a > span",
                "type": "list",
            },
            "product-prices": {
                "selector": "div.a-section.a-spacing-none.a-spacing-top-micro.puis-price-instructions-style > div > a > span > span.a-offscreen",
                "type": "list",
            },
            "product-links": {
                "selector": "div.a-section.a-spacing-none.puis-padding-right-small.s-title-instructions-style > h2 > a",
                "type": "list",
                "output": "@href",  # Return the link target instead of the text
            },
        },
    },
)

# Print the extracted data only if the request succeeded
if response.ok:
    print(response.json())


Enabling this option can make data extraction more reliable by reducing the risk of our bot being blocked, but it comes at a higher cost per request.

In my case, using the Premium Proxy with JavaScript rendering consumed 25 credits for this request, a fivefold increase over the 5 credits spent with the default Rotating Proxy configuration.
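
Given that price gap, one way to keep costs down is to start with the cheapest proxy tier and only move to a more expensive one when a request fails. The following is a minimal sketch of that idea, not a ScrapingBee feature: `fetch_with_escalation` and `PROXY_TIERS` are hypothetical names, and `fetch` is any callable that takes a URL and a params dict and returns an object with an `ok` attribute, such as a thin wrapper around `client.get`.

```python
# Tiers ordered from cheapest to most expensive. The parameter names are the
# ones mentioned in the example above (premium_proxy, stealth_proxy).
PROXY_TIERS = [
    ("rotating", {}),  # default proxy, no extra parameter
    ("premium", {"premium_proxy": "True"}),
    ("stealth", {"stealth_proxy": "True"}),
]


def fetch_with_escalation(fetch, url, base_params=None):
    """Try each proxy tier in order and stop at the first successful response.

    Returns (response, tier_name); tier_name is None if every tier failed.
    """
    response = None
    for name, extra in PROXY_TIERS:
        params = {**(base_params or {}), **extra}
        response = fetch(url, params)
        if response.ok:
            return response, name
    return response, None


# Stand-in for the real client so the sketch runs without an API key:
# it pretends that only Premium Proxy requests succeed.
class FakeResponse:
    def __init__(self, ok):
        self.ok = ok


def fake_fetch(url, params):
    return FakeResponse(params.get("premium_proxy") == "True")


response, tier = fetch_with_escalation(fake_fetch, "https://www.amazon.com/s?k=iphone")
print(tier)  # premium
```

With the real client, `fetch` could be `lambda url, params: client.get(url, params=params)`, and the extract rules would go in `base_params`.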

Limitations of the ScrapingBee web scraping API

Although I was pleasantly surprised by the ease of extracting the desired data and the low incidence of blocked requests, I found it frustrating that the API had limitations when it came to more complex operations. For instance, if I were building my own scraper, I could easily handle Amazon's pagination and extract data from all the search results while maintaining complete control over the scraper's behavior. However, achieving a similar outcome using ScrapingBee's API was not immediately apparent, and their documentation lacked information on this matter.
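
One possible workaround is to drive pagination from your own code and send one API request per results page. The sketch below only builds the page URLs. It assumes the target site exposes the page number as a query parameter; the `page` parameter name is an assumption about Amazon's search URLs, and `page_urls` is a hypothetical helper, not something taken from ScrapingBee's documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_urls(search_url: str, pages: int, page_param: str = "page") -> list[str]:
    """Return one URL per results page by setting the page query parameter."""
    parts = urlparse(search_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for number in range(1, pages + 1):
        query[page_param] = str(number)
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls


for url in page_urls("https://www.amazon.com/s?k=iphone", 3):
    print(url)
    # With the real client, each URL would be fetched separately, e.g.:
    # response = client.get(url, params={"extract_rules": {...}})
```

Each page is a separate request, so scraping ten pages with the default settings would cost ten times the 5 credits of a single request.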

Furthermore, the simplicity of ScrapingBee's pricing system has both positive and negative aspects. It is reassuring to know the exact number of credits each request will cost based on the chosen parameters. However, I would have appreciated a more detailed breakdown of my usage and charges within ScrapingBee's dashboard for better transparency.

Lastly, I missed having convenient access to an integrated cloud infrastructure like the ones offered by Apify or Zyte. While I understand that is not ScrapingBee's primary focus, having an all-in-one solution for my web scraping needs would save considerable time and effort, rather than having to search for and pay for different services to host my data extraction workflows.

Conclusion and final considerations

In conclusion, the ScrapingBee Data Extraction API offers a reliable solution for developers seeking a straightforward method to extract data from websites without the complexities of building a scraper from scratch. However, if you require a more comprehensive solution with a wider range of pre-built features and greater control over your applications and data extraction process, relying solely on ScrapingBee may not fully meet your needs.

Finally, I want to emphasize that this post serves as an introductory analysis and guide to ScrapingBee's service, assisting developers in determining if it is the right choice for them. It is important to note that not all features provided by their API have been explored in this article.

This is the first in a series of articles we commissioned from an external developer (although Percival is a former Apifier). We want to create unbiased reviews of other web scraping platforms and companies as part of our continued evaluation of the web scraping industry.

If you find yourself intrigued by ScrapingBee, I encourage you to further explore the ScrapingBee documentation for a more in-depth understanding of the platform's capabilities.

