Crawlbase

Posted on Oct 2 • Originally published at crawlbase.com

How To Create a Zalando Scraper

#zalandoscraper #zalandoscraping #scrapezalando

This blog was originally posted to Crawlbase Blog

Looking to scrape Zalando? You’re in the right place. Zalando is one of the top fashion online shopping sites with a huge range of stuff from clothes to accessories. Maybe you’re doing market research or building a fashion app - either way, knowing how to get good data straight from the site can be handy.

In this blog, we’ll show you how to create a reliable Zalando scraper with Puppeteer - a well-known web scraping tool. You’ll learn how to pull out product details such as prices, sizes, and stock levels. We’ll also give you tips on how to handle CAPTCHA, IP blocking, and how to scale your scraper with Crawlbase Smart Proxy.

Let’s get started!

Why Scrape Zalando for Product Data?

Scraping Zalando is a great way to get product data for various purposes. Whether you’re monitoring prices, tracking product availability, or analyzing fashion trends, having access to this data gives you an edge. Zalando is one of the largest online fashion platforms in Europe with a wide range of products from shoes and clothes to accessories.

By scraping Zalando, you can extract product names, prices, reviews, and availability. This data can be used to compare prices, create data-driven marketing strategies, or even build an automated price tracker. If you run an eCommerce business or just want to keep an eye on the latest fashion trends, scraping Zalando’s product data will help you stay ahead.

Using a scraper to get data from Zalando saves you the time and effort of manually searching and copying product information. With the right setup you can get thousands of product details in no time and efficiently, making your data collection process more streamlined.

Key Data Points to Extract from Zalando

When scraping Zalando you can extract several important product information. These details are useful for tracking trends, understanding prices, or analyzing market behaviors. Below are the main data points to focus on:

Product Name: The name of the product helps you identify and categorize what is being sold.
Product Price: Knowing the price, including discounts, is essential for monitoring price trends and comparing competitors.
Product Description: This gives specific information about the product, such as material, style, and other key features.
Product Reviews: Reviews provide information about product quality and popularity and are useful for sentiment analysis.
Product Availability: Checking if a product is in stock helps you understand demand and how quickly items are selling.
Product Images: Images give a clear view of the product, which is important for understanding fashion trends and styles.
Brand Name: Knowing the brand allows for better analysis of brand performance and comparison across different brands.

Setting Up Your Node.js Environment

In order to efficiently scrape Zalando, you will need to configure your Node.js environment. This process involves installing Node.js, the necessary libraries, and choosing a suitable Integrated Development Environment (IDE). Here’s how to do it step by step:

Installing Node.js

Download Node.js: Go to the official Node.js website to get its latest version for your operating system. Node.js comes with npm (Node Package Manager), which you’ll use to install other libraries.
Install Node.js: Follow the installation instructions for your operating system. You may verify if it is installed by opening your terminal or command prompt and typing:

node -v

This command should display the installed version of Node.js.

Installing Required Libraries

Create a New Project Folder: Create a folder for your scraping project. Open the terminal inside this folder.
Initialize npm: Inside your project folder, run:

npm init -y

This command creates a package.json file that keeps track of your project’s dependencies.

Install Required Libraries: You’ll need a few libraries to make scraping easier. Install Puppeteer and any other libraries you may require:

npm install puppeteer axios

Create the Main File: In your project folder, create a file named scraper.js. This file will contain your scraping code.

Choosing an IDE

Selecting an IDE can make coding easier. Some of the popular ones include:

Visual Studio Code: Popular editor with lots of extensions for working with JavaScript.
WebStorm: A powerful IDE specifically designed for JavaScript and web development, but it isn't free.
Atom: A hackable text editor that is customizable and user-friendly.

Now that you have your environment set up and scraper.js created. Let’s get started with scraping Zalando product listings.

Scraping Zalando Product Listings

After setting up the environment, we can start creating the scraper for Zalando product listings. We will scrape the handbags section from this URL:

https://en.zalando.de/catalogue/?q=handbags

We’ll extract the product page URL, title, store name, price, and image URL from each listing. We will also handle pagination to go through multiple pages.

Inspecting the HTML for Selectors

First we have to inspect the HTML of the product listings page to find the correct selectors. Open the developer tools in your browser and navigate to the handbag listings.

You’ll typically look for elements like:

Product Page URL: This is the link to the individual product page.
Product Title: Usually in an <h3> tag within a <div> element.
Brand Name: This may be found in an <h3> tag within a <div> element.
Price: Found in a <span> tag with a price class.
Image URL: Contained in the <img> tag within each product card.

Writing the Zalando Product Listings Scraper

Now that you have the selectors, you can write a scraper to collect product listings. Here’s an example code snippet using Puppeteer:

const puppeteer = require('puppeteer');

// Function to scrape product listings from Zalando
async function scrapeProductListings(page) {
  await page.goto('https://en.zalando.de/catalogue/?q=handbags', { timeout: 0 });

  // Scraping product listings
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('div[data-zalon-partner-target="true"] > div.cYylcv.BaerYO')).map(
      (card) => {
        const title = card.querySelector('div.Zhr-fS h3:last-child')?.innerText; // Product title
        const storeName = card.querySelector('div.Zhr-fS h3:first-child')?.innerText; // Store name
        const price = card.querySelector('span.sDq_FX.lystZ1')?.innerText; // Price
        const productUrl = card.querySelector('a')?.href; // Product URL
        const thumbnail = card.querySelector('img:first-child')?.src; // Image URL

        return { title, storeName, price, productUrl, thumbnail };
      },
    );
  });

  return products;
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const productListings = await scrapeProductListings(page);
  console.log('Product Listings:', productListings);

  await browser.close();
})();

Code Explanation:

scrapeProductListings Function: This function navigates to the Zalando product page, with unlimited timeout limit, and extracts the product title, price, URL, and image URL.
Data Collection: The function returns an array of product objects containing the scraped information.

Example Output:

Product Listings: [
  {
    title: 'Handbag - black',
    brandName: 'Anna Field',
    price: '34,99 €',
    productUrl: 'https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html',
    thumbnail: 'https://img01.ztat.net/article/spp-media-p1/4ce13463cf9a4dda9828bfc44f65bb6e/45133485dd0c4b03b1b122f0deeb0801.jpg?imwidth=300&filter=packshot'
  },
  {
    title: 'LEATHER - Handbag - black',
    brandName: 'Zign',
    price: '49,99 €',
    productUrl: 'https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html',
    thumbnail: 'https://img01.ztat.net/article/spp-media-p1/a86e1fd894b33f8388ed33009cb6cfd2/62c903c4162141fa8c1452be53635f02.jpg?imwidth=300&filter=packshot'
  },
  {
    title: 'NOELLE TOP ZIP SHOULDER BAG - Handbag - coal logo',
    brandName: 'Guess',
    price: '124,95 €',
    productUrl: 'https://en.zalando.de/guess-noelle-top-zip-shoulder-bag-handbag-coal-logo-gu151h4zp-c11.html',
    thumbnail: 'https://img01.ztat.net/article/spp-media-p1/b6c00ad1942e4b439808bf3099e035ab/38798e461de54ddfad6a33d6f1ab5e42.jpg?imwidth=300&filter=packshot'
  },
  .... more
]

Handling Pagination

To gather more listings, you need to handle pagination. Zalando uses the &p= parameter in the URL to navigate between pages. Here’s how to modify your scraper to handle multiple pages:

async function scrapeAllProductListings(page, totalPages) {
  let allProducts = [];

  for (let i = 1; i <= totalPages; i++) {
    const url = `https://en.zalando.de/catalogue/?q=handbags&p=${i}`;
    await page.goto(url);
    await page.waitForSelector('.product-card'); // Wait for product cards to load

    const products = await scrapeProductListings(page);
    allProducts = allProducts.concat(products); // Combine products from all pages
  }

  return allProducts;
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const totalPages = 5; // Specify the total number of pages you want to scrape
  const allProductListings = await scrapeAllProductListings(page, totalPages);
  console.log('All Product Listings:', allProductListings);

  await browser.close();
})();

Code Explanation:

scrapeAllProductListings Function: This function loops through the specified number of pages, constructs the URL for each page, and calls the scrapeProductListings function to gather data from each page.
Pagination Handling: Products from all pages are combined into a single array.

Storing Data in a JSON File

Finally, it's useful to store the scraped data in a JSON file for later analysis. Here's how to do that:

const puppeteer = require('puppeteer');
const fs = require('fs');

// Copy scrapeAllProductListings and scrapeProductListings functions from previous code snippets

// Function to save scraped data to a JSON file
function saveDataToJson(data, filename = 'zalando_product_listings.json') {
  fs.writeFileSync(filename, JSON.stringify(data, null, 2));
  console.log(`Data successfully saved to ${filename}`);
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const totalPages = 5; // Specify the total number of pages you want to scrape
  const allProductListings = await scrapeAllProductListings(page, totalPages);

  // Save scraped product listings to a JSON file
  saveDataToJson(allProductListings);

  await browser.close();
})();

Code Explanation:

saveDataToJson Function: This function saves the scraped product listings to a json file (zalando_product_listings.json) so you can easily access the data

Next up we will cover how to scrape product data from individual product pages.

Scraping Zalando Product Details

Now that you have scraped the listings, the next step is to gather data from individual product pages. This allows you to get more specific data like product descriptions, material details, and customer reviews, which are not available on the listing pages.

To scrape the product details, we’ll first inspect the structure of the product page and identify the relevant HTML elements that contain the data we need.

Inspecting the HTML for Selectors

Visit any individual product page from Zalando and use your browser’s developer tools to inspect the HTML structure.

You’ll typically need to find elements like:

Product Title: Usually within a <span> tag with classes like EKabf7 R_QwOV.
Brand Name: Usually within a <span> tag with classes like z2N-Fg yOtBvf.
Product Details: Located in a <div> within data-testid="pdp-accordion-details".
Price: In a <span> tag with classes like dgII7d Km7l2y.
Available Sizes: Often listed in a <div> within data-testid="pdp-accordion-size_fit".
Image URLs: Contained in the <img> tag within a <ul> with classes like XLgdq7 _0xLoFW.

Writing the Zalando Product Details Scraper

Once you have the correct selectors, you can write a scraper to collect product details like the title, description, price, available sizes, and image URLs.

Here’s an example code to scrape Zalando product details using Puppeteer:

const puppeteer = require('puppeteer');

// Function to scrape product details from a single product URL
// Function to scrape product details from a single product URL
async function scrapeProductDetails(page, productUrl) {
  await page.goto(productUrl, { timeout: 0 });

  // Click the "Details" section
  const detailsButtonSelector = 'div[data-testid="pdp-accordion-details"] button';
  const sizesButtonSelector = 'div[data-testid="pdp-accordion-size_fit"] button';

  // Wait for the details button and click it
  await page.waitForSelector(detailsButtonSelector);
  await page.click(detailsButtonSelector);

  // Wait for the sizes button and click it
  await page.waitForSelector(sizesButtonSelector);
  await page.click(sizesButtonSelector);

  // Scraping product details
  const productDetails = await page.evaluate(() => {
    const title = document.querySelector('span.EKabf7.R_QwOV')?.innerText; // Product title
    const brandName = document.querySelector('span.z2N-Fg.yOtBvf')?.innerText; // Brand name
    const details = Object.fromEntries(
      Array.from(document.querySelectorAll('div[data-testid="pdp-accordion-details"] div.qMOFyE')).map((item) => [
        item.querySelector('dt')?.innerText.trim(),
        item.querySelector('dd')?.innerText.trim(),
      ]),
    ); // Product details
    const price = document.querySelector('span.dgII7d.Km7l2y')?.innerText; // Price
    const sizes = Object.fromEntries(
      Array.from(document.querySelectorAll('div[data-testid="pdp-accordion-size_fit"] div.qMOFyE')).map((item) => [
        item.querySelector('dt')?.innerText.trim(),
        item.querySelector('dd')?.innerText.trim(),
      ]),
    ); // Available Sizes
    const imageUrls = Array.from(document.querySelectorAll('ul.XLgdq7._0xLoFW li img')).map((img) => img.src); // Product images URL

    return { title, brandName, details, price, sizes, imageUrls };
  });

  return { url: productUrl, ...productDetails };
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const productUrls = [
    'https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html',
    'https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html',
    // Add more product URLs here
  ];

  const allProductDetails = [];

  for (const url of productUrls) {
    const details = await scrapeProductDetails(page, url);
    allProductDetails.push(details);
  }

  console.log('Product details scraped successfully:', allProductDetails);
  await browser.close();
})();

Code Explanation:

scrapeProductDetails Function: This function navigates to the product URL, waits for the content to load, and scrapes the product title, description, price, available sizes, and image URLs. To access the relevant content, the function first waits for the "Details" and "Sizes" buttons to become visible using await page.waitForSelector(), then clicks them with await page.click(). This expands the respective sections, enabling the extraction of their content.
Product URLs Array: This array contains the product page URLs you want to scrape.

Example Output:

Product details scraped successfully: [
  {
    url: 'https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html',
    title: 'Handbag - black',
    brandName: 'Anna Field',
    details: { 'Fastening:': 'Zip', 'Pattern:': 'Plain', 'Details:': 'Buckle' },
    price: '108,95 €',
    sizes: {
      'Height:': '28 cm (Size One Size)',
      'Length:': '36 cm (Size One Size)',
      'Width:': '12 cm (Size One Size)'
    },
    imageUrls: [
      'https://img01.ztat.net/article/spp-media-p1/3359d0e0d8484d9ba930544c6c71a861/7859902ec50b4d88899541e3c1cf976b.jpg?imwidth=762',
      'https://img01.ztat.net/article/spp-media-p1/4ce13463cf9a4dda9828bfc44f65bb6e/45133485dd0c4b03b1b122f0deeb0801.jpg?imwidth=762&filter=packshot',
      'https://img01.ztat.net/article/spp-media-p1/4ce13463cf9a4dda9828bfc44f65bb6e/45133485dd0c4b03b1b122f0deeb0801.jpg?imwidth=156&filter=packshot',
      .... more
    ]
  },
  {
    url: 'https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html',
    title: 'LEATHER - Handbag - black',
    brandName: 'Zign',
    details: { 'Fastening:': 'Zip', 'Pattern:': 'Plain' },
    price: '51,99 €',
    sizes: {
      'Height:': '25 cm (Size One Size)',
      'Length:': '36 cm (Size One Size)',
      'Width:': '11 cm (Size One Size)'
    },
    imageUrls: [
      'https://img01.ztat.net/article/spp-media-p1/a86e1fd894b33f8388ed33009cb6cfd2/62c903c4162141fa8c1452be53635f02.jpg?imwidth=762&filter=packshot',
      'https://img01.ztat.net/article/spp-media-p1/cb7586f888fe39bc8e160d909a2403e3/194701c057bb4c6595849a0ffe13da24.jpg?imwidth=762',
      'https://img01.ztat.net/article/spp-media-p1/a86e1fd894b33f8388ed33009cb6cfd2/62c903c4162141fa8c1452be53635f02.jpg?imwidth=156&filter=packshot',
      .... more
    ]
  }
]

Storing Data in a JSON File

After scraping the product details, it’s a good idea to save data in a JSON file. This makes it easier to access and analyze later. Here's how to save the scraped product details to a JSON file.

const fs = require('fs');

// Copy scrapeProductDetails function from previous code snippet

// Function to save scraped data to a JSON file
function saveDataToJson(data, filename = 'zalando_product_details.json') {
  fs.writeFileSync(filename, JSON.stringify(data, null, 2));
  console.log(`Data successfully saved to ${filename}`);
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const productUrls = [
    'https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html',
    'https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html',
    // Add more product URLs here
  ];

  const allProductDetails = [];

  for (const url of productUrls) {
    const details = await scrapeProductDetails(page, url);
    allProductDetails.push(details);
  }

  // Save scraped product details to a JSON file
  saveDataToJson(allProductDetails);

  await browser.close();
})();

Code Explanation:

saveDataToJson Function: This function writes the scraped product details to a JSON file (zalando_product_details.json), formatted for easy reading.
Data Storage: After scraping the details, the data is passed to the function to be saved in a structured format.

In the next section, we’ll look at how you can optimize your scraper using Crawlbase Smart Proxy to avoid getting blocked while scraping.

Optimizing with Crawlbase Smart Proxy

When scraping Zalando, you might get blocked or throttled. To avoid this, use a proxy service. Crawlbase Smart Proxy helps you scrape safely and fast. Here’s how to integrate it into your Zalando scraper.

How to Use Crawlbase Smart Proxy with Puppeteer

Integrating Crawlbase Smart Proxy into your Puppeteer script is straightforward. You’ll need your Crawlbase API key to get started.

Here’s how to set it up:

Sign Up for Crawlbase: Go to the Crawlbase website and create an account. After signing up you’ll get an API Token.
Update Your Puppeteer Script: Modify your existing scraper to use the Crawlbase proxy.

Here’s an updated version of your Zalando product scraper with Crawlbase Smart Proxy:

const puppeteer = require('puppeteer');

const proxyUrl = 'http://_USER_TOKEN_@smartproxy.crawlbase.com:8012'; // Replace _USER_TOKEN_ with your token

async function scrapeProductDetails(page, productUrl) {
  await page.goto(productUrl, { timeout: 0 });

  // Scraping product details
  const productDetails = await page.evaluate(() => {
    const title = document.querySelector('h1')?.innerText;
    const description = document.querySelector('.product-description')?.innerText;
    const price = document.querySelector('.price')?.innerText;
    const sizes = Array.from(document.querySelectorAll('.size-options')).map((size) => size.innerText);
    const imageUrl = document.querySelector('img')?.src;

    return { title, description, price, sizes, imageUrl };
  });

  return { url: productUrl, ...productDetails };
}

(async () => {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyUrl}`], // Use Crawlbase proxy
  });
  const page = await browser.newPage();

  const productUrls = [
    'https://en.zalando.de/anna-field-handbag-black-an651h0x2-q11.html',
    'https://en.zalando.de/zign-handbag-black-zi151h08a-q11.html',
    // Add more product URLs here
  ];

  const allProductDetails = [];

  for (const url of productUrls) {
    const details = await scrapeProductDetails(page, url);
    allProductDetails.push(details);
  }

  console.log('Product details scraped successfully:', allProductDetails);
  await browser.close();
})();

Code Explanation:

Proxy Setup: Replace _USER_TOKEN_ with your actual Crawlbase token. This tells Puppeteer to use the Crawlbase proxy for all requests.
Browser Launch Options: The args parameter in the puppeteer.launch() method specifies the proxy server to use. This way, all your requests go through the Crawlbase proxy.

Optimize your Zalando Scraper with Crawlbase

Scraping Zalando can provide useful information for your projects. In this blog, we showed you how to set up your Node.js environment and scrape product listings and details. Always check Zalando’s scraping rules to stay within their limits.

Using Puppeteer with Crawlbase Smart Proxy makes your scraping faster and more robust. Storing your data in JSON makes it easy to manage and analyze. Remember, website layouts can change, so keep your scrapers up to date.

If you're interested in exploring scraping from other e-commerce platforms, feel free to explore the following comprehensive guides.

📜 How to Scrape Amazon
📜 How to scrape Walmart
📜 How to Scrape AliExpress
📜 How to Scrape Flipkart
📜 How to Scrape Etsy

If you have any questions or feedback, our support team is always available to assist you on your web scraping journey. Good luck with your scraping journey!

Frequently Asked Questions

Q. Is scraping Zalando legal?

Scraping data from Zalando can have legal implications. Make sure to review the website’s terms of service to see what they say about data scraping. Some websites will explicitly not allow scraping, while others will allow it under certain conditions. By following the website’s rules, you can avoid legal issues and be ethical.

Q. What tools do I need to scrape Zalando?

To scrape Zalando, you need specific tools because the site uses JavaScript rendering. First, install Node.js, which allows you to run JavaScript code outside a browser. Then, use Puppeteer, a powerful library that controls a headless Chrome browser so you can interact with JavaScript-rendered content. Also, consider using Crawlbase Crawling API, which can help with IP rotation and bypassing blocks. Together, these tools will help you extract data from Zalando’s dynamic pages.

Q. Why use Crawlbase Smart Proxy while scraping Zalando?

Using Crawlbase Smart Proxy for Zalando scraping is good for several reasons. It will prevent you from getting blocked by the website, it rotates IP addresses to mimic regular user behavior. So your scraping will be more effective and you can collect data continuously without interruptions. Crawlbase Smart Proxy will also speed up your scraping, so you can collect data faster and more efficiently.

DEV Community

How To Create a Zalando Scraper

Why Scrape Zalando for Product Data?

Key Data Points to Extract from Zalando

Setting Up Your Node.js Environment

Installing Node.js

Installing Required Libraries

Choosing an IDE

Scraping Zalando Product Listings

Inspecting the HTML for Selectors

Writing the Zalando Product Listings Scraper

Handling Pagination

Storing Data in a JSON File

Scraping Zalando Product Details

Inspecting the HTML for Selectors

Writing the Zalando Product Details Scraper

Storing Data in a JSON File

Optimizing with Crawlbase Smart Proxy

How to Use Crawlbase Smart Proxy with Puppeteer

Optimize your Zalando Scraper with Crawlbase

Frequently Asked Questions

Q. Is scraping Zalando legal?

Q. What tools do I need to scrape Zalando?

Q. Why use Crawlbase Smart Proxy while scraping Zalando?

Top comments (0)

Read next

A Handbook to Grasping Contemporary Mobile Device Technology

PHP Design Patterns: Front Controller

Stop Trying to Learn Everything -Focus on These 5 Key Skills Every Developer Needs

Domina el arte del análisis de datos junto a AWS