Crawlbase

How to Scrape Producthunt Profiles and Products

This blog was originally posted to Crawlbase Blog

Product Hunt, launched in 2013, showcases new products and startups across many sectors and has built a large community of makers and enthusiasts. Over the years it has accumulated an extensive catalog of profiles and products, which makes it a valuable source of data. By scraping Product Hunt, you can collect detailed product descriptions and reviews along with insights into user engagement.

In this blog post, we will scrape information from Product Hunt profiles and products using the Crawlbase Crawling API and JavaScript. With these tools, we can collect data such as product names, descriptions, details about the makers, upvote counts, release dates, and what users are saying about products on the platform. Let's take a closer look at how the process works and what we can learn from the data on Product Hunt.

Table of Contents:

Product Hunt Data to Scrape

  • Product Data
  • User Data
  • Engagement Metrics
  • Trending and Historical Data

Featured Products and Profiles

  • Curated Selection
  • Increased Visibility

Scraping Product Hunt Data

  • Learn JavaScript Basics
  • Get Crawlbase API Token
  • Setting Up the Environment

Fetching Product Hunt Products data HTML

Scrape Product Hunt Products meaningful Data

Scrape Product Hunt Profile Data

Conclusion

Frequently Asked Questions

Product Hunt Data to Scrape

Product Hunt provides a rich dataset that encompasses a variety of information, offering a comprehensive view of the products and the community. Here's a breakdown of the key types of data available:

  1. Product Data:
    • Name and Description: Each product listed on Product Hunt comes with a name and a detailed description, outlining its features and purpose.
    • Category: Products are categorized into different sections, ranging from software and mobile apps to hardware and books.
    • Launch Date: The date when a product was officially launched is recorded, providing insights into the timeline of innovation.
  2. User Data:
    • Profiles: Users have profiles containing information about themselves, their submitted products, and their interactions within the community.
    • Products Submitted: A record of the products a user has submitted, reflecting their contributions to the platform.
    • Engagement Metrics: Information on how users engage with products, including upvotes, comments, and followers.
  3. Engagement Metrics:
    • Upvotes: The number of upvotes a product receives indicates its popularity and acceptance within the community.
    • Comments: User comments provide qualitative insights, feedback, and discussions around a particular product.
    • Popularity: Metrics that quantify a product's overall popularity, which can be a combination of upvotes, comments, and other engagement factors.
  4. Trending and Historical Data:
    • Trending Products: Identification of products currently gaining momentum and popularity.
    • Historical Trends: Analysis of how a product's popularity has changed over time, helping identify patterns and factors influencing success.

Featured Products and Profiles

Product Hunt prominently features a curated selection of products and profiles on its homepage. Understanding the criteria for featuring provides valuable insights into the dynamics of the platform:

Curated Selection:

  • Product Hunt Team Selection: The Product Hunt team curates and features products they find particularly innovative, interesting, or relevant.
  • Community Engagement: Products that receive significant user engagement, such as upvotes and comments, are more likely to be featured.

Increased Visibility:

  • Homepage Exposure: Featured products enjoy prime placement on the Product Hunt homepage, increasing their visibility to a broader audience.
  • Enhanced Recognition: Being featured lends credibility and recognition to a product, potentially attracting more attention from users, investors, and the media.

Anyone using Product Hunt benefits from understanding how these types of data relate to one another and which factors influence the products that get featured. This knowledge helps you navigate the platform and make the most of it.

Scraping Product Hunt Data

Learn JavaScript Basics:

Before scraping data from Product Hunt, we need to understand some basics of JavaScript, the programming language we'll be using. Familiarize yourself with DOM manipulation, which lets us interact with different parts of a webpage; with making HTTP requests to fetch data; and with handling asynchronous operations so the code does not block while it waits for a response. Knowing these basics will be helpful for our project.
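
As a quick refresher on the asynchronous part, the sketch below contrasts promise chaining with async/await. Note that fakeRequest is a hypothetical stand-in for a real HTTP call: it only waits a few milliseconds and resolves with a canned response, so the snippet runs without network access or extra packages.

```javascript
// Hypothetical stand-in for an HTTP request: settles after a short delay.
function fakeRequest(url) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (!url.startsWith('https://')) {
        reject(new Error(`Unsupported URL: ${url}`));
        return;
      }
      resolve({ statusCode: 200, body: `<html><title>${url}</title></html>` });
    }, 10);
  });
}

// Style 1: promise chaining with .then() and .catch().
fakeRequest('https://www.producthunt.com')
  .then((response) => console.log('then/catch status:', response.statusCode))
  .catch((error) => console.error('then/catch failed:', error.message));

// Style 2: async/await with try/catch, which reads top to bottom.
async function fetchStatus(url) {
  try {
    const response = await fakeRequest(url);
    return response.statusCode;
  } catch (error) {
    console.error('async/await failed:', error.message);
    return null;
  }
}

fetchStatus('https://www.producthunt.com').then((status) => console.log('async/await status:', status));
```

The scripts later in this post use the first style, passing handler functions to .then() and .catch().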

Get Crawlbase API Token:

Let's talk about getting the token we need from Crawlbase to make our Product Hunt scraping work.

  1. Log in to your Crawlbase account on their website.
  2. Once logged in, find the "Account Documentation" page inside your Crawlbase dashboard.
  3. Look for a code called "JavaScript token" on that page. Copy this code. It works like a secret key that authenticates our requests to the Crawlbase Crawling API, which in turn fetches the Product Hunt pages for us.

Now that you have this token, you can complete the setup for our Product Hunt scraping project to work smoothly.
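
Rather than pasting the token directly into source files, one option is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN; that name is chosen for this example and is not something Crawlbase requires.

```javascript
// Reads the Crawlbase token from an environment-like object.
// CRAWLBASE_JS_TOKEN is a hypothetical variable name chosen for this sketch.
function getCrawlbaseToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper.');
  }
  return token;
}

// Example: pass a plain object instead of process.env to try it out.
console.log(getCrawlbaseToken({ CRAWLBASE_JS_TOKEN: 'YOUR_CRAWLBASE_JS_TOKEN' }));
```

In a real script you would call getCrawlbaseToken() with no arguments and pass the result as the token option when creating the API client.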

Setting Up the Environment

Now that we have everything ready, let's set up the tools we need for our JavaScript code. Follow these steps in order:

  1. Create Project Folder:

Open your terminal and type mkdir producthunt_scraper to create a new folder for your project. You can name this folder whatever you want.

mkdir producthunt_scraper

  2. Navigate to Project Folder:

Type cd producthunt_scraper to go into the new folder. This helps you manage your project files better.

cd producthunt_scraper

  3. Create JavaScript File:

Type touch scraper.js to create a new file called scraper.js. You can name this file differently if you prefer.

touch scraper.js

  4. Install Crawlbase and Cheerio Packages:

Type npm install crawlbase cheerio to install the two packages this project needs. The crawlbase package lets us interact with the Crawlbase Crawling API, making it easier to get information from websites, and cheerio is the HTML parser used by the scripts later in this post.

npm install crawlbase cheerio

By following these steps, you're setting up the basic structure for your Product Hunt scraping project. You'll have a dedicated folder, a JavaScript file to write your code, and the necessary Crawlbase tool to make the scraping process smooth and organized.

Fetching Product Hunt Products Data HTML

After getting your API credentials and installing the Crawlbase library for Node.js, it's time to work on the "scraper.js" file. First, choose the Product Hunt category page you want to scrape. For this example, we'll use the "Best Engineering & Development Products of 2024" category page. In this step, the script in "scraper.js" only fetches the page through the Crawlbase Crawling API and saves its HTML; extracting data with the Cheerio library comes in the next step. To scrape a different page, replace the URL in the code with that page's URL.
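
If you plan to scrape more than one category, a small helper can build the page URL from a category slug. This is only a sketch: the base path comes from the category URL used in this post (https://www.producthunt.com/categories/engineering-development), and the assumption that other categories follow the same pattern is not confirmed here.

```javascript
// Base path taken from the category URL used in this post.
const CATEGORY_BASE = 'https://www.producthunt.com/categories/';

// Builds a category page URL from a slug such as "engineering-development".
// Assumes every category follows the same URL pattern as the one in this post.
function categoryUrl(slug) {
  const cleaned = String(slug).trim().toLowerCase();
  if (!/^[a-z0-9-]+$/.test(cleaned)) {
    throw new Error(`Unexpected category slug: ${slug}`);
  }
  return new URL(cleaned, CATEGORY_BASE).href;
}

console.log(categoryUrl('engineering-development'));
```

Validating the slug before building the URL keeps typos or stray characters from turning into requests for pages that do not exist.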

To make the Crawlbase Crawling API work, follow these steps:

  1. Make sure you have the "scraper.js" file created, as explained before.
  2. Copy and paste the script provided into that file.
  3. Run the script in your terminal by typing "node scraper.js" and pressing Enter.
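
// Load the Crawlbase client and Node's built-in file system module.
const { CrawlingAPI } = require('crawlbase');
const fs = require('fs');

// Replace the placeholder with the JavaScript token from your Crawlbase dashboard.
const crawlbaseToken = 'YOUR_CRAWLBASE_JS_TOKEN';
const api = new CrawlingAPI({ token: crawlbaseToken });
const producthuntPageURL = 'https://www.producthunt.com/categories/engineering-development';

// Fetch the page, then hand the result to the success or error handler.
api.get(producthuntPageURL).then(handleCrawlResponse).catch(handleCrawlError);

// Save the HTML to disk when the request succeeds.
function handleCrawlResponse(response) {
  if (response.statusCode === 200) {
    fs.writeFileSync('response.html', response.body);
    console.log('HTML saved to response.html');
  } else {
    // Report unexpected statuses instead of failing silently.
    console.error(`Request failed with status ${response.statusCode}`);
  }
}

function handleCrawlError(error) {
  console.error(error);
}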

HTML Response:

The script saves the HTML of the category page to response.html in the project folder.

Scrape Product Hunt Products meaningful Data

This example shows how to scrape product data from a Product Hunt category page, including each product's name, description, stars, and reviews. We'll use Cheerio, a library commonly used for web scraping, together with fs, the built-in Node.js module for working with files.

The JavaScript code below reads the HTML saved to response.html in the previous step, loads it into Cheerio, and iterates over each product listing to collect the name, description, stars, and reviews. It then writes the gathered data as a JSON array to products_info.json and prints it to the console.

// Load the HTML saved in the previous step into Cheerio.
const fs = require('fs');
const cheerio = require('cheerio');
const htmlContent = fs.readFileSync('response.html', 'utf8');
const $ = cheerio.load(htmlContent);
const products = [];

// Each matching div is one product listing. These class-based selectors are
// tied to the page's markup and may need updating if Product Hunt changes it.
$('div.flex.direction-column.mb-mobile-10.mb-tablet-15.mb-desktop-15.mb-widescreen-15').each((index, element) => {
  const productInfo = {
    name: $(element).find('div.color-blue.fontSize-18.fontWeight-600').text(),
    // The star rating is the number of label elements in the rating row.
    stars: $(element).find('div.flex.direction-row.align-center label').length,
    reviews: $(element).find('div.ml-3.color-lighter-grey.fontSize-14.fontWeight-400').text().trim(),
    description: $(element)
      .find(
        'div.color-lighter-grey.fontSize-mobile-14.fontSize-tablet-16.fontSize-desktop-16.fontSize-widescreen-16.fontWeight-400',
      )
      .text()
      .trim(),
  };

  products.push(productInfo);
});

// Write the results to disk and print them.
const jsonData = JSON.stringify(products, null, 2);
fs.writeFileSync('products_info.json', jsonData, 'utf8');
console.log(jsonData);

JSON Response:

[
  {
    "name": "The Free Website Guys",
    "stars": 5,
    "reviews": "151 reviews",
    "description": "The Free Website Guys is a popular website development agency that is famous for its free website program. To date, it has helped over 10,000+ entrepreneurs get a professional website. It does not charge for this work, instead using its free website program as a way to create trust and build connections with business owners, a percentage of whom later hire the company for larger paid projects further down the line.\n\nIt is rated the #1 web agency on Clutch, G2, TrustPilot, UpCity, and Good Firms."
  },
  {
    "name": "Zipy",
    "stars": 5,
    "reviews": "132 reviews",
    "description": "Zipy is a debugging platform with user session replay, frontend and network monitoring in one. ⏰ Install in a min ▶️ Replay error sessions in real time 🚀 Dev tools, Stack Trace, Console & Network Logs Have questions? Ask the Maker"
  },
  {
    "name": "Graphite",
    "stars": 5,
    "reviews": "60 reviews",
    "description": "Ship code faster with Graphite. Stay unblocked on code review with “stacking” - the workflow engineers at top companies use to accelerate their development. Now available to anyone with a GitHub account."
  },
  {
    "name": "Mage",
    "stars": 5,
    "reviews": "63 reviews",
    "description": "Open-source data pipeline tool for transforming and integrating data. The modern replacement for Airflow.\n- Integrate and synchronize data from 3rd party sources\n- Build real-time and batch pipelines to transform data using Python, SQL, and R\n- Run, monitor, and orchestrate thousands of pipelines without losing sleep"
  },
  {
    "name": "SingleStore Kai™",
    "stars": 5,
    "reviews": "105 reviews",
    "description": "SingleStore Kai enables up to 100x faster analytics on JSON data within existing MongoDB applications. The easy-to use-API for MongoDB enables developers to use familiar MongoDB commands to achieve real-time analytics for their applications."
  },
  {
    "name": "Lottielab",
    "stars": 5,
    "reviews": "66 reviews",
    "description": "Create and export Lottie animations to your websites and apps easily! - Import SVGs, Lotties, from Figma or create from scratch - Animate with a simple but powerful timeline - Export as Lottie, Gif or MP4 to any platform - Collaborate with your team"
  },
  {
    "name": "Wewaat",
    "stars": 5,
    "reviews": "34 reviews",
    "description": "One directory for all your no code needs, plus marketing and sales tools to help you launch, market and sell. Search and discover tools based on your project requirements or your budget with more than 30 different categories of tools."
  },
  {
    "name": "Datatera.ai",
    "stars": 5,
    "reviews": "41 reviews",
    "description": "Convert ANY website or file to a structured dataset or CRM/ERP/HR and other solutions in seconds without code and mappings with a power of AI"
  },
  {
    "name": "beams",
    "stars": 5,
    "reviews": "86 reviews",
    "description": "Constant context switching, too many open tabs and distracting notifications - sounds familiar? beams gently guides you through your busy workday - directly from the menu bar. Joining a call or going into undisturbed focus time is now only a keystroke away. Stay tuned!"
  },
  {
    "name": "Codelita",
    "stars": 5,
    "reviews": "81 reviews",
    "description": "Codelita® is an online platform for learning programming from scratch, even on mobile devices!"
  }
]
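
The reviews field comes back as text such as "151 reviews", which is awkward to sort or compare. As a possible post-processing step, the sketch below converts such strings to numbers and ranks products by review count. The helper name parseCount and the added reviewCount field are inventions for this example; the two sample records are copied from the output above with descriptions omitted.

```javascript
// Two records copied from the JSON output above (descriptions omitted for brevity).
const products = [
  { name: 'Graphite', stars: 5, reviews: '60 reviews' },
  { name: 'The Free Website Guys', stars: 5, reviews: '151 reviews' },
];

// Extracts the first number from strings like "151 reviews" or "2,807 followers".
// Returns null when no number is found.
function parseCount(text) {
  const match = /[\d,]+/.exec(String(text));
  if (!match) return null;
  const value = Number.parseInt(match[0].replace(/,/g, ''), 10);
  return Number.isNaN(value) ? null : value;
}

// Adds a numeric reviewCount field and sorts by it, highest first.
const ranked = products
  .map((product) => ({ ...product, reviewCount: parseCount(product.reviews) }))
  .sort((a, b) => b.reviewCount - a.reviewCount);

console.log(ranked.map((p) => `${p.name}: ${p.reviewCount}`));
```

The same helper also handles comma-separated values, so it can be reused for profile fields such as followers and points.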

Scrape Product Hunt Profile Data

In this example, we'll extract information from a Product Hunt user profile, using the Saas Warrior profile (@saaswarrior) as the target. The data we want to collect includes the user ID, name, headline, about section, followers, following, points, streak, interests, badges, and more. To do this, we'll first fetch the HTML of the profile page and then write a custom JavaScript scraper that extracts the desired data from it.

For this task, we'll use cheerio, commonly used for web scraping, and fs, the built-in Node.js module for file operations. The script below saves the HTML of the profile page to profile_page.html, reads it back, extracts the relevant data, and prints it as a JSON object.

const { CrawlingAPI } = require('crawlbase'),
  fs = require('fs'),
  cheerio = require('cheerio'),
  crawlbaseToken = 'YOUR_CRAWLBASE_JS_TOKEN',
  api = new CrawlingAPI({ token: crawlbaseToken }),
  producthuntPageURL = 'https://www.producthunt.com/@saaswarrior';

api.get(producthuntPageURL).then(handleCrawlResponse).catch(handleCrawlError);

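function handleCrawlResponse(response) {
  if (response.statusCode === 200) {
    fs.writeFileSync('profile_page.html', response.body);
    processProfileInfo();
  } else {
    // Report unexpected statuses instead of failing silently.
    console.error(`Request failed with status ${response.statusCode}`);
  }
}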

function handleCrawlError(error) {
  console.error(error);
}

function processProfileInfo() {
  const htmlContent = fs.readFileSync('profile_page.html', 'utf8'),
    $ = cheerio.load(htmlContent),
    profileInfo = {};

  profileInfo.avatar = $('div[data-test="userImage"] img.styles_image__Je5S2').attr('src');
  profileInfo.name = $('h1.color-darker-grey.fontSize-24.fontWeight-600').text().trim();
  profileInfo.headline = $('div.color-lighter-grey.fontSize-18.fontWeight-300').text().trim();
  profileInfo.userId = $('div.color-lighter-grey.fontSize-14.fontWeight-400').first().text().trim();
  profileInfo.followers = $('a[href="/@saaswarrior/followers"]').text().trim();
  profileInfo.following = $('a[href="/@saaswarrior/following"]').text().replace(/\n\s+/g, ' ').trim();
  profileInfo.points = $('span.color-lighter-grey.fontSize-14.fontWeight-400:contains("points")').text().trim();
  profileInfo.streak = $('a[href="/visit-streaks?ref=profile_page"]')
    .contents()
    .filter(function () {
      return this.nodeType === 3;
    })
    .text()
    .replace(/\n\s+/g, ' ')
    .trim();

  profileInfo.products = [];
  $('.styles_even__Qeyum, .styles_odd__wazk7').each((index, element) => {
    const product = {
      name: $(element).find('img.styles_thumbnail__Y9ZpZ').attr('alt'),
      imageSrc: $(element).find('img.styles_thumbnail__Y9ZpZ').attr('src'),
    };
    profileInfo.products.push(product);
  });
  profileInfo.about = $('.styles_aboutText__AnpTz').text().replace(/\n\s+/g, ' ').trim();

  profileInfo.socialLinks = [];
  $('.styles_userLink__eDq16').each((index, element) => {
    const link = {
      title: $(element).text().trim(),
      url: $(element).attr('href'),
    };
    profileInfo.socialLinks.push(link);
  });

  profileInfo.interests = [];
  $('.styles_topicLink__WH5Y6').each((index, element) => {
    const interest = $(element).text().trim();
    profileInfo.interests.push(interest);
  });

  profileInfo.badges = [];
  $('.styles_badge__HPZB8').each((index, element) => {
    const badge = {
      name: $(element).find('.color-darker-grey.fontSize-14.fontWeight-600').text().trim(),
      imageSrc: $(element).find('img').attr('src'),
    };
    profileInfo.badges.push(badge);
  });

  const jsonData = JSON.stringify(profileInfo, null, 2);
  console.log(jsonData);
}

JSON Response:

{
  "avatar": "https://ph-avatars.imgix.net/2530835/original?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=120&h=120&fit=crop",
  "name": "Ankit Sharma",
  "headline": "Founder of SaasWarrior",
  "userId": "#2530835",
  "followers": "2,807 followers",
  "following": "110 following",
  "points": "1,414 points",
  "streak": "🔥 793 day streak",
  "products": [
    {
      "name": "Canva",
      "imageSrc": "https://ph-files.imgix.net/d7c5e3c2-fab2-42e4-afe3-e525a4c8a953.jpeg?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=24&h=24&fit=crop"
    },
    {
      "name": "Facebook",
      "imageSrc": "https://ph-files.imgix.net/91ffb275-f64b-4915-ba70-b77dd6540b71.png?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=24&h=24&fit=crop"
    },
    {}
  ],
  "about": "Software is something I love. I have been searching for and evaluating new tools ever since I was in school and I still do. 🤩",
  "socialLinks": [
    {
      "title": "Twitter",
      "url": "https://twitter.com/iamsaaswarrior"
    },
    {
      "title": "Facebook",
      "url": "https://www.facebook.com/groups/saaswarrior/"
    },
    {
      "title": "Linkedin",
      "url": "https://www.linkedin.com/in/ankitsharmaofficial/"
    }
  ],
  "interests": ["Design Tools", "Marketing", "SEO", "Artificial Intelligence", "Tech", "Animation"],
  "badges": [
    {
      "name": "Good find 🧐",
      "imageSrc": "https://ph-files.imgix.net/855ca417-a531-4de4-b205-28cbf1d6f85a.png?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=44&h=44&fit=max"
    },
    {
      "name": "Pixel perfection 💎",
      "imageSrc": "https://ph-files.imgix.net/5d0878a7-4f73-4ffd-85f3-219eeff97a2f.png?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=44&h=44&fit=max"
    },
    {
      "name": "Bright Idea 💡",
      "imageSrc": "https://ph-files.imgix.net/996af07f-85bc-455c-8289-ffcddf7132d7.png?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=44&h=44&fit=max"
    },
    {
      "name": "Plugged in 🔌",
      "imageSrc": "https://ph-files.imgix.net/9e2c38ac-2858-44a0-958f-9b482a7474c6.png?auto=compress&codec=mozjpeg&cs=strip&auto=format&w=44&h=44&fit=max"
    }
  ]
}
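
The products array above ends with an empty {} entry. A likely cause (an assumption, since the page markup is not shown here) is that one matched row has no thumbnail image: .attr() then returns undefined for both fields, and JSON.stringify omits properties whose value is undefined. The sketch below filters such entries out; dropEmptyEntries is a hypothetical helper name, and the sample object is a trimmed copy of the output above.

```javascript
// Shape copied from the profile output above, trimmed to the relevant field.
const profileInfo = {
  name: 'Ankit Sharma',
  products: [{ name: 'Canva' }, { name: 'Facebook' }, {}],
};

// Keeps only entries where at least one property has a usable value.
function dropEmptyEntries(entries) {
  return entries.filter((entry) =>
    Object.values(entry).some((value) => value !== undefined && value !== null && value !== ''),
  );
}

const cleaned = { ...profileInfo, products: dropEmptyEntries(profileInfo.products) };
console.log(cleaned.products);
```

In the scraper itself, the same check could run on profileInfo.products just before the call to JSON.stringify.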

Conclusion

This guide walks through scraping data from Product Hunt using JavaScript and the Crawlbase Crawling API. You can gather user profile details (user ID, name, followers, following, points, social links, interests, badges) and product information (name, image, description, rating, reviews). Whether you're new to web scraping or have some experience, these tips will get you started. If you're interested in trying scraping on other websites like Etsy, Walmart, or Glassdoor, we have more guides for you to explore.

Frequently Asked Questions

Are there any rate limits or IP blocking measures for scraping data from Product Hunt?

Product Hunt may enforce rate limits and IP blocking to prevent abuse and ensure fair usage of the platform. Excessive or aggressive scraping could trigger these mechanisms, resulting in temporary or permanent blocks. To reduce this risk, you can use a service like the Crawlbase Crawling API, which routes requests through a pool of rotating IP addresses so that a single address is less likely to be rate limited or blocked. This makes scraping less prone to disruption, but it does not replace reviewing and following Product Hunt's policies.
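
Whatever service handles the requests, spacing them out is a simple way to avoid aggressive scraping. The sketch below fetches a list of URLs one at a time with a pause between them. The function names and the 2000 ms default delay are arbitrary choices for illustration, and stubFetch stands in for a real call such as (url) => api.get(url) so the snippet runs offline.

```javascript
// Resolves after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls fetchPage(url) for each URL in order, pausing between requests.
// fetchPage is any function that returns a promise.
async function crawlSequentially(urls, fetchPage, delayMs = 2000) {
  const results = [];
  for (let i = 0; i < urls.length; i += 1) {
    results.push(await fetchPage(urls[i]));
    if (i < urls.length - 1) {
      await sleep(delayMs); // no pause needed after the last request
    }
  }
  return results;
}

// Demo with a stub instead of a real request, so it runs offline.
const stubFetch = async (url) => ({ url, statusCode: 200 });
crawlSequentially(['https://www.producthunt.com/@saaswarrior'], stubFetch, 100).then((pages) =>
  console.log(pages),
);
```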

What information can be extracted from Product Hunt profiles?

From a Product Hunt profile, you can extract the user's name, headline, user ID, follower and following counts, points, visit streak, about text, social links, interests, badges, and associated products, as shown in the profile example above. Product pages complement this with the product's name, description, details about the maker, upvote count, release date, and user comments. The description tells you about a product's features, upvote counts show how much the community likes it, release dates give you a timeline, and user comments offer feedback and discussion that reflect user experiences.

Can I use the scraped data for commercial purposes?

If you want to use data scraped from Product Hunt for commercial purposes, you have to follow Product Hunt's rules. Read their policies carefully, because they define what you can and can't do with the data. Using it commercially without permission might violate those policies and lead to legal problems. If you plan commercial use, ask Product Hunt for permission or check whether an official channel, such as their API, allows it.

What are the limitations of Product Hunt API?

The Product Hunt API has several limitations. By default it is restricted to non-commercial use, so you must contact Product Hunt for approval before using it for business purposes. The API also uses OAuth2 token authentication and may apply rate limits to prevent misuse. As an alternative, the Crawlbase Crawling API retrieves pages through a pool of rotating IP addresses, which helps avoid interruptions from rate limits or IP blocks. This makes it a useful option for developers who need reliable web scraping, particularly where rate limits are a concern.
