Web scraping with cheerio

#scraping #cheerio #node #javascript

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML elements with cheerio.

Create a scraper object by passing the HTML content to cheerio's load method.
- Set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form.

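  // CommonJS import; in an ES module use: import { load } from 'cheerio';
  const { load } = require('cheerio');

  // Parse the HTML; $ then works like a jQuery-style selector function.
  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });

  // Select every element with the class "item".
  const items = $('.item');

  items.each((index, element) => {
    // element is a raw node; wrap it as $(element) to call cheerio methods
    // ...
  });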

Access an element's content, attributes, and related elements with these methods:

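text: $(element).text()
HTML: $(element).html()
attributes
- all: $(element).attr()
- specific one: $(element).attr('href')
child elements
- first: $(element).children().first() (on its own, .first() narrows the current selection to its first element rather than returning a child)
- last: $(element).children().last()
- all: $(element).children()
- matching descendants at any depth: $(element).find('a')
siblings
- previous: $(element).prev()
- next: $(element).next()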

Check a website's terms of service before scraping it; some sites prohibit scraping.


DEV Community