Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.
Prerequisites
- cheerio package is installed
- HTML page is retrieved via an HTTP client
Usage
- create a scraper object with the load method by passing HTML content as an argument
  - set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
- find DOM elements by using CSS-like selectors
const items = $('.item');
- iterate through found elements using the each method
items.each((index, element) => {
  // ...
});
- access element content using specific methods
  - text
  $(element).text()
  - HTML
  $(element).html()
  - attributes
    - all
    $(element).attr()
    - specific one
    $(element).attr('href')
  - child elements
    - first
    $(element).children().first()
    - last
    $(element).children().last()
    - all
    $(element).children()
    - specific ones (find searches all descendants, not only direct children)
    $(element).find('a')
  - siblings
    - previous
    $(element).prev()
    - next
    $(element).next()
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.
Demo
The demo with the mentioned examples is available here.