Željko Šević

Originally published at sevic.dev

Web scraping with jsdom

Web scraping means extracting data from websites. This post covers extracting data from a page's HTML when the data is stored in a JavaScript variable or as stringified JSON.

Before scraping, the HTML page has to be retrieved with an HTTP client.
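
As a minimal sketch of that retrieval step, the helper below wraps `fetch` and rejects on a non-2xx response. It assumes an environment with a global `fetch` (Node.js 18+ or a browser); `fetchHtml` and its `fetchImpl` parameter are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: downloads a page and resolves with its HTML as a string.
// fetchImpl defaults to the global fetch and is only a parameter so that
// a stub can be passed in instead of making a real request.
async function fetchHtml(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Request for ${url} failed with status ${res.status}`);
  }
  return res.text();
}

// Usage (the address is a placeholder):
// fetchHtml('https://example.com').then((html) => console.log(html.length));
```

Checking `res.ok` matters because `fetch` resolves even for 404 or 500 responses, so an error page would otherwise be passed on to the parser as if it were the real content.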

Examples

The example below rewrites the page source so that the target value is also assigned to a global variable, executes the page scripts, and then reads the value from that global variable.

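import jsdom from 'jsdom';

// Placeholder address; replace it with the page to scrape.
const pageUrl = 'https://example.com';

fetch(pageUrl)
  .then((res) => res.text())
  .then((response) => {
    // Expression in the page's script that holds the data.
    const dataVariable = 'someVariable.someField';
    // Prefix its first occurrence with a global declaration.
    const html = response.replace(dataVariable, `var data=${dataVariable}`);

    const dom = new jsdom.JSDOM(html, {
      // Execute the page's inline scripts; use only with pages you trust.
      runScripts: 'dangerously',
      // A virtual console without listeners discards the page's console output.
      virtualConsole: new jsdom.VirtualConsole(),
    });

    console.log('data', dom?.window?.data);
  });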
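
To make the rewrite step concrete, here is a small sketch that applies the same `replace` call to a hypothetical inline script. The markup in `sampleHtml` is invented for illustration and stands in for whatever the real page contains.

```javascript
// Hypothetical inline script, standing in for what a real page might contain.
const sampleHtml = [
  '<script>',
  '  const someVariable = {};',
  '  someVariable.someField = { items: [1, 2, 3] };',
  '</script>',
].join('\n');

const dataVariable = 'someVariable.someField';

// With a string pattern, replace() rewrites only the first occurrence.
const patchedHtml = sampleHtml.replace(dataVariable, `var data=${dataVariable}`);

// The assignment line now starts with "var data=someVariable.someField = ...".
console.log(patchedHtml);
```

The patched line is still valid JavaScript: the inner assignment evaluates to the assigned object, which `var data` then captures. This only exposes `data` on `window` when the matched statement sits at the top level of the script; inside a function, the `var` would stay local to that function.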

The example below runs the page scripts and then reads stringified JSON data from the value of an element.

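import jsdom from 'jsdom';

// Placeholder address; replace it with the page to scrape.
const pageUrl = 'https://example.com';

fetch(pageUrl)
  .then((res) => res.text())
  .then((response) => {
    const dom = new jsdom.JSDOM(response, {
      // Execute the page's inline scripts; use only with pages you trust.
      runScripts: 'dangerously',
      // A virtual console without listeners discards the page's console output.
      virtualConsole: new jsdom.VirtualConsole(),
    });

    // Read the stringified JSON from the element's value (for example, a hidden input).
    const data = dom?.window?.document?.getElementById('someId')?.value;

    console.log('data', JSON.parse(data));
  });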
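
If the element is missing, the optional chain yields `undefined`, and `JSON.parse(undefined)` throws a `SyntaxError`. One way to guard against that, sketched here with a hypothetical helper named `parseJsonOrNull`, is to validate the input and catch parse errors:

```javascript
// Hypothetical helper: returns the parsed value, or null when the input
// is missing, empty, or not valid JSON.
function parseJsonOrNull(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return null;
  }
  try {
    return JSON.parse(raw);
  } catch (error) {
    return null;
  }
}

console.log(parseJsonOrNull('{"price": 42}')); // the parsed object
console.log(parseJsonOrNull(undefined)); // null
console.log(parseJsonOrNull('<not json>')); // null
```

Returning `null` keeps the calling code simple, at the cost of hiding why parsing failed; logging the caught error is a reasonable alternative when debugging a scraper.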

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

Boilerplate

Here is the link to the boilerplate I use for development.
