Sometimes you need to extract the information presented on a webpage in a structured form. Let's take your dev.to reading list as an example. Let's imagine that we want to extract a list of the articles listed there. So let's scroll through our reading list, open the console, and get going!
Let's inspect the elements that contain the links to the articles.
We can see that the anchor containing the link has the class item. So let's try to grab all elements on the page with class="item":
document.querySelectorAll('.item')
This returns a NodeList with the selected elements.
Next, we want to convert this NodeList to an array, because an array is easier to iterate over and offers methods such as map. We use Array.from for that:
Array.from(document.querySelectorAll('.item'))
We now have an array of the selected DOM elements, which contain all the necessary information. To get an array of just the links, we can simply access the href property of each DOM element:
Array.from(document.querySelectorAll('.item'))
  .map(a => a.href)
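As a side note, Array.from also accepts a mapping function as its second argument, so the conversion and the mapping can happen in one call. Here is a minimal sketch of that idea. It uses a hand-made array-like object as a stand-in for the NodeList so that it also runs outside the browser; the URLs are invented for illustration.

```javascript
// Stand-in for a NodeList: any object with a length and indexed entries.
// The URLs are made up for this example.
const fakeNodeList = {
  length: 2,
  0: { href: 'https://example.com/first' },
  1: { href: 'https://example.com/second' },
};

// The second argument of Array.from is applied to every entry,
// just like calling .map() on the resulting array.
const hrefs = Array.from(fakeNodeList, a => a.href);
console.log(hrefs);
```

In the console, the equivalent call would be Array.from(document.querySelectorAll('.item'), a => a.href).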
But it would be nicer to also have the title. So let's have a look at the DOM structure again:
We can see that the title is contained in a div with the class item-title inside the anchor we already selected. So we can call querySelector on that anchor to get the title:
Array.from(document.querySelectorAll('.item')).map(a => ({
  href: a.href,
  title: a.querySelector('.item-title').innerText,
}))
To access the text content of a DOM node, we can use the innerText property.
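One thing to keep in mind: querySelector returns null when nothing matches, so the snippet above throws if an item happens to have no .item-title child. The following is a rough sketch of a more defensive variant. The helper name extractReadingList and the root parameter are made up for illustration; the selectors are the ones used in this post.

```javascript
// Hypothetical helper: accepts any root that supports querySelectorAll
// (for example `document`) and returns an array of { href, title } objects.
function extractReadingList(root) {
  return Array.from(root.querySelectorAll('.item'), a => {
    const titleNode = a.querySelector('.item-title');
    return {
      href: a.href,
      // Fall back to an empty string if an item has no title element.
      title: titleNode ? titleNode.innerText.trim() : '',
    };
  });
}

// In the browser console you would call: extractReadingList(document)
```

Trimming the title is a design choice here; it simply removes surrounding whitespace that innerText may include.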
Well done! We now have all the information from our reading list as structured content.
If you want to get your links as a pastable markdown snippet, return a string of the form [title](href) from the map callback instead of an object. Afterwards, you can use reduce to boil the array down to a single string that contains the links as a list.
Array.from(document.querySelectorAll('.item'))
  .map(a => `[${a.querySelector('.item-title').innerText}](${a.href})`)
  .reduce((acc, e) => `${acc}\n* ${e}`, '')
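Because the accumulator starts out as an empty string, the string produced by the reduce call above begins with a newline. If that bothers you, join is a possible alternative. Below is a small sketch that works on plain { title, href } objects; the helper name toMarkdownList and the sample data are invented for illustration.

```javascript
// Hypothetical helper: turns [{ title, href }] into a markdown bullet list.
function toMarkdownList(items) {
  return items
    .map(({ title, href }) => `* [${title}](${href})`)
    .join('\n'); // join puts the separator only between entries
}

// Invented sample data standing in for the extracted reading list.
const sample = [
  { title: 'First', href: 'https://example.com/first' },
  { title: 'Second', href: 'https://example.com/second' },
];

console.log(toMarkdownList(sample));
```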