Coming from years of web development, React-Native feels like a fresh start to me. You get better access to native functionality AND fewer rules are imposed on your app. For example, you can use fetch() to get any website you want. This enables client-side web crawling.
Why
Maybe you need data from a service, but they don't expose an API or the API doesn't give you all the data you need or the API is simply bad. Normally you would have to set up a server that crawls the target website and turns it into an API that you can use, but when you can access all data from all websites inside your client, you can save time.
Let's take the Amazon website as an example. You want to show all products of a results page and offer a way to load the next page, but you want the data in your own data structure, so you can build your own UI around it.
How
- Get the HTML from the server
- Extract the needed data from the HTML
- Reshape the data for our use
1 Get the HTML from the Server
That's the easy part.
async function loadGraphicCards(page = 1) {
  const searchUrl = `https://www.amazon.de/s/?page=${page}&keywords=graphic+card`;
  const response = await fetch(searchUrl); // fetch page
  const htmlString = await response.text(); // get response text
  ...
}
Fetching a URL with a search pattern returns an HTML page with some items.
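If you want to search for more than one hard-coded keyword, or notice when the request fails, a small sketch like the following may help. The names buildSearchUrl, fetchHtml and fetchImpl are made up for this illustration and are not part of the original code; the URL pattern is simply the one used above.

```javascript
// Hypothetical helper: builds the search URL used above for any keywords.
function buildSearchUrl(keywords, page = 1) {
  // encodeURIComponent escapes spaces as %20; the URL above uses "+" instead
  const query = encodeURIComponent(keywords).replace(/%20/g, "+");
  return `https://www.amazon.de/s/?page=${page}&keywords=${query}`;
}

// Hypothetical helper: fetches a page and fails loudly on HTTP errors.
// fetchImpl defaults to the global fetch and can be swapped out in tests.
async function fetchHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

With these, the first lines of loadGraphicCards could become `const htmlString = await fetchHtml(buildSearchUrl("graphic card", page));`.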
2 Extract the Needed Data from the HTML
This is a bit trickier. The data is inside the HTML, but it's a string.
The naive approach would be to use a regular expression to parse the string and get the data, but HTML doesn't have a regular grammar so that wouldn't work.
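To see why, here is a tiny sketch with made-up markup (not Amazon's) where list items are nested. The lazy pattern stops at the first closing tag it finds, so the outer item is cut off in the middle.

```javascript
// Made-up markup with a list nested inside a list item
const html = "<ul><li>outer <ul><li>inner</li></ul> tail</li></ul>";

// Naive attempt: capture the content of the first <li>
const naiveMatch = html.match(/<li>(.*?)<\/li>/);
const captured = naiveMatch[1];

console.log(captured); // prints "outer <ul><li>inner"
```

The regular expression has no notion of which closing tag belongs to which opening tag, and that is exactly the information a parser keeps track of.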
The better way is to use an HTML parser and CSS selectors.
Cheerio is this solution. It comes with an HTML parser and a re-implementation of jQuery's core functionality, so you can use it on Node.js.
The problem is that React-Native is missing most of the Node.js core modules, so the regular package doesn't work there.
I searched for quite some time to find a re-implementation of Cheerio that works on React-Native; the naming of the package was a bit strange, haha.
But with this, the extraction of the data is now child's play too.
async function loadGraphicCards(page = 1) {
  const searchUrl = `https://www.amazon.de/s/?page=${page}&keywords=graphic+card`;
  const response = await fetch(searchUrl); // fetch page
  const htmlString = await response.text(); // get response text
  const $ = cheerio.load(htmlString); // parse HTML string
  const liList = $("#s-results-list-atf > li"); // select result <li>s
  ...
}
3 Reshape the Data for further Use
After the data has been extracted from the HTML, we can start to reshape it for our use cases. The line between extraction and reshaping is a bit blurry here: the <li>s we selected are full of markup, and getting the right data out of them is extraction too, so these two steps often go hand in hand.
import cheerio from "cheerio-without-node-native"; // React-Native port of Cheerio

async function loadGraphicCards(page = 1) {
  const searchUrl = `https://www.amazon.de/s/?page=${page}&keywords=graphic+card`;
  const response = await fetch(searchUrl); // fetch page
  const htmlString = await response.text(); // get response text
  const $ = cheerio.load(htmlString); // parse HTML string
  return $("#s-results-list-atf > li") // select result <li>s
    .map((_, li) => ({ // map to a list of objects
      asin: $(li).data("asin"),
      title: $("h2", li).text(),
      price: $("span.a-color-price", li).text(),
      rating: $("span.a-icon-alt", li).text(),
      imageUrl: $("img.s-access-image", li).attr("src") // scoped to this <li>
    }))
    .get(); // unwrap the Cheerio collection into a plain array
}
This is not a robust example, but I think you get the idea. We can now use the new list of objects in our app to make our own UI for the Amazon results.
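One way to make it a bit more robust is to clean up the extracted strings before they reach the UI. The sketch below assumes that prices and ratings arrive as German-formatted text such as "EUR 199,99" or "4,5 von 5 Sternen"; that format is an assumption about the page, and the helper names are invented for this example.

```javascript
// Hypothetical helper: pulls the first number out of a German-formatted
// string, e.g. "EUR 1.299,99" or "4,5 von 5 Sternen".
function parseGermanNumber(text) {
  const match = String(text).match(/\d[\d.]*(?:,\d+)?/);
  if (!match) return null;
  // "." is the thousands separator and "," the decimal separator
  return Number(match[0].replace(/\./g, "").replace(",", "."));
}

// Hypothetical helper: trims the title and turns price and rating into numbers.
function normalizeItem(item) {
  return {
    ...item,
    title: item.title.trim(),
    price: parseGermanNumber(item.price),
    rating: parseGermanNumber(item.rating),
  };
}

// Drops list entries without an ASIN (e.g. ads or spacers) and cleans the rest.
function normalizeItems(items) {
  return items.filter(item => item.asin).map(normalizeItem);
}
```

Calling normalizeItems on the result of loadGraphicCards would then give the UI numbers it can sort and format itself.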
import React from "react";
import {Image, ScrollView, Text, TouchableOpacity} from "react-native";

class App extends React.Component {
  state = {
    page: 0,
    items: [],
  };
  componentDidMount = () => this.loadNextPage();
  // setState cannot take an async updater, so await the data first
  loadNextPage = async () => {
    const page = this.state.page + 1;
    const items = await loadGraphicCards(page);
    this.setState({items, page});
  };
  render = () => (
    <ScrollView>
      {this.state.items.map(item => <Item {...item} key={item.asin}/>)}
    </ScrollView>
  );
}
const Item = props => (
  <TouchableOpacity onPress={() => alert("ASIN:" + props.asin)}>
    <Text>{props.title}</Text>
    {/* remote images need explicit dimensions to be visible */}
    <Image source={{uri: props.imageUrl}} style={{width: 100, height: 100}}/>
    <Text>{props.price}</Text>
    <Text>{props.rating}</Text>
  </TouchableOpacity>
);
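The component above swaps the whole item list for each page it loads. If you would rather keep the earlier results and add the new ones below them, the state update could be sketched like this. appendPage is a hypothetical helper, not part of the original code, and it assumes every item carries an asin.

```javascript
// Hypothetical helper: returns the next state after a page has loaded.
function appendPage(state, newItems) {
  // Skip items that are already in the list, so React keys stay unique
  const known = new Set(state.items.map(item => item.asin));
  const fresh = newItems.filter(item => !known.has(item.asin));
  return {page: state.page + 1, items: [...state.items, ...fresh]};
}
```

Inside loadNextPage you could then await the items first and call `this.setState(state => appendPage(state, items))`, for example from a "load more" button at the end of the list.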
Conclusion
As with most problems, if you have the right tools, solutions can become simple. Often the problem is more about finding these tools :D
This client-side crawling approach can be used to build quick prototypes without the need for an API. Amazon is nice enough to deliver okay-ish static HTML, so it works rather well on their sites.
Top comments (17)
Glad this article is still helpful after all that time :D
Hello K, great post, I learned a lot. I didn't know this was possible using fetch. Just one quick question: how would you handle it if you wanted to fetch some data in the front end (React), but had to enter information into an input tag and maybe even click a button first? I hope you can help me out a bit.
Glad you liked it.
I'd use React hooks.
That is great, I think that might work, thanks a lot! I'm just trying to figure out this error:
Access to fetch at 'MyUrl' from origin 'localhost:8100' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's. I found that adding {mode: "no-cors"} to the fetch call makes the error go away, but then the returned object is null. Do you know anything about this kind of error?
Hi K, this is an awesome post.
I already tried it and it's working fine,
but I have a problem.
I'm trying to scrape a streaming service website (anime),
but there is no video tag inside the page.
When I inspected the elements, I saw that you need to click the "play" button on the website, and then you get an iframe with an embedded HTML document that contains the video tag.
So how can I do a click with Cheerio and then get the embedded video?
Thanks
Sorry, I don't know if Cheerio works with JavaScript sites.
One way to solve this would be to check if you could calculate the video URL from the data that is already in the HTML.
Otherwise I don't know.
Why async? What if I just use fetch?
You can use fetch without async/await. React-Native supports async/await, which is why I used it, but it isn't needed. You can use promises directly :)
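For illustration, a promise-chain version of the fetching step might look like the sketch below. loadHtml and its fetchImpl parameter are hypothetical names chosen for this example; fetchImpl only exists so the function can be tried without a network.

```javascript
// Hypothetical promise-chain variant of the fetch step, without async/await.
// fetchImpl defaults to the global fetch.
function loadHtml(url, fetchImpl = fetch) {
  return fetchImpl(url)
    .then(response => response.text()); // resolves with the HTML string
}
```

The caller then continues with `loadHtml(searchUrl).then(htmlString => { ... })` instead of awaiting the result.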
Nice one! I had no clue there was a jquery-like tool for RN. Very useful.
Thank you! Impressive breakdown of web crawling in React-Native! Your methodical approach and clear explanations make it accessible for anyone diving into this field. Don't forget to streamline your efforts with Crawlbase for enhanced efficiency.
Hi K, does the crawling above apply to React.js too, or only to React-Native?
Only React-Native, because from within a browser you can't access other websites, just sites from the same domain or ones that are CORS enabled.
No, it's not working. ERROR.
ESLint Parsing error: Unexpected token
Please go into more depth, I couldn't get cheerio-without-node-native to work.
There's also react-native-cheerio. I've not used it myself yet but, obviously, I'm doing the research too.
I'm new to this React Native thing... I've tried your exact method, yet nothing is displayed in my React Native mobile app.
I'm new to this, so your help will matter a lot.
There's no error but also nothing's displayed.