I recently needed sports data - soccer data in particular - and had to overcome a few initial problems along the way. In this post, I'll walk you through my thoughts and the path that led me to a solution.
For this tutorial, I came across flashscore.com, a website that covers plenty of leagues and provides both fixtures and live matches.
I started with the following basic script:
const axios = require('axios');

// performing a GET request
axios.get('https://www.flashscore.com/')
  .then(response => {
    // handling success
    const html = response.data;
    console.log(html);
  })
  // handling errors
  .catch(error => {
    console.log(error);
  });
To investigate what the script returns, I redirected all of the output into a test.html file:
node scraper.js > test.html
After opening the HTML file in my browser, I quickly realized that all of the match information shown on the original website was missing. This was not a big surprise, as I expected the content to be rendered by JavaScript.
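Rather than scanning test.html by eye, one quick way to confirm this is to search the raw response for the CSS class that the live match rows use (the same class the Puppeteer script below relies on). A small sketch:

const axios = require('axios');

// sketch: check whether the static HTML contains the match rows at all
axios.get('https://www.flashscore.com/')
  .then(response => {
    const html = response.data;
    // 'event__match--oneLine' is the class used for live match rows further below
    console.log(html.includes('event__match--oneLine')
      ? 'Match markup found in the static HTML'
      : 'No match markup - the content is rendered client-side');
  })
  .catch(error => console.log(error));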
Since the script above is written for Node.js, I started to play around with Puppeteer, a Node library that provides a high-level API to control headless Chrome or Chromium.
After some time, I ended up with the following piece of code:
const puppeteer = require('puppeteer');

// initiating Puppeteer
puppeteer
  .launch()
  .then(async browser => {
    // opening a new page and navigating to Flashscore
    const page = await browser.newPage();
    await page.goto('https://www.flashscore.com/');
    await page.waitForSelector('body');

    // extracting the live matches from the page's content
    let grabMatches = await page.evaluate(() => {
      let allLiveMatches = document.body.querySelectorAll('.event__match--oneLine');

      // storing the match items in an array, then selecting the content to retrieve
      let scrapeItems = [];
      allLiveMatches.forEach(item => {
        try {
          let homeTeam = item.querySelector('.event__participant--home').innerText;
          let awayTeam = item.querySelector('.event__participant--away').innerText;
          let currentHomeScore = item.querySelector('.event__scores.fontBold span:nth-of-type(1)').innerText;
          let currentAwayScore = item.querySelector('.event__scores.fontBold span:nth-of-type(2)').innerText;
          scrapeItems.push({
            homeTeam: homeTeam,
            awayTeam: awayTeam,
            currentHomeScore: currentHomeScore,
            currentAwayScore: currentAwayScore,
          });
        } catch (err) {}
      });

      let items = {
        liveMatches: scrapeItems,
      };
      return items;
    });

    // outputting the scraped data
    console.log(grabMatches);

    // closing the browser
    await browser.close();
  })
  // handling any errors
  .catch(function (err) {
    console.error(err);
  });
Now I ran the script again with the following command:
node scraper.js
As you can see, I retrieved a beautiful list of JSON data.
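The returned object follows exactly the structure built inside page.evaluate; the team names and scores below are just placeholders to illustrate the shape:

{
  liveMatches: [
    {
      homeTeam: 'Home Team A',      // placeholder values - the actual output
      awayTeam: 'Away Team B',      // depends on the matches that are live
      currentHomeScore: '1',        // at the time of scraping
      currentAwayScore: '0'
    },
    // ...one entry per live match
  ]
}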
Now, of course, there is plenty of additional work that could go into sorting the data by league, country, and so on, as sketched below.
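As a starting point, here is a small sketch of how such a grouping could look. It assumes each scraped item also carried a league field, which would require additionally scraping the league header rows - a selector I haven't covered here:

// sketch: group already-scraped matches by a hypothetical `league` field
function groupByLeague(matches) {
  return matches.reduce((groups, match) => {
    const league = match.league || 'Unknown';
    (groups[league] = groups[league] || []).push(match);
    return groups;
  }, {});
}

// usage, e.g. inside the .then() block above:
// console.log(groupByLeague(grabMatches.liveMatches));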
For my use case, this snippet was enough. If you are aiming for more serious scraping, you might as well pick a general sports or soccer API (e.g. sportdataapi.com, xmlsoccer.com).
Happy Scraping :-)