Digital news has grown at an extraordinary rate, from a handful of online news posts to a vast number of digital news sources and publications. News coverage now spans a wide range of issues and events, which has extended its reach. These publications not only represent the world but also shape our perception of it.
Storing news data is now common because of the high demand for instant access to historical news, for which people often use a news API. News datasets are useful for research and for personal and professional artificial intelligence (AI) and machine learning (ML) projects.
If you are looking for historical news data to power your AI and ML algorithms, you can use the free news datasets below or the Newsdata.io tool, which is also mentioned below. News datasets can help you find a wide range of historical stories related to any topic, organization, person, and more.
In this article, we will discuss a simple and reliable way to access historical news datasets. Let’s get right into it.
Here are the top 20 news datasets that you can download for free for your personal and professional AI, machine learning, and data analytics projects.
- Newsdata.io
Name- Covid-19 news dataset
Link- https://newsdata.io/files/datasets/covid19-news
This Covid-19 dataset contains the latest world news related to Coronavirus.
- Kaggle.com
Name- BBC News Classification (News article categorization)
Link- https://www.kaggle.com/c/learn-ai-bbc
The dataset is split into 1490 records for training and 735 for testing. The goal is to build a system that can accurately classify previously unseen news articles into the right category.
- BBC
Name- BBC datasets
Link- http://mlg.ucd.ie/datasets/bbc.html
Two news article datasets, originating from BBC News, provided for use as benchmarks for machine learning research.
- Harvard Dataverse
Name- A Million News Headlines
Link- https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SYBGZL
This dataset contains news headlines published over a period of eighteen years, sourced from the reputable Australian news source ABC (Australian Broadcasting Corporation).
- Newsdata.io
Name- Covid-19 and vaccine news dataset
Link- https://newsdata.io/files/datasets/covid-vaccine-news
This dataset contains the latest news headlines published across the web, with the full description and all the metadata for each headline.
- Webz.io
Name- Political news articles
Link- https://webz.io/free-datasets/political-news-articles/
This dataset contains news articles on world politics, fetched with the help of the Webz.io news API.
- Paperswithcode
Name- COVID-19 Fake News Dataset
Link- https://paperswithcode.com/dataset/covid-19-fake-news-dataset
Along with the COVID-19 pandemic, we are also fighting an ‘infodemic’. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm.
- Kaggle
Name- India News Headlines Dataset
Link- https://www.kaggle.com/therohk/india-headlines-news-dataset
This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to the end of 2020, recorded in real time by the journalists of India. It contains approximately 3.4 million events published by the Times of India.
- Data.world
Name- Economic News Article Tone
Link- https://data.world/crowdflower/economic-news-article-tone
Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was.
- Archive.org
Name- World Politics news dataset
Link- https://archive.org/details/world-politics-news-dataset
This dataset contains the latest news related to politics around the world, along with the available metadata for each article.
- IEEE.org
Name- Covid-19 and vaccine
Link- https://ieee-dataport.org/documents/covid-19-and-vaccine-news-dataset
This dataset contains world news related to Covid-19 and vaccines, along with the available metadata for each article.
- IEEE.org
Name- World politics news
Link- https://ieee-dataport.org/documents/world-politics-news-dataset
This dataset contains world news related to politics, along with the available metadata for each article.
- IEEE.org
Name- Covid-19 news
Link- https://ieee-dataport.org/documents/covid-19-news
This dataset contains all the latest news data related to Covid-19 from around the world.
- IEEE.org
Name- COVIFN : FAKE NEWS ON COVID19
Link- https://ieee-dataport.org/documents/covifn-fake-news-covid19
COVIFN is a COVID-19-specific dataset that consists of fact-checked fake news scraped from Poynter and true news from news publishers’ verified portals. The dataset was pre-processed by removing special characters and non-vital information.
- IEEE.org
Name- FAKE NEWS ON HEALTHCARE
Link- https://ieee-dataport.org/documents/fake-news-healthcare
The Internet is a vast repository of useful knowledge, but it has been contaminated by the spread of false information. Relying on misinformation can be disastrous. According to a World Health Organization survey, about 6,000 individuals were hospitalized throughout the world as a result of fake news on COVID-19 in the first three months of 2020.
- IEEE.org
Name- NEWS CREDIBILITY DATASET
Link- https://ieee-dataport.org/documents/news-credibility-dataset
Features of each news item according to seven credibility categories.
- IEEE.org
Name- AI-Based automated extraction of entities, entity categories, and sentiment on Covid-19 situation.
Artificial Intelligence (AI) based in-depth analysis of social media content would allow a strategic decision-maker to obtain evidence-based responses to complex queries.
- Kaggle
Name- Reddit Omicron Panic
Link- https://www.kaggle.com/yamqwe/reddit-omicron-panic
As we all know, a new variant of COVID-19 is spreading worldwide causing massive panic. This dataset captures mentions of the new variant on Reddit.
- Kaggle
Name- Omicron daily cases by country (COVID-19 variant)
Link- https://www.kaggle.com/yamqwe/omicron-covid19-variant-daily-cases
Tracks the progression of the new Omicron COVID-19 variant.
- IEEE.org
Name- Daily report of Covid-19 confirmed cases in Thailand.
Link- https://ieee-dataport.org/documents/daily-report-covid-19-confirmed-cases-thailand
This dataset contains a total of 578,375 confirmed COVID-19 cases reported in Thailand, recorded between 22 January 2021 and 30 July 2021.
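Once you have downloaded one of the headline datasets above as a CSV file, a quick first check is to count how many records fall in each year. The following is a minimal sketch in Python using only the standard library. The column names `publish_date` and `headline_text`, the YYYYMMDD date format, and the sample rows are assumptions made for illustration, so check them against the header of the file you actually download.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in an ASSUMED layout: a date column in YYYYMMDD form
# and a headline column. For a real file, replace io.StringIO(SAMPLE_CSV)
# with open("your_file.csv", newline="", encoding="utf-8").
SAMPLE_CSV = """publish_date,headline_text
20030219,example headline about a council meeting
20030220,example headline about drought relief
20040105,example headline about a hospital opening
"""


def headlines_per_year(handle, date_column="publish_date"):
    """Count rows per year, taking the year from the first four characters of the date."""
    counts = Counter()
    for row in csv.DictReader(handle):
        date = (row.get(date_column) or "").strip()
        # Skip rows whose date is missing or does not start with a 4-digit year.
        if len(date) >= 4 and date[:4].isdigit():
            counts[date[:4]] += 1
    return counts


counts = headlines_per_year(io.StringIO(SAMPLE_CSV))
for year, total in sorted(counts.items()):
    print(year, total)
```

If your dataset uses a different date column, pass its name through the `date_column` argument. Rows with a missing or malformed date are skipped rather than raising an error, which keeps the count usable on files that have not been cleaned yet.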