Automatically generating a timeline (a graphical representation of a time period on which important events are marked) from a Wikipedia article is a fascinating idea, and a useful one for quickly grasping the historical perspective. This post outlines an approach to creating a well-formatted timeline from any Wikipedia article using winkNLP's API and its Named Entity Recognition (NER) feature:
- Fetch the article's contents and convert them into a WinkNLP document.
- Iterate through the detected entities and keep only those of type DATE.
- Use the token shapes of each date to keep the ones that can be converted into standard Unix time.
- Using the parentSentence() API, extract the sentence containing each date; also call markup() on the date to highlight it in that sentence.
- Collect each Unix time and sentence pair in an array and sort the array by Unix time.
- Convert this array into a well-formatted timeline using Observable capabilities along with some CSS.
The above approach is realized in about 30 lines of code:
timeLine = {
  // `nlp`, `its` and `WikiArticleTitle` are expected to be defined in other notebook cells.
  // Encode the title so that characters such as `&` or `?` do not break the query string.
  const title = encodeURIComponent( WikiArticleTitle || '2022 United Nations Climate Change Conference' );
  const response = await fetch( `https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=${title}&explaintext=1&formatversion=2&format=json&origin=*` );
  const body = await response.json();
  const text = body.query.pages[ 0 ].extract;
  const doc = nlp.readDoc( text || '' );
  const timeline = [];
  doc
    .entities()
    .filter( ( e ) => {
      const shapes = e.tokens().out( its.shape );
      // We only want dates that can be converted to an actual
      // time using new Date()
      return (
        e.out( its.type ) === 'DATE' &&
        (
          shapes[ 0 ] === 'dddd' ||
          ( shapes[ 0 ] === 'Xxxxx' && shapes[ 1 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'Xxxx' && shapes[ 1 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' )
        )
      );
    } )
    .each( ( e ) => {
      // Highlight the date inside its sentence.
      e.markup();
      let eventDate = e.out();
      // Dates that start with a month name get a day prefix before parsing.
      if ( isNaN( eventDate[ 0 ] ) ) eventDate = '1 ' + eventDate;
      timeline.push( {
        date: e.out(),
        unixTime: new Date( eventDate ).getTime() / 1000,
        sentence: e.parentSentence().out( its.markedUpText )
      } );
    } );
  // Oldest event first.
  return timeline.sort( ( a, b ) => a.unixTime - b.unixTime );
}
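The cell above relies on new Date() to parse strings such as "1 November 2022", and the handling of such non-ISO strings is left to the JavaScript engine. As a minimal sketch of a stricter alternative, the date text could be parsed by hand. The helper name toUnixTime and the MONTHS table below are hypothetical and not part of winkNLP; the sketch also assumes English month names and interprets every date as UTC, which differs from the local-time parsing used above.

```javascript
// Lower-cased month names; the array index doubles as the zero-based
// month number expected by Date.UTC().
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'
];

// Hypothetical helper (not part of winkNLP): converts "2022",
// "November 2022" or "6 November 2022" into Unix time in seconds (UTC).
// Returns null for anything that does not match one of these formats.
function toUnixTime( dateText ) {
  const parts = dateText.trim().split( /\s+/ );
  const yearText = parts[ parts.length - 1 ];
  if ( parts.length > 3 || !/^\d{4}$/.test( yearText ) ) return null;

  let month = 0; // a bare year defaults to January
  let day = 1;   // a missing day defaults to the 1st
  if ( parts.length >= 2 ) {
    month = MONTHS.indexOf( parts[ parts.length - 2 ].toLowerCase() );
    if ( month === -1 ) return null;
  }
  if ( parts.length === 3 ) {
    if ( !/^\d{1,2}$/.test( parts[ 0 ] ) ) return null;
    day = Number( parts[ 0 ] );
  }
  return Date.UTC( Number( yearText ), month, day ) / 1000;
}

toUnixTime( '2022' );            // → 1640995200
toUnixTime( '6 November 2022' ); // → 1667692800
toUnixTime( 'last week' );       // → null
```

Returning null for unrecognized text lets the caller skip such entities instead of pushing a NaN timestamp into the array that is sorted later.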
You can see the timeline in action in an interactive Observable notebook, "How to visualize timeline of a Wiki article?".
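The notebook handles the last step, formatting the sorted array, with Observable and CSS. As a rough, framework-free sketch of the same idea, the entries could be turned into a plain HTML list. The function name renderTimeline, the `timeline` CSS class and the use of the `<time>` element are assumptions made for illustration. Each entry is assumed to have the { date, unixTime, sentence } form produced by the cell above, with the sentence already carrying the highlighting tags added by markup().

```javascript
// Hypothetical renderer: turns timeline entries into an HTML list.
// Each entry is assumed to look like { date, unixTime, sentence }.
// The sentence is inserted as-is because it already contains the
// highlighting tags; text from untrusted sources should be sanitized
// before being placed on a real page.
function renderTimeline( entries ) {
  const items = [ ...entries ] // copy so the caller's array is not reordered
    .sort( ( a, b ) => a.unixTime - b.unixTime ) // oldest event first
    .map( ( e ) => `  <li><time>${e.date}</time> ${e.sentence}</li>` );
  return [ '<ul class="timeline">', ...items, '</ul>' ].join( '\n' );
}

// Usage with two hand-written entries:
const sampleHtml = renderTimeline( [
  { date: '2022', unixTime: 1640995200, sentence: 'Held in <mark>2022</mark>.' },
  { date: '1992', unixTime: 694224000, sentence: 'Adopted in <mark>1992</mark>.' }
] );
```

A stylesheet can then target the `timeline` class to draw the vertical line and the event markers.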
About winkNLP
WinkNLP is a developer-friendly JavaScript library for Natural Language Processing (NLP). It can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro, in both browser and Node.js environments. It even runs smoothly in a low-end smartphone's browser.
It is built from the ground up with a lean code base that has no external dependencies. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP an ideal tool for building production-grade systems with confidence.