Natural Language Processing, commonly known as NLP among machine learning practitioners, is a rapidly evolving field. With the advent of AI assistants like Siri, Cortana, Alexa, and Google Assistant, the use of NLP has increased manyfold. People are trying to build models that can better understand human languages such as English, Spanish, Mandarin, Hindi, and Japanese, which are formally known as natural languages.
The most common uses of Natural Language Processing in our daily life are search engines, machine translation, chatbots, and home assistants.
Let us define the two terms, Natural Language and Natural Language Processing, more formally.
Natural Language
Natural language is a language that has developed naturally in humans.
Natural Language Processing
Natural Language Processing (NLP) is the ability of a computer program to understand human languages as they are spoken and written. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable.
NLP has two main parts:
1. Natural Language Understanding
2. Natural Language Generation
Natural Language Understanding
Natural Language Understanding means that a machine learning or deep learning model is able to understand the language spoken by humans. In other words, the system can comprehend the sentences we speak or write. It can be used to solve many real-world problems such as question answering, query resolution, sentiment analysis, text similarity detection, and chatbots. Only if a system understands natural language can it respond to our questions.
Natural Language Generation
Natural Language Generation is the ability of a machine learning model to generate output, as text or audio, that resembles language humans can comprehend. In this task, a model trained on text datasets generates new sentences. It is used for text summarization, replying to queries or questions, machine translation (translation from one language to another), and answer generation.
In the past two to three years, many advances have been made in the field of NLP. This has been possible thanks to increased resources in the form of large text datasets, cloud platforms for training large models, and the need for humans to communicate with computers in a language both can understand. But the most important factors are the introduction of the transformer architecture and the use of transfer learning in NLP.
Models are now pre-trained on large datasets, and the pre-trained model, with its parameters (weights) adjusted, is then used to solve the required task. This process of reusing pre-trained models to solve actual problems is known as transfer learning. The pre-trained model is fine-tuned for tasks such as text classification, part-of-speech tagging, named entity recognition, text summarization, and question answering. Some of these terms may be unfamiliar to people who are new to machine learning or NLP. Feel free to ask about them in the comments section, or look them up for a better understanding and a deeper dive into the field.
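To make the pre-train-then-fine-tune idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny `PRETRAINED` word vectors stand in for a model trained on a large corpus, and `encode`, `fine_tune`, and `predict` are hypothetical helper names. The "pre-trained" part stays frozen and only a small classifier on top is trained for the new task, which is one common form of transfer learning (real fine-tuning often updates the pre-trained weights as well).

```python
import math

# Hypothetical "pre-trained" 2-d word vectors. In practice these would come
# from a model trained on a large corpus; here they are made-up numbers.
PRETRAINED = {
    "good": (1.0, 0.2), "great": (0.9, 0.1), "love": (0.8, 0.3),
    "bad": (-1.0, 0.1), "awful": (-0.9, 0.2), "hate": (-0.8, 0.3),
    "movie": (0.0, 1.0), "film": (0.0, 0.9),
}

def encode(sentence):
    """Frozen feature extractor: average the vectors of the known words."""
    vecs = [PRETRAINED[w] for w in sentence.lower().split() if w in PRETRAINED]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(examples, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on top of the encoder."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = p - label  # gradient of the log-loss with respect to the logit
            w = [w[i] - lr * err * x[i] for i in range(2)]
            b -= lr * err
    return w, b

def predict(text, w, b):
    """Return 1 (positive) or 0 (negative) for a sentence."""
    x = encode(text)
    return int(sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5)

# A tiny labelled dataset for the downstream task (sentiment classification).
train = [("good movie", 1), ("great film", 1), ("bad movie", 0), ("awful film", 0)]
w, b = fine_tune(train)
print(predict("love film", w, b), predict("hate movie", w, b))
```

The words "love" and "hate" never appear in the four training sentences; the classifier can still handle them because their vectors come from the (pretend) pre-training step, which is the point of transfer learning.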
Some of the recent advances in the field of Natural Language Processing that led to the development of Large Language Models (LLMs) and tools like ChatGPT are given below:
1 - Attention is All You Need
Google AI - June 2017
"Attention Is All You Need" is a research paper by Ashish Vaswani et al., published by Google researchers, that revolutionized the NLP industry. It was the first time the concept of transformers was introduced. Before this paper, RNNs and CNNs were used in the field of NLP, but they had two problems:
- Dealing with long term dependencies
- No parallelization during training
RNNs struggled with long-term dependencies even with improvements such as bidirectional RNNs, LSTMs, and GRUs. Transformers with self-attention addressed both problems and were a breakthrough for NLP. At release, the architecture was state of the art for seq2seq models, which are used for language translation.
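The core operation of the paper, scaled dot-product attention, can be sketched in a few lines of plain Python. This is a simplified illustration, not the paper's full implementation: the function names are my own, the token vectors are made-up numbers, and the raw vectors are used directly as queries, keys, and values, whereas a real transformer layer first applies learned projections and uses several attention heads.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Compare this query with every key, whatever the distance between them.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 3-token sequence with 2-d vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each position attends to every other position in one step, which helps with long-range dependencies, and each query row is computed independently of the others, which is what allows the work to be parallelized during training.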
2 - ULMFiT (Universal Language Model Fine-Tuning)
fast.ai - May 2018
The other major development was the use of transfer learning in the field of NLP. ULMFiT brought transfer learning for language models to the NLP community. It is a single universal language model that can be fine-tuned for multiple tasks; the same model can be fine-tuned to solve 3 different NLP tasks. AWD-LSTM forms the building block of this model. AWD stands for ASGD Weight-Dropped, where ASGD is Averaged Stochastic Gradient Descent.
3 - BERT (Bidirectional Encoder Representations from Transformers)
Google AI - November 2018
It combines both of the advancements mentioned above, i.e. transformers and transfer learning. It trains a transformer encoder bidirectionally, so each token's representation draws on both its left and right context. At release it achieved state-of-the-art (SOTA) results on 11 NLP tasks. It is pre-trained on the BooksCorpus (about 800 million words) and the whole of English Wikipedia (almost 2.5 billion words).
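The post does not describe how the bidirectional training works. In the BERT paper it is done with masked language modeling: about 15% of the input tokens are selected, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged; the model then has to predict the original tokens. The sketch below illustrates that corruption step only. The function name `mask_tokens` is hypothetical, and selecting each token independently with probability 0.15 is a simplification of the paper's procedure.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list for masked language modeling (simplified sketch)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)               # the model must predict the original
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")      # 80%: replace with the mask token
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)           # 10%: keep the token unchanged
        else:
            labels.append(None)              # this position is not scored
            masked.append(tok)
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
masked, labels = mask_tokens(tokens, vocab)
print(masked)
print(labels)
```

Because the hidden tokens can sit anywhere in the sentence, the model has to use the words on both sides of each gap, which is where the "bidirectional" in the name comes from.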
4 - Google's Transformer-XL
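Google AI - January 2019
This model outperformed even BERT in language modeling. It also resolved the issue of context fragmentation faced by the original Transformer, which processes text in fixed-length segments with no information flowing between them. Transformer-XL adds segment-level recurrence, so each segment can reuse hidden states cached from the previous one.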
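Context fragmentation is easier to see with a small example. The sketch below is only an illustration of which positions are visible to the model: the function names are made up, and where the real Transformer-XL caches hidden states from the previous segment, this toy version simply tracks the tokens themselves.

```python
def fixed_segments(tokens, seg_len):
    """Original Transformer training: each segment sees only itself."""
    return [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]

def segments_with_memory(tokens, seg_len, mem_len):
    """Transformer-XL style: each segment also sees a cached memory of
    the positions that came just before it."""
    out = []
    for i in range(0, len(tokens), seg_len):
        memory = tokens[max(0, i - mem_len):i]
        out.append((memory, tokens[i:i + seg_len]))
    return out

tokens = "the cat that chased the mouse was tired".split()

for seg in fixed_segments(tokens, seg_len=4):
    print("segment:", seg)

for memory, seg in segments_with_memory(tokens, seg_len=4, mem_len=4):
    print("memory:", memory, "| segment:", seg)
```

With fixed segments, the second segment ("the mouse was tired") has lost the word "cat", so the model cannot tell who was tired. With the memory, the earlier words are still available when the second segment is processed.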
5 - Stanford NLP
Stanford University - January 2019
The official site defines it as follows:
StanfordNLP is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, and to give a syntactic structure dependency parse.
It contains pre-trained neural models for 53 human languages, thus widening the scope of NLP to a global level instead of being restricted to just English.
6 - OpenAI's GPT-2
OpenAI - February 2019
GPT-2 stands for “Generative Pretrained Transformer 2”. As the name suggests, it is mainly used for tasks on the natural language generation side of NLP. At release it was the SOTA model for text generation. GPT-2 can generate a whole article from a short input prompt. It is also based on transformers. GPT-2 achieves state-of-the-art scores on a variety of domain-specific language modeling tasks. It is not trained on any of the data specific to any of these tasks and is only evaluated on them as a final test; this is known as the “zero-shot” setting.
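Generating text from a short prompt works autoregressively: the model predicts the next token, appends it to the text, and repeats. The sketch below shows only that loop, with a toy bigram counter standing in for the model. This is a deliberate simplification and an assumption of this example: GPT-2 is a large transformer that conditions on the whole preceding text, not just the last word, and it usually samples from its predictions rather than always taking the most likely word. The corpus and function names are made up.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which (a stand-in for a trained model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive decoding: append the most likely next word."""
    words = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(words[-1])
        if not followers:
            break  # nothing ever followed this word in the corpus
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

corpus = [
    "the model writes text",
    "the model writes text quickly",
    "a model reads text",
]
counts = train_bigram(corpus)
print(generate(counts, "the"))  # prints: the model writes text quickly
```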
7 - XLNet
CMU AI - June 2019
It uses an autoregressive method for language modeling instead of the autoencoding approach used in BERT. It combines the best features of BERT and Transformer-XL.
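The difference between the two approaches is clearest when the training objectives are written side by side. The formulas below follow the notation of the XLNet paper as I understand it, so treat the symbols as assumptions: x is a text sequence of length T, x̂ is the corrupted (masked) input, m_t is 1 when token t is masked, and Z_T is the set of all orderings of the positions 1..T.

```latex
% Autoregressive language modeling: predict each token from its left context
\max_{\theta} \; \sum_{t=1}^{T} \log p_{\theta}(x_t \mid x_{<t})

% Autoencoding (BERT): reconstruct the masked tokens from the corrupted input
\max_{\theta} \; \sum_{t=1}^{T} m_t \, \log p_{\theta}(x_t \mid \hat{x})

% XLNet: autoregressive over a randomly sampled factorization order z
\max_{\theta} \; \mathbb{E}_{z \sim \mathcal{Z}_T}
  \left[ \sum_{t=1}^{T} \log p_{\theta}(x_{z_t} \mid x_{z_{<t}}) \right]
```

Because the order z is random, each token is eventually predicted from words on both its left and its right, which gives XLNet BERT-like bidirectional context without feeding artificial mask tokens to the model.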
8 - PyTorch-Transformers
Hugging Face - July 2019
The folks at Hugging Face have done something remarkable with PyTorch-Transformers, a library that lets us use SOTA models such as BERT, XLNet, and Transformer-XL with a few lines of Python code.
9 - Baidu's ERNIE (Enhanced Representation through kNowledge IntEgration)
Baidu Research - July 2019
The Chinese search giant Baidu built this model, which features continual pre-training. It is a pre-trained language understanding model that achieved state-of-the-art (SOTA) results and outperformed BERT and the then-recent XLNet on 16 NLP tasks in both Chinese and English.
10 - RoBERTa: A Robustly Optimized BERT Pretraining Approach
Facebook Research - July 2019
It is Facebook AI's improvement over BERT. The development team at Facebook AI optimized BERT's training process and hyperparameters to produce this model.
11 - spaCy-PyTorch-Transformers
spaCy + Hugging Face - August 2019
It is a spaCy package that wraps Hugging Face's PyTorch-Transformers, so that transformer models such as BERT can be used inside spaCy's language processing pipelines. It is also used for the deployment of transformers.
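12 - Facebook AI's XLM/mBERT
Facebook Research - August 2019
These are multilingual language models covering almost 100 languages. XLM is Facebook AI's cross-lingual language model, while mBERT is the multilingual version of Google's BERT. They are SOTA for cross-lingual classification and machine translation.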
13 - Stanza
Stanford University - April 2020
It is the successor to StanfordNLP and supports 66 languages. Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
Further reading for a better understanding of the topics mentioned above:
1) https://arxiv.org/abs/1706.03762
2) https://arxiv.org/abs/1810.04805
3) http://jalammar.github.io/illustrated-bert/
4) https://openai.com/blog/better-language-models/
5) https://arxiv.org/abs/1907.11692
6) http://research.baidu.com/Blog/index-view?id=121
7) https://arxiv.org/abs/1901.02860
For Spanish readers, here is a translation of this post by Chema Bescos:
https://www.ibidem-translations.com/edu/traduccion-procesamiento-lenguaje-natural/
Fun fact: Many NLP models are named after Muppets, and Sesame Street is the inspiration for most of the names. Well-known models named after Sesame Street characters include BERT, ELMo, ERNIE, and Big Bird. The names are also chosen to acknowledge the influence and inspiration of one model on another.
News alert: OpenAI released GPT-3 this month, June 2020. It has 175 billion parameters, roughly 100 times as many as its predecessor, GPT-2. On many tasks it gives strong, in some cases state-of-the-art, results without gradient updates or fine-tuning.