DEV Community

Huynh-Chinh

GETTING STARTED WITH NATURAL LANGUAGE PROCESSING

Introduction

Natural language processing (NLP) is concerned with enabling computers to interpret, analyze, and generate human language, both written and spoken. Typical tasks include answering questions, translating between languages, identifying the language of a text, summarizing documents, analyzing sentiment, checking spelling, and recognizing speech. The field sits at the intersection of linguistics, AI, and computer science.

Roadmap of NLP for Machine Learning

1. Pre-processing

  • Sentence cleaning
  • Stop Words
  • Regular Expression
  • Tokenization
  • N-grams (Unigram, Bigram, Trigram)
  • Text Normalization
  • Stemming
  • Lemmatization
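
As a rough illustration of the first few steps in this list (cleaning with a regular expression, tokenization, stop-word removal, and n-grams), here is a minimal sketch that uses only the Python standard library. The stop-word list and the function names are assumptions made up for this example; stemming and lemmatization are left out because in practice they usually come from a library such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def clean(text: str) -> str:
    """Lowercase the text and replace every non-alphanumeric character with a space."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Clean the text, then split it on whitespace."""
    return clean(text).split()


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous window of n tokens (n=1 unigram, n=2 bigram, ...)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stop_words(tokenize("The cat sat on the mat, and the dog barked!"))
print(tokens)             # content words only
print(ngrams(tokens, 2))  # bigrams over the filtered tokens
```

Whether stop words should be removed before building n-grams depends on the task; here it is done first only to keep the output short.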

2. Linguistics

  • Part-of-Speech Tags
  • Constituency Parsing
  • Dependency Parsing
  • Syntactic Parsing
  • Semantic Analysis
  • Lexical Semantics
  • Coreference Resolution
  • Chunking
  • Entity Extraction / Named Entity Recognition (NER)
  • Named Entity Disambiguation / Entity Linking
  • Knowledge Graphs

3. Word Embeddings

a. Frequency-based Word Embedding

  • One Hot Encoding
  • Bag of Words or CountVectorizer()
  • TF-IDF or TfidfVectorizer()
  • Co-occurrence Matrix, Co-occurrence Vector
  • Hashing Vectorizer
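
To make the difference between raw counts and TF-IDF weights concrete, here is a small sketch in plain Python. The three toy documents and the helper names are assumptions for illustration. It uses the textbook formula tf × log(N / df); scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (an assumption for this example).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]

# The vocabulary fixes the position of each word in every vector.
vocab = sorted({w for doc in tokenized for w in doc})


def bag_of_words(tokens):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def idf(word):
    """Inverse document frequency: rarer words get a larger weight."""
    df = sum(1 for doc in tokenized if word in doc)
    return math.log(len(tokenized) / df)


def tf_idf(tokens):
    """Term frequency (count / document length) scaled by IDF."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) * idf(w) for w in vocab]


print(vocab)
print(bag_of_words(tokenized[0]))
print([round(x, 3) for x in tf_idf(tokenized[0])])
```

In the first document "the" occurs twice and "cat" once, yet "cat" ends up with the larger TF-IDF weight because it appears in only one of the three documents.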

b. Pretrained Word Embedding

  • Word2Vec (by Google): CBOW, Skip-Gram
  • GloVe (by Stanford)
  • fastText (by Facebook)

4. Topic Modeling

  • Latent Semantic Analysis (LSA)
  • Probabilistic Latent Semantic Analysis (pLSA)
  • Latent Dirichlet Allocation (LDA)
  • lda2Vec
  • Non-Negative Matrix Factorization (NMF)

5. NLP with Deep Learning

  • Classical Machine Learning baselines (Logistic Regression, SVM, Naïve Bayes)
  • Embedding Layer
  • Artificial Neural Network
  • Deep Neural Network
  • Convolutional Neural Network
  • RNN/LSTM/GRU
  • Bi-RNN/Bi-LSTM/Bi-GRU
  • Pretrained Language Models: ELMo, ULMFiT
  • Sequence-to-Sequence/Encoder-Decoder
  • Transformers (attention mechanism)
  • Encoder-only Transformers: BERT
  • Decoder-only Transformers: GPT
  • Transfer Learning
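
The attention mechanism mentioned in the list can be sketched in a few lines. The code below is a minimal, single-head version of scaled dot-product attention, softmax(QKᵀ / √d) V, written with plain Python lists. The function names and the toy input are assumptions for illustration; real Transformer layers add learned projection matrices, multiple heads, and masking, all of which are omitted here.

```python
import math


def softmax(xs):
    """Turn scores into positive weights that sum to 1 (shifted by the max for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Self-attention: the same toy vectors serve as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(x, x, x)
for row in out:
    print([round(v, 3) for v in row])
```

Because the weights come from a softmax, every output row is a convex combination of the value vectors, which is why each position can "look at" all the others in one step.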

6. Example Use Cases

  • Sentiment Analysis
  • Question Answering
  • Language Translation
  • Text/Intent Classification
  • Text Summarization
  • Text Similarity
  • Text Clustering
  • Text Generation
  • Chatbots (DialogFlow, RASA, Self-made Bots)
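
As one small example from this list, text similarity can be approximated by the cosine of the angle between two word-count vectors. This is a sketch under simple assumptions (whitespace tokenization, raw counts, a hypothetical helper name); production systems more often compare TF-IDF vectors or learned embeddings.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("the cat sat on the mat", "the cat lay on the mat"))
print(cosine_similarity("the cat sat on the mat", "stock prices fell sharply"))
```

The score ranges from 0 (no shared words) to 1 (identical word counts), and it ignores word order entirely.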

7. Libraries

  • NLTK
  • spaCy
  • Gensim

Conclusion

Thank you very much for taking the time to read this. I would really appreciate any feedback in the comment section.
Enjoy🎉