NLP Pipeline
NLP Applications
- Tokenization
- Stop Words Removal
- Text Normalization (e.g. U.S.A. -> USA)
- Spelling Correction
- Lemmatization (or sometimes Stemming)
- Finding Equivalence Classes of semantically related terms, such as synonyms (using thesauri, e.g. WordNet)
- POS Tagging
- Named Entity Recognition
- Building Statistical Language Models
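
The first few steps above can be sketched in plain Python. This is only a minimal illustration, not a production pipeline: the stop-word list is a tiny made-up sample, `crude_stem` is a toy suffix stripper standing in for a real stemmer or lemmatizer, and all function names are hypothetical. Spelling correction, POS tagging, and named entity recognition are left out because they normally rely on trained models or external libraries.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "of", "and", "to"}


def normalize(text):
    """Collapse dotted acronyms (U.S.A. -> USA), then lowercase."""
    text = re.sub(
        r"\b(?:[A-Za-z]\.){2,}",
        lambda m: m.group(0).replace(".", ""),
        text,
    )
    return text.lower()


def tokenize(text):
    """Split normalized text into runs of lowercase letters and digits."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, tokenize, drop stop words, then stem."""
    tokens = tokenize(normalize(text))
    return [crude_stem(t) for t in remove_stop_words(tokens)]


def bigram_model(sentences):
    """Maximum-likelihood estimates of P(w2 | w1) from lists of tokens."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
        for w1, nxt in counts.items()
    }


print(preprocess("Dogs are barking in the U.S.A."))  # ['dog', 'bark', 'usa']
```

The bigram model here is unsmoothed, so any word pair not seen in the training sentences gets no probability at all; a real language model would add smoothing.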
Information Retrieval
- Tokenization
- Stop Words Removal
- Text Normalization
- Stemming or Lemmatization
- Spelling Correction
- Phonetic Normalization (e.g. with Soundex)
- Finding Equivalence Classes of semantically related terms, such as synonyms (using thesauri, e.g. WordNet)
- Named Entity Recognition
- Building an Inverted Index and a Vector Space Model
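
The IR-specific steps (phonetic normalization, inverted index, vector space model) can be sketched as follows. This is a minimal illustration under stated assumptions: documents are already tokenized, `soundex` follows the common American Soundex rules, the weighting is plain tf * log(N / df), and all function names are hypothetical rather than taken from any library.

```python
import math
from collections import Counter, defaultdict

# Soundex digit for each consonant; vowels, h, w and y carry no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word):
    """Four-character Soundex code, e.g. Robert and Rupert both give R163."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def tfidf_vectors(docs):
    """Sparse tf-idf vector per document, with weight tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    return {
        doc_id: {t: tf * math.log(n_docs / df[t])
                 for t, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


docs = {1: ["cat", "sat"], 2: ["cat", "ran"], 3: ["dog", "ran"]}
index = build_inverted_index(docs)
print(index["cat"] & index["ran"])  # Boolean AND query: {2}
```

The inverted index answers Boolean queries by set intersection, while the tf-idf vectors allow ranking documents by cosine similarity to a query vector built the same way.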