NATURAL LANGUAGE TOOLKIT (NLTK)
April Corbet
Overview
1. What is NLTK?
2. NLTK Basic Functionalities
3. Part of Speech Tagging
4. Chunking and Trees
5. Other Functionalities
6. Works Cited
What is NLTK?
  A collection of Python libraries and programs that allows NLP processes to be customized and optimized
  Downloading
What is NLTK?
  NLP tools typically use other NLP tools
  Other tools include:
    WordNet
    Stanford Dependency Parser
    ConceptNet
    DBpedia
    Google Mate-Tools
NLTK Basic Functionalities
1. Sentence Tokenization
2. Word Tokenization
3. WordNet, Synsets, and Synonyms
4. Stemming Words and Lemmas
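Of the four basics listed above, stemming is the quickest to try because it needs no downloaded corpus data. The sketch below uses NLTK's PorterStemmer; the word list is an illustrative assumption and does not come from the slides.

```python
from nltk.stem import PorterStemmer

# Illustrative word list (an assumption, not taken from the original slides).
words = ["cooking", "cooks", "cooked", "cookery"]

stemmer = PorterStemmer()

# A stemmer strips affixes by rule, so a stem is not guaranteed
# to be a dictionary word.
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word} -> {stem}")
```

The first three forms all reduce to the same stem, which is the point of stemming: different surface forms are grouped under one key. Lemmatizing (mapping to a dictionary form) relies on WordNet data instead of suffix rules.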
Sentence Tokenization
  Basic Tokenization
  Statistically Based Training Methodology
  Tokenizing for Multiple Sentences
  Pickle File
  Tokenizing with Other Languages
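A minimal sketch of sentence tokenization with NLTK's Punkt tokenizer, which is the statistically trained approach named above. Both sample texts are hypothetical. The pretrained pickle files (including those for other languages) need a separate data download, so this sketch builds the tokenizer directly instead of loading one.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical sample text (not from the original slides).
text = "Hello world. How are you today? NLTK splits this into sentences."

# An untrained Punkt tokenizer falls back to default parameters
# and needs no downloaded model.
basic_tokenizer = PunktSentenceTokenizer()
sentences = basic_tokenizer.tokenize(text)
print(sentences)

# Passing raw text to the constructor runs Punkt's unsupervised training,
# which tries to learn abbreviations and sentence starters from that text.
# A training text this small is only meant to show the call shape.
training_text = (
    "Dr. Smith arrived at noon. Dr. Jones arrived later. "
    "Dr. Lee never came. Dr. Smith left early."
)
trained_tokenizer = PunktSentenceTokenizer(training_text)
trained_sentences = trained_tokenizer.tokenize(
    "Dr. Smith spoke first. Then Dr. Jones replied."
)
print(trained_sentences)
```

In practice the training text would be a large sample from the target domain or language, since Punkt's statistics are unreliable on a handful of sentences.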
Word Tokenization
  Basic Word Tokenizer
    Penn Treebank Project
  Other Types of Word Tokenizers:
    PunktWordTokenizer: splits on punctuation but keeps the punctuation with the associated word token
    WordPunctTokenizer: splits all punctuation into separate tokens
  Word Tokenizers and Regular Expressions
    Match on tokens, or on separators (gaps)
  Stopwords and Filtering
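The differences between these tokenizers are easiest to see on one sentence. The sketch below compares the Treebank tokenizer, WordPunctTokenizer, and a RegexpTokenizer that matches on gaps, then filters stopwords. The stopword set is a small hand-written assumption; NLTK's own stopwords corpus requires a separate download.

```python
from nltk.tokenize import RegexpTokenizer, TreebankWordTokenizer, WordPunctTokenizer

sentence = "Can't is a contraction."

# Penn Treebank conventions: contractions are split into two tokens.
treebank_tokens = TreebankWordTokenizer().tokenize(sentence)
print(treebank_tokens)  # ['Ca', "n't", 'is', 'a', 'contraction', '.']

# Every run of punctuation becomes its own token.
punct_tokens = WordPunctTokenizer().tokenize(sentence)
print(punct_tokens)  # ['Can', "'", 't', 'is', 'a', 'contraction', '.']

# gaps=True means the pattern describes the separators, not the tokens.
gap_tokens = RegexpTokenizer(r"\s+", gaps=True).tokenize(sentence)
print(gap_tokens)  # ["Can't", 'is', 'a', 'contraction.']

# Hand-written stopword set (an assumption for illustration only).
stopwords = {"is", "a", "the", "of"}
filtered = [token for token in gap_tokens if token.lower() not in stopwords]
print(filtered)  # ["Can't", 'contraction.']
```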
WordNet, Synsets, and Synonyms
  WordNet is a lexical database integrated into NLTK that contains listings of word relations
  Groups of synonymous words that express the same concept are synset instances
  Expressed in a tree
    Hypernyms and Hyponyms
  Synonyms and Antonyms
POS Tagging
  String Representation for Tagged Tokens (tuples)
  Default Tagging
  Tagging Based on a Trained Corpus (Brown)
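A minimal sketch of the first two items: converting between the `word/TAG` string form and the `(word, tag)` tuple form, and tagging with a DefaultTagger. The example tokens are illustrative. Training on the Brown corpus is left out here because it requires downloading that corpus.

```python
from nltk.tag import DefaultTagger, str2tuple, tuple2str, untag

# A tagged token is a (word, tag) tuple; helpers convert to and from
# the "word/TAG" string form.
tagged_token = str2tuple("fly/NN")
print(tagged_token)  # ('fly', 'NN')

as_string = tuple2str(tagged_token)
print(as_string)  # fly/NN

# DefaultTagger assigns the same tag to every token. It is mostly useful
# as the last fallback behind smarter taggers.
default_tagger = DefaultTagger("NN")
tagged = default_tagger.tag(["Hello", "World"])
print(tagged)  # [('Hello', 'NN'), ('World', 'NN')]

# untag() strips the tags again.
words = untag(tagged)
print(words)  # ['Hello', 'World']
```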
POS Tagging
  Types of Tagging
    Unigram/Bigram Tagger
    Regexp Tagging
    Brill: uses an initial tagger and then applies transformation rules learned from the training corpus using "rule templates"
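The tagger types above are usually chained through backoff, so that each tagger handles what the previous one could not. The sketch below builds such a chain and then trains a Brill tagger on top of the unigram tagger. The five training sentences and the regular-expression patterns are toy assumptions standing in for a real tagged corpus such as Brown, and the single rule template is one illustrative choice among many.

```python
from nltk.tag import BigramTagger, DefaultTagger, RegexpTagger, UnigramTagger
from nltk.tag.brill import Pos, Template
from nltk.tag.brill_trainer import BrillTaggerTrainer

# Toy training data (hypothetical, standing in for a corpus such as Brown).
train_sents = [
    [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("run", "VB")],
    [("they", "PRP"), ("like", "VBP"), ("to", "TO"), ("run", "VB")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("run", "NN"), ("started", "VBD")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD"), ("quickly", "RB")],
]

# Illustrative suffix patterns for the regexp tagger.
patterns = [(r".*ly$", "RB"), (r".*ing$", "VBG"), (r"^\d+$", "CD")]

# Backoff chain: bigram -> unigram -> regexp -> default.
default_tagger = DefaultTagger("NN")
regexp_tagger = RegexpTagger(patterns, backoff=default_tagger)
unigram_tagger = UnigramTagger(train_sents, backoff=regexp_tagger)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)

test_words = ["they", "want", "to", "run"]

# The unigram tagger only sees the word itself; "run" is a noun in most
# of the toy sentences, so that is the tag it picks.
unigram_tags = unigram_tagger.tag(test_words)
print(unigram_tags)

# The bigram tagger also conditions on the previous tag (TO here).
bigram_tags = bigram_tagger.tag(test_words)
print(bigram_tags)

# Brill: start from an initial tagger, then learn correction rules that are
# instances of the templates. This template means "look at the previous tag".
templates = [Template(Pos([-1]))]
trainer = BrillTaggerTrainer(unigram_tagger, templates, trace=0)
brill_tagger = trainer.train(train_sents, max_rules=5, min_score=1)

brill_tags = brill_tagger.tag(test_words)
print(brill_tagger.rules())
print(brill_tags)
```

Ordering the chain from most specific to most general is a design choice: the bigram tagger is precise but sparse, so everything it has not seen falls through to progressively cruder taggers.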
Chunking and Trees
  Default Chunking
  Trees and Parsing
  Drawing Trees
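A minimal sketch of chunking and trees. It swaps in a regular-expression chunker (RegexpParser) for NLTK's pretrained default chunker, because the pretrained one needs downloaded models. The tagged sentence and the noun-phrase grammar are illustrative assumptions.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

# A POS-tagged sentence (hypothetical example input).
tagged = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# Chunk grammar: a noun phrase is an optional determiner,
# any number of adjectives, then a noun.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
chunk_tree = chunker.parse(tagged)
print(chunk_tree)

# Collect the words of every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunk_tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['the little dog', 'the cat']

# Trees can also be built from bracketed strings.
parse_tree = Tree.fromstring("(S (NP the dog) (VP barked))")
leaves = parse_tree.leaves()
print(leaves)  # ['the', 'dog', 'barked']

# Text rendering of the tree; parse_tree.draw() would open a GUI window
# instead and requires Tkinter.
parse_tree.pretty_print()
```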
Other Functionalities
  Replacing and Correcting Words
  Calculating WordNet Synset Similarity
  Word Collections
  Text Classification
  Transforming Chunks and Trees
  Processes for Distributed Processing and Handling Large Datasets
  Parsing for Specific Data (Location, Dates and Times)
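As one example from the list above, text classification can be sketched with NLTK's NaiveBayesClassifier. The `bag_of_words` helper and the four training examples are illustrative assumptions; a real classifier would be trained on a labeled corpus.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(words):
    """Turn a list of words into the feature dict NLTK classifiers expect."""
    return {word: True for word in words}


# Toy labeled data (hypothetical, for illustration only).
train_set = [
    (bag_of_words(["great", "fun", "film"]), "pos"),
    (bag_of_words(["loved", "it", "wonderful"]), "pos"),
    (bag_of_words(["terrible", "boring", "film"]), "neg"),
    (bag_of_words(["hated", "it", "awful"]), "neg"),
]

classifier = NaiveBayesClassifier.train(train_set)

# Classify unseen feature sets built from words that occur in the training data.
positive_label = classifier.classify(bag_of_words(["wonderful", "fun"]))
negative_label = classifier.classify(bag_of_words(["boring", "awful"]))
print(positive_label, negative_label)  # pos neg
```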
Works Cited
  Perkins, Jacob. Python Text Processing with NLTK 2.0 Cookbook.
  n_treebank_pos.html