Lecture 6: Hidden Markov Models
Topics: Smoothing again
Readings: Chapters
January 16, 2013
CSCE 771 Natural Language Processing
– 2 – CSCE 771 Spring 2013 Overview Last Time NLTK book
Chomsky on YouTube
Python Text Processing with NLTK 2.0 Cookbook
1. Tokenizing Text and WordNet Basics
2. Replacing and Correcting Words
3. Creating Custom Corpora
4. Part-of-Speech Tagging
5. Extracting Chunks
6. Transforming Chunks and Trees
7. Text Classification
8. Distributed Processing and Handling Large Datasets
9. Parsing Specific Data
Chapter 1. Tokenizing Text and WordNet Basics
In this chapter, we will cover:
- Tokenizing text into sentences
- Tokenizing sentences into words
- Tokenizing sentences using regular expressions
- Filtering stopwords in a tokenized sentence
- Looking up synsets for a word in WordNet
- Looking up lemmas and synonyms in WordNet
- Calculating WordNet synset similarity
- Discovering word collocations
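The first few tokenizing topics above can be sketched with the standard library alone. This is a rough illustration, not the cookbook's NLTK recipes: the function names, the regular expressions, and the tiny `STOPWORDS` set are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on", "it"}

def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_words(sentence):
    """Return runs of word characters and apostrophes, dropping punctuation."""
    return re.findall(r"[\w']+", sentence)

def filter_stopwords(tokens):
    """Drop tokens whose lowercase form is in STOPWORDS."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

sentences = split_sentences("The cat sat on the mat. It is happy!")
tokens = tokenize_words(sentences[0])
print(sentences)                 # ['The cat sat on the mat.', 'It is happy!']
print(filter_stopwords(tokens))  # ['cat', 'sat', 'mat']
```

A naive splitter like this breaks on abbreviations such as "Dr. Smith", which is one reason to prefer a trained sentence tokenizer in practice.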
Chapter 2. Replacing and Correcting Words
In this chapter, we will cover:
- Stemming words
- Lemmatizing words with WordNet
- Translating text with Babelfish
- Replacing words matching regular expressions
- Removing repeating characters
- Spelling correction with Enchant
- Replacing synonyms
- Replacing negations with antonyms
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 25). Packt Publishing. Kindle Edition.
Chapter 3. Creating Custom Corpora
In this chapter, we will cover:
- Setting up a custom corpus
- Creating a word list corpus
- Creating a part-of-speech tagged word corpus
- Creating a chunked phrase corpus
- Creating a categorized text corpus
- Creating a categorized chunk corpus reader
- Lazy corpus loading
- Creating a custom corpus view
- Creating a MongoDB backed corpus reader
- Corpus editing with file locking
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 45). Packt Publishing. Kindle Edition.
Chapter 4. Part-of-Speech Tagging
1. Default tagging
2. Training a unigram part-of-speech tagger
3. Combining taggers with backoff tagging
4. Training and combining Ngram taggers
5. Creating a model of likely word tags
6. Tagging with regular expressions
7. Affix tagging
8. Training a Brill tagger
9. Training the TnT tagger
10. Using WordNet for tagging
11. Tagging proper names
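The idea behind combining a unigram tagger with a default tagger through backoff can be sketched in plain Python. This is a minimal illustration with a made-up toy training set, not NLTK's tagger classes: `train_unigram` and `tag_words` are hypothetical names, and the `"NN"` default is an assumption.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_words(words, model, default_tag="NN"):
    """Tag known words from the model; back off to default_tag otherwise."""
    return [(w, model.get(w, default_tag)) for w in words]

# Toy training data, invented for this example.
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_unigram(train)
print(tag_words(["the", "bird", "sleeps"], model))
# [('the', 'DT'), ('bird', 'NN'), ('sleeps', 'VBZ')]
```

Here "bird" never appears in training, so it receives the default tag. A backoff chain generalizes this: each tagger hands words it cannot tag to the next one in line.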
Chapter 5. Extracting Chunks
In this chapter, we will cover:
- Chunking and chinking with regular expressions
- Merging and splitting chunks with regular expressions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- Training a tagger-based chunker
- Classification-based chunking
- Extracting named entities
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 111). Packt Publishing. Kindle Edition.
Chapter 6. Transforming Chunks and Trees
In this chapter, we will cover:
- Filtering insignificant words
- Correcting verb forms
- Swapping verb phrases
- Swapping noun cardinals
- Swapping infinitive phrases
- Singularizing plural nouns
- Chaining chunk transformations
- Converting a chunk tree to text
- Flattening a deep tree
- Creating a shallow tree
- Converting tree nodes
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 143). Packt Publishing. Kindle Edition.
Chapter 7. Text Classification
In this chapter, we will cover:
- Bag of Words feature extraction
- Training a naive Bayes classifier
- Training a decision tree classifier
- Training a maximum entropy classifier
- Measuring precision and recall of a classifier
- Calculating high information words
- Combining classifiers with voting
- Classifying with multiple binary classifiers
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 167). Packt Publishing. Kindle Edition.
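Two of the classification topics above, bag-of-words features and precision/recall, are small enough to sketch without any library. The function names and the toy labels below are assumptions for illustration; they are not taken from the cookbook or from NLTK.

```python
def bag_of_words(words):
    """Represent a document as a dict marking which words are present."""
    return {word: True for word in words}

def precision_recall(reference, predicted, label):
    """Compute precision and recall for one label from parallel label lists."""
    ref = {i for i, l in enumerate(reference) if l == label}
    pred = {i for i, l in enumerate(predicted) if l == label}
    true_pos = len(ref & pred)
    # Return None when a ratio is undefined (nothing predicted or nothing to find).
    precision = true_pos / len(pred) if pred else None
    recall = true_pos / len(ref) if ref else None
    return precision, recall

print(bag_of_words(["the", "quick", "the"]))  # {'the': True, 'quick': True}

gold = ["pos", "pos", "neg", "neg"]   # toy gold labels
guess = ["pos", "neg", "neg", "pos"]  # toy classifier output
print(precision_recall(gold, guess, "pos"))  # (0.5, 0.5)
```

Bag of words discards word order and counts, keeping only presence, which is why the repeated "the" collapses to a single feature.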
Chapter 8. Distributed Processing and Handling Large Datasets
In this chapter, we will cover:
- Distributed tagging with execnet
- Distributed chunking with execnet
- Parallel list processing with execnet
- Storing a frequency distribution in Redis
- Storing a conditional frequency distribution in Redis
- Storing an ordered dictionary in Redis
- Distributed word scoring with Redis and execnet
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 201). Packt Publishing. Kindle Edition.
Chapter 9. Parsing Specific Data
In this chapter, we will cover:
- Parsing dates and times with Dateutil
- Time zone lookup and conversion
- Tagging temporal expressions with Timex
- Extracting URLs from HTML with lxml
- Cleaning and stripping HTML
- Converting HTML entities with BeautifulSoup
- Detecting and converting character encodings
Perkins, Jacob ( ). Python Text Processing with NLTK 2.0 Cookbook (p. 227). Packt Publishing. Kindle Edition.