Mallet & MaxEnt POS Tagging
Shallow Processing Techniques for NLP
Ling570
November 16, 2011
Roadmap
  Mallet
    Classifiers
    Testing
    Resources
  HW #8
  MaxEnt POS Tagging
    POS Tagging as classification
    Feature engineering
    Sequence labeling
Mallet Commands
  Mallet command types:
    Data preparation
    Data/model inspection
    Training
    Classification
  Command line scripts
    Shell scripts
      Set up java environment
      Invoke java programs
    --help lists command line parameters for scripts
Mallet Data
  Mallet data instances:
    Instance_id label f1 v1 f2 v2 …
  Stored in internal binary format: “vectors”
    Binary format used by learners, decoders
    Need to convert text files to binary format
Building & Accessing Models
  bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion FRACTION --output-classifier OF
    Builds classifier model
    classifiertype, FRACTION (share of the data used for training), and OF (output model file) are placeholders
    Can also store model, produce scores, confusion matrix, etc.
  --trainer: MaxEnt, DecisionTree, NaiveBayes, etc.
  --report: train:accuracy, test:f1:en
  Can also use pre-split training & testing files
    e.g. output of vectors2vectors
    --training-file, --testing-file
  Sample report output:
    Confusion Matrix, row=true, column=predicted  accuracy=1.0
    label   0   1  |total
      0 de  1   .  |1
      1 en  .   1  |1
    Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
    Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
Accessing Classifiers
  classifier2info --classifier maxent.model
    Prints out contents of model file
      FEATURES FOR CLASS en
        book
        the
        i
Testing
  Use new data to test a previously built classifier
    bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
  Also instance file, directories: classify-file, classify-dir
  Prints class,score matrix
    Inst_id  class1 score1 class2 score2
    array:0  en 0.995 de
    array:1  en 0.970 de
    array:2  en 0.064 de 0.935
    array:3  en 0.094 de 0.905
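The class,score matrix above can be post-processed to pick one label per instance. The following is a minimal sketch, assuming each output line is whitespace-separated in the order `Inst_id class1 score1 class2 score2 ...`; the function name `best_labels` is hypothetical and not part of Mallet.

```python
def best_labels(lines):
    """Map each instance id to its highest-scoring (label, score) pair.

    Assumes whitespace-separated lines of the form:
        inst_id class1 score1 class2 score2 ...
    """
    results = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        inst_id, rest = fields[0], fields[1:]
        # The remaining fields alternate: class, score, class, score, ...
        pairs = [(rest[i], float(rest[i + 1]))
                 for i in range(0, len(rest) - 1, 2)]
        results[inst_id] = max(pairs, key=lambda pair: pair[1])
    return results


sample = [
    "array:2 en 0.064 de 0.935",
    "array:3 en 0.094 de 0.905",
]
predictions = best_labels(sample)
print(predictions["array:2"])
```

Comparing these predictions with the gold labels in the test file gives test accuracy without rerunning training.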
General Use
  bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
    Builds binary representation from feature:value pairs
  bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
    Trains MaxEnt classifier and stores model
  bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
    Tests on the new data
Other Information
  Website:
    Download and documentation (such as it is)
  API tutorial:
  Local guide (refers to older version 0.4)
    k/mallet_guide.pdf
HW #8
Goals
  Get experience with Mallet
    Import data
    Build and evaluate classifiers
  Build your own text classification systems w/Mallet
    20 Newsgroups data
    Build your own feature extractor
    Train and test classifiers
Text Classification
  Q1: Build representations of 20 Newsgroups data
    Use mallet built-in functions
    text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors
  Q2: Do the same thing but build your own features
Feature Creation
  Skip headers
    Read data only after the first blank line
  Simple tokenization:
    Convert all non-alphabetic chars (anything not in [a-zA-Z]) to white space
    Convert everything to lowercase
    Split tokens on white space
  Feature values
    Frequencies of tokens in documents
Example
  Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293
  …
  Lines: 38
  wrote:
  : In article, (Steve Manes) writes:
  Due to F. Xia
Tokenized Example
  wrote: :In article, writes: writes hambidge bms com wrote In article c psog c magpie linknet com manes magpie linknet com stevemanes writes
  Due to F. Xia
Example Feature Vector
  guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 …
  Due to F. Xia
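The feature creation steps (skip headers, tokenize, count) can be sketched in a few lines. This is one possible implementation, not the required one: the function names `extract_features` and `format_instance` are hypothetical, and treating a document with no blank line as empty is an assumption.

```python
import re
from collections import Counter


def extract_features(raw_text):
    """Return token frequencies for the body of one newsgroup post."""
    # Skip headers: keep only the text after the first blank line.
    parts = re.split(r"\r?\n[ \t]*\r?\n", raw_text, maxsplit=1)
    # Assumption: a document without any blank line has no body.
    body = parts[1] if len(parts) > 1 else ""
    # Replace every non-alphabetic character with a space, then lowercase.
    body = re.sub(r"[^a-zA-Z]", " ", body).lower()
    # Split on white space and count each token.
    return Counter(body.split())


def format_instance(label, counts):
    """Render one instance as 'label tok:count tok:count ...'."""
    feats = " ".join(f"{tok}:{n}" for tok, n in sorted(counts.items()))
    return f"{label} {feats}"


post = "Lines: 2\nSubject: test\n\nGuns, guns and Ammo!\n"
print(format_instance("guns", extract_features(post)))
```

Sorting the tokens alphabetically reproduces the ordering of the example vector; the ordering itself should not matter to the learner.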
MaxEnt POS Tagging
N-gram POS tagging Bigram Model: Trigram Model:
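For reference, a sketch of the usual HMM formulation of these two models, assuming the standard notation in which w_i is the i-th word, t_i its tag, and n the sentence length. The bigram model comes first, then the trigram model; the slide's own formulas may differ in detail.

```latex
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})

\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})
```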
MaxEnt POS Tagging
  POS tagging as classification
    What are the inputs?
      What units are classified? Words
      What are the classes? POS tags
      What information should we use? Consider the ngram model
POS Feature Representation
  Feature templates
    What feature templates correspond to trigram POS?
      Current word: w_0
      Previous two tags: t_-2 t_-1
    What other feature templates could be useful?
      More word context
        Previous: w_-1; Pre-pre: w_-2; Next: w_+1; …
        Word bigram: w_-1 w_0
      Backoff tag context: t_-1
Feature Templates
  Time flies like an arrow

              w_-1   w_0    w_-1 w_0    w_+1   t_-1  y
  x1 (Time)          Time               flies  BOS   N
  x2 (flies)  Time   flies  Time flies  like   N     N
  x3 (like)   flies  like   flies like  an     N     V

  In mallet:
    N prevW= :1 currw=Time:1 precurrW= -Time:1 postW=flies:1 preT=BOS:1
    N prevW=Time:1 currw=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
    V prevW=flies:1 currw=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
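Training lines like these can be generated mechanically from a tagged sentence. The sketch below is illustrative only: the function names are hypothetical, the `BOS`/`EOS` boundary symbols for the missing previous and next word are assumptions, and the tags given to "an" and "arrow" are made up for the example.

```python
def template_features(words, tags, i, bos="BOS", eos="EOS"):
    """Instantiate the five templates for position i using gold tags."""
    prev_w = words[i - 1] if i > 0 else bos       # w_-1 (assumed BOS at start)
    next_w = words[i + 1] if i + 1 < len(words) else eos  # w_+1
    prev_t = tags[i - 1] if i > 0 else bos        # t_-1
    return [
        f"prevW={prev_w}",
        f"currw={words[i]}",
        f"precurrW={prev_w}-{words[i]}",
        f"postW={next_w}",
        f"preT={prev_t}",
    ]


def training_lines(words, tags):
    """One 'label feat:1 feat:1 ...' line per word, label first."""
    lines = []
    for i, tag in enumerate(tags):
        feats = " ".join(f"{f}:1" for f in template_features(words, tags, i))
        lines.append(f"{tag} {feats}")
    return lines


words = "Time flies like an arrow".split()
tags = ["N", "N", "V", "D", "N"]  # last two tags are hypothetical
for line in training_lines(words, tags):
    print(line)
```

Every feature is binary, so each value is 1; the feature name itself encodes both the template and the observed context.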
MaxEnt Feature Template
  Words:
    Current word: w_0
    Previous word: w_-1
    Word two back: w_-2
    Next word: w_+1
    Next next word: w_+2
  Tags:
    Previous tag: t_-1
    Previous tag pair: t_-2 t_-1
  How many features? 5|V| + |T| + |T|^2
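To get a feel for the size of 5|V| + |T| + |T|^2, the count can be evaluated for concrete sizes. The vocabulary and tagset sizes below are hypothetical, chosen only for illustration. The formula counts distinct template instantiations; a MaxEnt model typically keeps a weight for each of these per class, so the number of weights may be larger.

```python
def template_feature_count(vocab_size, tagset_size):
    """Upper bound on distinct features: 5|V| + |T| + |T|^2."""
    word_features = 5 * vocab_size                 # five word templates
    tag_features = tagset_size + tagset_size ** 2  # t_-1 and t_-2 t_-1
    return word_features + tag_features


# Hypothetical sizes: 50,000 word types and 45 tags.
print(template_feature_count(50_000, 45))
```

The word templates dominate the count, which is one reason rare-feature cutoffs matter in practice.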
Unknown Words
  How can we handle unknown words?
    Assume rare words in training are similar to unknown words in test
  What similarities can we exploit?
    Similar in link between spelling/morphology and POS
      -able: JJ
      -tion: NN
      -ly: RB
      Case: John: NP, etc.
Representing Orthographic Patterns
  How can we represent morphological patterns as features?
    Character sequences
      Which sequences? Prefixes/suffixes
        e.g. suffix(w_i)=ing or prefix(w_i)=well
    Specific characters or character types
      Which?
        is-capitalized
        is-hyphenated
MaxEnt Feature Set
Rare Words & Features
  Intuition: Rare words = infrequent words in training
    What qualifies as “rare”? Count threshold of 5 in the paper
    Uncommon words better represented by spelling
      Spelling could generalize
      Specific words would be undertrained
  Intuition: Rare features = features occurring fewer than X times in training
    Infrequent features unlikely to be informative
    Skip them
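Both cutoffs reduce to counting over the training data. The sketch below assumes "rare" means strictly fewer than the threshold; the function names and the exact comparison are assumptions, and X is left as the parameter `min_count`.

```python
from collections import Counter


def rare_words(sentences, threshold=5):
    """Words seen fewer than `threshold` times in training."""
    counts = Counter(word for sent in sentences for word in sent)
    return {word for word, c in counts.items() if c < threshold}


def prune_features(instances, min_count):
    """Drop features that occur fewer than `min_count` times overall.

    `instances` is a list of feature lists, one list per training word.
    """
    counts = Counter(f for feats in instances for f in feats)
    return [[f for f in feats if counts[f] >= min_count]
            for feats in instances]


train = [["the", "cat"]] * 5 + [["the", "well-heeled"]]
print(rare_words(train))
print(prune_features([["pref=w", "suff=d"], ["pref=w"]], min_count=2))
```

Words returned by `rare_words` would get spelling features in place of the word-identity feature.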
Examples
  well-heeled: rare word
    JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
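The spelling features in this example follow a simple pattern: prefixes and suffixes of length 1 to 4, plus character-type flags. A minimal sketch, assuming a maximum affix length of 4 (inferred from the example) and a hypothetical function name:

```python
def spelling_features(word, max_len=4):
    """Prefix, suffix, and character-type features for a rare word."""
    feats = []
    longest = min(max_len, len(word))
    for n in range(1, longest + 1):
        feats.append(f"pref={word[:n]}")   # first n characters
    for n in range(1, longest + 1):
        feats.append(f"suff={word[-n:]}")  # last n characters
    if word[:1].isupper():
        feats.append("is-capitalized")
    if "-" in word:
        feats.append("is-hyphenated")
    return feats


print(" ".join(f"{f}:1" for f in spelling_features("well-heeled")))
```

The contextual features (prevW, nextW, preT, and so on) are produced separately and concatenated onto the same line.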
Finding Features
  In training, where do features come from?
  Where do features come from in testing?
    Tag features come from classification of prior word

              w_-1   w_0    w_-1 w_0    w_+1   t_-1  y
  x1 (Time)          Time               flies  BOS   N
  x2 (flies)  Time   flies  Time flies  like   N     N
  x3 (like)   flies  like   flies like  an     N     V
Sequence Labeling
  Goal: Find most probable labeling of a sequence
  Many sequence labeling tasks
    POS tagging
    Word segmentation
    Named entity tagging
    Story/spoken sentence segmentation
    Pitch accent detection
    Dialog act tagging
Solving Sequence Labeling
  Direct: Use a sequence labeling algorithm
    E.g. HMM, CRF, MEMM
  Via classification: Use classification algorithm
    Issue: What about tag features?
      Features that use class labels depend on classification
    Solutions:
      Don’t use features that depend on class labels (loses info)
      Use other process to generate class labels, then use them as features
      Perform incremental classification to get labels, use labels as features for instances later in sequence
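Incremental classification can be sketched as a left-to-right loop in which each predicted tag feeds the features of the next word. In the sketch below, `toy_classifier` is a hypothetical stand-in for a trained MaxEnt model, and the `BOS` boundary symbol and the reduced feature set are assumptions.

```python
def greedy_tag(words, classify, bos="BOS"):
    """Tag left to right, feeding each predicted tag into the next step."""
    tags = []
    for i, word in enumerate(words):
        prev_w = words[i - 1] if i > 0 else bos
        prev_t = tags[i - 1] if i > 0 else bos  # predicted, not gold
        feats = {f"currw={word}", f"prevW={prev_w}", f"preT={prev_t}"}
        tags.append(classify(feats))
    return tags


def toy_classifier(feats):
    """Hypothetical stand-in for a trained model's best-label decision."""
    if "currw=like" in feats and "preT=N" in feats:
        return "V"
    return "N"


print(greedy_tag(["Time", "flies", "like"], toy_classifier))
```

This loop commits to one tag per word as it goes, so an early mistake can propagate to later words; it does not search over whole tag sequences.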