Mallet & MaxEnt POS Tagging
Shallow Processing Techniques for NLP
Ling570, November 16, 2011
Roadmap
Mallet: classifiers, testing, resources, HW #8
MaxEnt POS tagging: POS tagging as classification, feature engineering, sequence labeling
Mallet Commands
Mallet command types: data preparation, data/model inspection, training, classification.
The command-line scripts are shell scripts that set up the Java environment and invoke the Java programs; --help lists the command-line parameters for each script.
Mallet Data
Mallet data instances: instance_id label f1 v1 f2 v2 …
Instances are stored in an internal binary format ("vectors"); this binary format is what the learners and decoders use, so text files must be converted to it.
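For example, a text instance file in this format might contain lines like the following (the IDs, labels, and features here are hypothetical):
doc1 en the 2 book 1
doc2 de das 1 buch 1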
Building & Accessing Models
bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion 0.9 --output-classifier OF
Builds a classifier model; can also store the model, produce scores, a confusion matrix, etc.
--trainer: MaxEnt, DecisionTree, NaiveBayes, etc.
--report: train:accuracy, test:f1:en
Can also use pre-split training & testing files (e.g. the output of vectors2vectors): --training-file, --testing-file
Sample output:
Confusion Matrix, row=true, column=predicted  accuracy=1.0
label   0   1   |total
0 de    1   .   |1
1 en    .   1   |1
Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
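For example, a hypothetical concrete run with the MaxEnt trainer and a 90/10 split:
bin/mallet train-classifier --input data.vector --trainer MaxEnt --training-portion 0.9 --output-classifier maxent.model --report test:accuracy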
Accessing Classifiers
classifier2info --classifier maxent.model
Prints out the contents of the model file:
FEATURES FOR CLASS en
<default> -0.036953801963395115
book 0.004605219133228236
the 0.24270652500835088
i 0.004605219133228236
Testing
Use new data to test a previously built classifier:
bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
Can also classify instance files or directories: classify-file, classify-dir
Prints a class,score matrix:
inst_id    class1 score1   class2 score2
array:0    en 0.995        de 0.0046
array:1    en 0.970        de 0.0294
array:2    en 0.064        de 0.935
array:3    en 0.094        de 0.905
General Use
bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
Builds the binary representation from feature:value pairs.
bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
Trains a MaxEnt classifier and stores the model.
bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
Tests on the new data (--output - writes the results to standard output).
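Here svmltrain.vectors.txt would hold one instance per line in SVMLight-style label feature:value form, e.g. (hypothetical data):
en the:2 book:1 is:1
de das:2 buch:1 ist:1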
Other Information
Website, with download and documentation (such as it is): http://mallet.cs.umass.edu
API tutorial: http://mallet.cs.umass.edu/mallet-tutorial.pdf
Local guide (refers to older version 0.4): http://courses.washington.edu/ling572/winter07/homework/mallet_guide.pdf
HW #8
Goals
Get experience with Mallet: import data, build and evaluate classifiers.
Build your own text classification systems with Mallet: 20 Newsgroups data; build your own feature extractor; train and test classifiers.
Text Classification
Q1: Build representations of the 20 Newsgroups data using Mallet built-in functions:
text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors
Q2: Do the same thing, but build your own features.
Feature Creation
Skip headers: read data only from the first blank line onward.
Simple tokenization: convert all non-alphabetic characters (anything outside [a-zA-Z]) to whitespace, convert everything to lowercase, and split tokens on whitespace.
Feature values: frequencies of tokens in documents. (A minimal sketch follows.)
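A minimal sketch of this recipe in Python, assuming each document arrives as a single string (the function name is ours):

import re
from collections import Counter

def extract_features(doc_text):
    # Skip headers: keep only the text after the first blank line.
    parts = doc_text.split("\n\n", 1)
    body = parts[1] if len(parts) == 2 else parts[0]
    # Map every non-alphabetic character to whitespace, then lowercase and split.
    tokens = re.sub(r"[^a-zA-Z]", " ", body).lower().split()
    # Feature values are the frequencies of tokens in the document.
    return Counter(tokens)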
Example
Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293
…
Lines: 38

hambidge@bms.com wrote:
: In article, manes@magpie.linknet.com (Steve Manes) writes:
(Due to F. Xia)
Tokenized Example
Original:
hambidge@bms.com wrote:
: In article, manes@magpie.linknet.com (Steve Manes) writes:
Tokenized:
hambidge bms com wrote in article c psog c magpie linknet com manes magpie linknet com steve manes writes
(Due to F. Xia)
Example Feature Vector
guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 …
(Label guns, followed by token:frequency features. Due to F. Xia)
MaxEnt POS Tagging
N-gram POS Tagging
Bigram model: $\hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
Trigram model: $\hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})$
MaxEnt POS Tagging
POS tagging as classification. What are the inputs?
What units are classified? Words.
What are the classes? POS tags.
What information should we use? Consider the n-gram model.
POS Feature Representation
Feature templates. What feature templates correspond to trigram POS tagging?
Current word: w0
Previous two tags: t-2 t-1
What other feature templates could be useful?
More word context: previous word w-1; word two back w-2; next word w+1; …; word bigram w-1 w0
Backoff tag context: t-1
Feature Templates
Time flies like an arrow

             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time                  flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V
Feature Templates
In Mallet, the rows of the table above become:
N prevW= :1 currw=Time:1 precurrW= -Time:1 postW=flies:1 preT=BOS:1
N prevW=Time:1 currw=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
V prevW=flies:1 currw=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
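Note that these lines use the same label feature:value layout that import-svmlight reads, so a file of such lines can plausibly be converted and trained exactly as in the General Use pipeline above (this assumes Mallet treats names like prevW=Time as opaque feature strings).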
MaxEnt Feature Template
Words: current word w0; previous word w-1; word two back w-2; next word w+1; next-next word w+2
Tags: previous tag t-1; previous tag pair t-2 t-1
How many features? 5|V| + |T| + |T|^2
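As a rough worked example, with hypothetical Penn-Treebank-like sizes |V| = 40,000 word types and |T| = 45 tags:
5·40,000 + 45 + 45² = 200,000 + 45 + 2,025 = 202,070 possible features.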
Unknown Words
How can we handle unknown words? Assume rare words in training are similar to unknown words in testing.
What similarities can we exploit? The link between spelling/morphology and POS:
-able: JJ; -tion: NN; -ly: RB; case: John → NP, etc.
Representing Orthographic Patterns
How can we represent morphological patterns as features? Character sequences. Which sequences?
Prefixes/suffixes, e.g. suffix(wi)=ing or prefix(wi)=well
Specific characters or character types. Which? is-capitalized, is-hyphenated
MaxEnt Feature Set
Rare Words & Features
Intuition: rare words = infrequent words in training. What qualifies as "rare"? A frequency threshold of 5 in the paper.
Uncommon words are better represented by spelling: spelling features can generalize, while the specific rare words themselves would be undertrained.
Intuition: rare features = features occurring fewer than X times in training. Infrequent features are unlikely to be informative, so skip them.
Examples
well-heeled: a rare word
JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
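A minimal sketch of generating such orthographic features in Python (feature names follow the slide's pref=/suff=/is-* convention; the function name is ours):

def ortho_features(word):
    feats = {}
    low = word.lower()
    # Prefixes and suffixes up to length 4, as in the example above.
    for k in range(1, 5):
        if len(low) > k:
            feats["pref=" + low[:k]] = 1
            feats["suff=" + low[-k:]] = 1
    # Character-type features.
    if word[:1].isupper():
        feats["is-capitalized"] = 1
    if "-" in word:
        feats["is-hyphenated"] = 1
    return feats

For "well-heeled" this yields exactly the pref=, suff=, and is-hyphenated features shown above.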
Finding Features
In training, where do features come from? Where do features come from in testing?
The tag features come from the classification of the prior word (see the feature table above).
Sequence Labeling
Goal: find the most probable labeling of a sequence.
Many sequence labeling tasks: POS tagging, word segmentation, named entity tagging, story/spoken sentence segmentation, pitch accent detection, dialog act tagging.
Solving Sequence Labeling
Direct: use a sequence labeling algorithm, e.g. HMM, CRF, MEMM.
Via classification: use a classification algorithm.
Issue: what about tag features? Features that use class labels depend on the classification itself. Solutions:
Don't use features that depend on class labels (loses information)
Use some other process to generate class labels, then use them
Perform incremental classification to get labels, and use those labels as features for instances later in the sequence (a sketch follows)
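A minimal sketch of the incremental option in Python, assuming a hypothetical classify(features) function standing in for any trained classifier that returns the best label for a feature set:

def tag_sequence(words, classify):
    tags = []
    for i, w in enumerate(words):
        feats = {
            "currw=" + w: 1,
            "prevW=" + (words[i - 1] if i > 0 else ""): 1,
            # The predicted tag of the prior word becomes a feature here.
            "preT=" + (tags[i - 1] if i > 0 else "BOS"): 1,
        }
        tags.append(classify(feats))
    return tags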