
1 Mallet & MaxEnt POS Tagging
Shallow Processing Techniques for NLP
Ling570, November 16, 2011

2 Roadmap
- Mallet: classifiers, testing, resources
- HW #8
- MaxEnt POS tagging: POS tagging as classification, feature engineering
- Sequence labeling

3 Mallet Commands
Mallet command types: data preparation, data/model inspection, training, classification.
The commands are shell scripts that set up the Java environment and invoke the Java programs.
--help lists the command-line parameters for each script.

4 Mallet Data
Mallet data instances have the form:
instance_id label f1 v1 f2 v2 …
Instances are stored in an internal binary format ("vectors"), which the learners and decoders use, so text files must first be converted to the binary format.
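For example, a hypothetical instance line in this text format (the id, label, and features are invented for illustration):
doc42 en the 2 book 1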

5 Building & Accessing Models
bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion 0.9 --output-classifier OF
- Builds a classifier model; can also store the model, produce scores, a confusion matrix, etc.
- --trainer: MaxEnt, DecisionTree, NaiveBayes, etc.
- --report: e.g. train:accuracy, test:f1:en
- Can also use pre-split training and testing files (e.g. the output of vectors2vectors): --training-file, --testing-file
Sample output:
Confusion Matrix, row=true, column=predicted  accuracy=1.0
label   0   1  |total
0 de    1   .  |1
1 en    .   1  |1
Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0

10 Accessing Classifiers
classifier2info --classifier maxent.model
Prints out the contents of the model file:
FEATURES FOR CLASS en
<default> -0.036953801963395115
book 0.004605219133228236
the 0.24270652500835088
i 0.004605219133228236

12 Testing
Use new data to test a previously built classifier:
bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
- Also for instance files and directories: classify-file, classify-dir
- Prints a class,score matrix:
Inst_id  class1 score1  class2 score2
array:0  en 0.995   de 0.0046
array:1  en 0.970   de 0.0294
array:2  en 0.064   de 0.935
array:3  en 0.094   de 0.905

16 General Use
bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
- Builds the binary representation from feature:value pairs
bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
- Trains a MaxEnt classifier and stores the model
bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
- Tests on the new data
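These three steps can also be chained from a small driver script; below is a minimal sketch using Python's subprocess module (the bin/mallet path and file names are assumptions carried over from the commands above):

import subprocess

MALLET = "bin/mallet"  # assumed path to the mallet launcher script

def run(*args):
    """Run one mallet command; fail loudly if it returns nonzero."""
    subprocess.run([MALLET, *args], check=True)

# Import SVMLight-format text data into Mallet's binary format.
run("import-svmlight", "--input", "svmltrain.vectors.txt",
    "--output", "svmltrain.vectors")

# Train a MaxEnt classifier and store the model.
run("train-classifier", "--input", "svmltrain.vectors",
    "--trainer", "MaxEnt", "--output-classifier", "svml.model")

# Classify new SVMLight-format data with the stored model ("-" = stdout).
run("classify-svmlight", "--input", "svmltest.vectors.txt",
    "--output", "-", "--classifier", "svml.model")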

19 Other Information
- Website, with download and documentation (such as it is): http://mallet.cs.umass.edu
- API tutorial: http://mallet.cs.umass.edu/mallet-tutorial.pdf
- Local guide (refers to older version 0.4): http://courses.washington.edu/ling572/winter07/homework/mallet_guide.pdf

22 HW #8

23 Goals
- Get experience with Mallet: import data, build and evaluate classifiers
- Build your own text classification system with Mallet on the 20 Newsgroups data: build your own feature extractor, train and test classifiers

25 Text Classification
Q1: Build representations of the 20 Newsgroups data using Mallet built-in functions:
text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors
Q2: Do the same thing, but build your own features

26 Feature Creation
- Skip headers: read data only from the first blank line onward
- Simple tokenization: convert all characters outside [a-zA-Z] to whitespace, convert everything to lowercase, and split tokens on whitespace
- Feature values: frequencies of tokens in documents
A minimal Python sketch of this extractor follows the examples below.

27 Example
Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293
…
Lines: 38

hambidge@bms.com wrote:
: In article, manes@magpie.linknet.com (Steve Manes) writes:
(Due to F. Xia)

28 Tokenized Example
Before:
hambidge@bms.com wrote:
: In article, manes@magpie.linknet.com (Steve Manes) writes:
After:
hambidge bms com wrote in article c psog c magpie linknet com manes magpie linknet com steve manes writes
(Due to F. Xia)

29 Example Feature Vector
guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 …
(Due to F. Xia)
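A minimal Python sketch of the feature creation described on slide 26 (file handling and the command-line wrapper are assumptions; the w:c output shape follows the example vector above):

import re
import sys
from collections import Counter

def extract_features(path):
    """Tokenize one 20 Newsgroups message per the rules on slide 26:
    skip headers (everything up to the first blank line), map characters
    outside [a-zA-Z] to whitespace, lowercase, split, and count."""
    with open(path, encoding="latin-1") as f:
        text = f.read()
    # Skip headers: keep only the text after the first blank line.
    _, _, body = text.partition("\n\n")
    # Non-alphabetic characters become whitespace; then lowercase and split.
    tokens = re.sub(r"[^a-zA-Z]", " ", body).lower().split()
    return Counter(tokens)

if __name__ == "__main__":
    path, label = sys.argv[1], sys.argv[2]
    counts = extract_features(path)
    # One instance line: id, label, then token:frequency pairs.
    pairs = " ".join(f"{w}:{c}" for w, c in sorted(counts.items()))
    print(f"{path} {label} {pairs}")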

30 MaxEnt POS Tagging

31 N-gram POS tagging
Bigram model: $\hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
Trigram model: $\hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-2}, t_{i-1})$

32 MaxEnt POS Tagging
POS tagging as classification:
- What are the inputs? What units are classified? Words
- What are the classes? POS tags
- What information should we use? Consider the n-gram model

38 POS Feature Representation
Feature templates:
- What feature templates correspond to trigram POS? Current word w0; previous two tags t-2 t-1
- What other feature templates could be useful?
  - More word context: previous w-1, pre-pre w-2, next w+1, …; word bigram w-1 w0
  - Backoff tag context: t-1

45 Feature Templates
Time flies like an arrow

             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time    Time          flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V

52 Feature Templates
In Mallet:
N prevW= :1 currw=Time:1 precurrW= -Time:1 postW=flies:1 preT=BOS:1
N prevW=Time:1 currw=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
V prevW=flies:1 currw=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
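A small Python sketch that emits instance lines of this shape from a tagged sentence (the feature names follow the slide; the BOS padding convention and the demo tags for "an"/"arrow" are assumptions):

def emit_instances(words, tags):
    """Yield one Mallet-style line per token: label followed by
    feature:value pairs, using the templates from the slides."""
    for i, (w, t) in enumerate(zip(words, tags)):
        prev_w = words[i - 1] if i > 0 else ""
        post_w = words[i + 1] if i + 1 < len(words) else ""
        prev_t = tags[i - 1] if i > 0 else "BOS"  # assumed BOS padding
        feats = [
            f"prevW={prev_w}:1",
            f"currw={w}:1",
            f"precurrW={prev_w}-{w}:1",
            f"postW={post_w}:1",
            f"preT={prev_t}:1",
        ]
        yield t + " " + " ".join(feats)

for line in emit_instances(["Time", "flies", "like", "an", "arrow"],
                           ["N", "N", "V", "DT", "N"]):
    print(line)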

56 MaxEnt Feature Template
Words:
- Current word: w0
- Previous word: w-1
- Word two back: w-2
- Next word: w+1
- Next next word: w+2
Tags:
- Previous tag: t-1
- Previous tag pair: t-2 t-1
How many features? 5|V| + |T| + |T|^2
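As a worked example, assuming an illustrative 40,000-word vocabulary and the 45-tag Penn Treebank tagset:

$5|V| + |T| + |T|^2 = 5 \cdot 40{,}000 + 45 + 45^2 = 200{,}000 + 45 + 2{,}025 = 202{,}070$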

58 Unknown Words
How can we handle unknown words?
- Assume rare words in training are similar to unknown words in testing
- What similarities can we exploit? The link between spelling/morphology and POS:
  - -able → JJ, -tion → NN, -ly → RB
  - Case: John → NP, etc.

61 Representing Orthographic Patterns
How can we represent morphological patterns as features?
Character sequences. Which sequences?
- Prefixes/suffixes, e.g. suffix(wi)=ing or prefix(wi)=well
Specific characters or character types. Which?
- is-capitalized
- is-hyphenated
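A minimal sketch of such orthographic features (affix lengths up to 4 match the well-heeled example later in the deck; the contains-digit feature is an assumed extra):

def orthographic_features(word):
    """Spelling features for a (possibly unknown) word: prefixes,
    suffixes, and character-type indicators."""
    feats = []
    # Prefixes and suffixes of length 1..4 (cutoff is an assumption).
    for k in range(1, min(4, len(word)) + 1):
        feats.append(f"pref={word[:k]}")
        feats.append(f"suff={word[-k:]}")
    if word[:1].isupper():
        feats.append("is-capitalized")
    if "-" in word:
        feats.append("is-hyphenated")
    if any(ch.isdigit() for ch in word):
        feats.append("contains-digit")  # assumption: a common extra feature
    return feats

# Reproduces the affix features of the well-heeled example below.
print(orthographic_features("well-heeled"))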

66 MaxEnt Feature Set

67 Rare Words & Features
Intuition: rare words = infrequent words in training
- What qualifies as "rare"? Frequency below 5 in the paper
- Uncommon words are better represented by spelling: spelling features can generalize, while features for the specific words would be undertrained
Intuition: rare features = features occurring fewer than X times in training
- Infrequent features are unlikely to be informative, so skip them

71 Examples
well-heeled: rare word
JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1

73 Finding Features
In training, where do features come from? The gold tags in the training data.
Where do features come from in testing? Tag features come from the classification of the prior word.

75 Sequence Labeling

76 Goal: Find the most probable labeling of a sequence
Many sequence labeling tasks:
- POS tagging
- Word segmentation
- Named entity tagging
- Story/spoken sentence segmentation
- Pitch accent detection
- Dialog act tagging

77 Solving Sequence Labeling
Direct: use a sequence labeling algorithm, e.g. HMM, CRF, MEMM
Via classification: use a classification algorithm
- Issue: what about tag features? Features that use class labels depend on the classification itself.
- Solutions:
  - Don't use features that depend on class labels (loses information)
  - Use another process to generate the class labels, then use them
  - Perform incremental classification to get labels, and use those labels as features for instances later in the sequence (a sketch follows this list)
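A minimal sketch of the incremental approach: greedy left-to-right classification where each prediction supplies the tag features for later instances (the classifier.predict interface is hypothetical):

def tag_sequence(words, classifier):
    """Greedy left-to-right tagging: each prediction feeds the
    tag features of the next instance."""
    tags = []
    for i, w in enumerate(words):
        prev_t = tags[i - 1] if i > 0 else "BOS"
        prev2_t = tags[i - 2] if i > 1 else "BOS"
        features = {
            f"currw={w}": 1,
            f"prevW={words[i - 1] if i > 0 else ''}": 1,
            f"preT={prev_t}": 1,  # tag feature from our own earlier prediction
            f"pre2T={prev2_t}-{prev_t}": 1,
        }
        tags.append(classifier.predict(features))  # hypothetical interface
    return tags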

