Mallet & MaxEnt POS Tagging Shallow Processing Techniques for NLP Ling570 November 16, 2011.

Roadmap Mallet Classifiers Testing Resources HW #8 MaxEnt POS Tagging POS Tagging as classification Feature engineering Sequence labeling

Mallet Commands Mallet command types: Data preparation Data/model inspection Training Classification Command line scripts: shell scripts that set up the Java environment and invoke the Java programs --help lists command line parameters for scripts

Mallet Data Mallet data instances: Instance_id label f1 v1 f2 v2 ….. Stored in internal binary format: “vectors” Binary format used by learners, decoders Need to convert text files to binary format
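
A minimal sketch of what such a text instance looks like before import; the ids, labels, and feature names below are made-up examples, not course data.

def to_instance_line(inst_id, label, feats):
    # One instance per line: instance_id, label, then feature value pairs,
    # matching the "Instance_id label f1 v1 f2 v2 ..." layout above.
    pairs = " ".join(f"{f} {v}" for f, v in feats.items())
    return f"{inst_id} {label} {pairs}"

print(to_instance_line("doc1", "en", {"book": 2, "the": 5, "i": 1}))
# doc1 en book 2 the 5 i 1
# A file of such lines still has to be converted to Mallet's binary
# "vectors" format with one of the import commands before training.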

Building & Accessing Models bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion --output-classifier OF Builds classifier model Can also store model, produce scores, confusion matrix, etc.

Building & Accessing Models bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion --output-classifier OF Builds classifier model Can also store model, produce scores, confusion matrix, etc. --trainer: MaxEnt, DecisionTree, NaiveBayes, etc.

Building & Accessing Models bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion --output-classifier OF Builds classifier model Can also store model, produce scores, confusion matrix, etc. --trainer: MaxEnt, DecisionTree, NaiveBayes, etc. --report: train:accuracy, test:f1:en

Building & Accessing Models bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion --output-classifier OF Builds classifier model Can also store model, produce scores, confusion matrix, etc. --trainer: MaxEnt, DecisionTree, NaiveBayes, etc. --report: train:accuracy, test:f1:en Can also use pre-split training & testing files e.g. output of vectors2vectors --training-file, --testing-file

Building & Accessing Models bin/mallet train-classifier --input data.vector --trainer classifiertype --training-portion --output-classifier OF Builds classifier model Can also store model, produce scores, confusion matrix, etc. --trainer: MaxEnt, DecisionTree, NaiveBayes, etc. --report: train:accuracy, test:f1:en
Confusion Matrix, row=true, column=predicted  accuracy=1.0
 label   0   1  |total
  0 de   1   .  |1
  1 en   .   1  |1
Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0

Accessing Classifiers classifier2info --classifier maxent.model Prints out contents of model file

Accessing Classifiers classifier2info --classifier maxent.model Prints out contents of model file
FEATURES FOR CLASS en
book
the
i

Testing Use new data to test a previously built classifier bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model

Testing Use new data to test a previously built classifier bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model Also instance file, directories: classify-file, classify-dir

Testing Use new data to test a previously built classifier bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model Also instance file, directories: classify-file, classify-dir Prints class,score matrix

Testing Use new data to test a previously built classifier bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model Also instance file, directories: classify-file, classify-dir Prints class,score matrix
Inst_id  class1 score1  class2 score2
array:0  en  0.995  de  …
array:1  en  0.970  de  …
array:2  en  0.064  de  0.935
array:3  en  0.094  de  0.905
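
A small sketch, assuming the whitespace-separated layout shown above, of reading that class/score matrix back into Python and keeping the top-scoring label per instance:

def read_predictions(path):
    # Each line: instance id followed by alternating label/score pairs.
    best = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            inst_id, rest = fields[0], fields[1:]
            scored = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
            best[inst_id] = max(scored, key=lambda pair: pair[1])[0]
    return best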

General Use bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors Builds binary representation from feature:value pairs

General Use bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors Builds binary representation from feature:value pairs bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model Trains MaxEnt classifier and stores model

General Use bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors Builds binary representation from feature:value pairs bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model Trains MaxEnt classifier and stores model bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model Tests on the new data
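
The same three steps can also be scripted; a sketch using Python's subprocess module, where the file names are the ones from the slide and bin/mallet is assumed to be reachable from the working directory:

import subprocess

def mallet(*args):
    # Invoke the Mallet command-line launcher with the given arguments.
    subprocess.run(["bin/mallet", *args], check=True)

mallet("import-svmlight", "--input", "svmltrain.vectors.txt",
       "--output", "svmltrain.vectors")
mallet("train-classifier", "--input", "svmltrain.vectors",
       "--trainer", "MaxEnt", "--output-classifier", "svml.model")
mallet("classify-svmlight", "--input", "svmltest.vectors.txt",
       "--output", "-", "--classifier", "svml.model")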

Other Information Website: Download and documentation (such as it is)

Other Information Website: Download and documentation (such as it is) API tutorial:

Other Information Website: Download and documentation (such as it is) API tutorial: Local guide (refers to older version 0.4) k/mallet_guide.pdf

HW #8

Goals Get experience with Mallet Import data Build and evaluate classifiers

Goals Get experience with Mallet Import data Build and evaluate classifiers Build your own text classification systems w/Mallet 20 Newsgroups data Build your own feature extractor Train and test classifiers

Text Classification Q1: Build representations of 20 Newsgroups data Use Mallet built-in functions text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors Q2: Do the same thing but build your own features

Feature Creation Skip headers: read data only from the first blank line onward Simple Tokenization: Convert all non-alphabetic characters (anything outside [a-zA-Z]) to white space Convert everything to lowercase Split tokens on white space Feature values: frequencies of tokens in documents

Example Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293 … Lines: 38 wrote: : In article, (Steve Manes) writes: Due to F. Xia

Tokenized Example wrote: :In article, writes: writes hambidge bms com wrote In article c psog c magpie linknet com manes magpie linknet com stevemanes writes Due to F. Xia

Example Feature Vector guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 …. Due to F. Xia
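
A sketch of a feature extractor following the recipe above (skip the headers, map non-alphabetic characters to spaces, lowercase, split, count); the file path in the final comment is hypothetical:

import re
from collections import Counter

def extract_features(path):
    # Read one newsgroup post and return token frequencies from its body.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    # The body starts after the first blank line; everything before it is headers.
    _, _, body = text.partition("\n\n")
    tokens = re.sub(r"[^a-zA-Z]", " ", body).lower().split()
    return Counter(tokens)

def to_vector_line(label, counts):
    # Emit "label feat:value feat:value ...", as in the example vector above.
    return label + " " + " ".join(f"{w}:{c}" for w, c in sorted(counts.items()))

# e.g. to_vector_line("guns", extract_features("20_newsgroups/talk.politics.guns/53293"))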

MaxEnt POS Tagging

N-gram POS tagging
Bigram Model: t_1..n = argmax over tag sequences of ∏_i P(w_i | t_i) P(t_i | t_{i-1})
Trigram Model: t_1..n = argmax over tag sequences of ∏_i P(w_i | t_i) P(t_i | t_{i-2}, t_{i-1})

MaxEnt POS Tagging POS tagging as classification What are the inputs?

MaxEnt POS Tagging POS tagging as classification What are the inputs? What units are classified?

MaxEnt POS Tagging POS tagging as classification What are the inputs? What units are classified? Words What are the classes?

MaxEnt POS Tagging POS tagging as classification What are the inputs? What units are classified? Words What are the classes? POS tags

MaxEnt POS Tagging POS tagging as classification What are the inputs? What units are classified? Words What are the classes? POS tags What information should we use?

MaxEnt POS Tagging POS tagging as classification What are the inputs? What units are classified? Words What are the classes? POS tags What information should we use? Consider the ngram model

POS Feature Representation Feature templates What feature templates correspond to trigram POS?

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0 Previous two tags: t -2 t -1

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0 Previous two tags: t -2 t -1 What other feature templates could be useful?

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0 Previous two tags: t -2 t -1 What other feature templates could be useful? More word context

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0 Previous two tags: t -2 t -1 What other feature templates could be useful? More word context Previous: w -1; Pre-pre: w -2 ; Next: w +1 ;…. Word bigram: w -1 w 0

POS Feature Representation Feature templates What feature templates correspond to trigram POS? Current word: w 0 Previous two tags: t -2 t -1 What other feature templates could be useful? More word context Previous: w -1; Pre-pre: w -2 ; Next: w +1 ;…. Word bigram: w -1 w 0 Backoff tag context: t -1

Feature Templates Time flies like an arrow
             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time                  flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V

Feature Templates
             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time                  flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V
In Mallet:
N prevW=:1 currW=Time:1 precurrW=-Time:1 postW=flies:1 preT=BOS:1
N prevW=Time:1 currW=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
V prevW=flies:1 currW=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
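
A sketch that expands these templates over a tagged sentence and prints the Mallet lines above; the tags for "an" and "arrow" (DT, N) are filled in here only for illustration, the rest follows the table:

def pos_instances(words, tags):
    # One training instance per word: gold tag, then the template features
    # prevW, currW, precurrW (word bigram), postW, and preT from the table.
    lines = []
    for i, (w, t) in enumerate(zip(words, tags)):
        prev_w = words[i - 1] if i > 0 else ""
        next_w = words[i + 1] if i + 1 < len(words) else ""
        prev_t = tags[i - 1] if i > 0 else "BOS"
        feats = [f"prevW={prev_w}:1", f"currW={w}:1",
                 f"precurrW={prev_w}-{w}:1", f"postW={next_w}:1",
                 f"preT={prev_t}:1"]
        lines.append(t + " " + " ".join(feats))
    return lines

for line in pos_instances("Time flies like an arrow".split(), ["N", "N", "V", "DT", "N"]):
    print(line)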

MaxEnt Feature Template Words: Current word: w 0 Previous word: w -1 Word two back: w -2 Next word: w +1 Next next word: w +2 Tags: Previous tag: t -1 Previous tag pair: t -2 t -1 How many features?

MaxEnt Feature Template Words: Current word: w 0 Previous word: w -1 Word two back: w -2 Next word: w +1 Next next word: w +2 Tags: Previous tag: t -1 Previous tag pair: t -2 t -1 How many features? 5|V| + |T| + |T|^2
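
For a rough sense of scale (illustrative numbers, not from the slide): with a 40,000-word vocabulary and the 45 Penn Treebank tags, these templates allow up to 5·40,000 + 45 + 45² ≈ 202,000 distinct features.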

Unknown Words How can we handle unknown words?

Unknown Words How can we handle unknown words? Assume rare words in training similar to unknown test What similarities can we exploit?

Unknown Words How can we handle unknown words? Assume rare words in training similar to unknown test What similarities can we exploit? Similarity in the link between spelling/morphology and POS: -able → JJ, -tion → NN, -ly → RB Case: John → NP, etc.

Representing Orthographic Patterns How can we represent morphological patterns as features?

Representing Orthographic Patterns How can we represent morphological patterns as features? Character sequences Which sequences?

Representing Orthographic Patterns How can we represent morphological patterns as features? Character sequences Which sequences? Prefixes/suffixes e.g. suffix(w i )=ing or prefix(w i )=well

Representing Orthographic Patterns How can we represent morphological patterns as features? Character sequences Which sequences? Prefixes/suffixes e.g. suffix(w i )=ing or prefix(w i )=well Specific characters or character types Which?

Representing Orthographic Patterns How can we represent morphological patterns as features? Character sequences Which sequences? Prefixes/suffixes e.g. suffix(w i )=ing or prefix(w i )=well Specific characters or character types Which? is-capitalized is-hyphenated

MaxEnt Feature Set

Rare Words & Features Intuition: Rare words = infrequent words in training What qualifies as “Rare”?

Rare Words & Features Intuition: Rare words = infrequent words in training What qualifies as “Rare”? < 5 occurrences in the paper Uncommon words better represented by spelling

Rare Words & Features Intuition: Rare words = infrequent words in training What qualifies as “Rare”? < 5 occurrences in the paper Uncommon words better represented by spelling Spelling could generalize Specific words would be undertrained Intuition: Rare features = features occurring fewer than X times in training

Rare Words & Features Intuition: Rare words = infrequent words in training What qualifies as “Rare”? < 5 occurrences in the paper Uncommon words better represented by spelling Spelling could generalize Specific words would be undertrained Intuition: Rare features = features occurring fewer than X times in training Infrequent features unlikely to be informative Skip

Examples well-heeled: rare word

Examples well-heeled: rare word JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
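
A sketch of generating these orthographic features (prefixes and suffixes up to length 4, plus a couple of character-type tests), reproducing the pref=/suff=/is-hyphenated features of the example above:

def spelling_features(word):
    # Prefix and suffix features up to length 4, plus simple shape tests.
    feats = []
    for k in range(1, min(4, len(word)) + 1):
        feats.append(f"pref={word[:k]}:1")
        feats.append(f"suff={word[-k:]}:1")
    if word[:1].isupper():
        feats.append("is-capitalized:1")
    if "-" in word:
        feats.append("is-hyphenated:1")
    return feats

print(" ".join(spelling_features("well-heeled")))
# pref=w:1 suff=d:1 pref=we:1 suff=ed:1 pref=wel:1 suff=led:1 pref=well:1 suff=eled:1 is-hyphenated:1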

Finding Features In training, where do features come from? Where do features come from in testing?
             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time                  flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V

Finding Features In training, where do features come from? Where do features come from in testing? Tag features come from the classification of the prior word
             w-1     w0      w-1 w0        w+1     t-1    y
x1 (Time)            Time                  flies   BOS    N
x2 (flies)   Time    flies   Time flies    like    N      N
x3 (like)    flies   like    flies like    an      N      V

Sequence Labeling

Goal: Find most probable labeling of a sequence Many sequence labeling tasks POS tagging Word segmentation Named entity tagging Story/spoken sentence segmentation Pitch accent detection Dialog act tagging

Solving Sequence Labeling

Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM

Solving Sequence Labeling Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM Via classification: Use classification algorithm Issue: What about tag features?

Solving Sequence Labeling Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM Via classification: Use classification algorithm Issue: What about tag features? Features that use class labels – depend on classification Solutions:

Solving Sequence Labeling Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM Via classification: Use classification algorithm Issue: What about tag features? Features that use class labels – depend on classification Solutions: Don’t use features that depend on class labels (loses info)

Solving Sequence Labeling Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM Via classification: Use classification algorithm Issue: What about tag features? Features that use class labels – depend on classification Solutions: Don’t use features that depend on class labels (loses info) Use other process to generate class labels, then use

Solving Sequence Labeling Direct: Use a sequence labeling algorithm E.g. HMM, CRF, MEMM Via classification: Use classification algorithm Issue: What about tag features? Features that use class labels – depend on classification Solutions: Don’t use features that depend on class labels (loses info) Use other process to generate class labels, then use Perform incremental classification to get labels, use labels as features for instances later in sequence
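
A sketch of the last option, incremental (greedy left-to-right) classification, where each predicted tag is fed back in as the preT feature of the next word; classify stands for any trained classifier, for example a wrapper around a Mallet model, that maps a feature list to a label:

def greedy_tag(words, classify):
    # Tag left to right; preT uses the tag just predicted, not a gold tag.
    tags = []
    for i, w in enumerate(words):
        prev_w = words[i - 1] if i > 0 else ""
        next_w = words[i + 1] if i + 1 < len(words) else ""
        prev_t = tags[i - 1] if i > 0 else "BOS"
        feats = [f"prevW={prev_w}:1", f"currW={w}:1",
                 f"postW={next_w}:1", f"preT={prev_t}:1"]
        tags.append(classify(feats))
    return tags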