Max-margin sequential learning methods
William W. Cohen, CALD

Announcements
Upcoming assignments:
- Wed 3/3: project proposal due (personnel + 1-2 pages)
- Spring break next week; no class
- You will get feedback on project proposals by the end of break
- Write-ups for the "Distance Metrics for Text" week are due Wed 3/17, not the Monday after spring break

Collins' paper
Notation:
- the label (y) is a "tag" t
- the observation (x) is a word w
- the history h is a 4-tuple ⟨t_i, t_{i-1}, w[1:n], i⟩
- φ_s(h, t) is a feature of the pair (h, t)

Collins' paper
Notation, continued:
- Φ_s is the sum of φ_s over all positions i
- α_s is the weight given to feature Φ_s
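In symbols, a hedged reconstruction from the notation above (the subscript s indexes features):

```latex
\Phi_s\bigl(w_{[1:n]}, t_{[1:n]}\bigr) = \sum_{i=1}^{n} \phi_s(h_i, t_i),
\qquad
\mathrm{score}\bigl(w_{[1:n]}, t_{[1:n]}\bigr)
  = \sum_{s} \alpha_s\, \Phi_s\bigl(w_{[1:n]}, t_{[1:n]}\bigr)
```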

Collins’ paper

The theory
Claim 1: the algorithm is an instance of this perceptron variant:
Claim 2: the arguments behind the mistake-bound classification results of Freund & Schapire (1999, "F&S99") extend immediately to this ranking task as well.
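To make the algorithm concrete, here is a minimal Python sketch of the training loop: decode with Viterbi under the current weights, and on a mistake add the gold sequence's global features and subtract the predicted sequence's. This is not the paper's code; the local feature function phi (which must handle t_prev = -1 at position 0 and return indices of active features) is a hypothetical placeholder the caller supplies, and tags are assumed to be integers 0..n_tags-1.

```python
import numpy as np

def viterbi(local_score, n_positions, n_tags):
    """Return the highest-scoring tag sequence.
    local_score(i, t_prev, t): score of tag t at position i given the
    previous tag t_prev (t_prev = -1 at position 0)."""
    delta = np.full((n_positions, n_tags), -np.inf)
    back = np.zeros((n_positions, n_tags), dtype=int)
    for t in range(n_tags):
        delta[0, t] = local_score(0, -1, t)
    for i in range(1, n_positions):
        for t in range(n_tags):
            scores = [delta[i - 1, tp] + local_score(i, tp, t)
                      for tp in range(n_tags)]
            back[i, t] = int(np.argmax(scores))
            delta[i, t] = scores[back[i, t]]
    tags = [int(np.argmax(delta[-1]))]          # best tag at the last position
    for i in range(n_positions - 1, 0, -1):     # follow backpointers
        tags.append(int(back[i, tags[-1]]))
    return tags[::-1]

def train_structured_perceptron(examples, phi, n_feats, n_tags, epochs=5):
    """examples: list of (words, gold_tags);
    phi(words, i, t_prev, t): indices of features active at position i."""
    w = np.zeros(n_feats)
    for _ in range(epochs):
        for words, gold in examples:
            def local_score(i, tp, t):
                return sum(w[f] for f in phi(words, i, tp, t))
            pred = viterbi(local_score, len(words), n_tags)
            if pred != list(gold):
                # Additive update: w += Phi(x, gold) - Phi(x, pred),
                # where Phi sums the local features over all positions.
                tp_g = tp_p = -1
                for i in range(len(words)):
                    for f in phi(words, i, tp_g, gold[i]):
                        w[f] += 1.0
                    for f in phi(words, i, tp_p, pred[i]):
                        w[f] -= 1.0
                    tp_g, tp_p = gold[i], pred[i]
    return w
```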

F&S99 algorithm

F&S99 result

Collins’ result

Results
Two experiments:
- POS tagging, using Adwait Ratnaparkhi's features
- NP chunking (Start, Continue, Outside tags)
(NER on a special AT&T dataset is reported in another paper.)

Features for NP chunking

Results

More ideas
The dual version of a perceptron: w is built up by repeatedly adding examples, so w is a weighted sum of the examples x_1, ..., x_n. The inner product ⟨w, x⟩ can therefore be rewritten:
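A minimal sketch of that dual view for plain binary classification (my illustration, not the paper's code; `kernel` is any kernel function you supply, and y is assumed to be a numpy array of +/-1 labels):

```python
import numpy as np

def dual_perceptron(X, y, kernel, epochs=10):
    """Kernel (dual) perceptron: never materialize w. Instead keep a
    count alpha[i] of how many times example i was added to w, so that
    <w, x> = sum_i alpha[i] * y[i] * kernel(X[i], x)."""
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(a, b) for b in X] for a in X])  # Gram matrix
    for _ in range(epochs):
        for i in range(n):
            score = float(np.sum(alpha * y * K[:, i]))    # <w, x_i>
            if y[i] * score <= 0:      # mistake: "add" x_i into w
                alpha[i] += 1.0
    return alpha

def predict(alpha, X, y, kernel, x):
    return np.sign(sum(a * yi * kernel(xi, x)
                       for a, yi, xi in zip(alpha, y, X)))
```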

Dual version of the perceptron for ranking
α_{i,j} is the weight for the pair (i, j), where i ranges over the examples and j over the correct/incorrect tag sequences.

NER features for re-ranking MaxEnt tagger output

NER features

NER results

Altun et al. paper
Starting point: the dual version of Collins' perceptron algorithm.
- The final hypothesis is a weighted sum of inner products with a subset of the examples.
- This is a lot like an SVM, except that the perceptron algorithm, rather than quadratic optimization, is used to set the weights.

SVM optimization
Notation:
- y_i is the correct tag sequence for x_i; y is an incorrect one
- F(x_i, y_i) is the feature vector
Optimization problem: find weights w that maximize the minimal margin subject to ||w|| = 1, or equivalently minimize ||w||^2 subject to every margin being >= 1.
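Written out, a hedged reconstruction from the slide text (the exact statement, and the equation numbers (14)/(15) referenced below, are in Altun et al.):

```latex
\max_{\|w\|=1}\ \min_{i,\; y \neq y_i}
  \bigl\langle w,\, F(x_i, y_i) - F(x_i, y) \bigr\rangle
\quad\Longleftrightarrow\quad
\min_w\ \tfrac{1}{2}\|w\|^2
\ \text{ s.t. }\
\bigl\langle w,\, F(x_i, y_i) - F(x_i, y) \bigr\rangle \ge 1
\quad \forall i,\ \forall y \neq y_i
```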

SVMs for ranking

SVMs for ranking
Proposition: (14) and (15) are equivalent:

SVMs for ranking
This is a binary classification problem, with (x_i, y_i) the positive example and each (x_i, y') a negative example, except that the margin threshold θ_i varies for each example. Why? Because we're ranking.

SVMs for ranking
Altun et al. work out the remaining details:
- As in perceptron learning, "negative" data is found by running Viterbi with the learned weights and looking for errors.
- Each mistake is a possible new support vector.
- The algorithm needs to iterate over the data repeatedly.
- Convergence could take exponential time if the support vectors are dense...
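In outline, the training loop looks something like the sketch below. This is a schematic of the constraint-generation idea, not the paper's algorithm verbatim: `solve_qp` (the quadratic-program solve over the current working set, returning initial weights when the set is empty) and `viterbi_decode` are hypothetical placeholders.

```python
def train_by_constraint_generation(examples, viterbi_decode, solve_qp,
                                   max_passes=20):
    """Working-set training sketch: decode with the current weights,
    turn each error into a new margin constraint (a candidate support
    vector), re-solve, and repeat until no new violations appear."""
    working_set = []   # triples (x, gold, wrong): want margin(gold, wrong) >= 1
    w = None
    for _ in range(max_passes):
        w = solve_qp(working_set)           # optimize over current constraints
        new_violations = 0
        for x, gold in examples:
            pred = viterbi_decode(w, x)     # best sequence under current w
            if pred != gold:                # each mistake: possible new SV
                working_set.append((x, gold, pred))
                new_violations += 1
        if new_violations == 0:             # no errors left: converged
            break
    return w
```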

Altun et al. results
- NER on 300 sentences from the CoNLL-2002 shared task (Spanish): four entity types, nine labels (beginning-T, intermediate-T, other)
- POS tagging on 300 sentences from the Penn TreeBank: 5-fold cross-validation, a window of size 3, simple features

Altun et al. results

Altun et al. results