1
Max-margin sequential learning methods
William W. Cohen CALD
2
Announcements
Upcoming assignments:
Wed 3/3: project proposal due (see personnel page)
Spring break next week, no class
Will get feedback on project proposals by end of break
No write-ups for the “Distance Metrics for Text” week are due the Monday after spring break; they are due Wed 3/17
3
Collins’ paper
Notation:
label (y) is a “tag” t
observation (x) is a word w
history h is a 4-tuple <t_{i-1}, t_{i-2}, w_{[1:n]}, i>: the two preceding tags, the word sequence, and the current position
phi_s(h, t) is a feature of the pair (h, t)
4
Collins’ paper
Notation, cont'd:
Phi_s is the sum of phi_s over all positions i: Phi_s(w_{[1:n]}, t_{[1:n]}) = sum_i phi_s(h_i, t_i)
alpha_s is the weight given to Phi_s
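To make the notation concrete, here is a minimal sketch in Python, assuming invented indicator features (the real feature set is Ratnaparkhi-style and is not reproduced here): phi(h, t) gives the local features of a history/tag pair, Phi sums them over all positions, and the score of a tag sequence is the alpha-weighted sum of the global features.

```python
# Illustrative sketch of the notation above; the feature templates are invented, not the real set.

def phi(history, tag):
    """Local features phi_s(h, t) for history h = (t_prev, t_prev2, words, i) and candidate tag t.
    Returns a dict mapping feature identity (the index s) to its value (binary indicators here)."""
    t_prev, t_prev2, words, i = history
    return {
        ("word/tag", words[i], tag): 1.0,        # current word together with the candidate tag
        ("bigram", t_prev, tag): 1.0,            # previous tag together with the candidate tag
        ("trigram", t_prev2, t_prev, tag): 1.0,  # previous two tags together with the candidate tag
    }

def Phi(words, tags):
    """Global features: Phi_s(w_{[1:n]}, t_{[1:n]}) = sum over positions i of phi_s(h_i, t_i)."""
    total = {}
    for i, tag in enumerate(tags):
        t_prev = tags[i - 1] if i >= 1 else "*"    # "*" marks the start of the sequence
        t_prev2 = tags[i - 2] if i >= 2 else "*"
        for s, v in phi((t_prev, t_prev2, words, i), tag).items():
            total[s] = total.get(s, 0.0) + v
    return total

def score(alpha, words, tags):
    """Score of a tag sequence: sum_s alpha_s * Phi_s(w_{[1:n]}, t_{[1:n]})."""
    return sum(alpha.get(s, 0.0) * v for s, v in Phi(words, tags).items())
```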
5
Collins’ paper
6
The theory
Claim 1: the algorithm is an instance of this perceptron variant:
Claim 2: the arguments in the mistake-bound classification results of F&S99 extend immediately to this ranking task as well.
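As a concrete anchor for Claim 1, here is a minimal sketch of the training loop the claim refers to, assuming a viterbi_decode(words, alpha) helper that returns the highest-scoring tag sequence under the current weights; the helper and the Phi feature map are illustrative stand-ins, not the slide's actual code.

```python
def train_perceptron(data, Phi, viterbi_decode, epochs=10):
    """Sketch of the mistake-driven training loop.
    data: list of (words, gold_tags) pairs.
    viterbi_decode(words, alpha): assumed helper returning the best tag sequence under weights alpha.
    On each mistake the weights move toward the gold features and away from the predicted ones."""
    alpha = {}
    for _ in range(epochs):
        for words, gold_tags in data:
            pred_tags = viterbi_decode(words, alpha)
            if pred_tags != gold_tags:
                for s, v in Phi(words, gold_tags).items():
                    alpha[s] = alpha.get(s, 0.0) + v
                for s, v in Phi(words, pred_tags).items():
                    alpha[s] = alpha.get(s, 0.0) - v
    return alpha
```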
8
F&S99 algorithm
9
F&S99 result
10
Collins’ result
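For reference, the separable-case mistake bound (as stated in Collins' published paper; the slide's own statement is an image and not reproduced here) has this form:

```latex
% Separable case: suppose some U with \|U\| = 1 satisfies
%   U \cdot (\Phi(x_i, y_i) - \Phi(x_i, z)) \ge \delta  for all i and all z \ne y_i,
% and R \ge \|\Phi(x_i, y_i) - \Phi(x_i, z)\| for all i, z.  Then the number of
% mistakes made by the perceptron algorithm is at most
\[
  \frac{R^2}{\delta^2}.
\]
```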
11
Results
Two experiments:
POS tagging, using Adwait Ratnaparkhi's features
NP chunking (Start, Continue, Outside tags)
NER on a special AT&T dataset (described in another paper)
12
Features for NP chunking
13
Results
14
More ideas
The dual version of a perceptron:
w is built up by repeatedly adding examples => w is a weighted sum of the examples x_1, ..., x_n
so the inner product <w, x> can be rewritten as a weighted sum of inner products with the examples: <w, x> = sum_i alpha_i <x_i, x>
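A minimal sketch of this dual view for the ordinary binary perceptron, assuming labeled examples with labels in {+1, -1} and any kernel function in place of the raw inner product (all names here are illustrative): w is never stored explicitly, only the mistake counts and kernel evaluations are needed.

```python
def dual_perceptron(examples, labels, kernel, epochs=10):
    """Dual perceptron (sketch): store a count c[i] of how often example i was added,
    so that w = sum_i c[i] * labels[i] * examples[i] and
    <w, x> = sum_i c[i] * labels[i] * kernel(examples[i], x)."""
    c = [0] * len(examples)
    for _ in range(epochs):
        for j, (x, y) in enumerate(zip(examples, labels)):
            score = sum(c[i] * labels[i] * kernel(xi, x)
                        for i, xi in enumerate(examples) if c[i] != 0)
            if y * score <= 0:   # mistake: add this example to w
                c[j] += 1
    return c

# Example kernel: the ordinary dot product (any kernel can replace it).
dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
```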
15
Dual version of perceptron ranking
alpha_{i,j}: i ranges over training examples and j over the correct/incorrect candidate tag sequences for example i
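Spelled out, one standard way to write this dual for the ranking setting (a sketch consistent with the notation above; the slide's exact formula is an image and not reproduced):

```latex
\[
  w = \sum_{i,j} \alpha_{i,j}\,\bigl(\Phi(x_i, y_i) - \Phi(x_i, y_{i,j})\bigr),
  \qquad
  w \cdot \Phi(x, y) = \sum_{i,j} \alpha_{i,j}\,
     \bigl(\Phi(x_i, y_i) - \Phi(x_i, y_{i,j})\bigr) \cdot \Phi(x, y)
\]
% i ranges over training examples, j over the incorrect tag sequences y_{i,j}
% proposed for example i, so scores involve only inner products of feature vectors.
```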
16
NER features for re-ranking MaxEnt tagger output
17
NER features
18
NER results
19
Altun et al. paper
Starting point: the dual version of Collins’ perceptron algorithm
the final hypothesis is a weighted sum of inner products with a subset of the examples
this is a lot like an SVM, except that the perceptron algorithm is used to set the weights rather than quadratic optimization
20
SVM optimization
Notation: y_i is the correct tag for x_i, y is an incorrect tag, and F(x_i, y_i) are the features
Optimization problem: find weights w that maximize the minimal margin subject to ||w|| = 1, or equivalently minimize ||w||^2 such that every margin is >= 1
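In symbols, the two equivalent formulations described above can be sketched as follows (standard hard-margin form; the paper's own numbered equations are not reproduced here):

```latex
% Maximize the minimal margin under a norm constraint:
\[
  \max_{\|w\| = 1}\ \min_{i,\; y \ne y_i}\ w \cdot \bigl(F(x_i, y_i) - F(x_i, y)\bigr)
\]
% or, equivalently, fix the margin at 1 and minimize the norm:
\[
  \min_{w}\ \|w\|^2
  \quad \text{s.t.}\quad
  w \cdot \bigl(F(x_i, y_i) - F(x_i, y)\bigr) \ge 1
  \qquad \forall i,\ \forall y \ne y_i.
\]
```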
21
SVMs for ranking
22
SVMs for ranking Proposition: (14) and (15) are equivalent:
23
SVMs for ranking
A binary classification problem, with (x_i, y_i) as the positive example and (x_i, y') as the negative examples, except that theta_i varies for each example. Why? Because we’re ranking.
24
SVMs for ranking
Altun et al. give the remaining details
As in perceptron learning, “negative” data is found by running Viterbi with the learned weights and looking for errors
Each mistake is a possible new support vector
Need to iterate over the data repeatedly
Could take exponential time before convergence if the support vectors are dense...
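A rough sketch of that loop in Python (viterbi_decode, fit_svm, and the Phi feature map are assumed helper functions, not the paper's actual code): decode with the current weights, turn each error into a new constraint / candidate support vector, re-fit, and repeat until decoding produces no new errors.

```python
def train_max_margin(data, Phi, viterbi_decode, fit_svm, max_iters=50):
    """Sketch of the working-set loop described above.
    Each decoding error becomes a constraint ("the gold sequence must outscore this wrong one"),
    i.e. a candidate support vector; the SVM is re-fit on the growing constraint set, and the loop
    stops once decoding with the current weights produces no new errors."""
    constraints = []   # pairs (gold feature map, predicted feature map)
    w = {}             # current weight vector
    for _ in range(max_iters):
        new_errors = 0
        for words, gold_tags in data:
            pred_tags = viterbi_decode(words, w)       # best sequence under current weights
            if pred_tags != gold_tags:
                constraints.append((Phi(words, gold_tags), Phi(words, pred_tags)))
                new_errors += 1
        if new_errors == 0:
            break                                      # no violated constraints remain
        w = fit_svm(constraints)                       # quadratic optimization over the working set
    return w
```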
25
Altun et al. results
NER on 300 sentences from the CoNLL 2002 shared task (Spanish)
Four entity types, nine labels (beginning-T, intermediate-T, other)
POS tagging on 300 sentences from the Penn Treebank
5-fold cross-validation, window of size 3, simple features
26
Altun et al. results
27
Altun et al. results