Structured learning: overview
Sunita Sarawagi, IIT Bombay
Constituents of a structured model
- Feature vector f(x,y)
  - Features: real-valued, typically binary
  - User-defined
  - Number of features typically very large
- Parameter vector w: weight of each feature
- Score of a prediction y for input x: s(x,y) = w · f(x,y) (see the sketch below)
- Many interpretations: log unnormalized probability, negative energy
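As a concrete illustration (not from the slides), here is a minimal Python sketch of the scoring function. The feature map feature_indices and the dimension DIM are hypothetical; the sketch only shows that s(x,y) = w · f(x,y) reduces, for binary features, to summing the weights of the active features.

```python
import numpy as np

DIM = 2 ** 16          # assumed size of the feature space
w = np.zeros(DIM)      # parameter vector: one weight per feature

def feature_indices(x, y):
    # Hypothetical sparse feature map: returns indices of active binary features.
    # Real feature sets are user-defined and typically very large.
    feats = []
    for i, (token, label) in enumerate(zip(x, y)):
        feats.append(hash(("tok-label", token, label)) % DIM)
        if i > 0:
            feats.append(hash(("label-label", y[i - 1], label)) % DIM)
    return feats

def score(x, y):
    # s(x, y) = w . f(x, y); with binary features this is a sum of weights
    return sum(w[j] for j in feature_indices(x, y))
```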
Prediction problem
- Predict: y* = argmax_y s(x,y), popularly known as MAP estimation
- Challenge: the space of possible y is exponentially large (see the brute-force sketch below)
- Exploit decomposability of the feature function over parts of y: f(x,y) = Σ_c f(x, y_c, c)
- The form of the features and the MAP inference algorithm are structure-specific. Examples follow.
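Before the structure-specific algorithms, a sketch of what decomposability saves us from: brute-force MAP enumerates all m^n labelings (score is assumed to be a function like the one sketched above).

```python
from itertools import product

def map_brute_force(x, labels, score):
    # Enumerates all |labels| ** len(x) candidate outputs: exponential in n.
    # Decomposable features replace this with dynamic programming (e.g. Viterbi).
    return max(product(labels, repeat=len(x)), key=lambda y: score(x, y))
```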
Sequence labeling
Input x: “My review of Fermat’s last theorem by S. Singh”
[Figure: tokens x_1..x_9 = My / review / of / Fermat’s / last / theorem / by / S. / Singh, with labels y_1..y_9 = Other / Other / Other / Title / Title / Title / Other / Author / Author]
Features decompose over adjacent labels.
Sequence labeling
Examples of features:
- [x_8 = “S.” and y_8 = “Author”]
- [y_8 = “Author” and y_9 = “Author”]
MAP: Viterbi finds the best y in O(nm^2) (sketched below).
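A minimal Viterbi sketch, assuming hypothetical node_score and edge_score functions derived from the weighted features above; it runs in O(nm^2) for n positions and m labels.

```python
def viterbi(n, labels, node_score, edge_score):
    """MAP sequence labeling in O(n * m^2).

    node_score(i, y): score of label y at position i.
    edge_score(y_prev, y): score of the adjacent label pair (assumed interface).
    """
    # best[y] = score of the best labeling of positions 0..i ending in label y
    best = {y: node_score(0, y) for y in labels}
    back = []  # back[i][y] = best predecessor label of y at position i
    for i in range(1, n):
        ptr, new = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[yp] + edge_score(yp, y))
            ptr[y] = prev
            new[y] = best[prev] + edge_score(prev, y) + node_score(i, y)
        back.append(ptr)
        best = new
    # Recover the argmax sequence by following the back-pointers.
    y = max(best, key=best.get)
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```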
Markov models (CRFs)
- Application: image segmentation and many others
- y is a vector y_1, y_2, ..., y_n of discrete labels
- Features decompose over cliques of a triangulated graph
- MAP inference algorithms for graphical models are extensively researched: junction trees for exact inference, many approximate algorithms; Viterbi is a special case
- The framework of structured models subsumes graphical models
Segmentation of a sequence
- Application: speech recognition, information extraction
- Output y is a sequence of segments s_1, ..., s_p
- Feature f(x,y) decomposes over each segment and the label of the previous segment
- MAP: easy extension of Viterbi, O(m^2 n^2) for m labels and sequence length n (sketched after the figure)
[Figure: “My review of Fermat’s last theorem by S. Singh” segmented into Other / Title / Other / Author spans]
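A sketch of the segment-level extension of Viterbi, assuming a hypothetical segment_score(start, end, y, y_prev) that scores the segment x[start:end] with label y given the previous segment’s label; trying every earlier boundary for every position gives the O(m^2 n^2) bound.

```python
def semi_markov_viterbi(n, labels, segment_score):
    """MAP segmentation in O(n^2 * m^2): n^2 boundary pairs, m^2 label pairs."""
    NEG = float("-inf")
    # best[i][y] = score of the best segmentation of x[:i] whose last label is y
    best = [{y: NEG for y in labels} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    start = {None: 0.0}  # virtual "previous label" before the first segment
    for i in range(1, n + 1):
        for y in labels:
            for j in range(i):  # the last segment is x[j:i]
                prevs = start if j == 0 else best[j]
                for yp, s in prevs.items():
                    cand = s + segment_score(j, i, y, yp)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, yp)
    # Trace back the best sequence of (start, end, label) segments.
    y = max(best[n], key=best[n].get)
    i, segs = n, []
    while i > 0:
        j, yp = back[i][y]
        segs.append((j, i, y))
        i, y = j, yp
    return list(reversed(segs))
```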
Parse tree of a sentence
- Input x: “John hit the ball”
- Output y: parse tree
- Features decompose over nodes of the tree
- MAP: CKY-style dynamic programming (the Viterbi variant of the inside algorithm), O(n^3) (sketched below)
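A minimal weighted-CKY sketch, assuming a hypothetical binarized grammar given as scored rules; it fills the O(n^2) chart of spans and tries O(n) split points per span, giving the cubic bound (grammar size treated as a constant).

```python
def cky(words, unary, binary):
    """Best parse score in O(n^3).

    unary:  dict {(label, word): score}            -- assumed rule format
    binary: dict {(parent, left, right): score}
    """
    n = len(words)
    chart = {}  # (i, j, label) -> best score of label over the span words[i:j]
    for i, w in enumerate(words):
        for (lab, word), s in unary.items():
            if word == w and s > chart.get((i, i + 1, lab), float("-inf")):
                chart[i, i + 1, lab] = s
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (p, l, r), s in binary.items():
                    if (i, k, l) in chart and (k, j, r) in chart:
                        cand = chart[i, k, l] + chart[k, j, r] + s
                        if cand > chart.get((i, j, p), float("-inf")):
                            chart[i, j, p] = cand
    return chart.get((0, n, "S"), float("-inf"))  # "S" = assumed start symbol
```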
Sentence alignment
- Input: sentence pair
- Output: alignment
- Features decompose over each aligned edge
- MAP: maximum-weight matching (see the sketch below)
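A sketch of MAP inference as maximum-weight bipartite matching, using scipy's linear_sum_assignment on an edge-score matrix; in this framework each entry would be w · f on a candidate aligned pair, but the matrix shown here is made-up toy data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j] = score of aligning source word i to target word j (toy values)
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.4],
                   [0.1, 0.3, 1.8]])

rows, cols = linear_sum_assignment(scores, maximize=True)
alignment = list(zip(rows.tolist(), cols.tolist()))  # [(0, 0), (1, 1), (2, 2)]
```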
Training
- Given: several input-output pairs (x^1, y^1), (x^2, y^2), ..., (x^N, y^N)
- Error of an output: E_i(y). Example: Hamming error; also decomposable (sketched below).
- Train the parameter vector w to minimize training error
- Two problems: the objective is discontinuous, and it might over-fit the training data
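For concreteness, a minimal sketch of the Hamming error, which counts per-position label disagreements and therefore decomposes over positions just like the features.

```python
def hamming_error(y_true, y_pred):
    # Decomposes over positions: E(y) = sum_i [y_i != y*_i]
    return sum(yt != yp for yt, yp in zip(y_true, y_pred))
```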