Structured learning: overview. Sunita Sarawagi, IIT Bombay.

1 Structured learning: overview. Sunita Sarawagi, IIT Bombay.

2 Constituents of a structured model. Feature vector $f(x,y)$: features are real-valued (typically binary) and user-defined, and their number is typically very large. Parameter vector $w$: the weight of each feature. Score of a prediction $y$ for input $x$: $s(x,y) = w \cdot f(x,y)$. The score has many interpretations, such as log unnormalized probability or negative energy.
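As a rough illustration of the score $s(x,y) = w \cdot f(x,y)$, the sketch below stores both vectors sparsely. The feature templates (`"word"`, `"trans"`), the function names, and the weights are hypothetical choices made for this example and do not come from the slides.

```python
from collections import Counter

def features(x, y):
    """Hypothetical sparse feature vector f(x, y) for tokens x and labels y."""
    f = Counter()
    for t, (word, label) in enumerate(zip(x, y)):
        f[("word", word, label)] += 1           # token/label indicator
        if t > 0:
            f[("trans", y[t - 1], label)] += 1  # adjacent-label indicator
    return f

def score(w, x, y):
    """s(x, y) = w . f(x, y); features missing from w have weight 0."""
    return sum(w.get(name, 0.0) * value for name, value in features(x, y).items())

# Illustrative weights only.
w = {("word", "S.", "Author"): 2.0, ("trans", "Author", "Author"): 1.0}
x = ["by", "S.", "Singh"]
print(score(w, x, ["Other", "Author", "Author"]))
```

Keeping `w` and `f` as dictionaries is one way to cope with a very large feature space in which only a few features fire for any given pair.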

3 Prediction problem. Predict $y^* = \arg\max_y s(x,y)$, popularly known as MAP estimation. Challenge: the space of possible $y$ is exponentially large. Remedy: exploit decomposability of the feature function over parts of $y$: $f(x,y) = \sum_c f(x, y_c, c)$. The form of the features and the MAP inference algorithm are structure specific; examples follow.
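To make the size of the search space concrete, here is a minimal brute-force sketch of the argmax for a label sequence. It enumerates all $m^n$ labelings and sums a per-part score, so it is only usable on tiny inputs. The function `toy_part_score` and its weights are assumptions made for illustration.

```python
from itertools import product

def toy_part_score(x, t, prev, cur):
    """Hypothetical score of the part at position t (prev is None at t = 0)."""
    s = 0.0
    if x[t] == "by" and cur == "Other":
        s += 1.5
    if x[t] == "S." and cur == "Author":
        s += 2.0
    if prev == "Author" and cur == "Author":
        s += 1.0
    return s

def brute_force_map(x, labels, part_score):
    """Return (best labeling, best score) by enumerating all m**n labelings."""
    best_y, best_s = None, float("-inf")
    for y in product(labels, repeat=len(x)):
        # The total score is a sum over parts, mirroring f(x,y) = sum_c f(x, y_c, c).
        s = sum(part_score(x, t, y[t - 1] if t > 0 else None, y[t])
                for t in range(len(x)))
        if s > best_s:
            best_y, best_s = y, s
    return list(best_y), best_s

print(brute_force_map(["by", "S.", "Singh"], ["Other", "Author"], toy_part_score))
```

With 9 tokens and 3 labels this loop would already visit 19,683 labelings, which is why structure-specific inference is needed.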

4 Sequence labeling. Example input: "My review of Fermat’s last theorem by S. Singh".

5 Sequence labeling. Input: "My review of Fermat’s last theorem by S. Singh". Each token $x_t$ receives a label $y_t$:

t   1      2       3      4         5      6        7      8       9
x   My     review  of     Fermat’s  last   theorem  by     S.      Singh
y   Other  Other   Other  Title     Title  Title    Other  Author  Author

Features decompose over adjacent labels.

6 Sequence labeling. Examples of features: [$x_8$ = “S.” and $y_8$ = “Author”], [$y_8$ = “Author” and $y_9$ = “Author”]. MAP: Viterbi can find the best $y$ in $O(nm^2)$ time, where $n$ is the sequence length and $m$ is the number of labels.
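The Viterbi recursion mentioned on this slide can be sketched as follows, assuming the score decomposes into per-position terms that see only the previous and current label. The scoring function `toy_part_score` and its weights are hypothetical, and the sketch assumes a non-empty input.

```python
def toy_part_score(x, t, prev, cur):
    """Hypothetical score of position t given the adjacent labels (prev is None at t = 0)."""
    s = 0.0
    if x[t] == "by" and cur == "Other":
        s += 1.5
    if x[t] == "S." and cur == "Author":
        s += 2.0
    if prev == "Author" and cur == "Author":
        s += 1.0
    return s

def viterbi(x, labels, part_score):
    """Best labeling of a non-empty sequence x in O(n * m^2) calls to part_score."""
    n = len(x)
    # best[t][y]: best score of any labeling of x[0..t] that ends in label y.
    best = [{y: part_score(x, 0, None, y) for y in labels}]
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for cur in labels:
            prev = max(labels,
                       key=lambda p: best[t - 1][p] + part_score(x, t, p, cur))
            best[t][cur] = best[t - 1][prev] + part_score(x, t, prev, cur)
            back[t][cur] = prev
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda y: best[n - 1][y])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[n - 1][last]

print(viterbi(["by", "S.", "Singh"], ["Other", "Author"], toy_part_score))
```

The two nested loops over labels inside the loop over positions give the $O(nm^2)$ cost, in contrast to the $m^n$ labelings a naive search would score.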

7 Markov models (CRFs). Application: image segmentation and many others. $y$ is a vector $y_1, y_2, \ldots, y_n$ of discrete labels. Features decompose over cliques of a triangulated graph. MAP inference algorithms for graphical models have been extensively researched: junction trees for exact inference, and many approximate algorithms. Special case: Viterbi. The framework of structured models subsumes graphical models.

8 Segmentation of a sequence. Applications: speech recognition, information extraction. Output $y$ is a sequence of segments $s_1, \ldots, s_p$. The feature vector $f(x,y)$ decomposes over each segment and the label of the previous segment. MAP: an easy extension of Viterbi, $O(m^2 n^2)$, where $m$ is the number of labels and $n$ is the length of the sequence. Example: [My review of] Other, [Fermat’s last theorem] Title, [by] Other, [S. Singh] Author.
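One way to read the "easy extension of Viterbi" is the segment-level recursion sketched below, where each candidate last segment is scored together with the label of the segment before it. The function `toy_seg_score` and its weights are hypothetical, and the $O(m^2 n^2)$ bound assumes each segment score is computed in constant time.

```python
def toy_seg_score(x, start, end, prev, cur):
    """Hypothetical score of segment x[start:end] with label cur after a segment labeled prev."""
    words = x[start:end]
    s = 0.0
    if cur == "Other" and all(w.islower() for w in words):
        s += 1.0
    if cur == "Author" and len(words) == 2 and all(w[0].isupper() for w in words):
        s += 3.0
    if prev == "Other" and cur == "Author":
        s += 0.5
    return s

def segment_viterbi(x, labels, seg_score):
    """Best segmentation of x as a list of (start, end, label), plus its score."""
    n = len(x)
    # best[end][y]: best score of a segmentation of x[:end] whose last segment has label y.
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for cur in labels:
            # A single segment covering x[0:end] has no previous label.
            cands = [(seg_score(x, 0, end, None, cur), 0, None)]
            for start in range(1, end):
                for prev in labels:
                    s = best[start][prev] + seg_score(x, start, end, prev, cur)
                    cands.append((s, start, prev))
            top = max(cands, key=lambda c: c[0])
            best[end][cur] = top[0]
            back[end][cur] = (top[1], top[2])
    cur = max(labels, key=lambda y: best[n][y])
    total = best[n][cur]
    segments, end = [], n
    while end > 0:
        start, prev = back[end][cur]
        segments.append((start, end, cur))
        end, cur = start, prev
    segments.reverse()
    return segments, total

print(segment_viterbi(["by", "S.", "Singh"], ["Other", "Author"], toy_seg_score))
```

Because features see a whole segment at once, a property such as "a two-word capitalized span" can be scored directly, which token-level features cannot express.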

9 Parse tree of a sentence. Input $x$: “John hit the ball”. Output $y$: parse tree. Features decompose over nodes of the tree. MAP: inside/outside algorithm, $O(n^3)$.

10 Sentence alignment. Input: sentence pair. Output: alignment. Features decompose over each aligned edge. MAP: maximum weight matching.
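For intuition, the sketch below finds a maximum weight matching by brute force, under the simplifying assumptions that both sentences have the same number of words and that the alignment is one-to-one. The weight matrix is made up for illustration. Enumerating permutations costs $n!$, so a practical system would use a polynomial-time assignment algorithm instead.

```python
from itertools import permutations

def max_weight_alignment(weights):
    """weights[i][j]: score of aligning source word i with target word j (square matrix).

    Returns (list of aligned (i, j) edges, total weight) by trying every permutation.
    """
    n = len(weights)
    best = max(permutations(range(n)),
               key=lambda p: sum(weights[i][p[i]] for i in range(n)))
    total = sum(weights[i][best[i]] for i in range(n))
    return list(enumerate(best)), total

# Illustrative edge scores only.
weights = [[1, 9, 2],
           [8, 2, 3],
           [1, 4, 7]]
print(max_weight_alignment(weights))
```

The total is a sum over aligned edges, which is the decomposition the slide refers to.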

11 Training. Given: several input-output pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. Error of an output: $E_i(y)$. Example: Hamming error, which is also decomposable. Train the parameter vector $w$ to minimize training error. Two problems: the objective is discontinuous, and the model might over-fit the training data.
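A minimal sketch of the Hamming error for label sequences, and of the training error it induces, is given below. The function names and the `predict` callback are hypothetical; `predict` stands in for whatever MAP inference routine the model uses.

```python
def hamming_error(y_true, y_pred):
    """Number of positions where the predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    # The error is a sum of per-position terms, so it decomposes like the features.
    return sum(a != b for a, b in zip(y_true, y_pred))

def training_error(data, predict):
    """Total Hamming error of predict(x) over (x, y) training pairs."""
    return sum(hamming_error(y, predict(x)) for x, y in data)

data = [(["by", "S.", "Singh"], ["Other", "Author", "Author"])]
always_other = lambda x: ["Other"] * len(x)  # trivial baseline predictor
print(training_error(data, always_other))
```

Since the prediction is an argmax, a small change in $w$ either leaves every prediction unchanged or flips some labels outright, so this error is piecewise constant in $w$.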
