1
Structured learning: overview
Sunita Sarawagi
IIT Bombay
http://www.cse.iitb.ac.in/~sunita
2
Constituents of a structured model
- Feature vector f(x, y)
  - Features are real-valued, typically binary
  - User-defined; the number of features is typically very large
- Parameter vector w: the weight of each feature
- Score of a prediction y for input x: s(x, y) = w · f(x, y)
  - Many interpretations: log unnormalized probability, negative energy
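For concreteness: with binary features, the score is just the sum of the weights of the features that fire on (x, y). A minimal sketch in Python, with made-up feature names:

```python
# Minimal sketch of s(x, y) = w . f(x, y) for binary features.
# Feature names below are hypothetical, not from the slides.

def score(w, firing_features):
    """w: dict of feature name -> weight; firing_features: features active on (x, y)."""
    return sum(w.get(name, 0.0) for name in firing_features)

w = {'x8="S." and y8=Author': 1.3, 'y8=Author and y9=Author': 0.7}
print(score(w, {'x8="S." and y8=Author', 'y8=Author and y9=Author'}))  # 2.0
```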
3
Prediction problem
- Predict: y* = argmax_y s(x, y), popularly known as MAP estimation
- Challenge: the space of possible y is exponentially large
- Exploit decomposability of the feature function over parts c of y:
  f(x, y) = Σ_c f(x, y_c, c)
- The form of the features and the MAP inference algorithms are structure-specific. Examples follow.
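A sketch of what this decomposition looks like for a chain-structured y, assuming the parts are positions and each part sees the current token plus the adjacent label pair (the emission/transition feature names are invented):

```python
# Hedged sketch: f(x, y) = sum over parts c of f(x, y_c, c), for a label chain.
# A part at position t sees the token x[t] and the label pair (y[t-1], y[t]).

def part_features(x, y_prev, y_t, t):
    feats = {f"emit:{x[t]}->{y_t}"}          # token/label (emission) feature
    if y_prev is not None:
        feats.add(f"trans:{y_prev}->{y_t}")  # label-pair (transition) feature
    return feats

def global_features(x, y):
    # Union of part features; a real implementation would keep counts.
    all_feats = set()
    for t in range(len(x)):
        all_feats |= part_features(x, y[t - 1] if t > 0 else None, y[t], t)
    return all_feats

print(global_features(["by", "S.", "Singh"], ["Other", "Author", "Author"]))
```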
4
Sequence labeling
Example input: "My review of Fermat's last theorem by S. Singh"
5
Sequence labeling

t:  1      2       3      4         5      6        7      8       9
x:  My     review  of     Fermat's  last   theorem  by     S.      Singh
y:  Other  Other   Other  Title     Title  Title    Other  Author  Author

Features decompose over adjacent labels.
6
Sequence labeling
Examples of features:
- [x_8 = "S." and y_8 = "Author"]
- [y_8 = "Author" and y_9 = "Author"]
MAP: Viterbi can find the best y in O(nm²), where n is the sequence length and m the number of labels.
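A hedged sketch of Viterbi for this chain model, assuming a part scorer score_part(x, y_prev, y, t) = w · f(x, y_prev, y, t) is available (the name is hypothetical). The two nested loops over labels at each of the n positions give the O(nm²) cost:

```python
# Viterbi MAP inference for a chain model (sketch).
# score_part(x, y_prev, y, t) scores the part at position t; y_prev is None at t=0.

def viterbi(x, labels, score_part):
    n = len(x)
    best = [{y: score_part(x, None, y, 0) for y in labels}]  # best[t][y]
    back = [{}]                                              # backpointers
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in labels:  # m labels ...
            yp, s = max(((yp, best[t - 1][yp] + score_part(x, yp, y, t))
                         for yp in labels),                  # ... times m predecessors
                        key=lambda p: p[1])
            best[t][y], back[t][y] = s, yp
    y = max(best[n - 1], key=best[n - 1].get)  # best final label
    path = [y]
    for t in range(n - 1, 0, -1):              # follow backpointers
        y = back[t][y]
        path.append(y)
    return path[::-1]
```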
7
Markov models (CRFs)
- Application: image segmentation, and many others
- y is a vector y_1, y_2, ..., y_n of discrete labels
- Features decompose over the cliques of a triangulated graph
- MAP inference algorithms for graphical models are extensively researched:
  - junction trees for exact inference, many approximate algorithms
  - special case: Viterbi
- The framework of structured models subsumes graphical models
8
Segmentation of a sequence
- Applications: speech recognition, information extraction
- Output y is a sequence of segments s_1, ..., s_p
- Feature f(x, y) decomposes over each segment and the label of the previous segment
- MAP: an easy extension of Viterbi, O(m²n²), where m = number of labels and n = length of the sequence

Example: [My review of]_Other [Fermat's last theorem]_Title [by]_Other [S. Singh]_Author
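The segment-level extension of Viterbi might look like the following sketch, assuming a scorer seg_score(x, i, j, y_prev, y) for the segment x[i:j] (a made-up name). With unbounded segment length, the loops over end position, start position, and the two labels give the O(m²n²) cost:

```python
# Segmental (semi-Markov) Viterbi sketch: best[j][y] is the score of the best
# segmentation of x[:j] whose final segment ends at j with label y.

def segmental_viterbi(x, labels, seg_score):
    n = len(x)
    best = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0][None] = 0.0                       # empty prefix: no previous label
    for j in range(1, n + 1):                 # segment end
        for y in labels:
            s, i, yp = max((best[i][yp] + seg_score(x, i, j, yp, y), i, yp)
                           for i in range(j) for yp in best[i])
            best[j][y], back[j][y] = s, (i, yp)
    y = max(best[n], key=best[n].get)         # best label of the last segment
    segs, j = [], n
    while j > 0:                              # recover segments right to left
        i, yp = back[j][y]
        segs.append((i, j, y))
        j, y = i, yp
    return segs[::-1]                         # list of (start, end, label)
```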
9
Parse tree of a sentence
- Input x: "John hit the ball"
- Output y: a parse tree
- Features decompose over the nodes of the tree
- MAP: inside/outside algorithm, O(n³)
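The MAP variant replaces the sums of the inside algorithm with maxes, giving a CKY-style O(n³) chart parser. A sketch, assuming a weighted grammar in Chomsky normal form (the rule and lexicon encodings are invented for illustration):

```python
# CKY-style MAP parsing sketch. rules maps (A, B, C) -> score of rule A -> B C;
# lex maps (A, word) -> score of A -> word. Chart is keyed by (start, end, label).

def cky(words, rules, lex, root="S"):
    n = len(words)
    chart, back = {}, {}
    for i, w in enumerate(words):             # width-1 spans from the lexicon
        for (A, word), s in lex.items():
            if word == w:
                chart[i, i + 1, A] = max(s, chart.get((i, i + 1, A), s))
    for width in range(2, n + 1):             # wider spans: O(n^3) in total
        for i in range(n - width + 1):
            j = i + width
            for (A, B, C), s in rules.items():
                for k in range(i + 1, j):     # split point
                    if (i, k, B) in chart and (k, j, C) in chart:
                        total = s + chart[i, k, B] + chart[k, j, C]
                        if total > chart.get((i, j, A), float("-inf")):
                            chart[i, j, A] = total
                            back[i, j, A] = (k, B, C)
    return chart.get((0, n, root)), back      # best score + backpointers
```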
10
Sentence alignment
- Input: a sentence pair
- Output: an alignment
- Features decompose over each aligned edge
- MAP: maximum-weight matching

Image from: http://gate.ac.uk/sale/tao/alignment-editor.png
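Because the features decompose over edges, MAP inference here reduces to maximum-weight bipartite matching on a matrix of edge scores, which off-the-shelf solvers handle. A sketch using SciPy's assignment-problem solver (the score values are dummies):

```python
# Maximum-weight bipartite matching via SciPy; scores[i][j] = w . f(x, edge (i, j)).
import numpy as np
from scipy.optimize import linear_sum_assignment

scores = np.array([[2.0, 0.1, 0.3],    # dummy edge scores between
                   [0.2, 1.5, 0.4],    # source and target sentences
                   [0.0, 0.3, 1.1]])
rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows, cols)))           # aligns 0-0, 1-1, 2-2
```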
11
Training
- Given: several input-output pairs (x_1, y_1), (x_2, y_2), ..., (x_N, y_N)
- Error of an output: E_i(y); example: Hamming error, which is also decomposable
- Train the parameter vector w to minimize the training error
- Two problems:
  - the objective is discontinuous
  - it might over-fit the training data
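For concreteness, a minimal sketch of the Hamming error, which counts the positions where the predicted label differs from the true one and therefore decomposes over the same parts as the features:

```python
# Hamming error: number of mislabeled positions; decomposes over positions.
def hamming_error(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred))

print(hamming_error(["Other", "Title", "Author"],
                    ["Other", "Other", "Author"]))  # 1
```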