Structured learning: overview
Sunita Sarawagi, IIT Bombay
Constituents of a structured model
- Feature vector f(x,y)
  - Features: real-valued, typically binary
  - User-defined
  - Number of features typically very large
- Parameter vector w: weight of each feature
- Score of a prediction y for input x: s(x,y) = w · f(x,y) (see the sketch below)
- Many interpretations: log unnormalized probability, negative energy
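As a concrete illustration (not from the slides), here is a minimal Python sketch of the scoring function. The feature map feature_indices and the dimension DIM are hypothetical; the sketch only shows that s(x,y) = w · f(x,y) reduces, for binary features, to summing the weights of the active features.

```python
import numpy as np

DIM = 2 ** 16          # assumed size of the feature space
w = np.zeros(DIM)      # parameter vector: one weight per feature

def feature_indices(x, y):
    # Hypothetical sparse feature map: returns indices of active binary features.
    # Real feature sets are user-defined and typically very large.
    feats = []
    for i, (token, label) in enumerate(zip(x, y)):
        feats.append(hash(("tok-label", token, label)) % DIM)
        if i > 0:
            feats.append(hash(("label-label", y[i - 1], label)) % DIM)
    return feats

def score(x, y):
    # s(x, y) = w . f(x, y); with binary features this is a sum of weights
    return sum(w[j] for j in feature_indices(x, y))
```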
Prediction problem
- Predict: y* = argmax_y s(x,y), popularly known as MAP estimation
- Challenge: the space of possible y is exponentially large (see the brute-force sketch below)
- Exploit decomposability of the feature function over parts of y: f(x,y) = Σ_c f(x, y_c, c)
- The form of the features and the MAP inference algorithm are structure-specific. Examples follow.
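Before the structure-specific algorithms, a sketch of what decomposability saves us from: brute-force MAP enumerates all m^n labelings (score is assumed to be a function like the one sketched above).

```python
from itertools import product

def map_brute_force(x, labels, score):
    # Enumerates all |labels| ** len(x) candidate outputs: exponential in n.
    # Decomposable features replace this with dynamic programming (e.g. Viterbi).
    return max(product(labels, repeat=len(x)), key=lambda y: score(x, y))
```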
Sequence labeling
Input x: “My review of Fermat’s last theorem by S. Singh”
[Figure: tokens x_1..x_9 = My / review / of / Fermat’s / last / theorem / by / S. / Singh, with labels y_1..y_9 = Other / Other / Other / Title / Title / Title / Other / Author / Author]
Features decompose over adjacent labels.
Sequence labeling
Examples of features:
- [x_8 = “S.” and y_8 = “Author”]
- [y_8 = “Author” and y_9 = “Author”]
MAP: Viterbi finds the best y in O(nm^2) (sketched below).
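A minimal Viterbi sketch, assuming hypothetical node_score and edge_score functions derived from the weighted features above; it runs in O(nm^2) for n positions and m labels.

```python
def viterbi(n, labels, node_score, edge_score):
    """MAP sequence labeling in O(n * m^2).

    node_score(i, y): score of label y at position i.
    edge_score(y_prev, y): score of the adjacent label pair (assumed interface).
    """
    # best[y] = score of the best labeling of positions 0..i ending in label y
    best = {y: node_score(0, y) for y in labels}
    back = []  # back[i][y] = best predecessor label of y at position i
    for i in range(1, n):
        ptr, new = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[yp] + edge_score(yp, y))
            ptr[y] = prev
            new[y] = best[prev] + edge_score(prev, y) + node_score(i, y)
        back.append(ptr)
        best = new
    # Recover the argmax sequence by following the back-pointers.
    y = max(best, key=best.get)
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```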
Markov models (CRFs)
- Application: image segmentation and many others
- y is a vector y_1, y_2, ..., y_n of discrete labels
- Features decompose over cliques of a triangulated graph
- MAP inference algorithms for graphical models are extensively researched: junction trees for exact inference, many approximate algorithms; Viterbi is a special case
- The framework of structured models subsumes graphical models
Segmentation of a sequence
- Application: speech recognition, information extraction
- Output y is a sequence of segments s_1, ..., s_p
- Feature f(x,y) decomposes over each segment and the label of the previous segment
- MAP: easy extension of Viterbi, O(m^2 n^2) for m labels and sequence length n (sketched after the figure)
[Figure: “My review of Fermat’s last theorem by S. Singh” segmented into Other / Title / Other / Author spans]
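A sketch of the segment-level extension of Viterbi, assuming a hypothetical segment_score(start, end, y, y_prev) that scores the segment x[start:end] with label y given the previous segment’s label; trying every earlier boundary for every position gives the O(m^2 n^2) bound.

```python
def semi_markov_viterbi(n, labels, segment_score):
    """MAP segmentation in O(n^2 * m^2): n^2 boundary pairs, m^2 label pairs."""
    NEG = float("-inf")
    # best[i][y] = score of the best segmentation of x[:i] whose last label is y
    best = [{y: NEG for y in labels} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    start = {None: 0.0}  # virtual "previous label" before the first segment
    for i in range(1, n + 1):
        for y in labels:
            for j in range(i):  # the last segment is x[j:i]
                prevs = start if j == 0 else best[j]
                for yp, s in prevs.items():
                    cand = s + segment_score(j, i, y, yp)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, yp)
    # Trace back the best sequence of (start, end, label) segments.
    y = max(best[n], key=best[n].get)
    i, segs = n, []
    while i > 0:
        j, yp = back[i][y]
        segs.append((j, i, y))
        i, y = j, yp
    return list(reversed(segs))
```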
Parse tree of a sentence
- Input x: “John hit the ball”
- Output y: parse tree
- Features decompose over nodes of the tree
- MAP: CKY-style dynamic programming (the Viterbi variant of the inside algorithm), O(n^3) (sketched below)
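A minimal weighted-CKY sketch, assuming a hypothetical binarized grammar given as scored rules; it fills the O(n^2) chart of spans and tries O(n) split points per span, giving the cubic bound (grammar size treated as a constant).

```python
def cky(words, unary, binary):
    """Best parse score in O(n^3).

    unary:  dict {(label, word): score}            -- assumed rule format
    binary: dict {(parent, left, right): score}
    """
    n = len(words)
    chart = {}  # (i, j, label) -> best score of label over the span words[i:j]
    for i, w in enumerate(words):
        for (lab, word), s in unary.items():
            if word == w and s > chart.get((i, i + 1, lab), float("-inf")):
                chart[i, i + 1, lab] = s
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (p, l, r), s in binary.items():
                    if (i, k, l) in chart and (k, j, r) in chart:
                        cand = chart[i, k, l] + chart[k, j, r] + s
                        if cand > chart.get((i, j, p), float("-inf")):
                            chart[i, j, p] = cand
    return chart.get((0, n, "S"), float("-inf"))  # "S" = assumed start symbol
```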
Sentence alignment
- Input: sentence pair
- Output: alignment
- Features decompose over each aligned edge
- MAP: maximum-weight matching (see the sketch below)
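A sketch of MAP inference as maximum-weight bipartite matching, using scipy's linear_sum_assignment on an edge-score matrix; in this framework each entry would be w · f on a candidate aligned pair, but the matrix shown here is made-up toy data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j] = score of aligning source word i to target word j (toy values)
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.4],
                   [0.1, 0.3, 1.8]])

rows, cols = linear_sum_assignment(scores, maximize=True)
alignment = list(zip(rows.tolist(), cols.tolist()))  # [(0, 0), (1, 1), (2, 2)]
```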
Training
- Given: several input-output pairs (x^1, y^1), (x^2, y^2), ..., (x^N, y^N)
- Error of an output: E_i(y). Example: Hamming error; also decomposable (sketched below).
- Train the parameter vector w to minimize training error
- Two problems: the objective is discontinuous, and it might over-fit the training data
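For concreteness, a minimal sketch of the Hamming error, which counts per-position label disagreements and therefore decomposes over positions just like the features.

```python
def hamming_error(y_true, y_pred):
    # Decomposes over positions: E(y) = sum_i [y_i != y*_i]
    return sum(yt != yp for yt, yp in zip(y_true, y_pred))
```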