Max-Margin Markov Networks
by Ben Taskar, Carlos Guestrin, and Daphne Koller
Presented by Michael Cafarella, CSE 574, May 25, 2005


Introduction
Kernel methods (SVMs) and max-margin training are terrific for classification, but they give us no way to model structure and relations. Graphical models (Markov networks) can capture complex structure, but they are not trained for discrimination. Maximum Margin Markov (M3) Networks capture the advantages of both.

Standard classification
We want to learn a classification function that scores each candidate labeling and returns the best one. Here f(x, y) are the features (basis functions) and w are the weights. The label y is a joint assignment to multiple labels, and the set of possible assignments Y is exponential in the number of labels l. So we cannot compute the argmax naively, and we cannot even represent all the features explicitly.
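Roughly, in the paper's notation (reconstructed here, not copied from the slide), the intended decision rule is:
\[
h_{\mathbf{w}}(\mathbf{x}) \;=\; \arg\max_{\mathbf{y}\in\mathcal{Y}} \; \mathbf{w}^{\top}\mathbf{f}(\mathbf{x}, \mathbf{y}),
\qquad \mathcal{Y} = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_l .
\]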

Probabilistic classification
A graphical model defines P(Y|X); we select the label argmax_y P(y|x). We exploit sparseness in the dependencies through model design (e.g., OCR characters are independent given their neighbors). We will use a pairwise Markov network as the model, where the log of each potential function is a weighted sum of basis functions.
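A sketch of the pairwise model in the paper's notation (my reconstruction):
\[
P_{\mathbf{w}}(\mathbf{y}\mid\mathbf{x}) \;\propto\; \prod_{(i,j)\in E} \phi_{ij}(\mathbf{x}, y_i, y_j),
\qquad
\log \phi_{ij}(\mathbf{x}, y_i, y_j) \;=\; \mathbf{w}^{\top}\mathbf{f}(\mathbf{x}, y_i, y_j).
\]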

M3N
For regular Markov networks, we train w to maximize the likelihood or conditional likelihood. For M3N, we instead train w to maximize the margin. The main contribution of this paper is how to choose w accordingly.

Choosing w
As with SVMs, we choose w to maximize the margin, subject to constraints ensuring that the true label scores higher than every alternative by at least that margin. Maximizing the margin magnifies the difference between the value of the true label and that of the best runner-up.
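One standard way to write this single-label max-margin problem (reconstructed, not copied from the slide):
\[
\max_{\|\mathbf{w}\|\le 1} \;\gamma
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y}) \;\ge\; \gamma
\qquad \forall \mathbf{x},\; \mathbf{y} \ne \mathbf{t}(\mathbf{x}),
\]
where \(\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y}) = \mathbf{f}(\mathbf{x}, \mathbf{t}(\mathbf{x})) - \mathbf{f}(\mathbf{x}, \mathbf{y})\) and t(x) is the true labeling of training example x.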

Multiple labels
Structured problems have multiple labels, not a single classification, so we extend the margin to scale with the number of mistaken labels: the true labeling must beat a candidate y by a margin proportional to how many of y's labels are wrong.
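In the paper's notation this becomes, roughly:
\[
\max_{\|\mathbf{w}\|\le 1} \;\gamma
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y}) \;\ge\; \gamma\,\Delta t_{\mathbf{x}}(\mathbf{y})
\qquad \forall \mathbf{x},\, \mathbf{y},
\]
where \(\Delta t_{\mathbf{x}}(\mathbf{y}) = \sum_i \mathbf{1}[\,y_i \ne (t(\mathbf{x}))_i\,]\) counts the mistaken labels (the Hamming distance to the true labeling).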

Convert to an optimization problem
We can remove the explicit margin term (by fixing the scale of w and minimizing its norm instead) to obtain a quadratic program. We also have to add slack variables, because the data might not be separable. We can then reformulate the whole M3N learning problem as the following optimization task…
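Before slack is added, the QP has the standard max-margin form (my reconstruction of the slide's formula):
\[
\min_{\mathbf{w}} \;\; \tfrac{1}{2}\|\mathbf{w}\|^{2}
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y}) \;\ge\; \Delta t_{\mathbf{x}}(\mathbf{y})
\qquad \forall \mathbf{x},\, \mathbf{y}.
\]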

Grand formulation
The primal and the dual quadratic programs are shown below. Note that the dual contains some extra variables, which have no effect on the solution.
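Reconstructed from the paper (the slide's images did not survive), the slack-augmented primal and its dual are, up to notation:
\[
\textbf{Primal:}\quad
\min_{\mathbf{w},\,\xi} \;\; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{\mathbf{x}} \xi_{\mathbf{x}}
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y}) \;\ge\; \Delta t_{\mathbf{x}}(\mathbf{y}) - \xi_{\mathbf{x}}
\quad \forall \mathbf{x},\, \mathbf{y};
\]
\[
\textbf{Dual:}\quad
\max_{\alpha \ge 0} \;\; \sum_{\mathbf{x},\mathbf{y}} \alpha_{\mathbf{x}}(\mathbf{y})\,\Delta t_{\mathbf{x}}(\mathbf{y})
\;-\; \tfrac{1}{2}\Big\|\sum_{\mathbf{x},\mathbf{y}} \alpha_{\mathbf{x}}(\mathbf{y})\,\Delta\mathbf{f}_{\mathbf{x}}(\mathbf{y})\Big\|^{2}
\quad \text{s.t.} \quad
\sum_{\mathbf{y}} \alpha_{\mathbf{x}}(\mathbf{y}) = C \quad \forall \mathbf{x}.
\]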

Unfortunately, that's not enough!
The number of constraints in the primal, and the number of variables in the dual, are exponential in the number of labels l. So let's interpret the dual variables as a density function over y, conditional on x. The dual objective is a function of expectations under this density, and we need only its node and edge marginals to compute them. We therefore define marginal dual variables as follows:
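In the paper's notation (reconstructed), the marginal dual variables are:
\[
\mu_{\mathbf{x}}(y_i) \;=\; \sum_{\mathbf{y}' \sim [y_i]} \alpha_{\mathbf{x}}(\mathbf{y}'),
\qquad
\mu_{\mathbf{x}}(y_i, y_j) \;=\; \sum_{\mathbf{y}' \sim [y_i,\,y_j]} \alpha_{\mathbf{x}}(\mathbf{y}'),
\]
where \(\mathbf{y}' \sim [y_i]\) ranges over the complete assignments whose i-th component equals \(y_i\).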

Now reformulate the QP
But first, a pause: I can't copy any more formulae. I'm sorry. It's making me crazy. I just can't. Please refer to the paper, Section 4! OK, now back to work…

Now reformulate the QP (2)
The dual variables must arise from a legal density; in other words, they must lie in the marginal polytope (see Equation 9 in the paper). That means we must enforce consistency between the pairwise and singleton marginal variables (see Equation 10). If the network is not a forest, those constraints aren't enough: we can either triangulate the graph and add new variables and constraints, or work with a relaxation of the polytope, approximated using belief propagation.
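Roughly, the consistency constraints referred to here (cf. the paper's Equations 9-10; reconstructed, not copied) are:
\[
\sum_{y_i} \mu_{\mathbf{x}}(y_i, y_j) \;=\; \mu_{\mathbf{x}}(y_j),
\qquad
\sum_{y_i} \mu_{\mathbf{x}}(y_i) \;=\; C,
\qquad
\mu_{\mathbf{x}}(y_i, y_j) \;\ge\; 0 .
\]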

Experiment #1: Handwriting
6100 handwritten words, 8 characters long, from 150 subjects; each character is a 16x8 pixel image. Y is the classified word, and each Y_i is one of the 26 letters. Logistic regression and CRFs are trained by maximizing the conditional likelihood of the labels given the features; SVMs and M3N are trained by margin maximization.
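As an illustrative sketch (not the authors' code) of the argmax computation on a chain model like this one, assuming the node scores w·f(x_i, y_i) and edge scores w·f(y_i, y_{i+1}) have already been computed into two hypothetical arrays:

```python
import numpy as np

def decode_chain(node_scores, edge_scores):
    """Viterbi-style max-product decoding for a chain-structured pairwise model.

    node_scores[i, c]  : score w . f(x_i, c) for labeling character i with letter c
    edge_scores[c, d]  : score w . f(c, d) for adjacent letter pair (c, d)
    Returns the labeling y maximizing the total node + edge score.
    """
    n, k = node_scores.shape                 # n characters, k candidate letters
    best = node_scores[0].copy()             # best[c]: best prefix score ending in letter c
    backptr = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        # scores[c_prev, c]: extend a prefix ending in c_prev with edge and node terms
        scores = best[:, None] + edge_scores + node_scores[i][None, :]
        backptr[i] = scores.argmax(axis=0)
        best = scores.max(axis=0)
    y = [int(best.argmax())]                 # trace back the highest-scoring labeling
    for i in range(n - 1, 0, -1):
        y.append(int(backptr[i, y[-1]]))
    return y[::-1]
```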

Experiment #2: Hypertext
This is the usual collective classification task: pages from four CS departments, where each page is one of course, faculty, student, project, or other. Each page has web and anchor text, represented as a binary feature vector, plus hyperlinks to other examples. The RMN is trained to maximize the conditional probability of the labels given text and links; the SVM and M3N are trained with max-margin.

Conclusions
M3Ns seem to work great for discriminative tasks, and it is nice to be able to borrow theoretical results from SVMs. There is not much experimental testing so far; future work should use more complicated models and problems. Future presentations should be done in LaTeX, not PowerPoint.