Max-Margin Markov Networks. Ben Taskar, Carlos Guestrin, Daphne Koller. 2004.

Presentation transcript:

Max-Margin Markov Networks. Ben Taskar, Carlos Guestrin, Daphne Koller. 2004

Topics Covered  Main Idea.  Problem Setting.  Structure in classification problems.  Markov Model.  SVM.  Combining SVM and Markov Network.  Generalization Bound.  Experiments and Results.

Main Idea  Combine the SVM (a kernel-based approach) and the Markov network (a graphical model) for sequential, structured learning.  SVM: 1) able to use high-dimensional feature spaces; 2) unable to exploit structure in the problem.  Markov network: 1) able to represent correlations between labels by exploiting structure in the problem; 2) unable to deal with high-dimensional feature spaces.

Problem Setting  Multi-label classification.  Training data as input: m labeled examples (x(i), y(i)), where each y = (y1, ..., yl) is a vector of l labels.  The target is to predict y given a new x.  We use OCR (handwritten word recognition) as the running example.

Structure in classification problems  Feature function f(x, y) maps an input-output pair to a vector of features.  Hypothesis: h_w(x) = arg max_y w · f(x, y).  For multi-label classification, the number of possible assignments to y is exponential in the number of labels, making the arg max over y difficult to compute.  An alternative approach is to use probabilistic graphical models.
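The arg max above can be sketched with brute-force enumeration over joint labelings, which makes the exponential blow-up concrete (all names and the toy feature function are hypothetical, not from the slides):

```python
import itertools
import numpy as np

def brute_force_predict(w, feature_fn, x, n_labels, n_classes):
    """Exact arg max over all joint labelings y. Feasible only for tiny
    n_labels: the number of candidates grows as n_classes ** n_labels."""
    best_y, best_score = None, -np.inf
    for y in itertools.product(range(n_classes), repeat=n_labels):
        score = w @ feature_fn(x, y)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# Toy features: one indicator per (position, class), gated by x.
def feature_fn(x, y):
    f = np.zeros(2 * 3)          # n_labels=2 positions, n_classes=3
    for i, yi in enumerate(y):
        f[i * 3 + yi] = x[i]
    return f

w = np.array([0.1, 0.9, 0.2, 0.5, 0.3, 0.4])
print(brute_force_predict(w, feature_fn, x=[1.0, 1.0], n_labels=2, n_classes=3))
# -> (1, 0): the highest-weight class at each position
```

With these independent per-position features the arg max decomposes, but with edge features (next slide) the joint search is what a graphical model makes tractable.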

Markov Model  Use a pairwise Markov network.  Defined as a graph G = (Y, E).  Each edge (i, j) is associated with a potential function ψij(yi, yj).  The network encodes the conditional distribution P(y | x) as proportional to the product of the edge potentials ψij(yi, yj).  Taking log of the potentials as w · f(x, y), we can use the same arg max w · f(x, y) to predict y for x.
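For a chain-structured network, the arg max (MAP assignment) can be computed by Viterbi-style dynamic programming instead of enumeration. A minimal sketch, assuming per-position log-potentials and one shared edge log-potential table (names hypothetical):

```python
import numpy as np

def viterbi_chain(node_pot, edge_pot):
    """MAP assignment for a chain-structured pairwise Markov network.
    node_pot: (L, K) log-potentials per position; edge_pot: (K, K) shared
    log-potentials per edge. Runs in O(L * K^2) instead of O(K^L)."""
    L, K = node_pot.shape
    score = node_pot[0].copy()
    back = np.zeros((L, K), dtype=int)
    for i in range(1, L):
        # cand[a, b]: best score ending in state a at i-1, state b at i
        cand = score[:, None] + edge_pot + node_pot[i][None, :]
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    y = [int(score.argmax())]
    for i in range(L - 1, 0, -1):   # follow back-pointers
        y.append(int(back[i][y[-1]]))
    return y[::-1]

node_pot = np.log(np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]))
edge_pot = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))  # favors equal neighbors
print(viterbi_chain(node_pot, edge_pot))  # -> [0, 0, 0]
```

The strong agreement potential pulls the middle position to label 0 even though its node potential slightly prefers 1, which is exactly the label-correlation effect the slide describes.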

SVM

Combining SVM and Markov Network  For single-label multi-class classification, Crammer and Singer provide an extension of the SVM framework by maximizing the margin γ subject to w · Δfx(y) ≥ γ for all y ≠ t(x), where Δfx(y) = f(x, t(x)) − f(x, y).  The constraints ensure that the true label t(x) outscores every other label by at least γ.  Here we are predicting multiple labels, so the loss function won't simply be 0-1 loss but a per-label loss.
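The Crammer-Singer constraint set corresponds to the familiar multiclass hinge loss; a minimal sketch (scores are hypothetical values of w · f(x, y) per class):

```python
import numpy as np

def multiclass_hinge(scores, true_class):
    """Crammer-Singer multiclass hinge: penalize unless the true class
    beats every other class by a margin of 1 (0-1 loss scaling)."""
    margins = scores - scores[true_class] + 1.0
    margins[true_class] = 0.0            # no margin required against itself
    return max(0.0, margins.max())

scores = np.array([2.0, 3.5, 1.0])       # hypothetical per-class scores
print(multiclass_hinge(scores, true_class=0))  # -> 2.5
```

Every wrong class incurs the same unit margin here, which is what the per-label loss on the next slide generalizes.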

 More specifically, the margin between t(x) and y scales linearly with the number of wrong labels in y: w · Δfx(y) ≥ γ Δtx(y), where Δtx(y) counts the labels on which y disagrees with t(x).  However, there is a problem with this approach, which is discussed in Taskar et al.: it may give significant weight to output values that are not even close to the target values, because every increase in the loss increases the required margin.
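The loss-scaled margin corresponds to a structured hinge loss where the margin requirement grows with the Hamming distance to the target. A brute-force sketch for a tiny label space (feature function and weights hypothetical):

```python
import itertools
import numpy as np

def hamming(y, t):
    """Per-label (Hamming) loss: number of positions where y differs from t."""
    return sum(yi != ti for yi, ti in zip(y, t))

def structured_hinge(w, feature_fn, x, t, n_labels, n_classes):
    """Hinge loss with margin scaled by the per-label loss:
    max_y [w.f(x,y) + Delta(t,y)] - w.f(x,t). Brute force for clarity."""
    true_score = w @ feature_fn(x, t)
    worst = max(
        w @ feature_fn(x, y) + hamming(y, t)
        for y in itertools.product(range(n_classes), repeat=n_labels)
    )
    return max(0.0, worst - true_score)

def feature_fn(x, y):        # hypothetical per-position indicator features
    f = np.zeros(2 * 2)      # 2 positions, 2 classes
    for i, yi in enumerate(y):
        f[i * 2 + yi] = 1.0
    return f

w = np.array([1.0, 0.0, 0.0, 1.0])
print(structured_hinge(w, feature_fn, x=None, t=(0, 1), n_labels=2, n_classes=2))
# -> 0.0: the target outscores each y by exactly its Hamming distance
```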

 Now, using the standard transformation to eliminate γ and introducing slack variables ξx for the non-separable case, we obtain the primal and dual:
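The QP lost with the slide image can be reconstructed from the paper; a sketch in LaTeX, writing Δf_x(y) = f(x, t(x)) − f(x, y) and Δt_x(y) for the per-label loss of y against t(x):

```latex
% Primal: margin scaled by the per-label loss, one slack per example
\min_{w,\xi}\;\; \tfrac{1}{2}\|w\|^2 + C\sum_x \xi_x
\quad\text{s.t.}\quad
w^\top \Delta f_x(y) \;\ge\; \Delta t_x(y) - \xi_x
\qquad \forall x,\;\forall y.

% Dual: one variable \alpha_x(y) per example-labeling pair
\max_{\alpha}\;\; \sum_{x,y}\alpha_x(y)\,\Delta t_x(y)
\;-\; \tfrac{1}{2}\Bigl\|\sum_{x,y}\alpha_x(y)\,\Delta f_x(y)\Bigr\|^2
\quad\text{s.t.}\quad
\sum_y \alpha_x(y) = C,\;\; \alpha_x(y)\ge 0 \qquad \forall x.
```

The number of dual variables is still exponential in the number of labels; the paper's key step is factoring them into per-node and per-edge marginals so the QP scales polynomially.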

Generalization Bound  Relates training error to test error.  Average per-label loss: the fraction of individual labels predicted incorrectly.  γ-margin per-label loss: additionally counts a label as wrong if it is predicted by a margin of less than γ.
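The average per-label loss from the slide is just Hamming error averaged over all labels in the test set; a minimal sketch (function name hypothetical):

```python
def avg_per_label_loss(pred_seqs, true_seqs):
    """Average per-label loss over a test set: the fraction of individual
    labels predicted incorrectly, pooled across all sequences."""
    total = wrong = 0
    for pred, true in zip(pred_seqs, true_seqs):
        for p, t in zip(pred, true):
            total += 1
            wrong += (p != t)
    return wrong / total

print(avg_per_label_loss([[0, 1, 1], [1, 0]], [[0, 1, 0], [1, 1]]))  # -> 0.4
```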

 The expected average per-label loss is bounded by the empirical γ-margin per-label loss plus a term that shrinks with the sample size, with probability at least 1 − δ, where q is the maximum edge degree in the network, l is the number of labels, K is a constant, and k is the number of classes per label.

Experiments and Results  Handwriting recognition: 1) The input corpus contains 6100 handwritten words. 2) The data set is divided into 10 folds of 600 training and 5500 testing examples. 3) Accuracy results are averaged over the 10 folds.

 Hypertext classification: 1) The dataset contains web pages from 4 different CS departments. 2) Each page is labelled as course, faculty, student, project, or other. 3) The model is learned from three schools and tested on the remaining one. 4) The error rate of M^3N is 40% lower than that of RMNs and 51% lower than that of multi-class SVMs.

THANK YOU