Considering Cost Asymmetry in Learning Classifiers, by Bach, Heckerman and Horvitz. Presented by Chunping Wang, Machine Learning Group, Duke University, May 21, 2007.

Outline
- Introduction
- SVM with Asymmetric Cost
- SVM Regularization Path (Hastie et al., 2005)
- Path with Cost Asymmetry
- Results
- Conclusions

Introduction (1) Binary classification: given real-valued predictors x ∈ R^d and a binary response y ∈ {−1, +1}, a classifier can be defined as ŷ = sign(f(x)), based on a linear decision function f(x) = w^T x + b with parameters (w, b).
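A minimal sketch of this linear decision rule in Python (the function names and the toy numbers are illustrative, not from the paper):

```python
import numpy as np

def decision_function(X, w, b):
    """Linear decision function f(x) = w^T x + b, evaluated for each row of X."""
    return X @ w + b

def predict(X, w, b):
    """Hard prediction in {-1, +1} from the sign of the decision function."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

# Toy usage with arbitrary parameters
X = np.array([[1.0, 2.0], [-1.0, 0.5]])
print(predict(X, w=np.array([0.5, -1.0]), b=0.2))
```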

Introduction (2) Two types of misclassification: a false negative (a positive example classified as negative) incurs cost C+, and a false positive (a negative example classified as positive) incurs cost C−. The expected cost is R(f) = C+ P(y = +1, f(x) < 0) + C− P(y = −1, f(x) ≥ 0), i.e., it is written in terms of the 0-1 loss function. This is the real loss of interest, but it is non-convex and non-differentiable, so it cannot be minimized directly.
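Assuming the per-class costs above (variable names are illustrative), the empirical version of this cost on a labeled sample can be computed as:

```python
import numpy as np

def empirical_cost(y_true, y_pred, c_fn, c_fp):
    """Average asymmetric 0-1 cost: c_fn per false negative (missed positive),
    c_fp per false positive (wrongly flagged negative)."""
    fn = np.sum((y_true == 1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return (c_fn * fn + c_fp * fp) / len(y_true)
```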

Introduction (3) Convex loss functions are used as surrogates for the 0-1 loss function (for training purposes); standard examples are the hinge, logistic and exponential losses.
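A small comparison of these standard surrogates, evaluated as functions of the margin z = y·f(x) (the grid of values is only illustrative):

```python
import numpy as np

# Convex surrogates of the 0-1 loss, as functions of the margin z = y * f(x)
losses = {
    "0-1 (true, non-convex)": lambda z: (z < 0).astype(float),
    "hinge (SVM)":            lambda z: np.maximum(0.0, 1.0 - z),
    "logistic":               lambda z: np.log1p(np.exp(-z)),
    "exponential (boosting)": lambda z: np.exp(-z),
}

z = np.linspace(-2.0, 2.0, 5)
for name, loss in losses.items():
    print(f"{name:24s}", np.round(loss(z), 3))
```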

Introduction (4) Empirical cost, given n labeled data points: replace the probabilities in the expected cost by the empirical counts of false negatives and false positives. Objective function: replace the 0-1 losses by a convex surrogate φ and add a regularization term, min_{w,b} C+ Σ_{i: y_i = +1} φ(y_i f(x_i)) + C− Σ_{i: y_i = −1} φ(y_i f(x_i)) + (1/2)||w||^2, where the pair (C+, C−) encodes both the amount of regularization (overall scale) and the cost asymmetry (relative scale). Motivation: efficiently look at many training asymmetries even when the testing asymmetry is given. Since convex surrogates of the 0-1 loss function are used for training, the best training cost asymmetry is generally mismatched with the testing asymmetry.
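Under the notational assumptions above (φ a surrogate, per-class weights c_pos / c_neg, an L2 penalty), the objective could be sketched as:

```python
import numpy as np

def objective(w, b, X, y, c_pos, c_neg, lam, surrogate):
    """Regularized empirical cost with asymmetry: per-class weights on a convex
    surrogate of the 0-1 loss, plus an L2 penalty of strength lam on w."""
    margins = y * (X @ w + b)
    weights = np.where(y == 1, c_pos, c_neg)
    return np.sum(weights * surrogate(margins)) + 0.5 * lam * np.dot(w, w)
```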

SVM with Asymmetric Cost (1) Hinge loss: φ(z) = max(0, 1 − z), where z = y f(x) is the margin. SVM with asymmetric cost (primal problem): min_{w,b,ξ} (1/2)||w||^2 + C+ Σ_{i: y_i = +1} ξ_i + C− Σ_{i: y_i = −1} ξ_i, subject to y_i (w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.
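For one fixed cost pair, a readily available way to fit this asymmetric-cost SVM is scikit-learn's per-class weighting (this is a baseline sketch, not the authors' path algorithm; labels are assumed to be −1/+1 and the cost values are illustrative):

```python
from sklearn.svm import SVC

# class_weight rescales C per class, i.e. C+ = C * c_pos and C- = C * c_neg
c_pos, c_neg, C = 2.0, 1.0, 1.0
clf = SVC(kernel="linear", C=C, class_weight={1: c_pos, -1: c_neg})
# clf.fit(X_train, y_train)               # X_train, y_train: the user's data
# scores = clf.decision_function(X_test)  # real-valued f(x) for ROC analysis
```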

SVM with Asymmetric Cost (2) The Lagrangian, with dual variables α_i ≥ 0 and μ_i ≥ 0, is L = (1/2)||w||^2 + Σ_i C_i ξ_i − Σ_i α_i [y_i (w^T x_i + b) − 1 + ξ_i] − Σ_i μ_i ξ_i, where C_i = C+ if y_i = +1 and C_i = C− otherwise. The Karush-Kuhn-Tucker (KKT) conditions give w = Σ_i α_i y_i x_i, Σ_i α_i y_i = 0, α_i + μ_i = C_i, together with complementary slackness: α_i [y_i f(x_i) − 1 + ξ_i] = 0 and μ_i ξ_i = 0.
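A rough numerical check of these conditions for a candidate dual solution (a sketch, assuming a precomputed kernel matrix K and per-example bounds C_i; the tolerances are arbitrary):

```python
import numpy as np

def check_kkt(alpha, y, K, b, C_i, tol=1e-6):
    """Approximate KKT check for the asymmetric-cost SVM dual solution."""
    f = K @ (alpha * y) + b                       # decision values f(x_i)
    m = y * f                                     # margins y_i f(x_i)
    ok_eq  = abs(np.dot(alpha, y)) < tol          # sum_i alpha_i y_i = 0
    ok_box = np.all((alpha > -tol) & (alpha < C_i + tol))   # 0 <= alpha_i <= C_i
    # complementary slackness: alpha_i = 0 -> margin >= 1; alpha_i = C_i -> margin <= 1
    ok_lo  = np.all(m[alpha < tol] >= 1 - 1e-3)
    ok_hi  = np.all(m[alpha > C_i - tol] <= 1 + 1e-3)
    return ok_eq and ok_box and ok_lo and ok_hi
```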

SVM with Asymmetric Cost (3) The dual problem: max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), subject to 0 ≤ α_i ≤ C_i and Σ_i α_i y_i = 0, where C_i is the per-example cost defined above. This is a quadratic optimization problem for one given cost structure (C+, C−); re-solving it over the whole (C+, C−) space would be computationally intractable. Following the SVM regularization path algorithm (Hastie et al., 2005), the authors instead work directly with the optimality (KKT) conditions to trace solutions across cost structures.
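For reference, the single-cost-structure dual can be solved as a generic QP, e.g. with cvxopt (a sketch of the brute-force baseline the path algorithm avoids repeating; K is a precomputed kernel matrix):

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_asymmetric_svm_dual(K, y, c_pos, c_neg):
    """Solve the asymmetric-cost SVM dual for one fixed (C+, C-) pair."""
    n = len(y)
    C_i = np.where(y == 1, c_pos, c_neg).astype(float)
    P = matrix(np.outer(y, y) * K)                  # Q_ij = y_i y_j K(x_i, x_j)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))  # encodes 0 <= alpha_i <= C_i
    h = matrix(np.hstack([np.zeros(n), C_i]))
    A = matrix(y.reshape(1, -1).astype(float))      # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()
```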

SVM Regularization Path (1) In the original SVM regularization path the cost is symmetric, so the search is along a single axis (a 1-d path in the regularization parameter). Define active sets of data points according to the margin y_i f(x_i): Margin: M = {i : y_i f(x_i) = 1}, with 0 ≤ α_i ≤ C_i; Left of margin: L = {i : y_i f(x_i) < 1}, with α_i = C_i; Right of margin: R = {i : y_i f(x_i) > 1}, with α_i = 0. These memberships follow from the KKT conditions.
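These sets are easy to compute from a current solution; a small helper (names and tolerance are illustrative):

```python
import numpy as np

def active_sets(y, f, tol=1e-6):
    """Split points into margin (M), left-of-margin (L) and right-of-margin (R) sets
    from the current margins y_i * f(x_i)."""
    m = y * f
    M = np.where(np.abs(m - 1.0) <= tol)[0]   # on the margin: 0 <= alpha_i <= C_i
    L = np.where(m < 1.0 - tol)[0]            # inside the margin or misclassified: alpha_i = C_i
    R = np.where(m > 1.0 + tol)[0]            # strictly outside the margin: alpha_i = 0
    return M, L, R
```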

SVM Regularization Path (2) Initialization: consider a sufficiently large amount of regularization (C very small); then all the points are in L, with every α_i at its upper bound. As the regularization is decreased, the α_i remain at their bounds until one or more positive and negative examples hit the margin simultaneously; this event defines the starting point of the path.

SVM Regularization Path (3) Define the initial value of the regularization parameter as the point where this first event occurs. The critical condition for the first two points (one positive, one negative) hitting the margin determines this initial value and the corresponding intercept. For the general case considered here, the initial condition keeps the same form, except for the definition of the quantities entering it.

SVM Regularization Path (4) The path: as the regularization parameter λ decreases, the α_i change only for points in M, until one of the following events happens: a point from L or R enters M; or a point in M leaves the set to join either R or L. Between events we therefore need to consider only the points on the margin, for which y_i f(x_i) = 1; solving the resulting linear system shows that each α_i on the margin is a linear function of λ. Therefore the α_i for points on the margin proceed linearly in λ, while the decision function changes in a piecewise-inverse manner in λ, so the whole path can be traced by jumping from event to event.

SVM Regularization Path (5) At each event: update the regularization parameter, then update the active sets and the solution (the α_i and the intercept). Stopping condition: in the separable case, we terminate when L becomes empty; in the non-separable case, we terminate when none of the possible events can occur any more.
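The overall control flow can be sketched as a generic event-driven loop (next_event, apply_event and stop are hypothetical stand-ins for the algebra in slides (2)-(5), not the authors' code):

```python
def follow_path(lam_start, state, next_event, apply_event, stop):
    """Event-driven path following: between events the margin alphas move linearly,
    so only the event points need to be computed explicitly."""
    lam = lam_start
    while not stop(lam, state):
        lam_next, event = next_event(lam, state)     # next lambda at which a set membership changes
        state = apply_event(state, lam_next, event)  # move the alphas linearly, update M, L, R
        lam = lam_next
    return state
```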

Path with Cost Asymmetry (1) Exploration in the 2-d (C+, C−) space. Path initialization: start from situations where all points are in L. Follow the updating procedure of the 1-d case along a line in the (C+, C−) plane: along each such line the regularization is changing while the cost asymmetry is fixed. Among all the classifiers visited along these paths, find the best one given the user's cost function. Paths are started from a range of different asymmetries.
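A sketch of this outer exploration loop (run_path and user_cost are hypothetical callables standing in for the path-following routine and the user's cost criterion; the asymmetry grid is illustrative):

```python
import numpy as np

def explore_asymmetries(run_path, user_cost, X, y, n_gamma=9):
    """One 1-d regularization path per fixed training cost asymmetry;
    return the best classifier found along all the paths."""
    classifiers = []
    for gamma in np.linspace(0.1, 0.9, n_gamma):
        # along each line the regularization varies while the asymmetry stays fixed
        classifiers.extend(run_path(X, y, asymmetry=gamma))
    return min(classifiers, key=lambda clf: user_cost(clf, X, y))
```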

Path with Cost Asymmetry (2) Producing ROC curves: collecting the R lines (one per training asymmetry), we can build three ROC curves.
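An ROC curve here is simply the set of (false positive rate, true positive rate) points of the classifiers collected along the paths; a minimal sketch of that conversion (function names are illustrative):

```python
import numpy as np

def roc_points(y_true, predictions):
    """Turn hard predictions from a collection of classifiers into ROC points."""
    pts = []
    for y_pred in predictions:
        tpr = np.mean(y_pred[y_true == 1] == 1)    # fraction of positives detected
        fpr = np.mean(y_pred[y_true == -1] == 1)   # fraction of negatives wrongly flagged
        pts.append((fpr, tpr))
    return sorted(pts)   # sort by FPR so the points trace a curve
```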

Results (1) For 1000 testing asymmetries, three methods are compared:
- "one" – take the testing asymmetry itself as the training cost asymmetry;
- "int" – vary the intercept of "one" and build an ROC curve, then select the optimal classifier;
- "all" – select the optimal classifier from the ROC curve obtained by varying both the training asymmetry and the intercept.
A nested cross-validation is used:
- the outer cross-validation produces overall accuracy estimates for the classifier;
- the inner cross-validation selects the optimal classifier parameters (training asymmetry and/or intercept).
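A sketch of that nested procedure with scikit-learn's KFold (fit_fn and cost_fn are hypothetical callables for fitting a classifier with given parameters and scoring it under the user's cost; fold counts and seeds are arbitrary):

```python
import numpy as np
from sklearn.model_selection import KFold

def nested_cv_cost(X, y, candidates, fit_fn, cost_fn, outer_k=5, inner_k=5):
    """Inner CV selects the parameters (e.g. training asymmetry / intercept);
    outer CV estimates the cost of the whole selection procedure."""
    outer_costs = []
    for tr, te in KFold(n_splits=outer_k, shuffle=True, random_state=0).split(X):
        def inner_cost(params):
            costs = []
            for itr, iva in KFold(n_splits=inner_k, shuffle=True, random_state=1).split(X[tr]):
                clf = fit_fn(X[tr][itr], y[tr][itr], params)
                costs.append(cost_fn(clf, X[tr][iva], y[tr][iva]))
            return np.mean(costs)
        best = min(candidates, key=inner_cost)          # chosen on inner folds only
        clf = fit_fn(X[tr], y[tr], best)
        outer_costs.append(cost_fn(clf, X[te], y[te]))  # scored on the untouched outer fold
    return np.mean(outer_costs)
```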

Results (2)

Conclusions An efficient algorithm is presented for building ROC curves by varying the training cost asymmetries for SVMs. The main contribution is generalizing the SVM regularization path (Hastie et al., 2005) from a 1-d axis to a 2-d plane. Because a convex surrogate of the 0-1 loss is used, training directly with the testing asymmetry leads to a non-optimal classifier. The results show the advantage of considering many training asymmetries.