CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Classification - SVM CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu
Copyright © 2001, 2003, Andrew W. Moore
Linear Classifiers
f(x, w, b) = sign(w · x - b)
[Figure: datapoints labeled +1 and -1; the classifier f maps an input x to an estimated label y_est.]
How would you classify this data?
Linear Classifiers (cont.)
[Figure: the same datapoints with several candidate separating lines.]
Any of these would be fine... but which is best?
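The decision rule f(x, w, b) = sign(w · x - b) can be sketched in a few lines. This is only an illustration: the weights, the offset, and the choice to map a score of exactly 0 to +1 are assumptions, not something the slides specify.

```python
def sign(value):
    """Return +1 for non-negative values and -1 otherwise (ties go to +1 by assumption)."""
    return 1 if value >= 0 else -1


def classify(x, w, b):
    """Evaluate f(x, w, b) = sign(w . x - b) for one datapoint x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return sign(score)


# Hypothetical boundary x1 + x2 = 3, i.e. w = (1, 1) and b = 3.
w, b = (1.0, 1.0), 3.0
labels = [classify(x, w, b) for x in [(3.0, 2.0), (0.5, 1.0)]]
```

Any w and b that put all +1 points on one side and all -1 points on the other give the same labels on the training data, which is why the question of the best boundary arises.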
Classifier Margin
f(x, w, b) = sign(w · x - b)
Define the margin of a linear classifier as the width by which the boundary could be widened before hitting a datapoint.
Maximum Margin
f(x, w, b) = sign(w · x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).
Support vectors are those datapoints that the margin pushes up against.
Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. Leave-one-out cross-validation (LOOCV) is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very well.
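Point 3 can be checked on a toy case. The sketch below assumes separable one-dimensional data with every -1 point to the left of every +1 point, where the maximum margin boundary is simply the midpoint between the two closest opposite-class points. The data and function name are made up for illustration.

```python
def max_margin_threshold_1d(points):
    """Max-margin boundary for separable 1-D data.

    points: list of (x, label) with label in {+1, -1}; assumes every -1 point
    lies to the left of every +1 point.
    Returns (threshold, support_vectors).
    """
    right_most_neg = max(x for x, y in points if y == -1)
    left_most_pos = min(x for x, y in points if y == +1)
    threshold = (right_most_neg + left_most_pos) / 2
    return threshold, {right_most_neg, left_most_pos}


data = [(-4.0, -1), (-2.0, -1), (-1.0, -1), (3.0, +1), (5.0, +1), (6.0, +1)]
threshold, support = max_margin_threshold_1d(data)

# Leave out each non-support-vector point in turn: the boundary does not move.
unchanged = all(
    max_margin_threshold_1d([p for p in data if p != left_out])[0] == threshold
    for left_out in data
    if left_out[0] not in support
)
```

Because the retrained boundary is identical whenever a non-support-vector point is held out, only the folds that hold out a support vector need any retraining.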
Estimate the Margin
What is the expression for the distance from a point x to the line w · x + b = 0?
[Figure: datapoints labeled +1 and -1, a point x, and the line w · x + b = 0.]
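The standard answer is d(x) = |w · x + b| / ||w||, the absolute value of the line's expression at x divided by the Euclidean norm of w. A minimal sketch, with a made-up line and point:

```python
import math


def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


# Hypothetical line 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
d = distance_to_hyperplane((3.0, 4.0), (3.0, 4.0), -5.0)
```

Note that this slide writes the boundary as w · x + b = 0, while the earlier slides use sign(w · x - b); the two differ only in the sign of b.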
Estimate the Margin (cont.)
What is the expression for the margin?
[Figure: datapoints labeled +1 and -1, the line w · x + b = 0, and the margin around it.]
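One common reading, assumed here, takes the margin to be the distance from the boundary to the closest datapoint; for a boundary centered between the classes, the full width of the empty band is twice that. The sketch below returns 0.0 for a line that misclassifies any point, which is a convention chosen for this example.

```python
import math


def margin(points, w, b):
    """Smallest distance from any datapoint to the line w . x + b = 0.

    points: list of (x, label) with label in {+1, -1}.
    Returns 0.0 if the line puts some point on the wrong side.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    signed = [
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        for x, label in points
    ]
    return max(0.0, min(signed))


# Made-up separable data and two candidate vertical boundaries.
points = [((2.0, 0.0), +1), ((4.0, 1.0), +1), ((0.0, 0.0), -1), ((-1.0, 2.0), -1)]
centered = margin(points, (1.0, 0.0), -1.0)   # boundary x1 = 1
shifted = margin(points, (1.0, 0.0), -1.5)    # boundary x1 = 1.5
```

Both boundaries separate the data, but the centered one leaves more room on its tighter side, so it has the larger margin.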
Maximize Margin
[Figure: datapoints labeled +1 and -1, the line w · x + b = 0, and the margin around it.]
Maximizing the margin is a min-max problem (a game problem): w and b are chosen to maximize the margin, which is itself set by the closest datapoints.
Maximize Margin (cont.)
Strategy: [equations not recoverable from the source]
Learning via Quadratic Programming
Quadratic programming (QP) is a well-studied class of optimization problems, with mature algorithms, in which a quadratic function of some real-valued variables is maximized (or minimized) subject to linear constraints.
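A common way to cast margin maximization as a QP, stated here as an assumption rather than as the slides' own derivation, is the hard-margin formulation. Rescale w and b so that the closest datapoints satisfy |w · x_i + b| = 1; their distance to the boundary is then 1/||w||, so maximizing the margin is the same as minimizing ||w||². The labels y_i in {+1, -1} and the count n are notation introduced for this sketch.

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```

The objective is quadratic in w and each constraint is linear in (w, b), which is exactly the shape a QP solver expects.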
SVM Related Links
C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
SVMlight: software (in C).
Book: N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press.
The UNIVERSITY of KENTUCKY Classification - CBA CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu
Association Rules
Itemset X = {x_1, …, x_k}
Find all the rules X -> Y with minimum support and confidence:
– support, s, is the probability that a transaction contains X ∪ Y
– confidence, c, is the conditional probability that a transaction having X also contains Y

Transaction-id   Items bought
100              f, a, c, d, g, i, m, p
200              a, b, c, f, l, m, o
300              b, f, h, j, o
400              b, c, k, s, p
500              a, f, c, e, l, p, m, n

Let sup_min = 50%, conf_min = 50%. Association rules:
A -> C (60%, 100%)
C -> A (60%, 75%)
[Figure: Venn diagram of customers who buy diaper, customers who buy beer, and customers who buy both.]
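The two rules can be checked directly against the transaction table. This sketch assumes the rule items A and C refer to the lowercase items a and c in the table; the function names are illustrative.

```python
transactions = {
    100: {"f", "a", "c", "d", "g", "i", "m", "p"},
    200: {"a", "b", "c", "f", "l", "m", "o"},
    300: {"b", "f", "h", "j", "o"},
    400: {"b", "c", "k", "s", "p"},
    500: {"a", "f", "c", "e", "l", "p", "m", "n"},
}


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset):
    """Fraction of transactions that contain itemset."""
    return count(itemset) / len(transactions)


def confidence(lhs, rhs):
    """Fraction of the transactions containing lhs that also contain rhs."""
    return count(lhs | rhs) / count(lhs)


a_to_c = (support({"a", "c"}), confidence({"a"}, {"c"}))
c_to_a = (support({"a", "c"}), confidence({"c"}, {"a"}))
```

Items a and c occur together in transactions 100, 200, and 500 (3 of 5); a occurs in those same three, while c also occurs in 400, which gives the 100% and 75% confidences.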
Classification based on Association
Classification rule mining versus association rule mining:
Aim
– Classification rule mining: a small set of rules as classifier
– Association rule mining: all rules according to minsup and minconf
Syntax
– Classification rule: X -> y
– Association rule: X -> Y
Why & How to Integrate
Both classification rule mining and association rule mining are indispensable to practical applications.
The integration is done by focusing on a special subset of association rules whose right-hand sides are restricted to the classification class attribute.
– CARs: class association rules
CBA: Three Steps
1. Discretize continuous attributes, if any.
2. Generate all class association rules (CARs).
3. Build a classifier based on the generated CARs.
Our Objectives
To generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints.
To build a classifier from the CARs.
Rule Generator: Basic Concepts
Ruleitem <condset, y>: condset is a set of items, y is a class label. Each ruleitem represents a rule: condset -> y
condsupCount: the number of cases in D that contain condset
rulesupCount: the number of cases in D that contain the condset and are labeled with class y
Support = (rulesupCount / |D|) * 100%
Confidence = (rulesupCount / condsupCount) * 100%
RG: Basic Concepts (Cont.)
Frequent ruleitems
– A ruleitem is frequent if its support is above minsup
Accurate rule
– A rule is accurate if its confidence is above minconf
Possible rule
– For all ruleitems that have the same condset, the ruleitem with the highest confidence is the possible rule of this set of ruleitems.
The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.
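The three definitions compose into a small selection step. The sketch below is one way to read them, not the CBA implementation: the `RuleItem` container and the made-up counts are assumptions, and the thresholds are treated as inclusive (>=) to match the rule generator's pseudocode, whereas the slide text says "above".

```python
from collections import namedtuple

# Hypothetical container for one ruleitem and its two counts.
RuleItem = namedtuple("RuleItem", "condset label condsup_count rulesup_count")


def select_cars(ruleitems, n_cases, minsup, minconf):
    """Per condset, keep the most confident ruleitem if it is frequent and accurate.

    minsup and minconf are fractions in [0, 1].
    """
    best = {}
    for item in ruleitems:
        conf = item.rulesup_count / item.condsup_count
        current = best.get(item.condset)
        if current is None or conf > current.rulesup_count / current.condsup_count:
            best[item.condset] = item  # the possible rule for this condset so far
    return [
        item
        for item in best.values()
        if item.rulesup_count / n_cases >= minsup
        and item.rulesup_count / item.condsup_count >= minconf
    ]


ab = frozenset({("A", 1), ("B", 1)})
a = frozenset({("A", 1)})
ruleitems = [
    RuleItem(ab, 1, 3, 2),
    RuleItem(ab, 0, 3, 1),
    RuleItem(a, 1, 5, 3),
    RuleItem(a, 0, 5, 2),
]
cars = select_cars(ruleitems, n_cases=10, minsup=0.2, minconf=0.65)
```

With these counts, the possible rule for condset {(A,1)} has confidence 60% and is rejected as inaccurate, while the possible rule for {(A,1), (B,1)} passes both thresholds.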
RG: An Example
A ruleitem: <{(A,1), (B,1)}, (class,1)>
– assume that the support count of the condset (condsupCount) is 3, the support count of this ruleitem (rulesupCount) is 2, and |D| = 10
– then for the rule (A,1),(B,1) -> (class,1):
  supt = 20%, from (rulesupCount / |D|) * 100%
  confd = 66.7%, from (rulesupCount / condsupCount) * 100%
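The arithmetic of this example, as a quick check. The function names are illustrative only.

```python
def rule_support(rulesup_count, n_cases):
    """Support of a ruleitem, in percent."""
    return 100 * rulesup_count / n_cases


def rule_confidence(rulesup_count, condsup_count):
    """Confidence of a ruleitem, in percent."""
    return 100 * rulesup_count / condsup_count


supt = rule_support(2, 10)                 # 20.0
confd = round(rule_confidence(2, 3), 1)    # 66.7
```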
RG: The Algorithm
F_1 = {large 1-ruleitems};
CAR_1 = genRules(F_1);
prCAR_1 = pruneRules(CAR_1);   // count the item and class occurrences to determine the frequent 1-ruleitems, and prune them
for (k = 2; F_{k-1} ≠ Ø; k++) do
    C_k = candidateGen(F_{k-1});   // generate the candidate ruleitems C_k using the frequent ruleitems F_{k-1}
    for each data case d ∈ D do   // scan the database
        C_d = ruleSubset(C_k, d);   // find all the ruleitems in C_k whose condsets are supported by d
        for each candidate c ∈ C_d do   // update various support counts of the candidates in C_k
            c.condsupCount++;
            if d.class = c.class then c.rulesupCount++;
        end
    end
RG: The Algorithm (cont.)
    F_k = {c ∈ C_k | c.rulesupCount ≥ minsup};   // select those new frequent ruleitems to form F_k
    CAR_k = genRules(F_k);   // select the ruleitems both accurate and frequent
    prCAR_k = pruneRules(CAR_k);
end
CARs = ∪_k CAR_k;
prCARs = ∪_k prCAR_k;
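The level-wise loop can be sketched in runnable form. This is a simplified illustration under several assumptions, not the CBA rule generator itself: `pruneRules` is omitted, candidate generation is a plain join of frequent condsets without Apriori-style subset pruning, minsup and minconf are fractions compared with >=, and the toy dataset is made up.

```python
from collections import defaultdict
from itertools import combinations


def cba_rg(data, minsup, minconf):
    """Simplified CBA-RG sketch.

    data: list of (items, class_label), where items is a set of hashable items.
    Returns a list of (condset, class_label, support, confidence).
    """
    n = len(data)
    cars = []
    # Level 1 candidates: every single item seen in the data.
    condsets = {frozenset([item]) for items, _ in data for item in items}
    k = 1
    while condsets:
        condsup = defaultdict(int)   # condsupCount per condset
        rulesup = defaultdict(int)   # rulesupCount per (condset, class)
        for items, label in data:    # one database scan per level
            for condset in condsets:
                if condset <= items:
                    condsup[condset] += 1
                    rulesup[(condset, label)] += 1
        # F_k: the frequent ruleitems.
        frequent = {ri: c for ri, c in rulesup.items() if c / n >= minsup}
        # genRules: per condset, keep the most confident ruleitem if accurate.
        best = {}
        for (condset, label), count in frequent.items():
            conf = count / condsup[condset]
            if conf >= minconf and (condset not in best or conf > best[condset][1]):
                best[condset] = (label, conf, count / n)
        cars.extend((cs, lab, sup, conf) for cs, (lab, conf, sup) in best.items())
        # candidateGen (simplified): join frequent condsets of size k into size k + 1.
        k += 1
        frequent_condsets = {condset for condset, _ in frequent}
        condsets = {
            a | b for a, b in combinations(frequent_condsets, 2) if len(a | b) == k
        }
    return cars


example = [
    ({"a", "b"}, "yes"),
    ({"a", "b"}, "yes"),
    ({"a"}, "yes"),
    ({"b"}, "no"),
    ({"c"}, "no"),
]
rules = {(cs, lab): (sup, conf) for cs, lab, sup, conf in cba_rg(example, 0.4, 0.6)}
```

On this toy data the sketch yields three rules, all predicting "yes": from {a}, from {b}, and from {a, b}. The loop stops at level 3 because a single frequent 2-condset leaves nothing to join.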