Classification - SVM
CS 685: Special Topics in Data Mining
Jinze Liu, The University of Kentucky


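CS685 : Special Topics in Data Mining, UKY. Copyright © 2001, 2003, Andrew W. Moore
Linear Classifiers
f(x, w, b) = sign(w · x - b)
(Figure: an input x is fed to the classifier f, which outputs the estimated label y_est; the two marker styles denote +1 and -1 datapoints.)
How would you classify this data?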

Linear Classifiers
f(x, w, b) = sign(w · x - b)
Any of these would be fine... but which is best?

Classifier Margin
f(x, w, b) = sign(w · x - b)
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a datapoint.

Maximum Margin
f(x, w, b) = sign(w · x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).
Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV (leave-one-out cross-validation) is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very very well.

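Estimate the Margin
What is the distance expression for a point x to the line w · x + b = 0?
(Figure: +1 and -1 datapoints, a point x, and the line w · x + b = 0.)
Note: from here on the boundary is written as w · x + b = 0, whereas the earlier slides use sign(w · x - b); the two conventions differ only in the sign of b.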
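The distance asked for here has a standard closed form, sketched below as a reminder rather than as the slide's own derivation. The symbols x_p (the orthogonal projection of x onto the line) and r (the signed distance) are introduced only for this sketch.

```latex
\[
x = x_p + r\,\frac{w}{\lVert w \rVert}, \qquad w \cdot x_p + b = 0
\;\Longrightarrow\;
w \cdot x + b = r\,\lVert w \rVert
\;\Longrightarrow\;
d(x) = |r| = \frac{|w \cdot x + b|}{\lVert w \rVert}.
\]
```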

Estimate the Margin
What is the expression for the margin?
(Figure: +1 and -1 datapoints, the line w · x + b = 0, and the margin around it.)
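A minimal sketch of one common answer, assuming the margin is taken as the smallest distance from any datapoint to the boundary w · x + b = 0 (the full band width would be twice that when the boundary sits in the middle of the band). The function names and the toy data are hypothetical.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def margin(points, w, b):
    """Smallest point-to-boundary distance over the dataset (assumed definition)."""
    return min(distance_to_hyperplane(x, w, b) for x in points)


# Toy data, invented for illustration only.
points = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0)]
w, b = (3.0, 4.0), -10.0
print(margin(points, w, b))
```

The point closest to the boundary decides the result, which is why only the support vectors matter for the margin.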

Maximize Margin
(Figure: +1 and -1 datapoints, the line w · x + b = 0, and the margin around it.)
Min-max problem → game problem
Strategy:
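One standard way to carry out such a strategy, given here as an assumption since the transcript does not show the slide's own formula, is to fix the scale of w so that the closest datapoints satisfy |w · x_i + b| = 1. The margin on each side is then 1/‖w‖, and maximizing it becomes a minimization with linear constraints, where y_i ∈ {+1, -1} is the label of x_i:

```latex
\[
\max_{w,b}\ \min_{x_i \in D} \frac{|w \cdot x_i + b|}{\lVert w \rVert}
\quad\Longrightarrow\quad
\min_{w,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\ \ \text{subject to}\ \ y_i\,(w \cdot x_i + b) \ge 1 \ \ \text{for all } i.
\]
```

The objective is quadratic in w and the constraints are linear in (w, b), which is the form a quadratic programming solver expects.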

Learning via Quadratic Programming
Quadratic programming (QP) is a well-studied class of optimization algorithms for maximizing a quadratic function of some real-valued variables subject to linear constraints.

Suppose we’re in 1-dimension
What would SVMs do with this data?
(Figure: datapoints of both classes on a line, with x = 0 marked.)

Suppose we’re in 1-dimension
Not a big surprise.
(Figure: the same datapoints, with the positive “plane” and the negative “plane” marked on either side of the boundary; x = 0 marked.)
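In one dimension the maximum margin boundary is simply the midpoint between the closest pair of opposite-class points. A minimal sketch, assuming every +1 point lies to the right of every -1 point; the function names and the data are hypothetical.

```python
def fit_1d_max_margin(xs, ys):
    """Return (threshold, margin) for separable 1-D data.

    Assumes all +1 points lie to the right of all -1 points.
    """
    lowest_pos = min(x for x, y in zip(xs, ys) if y == +1)
    highest_neg = max(x for x, y in zip(xs, ys) if y == -1)
    if highest_neg >= lowest_pos:
        raise ValueError("data is not separable by a single threshold")
    threshold = (lowest_pos + highest_neg) / 2.0
    margin = (lowest_pos - highest_neg) / 2.0
    return threshold, margin


def predict(x, threshold):
    """Classify a point by which side of the threshold it falls on."""
    return 1 if x > threshold else -1


# Toy data, invented for illustration only.
xs = [-4.0, -3.0, -1.0, 2.0, 3.0, 5.0]
ys = [-1, -1, -1, +1, +1, +1]
threshold, margin = fit_1d_max_margin(xs, ys)
print(threshold, margin)
```

Only the two points nearest the boundary (the support vectors, here -1.0 and 2.0) affect the result; moving or deleting any other point leaves the threshold unchanged.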

Harder 1-dimensional dataset
That’s wiped the smirk off SVM’s face. What can be done about this?
(Figure: datapoints on a line that no single threshold can separate; x = 0 marked.)

Harder 1-dimensional dataset
Remember how permitting non-linear basis functions made linear regression so much nicer? Let’s permit them here too.
(Figure: the same datapoints; x = 0 marked.)
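A small sketch of the idea, assuming the basis expansion z = (x, x²), which is a common choice but is not shown in this transcript. The data, weights, and function names are hypothetical; the weights are set by hand rather than learned.

```python
def basis(x):
    """Map a 1-D input to the 2-D feature vector z = (x, x^2) (assumed mapping)."""
    return (x, x * x)


def linear_classifier(z, w, b):
    """sign(w . z - b), following the convention of the earlier slides."""
    score = sum(wi * zi for wi, zi in zip(w, z)) - b
    return 1 if score > 0 else -1


# Toy data: +1 at both ends, -1 in the middle, so no threshold on x separates it.
xs = [-5.0, -4.0, -1.0, 0.0, 1.0, 4.0, 5.0]
ys = [+1, +1, -1, -1, -1, +1, +1]

# In z-space the classes are split by the horizontal line x^2 = 8.5,
# the midpoint between the closest squared values 1 and 16.
w, b = (0.0, 1.0), 8.5
predictions = [linear_classifier(basis(x), w, b) for x in xs]
print(predictions == ys)
```

The classifier is still linear in z; the non-linearity lives entirely in the mapping from x to z.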

Common SVM basis functions
z_k = (polynomial terms of x_k of degree 1 to q)
z_k = (radial basis functions of x_k)
z_k = (sigmoid functions of x_k)
This is sensible. Is that the end of the story? No… there’s one more trick!
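The three families can be sketched for a scalar input as follows. The specific forms (Gaussian bumps for the radial basis functions, tanh units for the sigmoids) and all parameter names are assumptions made for illustration; the slide does not fix them.

```python
import math


def polynomial_terms(x, q):
    """Powers of x from degree 1 up to q."""
    return [x ** d for d in range(1, q + 1)]


def radial_basis(x, centres, width):
    """Gaussian bumps around hypothetical centres with a shared width."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def sigmoid_terms(x, slopes, offsets):
    """tanh units with hypothetical slopes and offsets."""
    return [math.tanh(a * x + c) for a, c in zip(slopes, offsets)]


x = 2.0
print(polynomial_terms(x, 3))
print(radial_basis(x, centres=[0.0, 2.0], width=1.0))
print(sigmoid_terms(x, slopes=[1.0], offsets=[-2.0]))
```

Each function returns the feature vector z_k for one input x_k; a linear SVM is then trained on the z_k instead of the raw x_k.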

SVM Related Links
C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
SVM light: software (in C).
Book: N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press.

Classification - CBA
CS 685: Special Topics in Data Mining, Spring 2008
Jinze Liu, The University of Kentucky

Association Rules
Itemset X = {x_1, …, x_k}
Find all the rules X → Y with minimum support and confidence:
– support, s, is the probability that a transaction contains X ∪ Y
– confidence, c, is the conditional probability that a transaction having X also contains Y
(Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both.)
Transaction-id   Items bought
100              f, a, c, d, g, i, m, p
200              a, b, c, f, l, m, o
300              b, f, h, j, o
400              b, c, k, s, p
500              a, f, c, e, l, p, m, n
Let sup_min = 50%, conf_min = 50%.
Association rules:
A → C (60%, 100%)
C → A (60%, 75%)
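The two rules quoted on this slide can be checked directly against the transaction table. A minimal sketch; the function names are hypothetical and item names are written in lower case.

```python
# The five transactions from the slide, keyed by transaction id.
transactions = {
    100: {"f", "a", "c", "d", "g", "i", "m", "p"},
    200: {"a", "b", "c", "f", "l", "m", "o"},
    300: {"b", "f", "h", "j", "o"},
    400: {"b", "c", "k", "s", "p"},
    500: {"a", "f", "c", "e", "l", "p", "m", "n"},
}


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def support(lhs, rhs):
    """Fraction of transactions containing both sides of the rule."""
    return count(lhs | rhs) / len(transactions)


def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return count(lhs | rhs) / count(lhs)


for lhs, rhs in [({"a"}, {"c"}), ({"c"}, {"a"})]:
    print(sorted(lhs), "->", sorted(rhs),
          f"support={support(lhs, rhs):.0%}",
          f"confidence={confidence(lhs, rhs):.0%}")
```

Item a occurs in three transactions and c in four, and all three transactions with a also contain c, which gives the 100% and 75% confidences.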

Classification based on Association
Classification rule mining versus association rule mining:
Aim
– Classification rule mining: a small set of rules to use as a classifier
– Association rule mining: all rules that satisfy minsup and minconf
Syntax
– Classification rule mining: X → y
– Association rule mining: X → Y

Why & How to Integrate
Both classification rule mining and association rule mining are indispensable to practical applications.
The integration is done by focusing on a special subset of association rules whose right-hand side is restricted to the classification class attribute.
– CARs: class association rules

CBA: Three Steps
1. Discretize continuous attributes, if any.
2. Generate all class association rules (CARs).
3. Build a classifier based on the generated CARs.

Our Objectives
– To generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints.
– To build a classifier from the CARs.

Rule Generator: Basic Concepts
Ruleitem
– <condset, y>: condset is a set of items, y is a class label
– Each ruleitem represents a rule: condset -> y
condsupCount
– The number of cases in D that contain condset
rulesupCount
– The number of cases in D that contain the condset and are labeled with class y
Support = (rulesupCount / |D|) * 100%
Confidence = (rulesupCount / condsupCount) * 100%

RG: Basic Concepts (Cont.)
Frequent ruleitems
– A ruleitem is frequent if its support is above minsup
Accurate rule
– A rule is accurate if its confidence is above minconf
Possible rule
– For all ruleitems that have the same condset, the ruleitem with the highest confidence is the possible rule of this set of ruleitems.
The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.
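The three definitions combine into a simple filter. A minimal sketch, assuming "above" means greater than or equal to the threshold and that minsup and minconf are given as fractions; the RuleItem type, the function name, and the counts in the example are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for one ruleitem and its two counts.
RuleItem = namedtuple("RuleItem", "condset label condsup_count rulesup_count")


def select_cars(ruleitems, n_cases, minsup, minconf):
    """Per condset, keep the highest-confidence ruleitem if frequent and accurate."""
    best = {}  # condset -> (confidence, ruleitem) of the possible rule
    for r in ruleitems:
        conf = r.rulesup_count / r.condsup_count
        if r.condset not in best or conf > best[r.condset][0]:
            best[r.condset] = (conf, r)
    cars = []
    for conf, r in best.values():
        sup = r.rulesup_count / n_cases
        if sup >= minsup and conf >= minconf:
            cars.append(r)
    return cars


ab = frozenset({("A", 1), ("B", 1)})
ruleitems = [
    RuleItem(ab, 1, 3, 2),                       # confidence 2/3
    RuleItem(ab, 2, 3, 1),                       # same condset, confidence 1/3
    RuleItem(frozenset({("A", 2)}), 1, 4, 1),    # confidence 1/4
]
cars = select_cars(ruleitems, n_cases=10, minsup=0.1, minconf=0.5)
print([(sorted(r.condset), r.label) for r in cars])
```

Ties in confidence are resolved here by keeping the first ruleitem seen, which is a simplification.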

RG: An Example
A ruleitem: <{(A,1),(B,1)}, (class,1)>
– Assume that the support count of the condset (condsupCount) is 3, the support count of this ruleitem (rulesupCount) is 2, and |D| = 10
– Then for (A,1),(B,1) -> (class,1):
  supt = 20% = (rulesupCount / |D|) * 100%
  confd = 66.7% = (rulesupCount / condsupCount) * 100%

RG: The Algorithm
// count the item and class occurrences to determine the frequent 1-ruleitems, and prune them
F_1 = {large 1-ruleitems};
CAR_1 = genRules(F_1);
prCAR_1 = pruneRules(CAR_1);
for (k = 2; F_{k-1} ≠ Ø; k++) do
    // generate the candidate ruleitems C_k using the frequent ruleitems F_{k-1}
    C_k = candidateGen(F_{k-1});
    for each data case d ∈ D do    // scan the database
        // find all the ruleitems in C_k whose condsets are supported by d
        C_d = ruleSubset(C_k, d);
        for each candidate c ∈ C_d do
            // update the support counts of the candidates in C_k
            c.condsupCount++;
            if d.class = c.class then c.rulesupCount++;
        end
    end

RG: The Algorithm (cont.)
    // select the new frequent ruleitems to form F_k
    F_k = {c ∈ C_k | c.rulesupCount ≥ minsup};
    // select the ruleitems that are both accurate and frequent
    CAR_k = genRules(F_k);
    prCAR_k = pruneRules(CAR_k);
end
CARs = ∪_k CAR_k;
prCARs = ∪_k prCAR_k;
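The level-wise loop can be sketched in Python as below. This is a simplified illustration, not the course's or the CBA authors' implementation: pruneRules is omitted, candidateGen is replaced by a plain pairwise join of frequent condsets, minsup and minconf are fractions rather than counts, and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cba_rg(cases, minsup, minconf):
    """Return CARs as (condset, label, support, confidence) tuples.

    cases: list of (items, label); items is a frozenset of (attribute, value).
    """
    n = len(cases)
    cars = []
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for items, _ in cases for item in items}
    while candidates:
        condsup = defaultdict(int)  # condset -> condsupCount
        rulesup = defaultdict(int)  # (condset, label) -> rulesupCount
        for items, label in cases:  # one scan of the database per level
            for condset in candidates:
                if condset <= items:
                    condsup[condset] += 1
                    rulesup[(condset, label)] += 1
        # F_k: ruleitems whose support reaches minsup.
        frequent = {key: c for key, c in rulesup.items() if c / n >= minsup}
        # genRules: per condset keep the most confident ruleitem, if accurate.
        best = {}
        for (condset, label), c in frequent.items():
            conf = c / condsup[condset]
            if condset not in best or conf > best[condset][3]:
                best[condset] = (condset, label, c / n, conf)
        cars.extend(rule for rule in best.values() if rule[3] >= minconf)
        # Simplified candidateGen: unions of two frequent condsets that
        # are exactly one item larger than the current level.
        k = len(next(iter(candidates))) + 1
        freq_condsets = {condset for condset, _ in frequent}
        candidates = {a | b for a, b in combinations(freq_condsets, 2)
                      if len(a | b) == k}
    return cars


# Toy dataset, invented for illustration only.
cases = [
    (frozenset({("A", 1), ("B", 1)}), 1),
    (frozenset({("A", 1), ("B", 1)}), 1),
    (frozenset({("A", 1), ("B", 2)}), 2),
    (frozenset({("A", 2), ("B", 1)}), 2),
    (frozenset({("A", 2), ("B", 2)}), 2),
]
rules = cba_rg(cases, minsup=0.4, minconf=0.6)
for condset, label, sup, conf in rules:
    print(sorted(condset), "->", label, f"sup={sup:.2f}", f"conf={conf:.2f}")
```

Candidates that pair two values of the same attribute are generated but never match a case, so they drop out at the support check; a fuller candidateGen would avoid creating them.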