Classification - SVM
CS 685: Special Topics in Data Mining, University of Kentucky, Spring 2008
Jinze Liu

Linear Classifiers (slides © 2001, 2003, Andrew W. Moore). A linear classifier f(x, w, b) = sign(w · x - b) maps an input x to an estimated label y_est; in the figure, the two point markers denote the classes +1 and -1. How would you classify this data?
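As a minimal sketch of this decision rule (the values of w, b, and x are illustrative, not from the slides):

    import numpy as np

    def linear_classify(x, w, b):
        # f(x, w, b) = sign(w.x - b): returns +1 or -1 (ties mapped to +1)
        return 1 if np.dot(w, x) - b >= 0 else -1

    print(linear_classify(np.array([2.0, 3.0]), np.array([1.0, 1.0]), 4.0))   # +1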

Linear Classifiers. (The figure shows several candidate separating lines.) Any of these would be fine... but which is best?

Classifier Margin. Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin. The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). The support vectors are the datapoints that the margin pushes up against.

Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV (leave-one-out cross-validation) is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Estimate the Margin. What is the distance from a point x to the line (hyperplane) w · x + b = 0? It is d(x) = |w · x + b| / ||w||.
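For instance, a minimal numeric check of this formula (w, b, and x are made-up values, using NumPy):

    import numpy as np

    w = np.array([2.0, 1.0])   # hypothetical weight vector
    b = -4.0                   # hypothetical bias
    x = np.array([3.0, 0.5])   # a hypothetical point

    # Distance from x to the hyperplane w.x + b = 0:  |w.x + b| / ||w||
    dist = abs(w @ x + b) / np.linalg.norm(w)
    print(dist)   # |6.5 - 4| / sqrt(5) ≈ 1.118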

Estimate the Margin. What is the expression for the margin? It is the distance from the boundary to the closest datapoint: margin = min over all training points x_i of |w · x_i + b| / ||w||.

Maximize Margin. We want the w and b that maximize the margin: maximize over w, b the quantity min_i |w · x_i + b| / ||w||, subject to every training point being classified correctly, y_i (w · x_i + b) > 0. As stated this is a min-max problem, like a game problem, and is awkward to optimize directly. Strategy: rescaling (w, b) does not change the classifier, so fix the scale so that min_i y_i (w · x_i + b) = 1. The margin then equals 1 / ||w||, and maximizing it is equivalent to minimizing ||w||² / 2 subject to the linear constraints y_i (w · x_i + b) ≥ 1 for all i.

Learning via Quadratic Programming. Quadratic programming (QP) is a well-studied class of optimization problems: optimize a quadratic function of some real-valued variables subject to linear constraints. The maximum-margin formulation above (minimize ||w||² / 2 subject to y_i (w · x_i + b) ≥ 1) is exactly such a problem, so an SVM can be trained with standard QP solvers.
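As an illustration, here is a minimal sketch that trains a linear SVM on made-up toy data; it assumes scikit-learn is installed (the library solves the underlying QP internally). Note that scikit-learn's decision rule is sign(w · x + b), so the sign convention on the bias differs from the slides' sign(w · x - b).

    import numpy as np
    from sklearn.svm import SVC

    # Made-up, linearly separable 2-D toy data.
    X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
                  [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    # A very large C approximates the hard-margin (maximum-margin) classifier.
    clf = SVC(kernel='linear', C=1e6).fit(X, y)

    w = clf.coef_[0]        # learned weight vector w
    b = clf.intercept_[0]   # learned bias (decision rule: sign(w.x + b))
    print('w =', w, ' b =', b)
    print('margin =', 1.0 / np.linalg.norm(w))        # 1/||w|| after scaling
    print('support vectors:', clf.support_vectors_)   # points the margin pushes against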

SVM Related Links
– C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
– SVMlight: an SVM implementation in C.
– Book: N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

Classification - CBA
CS 685: Special Topics in Data Mining, University of Kentucky, Spring 2008
Jinze Liu

Association Rules
Itemset X = {x_1, …, x_k}. Find all the rules X ⇒ Y with minimum support and confidence:
– support, s: the probability that a transaction contains X ∪ Y
– confidence, c: the conditional probability that a transaction having X also contains Y

Transaction-id   Items bought
100              f, a, c, d, g, i, m, p
200              a, b, c, f, l, m, o
300              b, f, h, j, o
400              b, c, k, s, p
500              a, f, c, e, l, p, m, n

(The slide's Venn diagram shows customers who buy beer, customers who buy diapers, and those who buy both.)

Let sup_min = 50%, conf_min = 50%. Association rules found: A ⇒ C (60%, 100%) and C ⇒ A (60%, 75%).
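To make the two measures concrete, a small pure-Python sketch that recomputes both rules from the table above:

    # Transactions from the table above.
    D = {
        100: {'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
        200: {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
        300: {'b', 'f', 'h', 'j', 'o'},
        400: {'b', 'c', 'k', 's', 'p'},
        500: {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'},
    }

    def support(itemset):
        # Fraction of transactions containing every item in itemset.
        return sum(itemset <= t for t in D.values()) / len(D)

    def confidence(X, Y):
        # P(Y in t | X in t) = support(X ∪ Y) / support(X)
        return support(X | Y) / support(X)

    print(support({'a', 'c'}))        # 0.6  -> both rules have 60% support
    print(confidence({'a'}, {'c'}))   # 1.0  -> A ⇒ C: 100% confidence
    print(confidence({'c'}, {'a'}))   # 0.75 -> C ⇒ A: 75% confidence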

Classification Based on Association
Classification rule mining versus association rule mining:
– Aim: a small set of rules to use as a classifier, versus all rules satisfying minsup and minconf
– Syntax: X ⇒ y (y is a class label), versus X ⇒ Y (Y is any itemset)

Why & How to Integrate
Both classification rule mining and association rule mining are indispensable in practical applications. The integration is done by focusing on a special subset of association rules whose right-hand sides are restricted to the classification class attribute: the class association rules (CARs).

CBA: Three Steps
– Discretize continuous attributes, if any.
– Generate all class association rules (CARs).
– Build a classifier based on the generated CARs.

Our Objectives
– To generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints.
– To build a classifier from the CARs.

Rule Generator: Basic Concepts
Ruleitem <condset, y>: condset is a set of items and y is a class label. Each ruleitem represents a rule condset ⇒ y.
– condsupCount: the number of cases in D that contain condset
– rulesupCount: the number of cases in D that contain condset and are labeled with class y
– support = (rulesupCount / |D|) × 100%
– confidence = (rulesupCount / condsupCount) × 100%

RG: Basic Concepts (Cont.)
– Frequent ruleitem: a ruleitem is frequent if its support is above minsup.
– Accurate rule: a rule is accurate if its confidence is above minconf.
– Possible rule: among all ruleitems that have the same condset, the one with the highest confidence is the possible rule of that set of ruleitems.
The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate.

RG: An Example
A ruleitem <{(A, 1), (B, 1)}, (class, 1)>:
– assume the support count of the condset (condsupCount) is 3, the support count of the ruleitem (rulesupCount) is 2, and |D| = 10
– then for (A, 1), (B, 1) ⇒ (class, 1): support = (rulesupCount / |D|) × 100% = 20%, confidence = (rulesupCount / condsupCount) × 100% = 66.7%
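The same arithmetic as a tiny Python check (values copied from the example):

    D_size = 10          # |D|
    condsup_count = 3    # cases containing the condset {(A,1), (B,1)}
    rulesup_count = 2    # cases containing the condset and labeled (class,1)

    support = rulesup_count / D_size * 100            # 20.0
    confidence = rulesup_count / condsup_count * 100  # 66.66...
    print(f"support = {support:.1f}%, confidence = {confidence:.1f}%")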

RG: The Algorithm
1   F_1 = {large 1-ruleitems};
2   CAR_1 = genRules(F_1);
3   prCAR_1 = pruneRules(CAR_1);          // count the item and class occurrences to determine the frequent 1-ruleitems, then prune
4   for (k = 2; F_{k-1} ≠ Ø; k++) do
5       C_k = candidateGen(F_{k-1});      // generate the candidate ruleitems C_k using the frequent ruleitems F_{k-1}
6       for each data case d ∈ D do       // scan the database
7           C_d = ruleSubset(C_k, d);     // find all the ruleitems in C_k whose condsets are supported by d
8           for each candidate c ∈ C_d do
9               c.condsupCount++;
10              if d.class = c.class then c.rulesupCount++;   // update the support counts of the candidates in C_k
11          end
12      end

RG: The Algorithm (cont.)
13      F_k = {c ∈ C_k | c.rulesupCount ≥ minsup};   // the new frequent ruleitems form F_k
14      CAR_k = genRules(F_k);                       // keep the ruleitems that are both frequent and accurate
15      prCAR_k = pruneRules(CAR_k);
16  end
17  CARs = ∪_k CAR_k;
18  prCARs = ∪_k prCAR_k;
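Below is a runnable Python sketch of this CBA-RG loop, under stated simplifications: items are encoded as attribute=value strings, candidateGen joins frequent condsets Apriori-style, and pruneRules is omitted (these slides do not specify CBA's pessimistic-error pruning). Names like cba_rg are my own.

    def cba_rg(D, minsup, minconf):
        # D: list of (set_of_items, class_label); minsup/minconf as fractions.
        # Returns CARs as a dict {(condset_tuple, y): (support, confidence)}.
        n = len(D)
        classes = {y for _, y in D}

        def count(cands):
            # Scan the database, updating condsupCount/rulesupCount per candidate,
            # then keep the frequent ruleitems (lines 6-13 of the pseudocode).
            counts = {c: [0, 0] for c in cands}
            for items, label in D:
                for cond, y in cands:
                    if set(cond) <= items:             # condset supported by case d
                        counts[(cond, y)][0] += 1      # condsupCount++
                        if label == y:
                            counts[(cond, y)][1] += 1  # rulesupCount++
            return {c: v for c, v in counts.items() if v[1] / n >= minsup}

        def gen_rules(F):
            # genRules: keep frequent ruleitems that are also accurate (line 14).
            return {(cond, y): (rs / n, rs / cs)
                    for (cond, y), (cs, rs) in F.items() if rs / cs >= minconf}

        items = sorted({i for t, _ in D for i in t})
        F = count([((i,), y) for i in items for y in classes])   # F_1
        CARs = gen_rules(F)                                      # CAR_1
        k = 2
        while F:
            conds = sorted({cond for cond, _ in F})
            # candidateGen: Apriori-style join of frequent (k-1)-condsets (line 5).
            new_conds = {tuple(sorted(set(a) | set(b)))
                         for a in conds for b in conds
                         if len(set(a) | set(b)) == k}
            F = count([(c, y) for c in new_conds for y in classes])  # F_k
            CARs.update(gen_rules(F))                                # CAR_k
            k += 1
        return CARs

    # Tiny made-up training set.
    D = [({'A=1', 'B=1'}, 'c1'), ({'A=1', 'B=0'}, 'c1'),
         ({'A=0', 'B=1'}, 'c2'), ({'A=1', 'B=1'}, 'c1')]
    for (cond, y), (sup, conf) in sorted(cba_rg(D, 0.5, 0.6).items()):
        print(cond, '->', y, f'(sup={sup:.0%}, conf={conf:.0%})')

On this toy data the loop finds, for example, ('A=1',) ⇒ c1 with 75% support and 100% confidence, mirroring how F_1, CAR_1, F_2, and CAR_2 are built in the pseudocode.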