IS422P - Data Mining [2013]
3- Classification
Mervat AbuElkheir
Information Systems Department
Faculty of Computer and Information Sciences, Mansoura University

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Agenda
- The Basics: What is Classification?; General Approach
- Decision Tree Induction: The Algorithm; Attribute Selection Measures; Tree Pruning; Extracting Rules from Decision Trees
- Bayes Classification: Bayes' Theorem; Naïve Bayesian Classification
- Lazy Learners: K-Nearest Neighbor Classifiers
- Model Evaluation: Metrics for Evaluating Classifier Performance; Holdout, Random Subsampling, and Cross-Validation
- Improving Classification Accuracy: Bagging
The Basics: What is Classification?
Motivation is prediction:
- Is a bank loan applicant "safe" or "risky"?
- Which treatment is better for a patient, "treatment X" or "treatment Y"?
Classification is a data analysis task in which a model is constructed to predict class labels (categories).
The Basics: General Approach
A two-step process:
1. Learning (training) step: construct the classification model
   - Build a classifier for a predetermined set of classes
   - Learn from a training dataset (data tuples together with their associated class labels), i.e. supervised learning
2. Classification step: the model is used to predict class labels for new data (the test set)
The Basics: General Approach
[Figure: the learning step. Training data (attribute vectors with class labels) is fed to a classification algorithm, which produces the classifier, e.g. a set of classification rules.]
The Basics: General Approach
[Figure: the classification step. The learned classification rules are first applied to test data to estimate classifier accuracy (the % of test-set tuples correctly classified) and so guard against overfitting; the rules are then used to predict the classification of new data, e.g. the tuple (Mervat Fahmy, youth, medium) gets the loan decision "Risky".]
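The accuracy estimate described above ("% of test-set tuples correctly classified") can be sketched in a few lines. The toy rule and tuples below are purely illustrative, not from the slides:

```python
def accuracy(classifier, test_set):
    """Fraction of test tuples whose predicted label matches the true label."""
    correct = sum(1 for x, label in test_set if classifier(x) == label)
    return correct / len(test_set)

# Hypothetical one-rule classifier: flag any "youth" applicant as risky.
rule = lambda x: "risky" if x["age"] == "youth" else "safe"
test = [({"age": "youth"}, "risky"), ({"age": "senior"}, "safe"),
        ({"age": "youth"}, "safe")]
print(accuracy(rule, test))  # 2 of 3 tuples classified correctly
```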
Decision Tree Induction
Learning of decision trees from a training dataset. A decision tree is a flowchart-like tree structure:
- Internal node: a test on an attribute
- Branch: an outcome of the test
- Leaf node: a class label
The constructed tree can be binary or otherwise.
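The node/branch/leaf structure above maps naturally onto nested data. A minimal sketch, with a hypothetical loan tree (the attribute names and labels are illustrative, not the slides' actual model):

```python
# A decision tree as nested tuples: ("attribute", {outcome: subtree}) for an
# internal node, a bare string for a leaf's class label.
loan_tree = ("age", {
    "youth": ("student", {"no": "risky", "yes": "safe"}),
    "middle_aged": "safe",
    "senior": ("credit_rating", {"fair": "safe", "excellent": "risky"}),
})

def classify(tree, tuple_):
    """Walk from the root, following the branch matching each test outcome."""
    while isinstance(tree, tuple):      # internal node: (attribute, branches)
        attr, branches = tree
        tree = branches[tuple_[attr]]
    return tree                         # leaf: class label

print(classify(loan_tree, {"age": "youth", "student": "yes"}))  # safe
```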
Example DT: Survival of Passengers on the Titanic
[Figure: decision tree. The root asks "Is gender male?"; males are further split on "Is age > 9.5?" and then "Is sibsp > 2.5?" (sibsp = # spouses/siblings aboard). Each leaf shows the probability of survival and the % of observations in the leaf, e.g. a "Died" leaf with 0.05 / 2% and a "Survived" leaf with 0.89 / 2%.]
Decision Tree Induction: Benefits
- No domain knowledge required
- No parameter setting
- Can handle multidimensional data
- Easy-to-understand representation
- Simple and fast
Decision Tree Induction: A Bit of History
- Late 1970s: J. Ross Quinlan developed ID3 (ID stands for Iterative Dichotomiser)
- Early 1980s: Quinlan developed C4.5, ID3's successor, which became a benchmark algorithm
- 1984: L. Breiman, J. Friedman, R. Olshen, and C. Stone developed CART (Classification and Regression Trees), which builds binary DTs
All three are greedy algorithms that construct the tree top-down in a recursive divide-and-conquer manner.
Decision Tree Induction: The Algorithm
[Figure: a node N is created for the full training dataset D. If all tuples in D belong to the same class, N becomes a leaf; otherwise an attribute selection method is applied to the attribute list to determine the splitting attribute together with its split point(s) or splitting subsets.]
Decision Tree Induction: The Algorithm
[Figure: the splitting criterion at N has outcomes 1..n, which divide the training dataset D into partitions 1..n. For each partition, if all of its tuples belong to the same class it becomes a leaf; otherwise the attribute selection method is applied again and the procedure recurses on that partition.]
Decision Tree Induction: The Algorithm
[Figure: the form of the split depends on the splitting attribute. A discrete attribute gets one branch per value; a continuous attribute is split at a split-point into two ranges; a discrete attribute may instead be split on subsets of its values to yield a binary tree.]
Decision Tree Induction: The Algorithm
The splitting criterion is a test that determines:
- which attribute to test at node N
- the "best" way to partition D into mutually exclusive partitions
- which (and how many) branches to grow from node N to represent the test outcomes
The resulting partitions at each branch should be as "pure" as possible; a partition is "pure" if all its tuples belong to the same class. Once an attribute is chosen to split the training dataset, it is removed from the attribute list.
Decision Tree Induction: The Algorithm
Terminating conditions:
- All the tuples in D (represented at node N) belong to the same class
- There are no remaining attributes on which the tuples may be further partitioned: majority voting is employed, i.e. the node is converted into a leaf labeled with the most common class in the data partition
- There are no tuples for a given branch: a leaf is created with the majority class of the data partition
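The greedy top-down procedure and its terminating conditions can be sketched as follows. This is a minimal ID3-style sketch (entropy-based selection, one branch per observed attribute value); the third terminating condition cannot arise here because branches are only grown for values present in the partition:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, attrs):
    """Greedy top-down induction following the terminating conditions above."""
    if len(set(labels)) == 1:       # (1) all tuples in the same class -> leaf
        return labels[0]
    if not attrs:                   # (2) no attributes left -> majority voting
        return majority(labels)
    # choose the attribute whose split leaves the lowest weighted entropy
    def split_entropy(a):
        return sum(
            (len(sub) / len(labels)) * entropy(sub)
            for v in set(r[a] for r in rows)
            for sub in [[l for r, l in zip(rows, labels) if r[a] == v]]
        )
    best = min(attrs, key=split_entropy)
    branches = {}
    for v in set(r[best] for r in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        sub_rows, sub_labels = zip(*sub)
        # the chosen attribute is removed from the list, as the slides note
        branches[v] = build_tree(list(sub_rows), list(sub_labels),
                                 [a for a in attrs if a != best])
    return (best, branches)

# Tiny illustrative dataset: attribute "a" separates the classes perfectly.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
tree = build_tree(rows, ["no", "no", "yes", "yes"], ["a", "b"])
print(tree)
```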
Decision Tree Induction: Attribute Selection Measures
An attribute selection measure is a heuristic for selecting the splitting criterion that "best" splits a given data partition into smaller, mutually exclusive partitions. Attributes are ranked according to the measure, and the attribute with the best score is chosen as the splitting attribute (together with a split-point for continuous attributes, or a splitting subset for discrete attributes in binary trees). Common measures: Information Gain, Gain Ratio, Gini Index.
Decision Tree Induction: Attribute Selection Measures
Information Gain
- Based on Shannon's information theory
- The goal is to minimize the expected number of tests needed to classify a tuple, so that a simple tree is found
- The attribute with the highest information gain is chosen as the splitting attribute: it minimizes the information needed to classify tuples in the resulting partitions, i.e. it leaves the least "impurity" in those partitions
Decision Tree Induction: Attribute Selection Measures
Given m class labels C_i, i = 1..m, the expected information needed to classify a tuple in D is

info(D) = entropy(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

where p_i is the probability that an arbitrary tuple in D belongs to class C_i:

p_i = |C_{i,D}| / |D|

with C_{i,D} the set of tuples having class label C_i in partition D.
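The entropy formula translates directly into code. A small sketch, taking the per-class tuple counts as input:

```python
from math import log2

def info(counts):
    """Expected information (entropy) of a partition, from per-class counts.
    Zero-count classes are skipped, since lim p->0 of p*log2(p) is 0."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

# 9 "yes" and 5 "no" tuples, as in the buys_computer example further below:
print(info([9, 5]))  # ~0.940 bits
```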
Decision Tree Induction: Attribute Selection Measures
How much more information would be needed after partitioning to arrive at a "pure" classification? The expected information required to classify a tuple from D, after partitioning on attribute A into v partitions D_1..D_v, is

info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \times info(D_j)

The smaller the expected information still required, the greater the purity of the partitions.
Decision Tree Induction: Attribute Selection Measures
Information gain is the difference between the original information requirement (based on the proportion of classes) and the new requirement (after partitioning on A):

Gain(A) = info(D) - info_A(D)

Gain(A) tells us how much would be gained by branching on A: the expected reduction in the information requirement caused by knowing the value of A. The attribute A with the highest Gain(A) is chosen as the splitting attribute at node N.
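Combining the two formulas gives a compact gain computation. A sketch operating on class counts (overall, and per resulting partition):

```python
from math import log2

def info(counts):
    """Entropy of a partition from its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

def gain(class_counts, partition_counts):
    """Gain(A) = info(D) - info_A(D), where info_A(D) is the weighted sum of
    the entropies of the partitions produced by splitting on A."""
    n = sum(class_counts)
    info_a = sum(sum(p) / n * info(p) for p in partition_counts)
    return info(class_counts) - info_a

# Splitting the 9-yes/5-no data on age yields yes/no counts [2,3], [4,0], [3,2]:
print(gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.246 bits, up to rounding
```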
Decision Tree Induction: Attribute Selection Measures

RID  age          income  student  credit_rating  Class: buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no

C1 = (buys_computer = yes): 9 tuples; C2 = (buys_computer = no): 5 tuples

1- Compute info(D) = -(9/14) \log_2(9/14) - (5/14) \log_2(5/14) = 0.940 bits
2- Compute info_age(D) = (5/14) \times (-(2/5) \log_2(2/5) - (3/5) \log_2(3/5)) + (4/14) \times (-(4/4) \log_2(4/4)) + (5/14) \times (-(3/5) \log_2(3/5) - (2/5) \log_2(2/5)) = 0.694 bits
Decision Tree Induction: Attribute Selection Measures
Continuing with the same training data (9 yes / 5 no), where info(D) = 0.940 bits and info_age(D) = 0.694 bits:
3- Compute Gain(age) = info(D) - info_age(D) = 0.940 - 0.694 = 0.246 bits
Decision Tree Induction: Attribute Selection Measures
Computed similarly for the remaining attributes:
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits
Gain(age) = 0.246 bits is the highest information gain, so age is chosen as the splitting attribute.
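All four gains can be reproduced from the 14-tuple buys_computer training set used in this example. A self-contained sketch:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Gain(attr) = info(D) - info_attr(D) over the observed attribute values."""
    n = len(labels)
    info_a = 0.0
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        info_a += len(sub) / n * entropy(sub)
    return entropy(labels) - info_a

# The 14 training tuples (9 yes / 5 no), in RID order:
data = [  # (age, income, student, credit_rating, buys_computer)
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
attrs = ("age", "income", "student", "credit_rating")
rows = [dict(zip(attrs, t[:4])) for t in data]
labels = [t[4] for t in data]
for a in attrs:
    print(a, round(gain(rows, labels, a), 3))
# age has the highest gain (~0.246 bits), so it is the splitting attribute
```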
Decision Tree Induction: Attribute Selection Measures
[Figure: the resulting decision tree. The root tests Age? with branches youth, middle-aged, and senior. The youth branch tests Student? (no -> No, yes -> Yes); the middle-aged branch is the leaf Yes; the senior branch tests Credit_rating? (fair -> Yes, excellent -> No).]
Decision Tree Induction: Attribute Selection Measures
Information gain for continuous attributes:
- Sort the values in increasing order
- Each midpoint between two adjacent values can serve as a split-point: the split-point between values v_i and v_{i+1} is (v_i + v_{i+1}) / 2
- For each candidate split-point, evaluate info_A(D) with the number of partitions equal to 2 (A <= split-point and A > split-point)
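The midpoint enumeration above can be sketched directly. The ages and labels in the usage line are illustrative, not taken from the slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split_point(values, labels):
    """Try the midpoint between each pair of adjacent sorted values and keep
    the one with the lowest info_A(D) over the two partitions (<= and >)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best, best_info = None, float("inf")
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                      # identical values: no midpoint here
        split = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [l for v, l in pairs if v <= split]
        right = [l for v, l in pairs if v > split]
        info_a = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if info_a < best_info:
            best, best_info = split, info_a
    return best

# Ages 25,30,35,40 with classes no,no,yes,yes: the pure cut is at (30+35)/2.
print(best_split_point([25, 30, 35, 40], ["no", "no", "yes", "yes"]))  # 32.5
```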
30
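The procedure above can be sketched as follows. This is a minimal illustration, not part of the slides: `entropy` and `best_split_point` are hypothetical helper names, and the expected-information computation follows the two-partition scheme just described.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split_point(values, labels):
    """Try each midpoint between adjacent sorted values as a binary
    split-point and return the one minimizing the expected information
    requirement Info_A(D)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no midpoint between equal values
        split = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= split]
        right = [lab for v, lab in pairs if v > split]
        info = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if info < best[1]:
            best = (split, info)
    return best
```

For a cleanly separable attribute such as values [20, 25, 30, 40, 45, 50] with classes no/no/no/yes/yes/yes, the midpoint 35.0 yields two pure partitions and Info_A(D) = 0.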
Decision Tree Induction – Tree Pruning
A decision tree may overfit anomalies and outliers in the training data. Pruning removes the least reliable branches, so the tree becomes less complex and typically generalizes better.
Prepruning: statistically assess the goodness of a split before it takes place; the difficulty is choosing appropriate thresholds for statistical significance.
Postpruning: remove sub-trees from the already constructed tree; each removed sub-tree is replaced with a leaf node labeled with the most frequent class in that sub-tree.
Decision Tree Induction – Rule Extraction from a Decision Tree
Rules represent information and knowledge:
IF you study well THEN you'll succeed
IF you're a student AND you have 5000 LE THEN you will most probably buy an iPad (with what confidence?)
How do we assess the goodness of a rule? With coverage and accuracy, where n_covers is the number of tuples covered by rule R and n_correct is the number of those tuples that R classifies correctly:
coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
Decision Tree Induction – Rule Extraction from a Decision Tree – What are Rules?
Training data D (buys_computer):
RID  age          income  student  credit_rating  buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle aged  medium  no       excellent      yes
13   middle aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)
coverage(R1) = 2/14 = 14.28% (R1 covers tuples 9 and 11)
accuracy(R1) = 2/2 = 100%
X: (age = youth, income = medium, student = yes, credit_rating = fair)
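The coverage and accuracy of R1 can be checked with a short script. This is an illustrative sketch: the tuple layout and the `rule_quality` function are assumptions, and the data is the 14-tuple buys_computer table above.

```python
# The 14-tuple buys_computer data as (age, income, student, credit_rating, class)
D = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle aged", "medium", "no", "excellent", "yes"),
    ("middle aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def rule_quality(data, antecedent, consequent):
    """coverage(R) = n_covers / |D|, accuracy(R) = n_correct / n_covers."""
    covers = [t for t in data if all(t[i] == v for i, v in antecedent)]
    correct = [t for t in covers if t[-1] == consequent]
    return len(covers) / len(data), len(correct) / len(covers)

# R1: (age = youth) AND (student = yes) => buys_computer = yes
cov, acc = rule_quality(D, [(0, "youth"), (2, "yes")], "yes")
```

With this data, `cov` is 2/14 (about 14.28%) and `acc` is 1.0 (100%), matching the slide.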
Decision Tree Induction – Rule Extraction from a Decision Tree
Create one rule for each path from the root to a leaf in the decision tree: each splitting criterion along the path is ANDed to form the rule antecedent (IF part), and the leaf node holds the class prediction (THEN part).
[Decision tree: root Age?; youth: Student? (no: No, yes: Yes); middle-aged: Yes; senior: Credit_rating? (fair: Yes, excellent: No)]
R1: IF age = youth AND student = no THEN buys_computer = no
R2: IF age = youth AND student = yes THEN buys_computer = yes
R3: IF age = middle aged THEN buys_computer = yes
R4: IF age = senior AND credit_rating = excellent THEN buys_computer = no
R5: IF age = senior AND credit_rating = fair THEN buys_computer = yes
Can the rules resulting from decision trees have conflicts?
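The one-rule-per-path procedure can be sketched as a tree walk. The nested-dict encoding of the tree is an illustrative assumption (not from the slides); the senior-branch outcomes follow the training data (fair: yes, excellent: no).

```python
def extract_rules(node, path=()):
    """Walk a nested-dict decision tree; one rule per root-to-leaf path.
    Internal node: {"attr": name, "branches": {value: subtree}}; a leaf is
    just the class label string."""
    if isinstance(node, str):          # leaf: emit (antecedent, class) rule
        return [(path, node)]
    rules = []
    for value, subtree in node["branches"].items():
        rules += extract_rules(subtree, path + ((node["attr"], value),))
    return rules

# A hypothetical encoding of the buys_computer tree sketched above
tree = {"attr": "age", "branches": {
    "youth": {"attr": "student", "branches": {"no": "no", "yes": "yes"}},
    "middle aged": "yes",
    "senior": {"attr": "credit_rating",
               "branches": {"fair": "yes", "excellent": "no"}},
}}

rules = extract_rules(tree)
```

The walk yields five rules, one for each of R1 through R5.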
Decision Tree Induction – Rule Extraction from a DT – Resolving Rule Conflicts
Rule conflicts arise when a tuple fires more than one rule with different class predictions. Two resolution strategies:
Size ordering: the rule with the largest (toughest) antecedent has the highest priority; it fires and returns its class prediction.
Rule ordering: rules are prioritized beforehand according to
Class-based ordering: classes are sorted by decreasing importance (e.g., most frequent first, in order of prevalence).
Rule-based ordering: rules are sorted by measures of rule quality (e.g., accuracy, size) or by domain expertise.
A fallback (default) rule fires when no other rule is triggered.
Note: under size ordering the rules are overall unordered and can be applied in any order when classifying a tuple; a disjunction (logical OR) is implied between the rules, and each rule is a standalone nugget of knowledge. In contrast, under rule ordering (a decision list) the rules must be applied in the prescribed order to avoid conflicts; each rule in a decision list implies the negation of the rules that come before it.
Bayes Classification Methods
Naïve Bayesian classifier: a statistical classifier that predicts the probability that a tuple belongs to a specific class.
Based on Bayes' theorem (Bayes was an 18th-century clergyman who worked on probability).
Advantages: high accuracy and speed.
Assumes class-conditional independence: each attribute's effect on class determination is independent of the other attributes.
Bayes Classification Methods – Bayes' Theorem
X is a tuple representing "evidence"; H is the hypothesis that X belongs to class C.
Goal: determine the posterior probability P(H|X), the probability that H holds given that we "observed" X, i.e., the probability that X ∈ C given that we know the attribute description of X.
P(X|H) is the probability that X has its specific attribute values given that we know its class. The posterior probability is based on more information (it is conditional).
P(H) is the prior probability of H: the probability that any tuple belongs to class C, independent of its attribute values.
P(X) is the probability that X has its specific attribute values.
Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
Bayes Classification Methods – Naïve Bayesian Classification
Given tuples with n attributes and m classes, the naïve Bayesian classifier predicts that X belongs to the class with the highest posterior probability, i.e., the class C_i for which
P(C_i|X) > P(C_j|X) for 1 ≤ j ≤ m, j ≠ i
C_i is called the maximum a posteriori hypothesis. By Bayes' theorem,
P(C_i|X) = P(X|C_i) P(C_i) / P(X)
Since P(X) is constant for all classes, only the numerator P(X|C_i) P(C_i) needs to be maximized. If the priors P(C_i) are unknown, assume they are uniform and maximize only P(X|C_i); otherwise estimate P(C_i) = |C_i,D| / |D|, where |C_i,D| is the number of training tuples of class C_i.
Bayes Classification Methods – Naïve Bayesian Classification
To reduce the computation of P(X|C_i), attributes are assumed to be class-conditionally independent (hence the "naïve" in the name):
P(X|C_i) = ∏_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) × P(x_2|C_i) × ⋯ × P(x_n|C_i)
If the attribute is categorical: P(x_k|C_i) = |C_i,D,x_k| / |C_i,D|, the fraction of class-C_i tuples having value x_k.
If the attribute is numerical, assume a Gaussian distribution:
P(x_k|C_i) = (1 / (√(2π) σ_{C_i})) e^{−(x_k − μ_{C_i})² / (2σ_{C_i}²)}
Evaluate P(X|C_i) P(C_i) for each class C_i and assign the class label of the class with the maximum value.
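For a numeric attribute, the Gaussian density above can be computed directly. A minimal sketch; the function name is illustrative.

```python
import math

def gaussian_likelihood(x_k, mu, sigma):
    """P(x_k|C_i) for a numeric attribute, assuming values of the attribute
    within class C_i follow a Gaussian with mean mu and std dev sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x_k - mu) ** 2) / (2.0 * sigma ** 2))
```

At the class mean the density peaks at 1/(√(2π) σ), and it is symmetric around μ.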
Bayes Classification Methods – Naïve Bayesian Classification – Example
Using the 14-tuple buys_computer training data from the earlier slide, let C1: buys_computer = yes (9 tuples) and C2: buys_computer = no (5 tuples), and classify
X: (age = youth, income = medium, student = yes, credit_rating = fair)
1. Compute the priors P(Ci):
P(C1) = 9/14 = 0.643
P(C2) = 5/14 = 0.357
2. Compute P(X|Ci):
P(X|C1) = P(age = youth | yes) × P(income = medium | yes) × P(student = yes | yes) × P(credit_rating = fair | yes) = 2/9 × 4/9 × 6/9 × 6/9 = 0.044
P(X|C2) = P(age = youth | no) × P(income = medium | no) × P(student = yes | no) × P(credit_rating = fair | no) = 3/5 × 2/5 × 1/5 × 2/5 = 0.019
3. Compare P(X|Ci)P(Ci):
P(X|C1)P(C1) = 0.044 × 0.643 = 0.028
P(X|C2)P(C2) = 0.019 × 0.357 = 0.007
The maximum is for C1, so the classifier predicts X ∈ C1 (buys_computer = yes).
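The arithmetic above can be verified with a short script. This is a sketch with illustrative names; the counts are read off the 14-tuple buys_computer training data.

```python
# Counts from the 14-tuple training data: 9 "yes" tuples, 5 "no" tuples
counts = {
    "yes": {"n": 9, "age=youth": 2, "income=medium": 4, "student=yes": 6, "credit=fair": 6},
    "no":  {"n": 5, "age=youth": 3, "income=medium": 2, "student=yes": 1, "credit=fair": 2},
}
total = 14
features = ["age=youth", "income=medium", "student=yes", "credit=fair"]

def score(cls):
    """P(X|Ci) * P(Ci) under the naive class-conditional independence assumption."""
    c = counts[cls]
    p = c["n"] / total                 # prior P(Ci)
    for f in features:                 # product of the P(xk|Ci)
        p *= c[f] / c["n"]
    return p

p_yes, p_no = score("yes"), score("no")
prediction = "yes" if p_yes > p_no else "no"
```

Rounding P(X|C1) = 2/9 × 4/9 × 6/9 × 6/9 to three places gives 0.044, and the larger product belongs to C1, so the script predicts "yes" as on the slide.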
Lazy Learners – K-Nearest Neighbor Classifiers
Delay classification until new test data is available; until then, simply store the training data.
Use a similarity measure to compute the distance between the test tuple and each training tuple (Euclidean, Manhattan, ...). Remember to normalize if ranges vary between attributes.
k stands for the number of "closest" neighbors of the test tuple according to the measured distance; majority voting over their class labels determines the class of the test tuple.
Lazy Learners – K-Nearest Neighbor – Example
RID  age  Loan ($)  Default
1    25   40,000    No
2    35   60,000    No
3    45   80,000    No
4    20   20,000    No
5    35   120,000   No
6    52   18,000    No
7    23   95,000    Yes
8    40   62,000    Yes
9    60   100,000   Yes
10   48   220,000   Yes
11   33   150,000   Yes
New tuple: (age = 48, Loan = $142,000, Default = ?)
Lazy Learners – K-Nearest Neighbor – Example
Euclidean distance: D = √((x1 − y1)² + (x2 − y2)²)
Distances from the new tuple to each training tuple (dominated by the unnormalized Loan attribute):
RID 1: 102,000; RID 2: 82,000; RID 3: 62,000; RID 4: 122,000; RID 5: 22,000; RID 6: 124,000; RID 7: 47,000; RID 8: 80,000; RID 9: 42,000; RID 10: 78,000; RID 11: 8,000
k = 1: the nearest neighbor is RID 11, so Default = Yes.
k = 3: the nearest neighbors are RIDs 11, 5, and 9 (classes Yes, No, Yes), so by majority vote Default = Yes.
Note how Loan dominates the distance because the attributes were not normalized.
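The worked example can be reproduced with a few lines. A sketch under stated assumptions: `knn_predict` is an illustrative name, and a couple of age values (e.g., for RID 5 and the query) were reconstructed where the slide's table was garbled; age barely affects these distances, which are dominated by Loan.

```python
import math

# (age, loan, default) - the 11 training tuples from the slide
data = [
    (25, 40000, "No"), (35, 60000, "No"), (45, 80000, "No"),
    (20, 20000, "No"), (35, 120000, "No"), (52, 18000, "No"),
    (23, 95000, "Yes"), (40, 62000, "Yes"), (60, 100000, "Yes"),
    (48, 220000, "Yes"), (33, 150000, "Yes"),
]
query = (48, 142000)

def knn_predict(data, q, k):
    """Majority vote among the k training tuples closest to q (Euclidean)."""
    ranked = sorted(data, key=lambda t: math.dist(t[:2], q))
    votes = [t[2] for t in ranked[:k]]
    return max(set(votes), key=votes.count)
```

For k = 1 and k = 3 the prediction is "Yes", as on the slide; the nearest neighbor (33, 150000) sits about 8,000 away.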
Model Evaluation – Metrics for Evaluating Classifier Performance
Positives (P): tuples of the class of interest. Negatives (N): tuples of the other class(es).
True Positives (TP): positive tuples correctly labeled. False Positives (FP): negative tuples incorrectly labeled as positive. True Negatives (TN): negative tuples correctly labeled. False Negatives (FN): positive tuples incorrectly labeled as negative.
accuracy (recognition rate) = (TP + TN) / (P + N)
error rate (misclassification rate) = (FP + FN) / (P + N)
sensitivity (true positive rate, recall) = TP / P
specificity (true negative rate) = TN / N
precision = TP / (TP + FP)
Model Evaluation – Metrics for Evaluating Classifier Performance
Confusion matrix (rows: actual class, columns: predicted class):
             Predicted Yes   Predicted No   Total
Actual Yes   TP              FN             P
Actual No    FP              TN             N
Total        P'              N'             P + N
Model Evaluation – Metrics for Evaluating Classifier Performance
Balanced classes: use accuracy and error rate.
Example confusion matrix (buys_computer):
             Predicted Yes   Predicted No   Total   Accuracy (%)
Actual Yes   6954            46             7000    99.34
Actual No    412             2588           3000    86.27
Total        7366            2634           10000   95.42
Model Evaluation – Metrics for Evaluating Classifier Performance
Imbalanced classes: use sensitivity (true positive rate, recall) and specificity instead of accuracy.
Example confusion matrix (cancer); sensitivity is low (30%) while specificity is high (98.56%):
             Predicted Yes   Predicted No   Total   Accuracy (%)
Actual Yes   90              210            300     30.00 (sensitivity)
Actual No    140             9560           9700    98.56 (specificity)
Total        230             9770           10000   96.50
An accuracy rate of, say, 97% may make the classifier seem quite accurate, but what if only, say, 3% of the training tuples are actually cancer? Such an accuracy rate may not be acceptable: the classifier could be correctly labeling only the noncancer tuples and misclassifying all the cancer tuples. Instead, we need measures that assess how well the classifier recognizes the positive tuples (cancer = yes) and how well it recognizes the negative tuples (cancer = no). A perfect precision score of 1.0 for a class C means that every tuple the classifier labeled as belonging to C does indeed belong to C; however, it tells us nothing about the class-C tuples the classifier mislabeled. A perfect recall score of 1.0 for C means that every item from class C was labeled as such, but it tells us nothing about how many other tuples were incorrectly labeled as belonging to C. There tends to be an inverse relationship between precision and recall: it is possible to increase one at the cost of reducing the other. For example, a medical classifier may achieve high precision by labeling as cancer only tuples that present a certain way, but may have low recall if it mislabels many other instances of cancer tuples. Precision and recall scores are typically used together, comparing precision values at a fixed value of recall (say, 0.75), or vice versa.
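The measures for both example confusion matrices can be checked with a small helper. A minimal sketch; `metrics` is an illustrative name, and the formulas are the ones listed earlier.

```python
def metrics(tp, fn, fp, tn):
    """Evaluation measures from a confusion matrix
    (actual P = TP + FN, actual N = FP + TN)."""
    p, n = tp + fn, fp + tn
    return {
        "accuracy":    (tp + tn) / (p + n),
        "error_rate":  (fp + fn) / (p + n),
        "sensitivity": tp / p,          # recall / true positive rate
        "specificity": tn / n,          # true negative rate
        "precision":   tp / (tp + fp),
    }

balanced = metrics(6954, 46, 412, 2588)   # buys_computer example
cancer = metrics(90, 210, 140, 9560)      # cancer example
```

The cancer matrix shows why accuracy alone misleads: overall accuracy is 96.50% even though only 30% of the actual cancer cases are recognized.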
Model Evaluation – Holdout and Random Subsampling
Holdout: randomly allocate 2/3 of the data for training and the remaining 1/3 for testing.
Random subsampling: repeat holdout k times and take the average accuracy.
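The two schemes can be sketched in a few lines; the function names and the `evaluate(train, test)` callback are illustrative assumptions.

```python
import random

def holdout_split(data, train_frac=2/3, seed=None):
    """Randomly allocate train_frac of the data for training, rest for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def random_subsampling(data, evaluate, k=10):
    """Repeat holdout k times and average the accuracies returned by
    evaluate(train, test)."""
    return sum(evaluate(*holdout_split(data, seed=i)) for i in range(k)) / k
```

Each holdout split partitions the data, so every tuple ends up in exactly one of the two sets.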
Model Evaluation – Cross-Validation
k-fold cross-validation: randomly partition the dataset into k mutually exclusive folds of approximately equal size. In iteration i, fold_i is the test set and all other folds together form the training set.
Accuracy = (total correct classifications over all k iterations) / (dataset size)
Stratified k-fold cross-validation: the class distribution in each fold is approximately the same as in the initial dataset. Stratified 10-fold cross-validation is recommended.
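Stratified fold construction can be sketched as follows (an illustrative round-robin scheme; `stratified_folds` is a hypothetical name). Each fold would then serve as the test set in one iteration, with the remaining folds as training data.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Partition tuple indices into k folds whose class distribution
    approximates the full dataset's."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():       # deal each class out round-robin
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds
```

For 6 "yes" and 4 "no" labels with k = 2, each fold receives 3 "yes" and 2 "no" indices, mirroring the 60/40 class distribution.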
Improving Classification Accuracy – Ensemble Methods
An ensemble is a set of classifiers, each of which casts a vote for a class label. Each base classifier is trained on a different partition (or sample) of the dataset, and majority voting composes the aggregate classification.
Improving Classification Accuracy Ensemble Methods - Bagging
11/16/2018 1:17 PM Improving Classification Accuracy Ensemble Methods - Bagging RID age income student Credit_rating Class: buys_computer 1 youth high no fair 2 excellent 3 middle aged yes 4 senior medium 5 low 6 7 8 9 10 11 12 13 14 3 14 5 7 4 13 9 12 6 10 Bootstrap same size as dataset, sampling with replacement Data Mining 2013 – Classification 11/16/201811/16/2018 © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
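The two ingredients of bagging shown on the slide, bootstrap sampling and majority voting, can be sketched as follows. This is a minimal illustration, not the lecture's code; the function names and the stand-in base classifiers are my own, and in real bagging each base classifier would be a model (e.g. a decision tree) trained on its own bootstrap sample:

```python
import random
from collections import Counter

def bootstrap_sample(records, rng):
    """Draw a bootstrap sample: same size as the dataset, sampling with
    replacement, so some tuples repeat and others are left out."""
    n = len(records)
    return [records[rng.randrange(n)] for _ in range(n)]

def bagging_predict(classifiers, x):
    """Return the majority vote of the base classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
rids = list(range(1, 15))           # RIDs 1..14, as in the slide's dataset
sample = bootstrap_sample(rids, rng)
print(len(sample))                  # 14: same size as the original data

# Three stand-in base classifiers voting on a class label.
clfs = [lambda x: "yes", lambda x: "no", lambda x: "yes"]
print(bagging_predict(clfs, None))  # "yes" wins 2 votes to 1
```

Running the sampler repeatedly with different seeds yields the varied bootstrap samples (such as the RID list on the slide) that give each base classifier a slightly different view of the data.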
Project
A 5-page individual survey on a focused topic. Deadline: 30/4/2013. Oral presentations will be held in the first or second week of May, 10 minutes per student.
Examples of topics:
- Data cleaning for social network analysis
- Data cleaning in heterogeneous information networks
- Quality data integration by resolving redundant or conflicting records
- Entity resolution for merging the same entities with different names
- Data mining methods to detect software bugs
- Detection of computer network intrusions by data mining
- Clustering* data streams in big data
- Methods for clustering trajectory data
- Methods for clustering spatiotemporal data
- Clustering for recommender systems
* "Clustering" in these topics may be replaced with frequent pattern mining, classification, or ensemble methods.
Questions?