
DATA MINING Introductory and Advanced Topics, Part II
Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University
Companion slides for the text by Dr. M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002.

Classification Outline
Classification Problem Overview
Classification Techniques:
– Regression
– Distance
– Decision Trees
– Rules
– Neural Networks
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.

Classification Problem
Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the Classification Problem is to define a mapping f: D → C where each ti is assigned to exactly one class.
The mapping effectively divides D into equivalence classes.
Prediction is similar, but may be viewed as having an infinite number of classes.

Classification Examples
Teachers classify students’ grades as A, B, C, D, or F.
Identify mushrooms as poisonous or edible.
Predict when a river will flood.
Identify individuals who are credit risks.
Speech recognition
Pattern recognition

Classification Example: Grading
If x >= 90 then grade = A.
If 80 <= x < 90 then grade = B.
If 70 <= x < 80 then grade = C.
If 60 <= x < 70 then grade = D.
If x < 60 then grade = F.
[Diagram: a decision tree that tests x against 90, 80, 70, and 60 in turn, with leaves A, B, C, D, and F.]
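These five rules translate directly into code. A minimal sketch (using the corrected x < 60 threshold for F):

```python
def grade(x: float) -> str:
    """Map a numeric score to one of the five grade classes."""
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

print([grade(x) for x in (95, 85, 75, 65, 55)])  # ['A', 'B', 'C', 'D', 'F']
```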

Classification Example: Letter Recognition
View letters as constructed from 5 components.
[Diagram: the letters A through F, each built from a small set of straight and curved strokes.]

Classification Techniques
Approach:
1. Create a specific model by evaluating training data (or using domain experts’ knowledge).
2. Apply the model to new data.
Classes must be predefined.
Most common techniques use decision trees or neural networks, or are based on distances or statistical methods.
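To give the distance-based flavor something concrete, here is a minimal 1-nearest-neighbor sketch; the training tuples and class names are invented for illustration:

```python
import math

def nearest_neighbor_classify(training, new_point):
    """Assign the new tuple to the class of its closest training tuple
    (1-nearest-neighbor, a simple distance-based technique)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, cls = min(training, key=lambda tc: dist(tc[0], new_point))
    return cls

# Hypothetical 2-D training data with two predefined classes
train = [((1.0, 1.0), "short"), ((2.0, 2.0), "tall"), ((1.2, 0.9), "short")]
print(nearest_neighbor_classify(train, (1.9, 2.1)))  # tall
```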

Defining Classes
[Diagrams: classes defined by partitioning the attribute space vs. by distance to class representatives.]

Issues in Classification
Missing Data
– Ignore
– Replace with assumed value
Measuring Performance
– Classification accuracy on test data
– Confusion matrix
– OC Curve

Height Example Data
[Table: 15 training tuples (name, gender, height) with two candidate class assignments, Output1 and Output2; shown as an image in the original slides.]

Classification Performance
True Positive, True Negative, False Positive, False Negative
[Diagram: the four possible outcomes of a binary classification decision.]

Confusion Matrix Example
Using the height data example, with Output1 as the correct assignment and Output2 as the actual assignment.
[Table: confusion matrix over the classes short, medium, and tall.]
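A confusion matrix is just a table of (correct, actual) counts. A small sketch; the class names and assignments below are hypothetical, since the height table itself survives only as an image:

```python
from collections import Counter

def confusion_matrix(correct, actual, classes):
    """Rows = correct class, columns = actual (assigned) class."""
    counts = Counter(zip(correct, actual))
    return [[counts[(c, a)] for a in classes] for c in classes]

classes = ["short", "medium", "tall"]
correct = ["short", "medium", "tall", "medium", "tall"]
actual  = ["short", "tall",   "tall", "medium", "medium"]
for cls, row in zip(classes, confusion_matrix(correct, actual, classes)):
    print(cls, row)
```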

Operating Characteristic Curve
[Figure: operating characteristic (OC) curve.]

Regression
Assume the data fits a predefined function.
Determine the best values for the regression coefficients c0, c1, …, cn.
Assume an error term ε: y = c0 + c1 x1 + … + cn xn + ε
Estimate the error using the mean squared error over the training set:
MSE = (1/n) Σi (yi − ŷi)²
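A minimal sketch of estimating the coefficients by least squares with NumPy; the data points are made up:

```python
import numpy as np

# Hypothetical training data: one predictor x1, response y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Fit y = c0 + c1*x1 by least squares
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x1]
(c0, c1), *_ = np.linalg.lstsq(X, y, rcond=None)

mse = np.mean((y - (c0 + c1 * x)) ** 2)     # training-set MSE
print(f"c0={c0:.3f}, c1={c1:.3f}, MSE={mse:.4f}")
```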

Linear Regression: Poor Fit
[Figure: data points poorly approximated by a straight line.]

Classification Using Regression
Division: Use the regression function to divide the area into regions.
Prediction: Use the regression function to predict a class membership function. Input includes the desired class.
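One way to read the division approach in code: a fitted regression function splits the attribute space, and a tuple's class depends on which side it falls. The coefficients and class names here are placeholders:

```python
def classify_by_division(x1, x2, c0=0.0, c1=1.0):
    """Division: the line x2 = c0 + c1*x1 (obtained by regression)
    splits the plane into two class regions."""
    return "Class A" if x2 > c0 + c1 * x1 else "Class B"

print(classify_by_division(1.0, 2.5))  # Class A
print(classify_by_division(1.0, 0.5))  # Class B
```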

Division
[Figure: regression line dividing the plane into class regions.]

Prediction
[Figure: regression used to predict class membership.]

Classification Using Decision Trees
Partitioning based: Divide the search space into rectangular regions.
A tuple is placed into a class based on the region within which it falls.
DT approaches differ in how the tree is built: DT Induction.
Internal nodes are associated with attributes; arcs are labeled with values of that attribute.
Algorithms: ID3, C4.5, CART

Decision Tree
Given:
– D = {t1, …, tn} where ti = ⟨ti1, …, tih⟩
– Database schema contains {A1, A2, …, Ah}
– Classes C = {C1, …, Cm}
A Decision or Classification Tree is a tree associated with D such that:
– Each internal node is labeled with an attribute Ai
– Each arc is labeled with a predicate that can be applied to the attribute at its parent
– Each leaf node is labeled with a class Cj
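This definition maps naturally onto a small data structure. A sketch, with a hypothetical Gender/Height tree in the spirit of the running example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DTNode:
    """Internal nodes carry an attribute Ai and predicate-labeled arcs;
    leaf nodes carry a class Cj."""
    attribute: Optional[str] = None      # set on internal nodes
    children: Optional[list] = None      # list of (predicate, DTNode) arcs
    label: Optional[str] = None          # set on leaf nodes

def classify(node: DTNode, t: dict) -> str:
    if node.label is not None:           # leaf: return its class
        return node.label
    value = t[node.attribute]
    for predicate, child in node.children:
        if predicate(value):             # follow the arc whose predicate holds
            return classify(child, t)

# Hypothetical tree: split on Gender, then on Height for males
tree = DTNode(attribute="Gender", children=[
    (lambda v: v == "F", DTNode(label="Medium")),
    (lambda v: v == "M", DTNode(attribute="Height", children=[
        (lambda h: h < 1.8, DTNode(label="Medium")),
        (lambda h: h >= 1.8, DTNode(label="Tall")),
    ])),
])
print(classify(tree, {"Gender": "M", "Height": 1.95}))  # Tall
```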

DT Induction
[The induction process itself is shown as an image in the original slides.]

DT Splits Area
[Figure: rectangular regions produced by splitting on Gender (M/F) and Height.]

Comparing DTs
[Figure: two trees for the same data, one balanced and one deep.]

DT Issues
Choosing Splitting Attributes
Ordering of Splitting Attributes
Splits
Tree Structure
Stopping Criteria
Training Data
Pruning

Decision Tree Induction is often based on Information Theory. So…

Information
[This slide's content is a figure in the original slides.]

DT Induction
When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
Use this approach with DT Induction!

Information/Entropy
Given probabilities p1, p2, …, ps whose sum is 1, Entropy is defined as:
H(p1, p2, …, ps) = Σi pi log(1/pi)
Entropy measures the amount of randomness, surprise, or uncertainty.
Goal in classification: no surprise, entropy = 0.
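A minimal entropy helper; base-10 logarithms are used here because the worked numbers on the following slides (e.g. 0.301 = log 2) assume them:

```python
import math

def entropy(probs, base=10):
    """H(p1, …, ps) = Σ pi · log(1/pi); zero-probability terms contribute 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 3))  # 0.301 — maximal surprise for 2 classes
print(entropy([1.0]))                 # 0.0   — no surprise, as desired
```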

Entropy
[Plot: H(p, 1−p) and log(1/p) as functions of p.]

ID3
Creates a tree using information theory concepts; tries to reduce the expected number of comparisons.
ID3 chooses the split attribute with the highest information gain:
Gain(D, S) = H(D) − Σi P(Di) H(Di)
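The gain computation follows directly from the entropy helper above; here `subsets` pairs each Di's size with its class distribution. (The next slides apply this to the height data.)

```python
def information_gain(parent_probs, subsets, n):
    """Gain(D, S) = H(D) − Σ P(Di) · H(Di): entropy before the split minus
    the size-weighted entropy of the subsets the split produces."""
    weighted = sum((size / n) * entropy(probs) for size, probs in subsets)
    return entropy(parent_probs) - weighted
```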

Height Example Data
[The same 15-tuple height table, repeated for the ID3 example.]

ID3 Example (Output1)
Starting state entropy:
4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3) = 0.4384
Gain using gender:
– Female: 3/9 log(9/3) + 6/9 log(9/6) = 0.2764
– Male: 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3) = 0.4392
– Weighted sum: (9/15)(0.2764) + (6/15)(0.4392) = 0.3415
– Gain: 0.4384 − 0.3415 = 0.0969
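These numbers can be checked with the helpers above (the slides truncate rather than round, so a couple of intermediate values differ by one in the last digit):

```python
start = entropy([4/15, 8/15, 3/15])          # ≈ 0.4385 (slides: 0.4384)
female = entropy([3/9, 6/9])                 # ≈ 0.2764
male = entropy([1/6, 2/6, 3/6])              # ≈ 0.4392
weighted = (9/15) * female + (6/15) * male   # ≈ 0.3416 (slides: 0.3415)
print(f"gain(gender) = {start - weighted:.4f}")  # 0.0969
```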

Looking at the height attribute, the 15 tuples fall into six intervals:
(0, 1.6] : 2
(1.6, 1.7] : 2
(1.7, 1.8] : 3
(1.8, 1.9] : 4
(1.9, 2.0] : 2
(2.0, ∞) : 2

The entropy within each height interval:
(0, 1.6] : 2 → 2/2 (0) = 0
(1.6, 1.7] : 2 → 2/2 (0) = 0
(1.7, 1.8] : 3 → 0 + 3/3 (0) + 0 = 0
(1.8, 1.9] : 4 → 0 + 4/4 (0) + 0 = 0
(1.9, 2.0] : 2 → 0 + 1/2 (0.301) + 1/2 (0.301) = 0.301
(2.0, ∞) : 2 → 0 + 0 + 2/2 (0) = 0

The gain in entropy by using the height attribute is thus
0.4384 − 2/15 (0.301) = 0.4384 − 0.0401 = 0.3983
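Continuing the snippet above, the same helper confirms the height gain (only the (1.9, 2.0] interval contributes any entropy):

```python
height_weighted = (2/15) * entropy([0.5, 0.5])           # (2/15)(0.301) ≈ 0.0401
print(f"gain(height) = {start - height_weighted:.4f}")   # 0.3983
```

Since 0.3983 > 0.0969, ID3 would choose height over gender as the split attribute.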
