Data Mining: A Closer Look, Chapter 2

2.1 Data Mining Strategies (p35)


Classification Learning is supervised. The dependent variable is categorical. Well-defined classes. Current rather than future behavior.

Estimation Learning is supervised. The dependent variable is numeric. Well-defined output classes or variable. Current rather than future behavior.(???)

Prediction The emphasis is on predicting future rather than current outcomes. The output attribute may be categorical or numeric. The output variable must correspond to the variable to be predicted (the dependent variable); the input variables are the predictor variables (or independent variables). Hence any supervised classification model or supervised estimation model may be used for prediction if the variables are suitably chosen. That is:
– the output variable is “current”, and
– the input variables are previous attribute values.
If you can classify/estimate the present from the past, then you can predict the future from the present!!!

3.5 Choosing a Data Mining Technique Initial Considerations:
– Is learning supervised or unsupervised?
– Is explanation required?
– What is the interaction between input and output attributes?
– What are the data types of the input and output attributes?

Further Considerations (which we might prefer to ignore):
– Do We Know the Distribution of the Data?
– Do We Know Which Attributes Best Define the Data?
– Does the Data Contain Missing Values?
– Is Time an Issue?
– Which Technique Is Most Likely to Give the Best Test Set Accuracy?

Methods of Supervised Classification
– Decision Trees
– Production Rules
– Instance-based methods
– Multiple Discriminant Analysis
– Naïve Bayes methods
– Neural methods
Today we consider only the first three, which are machine-learning based; the last three are statistically based.

Concept Class = output variable = Healthy/Sick = 1/0

A Healthy Class Rule for the Cardiology Patient Dataset
IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
Rule accuracy: 85.07% (a high maximum heart rate is quite a good predictor of health)
Rule coverage: 34.55% (but there are other ways of being healthy)
– Rule accuracy is a between-class measure.
– Rule coverage is a within-class measure.
A Sick Class Rule for the Cardiology Patient Dataset
IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick
Rule accuracy: 91.14%
Rule coverage: 52.17%
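The accuracy/coverage distinction can be sketched in a few lines of Python. The heart-rate values below are made up for illustration; they are not the actual cardiology patient data.

```python
# Toy instances: (max_heart_rate, concept_class). Values are hypothetical.
instances = [
    (180, "Healthy"), (175, "Healthy"), (150, "Healthy"), (140, "Healthy"),
    (190, "Sick"), (120, "Sick"), (110, "Sick"),
]

def rule_stats(instances, antecedent, target_class):
    covered = [c for x, c in instances if antecedent(x)]
    in_class = [c for x, c in instances if c == target_class]
    # Accuracy (between-class): of the instances the rule fires on,
    # what fraction actually have the target class?
    accuracy = covered.count(target_class) / len(covered)
    # Coverage (within-class): of the instances in the target class,
    # what fraction does the rule fire on?
    coverage = covered.count(target_class) / len(in_class)
    return accuracy, coverage

acc, cov = rule_stats(instances, lambda hr: 169 <= hr <= 202, "Healthy")
```

Note that the two measures use the same numerator but different denominators, which is why a rule can be accurate without covering much of its class.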

Acceptance/rejection of the “Life Insurance Promotion” offer is the output variable.
A Hypothesis for the Insurance Promotion: for credit card holders, a combination of one or more of the attributes can differentiate those who say yes to the life insurance promotion from those who say no.

2.5 Evaluating Supervised Model Performance The Confusion Matrix A matrix used to summarize the results of a supervised classification. Entries along the main diagonal are correct classifications. Entries other than those on the main diagonal are classification errors.

True Classes: c11 is the number of instances with true class “1” which are correctly classified as class “1”; c12 is the number with true class “1” which are mis-classified as class “2”; etc.

Two-Class Error Analysis
Table 2.6 A Simple Confusion Matrix

                 Computed Accept   Computed Reject
True Accept      True Accept       False Reject
True Reject      False Accept      True Reject
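A two-class confusion matrix can be built by counting (true, computed) pairs; the main diagonal holds the correct classifications. The labels and predictions below are invented for illustration.

```python
# Rows = true class, columns = computed class. Data is made up.
labels = ["Accept", "Reject"]
true_classes     = ["Accept", "Accept", "Reject", "Reject", "Accept", "Reject"]
computed_classes = ["Accept", "Reject", "Reject", "Accept", "Accept", "Reject"]

matrix = {t: {c: 0 for c in labels} for t in labels}
for t, c in zip(true_classes, computed_classes):
    matrix[t][c] += 1  # cells with t == c lie on the main diagonal

# Model accuracy = sum of the main diagonal over the total instance count.
correct = sum(matrix[l][l] for l in labels)
accuracy = correct / len(true_classes)
```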

Comparing Models by Measuring Lift
Figure 2.4 Targeted vs. mass mailing (representative sample vs. targeted sample)

Computing Lift

The population is 100,000. Consider the first confusion matrix. The acceptance rate among those predicted to accept is 540/23,460 = 2.3%. The overall acceptance rate in the population is 1,000/100,000 = 1%. Therefore the lift in the response rate from using the classification model for targeted sampling/marketing is 2.3/1 = 2.3.
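The same arithmetic as a short sketch, using the figures quoted in the slide:

```python
# Lift for the mailing example: ratio of the response rate in the
# model-targeted sample to the base response rate in the population.
population = 100_000
responders = 1_000           # overall acceptances in the population
sample_size = 23_460         # instances the model predicts will accept
sample_responders = 540      # actual acceptances within that sample

base_rate   = responders / population          # about 1%
sample_rate = sample_responders / sample_size  # about 2.3%
lift = sample_rate / base_rate                 # about 2.3
```

A lift above 1 means mailing only the targeted sample yields a better response rate than mass mailing.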

Basic Data Mining Techniques, Chapter 3: Decision Trees
An Algorithm for Building Decision Trees
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute. Create child links from this node, where each link represents a unique value for the chosen attribute. Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
– If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is null, specify the classification for new instances following this decision path.
– If the subclass does not satisfy the criteria and there is at least one attribute to further subdivide the path of the tree, let T be the current set of subclass instances and return to step 2.
Don’t worry too much about this. It is just “algorithm speak”, which we do not concern ourselves with.
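For the curious, the algorithm above can be sketched recursively. The book does not pin down what “best differentiates” means, so this sketch uses a simple majority-count purity score as a stand-in (real implementations use measures such as information gain); the toy rows and attribute names are hypothetical.

```python
from collections import Counter

# Stand-in for "best differentiates": total majority-class count across
# the branches an attribute would create (higher = purer split).
def purity_score(rows, attr):
    score = 0
    for value in {r[attr] for r in rows}:
        subset = [r["class"] for r in rows if r[attr] == value]
        score += Counter(subset).most_common(1)[0][1]
    return score

def build_tree(rows, attrs):
    classes = [r["class"] for r in rows]
    # Step 4 stopping criteria: pure subclass, or no attributes left.
    if len(set(classes)) == 1 or not attrs:
        return Counter(classes).most_common(1)[0][0]  # leaf = majority class
    best = max(attrs, key=lambda a: purity_score(rows, a))   # step 2
    node = {}                                                # step 3
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node[(best, value)] = build_tree(subset, attrs - {best})
    return node

# Hypothetical training instances.
rows = [
    {"sex": "M", "insurance": "No",  "class": "No"},
    {"sex": "M", "insurance": "Yes", "class": "No"},
    {"sex": "F", "insurance": "No",  "class": "Yes"},
    {"sex": "F", "insurance": "Yes", "class": "Yes"},
]
tree = build_tree(rows, {"sex", "insurance"})
```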

2.2/3.1 Supervised Data Mining Techniques Another (pseudo) Dataset

Figure 3.1 A partial decision tree with root node = income range

Figure 3.2 A partial decision tree with root node = credit card insurance

Figure 3.3 A partial decision tree with root node = age

Decision Trees for the Credit Card Promotion Database

Figure 3.4 A three-node decision tree for the credit card database

Figure 3.5 A two-node decision tree for the credit card database

Decision Tree Rules
Rules for the Tree in Figure 3.4:
IF Age <= 43 & Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No
IF Sex = Female & 19 <= Age <= 43 THEN Life Insurance Promotion = Yes
Rule Accuracy: % Rule Coverage: 66.67%

A Production Rule for the Credit Card Promotion Database
IF Sex = Female & 19 <= Age <= 43 THEN Life Insurance Promotion = Yes
Rule Accuracy: % Rule Coverage: 66.67%

A Simplified Rule Obtained by Removing Attribute Age
IF Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No

Other Methods for Building Decision Trees: CART, CHAID

Advantages of Decision Trees Easy to understand. Map nicely to a set of production rules. Applied to real problems. Make no prior assumptions about the data. Able to process both numerical and categorical data.

Disadvantages of Decision Trees Output attribute must be categorical. Limited to one output attribute. Decision tree algorithms are unstable. Trees created from numeric datasets can be complex.

An Excel-based Data Mining Tool, Chapter 4: The iData Analyzer
4.2 ESX: A Multipurpose Tool for Data Mining
A Live Demonstration Laboratory Exercise

Figure 4.2 A successful installation

4.5 A Six-Step Approach for Supervised Learning Step 1: Choose an Output Attribute Step 2: Perform the Mining Session Step 3: Read and Interpret Summary Results Step 4: Read and Interpret Test Set Results Step 5: Read and Interpret Class Results Step 6: Visualize and Interpret Class Rules

Read and Interpret Test Set Results
Figure 4.12 Test set instance classification

4.7 Instance Typicality The typicality of instance I is the “average” “similarity” of I to the other members of its cluster or class. The definitions of “average” and “similarity” in iDA are secret!!! Typicality values lie between 0 and 1:
– 1 indicates the class “prototype”
– 0 indicates a class “outlier”

Typicality Scores
– Identify prototypical and outlier instances.
– Select a best set of training instances.
– Used to compute individual instance classification confidence scores.
CLASS SIMILARITY is the average similarity of the members of a class with other members of the same class.
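Since iDA’s actual measures are undocumented, here is a sketch of typicality under one assumed similarity: the fraction of attribute values two instances share. The class members below are invented.

```python
# Assumed similarity: fraction of matching attribute values.
def similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Typicality of the instance at position idx: its average similarity
# to the other members of its class.
def typicality(idx, members):
    others = [m for j, m in enumerate(members) if j != idx]
    return sum(similarity(members[idx], m) for m in others) / len(others)

# Hypothetical "Healthy" class instances (three categorical attributes).
healthy = [
    ("High", "No",  "Normal"),
    ("High", "No",  "Normal"),
    ("High", "Yes", "Normal"),
    ("Low",  "Yes", "Flat"),
]
scores = [typicality(i, healthy) for i in range(len(healthy))]
prototype = healthy[scores.index(max(scores))]  # most typical instance
```

The last instance, which shares almost nothing with the rest, gets the lowest score and would be flagged as the class outlier.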

Figure 4.13 Instance typicality

Other Definitions Given class C and a categorical attribute A with values v1, v2, …, vn: the class C predictability score for A = v2 (say) is the proportion of instances in C with A = v2. This is concerned with the predictability of A = v2 within the class C. Class C predictiveness for A = v2 is the proportion of instances with A = v2 which are in class C. This is concerned with the predictability of the class C from A = v2.
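The two scores differ only in their denominators, which is easy to see in code. The (class, attribute value) pairs below are made up; `"C"`, `"D"`, `"v1"`, and `"v2"` are placeholder names.

```python
# Hypothetical (instance_class, attribute_value) pairs for attribute A.
data = [
    ("C", "v2"), ("C", "v2"), ("C", "v1"), ("C", "v1"),
    ("D", "v2"), ("D", "v1"),
]

def predictability(data, cls, value):
    # Proportion of class-cls instances having A = value.
    in_class = [v for c, v in data if c == cls]
    return in_class.count(value) / len(in_class)

def predictiveness(data, cls, value):
    # Proportion of instances with A = value that belong to class cls.
    with_value = [c for c, v in data if v == value]
    return with_value.count(cls) / len(with_value)
```

Here predictability of A = v2 within C is 2/4, while predictiveness of C from A = v2 is 2/3: same numerator, different reference sets.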