
DATA MINING : CLASSIFICATION

Classification: Definition  Classification is a supervised learning task.  It uses a training set in which the correct answers (class label attribute) are known.  A model is created by running the algorithm on the training data.  The model is then tested; if accuracy is low, regenerate it after changing features or reconsidering the samples.  The model assigns a class label to incoming new data.
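The train/test/predict workflow above can be sketched with a deliberately trivial "model" (a majority-class predictor); the data and function names here are illustrative, not from the slides.

```python
# Minimal sketch of the supervised-learning workflow: construct a model
# from labeled training data, then estimate its accuracy on a test set.
# The "model" is just the most frequent training label, to keep it tiny.
from collections import Counter

def train(labels):
    # "Model construction": learn the most frequent class label.
    return Counter(labels).most_common(1)[0][0]

def accuracy(model, test_labels):
    # "Model usage": compare predictions with the known test labels.
    return sum(1 for y in test_labels if y == model) / len(test_labels)

train_labels = ["yes", "yes", "no", "yes"]
model = train(train_labels)
test_labels = ["yes", "no", "yes"]
print(model, accuracy(model, test_labels))
```

A real classifier replaces `train` with decision tree induction, naïve Bayes, etc.; the two-step shape stays the same.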

Applications: Classifying credit card transactions as legitimate or fraudulent. Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil. Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification: A two-step process. Model construction: describing a set of predetermined classes. Each sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of samples used for model construction is the training set. The model is represented as classification rules, decision trees, or mathematical formulae.

Model usage: for classifying future or unknown objects. Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification, and the accuracy rate is the percentage of test set samples correctly classified by the model. The test set is independent of the training set. If the accuracy is acceptable, use the model to classify data samples whose class labels are not known.

Model Construction (Step 1): a classification algorithm is run on the training data to produce a classifier (model), e.g. the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

Classification Process (Step 2): use the model in prediction. The classifier is first checked against the testing data, then applied to unseen data, e.g. (Jeff, Professor, 4) → Tenured?
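The tenure rule used in this example can be written out directly; the function name is a hypothetical stand-in for the learned model.

```python
# The slide's learned rule as a function:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def tenured(rank, years):
    return "yes" if rank == "professor" or years > 6 else "no"

# Applying the model to the unseen record (Jeff, Professor, 4):
print(tenured("professor", 4))
```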

Classification techniques:  Decision Tree based Methods  Rule-based Methods  Neural Networks  Bayesian Classification  Support Vector Machines

Algorithm for decision tree induction: basic algorithm.  The tree is constructed in a top-down recursive divide-and-conquer manner.  At the start, all the training examples are at the root.  Attributes are categorical (if continuous-valued, they are discretized in advance).  Examples are partitioned recursively based on selected attributes.
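The top-down, divide-and-conquer procedure above can be sketched in a few lines. This is a simplified sketch: records are dicts of categorical attributes plus a "class" key, and attribute selection is reduced to "take the next attribute" rather than an information-gain criterion; the toy data is made up.

```python
# Top-down recursive decision tree induction (Hunt's algorithm, simplified).
from collections import Counter

def majority(records):
    return Counter(r["class"] for r in records).most_common(1)[0][0]

def build_tree(records, attributes):
    labels = {r["class"] for r in records}
    if len(labels) == 1:           # all examples in one class -> leaf
        return labels.pop()
    if not attributes:             # no attributes left -> majority-class leaf
        return majority(records)
    attr = attributes[0]           # (real algorithms pick the best split here)
    tree = {attr: {}}
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        tree[attr][value] = build_tree(subset, attributes[1:])
    return tree

records = [
    {"outlook": "sunny", "windy": "no",  "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rainy", "windy": "no",  "class": "stay"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
tree = build_tree(records, ["outlook", "windy"])
print(tree)
```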

Example of Decision Tree: Training Dataset

Output: A Decision Tree for "buys_computer". Root: age? If age <=30, test student? (no → no, yes → yes); if age 31..40 → yes; if age >40, test credit_rating? (excellent → no, fair → yes).
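The buys_computer tree can be written out as nested conditionals. The branch values are assumptions based on the standard AllElectronics example (Han & Kamber) that this slide appears to use; note the tree never consults income.

```python
# The "buys_computer" decision tree as nested if/else.
def buys_computer(age, student, credit_rating):
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    elif age == "31..40":
        return "yes"
    else:  # age > 40
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))
```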

Advantages of decision tree based classification:  Inexpensive to construct.  Extremely fast at classifying unknown records.  Easy to interpret for small-sized trees.  Accuracy is comparable to other classification techniques for many simple data sets.

Enhancements to basic decision tree induction:  Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals.  Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values.  Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication.
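Two of the enhancements above, discretizing a continuous attribute into intervals and imputing a missing value with the most common one, can be sketched as follows; the cut point and data values are illustrative assumptions.

```python
# Hedged sketches of two enhancements: discretization and mode imputation.
from collections import Counter
import bisect

def discretize(value, cut_points, labels):
    # e.g. income with cut point 80 -> "<80K" / ">=80K"
    return labels[bisect.bisect_right(cut_points, value)]

def impute_mode(values):
    # replace None with the most common observed value
    known = [v for v in values if v is not None]
    mode = Counter(known).most_common(1)[0][0]
    return [v if v is not None else mode for v in values]

print(discretize(65, [80], ["<80K", ">=80K"]))
print(impute_mode(["yes", None, "yes", "no"]))
```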

Potential Problem: Overfitting. This is when the generated model does not generalize to new incoming data, caused by either too small a training set (not covering many cases) or wrong assumptions. Overfitting results in decision trees that are more complex than necessary, and training error no longer provides a good estimate of how well the tree will perform on previously unseen records, so new ways of estimating errors are needed.

How to avoid overfitting: the two ways are pre-pruning and post-pruning. Pre-pruning:  Stop the algorithm before it becomes a fully grown tree.  Stop if all instances belong to the same class.  Stop if the number of instances is less than some user-specified threshold.
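The pre-pruning stopping tests above amount to a small predicate checked before each split; the threshold name and value are illustrative assumptions.

```python
# Pre-pruning: stop growing before the tree is fully grown.
def should_stop(records, min_instances=5):
    classes = {r["class"] for r in records}
    # stop if all instances share one class, or too few instances remain
    return len(classes) == 1 or len(records) < min_instances
```

In `build_tree`-style induction this check runs at the top of each recursive call; when it fires, the node becomes a majority-class leaf.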

Post-pruning:  Grow the decision tree to its entirety.  Trim the nodes of the decision tree in a bottom-up fashion.  If generalization error improves after trimming, replace the sub-tree by a leaf node.  The class label of the leaf node is determined from the majority class of instances in the sub-tree.
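A hedged sketch of bottom-up post-pruning: each subtree is replaced by its majority-class leaf when that does not worsen error on a pruning set. Trees use the dict shape {attribute: {value: subtree}} with class labels as leaves, and error estimation is simplified to counting mistakes; the example tree and records are made up.

```python
# Post-pruning sketch: prune children first, then try collapsing the node.
from collections import Counter

def classify(tree, record):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][record[attr]]
    return tree

def errors(tree, records):
    return sum(classify(tree, r) != r["class"] for r in records)

def prune(tree, records):
    if not isinstance(tree, dict):
        return tree
    attr = next(iter(tree))
    for value, sub in tree[attr].items():   # bottom-up: prune children first
        subset = [r for r in records if r[attr] == value]
        tree[attr][value] = prune(sub, subset)
    leaf = Counter(r["class"] for r in records).most_common(1)[0][0]
    # replace the subtree by a leaf if that does not increase error
    return leaf if errors(leaf, records) <= errors(tree, records) else tree

tree = {"a": {"x": "yes", "y": {"b": {"p": "yes", "q": "yes"}}}}
pruning_set = [
    {"a": "x", "class": "yes"},
    {"a": "y", "b": "p", "class": "yes"},
    {"a": "y", "b": "q", "class": "yes"},
]
pruned = prune(tree, pruning_set)
print(pruned)
```

Here every record is "yes", so both the inner split on b and the root split on a collapse to a single leaf.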

Bayesian Classification Algorithm:  Let X be a data sample whose class label is unknown.  Let H be a hypothesis that X belongs to class C.  For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X.  P(H): prior probability of hypothesis H (the initial probability before we observe any data; reflects background knowledge).  P(X): probability that the sample data is observed.  P(X|H): probability of observing the sample X, given that the hypothesis holds.  By Bayes' theorem, P(H|X) = P(X|H) P(H) / P(X).
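Bayes' theorem combines these three quantities into the posterior; the numbers below are made up purely for illustration.

```python
# P(H|X) = P(X|H) * P(H) / P(X)
def posterior(p_x_given_h, p_h, p_x):
    return p_x_given_h * p_h / p_x

p_h = 0.6          # prior: e.g. P(buys_computer = yes)  (illustrative)
p_x_given_h = 0.2  # likelihood of observing X among that class
p_x = 0.15         # overall probability of observing X
print(posterior(p_x_given_h, p_h, p_x))
```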

Training dataset for Bayesian Classification: Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'. Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair).
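Classifying the sample X with naïve Bayes can be sketched end to end. The 14-row training set below is the standard AllElectronics example from Han & Kamber, which this slide appears to be based on; it is reproduced here as an assumption, and the code applies the class-conditional independence assumption without smoothing.

```python
# Naïve Bayes classification of X = (<=30, medium, yes, fair).
rows = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def naive_bayes(x):
    scores = {}
    for c in ("yes", "no"):
        in_class = [r for r in rows if r[-1] == c]
        p = len(in_class) / len(rows)          # prior P(C)
        for i, v in enumerate(x):              # independence: multiply P(x_i|C)
            p *= sum(1 for r in in_class if r[i] == v) / len(in_class)
        scores[c] = p
    return max(scores, key=scores.get), scores

x = ("<=30", "medium", "yes", "fair")
label, scores = naive_bayes(x)
print(label, scores)
```

With this data, P(yes) = 9/14 and the conditional probabilities for X give roughly 0.028 for 'yes' versus 0.007 for 'no', so X is classified as buys_computer = 'yes'.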

Advantages & Disadvantages of Bayesian Classification: Advantages:  Easy to implement.  Good results obtained in most cases. Disadvantages:  The class-conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables. E.g., hospital patient data: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.); dependencies among these cannot be modeled by a naïve Bayesian classifier.

Conclusion: Training data is an important factor in building a model in supervised algorithms. The classification results generated by each of the algorithms (Naïve Bayes, Decision Tree, Neural Networks, …) are not considerably different from each other, but different classification algorithms can take different amounts of time to train and build models. Mechanical classification is faster.


Thank you !!!