Classification II.


Data Mining Overview
- Data mining
- Data warehouses and OLAP (On-Line Analytical Processing)
- Association rules mining
- Clustering: hierarchical and partitional approaches
- Classification: decision trees and Bayesian classifiers

Setting
Given old data about customers and payments, predict a new applicant's loan eligibility.

(Figure: records of previous customers (age, salary, profession, location, customer type) are fed to a classifier, which produces decision rules such as "Salary > 5 L" or "Prof. = Exec" labelling customers good/bad; the rules are then applied to a new applicant's data.)

Decision trees
A tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.

(Figure: an example tree whose internal nodes test Salary < 1 M, Prof = teaching, and Age < 30, with Good/Bad labels at the leaves.)

SLIQ (Supervised Learning In Quest)
A decision-tree classifier for data mining.
Design goals:
- Able to handle large, disk-resident training sets
- No restrictions on training-set size

Building the tree

GrowTree(TrainingData D)
    Partition(D);

Partition(Data D)
    if (all points in D belong to the same class) then return;
    for each attribute A do
        evaluate splits on attribute A;
    use the best split found to partition D into D1 and D2;
    Partition(D1);
    Partition(D2);

Two major problems:
- How to find the split points that define node tests
- How to partition the data, having found the split points

Approaches:
- CART, C4.5: depth-first construction, with repeated sorting at each node (for continuous attributes)
- SLIQ: sort-once technique using attribute lists
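As a rough in-memory sketch of this skeleton (illustrative only: it uses the Gini index introduced on a later slide, handles only numeric attributes, and ignores SLIQ's attribute lists and disk residency; the names and the toy data are invented):

from collections import Counter

def gini(labels):
    """Gini index of a list of class labels: 1 - sum_j p_j^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(records, labels, attr):
    """Try every observed value of attr as a threshold; return (gini_split, threshold)."""
    best = (float("inf"), None)
    n = len(labels)
    for threshold in sorted({r[attr] for r in records}):
        left = [y for r, y in zip(records, labels) if r[attr] < threshold]
        right = [y for r, y in zip(records, labels) if r[attr] >= threshold]
        if not left or not right:
            continue
        g = len(left) / n * gini(left) + len(right) / n * gini(right)
        if g < best[0]:
            best = (g, threshold)
    return best

def grow_tree(records, labels, attrs):
    if len(set(labels)) == 1:                       # all points belong to the same class
        return {"leaf": labels[0]}
    # evaluate splits on every attribute and keep the best one
    scored = [(best_split(records, labels, a), a) for a in attrs]
    (g, threshold), attr = min(scored, key=lambda s: s[0][0])
    if threshold is None:                           # no useful split left: majority leaf
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    left = [(r, y) for r, y in zip(records, labels) if r[attr] < threshold]
    right = [(r, y) for r, y in zip(records, labels) if r[attr] >= threshold]
    return {"test": (attr, threshold),
            "left": grow_tree([r for r, _ in left], [y for _, y in left], attrs),
            "right": grow_tree([r for r, _ in right], [y for _, y in right], attrs)}

# Example with a tiny, invented loan data set:
data = [{"salary": 90, "age": 25}, {"salary": 150, "age": 45},
        {"salary": 40, "age": 35}, {"salary": 120, "age": 28}]
print(grow_tree(data, ["Good", "Good", "Bad", "Good"], ["salary", "age"]))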

Data Setup
- One attribute list per attribute; each entry consists of (attribute value, class-list index).
- A separate class list holds the class of each record and a pointer to its current tree node.
- Lists for continuous attributes are kept in sorted order.
- Attribute lists may be disk-resident; the class list must fit in main memory.

(Figure: example class list and attribute lists, with every class-list entry initially pointing to the root node N1.)
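To make the layout concrete, here is a small hand-made illustration of these structures before any split (the records, values, and labels are invented, and in SLIQ the attribute lists may live on disk while the class list stays in memory):

# Attribute lists: one per attribute, entries are (attribute value, class-list index),
# kept sorted by value for continuous attributes.
age_list = [(23, 1), (30, 0), (40, 2), (45, 3), (55, 4)]
salary_list = [(15, 2), (40, 4), (60, 1), (65, 0), (75, 3)]

# Class list: one entry per record, holding its class label and a pointer to the
# tree node (leaf) it currently belongs to; initially every record sits at the root N1.
class_list = [
    {"label": "Good", "node": "N1"},   # record 0
    {"label": "Bad",  "node": "N1"},   # record 1
    {"label": "Bad",  "node": "N1"},   # record 2
    {"label": "Good", "node": "N1"},   # record 3
    {"label": "Good", "node": "N1"},   # record 4
]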

Evaluating Split Points: the Gini index
If data D contains examples from c classes,

    Gini(D) = 1 - Σ_j (p_j)²

where p_j is the relative frequency of class j in D.

If D is split into D1 and D2 with n1 and n2 tuples respectively (n = n1 + n2),

    Gini_split(D) = (n1/n) · Gini(D1) + (n2/n) · Gini(D2)

Note: only class frequencies are needed to compute the index.
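As a quick worked example with illustrative numbers: if a node holds 4 tuples of class High and 2 of class Low, Gini = 1 - (4/6)² - (2/6)² ≈ 0.444. If a candidate split sends (3 High, 1 Low) to the left child and (1 High, 1 Low) to the right, then Gini_split = (4/6)·(1 - (3/4)² - (1/4)²) + (2/6)·(1 - (1/2)² - (1/2)²) = (4/6)·0.375 + (2/6)·0.5 ≈ 0.417.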

Finding Split Points
For each attribute A, evaluate splits on A using its attribute list.
Key idea: evaluating splits on a numerical attribute normally requires sorting the data at each node; if every attribute is pre-sorted once, no sorting is needed during the tree-construction phase.
Keep the split with the lowest Gini index.

Finding Split Points: Continuous Attributes

Consider splits of the form value(A) < x, e.g. Age < 17, and evaluate this split form for every value in the attribute list.

To evaluate splits on attribute A for a given tree node:
    initialize the class histograms of the left and right children;
    for each record in the attribute list do
        find the corresponding entry in the class list (giving its class and leaf node);
        evaluate the splitting index for value(A) < record.value;
        update the class histogram in the leaf;
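A minimal Python sketch of this scan for a single node, assuming the attribute list fits in memory as a sorted list of (value, class label) pairs; the names and the in-memory simplification are ours, not SLIQ's:

from collections import Counter

def gini_from_counts(counts, n):
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_continuous_split(attr_list):
    """attr_list: list of (value, class_label) pairs sorted by value."""
    total = Counter(label for _, label in attr_list)
    below = Counter()                       # class histogram of the left child
    n = len(attr_list)
    best_gini, best_threshold = float("inf"), None
    for i, (value, label) in enumerate(attr_list):
        if i > 0 and value != attr_list[i - 1][0]:
            # candidate split: value(A) < current value
            n_left, n_right = i, n - i
            above = total - below           # class histogram of the right child
            g = (n_left / n) * gini_from_counts(below, n_left) + \
                (n_right / n) * gini_from_counts(above, n_right)
            if g < best_gini:
                best_gini, best_threshold = g, value
        below[label] += 1
    return best_gini, best_threshold

# Example with invented (age, risk) pairs:
ages = sorted([(20, "High"), (25, "High"), (32, "Low"),
               (43, "High"), (45, "Low"), (50, "Low")])
print(best_continuous_split(ages))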

(Figure: worked example at node N1, showing the High/Low class histograms of the left and right children for candidate splits on Age such as Age < 20, Age < 32, and Age < 43, with Gini_split values 0.33, 0.22, and 0.5 (plus one undefined case); the split Age < 32, with index 0.22, is the best.)

Finding Split Points: Categorical Attributes

Consider splits of the form value(A) ∈ {x1, x2, ..., xn}, e.g. CarType ∈ {family, sports}, and evaluate this split form for subsets of domain(A).

To evaluate splits on attribute A for a given tree node:
    initialize the class/value matrix of the node to zeroes;
    for each record in the attribute list do
        increment the appropriate count in the matrix;
    evaluate the splitting index for various subsets using the constructed matrix;
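A similar sketch for the categorical case, again assuming the node's attribute list is in memory; exhaustively enumerating subsets with itertools is only practical for small domains and is meant as an illustration, not as the bounded subset search used in practice:

from collections import Counter, defaultdict
from itertools import combinations

def gini_from_counts(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_categorical_split(attr_list):
    """attr_list: list of (attribute value, class label) pairs for one node."""
    # class/value matrix: matrix[value][class] = count
    matrix = defaultdict(Counter)
    for value, label in attr_list:
        matrix[value][label] += 1
    values = list(matrix)
    n = len(attr_list)
    best = (float("inf"), None)
    # evaluate value(A) in S for every non-empty proper subset S of the domain
    for k in range(1, len(values)):
        for subset in combinations(values, k):
            left = sum((matrix[v] for v in subset), Counter())
            right = sum((matrix[v] for v in values if v not in subset), Counter())
            g = (sum(left.values()) / n) * gini_from_counts(left) + \
                (sum(right.values()) / n) * gini_from_counts(right)
            if g < best[0]:
                best = (g, set(subset))
    return best

# Example with invented (CarType, risk) pairs:
records = [("family", "High"), ("family", "Low"), ("sports", "High"),
           ("sports", "High"), ("truck", "Low"), ("truck", "Low")]
print(best_categorical_split(records))   # (0.25, {'sports'}) for this toy data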

(Figure: example class/value matrix and the resulting left/right child histograms, giving
    CarType in {family}: Gini = 0.444
    CarType in {sports}: Gini = 0.333
    CarType in {truck}:  Gini = 0.267)

Updating the Class List

The next step is to update the class list with the new nodes: scan the attribute list that was used for the split and update the corresponding leaf entries in the class list.

For the attribute A used in the split, traverse its attribute list:
    for each value u in the attribute list
        find the corresponding entry e in the class list;
        apply the split test to u to find the new child node c that the record belongs to;
        update the node reference in e to c;
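A small sketch of this update step for a numeric split, reusing the illustrative in-memory structures from the earlier sketches (field names are ours):

def update_class_list(attr_list, class_list, threshold, left_node, right_node):
    """attr_list: list of (value, class_list_index) pairs for the split attribute.
    class_list:  list of dicts {"label": ..., "node": ...}, one per record."""
    for value, idx in attr_list:
        entry = class_list[idx]
        # apply the split test value(A) < threshold to pick the new child node
        entry["node"] = left_node if value < threshold else right_node

# Example: three records all start at the root N1; split on Age < 32.
class_list = [{"label": "High", "node": "N1"},
              {"label": "Low",  "node": "N1"},
              {"label": "Low",  "node": "N1"}]
age_list = [(25, 0), (40, 1), (61, 2)]
update_class_list(age_list, class_list, threshold=32, left_node="N2", right_node="N3")
print([e["node"] for e in class_list])   # ['N2', 'N3', 'N3']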

Preventing overfitting
A tree T overfits if there is another tree T' that gives higher error on the training data yet lower error on unseen data. An overfitted tree does not generalize to unseen instances. This happens when the data contain noise or irrelevant attributes and the training set is small. Overfitting can reduce accuracy drastically (by 10-25%, as reported in Mingers' 1989 Machine Learning study).

Approaches to prevent overfitting
Two approaches:
- Stop growing the tree beyond a certain point.
- First over-fit, then post-prune (more widely used). Tree building is divided into two phases: a growth phase and a prune phase.
It is hard to decide when to stop growing the tree, so the second approach is more widely used.

Criteria for finding the correct final tree size
Three criteria:
- Cross-validation with separate test data.
- Use some criterion function to choose the best size, for example the minimum description length (MDL) criterion.
- Statistical bounds: use all the data for training, but apply a statistical test to decide the right size.

The minimum description length (MDL) principle
MDL is a paradigm for statistical estimation, and in particular for model selection. Given data D and a class of models M, the goal is to choose a model m in M such that the data and the model can be encoded using the smallest total length:

    L(D) = L(D|m) + L(m)

How do we find the encoding length? The answer comes from information theory. Consider the problem of transmitting n messages, where pi is the probability of seeing message i. Shannon's theorem: the minimum expected code length is achieved by assigning -log pi bits to message i.

Encoding data
Assume t records of training data D. First send the tree m using L(m|M) bits. Assume everything except the class labels of the training data is already known; the goal is then to transmit the class labels using L(D|m) bits. If the tree correctly predicts an instance, that costs 0 bits; otherwise it costs log k bits, where k is the number of classes. Thus, with e errors on the training data, the total cost is e log k + L(m|M) bits. A complex tree will have a higher L(m|M) but a lower e. Question: how do we encode the tree?
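As a quick illustration with made-up numbers: suppose k = 2 classes, a small tree costs L(m|M) = 40 bits to encode and makes e = 30 training errors, while a larger tree costs 90 bits and makes only 10 errors. Using log base 2, the totals are 30·log 2 + 40 = 70 bits versus 10·log 2 + 90 = 100 bits, so MDL prefers the smaller tree even though it makes more training errors.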

Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules.
- One rule is created for each path from the root to a leaf.
- Each attribute-value pair along a path forms a conjunction; the leaf node holds the class prediction.
- Rules are easier for humans to understand.

Example:
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"
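As a small illustration, here is a sketch that walks a tree represented as nested dicts and prints one rule per root-to-leaf path; the tree below is hand-written to mirror the buys_computer example above, not the output of any of the algorithms in these slides:

def extract_rules(node, conditions=()):
    """Yield one (conditions, prediction) rule per root-to-leaf path."""
    if "leaf" in node:
        yield conditions, node["leaf"]
        return
    attr, branches = node["test"]
    for value, child in branches.items():
        yield from extract_rules(child, conditions + ((attr, value),))

tree = {"test": ("age", {
    "<=30": {"test": ("student", {"no": {"leaf": "no"}, "yes": {"leaf": "yes"}})},
    "31...40": {"leaf": "yes"},
    ">40": {"test": ("credit_rating", {"excellent": {"leaf": "yes"}, "fair": {"leaf": "no"}})},
})}

for conds, label in extract_rules(tree):
    body = " AND ".join(f'{a} = "{v}"' for a, v in conds) or "TRUE"
    print(f'IF {body} THEN buys_computer = "{label}"')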

SPRINT
- An improvement over SLIQ.
- Does not need to keep any list in main memory.
- A parallel version is straightforward.
- Attribute lists are extended with the class field, so no separate class list is needed.
- Uses hashing to assign records to classes and nodes when the attribute lists are partitioned.

Pros and Cons of decision trees
Pros:
- Reasonable training time
- Fast application
- Easy to interpret
- Easy to implement
- Can handle a large number of features
Cons:
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
More information: http://www.recursive-partitioning.com/