Decision Trees: SLIQ – A Fast Scalable Classifier
Group 12 – Vaibhav Chopda, Tarun Bahadur

Paper by: Manish Mehta, Rakesh Agrawal and Jorma Rissanen
Source: http://citeseer.ifi.unizh.ch/mehta96sliq.html
Material includes lecture notes for CSE634 – Prof. Anita Wasilewska, http://www.cs.sunysb.edu/~cse634

Agenda
- What is classification?
- Why decision trees?
- The ID3 algorithm
- Limitations of the ID3 algorithm
- SLIQ – a fast scalable classifier for data mining
- SPRINT – the successor of SLIQ

Classification Process: Model Construction
A classification algorithm is applied to the training data to build a classifier (model).
Example of a learned rule: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
CSE634 course notes – Prof. Anita Wasilewska

Testing and Prediction (by a Classifier)
The classifier is applied to test data, and then to unseen data, e.g. (Jeff, Professor, 4) – is Jeff tenured?
CSE634 course notes – Prof. Anita Wasilewska

Classification by Decision Tree Induction
Decision tree (tuples flow along the tree structure):
- An internal node denotes a test on an attribute
- A branch represents a value of the node's attribute
- Leaf nodes represent class labels or class distributions
CSE634 course notes – Prof. Anita Wasilewska

Classification by Decision Tree Induction
Decision tree generation consists of two phases:
1. Tree construction
   - At the start we choose one attribute as the root and put all of its values as branches.
   - We recursively choose internal nodes (attributes) with their proper values as branches.
   - We stop when all the samples (records) at a node are of the same class; the node then becomes a leaf labeled with that class. We also stop when there are no samples left, or no new attributes remain to be used as nodes; in that case we apply MAJORITY VOTING to label the node. (A small sketch of this recursion is shown after this slide.)
2. Tree pruning
   - Identify and remove branches that reflect noise or outliers.
CSE634 course notes – Prof. Anita Wasilewska
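The following is a minimal Python sketch of that recursive construction with the two majority-voting stopping rules; the record format (dicts with a "Class" key) and the choose_attribute callback are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def majority_class(records):
    """Return the most common class label among the records."""
    return Counter(r["Class"] for r in records).most_common(1)[0][0]

def grow_tree(records, attributes, choose_attribute):
    """Recursively grow a decision tree.

    choose_attribute(records, attributes) picks the split attribute,
    e.g. by information gain (ID3) or by the gini index (SLIQ).
    """
    classes = {r["Class"] for r in records}
    if len(classes) == 1:                      # all samples in one class -> leaf
        return {"leaf": classes.pop()}
    if not attributes:                         # no attributes left -> majority vote
        return {"leaf": majority_class(records)}

    attr = choose_attribute(records, attributes)
    remaining = [a for a in attributes if a != attr]
    node = {"attribute": attr, "branches": {}}
    for value in {r[attr] for r in records}:   # one branch per observed value
        subset = [r for r in records if r[attr] == value]
        node["branches"][value] = grow_tree(subset, remaining, choose_attribute)
    return node
```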

Classification by Decision Tree Induction
Where's the challenge?
- A good choice of the root attribute and of the internal node attributes is the crucial point.
- Decision tree induction algorithms differ in their methods of evaluating and choosing the root and internal node attributes.
CSE634 course notes – Prof. Anita Wasilewska

Basic Idea of the ID3/C4.5 Algorithm
- A greedy algorithm: constructs decision trees in a top-down, recursive, divide-and-conquer manner.
- The tree STARTS as a single node (root) representing the whole training dataset (all samples).
- IF the samples are ALL in the same class, THEN the node becomes a LEAF and is labeled with that class.
- OTHERWISE, the algorithm uses an entropy-based measure known as information gain as a heuristic for selecting the ATTRIBUTE that will BEST separate the samples into individual classes. This attribute becomes the node name (the test, or tree split decision attribute).
CSE634 course notes – Prof. Anita Wasilewska

Basic Idea of the ID3/C4.5 Algorithm (2)
- A branch is created for each value of the node attribute (and is labeled by this value – this is the syntax), and the samples are partitioned accordingly (this is the semantics; see the example which follows).
- The algorithm uses the same process recursively to form a decision tree at each partition. Once an attribute has occurred at a node, it need not be considered in any of the node's descendants.
- The recursive partitioning STOPS only when one of the following conditions is true:
  - All records (samples) at the given node belong to the same class, or
  - There are no remaining attributes on which the records (samples) may be further partitioned; in this case we convert the given node into a LEAF and label it with the majority class among its samples (majority voting), or
  - There are no records (samples) left – a leaf is created and labeled by majority vote over the parent's training samples.
A sketch of the information gain computation used to pick the split attribute follows.
CSE634 course notes – Prof. Anita Wasilewska
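The slides name information gain but do not reproduce the formula, so here is a minimal sketch of the standard entropy and information-gain computation ID3 uses; the record format (dicts with a "Class" key) and function names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(records):
    """Entropy of the class distribution: -sum_j p_j * log2(p_j)."""
    n = len(records)
    counts = Counter(r["Class"] for r in records)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(records, attribute):
    """Reduction in entropy obtained by partitioning on `attribute`."""
    n = len(records)
    split_entropy = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        split_entropy += (len(subset) / n) * entropy(subset)
    return entropy(records) - split_entropy

# Example use: pick the attribute with the highest gain
# best = max(["Age", "CarType"], key=lambda a: information_gain(data, a))
```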

Example (from Prof. Anita Wasilewska's slides)
This follows an example from Quinlan's ID3.
CSE634 course notes – Prof. Anita Wasilewska

Shortcomings of ID3
- Scalability: requires a lot of computation at every stage of decision tree construction.
- Scalability: needs all the training data to fit in memory.
- It does not suggest any standard splitting index for range (numeric) attributes.

SLIQ – A Decision Tree Classifier
Features of SLIQ:
- Applies to both numerical and categorical attributes
- Builds compact and accurate trees
- Uses a pre-sorting technique in the tree-growing phase and an inexpensive pruning algorithm
- Suitable for classification of large disk-resident datasets, independently of the number of classes, attributes and records

SLIQ Methodology
Create the decision tree by partitioning records:
- Generate an attribute list for each attribute
- Sort the attribute lists for NUMERIC attributes

Example: Drivers

Age   CarType   Class
23    Family    HIGH
17    Sports    HIGH
43    Sports    HIGH
68    Family    LOW
32    Truck     LOW
20    Family    HIGH

Attribute Listing Phase
Training records (with record ids added):

Rec Id   Age   CarType   Class
0        23    Family    HIGH
1        17    Sports    HIGH
2        43    Sports    HIGH
3        68    Family    LOW
4        32    Truck     LOW
5        20    Family    HIGH

Age attribute list (Age – NUMERIC attribute):

Age   Class   Rec Id
23    HIGH    0
17    HIGH    1
43    HIGH    2
68    LOW     3
32    LOW     4
20    HIGH    5

CarType attribute list (CarType – CATEGORICAL attribute):

CarType   Class   Rec Id
Family    HIGH    0
Sports    HIGH    1
Sports    HIGH    2
Family    LOW     3
Truck     LOW     4
Family    HIGH    5

Presorting Phase
Only NUMERIC attributes are sorted; CATEGORICAL attributes need not be sorted.

Sorted Age attribute list:

Age   Class   Rec Id
17    HIGH    1
20    HIGH    5
23    HIGH    0
32    LOW     4
43    HIGH    2
68    LOW     3

CarType attribute list (unchanged):

CarType   Class   Rec Id
Family    HIGH    0
Sports    HIGH    1
Sports    HIGH    2
Family    LOW     3
Truck     LOW     4
Family    HIGH    5
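A minimal sketch of the attribute-listing and presorting steps on the Drivers example, assuming in-memory Python lists; the data structures are illustrative and do not reproduce SLIQ's disk-resident layout or its separate in-memory class list.

```python
# Training records from the Drivers example
records = [
    {"Age": 23, "CarType": "Family", "Class": "HIGH"},
    {"Age": 17, "CarType": "Sports", "Class": "HIGH"},
    {"Age": 43, "CarType": "Sports", "Class": "HIGH"},
    {"Age": 68, "CarType": "Family", "Class": "LOW"},
    {"Age": 32, "CarType": "Truck",  "Class": "LOW"},
    {"Age": 20, "CarType": "Family", "Class": "HIGH"},
]

def build_attribute_list(records, attribute):
    """One entry per record: (attribute value, class label, record id)."""
    return [(r[attribute], r["Class"], rid) for rid, r in enumerate(records)]

# Attribute listing phase
age_list     = build_attribute_list(records, "Age")
cartype_list = build_attribute_list(records, "CarType")

# Presorting phase: only numeric attribute lists are sorted, once, up front
age_list.sort(key=lambda entry: entry[0])

print(age_list[0])   # (17, 'HIGH', 1)
```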

Constructing the decision tree

Constructing the Decision Tree
- (Block 20) For each leaf node being examined, the method determines a split test to best separate the records at the examined node, using the attribute lists (block 21).
- (Block 22) The records at the examined leaf node are partitioned according to the best split test at that node to form new leaf nodes, which are also child nodes of the examined node.
- (Block 23) The records at each new leaf node are checked to see if they are of the same class. If this condition has not been achieved, the splitting process is repeated, starting with block 24, for each newly formed leaf node until each leaf node contains records from one class.
- In finding the best split test (or split point) at a leaf node, a splitting index corresponding to a criterion used for splitting the records may be used to help evaluate possible splits. This splitting index indicates how well the criterion separates the record classes. The splitting index is preferably a gini index.
A sketch of this grow loop is shown below.
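A minimal sketch of the grow loop described above, assuming a find_best_split helper (its evaluation logic is sketched on the later split-index slides) that returns a boolean test over records; SLIQ itself grows the tree breadth-first over disk-resident attribute lists, which this in-memory sketch does not reproduce.

```python
def grow(root_records, find_best_split):
    """Repeatedly split impure leaves until every leaf is pure (or unsplittable)."""
    root = {"records": root_records, "children": None}
    open_leaves = [root]
    while open_leaves:
        leaf = open_leaves.pop()
        classes = {r["Class"] for r in leaf["records"]}
        if len(classes) == 1:                       # pure leaf: stop splitting
            leaf["label"] = classes.pop()
            continue
        test = find_best_split(leaf["records"])     # e.g. the lowest-gini split
        left  = [r for r in leaf["records"] if test(r)]
        right = [r for r in leaf["records"] if not test(r)]
        if not left or not right:                   # no useful split found
            leaf["label"] = max(classes, key=lambda c:
                                sum(r["Class"] == c for r in leaf["records"]))
            continue
        leaf["test"] = test
        leaf["children"] = [{"records": left,  "children": None},
                            {"records": right, "children": None}]
        open_leaves.extend(leaf["children"])
    return root
```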

Gini Index
The gini index is used to evaluate the "goodness" of the alternative splits for an attribute.
If a data set T contains examples from n classes, gini(T) is defined as

  gini(T) = 1 - sum_{j=1..n} (p_j)^2

where p_j is the relative frequency of class j in the data set T.
After splitting T into two subsets T1 and T2, with n1 and n2 records respectively out of n, the gini index of the split data is defined as

  gini_split(T) = (n1/n) * gini(T1) + (n2/n) * gini(T2)
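A short sketch of both formulas in Python; class_counts is assumed to be a mapping from class label to record count within a partition, which is all the gini computation needs.

```python
def gini(class_counts):
    """gini(T) = 1 - sum_j p_j^2 for the class distribution of a partition."""
    n = sum(class_counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values())

def gini_split(left_counts, right_counts):
    """Weighted gini of a binary split: (n1/n)*gini(T1) + (n2/n)*gini(T2)."""
    n1, n2 = sum(left_counts.values()), sum(right_counts.values())
    n = n1 + n2
    return (n1 / n) * gini(left_counts) + (n2 / n) * gini(right_counts)

# Example: splitting the Drivers data on Age <= 21 separates
# {HIGH: 2} from {HIGH: 2, LOW: 2}, giving gini_split = 0.333...
print(gini_split({"HIGH": 2}, {"HIGH": 2, "LOW": 2}))
```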

Gini Index: The Preferred Splitting Index
- Advantage of the gini index: its calculation requires only the distribution of the class values in each record partition.
- To find the best split point for a node, the node's attribute lists are scanned to evaluate the candidate splits for each attribute.
- The attribute containing the split point with the lowest value of the gini index is used for splitting the node's records.
- The splitting test is shown on the next slides; this logic fits in block 21 of the decision tree construction flow chart.

Numeric Attributes: Splitting Index
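The slide's flow chart is not reproduced here. Below is a minimal sketch under the assumption that candidate splits for a numeric attribute have the form Age <= v, and that the sorted attribute list from the presorting phase is scanned once while maintaining class histograms on each side of the split; the function name and tuple layout are illustrative.

```python
from collections import Counter

def gini(counts):
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_numeric_split(sorted_attr_list):
    """Scan a sorted (value, class, rec id) attribute list once and return
    the split value v (for the test attr <= v) with the lowest gini index."""
    below = Counter()                                   # classes with value <= v
    above = Counter(cls for _, cls, _ in sorted_attr_list)
    n = len(sorted_attr_list)
    best_value, best_gini = None, float("inf")
    for i, (value, cls, _) in enumerate(sorted_attr_list[:-1]):
        below[cls] += 1
        above[cls] -= 1
        if sorted_attr_list[i + 1][0] == value:         # split only between distinct values
            continue
        g = (i + 1) / n * gini(below) + (n - i - 1) / n * gini(above)
        if g < best_gini:
            best_value, best_gini = value, g
    return best_value, best_gini

# Sorted Age attribute list from the presorting slide:
age_list = [(17, "HIGH", 1), (20, "HIGH", 5), (23, "HIGH", 0),
            (32, "LOW", 4), (43, "HIGH", 2), (68, "LOW", 3)]
print(best_numeric_split(age_list))   # (23, 0.222...) -> test is Age <= 23
```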

Splitting for Categorical Attributes

Determining the Subset with the Best Splitting Index
- For a categorical attribute, the split is of the form "value in S", where S is a subset of the attribute's values.
- A greedy algorithm may be used here when the number of values is large.
- The logic of finding the best subset substitutes for block 39 of the flow chart.
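A minimal sketch of one plausible greedy subset search for a categorical split (not necessarily the exact procedure behind block 39): start from the empty subset and repeatedly add the single value that most improves the gini of the split, stopping when no addition helps. It assumes per-value class histograms, which can be built from the CarType attribute list.

```python
from collections import Counter

def gini(counts):
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_of_subset(subset, value_hist):
    """Gini index of the split: attribute value in `subset` vs. not in `subset`."""
    left, right = Counter(), Counter()
    for value, hist in value_hist.items():
        (left if value in subset else right).update(hist)
    n = sum(left.values()) + sum(right.values())
    return (sum(left.values()) / n) * gini(left) + (sum(right.values()) / n) * gini(right)

def greedy_best_subset(value_hist):
    """Greedily grow the subset S for the test (value in S), one value at a time."""
    subset, best = set(), gini_of_subset(set(), value_hist)
    while True:
        candidates = [(gini_of_subset(subset | {v}, value_hist), v)
                      for v in set(value_hist) - subset]
        if not candidates:
            break
        g, v = min(candidates)
        if g >= best:          # no single addition improves the split
            break
        subset, best = subset | {v}, g
    return subset, best

# Per-value class histograms from the CarType attribute list
hist = {"Family": Counter({"HIGH": 2, "LOW": 1}),
        "Sports": Counter({"HIGH": 2}),
        "Truck":  Counter({"LOW": 1})}
print(greedy_best_subset(hist))   # ({'Truck'}, 0.266...)
```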

The decision tree being constructed (level 0)

Decision tree (level 1)

The classification at level 1

Performance:

Performance: Classification Accuracy

Performance: Decision Tree Size

Performance: Execution time

Performance: Scalability

Conclusion:

THANK YOU !!!