Decision trees MARIO REGIN

What is a decision tree?
A general-purpose prediction and classification mechanism. Decision trees emerged alongside the nascent fields of artificial intelligence and statistical computing.

Nomenclature
[Tree diagram labeling the parts of a tree: the root node (node 1: 1,309 cases, 38.2% / 61.8%), a branch on the input Gender, and the next level of nodes (node 23, Female: 466 cases, 72.7% / 27.3%; node 24, Male: 843 cases, 19.1% / 80.9%).]

Main Characteristic
Decision trees recursively subset a target field of data according to the values of associated input fields (predictors). Each partition creates descendant data subsets whose target values are progressively more similar within a node and progressively more dissimilar between nodes at any given level of the tree. In other words, decision trees reduce the disorder of the data as we move from higher to lower levels of the tree.

Creating a Decision Tree
1. Create the root node.
2. Search the data set for the best partitioning input for the current node.
3. Use the best input to partition the current node into branches.
4. Repeat steps 2 and 3 until one or more stop conditions are met.
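
A minimal sketch of this loop in Python, under assumptions of my own: rows are tuples whose last element is the target, inputs are column indices, and Gini impurity stands in for the disorder measure (the entropy used later in the deck would work equally well):

```python
from collections import Counter, defaultdict

def gini(rows):
    """Disorder of a node: 1 minus the sum of squared class proportions."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def partition(rows, col):
    """Group the rows of a node by their value in one input column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups

def best_input(rows, cols):
    """Step 2: the input whose partition leaves the least weighted disorder."""
    def remaining_disorder(col):
        groups = partition(rows, col).values()
        return sum(len(g) / len(rows) * gini(g) for g in groups)
    return min(cols, key=remaining_disorder)

def build_tree(rows, cols, min_rows=2):
    """Steps 1-4: create a node, branch on the best input, recurse."""
    if len(rows) < min_rows or not cols or gini(rows) == 0.0:  # stop conditions
        return Counter(row[-1] for row in rows).most_common(1)[0][0]
    col = best_input(rows, cols)                               # step 2
    return (col, {value: build_tree(group, [c for c in cols if c != col], min_rows)
                  for value, group in partition(rows, col).items()})  # steps 3-4
```

Called as `build_tree(rows, cols=[0, 1])`, this returns a nested (input, branches) structure whose leaves hold the majority class of the cases that reached them.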

Best Input
The selection of the "best" input field is an open subject of active research, and decision trees allow a variety of computational approaches to input selection. Two possible approaches:
- High-performance predictive model approach.
- Relationship analysis approach.

High-Performance Predictive Model Approach
Choose the input that produces the most separation in the variability among the descendant nodes. The emphasis is on selecting high-quality partitions that collectively produce the best overall model.
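
One concrete way to score that separation is information gain: the entropy of the parent node minus the case-weighted entropy of its children. A minimal sketch; the 8-case, 4/3/1 split is taken from the construction example later in the deck, while the function names are my own:

```python
import math

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the class proportions."""
    probs = (labels.count(c) / len(labels) for c in set(labels))
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, children):
    """Drop in entropy from a parent node to its child partitions."""
    weighted = sum(len(c) / len(parent) * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = [1, 1, 1, 0, 0, 0, 0, 0]          # 8 cases, 37.5% class 1
children = [[1, 1, 0, 0], [0, 0, 0], [1]]  # the 4 / 3 / 1 three-way split
print(information_gain(parent, children))  # ~0.454 bits
```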

Relationship Analysis Approach
Guide the branching to analyze, discover, support, or confirm the conditional relations that are assumed to exist among the various inputs and the component nodes they produce. The emphasis is on analyzing the interactions that form the tree.

Stopping Rule
Generally, stopping rules consist of thresholds on diminishing returns (in terms of test statistics) or on a diminishing supply of training cases (a minimum acceptable number of observations in a node).
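
In practice these thresholds are usually exposed as hyperparameters. A minimal sketch using scikit-learn (an assumption: the slides name no library, and the toy data is invented):

```python
from sklearn.tree import DecisionTreeClassifier

X = [[25, 0], [8, 1], [40, 0], [3, 1], [52, 0], [16, 1]]  # toy inputs
y = [0, 1, 0, 1, 0, 1]                                    # toy target

clf = DecisionTreeClassifier(
    max_depth=3,                 # stop once the tree is 3 levels deep
    min_samples_split=4,         # do not split nodes with fewer than 4 cases
    min_samples_leaf=2,          # every leaf must keep at least 2 cases
    min_impurity_decrease=0.01,  # demand a minimum return from each split
)
clf.fit(X, y)
```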

Tree Construction

Tree Construction
[Diagram: eight training cases (37.5% / 62.5% at the root) with three candidate inputs, Cast Shadow, Likes Garlic, and Complexion, for the target Vampire; each candidate split is scored (.5, .6, .68) from the Yes/No counts of its branches (?, Yes, No for the first two; Pale, Average, Healthy for Complexion).]

Tree Construction
[Diagram: the root node (node 1: 8 cases, 37.5% / 62.5%) is partitioned three ways into branches ?, N, and Y, yielding node 2 (?: 4 cases, 50.0% / 50.0%), node 3 (N: 3 cases, 0.0% / 100%), and node 4 (Y: 1 case, 100% / 0.0%).]

Tree Construction
[Diagram: within the remaining impure node, the surviving candidate inputs Likes Garlic and Complexion are scored from the Yes/No counts of their branches; one candidate scores .5.]

Tree Construction
[Diagram: the finished tree. The root (8 cases, 37.5% / 62.5%) splits into nodes 2 (4 cases, 50.0% / 50.0%), 3 (3 cases, 0.0% / 100%), and 4 (1 case, 100% / 0.0%); node 2 is then split on Garlic into node 5 (Y: 2 cases, 0.0% / 100%) and node 6 (N: 2 cases, 100% / 0.0%), leaving every leaf pure.]

Evaluation

Survival in the Titanic Sinking
[Tree diagram, reconstructed:]
Node 1 (root): 1,309 passengers, 38.2% survived / 61.8% did not
- Gender = Female -> node 23: 466 passengers, 72.7% survived
  - Age < 30.75 or missing -> node 30: 314 passengers, 67.8% survived
  - Age ≥ 30.75 -> node 31: 152 passengers, 82.9% survived
- Gender = Male -> node 24: 843 passengers, 19.1% survived
  - Age < 9.5 -> node 33: 43 passengers, 58.1% survived
  - Age ≥ 9.5 or missing -> node 34: 800 passengers, 17.0% survived
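
Read as prediction rules, the tree above reduces to a few nested comparisons. A minimal sketch (the function name and the use of None for a missing age are my own):

```python
def titanic_survival_rate(gender, age=None):
    """Leaf survival rates from the tree above; age=None means missing."""
    if gender == "female":
        if age is None or age < 30.75:
            return 0.678          # node 30
        return 0.829              # node 31
    if age is not None and age < 9.5:
        return 0.581              # node 33
    return 0.170                  # node 34

print(titanic_survival_rate("female", 25))  # 0.678
print(titanic_survival_rate("male"))        # 0.170 (missing age)
```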

Relationship
[The Titanic tree annotated: the Gender split reflects "women first" and the Age < 9.5 split on the male branch reflects "children first"; note the unequal partition in age, with cut-points of 30.75 on the female branch and 9.5 on the male branch.]

Characteristics of Decision Trees
- Successive partitioning results in a tree-like visual display with a top node and descendant branches.
- Branch partitions may be two-way or multiway.
- Partitioning fields may be at nominal, ordinal, or interval measurement levels.
- The final result can be a class or a number.

Characteristics of Decision Trees
- Missing values: can be grouped with other values or given their own partition.
- Symmetry: descendant nodes can be balanced and symmetrical, employing a matching set of predictors at each level of the subtree.
- Asymmetry: descendant nodes can be unbalanced, in that subnode partitions may be based on the most powerful predictor for each individual node.

Advantages of Decision Trees
- The created trees can detect and visually present contextual effects.
- They are easy to understand: the resulting model is a white box.
- Flexibility: cut-points for the same input can differ from node to node.
- Missing values are allowed.
- Both numerical and nominal data can be used as input.
- Output can be nominal or numerical.

Disadvantages of Decision Trees
- Deep trees tend to overfit the training data, which leads to poor generalization.

Multiple Trees
Construct a multitude of decision trees at training time and output the class that is the mode of the individual trees' classes (classification) or their mean prediction (regression).

Creation Method
The idea is to create multiple uncorrelated trees:
1. Select a random subset of the training set.
2. Create a tree Ti with this random subset.
3. When creating the partitions of this tree, use only a random subset of the inputs when searching for the best input.
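
This recipe (bootstrap samples plus random input subsets per split) is what scikit-learn's RandomForestClassifier implements, so a minimal sketch only needs its constructor arguments (the toy data is invented):

```python
from sklearn.ensemble import RandomForestClassifier

X = [[25, 0], [8, 1], [40, 0], [3, 1], [52, 0], [16, 1]]
y = [0, 1, 0, 1, 0, 1]

forest = RandomForestClassifier(
    n_estimators=100,     # create 100 trees T_i
    bootstrap=True,       # each tree sees a random subset of the training set
    max_features="sqrt",  # each split searches a random subset of the inputs
)
forest.fit(X, y)
```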

Evaluation
If the output is a class (classification):
- Evaluate the sample in all of the trees.
- Each tree votes for one class.
- The selected class is the one with the most votes.
If the output is a number (regression):
- The final result is the average of the individual results.
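
A minimal sketch of the two combination rules; each "tree" is stood in for by a plain function so the voting and averaging stay visible:

```python
from statistics import mean, mode

def predict_class(trees, x):
    """Classification: every tree votes; return the most voted class."""
    return mode(tree(x) for tree in trees)

def predict_value(trees, x):
    """Regression: the final result is the average of the tree outputs."""
    return mean(tree(x) for tree in trees)

trees = [lambda x: int(x > 2), lambda x: int(x > 4), lambda x: int(x > 3)]
print(predict_class(trees, 5))  # 1 (three votes out of three)
print(predict_value(trees, 3))  # 0.333... (average of 1, 0, 0)
```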

Extra. Computation of Entropy
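
The transcript ends here; as a minimal worked example of the entropy the earlier splits were scored with, using the construction example's root node (8 cases, 37.5% / 62.5%):

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
     = -\left(0.375 \log_2 0.375 + 0.625 \log_2 0.625\right) \approx 0.954 \text{ bits}
```

A pure node has H = 0 and an even 50/50 node has H = 1 bit, so each split chosen during construction drives H toward 0.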