Tree-based methods, neural networks


Tree-based methods, neural networks Lecture 10

Tree-based methods Statistical methods in which the input space (feature space) is partitioned into a set of cuboids (rectangles), and then a simple model is fit in each one

Why decision trees? Compact representation of the data; possibility to predict the outcome of new observations

Tree structure Root, internal nodes, leaves (terminal nodes); parent-child relationships; each internal node holds a test condition; a class label is assigned to each leaf [Figure: a tree whose internal nodes carry conditions Cond.1–Cond.6 and whose leaves are N4–N7]

Example The root node tests Body temperature? A warm body temperature leads to an internal node testing Gives birth? (yes → leaf node Mammals, no → leaf node Non-mammals); a cold body temperature leads directly to the leaf node Non-mammals

How to build a decision tree: Hunt's algorithm
Proc Hunt(Dt, t): given data set Dt = {(X1i, …, Xpi, Yi), i = 1..n} and current node t:
1. If all Yi are equal, mark t as a leaf with label Yi.
2. Otherwise, apply a test condition to split Dt into Dt1, …, Dtn, create child nodes t1, …, tn, and run Hunt(Dt1, t1), …, Hunt(Dtn, tn).
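The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the exact lecture code: records are dicts with the class label stored under "y", the test condition is simply the next unused attribute, and the helper name majority_label is my own.

```python
# A minimal sketch of Hunt's algorithm for building a classification tree.
# Records are dicts of attribute values plus a class label under "y".
from collections import Counter

def majority_label(records):
    """Most common class label among the records."""
    return Counter(r["y"] for r in records).most_common(1)[0][0]

def hunt(records, attributes):
    labels = {r["y"] for r in records}
    if len(labels) == 1:                      # all Yi equal -> leaf with that label
        return {"leaf": labels.pop()}
    if not attributes:                        # no test condition left -> majority leaf
        return {"leaf": majority_label(records)}
    attr = attributes[0]                      # naive choice of test condition
    node = {"attr": attr, "children": {}}
    for value in {r[attr] for r in records}:  # split Dt into Dt1..Dtn
        subset = [r for r in records if r[attr] == value]
        node["children"][value] = hunt(subset, attributes[1:])
    return node
```

A real implementation would choose the splitting attribute by an impurity-based criterion rather than taking the next attribute in order.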

Hunt's algorithm example [Figure: a two-dimensional data set with variables X1 and X2 is split recursively, first on X1, then on X2 within each branch, partitioning the plane into rectangles with class labels 1 and 10/20]

Hunt's algorithm What if some combinations of attribute values are missing? An empty node is assigned the label of the majority class among the records (instances, objects, cases) in its parent node. What if all records in a node have identical attribute values but different labels? The node is declared a leaf node with the label of the majority class of this node

CART: Classification and regression trees Regression trees: given Dt = {(X1i, …, Xpi, Yi), i = 1..n} with Y continuous, build a tree that will fit the data best. Classification trees: given Dt = {(X1i, …, Xpi, Yi), i = 1..n} with Y categorical, build a tree that will classify the observations best

A CART algorithm: Regression trees Aim: find the partition minimizing the residual sum of squares; it is computationally expensive to test all possible partitions. Instead, search greedily over splitting variables and split points. Consider a splitting variable j and a split point s, and define the pair of half-planes
$R_1(j,s) = \{X \mid X_j \le s\}$ and $R_2(j,s) = \{X \mid X_j > s\}$.
We seek the splitting variable j and split point s that solve
$\min_{j,s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]$,
where the inner minima are attained by the region means $\hat c_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j,s))$ and $\hat c_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j,s))$.
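For a single splitting variable, the inner minimization is solved by the half-plane means, so the search reduces to scanning candidate split points. A sketch in NumPy (the function name and midpoint convention are my own choices):

```python
# Scan candidate split points s for one splitting variable and keep the one
# minimizing the summed squared error around the two half-plane means.
import numpy as np

def best_split(x, y):
    """Return (s, sse) for the best split x <= s on a single variable."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_s, best_sse = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                          # no valid split between equal values
        s = (x[i] + x[i - 1]) / 2.0           # midpoint between sorted values
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_s, best_sse = s, sse
    return best_s, best_sse
```

Repeating this scan over every variable j yields the (j, s) pair that the formula above asks for.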

Post-pruning How large a tree should we grow? Too large – overfitting! Grow a large tree T0, then prune it using post-pruning. Define a subtree T of T0 and index its terminal nodes by m, with node m representing region Rm. Let |T| denote the number of terminal nodes in T and set
$N_m = \#\{x_i \in R_m\}$, $\hat c_m = \frac{1}{N_m}\sum_{x_i \in R_m} y_i$, $Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m} (y_i - \hat c_m)^2$.
Then minimize the cost-complexity criterion
$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|$,
using cross-validation to select the factor α that penalizes complex trees.
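The cost-complexity criterion itself is a one-liner once a subtree is summarized by its leaves. In this sketch each leaf is represented by a hypothetical (n_records, impurity) pair; that representation is my own, not from the slides:

```python
# Cost-complexity criterion C_alpha(T) = sum_m N_m Q_m(T) + alpha * |T|,
# for a subtree summarized as a list of (N_m, Q_m) pairs, one per leaf.
def cost_complexity(leaves, alpha):
    return sum(n * q for n, q in leaves) + alpha * len(leaves)
```

Larger α drives the minimization toward smaller subtrees, which is exactly how the penalty controls overfitting.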

CART: Classification trees For each node m define the class proportions
$\hat p_{mk} = \frac{1}{N_m}\sum_{x_i \in R_m} I(y_i = k)$,
and classify the observations in node m to the majority class $k(m) = \arg\max_k \hat p_{mk}$. Define a measure of node impurity, e.g. the misclassification error $1 - \hat p_{m\,k(m)}$, the Gini index $\sum_k \hat p_{mk}(1 - \hat p_{mk})$, or the cross-entropy $-\sum_k \hat p_{mk} \log \hat p_{mk}$.
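The three impurity measures, computed from a vector of class proportions p (a minimal sketch; base-2 logarithm for the entropy is a convention choice):

```python
# Standard node impurity measures for a K-class node, given class proportions p.
import math

def gini(p):
    """Gini index: sum_k p_k (1 - p_k) = 1 - sum_k p_k^2."""
    return 1.0 - sum(pk * pk for pk in p)

def entropy(p):
    """Cross-entropy (base 2); terms with p_k = 0 contribute nothing."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def misclassification(p):
    """Misclassification error: 1 - max_k p_k."""
    return 1.0 - max(p)
```

All three are zero for a pure node and maximal when the classes are equally mixed.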

Design issues of decision tree induction 1) How to split the training records: we need a measure for evaluating the goodness of candidate test conditions. 2) How to terminate the splitting procedure: either continue expanding nodes until all the records in a node belong to the same class or have identical attribute values, or define criteria for early termination

How to split: CART Select the split with maximum information gain
$\Delta = I(\text{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\, I(v_j)$,
where I(·) is the impurity measure of a given node, N is the total number of records at the parent node, and N(vj) is the number of records associated with the child node vj
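The gain formula above, with Gini impurity plugged in for I(·), can be sketched as follows. The function names and the list-of-labels representation of parent and child nodes are my own:

```python
# Information gain of a candidate split: impurity of the parent node minus
# the weighted average impurity of the child nodes (Gini impurity used here).
from collections import Counter

def gini_from_labels(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """parent: list of labels; children: list of label lists, one per child."""
    n = len(parent)
    weighted = sum(len(ch) / n * gini_from_labels(ch) for ch in children)
    return gini_from_labels(parent) - weighted
```

A split that separates the classes perfectly recovers the full parent impurity as gain.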

How to split: C4.5 Impurity measures such as the Gini index tend to favour attributes that have a large number of distinct values. Strategy 1: Restrict the test conditions to binary splits only. Strategy 2: Use the gain ratio as splitting criterion: the information gain divided by the split information $-\sum_j \frac{N(v_j)}{N} \log_2 \frac{N(v_j)}{N}$, which penalizes splits with many small partitions
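A sketch of the gain-ratio correction used by C4.5. It takes an already-computed information gain and the child-node sizes; the function names are my own:

```python
# C4.5 gain ratio: information gain divided by the split information
# (the entropy of the split proportions), penalizing many-valued splits.
import math

def split_info(child_sizes):
    n = sum(child_sizes)
    return -sum(c / n * math.log2(c / n) for c in child_sizes if c > 0)

def gain_ratio(gain, child_sizes):
    si = split_info(child_sizes)
    return gain / si if si > 0 else 0.0
```

Splitting into many tiny children inflates the split information, so the same raw gain yields a smaller gain ratio.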

Constructing decision trees [Example tree: the root tests Home owner (yes → Defaulted = No); the no-branch tests Marital status (married → Defaulted = No); the not-married branch tests Income (≤ 100K vs > 100K) to decide Defaulted = ?]

Expressing attribute test conditions Binary attributes: binary splits. Nominal attributes: binary or multiway splits. Ordinal attributes: binary or multiway splits honoring the order of the attribute values. Continuous attributes: binary or multiway splits into disjoint intervals

Characteristics of decision tree induction Nonparametric approach (no underlying probability model). Computationally inexpensive techniques have been developed for constructing decision trees; once a decision tree has been built, classification is extremely fast. The presence of redundant attributes will not adversely affect the accuracy of decision trees. The presence of irrelevant attributes can lower the accuracy of decision trees, especially if no measures are taken to avoid overfitting. At the leaf nodes, the number of records may be too small (data fragmentation)

Neural networks Joint theoretical framework for prediction and classification

Principal components regression (PCR) Extract principal components (transformations of the inputs) as derived features, and then model the target (response) as a linear function of these features [Diagram: inputs x1, …, xp feed derived features z1, …, zM, which feed the response y]
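A minimal PCR sketch with NumPy: project the centered inputs onto the top M principal components, then fit ordinary least squares on those derived features z1, …, zM. The function name and the choice to return fitted values are illustrative assumptions:

```python
# Principal components regression: SVD of the centered inputs gives the
# principal directions; regress the centered response on the top-M scores.
import numpy as np

def pcr_fit_predict(X, y, M):
    Xc = X - X.mean(axis=0)                        # center the inputs
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:M].T                              # derived features (scores) z_1..z_M
    coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return Z @ coef + y.mean()                     # fitted values on the training data
```

With M equal to the number of inputs, PCR reproduces the ordinary least-squares fit; smaller M discards low-variance directions.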

Neural networks with a single target Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function of these features [Diagram: inputs x1, …, xp feed hidden units z1, …, zM, which feed the target y]

Artificial neural networks Introduction from biology: neurons, axons, dendrites, synapses. Capabilities of neural networks: memorization (robust to noise and to fragmentary input!) and classification

Terminology Feed-forward neural network: input layer (x1, …, xp), [hidden layer(s)] (z1, …, zM), output layer (f1, …, fK)

Terminology Feed-forward network: nodes in one layer are connected only to the nodes in the next layer. Recurrent network: nodes in one layer may also be connected to nodes in a previous layer or within the same layer

Terminology Formulas for the multilayer perceptron (MLP):
hidden units $z_m = \varsigma(\alpha_{0m} + \alpha_m^T x)$, m = 1, …, M; outputs $f_k(x) = g_k(\beta_{0k} + \beta_k^T z)$, k = 1, …, K,
where C1 and C2 denote the combination functions (the weighted sums), g and ς are the activation functions, α0m and β0k are the biases of hidden unit m and output unit k, and αim, βjk are the connection weights.
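The forward pass written out above is a few lines of NumPy. Here the output activation g is taken as the identity (a common choice for regression targets); the tiny dimensions in the usage are illustrative:

```python
# Forward pass of a single-hidden-layer MLP:
# z_m = sigmoid(alpha_0m + alpha_m . x), f_k = beta_0k + beta_k . z.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, alpha0, alpha, beta0, beta):
    z = sigmoid(alpha0 + alpha @ x)   # hidden units: combination C1, activation sigma
    f = beta0 + beta @ z              # outputs: combination C2, identity activation g
    return f
```

Training fits the biases and weights (alpha0, alpha, beta0, beta), e.g. by back-propagation; only the forward computation is sketched here.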

Recommended reading Tree-based methods: book, paragraph 9.2; EM Reference: Tree node. Neural networks: start with book, paragraph 11; EM Reference: Neural Network node