Presentation on theme: "Tree-based methods, neural networks" — Presentation transcript:

1 Tree-based methods, neural networks
Lecture 10

2 Tree-based methods Statistical methods in which the input space (feature space) is partitioned into a set of cuboids (rectangles), and then a simple model (for example, a constant) is fitted in each one

3 Why decision trees? Compact representation of the data
Possibility to predict the outcome of new observations

4 Tree structure Root, internal nodes, leaves (terminal nodes)
Parent-child relationships; each internal node carries a test condition, and a class label is assigned to each leaf. [Figure: example tree with test conditions Cond.1–Cond.6 at the internal nodes and leaves N4–N7]

5 Example Root node: Body temperature? (Warm / Cold)
Cold → leaf node: Non-mammals. Warm → internal node: Gives birth? Yes → leaf node: Mammals; No → leaf node: Non-mammals.
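As a minimal sketch (not part of the original slides), this example tree can be written as nested conditionals; the record format, a dict with "body_temperature" and "gives_birth" keys, is an assumption made purely for illustration.

```python
def classify(record):
    """Follow the example tree from the slide and return a class label."""
    if record["body_temperature"] == "cold":   # root node test
        return "non-mammal"                    # leaf
    if record["gives_birth"]:                  # internal node test (warm-blooded)
        return "mammal"                        # leaf
    return "non-mammal"                        # leaf

print(classify({"body_temperature": "warm", "gives_birth": True}))   # -> mammal
print(classify({"body_temperature": "cold", "gives_birth": False}))  # -> non-mammal
```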

6 How to build a decision tree: Hunt’s algorithm
Proc Hunt(Dt, t): given the data set Dt = {(X1i, ..., Xpi, Yi), i = 1..n} and the current node t:
1. If all Yi are equal, mark t as a leaf with label Yi.
2. Otherwise, use the test condition to split Dt into Dt1, ..., Dtn, create children t1, ..., tn, and run Hunt(Dt1, t1), ..., Hunt(Dtn, tn).
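Below is a small Python sketch of this recursive procedure, under the assumption that the caller supplies a split_fn that partitions a node's records (the helper names are mine, not from the lecture); it also handles the two boundary cases discussed on the next slide.

```python
from collections import Counter

def majority_label(labels):
    """Most frequent class label (used for the boundary cases on the next slide)."""
    return Counter(labels).most_common(1)[0][0]

def hunt(records, labels, split_fn, parent_labels=None):
    """Recursive sketch of Hunt's algorithm.

    records  : list of attribute tuples (X1..Xp)
    labels   : list of class labels Y
    split_fn : returns a list of (child_records, child_labels) partitions,
               or None if no useful split exists
    Returns a nested dict representing the tree.
    """
    if not records:                                   # empty node: majority class of parent
        return {"leaf": majority_label(parent_labels)}
    if len(set(labels)) == 1:                         # all Yi equal: leaf with label Yi
        return {"leaf": labels[0]}
    if len(set(map(tuple, records))) == 1:            # identical attributes: majority-class leaf
        return {"leaf": majority_label(labels)}
    children = split_fn(records, labels)              # apply the test condition
    if not children:                                  # no useful split found
        return {"leaf": majority_label(labels)}
    return {"split": [hunt(r, l, split_fn, labels) for r, l in children]}
```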

7 Hunt’s algorithm example
[Figure: a two-dimensional data set (X1, X2) and the corresponding tree, with splits on X1 at 9 and 15 and on X2 at 7 and 16; class labels shown in the resulting regions]

8 Hunt’s algorithm What if some combinations of attributes are missing?
Empty node: it is assigned the label of the majority class among the records (instances, objects, cases) in its parent node.
All records in a node have identical attributes: the node is declared a leaf node with the same class label as the majority class of this node.

9 CART: Classification and regression trees
Regression trees: given Dt = {(X1i, ..., Xpi, Yi), i = 1..n} with Y continuous, build a tree that fits the data best. Classification trees: given Dt = {(X1i, ..., Xpi, Yi), i = 1..n} with Y categorical, build a tree that classifies the observations best.
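For orientation, the two problem types map directly onto scikit-learn's tree estimators; this is a hedged library illustration on synthetic data, not the course's reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                        # inputs X1, X2

# Regression tree: Y continuous, fitted by minimizing squared error.
y_cont = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=200)
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_cont)

# Classification tree: Y categorical, fitted with an impurity criterion (Gini by default).
y_cat = (X[:, 0] + X[:, 1] > 1).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_cat)

print(reg.predict(X[:2]), clf.predict(X[:2]))
```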

10 A CART algorithm: Regression trees
Aim: we want to find the partition $R_1, \dots, R_M$ and constants $c_m$ minimizing $\sum_i (y_i - f(x_i))^2$ for $f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$ (the best constant is the region average, $\hat c_m = \mathrm{ave}(y_i \mid x_i \in R_m)$); it is computationally expensive to test all possible splits. Instead we use a greedy search over splitting variables and split points: consider a splitting variable $j$ and a split point $s$, and define the pair of half-planes $R_1(j,s) = \{X \mid X_j \le s\}$ and $R_2(j,s) = \{X \mid X_j > s\}$. We seek the splitting variable $j$ and split point $s$ that solve
$$\min_{j,\,s}\Big[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Big].$$
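A direct numpy sketch of this greedy search for a single node (the function name and the exhaustive scan over observed values are my own choices): for each candidate pair (j, s), the optimal constants are the half-plane means, so the criterion reduces to the two residual sums of squares.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the splitting variable j and split point s
    minimizing the two-sided residual sum of squares."""
    n, p = X.shape
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if left.size == 0 or right.size == 0:
                continue
            # Optimal c1, c2 are the region means; sum squared residuals around them.
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] < 0.4, 1.0, 5.0) + rng.normal(scale=0.2, size=100)
print(best_split(X, y))   # picks j = 0 with s near 0.4
```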

11 Post-pruning How large a tree should we grow? Too large – overfitting!
Grow a large tree $T_0$, then prune it using post-pruning (cost-complexity pruning). Define a subtree $T \subset T_0$ and index its terminal nodes by $m$, with node $m$ representing region $R_m$. Let $|T|$ denote the number of terminal nodes in $T$ and set
$$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|,$$
where $N_m = \#\{x_i \in R_m\}$, $\hat c_m = \frac{1}{N_m}\sum_{x_i \in R_m} y_i$, and $Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m} (y_i - \hat c_m)^2$. Then minimize this expression, using cross-validation to select the factor $\alpha$ that penalizes complex trees.
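As a practical sketch (assuming scikit-learn is available), the same cost-complexity criterion is exposed through the ccp_alpha parameter, and cross-validation over the pruning path of the large tree selects the penalty.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + rng.normal(scale=0.3, size=300)

# Candidate alpha values come from the pruning path of a fully grown tree T0.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# 5-fold cross-validation picks the alpha that best trades fit against tree size.
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      {"ccp_alpha": path.ccp_alphas}, cv=5)
search.fit(X, y)
print(search.best_params_)
```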

12 CART: Classification trees
For each node $m$, representing a region $R_m$ with $N_m$ observations, define the class proportions $\hat p_{mk} = \frac{1}{N_m}\sum_{x_i \in R_m} I(y_i = k)$ and classify the observations in node $m$ to the majority class $k(m) = \arg\max_k \hat p_{mk}$. Define a measure of node impurity, e.g. the misclassification error $1 - \hat p_{m k(m)}$, the Gini index $\sum_k \hat p_{mk}(1 - \hat p_{mk})$, or the cross-entropy $-\sum_k \hat p_{mk}\log \hat p_{mk}$.
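A minimal numpy sketch of the class proportions and the three standard impurity measures (the function names are my own):

```python
import numpy as np

def class_proportions(labels):
    """Estimated class proportions p_mk within one node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def misclassification_error(labels):
    return 1.0 - class_proportions(labels).max()

def gini_index(labels):
    p = class_proportions(labels)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(labels):
    p = class_proportions(labels)
    return float(-np.sum(p * np.log(p)))        # all p_mk > 0 by construction

node = ["yes", "yes", "yes", "no"]
print(misclassification_error(node), gini_index(node), cross_entropy(node))
```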

13 Design issues of decision tree induction
How to split the training records: we need a measure for evaluating the goodness of various test conditions. How to terminate the splitting procedure: (1) continue expanding nodes until either all the records belong to the same class or all the records have identical attribute values, or (2) define criteria for early termination.

14 How to split: CART Select the split with maximum information gain (impurity decrease)
$$\Delta = I(\text{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\, I(v_j),$$
where $I(\cdot)$ is the impurity measure of a given node, $N$ is the total number of records at the parent node, $k$ is the number of child nodes, and $N(v_j)$ is the number of records associated with the child node $v_j$.
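A self-contained sketch of this criterion with the Gini index as the impurity measure I(.) (the example split is made up):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a single node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def information_gain(parent_labels, children_labels):
    """Impurity decrease: I(parent) - sum_j N(v_j)/N * I(v_j)."""
    n = len(parent_labels)
    weighted_child_impurity = sum(len(c) / n * gini(c) for c in children_labels)
    return gini(parent_labels) - weighted_child_impurity

parent = ["yes"] * 6 + ["no"] * 4
split_a = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]          # fairly pure children
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 3 + ["no"] * 2]  # uninformative split
print(information_gain(parent, split_a), information_gain(parent, split_b))
```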

15 How to split: C4.5 Impurity measures such as the Gini index tend to favour attributes that have a large number of distinct values. Strategy 1: restrict the test conditions to binary splits only. Strategy 2: use the gain ratio as the splitting criterion.
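A hedged sketch of the gain ratio: C4.5 uses entropy-based gain, and the split-information denominator penalizes splits that produce many small children. The helper names are mine.

```python
import numpy as np

def entropy(labels):
    """Entropy of one node's class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(parent_labels, children_labels):
    """Entropy-based information gain divided by the split information."""
    n = len(parent_labels)
    weights = np.array([len(c) / n for c in children_labels])
    child_entropy = float(np.sum(weights * np.array([entropy(c) for c in children_labels])))
    gain = entropy(parent_labels) - child_entropy
    split_info = float(-np.sum(weights * np.log2(weights)))  # larger for many small children
    return gain / split_info

parent = ["yes"] * 6 + ["no"] * 4
children = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(gain_ratio(parent, children))
```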

16 Constructing decision trees
[Figure: tree under construction for a loan-default example. Home owner? Yes → Defaulted = No; No → Marital status? Married → Defaulted = No; Not married → Income (≤ 100K / > 100K) → Defaulted = ?]

17 Expressing attribute test conditions
Binary attributes: binary splits. Nominal attributes: binary or multiway splits. Ordinal attributes: binary or multiway splits honoring the order of the attribute values. Continuous attributes: binary or multiway splits into disjoint intervals.

18 Characteristics of decision tree induction
Nonparametric approach (no underlying probability model). Computationally inexpensive techniques have been developed for constructing decision trees, and once a decision tree has been built, classification is extremely fast. The presence of redundant attributes does not adversely affect the accuracy of decision trees. The presence of irrelevant attributes can lower the accuracy of decision trees, especially if no measures are taken to avoid overfitting. At the leaf nodes, the number of records may be too small (data fragmentation).

19 Neural networks Joint theoretical framework for prediction and classification

20 Principal components regression (PCR)
Extract principal components (transformations of the inputs) as derived features, and then model the target (response) as a linear function of these features. [Diagram: inputs x1, ..., xp → derived features z1, ..., zM → output y]
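A short sketch of PCR assuming scikit-learn: PCA produces the derived features z1, ..., zM and a linear model regresses y on them (the choice M = 3 and the synthetic data are arbitrary).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # inputs x1..xp, p = 10
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

# PCR: keep M = 3 principal components, then fit y as a linear function of them.
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
print(pcr.predict(X[:2]))
```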

21 Neural networks with a single target
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function of these features. [Diagram: inputs x1, ..., xp → hidden units z1, ..., zM → output y]

22 Artificial neural networks
Introduction from biology: neurons, axons, dendrites, synapses. Capabilities of neural networks: memorization (robust to noisy and fragmentary input!), classification.

23 Terminology Feed-forward neural network: input layer, [hidden layer(s)], output layer.
[Diagram: inputs x1, ..., xp → hidden units z1, ..., zM → outputs f1, ..., fK]

24 Terminology Feed-forward network: nodes in one layer are connected to the nodes in the next layer.
Recurrent network: nodes in one layer may be connected to nodes in a previous layer or within the same layer.

25 Terminology Formulas for multilayer perceptron (MLP)
$$Z_m = \varsigma(\alpha_{0m} + \alpha_m^T X),\; m = 1, \dots, M; \qquad T_k = \beta_{0k} + \beta_k^T Z,\; k = 1, \dots, K; \qquad f_k(X) = g_k(T).$$
Here the linear sums $\alpha_{0m} + \alpha_m^T X$ and $\beta_{0k} + \beta_k^T Z$ are the combination functions (C1, C2); $g$ and $\varsigma$ are activation functions (e.g. the sigmoid $\varsigma(v) = 1/(1+e^{-v})$); $\alpha_{0m}$ and $\beta_{0k}$ are the bias terms of the hidden and output units; and $\alpha_{im}$, $\beta_{jk}$ are the weights of the connections.
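A numpy sketch of one forward pass through these formulas for a single hidden layer; the shapes, the sigmoid ς, and the softmax output function g_k are assumptions chosen for a K-class setting.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(X, alpha0, alpha, beta0, beta):
    """Single-hidden-layer MLP forward pass.

    X      : (n, p) inputs
    alpha0 : (M,) hidden biases,  alpha : (p, M) input-to-hidden weights
    beta0  : (K,) output biases,  beta  : (M, K) hidden-to-output weights
    """
    Z = sigmoid(alpha0 + X @ alpha)                  # Z_m = sigma(alpha_0m + alpha_m^T X)
    T = beta0 + Z @ beta                             # T_k = beta_0k + beta_k^T Z
    expT = np.exp(T - T.max(axis=1, keepdims=True))  # softmax output function g_k(T)
    return expT / expT.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                          # n = 4 observations, p = 3 inputs
f = mlp_forward(X, rng.normal(size=5), rng.normal(size=(3, 5)),
                rng.normal(size=2), rng.normal(size=(5, 2)))   # M = 5 hidden, K = 2 outputs
print(f.sum(axis=1))                                 # each row sums to 1
```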

26 Recommended reading Book, Section 9.2. EM Reference: Tree node
Start with: Book, Chapter 11. EM Reference: Neural Network node

