Presentation on theme: "Tree-based methods, neural networks" — Presentation transcript:

1 Tree-based methods, neural networks
Lecture 10

2 Tree-based methods Statistical methods in which the input space (feature space) is partitioned into a set of cuboids (rectangles), and then a simple model (for example, a constant) is fitted in each one

3 Why decision trees? Compact representation of the data
Possibility to predict the outcome of new observations

4 Tree structure Root, internal nodes, leaves (terminal nodes)
Parent-child relationships; each internal node carries a test condition, and a class label is assigned to each leaf. [Figure: example tree with test conditions Cond.1–Cond.6 at the internal nodes and leaves N4–N7]

5 Example Root node: Body temperature? (Warm / Cold)
Cold → leaf node: Non-mammals. Warm → internal node: Gives birth? Yes → leaf node: Mammals; No → leaf node: Non-mammals.
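As a minimal sketch (not part of the original slides), this example tree can be written as nested conditionals; the record format, a dict with "body_temperature" and "gives_birth" keys, is an assumption made purely for illustration.

```python
def classify(record):
    """Follow the example tree from the slide and return a class label."""
    if record["body_temperature"] == "cold":   # root node test
        return "non-mammal"                    # leaf
    if record["gives_birth"]:                  # internal node test (warm-blooded)
        return "mammal"                        # leaf
    return "non-mammal"                        # leaf

print(classify({"body_temperature": "warm", "gives_birth": True}))   # -> mammal
print(classify({"body_temperature": "cold", "gives_birth": False}))  # -> non-mammal
```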

6 How to build a decision tree: Hunt’s algorithm
Proc Hunt(Dt, t): given the data set Dt = {(X1i, ..., Xpi, Yi), i = 1..n} and the current node t:
1. If all Yi are equal, mark t as a leaf with label Yi.
2. Otherwise, use the test condition to split Dt into Dt1, ..., Dtn, create children t1, ..., tn, and run Hunt(Dt1, t1), ..., Hunt(Dtn, tn).
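Below is a small Python sketch of this recursive procedure, under the assumption that the caller supplies a split_fn that partitions a node's records (the helper names are mine, not from the lecture); it also handles the two boundary cases discussed on the next slide.

```python
from collections import Counter

def majority_label(labels):
    """Most frequent class label (used for the boundary cases on the next slide)."""
    return Counter(labels).most_common(1)[0][0]

def hunt(records, labels, split_fn, parent_labels=None):
    """Recursive sketch of Hunt's algorithm.

    records  : list of attribute tuples (X1..Xp)
    labels   : list of class labels Y
    split_fn : returns a list of (child_records, child_labels) partitions,
               or None if no useful split exists
    Returns a nested dict representing the tree.
    """
    if not records:                                   # empty node: majority class of parent
        return {"leaf": majority_label(parent_labels)}
    if len(set(labels)) == 1:                         # all Yi equal: leaf with label Yi
        return {"leaf": labels[0]}
    if len(set(map(tuple, records))) == 1:            # identical attributes: majority-class leaf
        return {"leaf": majority_label(labels)}
    children = split_fn(records, labels)              # apply the test condition
    if not children:                                  # no useful split found
        return {"leaf": majority_label(labels)}
    return {"split": [hunt(r, l, split_fn, labels) for r, l in children]}
```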

7 Hunt’s algorithm example
[Figure: a two-dimensional data set (X1, X2) and the corresponding tree, with splits on X1 at 9 and 15 and on X2 at 7 and 16; class labels shown in the resulting regions]

8 Hunt’s algorithm What if some combinations of attributes are missing?
Empty node: it is assigned the label of the majority class among the records (instances, objects, cases) in its parent node.
All records in a node have identical attributes: the node is declared a leaf node with the same class label as the majority class of this node.

9 CART: Classification and regression trees
Regression trees: given Dt = {(X1i, ..., Xpi, Yi), i = 1..n} with Y continuous, build a tree that fits the data best. Classification trees: given Dt = {(X1i, ..., Xpi, Yi), i = 1..n} with Y categorical, build a tree that classifies the observations best.
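For orientation, the two problem types map directly onto scikit-learn's tree estimators; this is a hedged library illustration on synthetic data, not the course's reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                        # inputs X1, X2

# Regression tree: Y continuous, fitted by minimizing squared error.
y_cont = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=200)
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_cont)

# Classification tree: Y categorical, fitted with an impurity criterion (Gini by default).
y_cat = (X[:, 0] + X[:, 1] > 1).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_cat)

print(reg.predict(X[:2]), clf.predict(X[:2]))
```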

10 A CART algorithm: Regression trees
Aim: we want to find the partition $R_1, \dots, R_M$ and constants $c_m$ minimizing $\sum_i (y_i - f(x_i))^2$ for $f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$ (the best constant is the region average, $\hat c_m = \mathrm{ave}(y_i \mid x_i \in R_m)$); it is computationally expensive to test all possible splits. Instead we use a greedy search over splitting variables and split points: consider a splitting variable $j$ and a split point $s$, and define the pair of half-planes $R_1(j,s) = \{X \mid X_j \le s\}$ and $R_2(j,s) = \{X \mid X_j > s\}$. We seek the splitting variable $j$ and split point $s$ that solve
$$\min_{j,\,s}\Big[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Big].$$
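A direct numpy sketch of this greedy search for a single node (the function name and the exhaustive scan over observed values are my own choices): for each candidate pair (j, s), the optimal constants are the half-plane means, so the criterion reduces to the two residual sums of squares.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the splitting variable j and split point s
    minimizing the two-sided residual sum of squares."""
    n, p = X.shape
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if left.size == 0 or right.size == 0:
                continue
            # Optimal c1, c2 are the region means; sum squared residuals around them.
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = np.where(X[:, 0] < 0.4, 1.0, 5.0) + rng.normal(scale=0.2, size=100)
print(best_split(X, y))   # picks j = 0 with s near 0.4
```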

11 Post-pruning How large a tree should we grow? Too large – overfitting!
Grow a large tree $T_0$, then prune it using post-pruning (cost-complexity pruning). Define a subtree $T \subset T_0$ and index its terminal nodes by $m$, with node $m$ representing region $R_m$. Let $|T|$ denote the number of terminal nodes in $T$ and set
$$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|,$$
where $N_m = \#\{x_i \in R_m\}$, $\hat c_m = \frac{1}{N_m}\sum_{x_i \in R_m} y_i$, and $Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m} (y_i - \hat c_m)^2$. Then minimize this expression, using cross-validation to select the factor $\alpha$ that penalizes complex trees.
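As a practical sketch (assuming scikit-learn is available), the same cost-complexity criterion is exposed through the ccp_alpha parameter, and cross-validation over the pruning path of the large tree selects the penalty.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + rng.normal(scale=0.3, size=300)

# Candidate alpha values come from the pruning path of a fully grown tree T0.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# 5-fold cross-validation picks the alpha that best trades fit against tree size.
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      {"ccp_alpha": path.ccp_alphas}, cv=5)
search.fit(X, y)
print(search.best_params_)
```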

12 CART: Classification trees
For each node $m$, representing a region $R_m$ with $N_m$ observations, define the class proportions $\hat p_{mk} = \frac{1}{N_m}\sum_{x_i \in R_m} I(y_i = k)$ and classify the observations in node $m$ to the majority class $k(m) = \arg\max_k \hat p_{mk}$. Define a measure of node impurity, e.g. the misclassification error $1 - \hat p_{m k(m)}$, the Gini index $\sum_k \hat p_{mk}(1 - \hat p_{mk})$, or the cross-entropy $-\sum_k \hat p_{mk}\log \hat p_{mk}$.
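A minimal numpy sketch of the class proportions and the three standard impurity measures (the function names are my own):

```python
import numpy as np

def class_proportions(labels):
    """Estimated class proportions p_mk within one node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def misclassification_error(labels):
    return 1.0 - class_proportions(labels).max()

def gini_index(labels):
    p = class_proportions(labels)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(labels):
    p = class_proportions(labels)
    return float(-np.sum(p * np.log(p)))        # all p_mk > 0 by construction

node = ["yes", "yes", "yes", "no"]
print(misclassification_error(node), gini_index(node), cross_entropy(node))
```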

13 Design issues of decision tree induction
How to split the training records: we need a measure for evaluating the goodness of various test conditions. How to terminate the splitting procedure: (1) continue expanding nodes until either all the records belong to the same class or all the records have identical attribute values, or (2) define criteria for early termination.

14 How to split: CART Select the split with maximum information gain (impurity decrease)
$$\Delta = I(\text{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\, I(v_j),$$
where $I(\cdot)$ is the impurity measure of a given node, $N$ is the total number of records at the parent node, $k$ is the number of child nodes, and $N(v_j)$ is the number of records associated with the child node $v_j$.
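A self-contained sketch of this criterion with the Gini index as the impurity measure I(.) (the example split is made up):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a single node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def information_gain(parent_labels, children_labels):
    """Impurity decrease: I(parent) - sum_j N(v_j)/N * I(v_j)."""
    n = len(parent_labels)
    weighted_child_impurity = sum(len(c) / n * gini(c) for c in children_labels)
    return gini(parent_labels) - weighted_child_impurity

parent = ["yes"] * 6 + ["no"] * 4
split_a = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]          # fairly pure children
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 3 + ["no"] * 2]  # uninformative split
print(information_gain(parent, split_a), information_gain(parent, split_b))
```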

15 How to split: C4.5 Impurity measures such as the Gini index tend to favour attributes that have a large number of distinct values. Strategy 1: restrict the test conditions to binary splits only. Strategy 2: use the gain ratio as the splitting criterion.
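A hedged sketch of the gain ratio: C4.5 uses entropy-based gain, and the split-information denominator penalizes splits that produce many small children. The helper names are mine.

```python
import numpy as np

def entropy(labels):
    """Entropy of one node's class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(parent_labels, children_labels):
    """Entropy-based information gain divided by the split information."""
    n = len(parent_labels)
    weights = np.array([len(c) / n for c in children_labels])
    child_entropy = float(np.sum(weights * np.array([entropy(c) for c in children_labels])))
    gain = entropy(parent_labels) - child_entropy
    split_info = float(-np.sum(weights * np.log2(weights)))  # larger for many small children
    return gain / split_info

parent = ["yes"] * 6 + ["no"] * 4
children = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(gain_ratio(parent, children))
```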

16 Constructing decision trees
[Figure: tree under construction for a loan-default example. Home owner? Yes → Defaulted = No; No → Marital status? Married → Defaulted = No; Not married → Income (≤ 100K / > 100K) → Defaulted = ?]

17 Expressing attribute test conditions
Binary attributes: binary splits. Nominal attributes: binary or multiway splits. Ordinal attributes: binary or multiway splits honoring the order of the attribute values. Continuous attributes: binary or multiway splits into disjoint intervals.

18 Characteristics of decision tree induction
Nonparametric approach (no underlying probability model). Computationally inexpensive techniques have been developed for constructing decision trees, and once a decision tree has been built, classification is extremely fast. The presence of redundant attributes does not adversely affect the accuracy of decision trees. The presence of irrelevant attributes can lower the accuracy of decision trees, especially if no measures are taken to avoid overfitting. At the leaf nodes, the number of records may be too small (data fragmentation).

19 Neural networks Joint theoretical framework for prediction and classification

20 Principal components regression (PCR)
Extract principal components (transformations of the inputs) as derived features, and then model the target (response) as a linear function of these features. [Diagram: inputs x1, ..., xp → derived features z1, ..., zM → output y]
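A short sketch of PCR assuming scikit-learn: PCA produces the derived features z1, ..., zM and a linear model regresses y on them (the choice M = 3 and the synthetic data are arbitrary).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # inputs x1..xp, p = 10
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

# PCR: keep M = 3 principal components, then fit y as a linear function of them.
pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
print(pcr.predict(X[:2]))
```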

21 Neural networks with a single target
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function of these features. [Diagram: inputs x1, ..., xp → hidden units z1, ..., zM → output y]

22 Artificial neural networks
Introduction from biology: neurons, axons, dendrites, synapses. Capabilities of neural networks: memorization (robust to noisy and fragmentary input!), classification.

23 Terminology Feed-forward neural network: input layer, [hidden layer(s)], output layer.
[Diagram: inputs x1, ..., xp → hidden units z1, ..., zM → outputs f1, ..., fK]

24 Terminology Feed-forward network: nodes in one layer are connected to the nodes in the next layer.
Recurrent network: nodes in one layer may be connected to nodes in a previous layer or within the same layer.

25 Terminology Formulas for multilayer perceptron (MLP)
$$Z_m = \varsigma(\alpha_{0m} + \alpha_m^T X),\; m = 1, \dots, M; \qquad T_k = \beta_{0k} + \beta_k^T Z,\; k = 1, \dots, K; \qquad f_k(X) = g_k(T).$$
Here the linear sums $\alpha_{0m} + \alpha_m^T X$ and $\beta_{0k} + \beta_k^T Z$ are the combination functions (C1, C2); $g$ and $\varsigma$ are activation functions (e.g. the sigmoid $\varsigma(v) = 1/(1+e^{-v})$); $\alpha_{0m}$ and $\beta_{0k}$ are the bias terms of the hidden and output units; and $\alpha_{im}$, $\beta_{jk}$ are the weights of the connections.
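A numpy sketch of one forward pass through these formulas for a single hidden layer; the shapes, the sigmoid ς, and the softmax output function g_k are assumptions chosen for a K-class setting.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(X, alpha0, alpha, beta0, beta):
    """Single-hidden-layer MLP forward pass.

    X      : (n, p) inputs
    alpha0 : (M,) hidden biases,  alpha : (p, M) input-to-hidden weights
    beta0  : (K,) output biases,  beta  : (M, K) hidden-to-output weights
    """
    Z = sigmoid(alpha0 + X @ alpha)                  # Z_m = sigma(alpha_0m + alpha_m^T X)
    T = beta0 + Z @ beta                             # T_k = beta_0k + beta_k^T Z
    expT = np.exp(T - T.max(axis=1, keepdims=True))  # softmax output function g_k(T)
    return expT / expT.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                          # n = 4 observations, p = 3 inputs
f = mlp_forward(X, rng.normal(size=5), rng.normal(size=(3, 5)),
                rng.normal(size=2), rng.normal(size=(5, 2)))   # M = 5 hidden, K = 2 outputs
print(f.sum(axis=1))                                 # each row sums to 1
```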

26 Recommended reading Book, Section 9.2. EM Reference: Tree node
Start with: Book, Chapter 11. EM Reference: Neural Network node

