Decision Trees: Another Example
Play Tennis?
Training Set: [table of training examples omitted; rows give Outlook, Temperature, Humidity, and Wind values with a PlayTennis label, e.g. (Rain, Mild, High, Weak) → No]
Overfitting/Underfitting in Decision Trees
Over-fitting vs. Under-fitting
Over-fitting: like a botanist with photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he/she has seen before. ("Too many leaves. Not a tree!")
Under-fitting: like the botanist's lazy friend, who declares that if it's green, it's a tree. ("Tree!")
Need a good balance between the two.
Over-fitting
[Figure: typical learning curve — % correct on test set vs. size of training set; an over-fit tree falls below the typical curve on test data]
Avoid Over-fitting: Pruning
Randomly split the training data into TRAIN and TUNE, for example, 70% and 30%.
Build a full tree using only TRAIN.
Prune the tree using TUNE.
How do we remove some of the nodes? (What can we replace a node with?)
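A minimal sketch of the split step in Python (the function name, the 30% default, and the fixed seed are illustrative assumptions, not from the slides):

```python
import random

def train_tune_split(examples, tune_fraction=0.3, seed=0):
    """Shuffle labeled examples and split them into TRAIN and TUNE."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(examples)        # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_tune = int(len(shuffled) * tune_fraction)
    return shuffled[n_tune:], shuffled[:n_tune]   # (TRAIN, TUNE)
```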
Simple Pruning Algorithm
Let T be the original tree, and let A be the accuracy of T on TUNE.
Starting from the lowest level in T:
1. For each node n at this level: replace n with a leaf labeled with its majority label.
2. Compute the accuracy of T on TUNE.
3. If accuracy is not affected (still == A), leave n as a leaf; otherwise restore the subtree rooted at n.
4. Repeat from Step (1) at the next level.
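The same idea as a Python sketch. The Node class, its examples field, and the bottom-up recursion are assumptions about how the tree is stored; the sketch greedily turns each internal node into a majority-label leaf and undoes the change whenever TUNE accuracy drops:

```python
from collections import Counter

# Hypothetical tree representation (an assumption, not from the slides):
# an internal node tests one attribute and has a child per attribute value;
# a leaf stores only a label. Each node also remembers the TRAIN examples
# that reached it, so a majority label can be computed when it is pruned.
class Node:
    def __init__(self, attribute=None, children=None, label=None, examples=None):
        self.attribute = attribute      # attribute tested here (None for a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # predicted label if this is a leaf
        self.examples = examples or []  # (x, y) TRAIN examples reaching this node

def classify(node, x):
    while node.attribute is not None:
        node = node.children[x[node.attribute]]
    return node.label

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def majority_label(examples):
    return Counter(y for _, y in examples).most_common(1)[0][0]

def prune(tree, node, tune):
    """Reduced-error pruning, lowest levels first: replace each internal
    node with a majority-label leaf, keep the change only if accuracy on
    TUNE does not drop, otherwise restore the subtree."""
    for child in list(node.children.values()):
        prune(tree, child, tune)        # prune the lowest levels first
    if node.attribute is None:
        return                          # already a leaf
    before = accuracy(tree, tune)
    saved = (node.attribute, node.children, node.label)
    node.attribute, node.children = None, {}
    node.label = majority_label(node.examples)
    if accuracy(tree, tune) < before:   # pruning hurt TUNE accuracy: undo it
        node.attribute, node.children, node.label = saved

# Usage: prune(root, root, tune_set), where tune_set is a list of (x, y) pairs.
```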
Case Study [W.J. Kuo et al., 2001]
Decision trees have been shown to be at least as accurate as human experts for diagnosing breast cancer.
Human accuracy: 86.67%
Decision tree accuracy: 95.5%
Decision Trees
Pros: intuitive/easy to understand; quick to train; quick to classify.
Cons: over-fitting (pruning required); not optimal; returns just a label (no other info).
Probabilistic Learning
Find the likelihood of new events based on previous events.
Applications: games (poker, blackjack), medical diagnosis, recommender systems, sentiment analysis, spam filtering.
Probability Basics
Sample space: the set of all possible outcomes.
Example: rolling a pair of (fair, independent) dice.
Size of sample space? 36
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), ..., (6, 6)}
Probability of each outcome? 1/36 = 1/6 × 1/6
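A quick sketch that enumerates this sample space in Python and checks both numbers (variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# All ordered outcomes of rolling a pair of dice.
sample_space = list(product(range(1, 7), repeat=2))

print(len(sample_space))               # 36
print(Fraction(1, len(sample_space)))  # 1/36: each outcome is equally likely
```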
Probability Basics
Example: the event that a pair of dice is rolled and the sum is 8.
E = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
Probability of each outcome? 1/36. Probability of the event? 5/36.
This is an unconditional probability: it does not use any information about the past. For conditional probability, use Bayes' rule.
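A short sketch computing the event probability by enumeration, plus a hypothetical conditional probability for contrast (the conditional example is an illustration, not from the slides):

```python
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))

# Unconditional probability of the event "the sum of the dice is 8".
event = [(a, b) for a, b in sample_space if a + b == 8]
print(event)                                    # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(Fraction(len(event), len(sample_space)))  # 5/36

# Hypothetical conditional probability: P(first die shows 6 | sum is 8) = 1/5,
# since only (6, 2) qualifies among the five equally likely event outcomes.
favorable = sum(1 for a, b in event if a == 6)
print(Fraction(favorable, len(event)))          # 1/5
```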