Decision Trees: Another Example

Presentation transcript:

Decision Trees: Another Example. Play Tennis?
[Table: training set of Play Tennis examples over attributes such as Outlook, Temperature, Humidity, and Wind; only fragments survive extraction, e.g. the row (Rain, Mild, High, Weak -> No).]
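To ground the example, here is a minimal scikit-learn sketch (mine, not from the slides) that fits a decision tree to a tiny, made-up Play Tennis sample. The specific rows are illustrative, and the categorical attributes are one-hot encoded because DecisionTreeClassifier expects numeric inputs.

```python
# Hypothetical Play Tennis rows; one-hot encoding via pandas.get_dummies.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "Outlook":     ["Sunny", "Rain", "Rain",   "Overcast", "Sunny"],
    "Temperature": ["Hot",   "Mild", "Mild",   "Hot",      "Mild"],
    "Humidity":    ["High",  "High", "High",   "Normal",   "Normal"],
    "Wind":        ["Weak",  "Weak", "Strong", "Weak",     "Strong"],
    "Play":        ["No",    "Yes",  "No",     "Yes",      "Yes"],
})
X = pd.get_dummies(data.drop(columns="Play"))  # encode categorical attributes
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(tree.predict(X))  # predictions on the training examples
```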

Overfitting/Underfitting in Decision Trees

Over-fitting vs. Under-fitting. Over-fitting is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he or she has seen before ("Too many leaves. Not a tree!"). Under-fitting is like the botanist's lazy friend, who declares that if it's green, it's a tree ("Tree!"). We need a good balance between the two.

Over-fitting
[Figure: typical learning curves plotting % correct on the test set (toward 100%) against the size of the training set, with one curve labeled "Typical learning curve" and one labeled "Over-fit tree".]
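A hedged sketch of how such a learning curve can be produced, assuming scikit-learn and a stand-in dataset (load_breast_cancer), since the slides do not specify one:

```python
# Plot test-set accuracy as a function of training-set size.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

plt.plot(sizes, test_scores.mean(axis=1) * 100, marker="o")
plt.xlabel("size of training set")
plt.ylabel("% correct on test set")
plt.title("Typical learning curve")
plt.show()
```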

Avoid Over-fitting: Pruning
1. Randomly split the training data into TRAIN and TUNE sets, for example 70% and 30% (a split sketch follows below).
2. Build a full tree using only TRAIN.
3. Prune the tree using TUNE.
How do we remove nodes, and what can we replace a node with?
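One common way to make the random 70/30 split, assuming scikit-learn; the `examples` list here is just a stand-in for the labeled training data:

```python
from sklearn.model_selection import train_test_split

examples = list(range(100))   # stand-in for the labeled training examples
TRAIN, TUNE = train_test_split(examples, test_size=0.30, random_state=0)
print(len(TRAIN), len(TUNE))  # 70 30
```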

[Figure: a pruning example in which an internal node covering 25 examples of one class and 10 of the other is replaced by a leaf predicting the majority class, "No".]

Simple Pruning Algorithm
Let T be the original tree, and let A be the accuracy of T on TUNE.
Starting from the lowest level in T:
1. For each node n at this level: replace n with a leaf labeled with its majority class.
2. Compute the accuracy of the modified T on TUNE.
3. If the accuracy is unaffected (still == A), leave n as a leaf; otherwise restore n and its subtree.
4. Repeat from Step 1 at the next level.
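A minimal Python sketch of this reduced-error pruning loop. The Node class, the (attributes, label) data format, and the per-node majority bookkeeping are my assumptions, not the slides' implementation, and the level-by-level sweep is realized as a bottom-up recursion:

```python
# Illustrative sketch: TUNE is a list of (attribute_dict, label) pairs.

class Node:
    def __init__(self, attribute=None, children=None, label=None, majority=None):
        self.attribute = attribute      # attribute tested here (None for a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # predicted class if this is a leaf
        self.majority = majority        # majority class of TRAIN examples reaching this node

    def is_leaf(self):
        return self.label is not None

def predict(node, example):
    while not node.is_leaf():
        child = node.children.get(example.get(node.attribute))
        if child is None:               # unseen attribute value: fall back on majority
            return node.majority
        node = child
    return node.label

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def prune(tree, node, tune):
    """Bottom-up: try replacing each internal node with a leaf that predicts
    its majority class; keep the change only if accuracy on TUNE does not drop."""
    if node.is_leaf():
        return
    for child in node.children.values():
        prune(tree, child, tune)                     # lowest levels first
    baseline = accuracy(tree, tune)
    saved = (node.attribute, node.children)
    node.attribute, node.children, node.label = None, {}, node.majority
    if accuracy(tree, tune) < baseline:              # pruning hurt: undo it
        node.attribute, node.children = saved
        node.label = None
```

Calling prune(tree, tree, TUNE) on a tree built from TRAIN tries the lowest nodes first, which matches the slide's level-by-level intent.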

Case Study [W.J. Kuol, 2001]: decision trees have been shown to be at least as accurate as human experts at diagnosing breast cancer. Human accuracy: 86.67%; decision-tree accuracy: 95.5%.

Decision Trees
Pros:
- Intuitive and easy to understand
- Quick to train
- Quick to classify
Cons:
- Prone to over-fitting; pruning is required
- Greedy construction is not guaranteed to find an optimal tree
- Returns just a label (no confidence or probability attached)

[Slide: a classifier output shown as a probability, Pr = .57, rather than a bare label, motivating probabilistic learning.]

Probabilistic Learning: find the likelihood of new events based on previous events. Applications:
- Games: poker, blackjack
- Medical diagnosis
- Recommender systems
- Sentiment analysis
- Spam filtering

Probability Basics
Sample space: the set of all possible outcomes of an experiment. Example: rolling a pair of fair, independent dice.
Size of the sample space? 36.
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), ..., (6, 6)}
Probability of each outcome? 1/36 = 1/6 × 1/6.
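A quick enumeration check of this arithmetic (a sketch, not the slides' code):

```python
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # all ordered (die1, die2) pairs
print(len(sample_space))      # 36
print(1 / len(sample_space))  # 0.0277... = 1/36
```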

Probability Basics
Example event: rolling a pair of dice and getting a sum of 8.
E = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
Probability of the event? 5/36 (each of the five outcomes contributes 1/36).
This is an unconditional probability: it does not use any information about the past. For conditional probability, use Bayes' formula (see below).
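The event probability can be verified by the same enumeration; Fraction keeps the answer exact:

```python
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))
event = [roll for roll in sample_space if sum(roll) == 8]
print(sorted(event))                            # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(Fraction(len(event), len(sample_space)))  # 5/36
```

For reference, Bayes' formula relates the conditional probability of A given B to that of B given A: P(A | B) = P(B | A) · P(A) / P(B).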