Learning with Identification Trees

Learning with Identification Trees Artificial Intelligence CMSC 25000 February 7, 2002

Agenda
- Midterm results
- Learning from examples
- Nearest Neighbor reminder
- Identification Trees: basic characteristics
- Sunburn example
- From trees to rules
- Learning by minimizing heterogeneity
- Analysis: pros & cons

Midterm Results Mean: 62.5; Std. Dev.: 19.5

Machine Learning: Review
Automatically acquire a function from inputs to output values, based on previously seen inputs and output values.
- Input: vector of feature values
- Output: value
- Examples: word pronunciation, robot motion, speech recognition

Machine Learning: Review
Key contrasts:
- Supervised versus unsupervised: with or without labeled examples (known outputs)
- Classification versus regression: output values are discrete versus continuous-valued
- Types of functions learned, aka "inductive bias": the learning algorithm restricts what can be learned

Machine Learning: Review
Key issues:
- Feature selection: What features should be used? How do they relate to each other? How sensitive is the technique to feature selection (irrelevant, noisy, or absent features; feature types)?
- Complexity & generalization: tension between matching the training data and performing well on new, unseen inputs

Learning: Nearest Neighbor
Supervised; classification or regression; Voronoi diagrams
- Training: record input vectors and associated outputs
- Prediction: find the "nearest" training vector to the new input; return its associated output value
- Advantages: fast training, very general
- Disadvantages: expensive prediction; definition of distance is complex; sensitive to feature & classification noise
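A minimal sketch of the 1-nearest-neighbor procedure just described, assuming numeric feature vectors and Euclidean distance (the helper names and toy data are illustrative, not from the lecture):

```python
import math

def train_nn(examples):
    """Nearest-neighbor 'training': simply record the (vector, output) pairs."""
    return list(examples)

def predict_nn(model, query):
    """Return the stored output of the training vector closest to the query."""
    def distance(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    _, output = min(model, key=lambda example: distance(example[0], query))
    return output

model = train_nn([((0.0, 0.0), "none"), ((1.0, 1.0), "burn")])
print(predict_nn(model, (0.9, 0.8)))   # -> burn
```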

Learning: Identification Trees (aka Decision Trees)
- Supervised learning, primarily classification
- Rectangular decision boundaries: more restrictive than nearest neighbor
- Robust to irrelevant attributes and noise
- Fast prediction

Sunburn Example
Name   Hair    Height   Weight   Lotion   Result
Sarah  blonde  average  light    no       burn
Dana   blonde  tall     average  yes      none
Alex   brown   short    average  yes      none
Annie  blonde  short    average  no       burn
Emily  red     average  heavy    no       burn
Pete   brown   tall     heavy    no       none
John   brown   average  heavy    no       none
Katie  blonde  short    light    yes      none

Learning about Sunburn
Goal: train on labeled examples; predict Burn/None for new instances.
Solution??
- Exact match: same features, same output. Problem: 2 * 3^3 = 54 feature combinations; could be much worse with more features.
- Nearest-Neighbor style. Problem: what counts as "close"? Which features matter? Instances may match on two features but differ in result.

Learning about Sunburn
Better solution: identification tree
- Training: divide examples into subsets based on feature tests; the sets of samples at the leaves define the classification
- Prediction: route a new instance through the tree to a leaf based on its feature tests; assign the same value as the samples at that leaf

Sunburn Identification Tree
Hair Color
- Blonde -> Lotion Used
    - No:  Sarah: Burn, Annie: Burn
    - Yes: Katie: None, Dana: None
- Red -> Emily: Burn
- Brown -> Alex: None, John: None, Pete: None
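The same tree written as the nested-dict structure used by the code sketches later in this section (the "test"/"branches" field names are my own convention, not from the slides):

```python
# The sunburn tree above as a nested dict; leaves are class labels.
sunburn_tree = {
    "test": "hair color",
    "branches": {
        "blonde": {"test": "lotion used",
                   "branches": {"no": "burn", "yes": "none"}},
        "red": "burn",
        "brown": "none",
    },
}
```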

Simplicity
Occam's Razor: the simplest explanation that covers the data is best.
Occam's Razor for ID trees: the smallest tree consistent with the samples will be the best predictor for new data.
Problem: finding all trees & finding the smallest is expensive!
Solution: greedily build a small tree.

Building ID Trees
Goal: build a small tree such that all samples at the leaves have the same class.
Greedy solution: at each node, pick the test whose branches come closest to having a single class each.
- Split into subsets with least "disorder" (disorder ~ entropy)
- Find the test that minimizes disorder

Minimizing Disorder
Candidate root tests and the samples each branch receives (B = Burn, N = None):
Hair Color:  Blonde: Sarah:B, Dana:N, Annie:B, Katie:N | Red: Emily:B | Brown: Alex:N, Pete:N, John:N
Height:      Tall: Dana:N, Pete:N | Average: Sarah:B, Emily:B, John:N | Short: Alex:N, Annie:B, Katie:N
Weight:      Light: Sarah:B, Katie:N | Average: Dana:N, Alex:N, Annie:B | Heavy: Emily:B, Pete:N, John:N
Lotion:      Yes: Dana:N, Alex:N, Katie:N | No: Sarah:B, Annie:B, Emily:B, Pete:N, John:N

Minimizing Disorder
Second-level tests on the Blonde branch (Sarah:B, Dana:N, Annie:B, Katie:N):
Height:  Tall: Dana:N | Average: Sarah:B | Short: Annie:B, Katie:N
Weight:  Light: Sarah:B, Katie:N | Average: Dana:N, Annie:B
Lotion:  Yes: Dana:N, Katie:N | No: Sarah:B, Annie:B

Measuring Disorder
Problem: in general, tests on large databases don't yield homogeneous subsets.
Solution: a general information-theoretic measure of disorder.
Desired features:
- Homogeneous set: least disorder = 0
- Even split: most disorder = 1

Measuring Entropy If split m objects into 2 bins size m1 & m2, what is the entropy?
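Writing the answer out explicitly, with the bin proportions used as probabilities:

$$H = -\frac{m_1}{m}\log_2\frac{m_1}{m} \;-\; \frac{m_2}{m}\log_2\frac{m_2}{m}$$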

Measuring Disorder
Entropy (disorder) of a split, where $p_i$ is the probability of being in bin $i$ (assume $0 \log_2 0 = 0$):

$$H = -\sum_i p_i \log_2 p_i$$

Examples for two bins $(p_1, p_2)$:
- $(\tfrac12, \tfrac12)$:  $-\tfrac12\log_2\tfrac12 - \tfrac12\log_2\tfrac12 = \tfrac12 + \tfrac12 = 1$
- $(\tfrac34, \tfrac14)$:  $-\tfrac14\log_2\tfrac14 - \tfrac34\log_2\tfrac34 = 0.5 + 0.311 = 0.811$
- $(1, 0)$:  $-1\log_2 1 - 0\log_2 0 = 0 - 0 = 0$

Computing Disorder
A test sends the N instances down its branches; branch $i$ receives $N_i$ instances, of which $N_i^c$ belong to class $c$ (e.g. classes $a$ and $b$). The average disorder of the test weights the disorder of each branch's class distribution by the fraction of samples sent down that branch:

$$\text{AvgDisorder} = \sum_i \frac{N_i}{N}\left(-\sum_c \frac{N_i^c}{N_i}\log_2\frac{N_i^c}{N_i}\right)$$

Entropy in Sunburn Example (average disorder of each candidate root test)
Hair color = 4/8 (-2/4 log2 2/4 - 2/4 log2 2/4) + 1/8 * 0 + 3/8 * 0 = 0.5
Height = 0.69
Weight = 0.94
Lotion = 0.61
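A small self-contained Python sketch of these computations (the branch label lists are copied from the candidate splits above; the helper names are my own):

```python
import math

def entropy(labels):
    """Disorder of a list of class labels: sum_c p_c * log2(1/p_c)."""
    n = len(labels)
    return sum((labels.count(c) / n) * math.log2(n / labels.count(c))
               for c in set(labels))

def average_disorder(branches):
    """Disorder of a test: branch entropies weighted by the fraction of samples on each branch."""
    total = sum(len(b) for b in branches)
    return sum(len(b) / total * entropy(b) for b in branches)

# Branch label lists from the candidate root splits (B = Burn, N = None)
splits = {
    "Hair color": [["B", "N", "B", "N"], ["B"], ["N", "N", "N"]],   # blonde / red / brown
    "Height":     [["N", "N"], ["B", "B", "N"], ["N", "B", "N"]],   # tall / average / short
    "Weight":     [["B", "N"], ["N", "N", "B"], ["B", "N", "N"]],   # light / average / heavy
    "Lotion":     [["N", "N", "N"], ["B", "B", "B", "N", "N"]],     # yes / no
}
for name, branches in splits.items():
    print(name, round(average_disorder(branches), 2))
# Hair color 0.5, Height 0.69, Weight 0.94, Lotion 0.61
```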

Entropy in Sunburn Example (second level, Blonde branch only)
Height = 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) + 1/4 * 0 + 1/4 * 0 = 0.5
Weight = 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) + 2/4 (-1/2 log2 1/2 - 1/2 log2 1/2) = 1
Lotion = 0
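Continuing the sketch above on the blonde subset only (Sarah:B, Annie:B, Dana:N, Katie:N), reusing the average_disorder helper defined earlier:

```python
# Second-level splits restricted to the blonde branch (B = Burn, N = None)
blonde_splits = {
    "Height": [["N"], ["B"], ["B", "N"]],   # tall / average / short
    "Weight": [["B", "N"], ["N", "B"]],     # light / average
    "Lotion": [["N", "N"], ["B", "B"]],     # yes / no
}
for name, branches in blonde_splits.items():
    print(name, round(average_disorder(branches), 2))
# Height 0.5, Weight 1.0, Lotion 0.0
```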

Building ID Trees with Disorder
Until each leaf is as homogeneous as possible:
- Select an inhomogeneous leaf node
- Replace that leaf node by a test node that creates the subsets with least average disorder
Effectively creates a set of rectangular regions: repeatedly draws lines in different axes.
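A sketch of this greedy loop written recursively; it reuses the average_disorder helper from above and assumes examples are (feature-dict, label) pairs with discrete feature values (function and field names are my own, not from the slides):

```python
def build_tree(examples, features):
    """Greedily build an ID tree: pick the test with least average disorder,
    split, and recurse until leaves are homogeneous (or features run out)."""
    labels = [label for _, label in examples]
    if len(set(labels)) <= 1 or not features:
        # Homogeneous (or untestable) leaf: return its (majority) class label
        return max(set(labels), key=labels.count)

    def branches_for(feature):
        groups = {}
        for fv, label in examples:
            groups.setdefault(fv[feature], []).append((fv, label))
        return groups

    def disorder_of(feature):
        return average_disorder([[lab for _, lab in group]
                                 for group in branches_for(feature).values()])

    best = min(features, key=disorder_of)
    remaining = [f for f in features if f != best]
    return {"test": best,
            "branches": {value: build_tree(subset, remaining)
                         for value, subset in branches_for(best).items()}}
```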

Features in ID Trees: Pros
Feature selection:
- Tests features that yield low disorder, i.e. selects the features that are important!
- Ignores irrelevant features
Feature type handling:
- Discrete type: 1 branch per value
- Continuous type: branch on >= value; need to search to find the best breakpoint
- Absent features: distribute uniformly
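A sketch of that breakpoint search for a continuous feature, again reusing average_disorder from above. Candidate thresholds are midpoints between adjacent distinct sorted values; this is one common choice, not necessarily the one used in the lecture:

```python
def best_threshold(values, labels):
    """Return the threshold t for a '>= t' test with least average disorder.
    Assumes at least two distinct values are present."""
    pairs = sorted(zip(values, labels))
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2
                  for i in range(len(pairs) - 1)
                  if pairs[i][0] != pairs[i + 1][0]]

    def disorder(t):
        below = [lab for v, lab in pairs if v < t]
        above = [lab for v, lab in pairs if v >= t]
        return average_disorder([below, above])

    return min(candidates, key=disorder)
```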

Features in ID Trees: Cons
- Features are assumed independent; to capture a group effect, it must be modeled explicitly (e.g. make a new feature AorB)
- Feature tests are conjunctive

From Trees to Rules
- Tree: branches from root to leaves = tests => classifications
- Tests = if-antecedents; leaf labels = consequent
- All ID trees can be converted to rules; not all rule sets can be expressed as trees

From ID Trees to Rules
Reading the rules off the sunburn identification tree above (one rule per root-to-leaf path):

(if (equal haircolor blonde) (equal lotionused yes) (then None))
(if (equal haircolor blonde) (equal lotionused no) (then Burn))
(if (equal haircolor red) (then Burn))
(if (equal haircolor brown) (then None))
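A sketch of the same extraction done programmatically over the nested-dict tree form used in build_tree above (function and field names are my own):

```python
def tree_to_rules(tree, antecedents=()):
    """Emit one if-then rule per root-to-leaf path of a build_tree-style tree."""
    if not isinstance(tree, dict):                      # leaf: its label is the consequent
        conds = " and ".join(f"{feat} = {val}" for feat, val in antecedents)
        return [f"if {conds or 'true'} then {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += tree_to_rules(subtree, antecedents + ((tree["test"], value),))
    return rules

# e.g. tree_to_rules(sunburn_tree) ->
#   ['if hair color = blonde and lotion used = no then burn', ...]
```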

Identification Trees
- Train: build the tree by forming subsets of least disorder
- Predict: traverse the tree based on feature tests; assign the leaf node's sample label
- Pros: robust to irrelevant features and some noise; fast prediction; perspicuous rule reading
- Cons: poor handling of feature combinations and dependency; building the optimal tree is intractable
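A sketch of that prediction step over the nested-dict tree form used earlier (again, names are my own):

```python
def predict(tree, instance):
    """Route an instance through feature tests to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["test"]]]
    return tree

# e.g. with the sunburn_tree defined earlier:
# predict(sunburn_tree, {"hair color": "blonde", "lotion used": "no"})  -> "burn"
```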