Today’s Topics: Read Chapter 3 & Section 4.1 (skim Section 3.6 and rest of Chapter 4), Sections 5.1, 5.2, 5.3, 5.7, 5.8, & 5.9 (skim rest of Chapter 5)


1 Today’s Topics
- Read Chapter 3 & Section 4.1 (skim Section 3.6 and rest of Chapter 4), Sections 5.1, 5.2, 5.3, 5.7, 5.8, & 5.9 (skim rest of Chapter 5) of textbook
- Reviewing the Info Gain Calc from Last Week
- HW0 due 11:55pm, HW1 due in one week (two with late days)
- Fun reading: http://homes.cs.washington.edu/~pedrod/Prologue.pdf
- Information Gain Derived (and Generalized to k Output Categories)
- Handling Numeric and Hierarchical Features
- Advanced Topic: Regression Trees
- The Trouble with Too Many Possible Values
- What if Measuring Features is Costly?
9/22/15, CS 540 - Fall 2015 (© Jude Shavlik), Lecture 5, Week 3

2 ID3 Info Gain Measure Justified
(Ref. C4.5, J. R. Quinlan, Morgan Kaufmann, 1993, pp 21-22)
Definition of Information
- Info conveyed by message M depends on its probability, i.e., $\mathrm{info}(M) \equiv -\log_2[\mathrm{Prob}(M)]$ (due to Claude Shannon)
- Note: last week we used infoNeeded() as a more informative name for info()
The Supervised Learning Task
- Select example from a set S and announce it belongs to class C
- The probability of this occurring is approx $f_C$, the fraction of C’s in S
- Hence info in this announcement is, by definition, $-\log_2(f_C)$

3 ID3 Info Gain Measure (cont.)
- Let there be K different classes in set S, namely $C_1, C_2, \ldots, C_K$
- What’s expected info from msg about class of an example in set S?
- info(S) is the average number of bits of information (by looking at feature values) needed to classify member of set S
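The slide's own equation did not survive extraction. Taking the expectation of $-\log_2(f_C)$ over the K classes gives the standard entropy form, info(S) = $-\sum_{i=1}^{K} f_{C_i} \log_2 f_{C_i}$, which is presumably what the slide showed. The following is a minimal Python sketch of that quantity and of the resulting info gain; the function names `info` and `info_gain` and the list-based data layout are assumptions for illustration, not the course's HW code.

```python
import math
from collections import Counter


def info(labels):
    """Expected number of bits needed to announce the class of an example in labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of -f_C * log2(f_C) over the classes that actually occur in the set.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def info_gain(labels, feature_values):
    """info(S) minus the size-weighted info of the subsets created by one feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * info(s) for s in subsets.values())
    return info(labels) - remainder
```

With two equally likely classes `info` returns 1 bit, and with four equally likely classes it returns 2 bits, which matches the "generalized to K output categories" reading of the slide.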


5 Handling Hierarchical Features in ID3
- Define a new feature for each level in hierarchy, e.g.,
  Shape1 = { Circular, Polygonal }
  Shape2 = { }
- Let ID3 choose the appropriate level of abstraction!
[Figure: Shape hierarchy with top-level values Circular and Polygonal]
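One way to realize "a new feature for each level" is sketched below. The leaf values (Circle, Ellipse, Triangle, Square) are hypothetical, since the slide's Shape2 values were not recoverable, and `expand_hierarchical` is an illustrative helper name rather than anything from the course materials.

```python
# Hypothetical hierarchy: each leaf value maps to its path from the top level down.
SHAPE_PATHS = {
    "Circle": ["Circular", "Circle"],
    "Ellipse": ["Circular", "Ellipse"],
    "Triangle": ["Polygonal", "Triangle"],
    "Square": ["Polygonal", "Square"],
}


def expand_hierarchical(name, value, paths):
    """Return one feature per hierarchy level, named name1, name2, ..."""
    return {f"{name}{level}": node for level, node in enumerate(paths[value], start=1)}


example = expand_hierarchical("Shape", "Square", SHAPE_PATHS)
# example now holds a coarse feature (Shape1) and a fine one (Shape2);
# ID3 can pick whichever level scores best at a given node.
```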

6 Handling Numeric Features in ID3
On the fly create binary features and choose best
- Step 1: Plot current examples (green=pos, red=neg)
- Step 2: Divide midway between every consecutive pair of points with different categories to create new binary features, e.g., $\mathrm{feature}_{new1} \equiv F<8$ and $\mathrm{feature}_{new2} \equiv F<10$
- Step 3: Choose split with best info gain (compete with all other features)
Note: “On the fly” means in each recursive call to ID3
[Figure: examples plotted on an axis labeled “Value of Feature”, ticks at 5, 7, 9, 11, 13]
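Step 2 can be sketched as follows. The example values and labels are made up to reproduce the thresholds 8 and 10 mentioned on the slide; the actual colors of the plotted points are not recoverable, and `candidate_thresholds` is an illustrative name.

```python
def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        # Only a change of category between two distinct values yields a useful cut.
        if c1 != c2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds


# Hypothetical data: positives at 5, 7, 11, 13 and a negative at 9.
cuts = candidate_thresholds([5, 7, 9, 11, 13], ["+", "+", "-", "+", "+"])
```

Each threshold t then defines a binary feature F < t, which competes with all other features on info gain in the current recursive call.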

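7 Handling Numeric Features (cont.)
Technical Note: Cannot discard numeric feature after use in one portion of d-tree, since the same feature may be tested again with a different threshold (e.g., both F<10 and F<5 appear in the tree shown)
[Figure: d-tree with tests F<10 and F<5, T/F branches, and leaves +, +, -]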

8 Advanced Topic: Regression Trees (assume features are numerically valued)
[Figure: tree with tests “Age > 25” (branches Yes / No) and “Gender” (branches M / F); each leaf holds a linear model]
- Output = $4 f_3 + 7 f_5 - 2 f_9$
- Output = $7 f_6 - 2 f_1 - 2 f_8 + f_7$
- Output = $100 f_4 - 2 f_8$

9 Advanced Topic: Scoring “Splits” for Regression (Real-Valued) Problems
We want to return real values at the leaves
- For each feature, F, “split” as done in ID3
- Use the residual error remaining after a fit, say by Linear Least Squares (LLS), instead of info gain to score candidate splits
Why not a weighted sum in total error?
Commonly models at leaves are weighted sums of features (y = mx + b)
Some approaches just place constants at leaves
[Figure: plot of Output versus X with LLS fits]
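A minimal sketch of residual-based split scoring for a single numeric feature follows. It fits y = mx + b by least squares on each side of a threshold and adds up the squared residuals; the function names and the choice to simply sum the two sides are assumptions for illustration, not necessarily the scoring used in lecture.

```python
def sse_constant(ys):
    """Squared error left after predicting the mean of ys (a constant at the leaf)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def sse_line(xs, ys):
    """Squared error left after a least-squares fit of y = m*x + b."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values are equal, so the best fit is a constant.
        return sse_constant(ys)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - m * mean_x
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def split_score(xs, ys, threshold):
    """Total residual over the two subsets made by x < threshold (lower is better)."""
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return sum(sse_line([x for x, _ in part], [y for _, y in part])
               for part in (left, right))
```

Swapping `sse_line` for `sse_constant` inside `split_score` gives the simpler variant that places constants at the leaves.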

10 Unfortunate Characteristic Property of Using Info-Gain Measure
FAVORS FEATURES WITH HIGH BRANCHING FACTORS (i.e., many possible values)
Extreme Case: At most one example per leaf, so all Info(.,.) scores for leaves equal zero and the feature gets a perfect score! But it generalizes very poorly (i.e., memorizes data)
[Figure: split on Student ID with branches 1, …, 99, …, 999999 and leaf counts such as (1+, 0-), (0+, 0-), (0+, 1-)]
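The extreme case can be checked numerically. In this sketch the six examples, the `studied` feature, and the ID values are all hypothetical; the point is only that a unique-per-example feature earns the maximum possible gain.

```python
import math
from collections import Counter


def info(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(labels, feature_values):
    """info(S) minus the size-weighted info of the subsets created by one feature."""
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / len(labels) * info(s) for s in subsets.values())
    return info(labels) - remainder


labels = ["+", "-", "+", "-", "+", "-"]
student_id = [101, 102, 103, 104, 105, 106]   # hypothetical: unique per example
studied = ["y", "n", "y", "n", "y", "y"]       # hypothetical: informative but imperfect

gain_id = info_gain(labels, student_id)    # every leaf is pure, so gain equals info(S)
gain_studied = info_gain(labels, studied)  # smaller gain, yet far more likely to generalize
```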

11 One Fix (used in HW0/HW1)
Convert all features to binary, e.g., Color = { Red, Blue, Green }
From one N-valued feature to N binary-valued features:
- Color = Red?
- Color = Blue?
- Color = Green?
Used in Neural Nets and SVMs
D-tree readability probably less, but not necessarily
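The conversion from one N-valued feature to N binary features can be sketched as below; `to_binary_features` and the "Name=Value?" key format are illustrative choices, not the HW0/HW1 file format.

```python
def to_binary_features(name, value, possible_values):
    """Map one N-valued feature to N true/false features, one per possible value."""
    return {f"{name}={v}?": value == v for v in possible_values}


encoded = to_binary_features("Color", "Blue", ["Red", "Blue", "Green"])
# Exactly one of the N binary features is True for any given example.
```

Because every resulting feature has a branching factor of two, no single feature can win simply by having many possible values.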

12 Considering the Cost of Measuring a Feature
Want trees with high accuracy and whose tests are inexpensive to compute
- take temperature vs. do CAT scan
Common Heuristic
- InformationGain(F)² / Cost(F)
- Used in medical domains as well as robot-sensing tasks
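A small sketch of the heuristic in use follows. The gain and cost numbers are invented purely to illustrate the trade-off, and `cost_sensitive_score` is a hypothetical helper name.

```python
def cost_sensitive_score(info_gain, cost):
    """Score a feature by InformationGain(F)^2 / Cost(F)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return info_gain ** 2 / cost


# Hypothetical (info gain, cost) pairs: the scan is more informative but far more expensive.
candidates = {"temperature": (0.30, 1.0), "cat_scan": (0.60, 500.0)}
best = max(candidates, key=lambda f: cost_sensitive_score(*candidates[f]))
```

Under these made-up numbers the cheap test wins even though its raw info gain is half that of the expensive one.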

