Today’s Topics
- Read Chapter 3 & Section 4.1 of the textbook (skim Section 3.6 and the rest of Chapter 4), plus Sections 5.1, 5.2, 5.3, 5.7, 5.8, & 5.9 (skim the rest of Chapter 5)
- Reviewing the Info Gain Calc from Last Week
- HW0 due 11:55pm; HW1 due in one week (two with late days)
- Fun reading:
- Information Gain Derived (and Generalized to k Output Categories)
- Handling Numeric and Hierarchical Features
- Advanced Topic: Regression Trees
- The Trouble with Too Many Possible Values
- What if Measuring Features is Costly?

9/22/15, CS 540 Fall 2015 (© Jude Shavlik), Lecture 5, Week 3, Slide 1

ID3 Info Gain Measure Justified
(Ref: C4.5, J. R. Quinlan, Morgan Kaufmann, 1993, pp. 21-22)

Definition of Information
- The info conveyed by a message M depends on its probability (due to Claude Shannon):
      info(M) ≡ -log2[ Prob(M) ]
- Note: last week we used infoNeeded() as a more informative name for info()

The Supervised Learning Task
- Select an example from a set S and announce that it belongs to class C
- The probability of this occurring is approximately f_C, the fraction of C's in S
- Hence the info in this announcement is, by definition, -log2(f_C)
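
A minimal Python sketch of this definition (the function name info() follows the slide; the 9-of-14 numbers are illustrative, not from the lecture):

    import math

    def info(prob):
        # Bits of information conveyed by a message whose probability is `prob`
        # (Shannon's definition: -log2 of the probability).
        return -math.log2(prob)

    # Illustrative only: if 9 of 14 examples in S are positive, announcing that a
    # randomly drawn example is positive conveys -log2(9/14), about 0.64 bits.
    print(info(9 / 14))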

ID3 Info Gain Measure (cont.)
- Let there be K different classes in set S, namely C_1, C_2, …, C_K
- What is the expected info from a message about the class of an example in set S?
      info(S) = - Σ_i f_Ci log2(f_Ci)   (summed over the K classes, where f_Ci is the fraction of S belonging to class C_i)
- info(S) is the average number of bits of information (obtained by looking at feature values) needed to classify a member of set S
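
A hedged Python sketch of these quantities (the helper names and the dict-based example representation are my assumptions, not the course's HW interface): info() computes the expected bits for a labeled set, and info_gain() subtracts the size-weighted info of the subsets produced by splitting on a feature.

    import math
    from collections import Counter

    def info(labels):
        # Expected bits needed to announce the class of an example drawn from
        # this set: info(S) = -sum_i f_Ci * log2(f_Ci).
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, labels, feature):
        # info(S) minus the size-weighted info of each subset produced by
        # splitting on `feature` (examples are dicts of feature -> value).
        n = len(labels)
        subsets = {}
        for ex, y in zip(examples, labels):
            subsets.setdefault(ex[feature], []).append(y)
        remainder = sum(len(sub) / n * info(sub) for sub in subsets.values())
        return info(labels) - remainder

    labels = ["pos"] * 9 + ["neg"] * 5
    print(info(labels))   # ~0.94 bits for a 9-vs-5 class split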

(Figure-only slide, reused from 9/15/15, Week 2: reviewing the information-gain calculation from last week.)

Handling Hierarchical Features in ID3
- Define a new feature for each level in the hierarchy, e.g.:
      Shape1 = { Circular, Polygonal }
      Shape2 = { the more specific shapes at the next level of the hierarchy }
- Let ID3 choose the appropriate level of abstraction!
(Figure: a hierarchy with Shape at the root and Circular and Polygonal as its children.)
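
A sketch of one way to implement this, assuming each specific value knows its ancestors in the hierarchy; the concrete shape names and the helper expand_hierarchical() are illustrative, not from the slides:

    # Hypothetical hierarchy: each specific value maps to its ancestors,
    # most general first. The concrete shape names are made up.
    SHAPE_ANCESTORS = {
        "circle":   ["Circular"],
        "ellipse":  ["Circular"],
        "square":   ["Polygonal"],
        "triangle": ["Polygonal"],
    }

    def expand_hierarchical(value, ancestors, name="Shape"):
        # Turn one hierarchical feature into one feature per level
        # (Shape1, Shape2, ...), so ID3 can pick whichever level of
        # abstraction yields the best info gain.
        levels = ancestors[value] + [value]
        return {f"{name}{i + 1}": v for i, v in enumerate(levels)}

    print(expand_hierarchical("square", SHAPE_ANCESTORS))
    # {'Shape1': 'Polygonal', 'Shape2': 'square'}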

Handling Numeric Features in ID3
- On the fly, create binary features and choose the best (note: “on the fly” means in each recursive call to ID3)
- Step 1: Plot the current examples along the feature's value (green = pos, red = neg)
- Step 2: Divide midway between every consecutive pair of points with different categories to create new binary features, e.g., feature_new1 ≡ F < 8 and feature_new2 ≡ F < 10
- Step 3: Choose the split with the best info gain (it competes with all other features)
(Figure: pos and neg examples plotted along the “Value of Feature” axis, with the candidate thresholds marked.)
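
A minimal sketch of Steps 2 and 3 for a single numeric feature with binary labels (the function names and toy data are mine, not the course's code):

    import math
    from collections import Counter

    def info(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def candidate_thresholds(values, labels):
        # Midpoints between consecutive sorted values whose labels differ (Step 2).
        pts = sorted(zip(values, labels))
        return [(pts[i][0] + pts[i + 1][0]) / 2
                for i in range(len(pts) - 1)
                if pts[i][1] != pts[i + 1][1] and pts[i][0] != pts[i + 1][0]]

    def best_numeric_split(values, labels):
        # Score each candidate binary test F < t by info gain; keep the best (Step 3).
        n = len(labels)
        best_t, best_gain = None, -1.0
        for t in candidate_thresholds(values, labels):
            left = [y for v, y in zip(values, labels) if v < t]
            right = [y for v, y in zip(values, labels) if v >= t]
            gain = info(labels) - (len(left) / n * info(left) + len(right) / n * info(right))
            if gain > best_gain:
                best_t, best_gain = t, gain
        return best_t, best_gain

    # Illustrative data: the midpoint between 7 and 9 (i.e., F < 8) separates
    # the classes perfectly, so it wins with gain = 1.0 bit.
    print(best_numeric_split([1, 3, 7, 9, 11, 12], ["neg", "neg", "neg", "pos", "pos", "pos"]))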

Handling Numeric Features (cont.)
- Technical note: a numeric feature cannot be discarded after it has been used in one portion of the d-tree; it may be tested again, with a different threshold, further down the tree
(Figure: a d-tree whose root tests F < 10 and one of whose branches tests F again against another threshold.)

Advanced Topic: Regression Trees
(assume features are numerically valued)

(Figure: an example regression tree. The root tests Age > 25 and one of its branches tests Gender (M / F); each of the three leaves holds a linear model over the numeric features, e.g., Output = 100 f_4 - 2 f_8 at one leaf, with similar weighted sums of the f_i at the others.)

Advanced Topic: Scoring “Splits” for Regression (Real-Valued) Problems
- We want to return real values at the leaves
- For each feature F, “split” as done in ID3
- Use the residue (error) remaining, say measured by Linear Least Squares (LLS), instead of info gain to score candidate splits; why not a weighted sum for the total error?
- Commonly the models at the leaves are weighted sums of the features (y = mx + b)
- Some approaches just place constants at the leaves
(Figure: a scatter plot of Output vs. X with an LLS fit line.)
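
A hedged sketch of scoring one candidate split, simplified to constant-valued leaves (each subset predicts its mean); fitting an LLS line per subset, as the slide suggests, would replace sse() with the residual error of that fit. The function names and toy numbers are illustrative:

    import numpy as np

    def sse(y):
        # Squared error when this subset's prediction is just its mean (a constant
        # leaf); an LLS line could be fit here instead.
        return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

    def score_split(x, y, threshold):
        # Total residual error of the binary test x < threshold.
        # Lower is better; this plays the role info gain plays for classification.
        left, right = y[x < threshold], y[x >= threshold]
        return sse(left) + sse(right)

    # Toy data: the output jumps when x crosses 5, so threshold 5.0 scores
    # far less error than threshold 2.5.
    x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
    y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
    print(score_split(x, y, 5.0), score_split(x, y, 2.5))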

Unfortunate Characteristic Property of Using the Info-Gain Measure
- It FAVORS FEATURES WITH HIGH BRANCHING FACTORS (i.e., many possible values)
- Extreme case: splitting on Student ID leaves at most one example per leaf, so every leaf's Info(.,.) score equals zero and the split gets a perfect score!
- But it generalizes very poorly (i.e., it memorizes the data)
(Figure: a node testing Student ID, with one branch per possible ID.)
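
A quick demonstration of the problem, using the same info()/info_gain() idea sketched earlier (restated here on raw value lists so the snippet stands alone); the data is made up:

    import math
    from collections import Counter

    def info(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(feature_values, labels):
        # info(S) minus the size-weighted info of each subset of S that shares
        # a feature value.
        n = len(labels)
        subsets = {}
        for v, y in zip(feature_values, labels):
            subsets.setdefault(v, []).append(y)
        return info(labels) - sum(len(s) / n * info(s) for s in subsets.values())

    labels     = ["pos", "pos", "neg", "neg", "neg", "pos"]
    student_id = [1, 2, 3, 4, 5, 6]                        # unique per example
    color      = ["red", "red", "blue", "blue", "red", "red"]

    print(info_gain(student_id, labels))  # 1.0, the maximum possible: a "perfect" score
    print(info_gain(color, labels))       # ~0.46, even though color actually generalizes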

One Fix (used in HW0/HW1)
- Convert all features to binary, e.g., Color = { Red, Blue, Green } becomes Color=Red?, Color=Blue?, Color=Green?
- That is, go from one N-valued feature to N binary-valued features
- The same encoding is used in neural nets and SVMs
- D-tree readability is probably reduced, but not necessarily
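
A small sketch of this conversion (the function name and dict representation are mine, not the HW0/HW1 interface):

    def to_binary_features(example, domains):
        # Replace each N-valued feature with N yes/no features, e.g.,
        # Color = {Red, Blue, Green} -> Color=Red?, Color=Blue?, Color=Green?.
        out = {}
        for feat, values in domains.items():
            for v in values:
                out[f"{feat}={v}?"] = (example[feat] == v)
        return out

    domains = {"Color": ["Red", "Blue", "Green"]}
    print(to_binary_features({"Color": "Blue"}, domains))
    # {'Color=Red?': False, 'Color=Blue?': True, 'Color=Green?': False}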

Considering the Cost of Measuring a Feature
- We want trees with high accuracy whose tests are also inexpensive to compute (e.g., take a temperature vs. do a CAT scan)
- Common heuristic: score features by InformationGain(F)^2 / Cost(F)
- Used in medical domains as well as robot-sensing tasks
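
The heuristic as a one-liner; the gain and cost numbers below are made up purely to illustrate that a cheap, moderately informative test can beat an expensive, slightly more informative one:

    def cost_sensitive_score(info_gain, cost):
        # The slide's heuristic: InformationGain(F)^2 / Cost(F).
        # Higher is better; cheap, informative tests win.
        return info_gain ** 2 / cost

    # Hypothetical numbers: the CAT scan gains a bit more info but costs far more.
    print(cost_sensitive_score(info_gain=0.40, cost=1.0))    # temperature -> 0.16
    print(cost_sensitive_score(info_gain=0.55, cost=50.0))   # CAT scan    -> ~0.006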