Data Mining Schemes in Practice

2 Implementation: Real machine learning schemes
- Decision trees: from ID3 to C4.5
  - missing values, numeric attributes, pruning, efficiency
- Instance-based learning
  - speed up, combat noise, attribute weighting, generalized exemplars

3 Numeric attributes
- Standard method: binary splits (e.g. temp < 45)
- Unlike nominal attributes, every numeric attribute has many possible split points
- Solution is a straightforward extension:
  - Evaluate info gain (or another measure) for every possible split point of the attribute (see the sketch below)
  - Choose the "best" split point
  - Info gain for the best split point is the info gain for the attribute
- Computationally more demanding
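The split-point evaluation just described can be illustrated with a minimal sketch (our own code, not C4.5's): the information of a binary split is the weighted average entropy of the two sides, and the info gain of a candidate split point is the class entropy at the node minus that value. The demo values are taken from the temperature example on slide 5.

#include <cmath>
#include <cstdio>

// Entropy (in bits) of a two-class distribution with counts a and b.
double entropy(double a, double b) {
    double n = a + b, e = 0.0;
    if (a > 0) e -= (a / n) * std::log2(a / n);
    if (b > 0) e -= (b / n) * std::log2(b / n);
    return e;
}

// Information of a binary split: weighted average entropy of the two sides.
double splitInfo(double leftYes, double leftNo, double rightYes, double rightNo) {
    double nL = leftYes + leftNo, nR = rightYes + rightNo, n = nL + nR;
    return (nL / n) * entropy(leftYes, leftNo) + (nR / n) * entropy(rightYes, rightNo);
}

int main() {
    // Candidate split "temperature < 71.5": left side [4 yes, 2 no], right side [5 yes, 3 no].
    std::printf("info([4,2],[5,3]) = %.3f bits\n", splitInfo(4, 2, 5, 3));   // ~0.939
    // Info gain = entropy at the node (9 yes, 5 no) minus the split information.
    std::printf("gain = %.3f bits\n", entropy(9, 5) - splitInfo(4, 2, 5, 3));
}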

4 Weather data (again!)

With nominal attribute values:

  Outlook    Temperature  Humidity  Windy  Play
  Sunny      Hot          High      False  No
  Sunny      Hot          High      True   No
  Overcast   Hot          High      False  Yes
  Rainy      Mild         Normal    False  Yes
  …          …            …         …      …

With numeric Temperature and Humidity:

  Outlook    Temperature  Humidity  Windy  Play
  Sunny      85           85        False  No
  Sunny      80           90        True   No
  Overcast   83           86        False  Yes
  Rainy      75           80        False  Yes
  …          …            …         …      …

5 Example
- Split on temperature attribute:

    64   65   68   69   70   71   72   72   75   75   80   81   83   85
    Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No

- E.g. temperature < 71.5: yes/4, no/2
       temperature ≥ 71.5: yes/5, no/3
- Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
- Place split points halfway between values
- Can evaluate all split points in one pass! (see the sketch below)
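The one-pass evaluation works by sorting the instances on the attribute once and then sweeping a running class count across the sorted list, trying a split point halfway between each pair of adjacent distinct values. The following sketch (our own illustration, not code from the deck) reproduces the temperature example above.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Entropy (in bits) of a yes/no count.
static double entropy(double yes, double no) {
    double n = yes + no, e = 0.0;
    if (yes > 0) e -= (yes / n) * std::log2(yes / n);
    if (no > 0)  e -= (no / n) * std::log2(no / n);
    return e;
}

int main() {
    // (temperature value, class == yes) pairs from the weather data on slide 4.
    std::vector<std::pair<double, bool>> data = {
        {64, true}, {65, false}, {68, true}, {69, true}, {70, true}, {71, false},
        {72, false}, {72, true}, {75, true}, {75, true}, {80, false}, {81, true},
        {83, true}, {85, false}};
    std::sort(data.begin(), data.end());            // sort once on the attribute value

    double totalYes = 0, totalNo = 0;
    for (auto& d : data) (d.second ? totalYes : totalNo)++;

    double leftYes = 0, leftNo = 0, bestInfo = 1e9, bestSplit = 0;
    for (size_t i = 0; i + 1 < data.size(); ++i) {
        (data[i].second ? leftYes : leftNo)++;      // instance i moves to the left side
        if (data[i].first == data[i + 1].first) continue;        // no split between equal values
        double split = (data[i].first + data[i + 1].first) / 2;  // halfway between values
        double n = totalYes + totalNo, nL = leftYes + leftNo;
        double info = (nL / n) * entropy(leftYes, leftNo) +
                      ((n - nL) / n) * entropy(totalYes - leftYes, totalNo - leftNo);
        std::printf("split at %.1f: info = %.3f bits\n", split, info);
        if (info < bestInfo) { bestInfo = info; bestSplit = split; }
    }
    std::printf("best split point: %.1f (info = %.3f bits)\n", bestSplit, bestInfo);
}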

6 Avoid repeated sorting!
- Sort instances by the values of the numeric attribute
  - Time complexity for sorting: O(n log n)
- Does this have to be repeated at each node of the tree?
- No! Sort order for the children can be derived from the sort order for the parent
  - Time complexity of derivation: O(n) (see the sketch below)
  - Drawback: need to create and store an array of sorted indices for each numeric attribute
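The derivation step can be sketched as follows (our own naming): given the parent's instance indices sorted on a numeric attribute, and the child to which the split sends each instance, a single stable pass yields each child's sorted index array.

#include <vector>

// parentOrder: instance indices at this node, sorted on one numeric attribute.
// childOf[i]:  which child (0..numChildren-1) instance i goes to after the split.
// Returns one sorted index array per child, in O(n) time.
std::vector<std::vector<int>> deriveChildOrders(const std::vector<int>& parentOrder,
                                                const std::vector<int>& childOf,
                                                int numChildren) {
    std::vector<std::vector<int>> childOrders(numChildren);
    for (int idx : parentOrder)                       // walk the parent's order once...
        childOrders[childOf[idx]].push_back(idx);     // ...appending keeps each child sorted
    return childOrders;
}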

7 Binary vs. multi-way splits
- Splitting (multi-way) on a nominal attribute exhausts all the information in that attribute
  - Nominal attribute is tested (at most) once on any path in the tree
- Not so for binary splits on numeric attributes!
  - Numeric attribute may be tested several times along a path in the tree
- Disadvantage: tree is hard to read
- Remedy:
  - pre-discretize numeric attributes (sketched below), or
  - use multi-way splits instead of binary ones
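As one illustration of the pre-discretization remedy, here is a minimal equal-width binning sketch (our own choice of discretization scheme; many alternatives exist). After this step the attribute can be treated as nominal, with one value per bin.

#include <algorithm>
#include <vector>

// Replace each numeric value by an equal-width bin index in [0, numBins).
std::vector<int> equalWidthDiscretize(const std::vector<double>& values, int numBins) {
    double lo = *std::min_element(values.begin(), values.end());
    double hi = *std::max_element(values.begin(), values.end());
    double width = (hi - lo) / numBins;
    std::vector<int> bins;
    for (double v : values) {
        int b = (width > 0) ? static_cast<int>((v - lo) / width) : 0;
        bins.push_back(std::min(b, numBins - 1));     // clamp the maximum value into the last bin
    }
    return bins;
}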

8 Computing multi-way splits
- Dynamic programming can find the optimum multi-way split in O(n²) time
- imp(k, i, j) is the impurity of the best split of values x_i … x_j into k sub-intervals
- Recurrence: imp(k, 1, n) = min over 1 ≤ j < n of [ imp(k-1, 1, j) + imp(1, j+1, n) ]
- imp(k, 1, n) gives us the best k-way split

9 Recursion unfolding for imp(·)
- E.g. imp(4,1,10) expands into subproblems such as
  imp(3,1,7) + imp(1,8,10), imp(3,1,8) + imp(1,9,10), …
  which in turn expand into imp(2,1,3) + imp(1,4,7), …, imp(2,1,3) + imp(1,4,8), …, and so on
- E.g. we had better remember the result for imp(2,1,3) in order not to repeat its computation
- If we don't remember previous computations, the complexity is exponential in k

10 Dynamic programming by memoization

imp(k, 1, n) = min over 1 ≤ j < n of [ imp(k-1, 1, j) + imp(1, j+1, n) ]

#include <algorithm>
#include <limits>
#include <map>
#include <tuple>

double impurity1(int i, int j);                     // impurity of the single interval x_i..x_j (defined elsewhere)
std::map<std::tuple<int, int, int>, double> memo;   // memoization table

// Minimum impurity achievable by splitting values x_i..x_n into k sub-intervals.
double imp(int k, int i, int n) {
    if (k == 1 || i == n) return impurity1(i, n);   // a single interval: nothing left to split
    auto key = std::make_tuple(k, i, n);
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;        // reuse a previously computed result
    double best = std::numeric_limits<double>::infinity();
    for (int j = i; j < n; j++)                     // try every position for the last cut point
        best = std::min(best, imp(k - 1, i, j) + imp(1, j + 1, n));
    return memo[key] = best;                        // remember the result before returning
}

11 Missing values
- Split instances with missing values into pieces
  - A piece going down a branch receives a weight proportional to the popularity of the branch
  - Weights sum to 1
- Info gain works with fractional instances
  - Use sums of weights instead of counts
- During classification, split the instance into pieces in the same way (see the sketch below)
  - Merge probability distributions using the weights
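A minimal sketch of the classification side of this scheme (our own data structures, not Weka's or C4.5's code): an instance is pushed down the tree recursively, and at a node whose test attribute is missing it is sent down every branch with a weight proportional to that branch's training popularity; the resulting class distributions are merged.

#include <vector>

const int MISSING = -1;                     // marker for a missing attribute value

struct Node {
    int testAttribute = 0;                  // attribute tested at an internal node
    std::vector<Node*> children;            // one child per attribute value; empty at a leaf
    std::vector<double> branchWeight;       // fraction of training weight down each branch
    std::vector<double> classDistribution;  // class distribution stored at a leaf
};

// Push a (possibly fractional) instance down the tree and return a class distribution.
std::vector<double> classify(const Node* node, const std::vector<int>& instance,
                             double weight = 1.0) {
    if (node->children.empty()) {                     // leaf: weighted class distribution
        std::vector<double> dist = node->classDistribution;
        for (double& p : dist) p *= weight;
        return dist;
    }
    int value = instance[node->testAttribute];
    if (value != MISSING)                             // known value: follow one branch
        return classify(node->children[value], instance, weight);

    // Missing value: split the instance into pieces, one per branch, and merge the results.
    std::vector<double> merged;
    for (size_t b = 0; b < node->children.size(); ++b) {
        std::vector<double> part =
            classify(node->children[b], instance, weight * node->branchWeight[b]);
        if (merged.empty()) merged.assign(part.size(), 0.0);
        for (size_t c = 0; c < part.size(); ++c) merged[c] += part[c];
    }
    return merged;
}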

12 Missing value example
- With fractional instances, the information at a node holding class counts 4/13 + 1 and 3 is

  info([4/13 + 1, 3]) = -((4/13 + 1) / (4/13 + 1 + 3)) log[(4/13 + 1) / (4/13 + 1 + 3)]
                        - (3 / (4/13 + 1 + 3)) log[3 / (4/13 + 1 + 3)]

  and so on…
- What about the classification of a new instance with missing values?

13 Pruning
- Prevent overfitting to noise in the data
- "Prune" the decision tree
- Two strategies:
  - Postpruning: take a fully-grown decision tree and discard unreliable parts
  - Prepruning: stop growing a branch when information becomes unreliable
- Postpruning preferred in practice (prepruning can "stop early")

14 Prepruning
- Stop growing the tree when there is no significant association between any attribute and the class at a particular node
- I.e. stop if there is no significant info gain (a minimal stopping check is sketched below)
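A minimal sketch of such a stopping check, under our own simplification: a fixed gain threshold stands in for a proper statistical significance test (such as chi-squared), which is what the slide's "significant association" refers to.

#include <algorithm>
#include <vector>

// Pre-pruning check: stop expanding this node if the best achievable
// information gain over all candidate attributes is negligible.
// (Real systems typically use a statistical significance test instead.)
bool shouldStopGrowing(const std::vector<double>& infoGainPerAttribute,
                       double minGain = 0.01) {
    if (infoGainPerAttribute.empty()) return true;    // no attributes left to test
    double best = *std::max_element(infoGainPerAttribute.begin(),
                                    infoGainPerAttribute.end());
    return best < minGain;                            // no significant gain: pre-prune
}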

15 Early stopping
- Pre-pruning may stop the growth process prematurely: early stopping
- Classic example: XOR/Parity problem
  - No individual attribute exhibits any significant association with the class
  - Structure is only visible in the fully expanded tree
  - Prepruning won't expand the root node (see the check below)

      a  b  class
      0  0  0
      0  1  1
      1  0  1
      1  1  0
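To see why prepruning gives up at the root here, the following small check (our own illustration) computes the info gain of each individual attribute on the XOR table; both come out as exactly 0 bits, even though a and b together determine the class.

#include <cmath>
#include <cstdio>

// Entropy (in bits) of a two-class count.
static double entropy(double a, double b) {
    double n = a + b, e = 0.0;
    if (a > 0) e -= (a / n) * std::log2(a / n);
    if (b > 0) e -= (b / n) * std::log2(b / n);
    return e;
}

int main() {
    int a[4] = {0, 0, 1, 1}, b[4] = {0, 1, 0, 1};
    for (int attr = 0; attr < 2; ++attr) {
        // Class counts (class 0, class 1) in the two branches attr==0 and attr==1.
        double count[2][2] = {{0, 0}, {0, 0}};
        for (int i = 0; i < 4; ++i) {
            int value = (attr == 0) ? a[i] : b[i];
            int cls = a[i] ^ b[i];                    // class = a XOR b
            count[value][cls]++;
        }
        double before = entropy(2, 2);                // 2 instances of each class: 1 bit
        double after = 0.5 * entropy(count[0][0], count[0][1]) +
                       0.5 * entropy(count[1][0], count[1][1]);
        std::printf("info gain of %c = %.3f bits\n", attr == 0 ? 'a' : 'b', before - after);
    }
}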

16 Postpruning
- First, build the full tree
- Then, prune it
  - Fully-grown tree shows all attribute interactions
- Two pruning operations:
  - Subtree replacement
  - Subtree raising
- Possible strategies:
  - error estimation
  - significance testing
  - MDL principle

17 Example (labor negotiations data)

  Attribute                        | Type                         | 1    | 2    | 3    | … | 40
  Duration                         | (Number of years)            | 1    | 2    | 3    |   | 2
  Wage increase first year         | Percentage                   | 2%   | 4%   | 4.3% |   | 4.5
  Wage increase second year        | Percentage                   | ?    | 5%   | 4.4% |   | 4.0
  Wage increase third year         | Percentage                   | ?    | ?    | ?    |   | ?
  Cost of living adjustment        | {none, tcf, tc}              | none | tcf  | ?    |   | none
  Working hours per week           | (Number of hours)            |      |      |      |   |
  Pension                          | {none, ret-allw, empl-cntr}  | none | ?    | ?    |   | ?
  Standby pay                      | Percentage                   | ?    | 13%  | ?    |   | ?
  Shift-work supplement            | Percentage                   | ?    | 5%   | 4%   |   | 4
  Education allowance              | {yes, no}                    | yes  | ?    | ?    |   | ?
  Statutory holidays               | (Number of days)             |      |      |      |   |
  Vacation                         | {below-avg, avg, gen}        | avg  | gen  |      |   | avg
  Long-term disability assistance  | {yes, no}                    | no   | ?    | ?    |   | yes
  Dental plan contribution         | {none, half, full}           | none | ?    | full |   |
  Bereavement assistance           | {yes, no}                    | no   | ?    | ?    |   | yes
  Health plan contribution         | {none, half, full}           | none | ?    | full |   | half
  Acceptability of contract        | {good, bad}                  | bad  | good |      |   |

18 Subtree replacement
- Bottom-up
- Consider replacing a tree only after considering all its subtrees
- If the estimated error doesn't get bigger, replace the subtree (see the sketch below)
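A minimal sketch of the replacement rule (our own data structures; estimatedSubtreeError, estimatedLeafError and collapseToLeaf stand for whatever error-estimation strategy is in use and are assumed to be defined elsewhere):

#include <vector>

struct Node {
    std::vector<Node*> children;   // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};

// Estimated error of the subtree rooted at n, and of n collapsed to a single leaf.
// Both are assumed to be provided by the chosen error-estimation strategy.
double estimatedSubtreeError(const Node* n);
double estimatedLeafError(const Node* n);
void collapseToLeaf(Node* n);      // assumed: discards children, keeps the majority class

// Bottom-up subtree replacement: prune the children first, then consider this node.
void pruneByReplacement(Node* n) {
    if (n->isLeaf()) return;
    for (Node* child : n->children)
        pruneByReplacement(child);                     // consider all subtrees first
    if (estimatedLeafError(n) <= estimatedSubtreeError(n))
        collapseToLeaf(n);                             // error doesn't get bigger: replace
}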

19 Subtree raising
- Delete node
- Redistribute instances
- Slower than subtree replacement (worthwhile?)
- If the estimated error doesn't get bigger, raise the subtree