Machine Learning
Mehdi Ghayoumi, MSB rm 132, Ofc hr: Thur, a
“Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time.” –Herbert Simon
“Learning is constructing or modifying representations of what is being experienced.” –Ryszard Michalski
“Learning is making useful changes in our minds.” –Marvin Minsky
Decision Tree
In the 1960s, Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning. In the late 70s, Quinlan developed ID3 with the information gain heuristic to learn expert systems from examples. Quinlan’s updated decision-tree package (C4.5) was released in 1993.
Classification: predict a categorical output from categorical and/or real inputs.
Decision trees are among the most popular data mining tools:
–Easy to understand
–Easy to implement
–Easy to use
–Computationally cheap
Extremely popular method:
–Credit risk assessment
–Medical diagnosis
–Market analysis
–Bioinformatics
–Chemistry
…
Internal decision nodes:
–Univariate: uses a single attribute, x_i
–Multivariate: uses all attributes, x
Leaves:
–Classification: class labels, or class proportions
–Regression: numeric; the average of the responses r, or a local fit
Learning is greedy: find the best split recursively.
Occam’s razor (c. 1320):
–Prefer the simplest hypothesis that fits the data.
–The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.
Albert Einstein: “Make everything as simple as possible, but not simpler.”
Why?
–It’s a philosophical problem.
–Simple explanations/classifiers are more robust.
–Simple classifiers are more understandable.
Objective: shorter trees are preferred over larger trees.
Idea: we want attributes that classify examples well; the best such attribute is selected.
Select the attribute that partitions the learning set into subsets that are as “pure” as possible.
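A minimal sketch of this selection step (the function names and toy data here are my own, not from the slides): score each candidate attribute by the entropy-weighted impurity of the subsets it induces, and pick the attribute with the largest reduction — the information gain heuristic of ID3 mentioned earlier.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a list of class labels (0 = perfectly pure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from splitting on one categorical attribute."""
    n = len(labels)
    subsets = {}
    for ex, y in zip(examples, labels):
        subsets.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    return entropy(labels) - remainder

# Toy data: attribute "a" separates the classes perfectly, "b" does not.
X = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
y = ["-", "-", "+", "+"]
best = max(["a", "b"], key=lambda attr: information_gain(X, y, attr))
print(best)  # a
```

Splitting on "a" yields two pure subsets (gain 1 bit), while splitting on "b" leaves both subsets maximally mixed (gain 0), so "a" is selected.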
Each branch corresponds to an attribute value.
Each internal node has a splitting predicate.
Each leaf node assigns a classification.
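As a concrete illustration of this structure (the attributes, values, and labels in the toy tree are made up for the example), a tree can be stored as nested tuples: each internal node names the attribute it splits on and maps each attribute value on a branch to a subtree, and each leaf holds a class label:

```python
# Toy tree for a hypothetical "play tennis?" task.
# Internal node: ("attribute", {value: subtree, ...}); leaf: class label.
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, example):
    """Follow the branch matching each attribute value down to a leaf."""
    while isinstance(node, tuple):           # internal node: keep descending
        attribute, branches = node
        node = branches[example[attribute]]  # take the matching branch
    return node                              # leaf: the predicted class

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```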
Entropy (disorder, impurity) of a set of examples S, relative to a binary classification, is:
Entropy(S) = -p_1 log2(p_1) - p_0 log2(p_0)
where p_1 is the fraction of positive examples in S and p_0 is the fraction of negatives.
If all examples are in one category, entropy is zero (we define 0 log(0) = 0). If examples are equally mixed (p_1 = p_0 = 0.5), entropy is at its maximum of 1. Entropy can be viewed as the average number of bits required to encode the class of an example in S when data compression (e.g., Huffman coding) is used to give shorter codes to more likely cases. For multi-class problems with c categories, entropy generalizes to:
Entropy(S) = -sum_{i=1..c} p_i log2(p_i)
Thank you!