Training Examples. Entropy and Information Gain


Training Examples

Entropy and Information Gain
Information answers questions. The more clueless I am about the answer initially, the more information is contained in the final answer.
Scale:
– 1 bit = completely clueless: the answer to a Boolean question with prior (0.5, 0.5)
– 0 bits = complete knowledge: the answer to a Boolean question with prior (1, 0)
– ? = the answer to a Boolean question with prior (p, 1−p)
– This leads to the concept of Entropy.

Entropy
S is a sample of training examples; p+ is the proportion of positive examples and p− is the proportion of negative examples. Entropy measures the impurity of S:
Entropy(S) = −p+ log2(p+) − p− log2(p−)
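A minimal Python sketch of this definition (the function name entropy and the [9+, 5−] check are my own choices, not part of the slides):

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 3))   # 0.94, the value used for S = [9+, 5-] on later slides
```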

Information Gain
Gain(S, A): the expected reduction in entropy due to sorting S on attribute A:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
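A sketch of Gain(S, A) in the same spirit, reusing entropy() from above; representing examples as plain dicts with a "Classification" value of "Yes"/"No" is my own assumption, not something the slides specify:

```python
def information_gain(examples, attribute, target="Classification"):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of |S_v|/|S| * Entropy(S_v)."""
    def subset_entropy(subset):
        pos = sum(1 for e in subset if e[target] == "Yes")
        return entropy(pos, len(subset) - pos)

    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        s_v = [e for e in examples if e[attribute] == value]
        remainder += len(s_v) / len(examples) * subset_entropy(s_v)
    return subset_entropy(examples) - remainder
```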

Training Examples

Selecting the First Attribute
Humidity: S = [9+, 5−], E = 0.940. High branch: [3+, 4−], E = 0.985; Normal branch: [6+, 1−], E = 0.592.
Gain(S, Humidity) = 0.940 − (7/14)·0.985 − (7/14)·0.592 = 0.151
Wind: S = [9+, 5−], E = 0.940. Weak branch: [6+, 2−], E = 0.811; Strong branch: [3+, 3−], E = 1.0.
Gain(S, Wind) = 0.940 − (8/14)·0.811 − (6/14)·1.0 = 0.048
Humidity provides greater information gain than Wind, w.r.t. the target classification.

Selecting the First Attribute
Outlook: S = [9+, 5−], E = 0.940. Sunny branch: [2+, 3−], E = 0.971; Overcast branch: [4+, 0−], E = 0.0; Rain branch: [3+, 2−], E = 0.971.
Gain(S, Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0.0 − (5/14)·0.971 = 0.247

Selecting the First Attribute
The information gain values for the four attributes are:
Gain(S, Outlook) = 0.247
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
where S denotes the collection of training examples.
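These numbers can be checked directly from the [pos, neg] counts shown on the previous two slides, using the entropy() sketch above (small rounding differences aside):

```python
S = entropy(9, 5)                                                  # ~0.940

gain_humidity = S - (7/14)*entropy(3, 4) - (7/14)*entropy(6, 1)
gain_wind     = S - (8/14)*entropy(6, 2) - (6/14)*entropy(3, 3)
gain_outlook  = S - (5/14)*entropy(2, 3) - (4/14)*entropy(4, 0) - (5/14)*entropy(3, 2)

print(round(gain_humidity, 3), round(gain_wind, 3), round(gain_outlook, 3))
# 0.152 0.048 0.247 -- the slide's 0.151 for Humidity comes from rounding the
# intermediate entropies; Outlook clearly wins and becomes the root.
```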

Selecting the Next Attribute
Outlook is the root. Full set: [D1, D2, …, D14], [9+, 5−].
Sunny branch: S_sunny = [D1, D2, D8, D9, D11], [2+, 3−] → still needs a test (?).
Overcast branch: [D3, D7, D12, D13], [4+, 0−] → Yes.
Rain branch: [D4, D5, D6, D10, D14], [3+, 2−] → still needs a test (?).
Gain(S_sunny, Humidity) = 0.970 − (3/5)·0.0 − (2/5)·0.0 = 0.970
Gain(S_sunny, Temp.) = 0.970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = 0.570
Gain(S_sunny, Wind) = 0.970 − (2/5)·1.0 − (3/5)·0.918 = 0.019
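A quick check of the Sunny-branch gains, reading the weights and branch entropies off the expressions above (entropy() is the earlier sketch):

```python
S_sunny = entropy(2, 3)                                       # ~0.971; the slide rounds to 0.970

gain_humidity = S_sunny - (3/5)*0.0 - (2/5)*0.0               # ~0.971
gain_temp     = S_sunny - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0   # ~0.571
gain_wind     = S_sunny - (2/5)*1.0 - (3/5)*0.918             # ~0.020

# Humidity has by far the largest gain on the Sunny subset, so it is the test
# placed under the Sunny branch in the final tree on the next slide.
```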

ID3 Algorithm (resulting tree)
Outlook = Sunny → Humidity: High → No [D1, D2, D8]; Normal → Yes [D9, D11]
Outlook = Overcast → Yes [D3, D7, D12, D13]
Outlook = Rain → Wind: Strong → No [D6, D14]; Weak → Yes [D4, D5, D10]
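For reference, a compact recursive sketch of the ID3 procedure that produces a tree like the one above; it reuses information_gain() from earlier, and the nested-dict tree representation is my own choice rather than anything prescribed by the slides:

```python
from collections import Counter

def id3(examples, attributes, target="Classification"):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                  # all examples agree -> pure leaf
        return labels[0]
    if not attributes:                         # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {e[best] for e in examples}:  # only values seen in this subset
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```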

Which attribute should we start with?
ID# | Texture | Temp | Size   | Classification
 1  | Smooth  | Cold | Large  | Yes
 2  | Smooth  | Cold | Small  | No
 3  | Smooth  | Cool | Large  | Yes
 4  | Smooth  | Cool | Small  | Yes
 5  | Smooth  | Hot  | Small  | Yes
 6  | Wavy    | Cold | Medium | No
 7  | Wavy    | Hot  | Large  | Yes
 8  | Rough   | Cold | Large  | No
 9  | Rough   | Cool | Large  | Yes
 10 | Rough   | Hot  | Small  | No
 11 | Rough   | Warm | Medium | Yes

Which node is the best?
Texture (Smooth, Wavy, Rough):
5/11 · (−4/5·log2(4/5) − 1/5·log2(1/5)) + 2/11 · (−1/2·log2(1/2) − 1/2·log2(1/2)) + 4/11 · (−2/4·log2(2/4) − 2/4·log2(2/4))
= 5/11·(0.722) + 2/11·1 + 4/11·1 = 0.874

Which node is the best?
Temperature (Cold, Cool, Hot, Warm):
4/11 · (−1/4·log2(1/4) − 3/4·log2(3/4)) + 3/11 · (−3/3·log2(3/3) − 0/3·log2(0/3)) + 3/11 · (−2/3·log2(2/3) − 1/3·log2(1/3)) + 1/11 · (−1/1·log2(1/1) − 0/1·log2(0/1))
= 4/11·(0.811) + 0 + 3/11·(0.918) + 0 = 0.545

Which node is the best?
Size (Large, Medium, Small):
5/11 · (−4/5·log2(4/5) − 1/5·log2(1/5)) + 2/11 · (−1/2·log2(1/2) − 1/2·log2(1/2)) + 4/11 · (−2/4·log2(2/4) − 2/4·log2(2/4))
= 5/11·(0.722) + 2/11·1 + 4/11·1 = 0.874
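Since Entropy(S) is the same for every candidate split, ranking attributes by these weighted entropies is equivalent to ranking them by information gain. A sketch that runs the table above through the earlier information_gain() function (the dict encoding of the rows is my own):

```python
data = [
    {"Texture": "Smooth", "Temp": "Cold", "Size": "Large",  "Classification": "Yes"},
    {"Texture": "Smooth", "Temp": "Cold", "Size": "Small",  "Classification": "No"},
    {"Texture": "Smooth", "Temp": "Cool", "Size": "Large",  "Classification": "Yes"},
    {"Texture": "Smooth", "Temp": "Cool", "Size": "Small",  "Classification": "Yes"},
    {"Texture": "Smooth", "Temp": "Hot",  "Size": "Small",  "Classification": "Yes"},
    {"Texture": "Wavy",   "Temp": "Cold", "Size": "Medium", "Classification": "No"},
    {"Texture": "Wavy",   "Temp": "Hot",  "Size": "Large",  "Classification": "Yes"},
    {"Texture": "Rough",  "Temp": "Cold", "Size": "Large",  "Classification": "No"},
    {"Texture": "Rough",  "Temp": "Cool", "Size": "Large",  "Classification": "Yes"},
    {"Texture": "Rough",  "Temp": "Hot",  "Size": "Small",  "Classification": "No"},
    {"Texture": "Rough",  "Temp": "Warm", "Size": "Medium", "Classification": "Yes"},
]

for attr in ("Texture", "Temp", "Size"):
    print(attr, round(information_gain(data, attr), 3))
# Texture 0.072, Temp 0.4, Size 0.072 -- Temperature leaves the least entropy
# (0.545 vs 0.874), so it is the best attribute to start with.
```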

Learning over time
How do you evolve knowledge over time when you learn a little bit at a time?
– Abstract version: the "Frinkle"

The Question
– How can we build this kind of representation over time?
The Answer
– Rely on the concepts of false positives and false negatives.

The idea
False Positive
– An example which is predicted to be positive but whose known outcome is negative.
– The problem: our hypothesis is too general.
– The solution: add another condition to our hypothesis.
False Negative
– An example which is predicted to be negative but whose known outcome is positive.
– The problem: our hypothesis is too restrictive.
– The solution: remove a condition from our hypothesis (or add a disjunction).
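A minimal sketch of this specialize/generalize loop for a purely conjunctive hypothesis (a set of (attribute, value) conditions). The representation and the way a new condition is borrowed from a known positive example are my own simplifications, not a fixed algorithm from these slides:

```python
def predict(hypothesis, example):
    """True if the example satisfies every condition in the conjunction."""
    return all(example.get(attr) == value for attr, value in hypothesis)

def refine(hypothesis, example, actual, seed_positive):
    """One incremental update driven by a false positive or false negative."""
    predicted = predict(hypothesis, example)
    if predicted and actual == "No":
        # False positive -> hypothesis too general: specialize by adding a
        # condition (borrowed from a known positive) that this example fails.
        for attr, value in seed_positive.items():
            if attr != "Classification" and example.get(attr) != value:
                hypothesis.add((attr, value))
                break
    elif not predicted and actual == "Yes":
        # False negative -> hypothesis too restrictive: generalize by dropping
        # the conditions this positive example violates.
        for attr, value in list(hypothesis):
            if example.get(attr) != value:
                hypothesis.discard((attr, value))
    return hypothesis
```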

Creating a model one "case" at a time
ID# | Texture | Temp | Size   | Classification
 1  | Smooth  | Cold | Large  | Yes
 2  | Smooth  | Cold | Small  | No
 3  | Smooth  | Cool | Large  | Yes
 4  | Smooth  | Cool | Small  | Yes
 5  | Smooth  | Hot  | Small  | Yes
 6  | Wavy    | Cold | Medium | No
 7  | Wavy    | Hot  | Large  | Yes
 8  | Rough   | Cold | Large  | No
 9  | Rough   | Cool | Large  | Yes
 10 | Rough   | Hot  | Small  | No
 11 | Rough   | Warm | Medium | Yes