Published by Suzan Robbins. Modified over 9 years ago.
1
Training Examples
2
Entropy and Information Gain
Information answers questions. The more clueless I am about the answer initially, the more information is contained in the final answer.
Scale:
– 1 bit = completely clueless: the answer to a Boolean question with prior (0.5, 0.5)
– 0 bits = complete knowledge: the answer to a Boolean question with prior (1, 0)
– ? = the answer to a Boolean question with some other prior
– The concept of entropy
3
Entropy
– S is a sample of training examples
– $p_+$ is the proportion of positive examples in S
– $p_-$ is the proportion of negative examples in S
– Entropy measures the impurity of S
$$\mathrm{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$$
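The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `entropy` and its count-based signature are assumptions, and a term with zero count is treated as contributing 0 (the usual convention for 0 * log2(0)).

```python
import math

def entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        raise ValueError("entropy is undefined for an empty sample")
    result = 0.0
    for count in (pos, neg):
        if count > 0:  # a zero count contributes nothing (0 * log2(0) taken as 0)
            p = count / total
            result -= p * math.log2(p)
    return result

print(f"{entropy(9, 5):.3f}")  # a [9+, 5-] sample
print(f"{entropy(7, 7):.3f}")  # evenly split sample: maximally impure
print(f"{entropy(4, 0):.3f}")  # pure sample
```

An evenly split sample gives 1 bit and a pure sample gives 0 bits, which matches the scale from the earlier slide.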
4
Information Gain
Gain(S, A): expected reduction in entropy due to sorting S on attribute A
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$
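As a rough sketch of how this formula might be computed from raw examples, the snippet below partitions the class labels by attribute value and subtracts the weighted entropy of the partitions. The function names and the four-example toy data are hypothetical and chosen only to show the two extremes.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(values, labels):
    """Gain(S, A), where values[i] is example i's value of A and labels[i] is its class."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)  # S_v: labels of the examples with A = v
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: four examples, two positive and two negative.
labels = ["+", "+", "-", "-"]
perfect = gain(["a", "a", "b", "b"], labels)  # attribute separates the classes exactly
useless = gain(["a", "b", "a", "b"], labels)  # attribute says nothing about the class
print(perfect, useless)
```

An attribute that splits the sample into pure subsets recovers all of Entropy(S); one whose subsets are as mixed as S itself yields a gain of zero.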
6
Training Examples
7
Selecting the First Attribute
S = [9+, 5-], E = 0.940
Humidity
– High: [3+, 4-], E = 0.985
– Normal: [6+, 1-], E = 0.592
Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Wind
– Weak: [6+, 2-], E = 0.811
– Strong: [3+, 3-], E = 1.0
Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Humidity provides greater information gain than Wind with respect to the target classification.
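The two gains on this slide can be re-derived from the positive/negative counts alone. The sketch below is one way to do that; the helper names are assumptions. Because it does not round the intermediate entropies, its results can differ from the slide's figures in the third decimal place.

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a sample given its positive and negative counts."""
    total = pos + neg
    return sum(-(c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, branches):
    """parent and every branch are (positives, negatives) count pairs."""
    n = sum(parent)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in branches)
    return entropy(*parent) - remainder

humidity = gain((9, 5), [(3, 4), (6, 1)])  # High, Normal
wind = gain((9, 5), [(6, 2), (3, 3)])      # Weak, Strong
print(f"Gain(S, Humidity) = {humidity:.4f}")
print(f"Gain(S, Wind)     = {wind:.4f}")
```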
8
Selecting the First Attribute
S = [9+, 5-], E = 0.940
Outlook
– Sunny: [2+, 3-], E = 0.971
– Overcast: [4+, 0-], E = 0.0
– Rain: [3+, 2-], E = 0.971
Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
9
Selecting the First Attribute
The information gain values for the 4 attributes, where S denotes the collection of training examples, are:
– Gain(S, Outlook) = 0.247
– Gain(S, Humidity) = 0.151
– Gain(S, Wind) = 0.048
– Gain(S, Temperature) = 0.029
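ID3 picks the attribute with the largest gain as the root of the tree. A minimal sketch of that greedy selection step, using the four values listed on this slide (the variable names are illustrative):

```python
# Information gain values as stated on the slide.
gains = {
    "Outlook": 0.247,
    "Humidity": 0.151,
    "Wind": 0.048,
    "Temperature": 0.029,
}

# Greedy choice: the attribute with the highest gain becomes the root.
root = max(gains, key=gains.get)

# Full ranking, best first.
ranking = sorted(gains, key=gains.get, reverse=True)
print(root, ranking)
```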
10
Selecting the Next Attribute
S = [D1,D2,…,D14], [9+, 5-]
Outlook
– Sunny: S_sunny = [D1,D2,D8,D9,D11], [2+, 3-] → ?
– Overcast: [D3,D7,D12,D13], [4+, 0-] → Yes
– Rain: [D4,D5,D6,D10,D14], [3+, 2-] → ?
Gain(S_sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(S_sunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(S_sunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
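The same calculation can be repeated for the Sunny branch. In the sketch below, the per-branch counts are not given explicitly on the slide. They are inferred from the branch sizes and entropies in the three formulas together with the [2+, 3-] total, so treat them as an assumption. Unrounded arithmetic may differ from the slide in the third decimal place.

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a sample given its positive and negative counts."""
    total = pos + neg
    return sum(-(c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, branches):
    """parent and every branch are (positives, negatives) count pairs."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in branches)

S_SUNNY = (2, 3)
# Inferred (positives, negatives) counts per branch; see the note above.
splits = {
    "Humidity": [(0, 3), (2, 0)],
    "Temp": [(0, 2), (1, 1), (1, 0)],
    "Wind": [(1, 1), (1, 2)],
}
gains = {attr: gain(S_SUNNY, branches) for attr, branches in splits.items()}
best = max(gains, key=gains.get)
for attr, value in gains.items():
    print(f"Gain(S_sunny, {attr}) = {value:.3f}")
print("best:", best)
```

Humidity splits S_sunny into two pure subsets, so its gain equals the full entropy of S_sunny and it is chosen for that branch.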
11
ID3 Algorithm
Outlook
– Sunny: Humidity
  – High: No [D1,D2,D8]
  – Normal: Yes [D9,D11]
– Overcast: Yes [D3,D7,D12,D13]
– Rain: Wind
  – Strong: No [D6,D14]
  – Weak: Yes [D4,D5,D10]
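One way to represent the finished tree in code is as nested (attribute, branches) pairs with class labels at the leaves. This is only an illustrative sketch: the representation, the `classify` helper, and the sample inputs are assumptions, not something the slides specify.

```python
# Internal node: (attribute name, {attribute value: subtree}); leaf: class label.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, example):
    """Follow the branch matching the example's attribute values until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[example[attribute]]
    return node

# Hypothetical inputs for illustration.
sunny_humid = classify(TREE, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"})
rainy_calm = classify(TREE, {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"})
print(sunny_humid, rainy_calm)
```

Only the attributes on the path from the root to a leaf are consulted, so Wind is ignored for a sunny day and Humidity for a rainy one.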
12
Which attribute should we start with?
ID#  Texture  Temp  Size    Classification
1    Smooth   Cold  Large   Yes
2    Smooth   Cold  Small   No
3    Smooth   Cool  Large   Yes
4    Smooth   Cool  Small   Yes
5    Smooth   Hot   Small   Yes
6    Wavy     Cold  Medium  No
7    Wavy     Hot   Large   Yes
8    Rough    Cold  Large   No
9    Rough    Cool  Large   Yes
10   Rough    Hot   Small   No
11   Rough    Warm  Medium  Yes
13
Which node is the best?
Texture (smooth, wavy, rough)
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5))
+ 2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2))
+ 4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
= 5/11*(.722) + 2/11*1 + 4/11*1 = .874
14
Which node is the best?
Temperature (cold, cool, hot, warm), taking 0*log2(0) = 0
  4/11 * (-1/4*log2(1/4) - 3/4*log2(3/4))
+ 3/11 * (-3/3*log2(3/3) - 0/3*log2(0/3))
+ 3/11 * (-2/3*log2(2/3) - 1/3*log2(1/3))
+ 1/11 * (-1/1*log2(1/1) - 0/1*log2(0/1))
= 4/11*(.811) + 0 + 3/11*(.918) + 0 = .545
15
Which node is the best?
Size (large, medium, small)
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5))
+ 2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2))
+ 4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
= 5/11*(.722) + 2/11*1 + 4/11*1 = .874
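The three slides above compute the weighted entropy that remains after splitting on each attribute. Since Entropy(S) is the same for every candidate, the attribute with the lowest remaining entropy has the highest gain. The sketch below recomputes the three figures from the 11-row table; the function and variable names are assumptions.

```python
import math
from collections import Counter, defaultdict

# Rows of the training table: (Texture, Temp, Size, Classification), IDs 1 to 11.
ROWS = [
    ("Smooth", "Cold", "Large", "Yes"),
    ("Smooth", "Cold", "Small", "No"),
    ("Smooth", "Cool", "Large", "Yes"),
    ("Smooth", "Cool", "Small", "Yes"),
    ("Smooth", "Hot", "Small", "Yes"),
    ("Wavy", "Cold", "Medium", "No"),
    ("Wavy", "Hot", "Large", "Yes"),
    ("Rough", "Cold", "Large", "No"),
    ("Rough", "Cool", "Large", "Yes"),
    ("Rough", "Hot", "Small", "No"),
    ("Rough", "Warm", "Medium", "Yes"),
]
ATTRIBUTES = ("Texture", "Temp", "Size")

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def remaining_entropy(column):
    """Weighted entropy of the subsets produced by splitting on the given column."""
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[column]].append(row[-1])
    return sum(len(g) / len(ROWS) * entropy(g) for g in groups.values())

scores = {name: remaining_entropy(i) for i, name in enumerate(ATTRIBUTES)}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best)
```

Texture and Size tie, and Temp leaves the least uncertainty, so under this criterion it would be the first split.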
17
Learning over time
How do you evolve knowledge over time when you learn little by little?
– Abstract version: the “Frinkle”
18
The Question
– How can we build this kind of representation over time?
The Answer
– Rely on the concepts of false positives and false negatives
19
The idea
False Positive
– An example which is predicted to be positive but whose known outcome is negative
– The problem is that our hypothesis is too general.
– The solution is to add another condition to our hypothesis.
False Negative
– An example which is predicted to be negative but whose known outcome is positive
– The problem is that our hypothesis is too restrictive.
– The solution is to remove a condition from our hypothesis (or to add a disjunction).
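To make the two error types concrete, here is a small sketch that checks a single case against a conjunctive hypothesis and reports which kind of repair the slide suggests. The representation of a hypothesis as a dictionary of attribute conditions, the function names, and the starting hypothesis are all hypothetical. The two cases use attribute values that appear in the Texture/Temp/Size table.

```python
def predict(hypothesis, case):
    """A conjunctive hypothesis predicts positive only if every condition matches."""
    return all(case[attr] == value for attr, value in hypothesis.items())

def diagnose(hypothesis, case, actual_positive):
    """Compare the prediction with the known outcome and name the suggested repair."""
    predicted_positive = predict(hypothesis, case)
    if predicted_positive and not actual_positive:
        return "false positive: too general, add a condition"
    if not predicted_positive and actual_positive:
        return "false negative: too restrictive, remove a condition or add a disjunction"
    return "consistent"

# Hypothetical starting hypothesis: "Smooth things are positive".
hypothesis = {"Texture": "Smooth"}

case_a = {"Texture": "Smooth", "Temp": "Cold", "Size": "Small"}  # known outcome: No
case_b = {"Texture": "Wavy", "Temp": "Hot", "Size": "Large"}     # known outcome: Yes

result_a = diagnose(hypothesis, case_a, actual_positive=False)
result_b = diagnose(hypothesis, case_b, actual_positive=True)
print(result_a)
print(result_b)
```

The sketch only diagnoses the error. Choosing which condition to add or remove is the actual learning step and is left open here.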
20
Creating a model one “case” at a time
ID#  Texture  Temp  Size    Classification
1    Smooth   Cold  Large   Yes
2    Smooth   Cold  Small   No
3    Smooth   Cool  Large   Yes
4    Smooth   Cool  Small   Yes
5    Smooth   Hot   Small   Yes
6    Wavy     Cold  Medium  No
7    Wavy     Hot   Large   Yes
8    Rough    Cold  Large   No
9    Rough    Cool  Large   Yes
10   Rough    Hot   Small   No
11   Rough    Warm  Medium  Yes