1 Basics of Entropy. CS 621 Artificial Intelligence, Lecture 13, 02/09/2005. Prof. Pushpak Bhattacharyya, IIT Bombay.

2 Entropy: a measure of uncertainty/structure in the data, used in classification.

3 Tabular data:
    A1     A2     A3     A4    ...    D
    V11    V21    V31    ...          Y
    V21    V22    V32    ...          N
Concentrate on one attribute, and partition the data on the values of that attribute.

4 Information theory, via entropy, paves the way for classification. E(S) = -P+ log P+ - P- log P-, where P+ = proportion of positive examples (D = Y) and P- = proportion of negative examples (D = N).
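As a quick illustration (not from the original slides), a minimal Python sketch of this two-class entropy, assuming the decision column is given as a list of 'Y'/'N' labels; the name binary_entropy is ours:

from math import log2

def binary_entropy(labels):
    # E(S) = -P+ log P+ - P- log P-, with any 0 log 0 term taken as 0
    n = len(labels)
    p_pos = sum(1 for d in labels if d == 'Y') / n
    p_neg = 1.0 - p_pos
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(binary_entropy(['Y', 'Y', 'N', 'N']))  # 1.0: an even split is maximally uncertain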

5 Focus on attribute Ai. Gain(S, Ai) = E(S) - Σ_{v ∈ Values(Ai)} (|Sv| / |S|) E(Sv), where Sv is the subset of S for which Ai takes the value v. The decision tree is constructed from this gain using the ID3 algorithm.
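A minimal sketch of the gain computation, reusing binary_entropy from the sketch above; representing each example as a (attribute-value dict, label) pair is our assumption, not something stated in the slides:

def information_gain(examples, attribute):
    # Gain(S, A) = E(S) - sum over values v of (|S_v| / |S|) * E(S_v)
    labels = [label for _, label in examples]
    total = binary_entropy(labels)
    values = set(features[attribute] for features, _ in examples)
    remainder = 0.0
    for v in values:
        subset = [label for features, label in examples if features[attribute] == v]
        remainder += (len(subset) / len(examples)) * binary_entropy(subset)
    return total - remainder

ID3 then greedily splits each node on the attribute with the highest gain.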

6 Entropy: define the terminology. S = {s1, s2, ..., sq} are the symbols emitted by a source; P(si) = emission probability of si = Pi.

7 Notion of "information" from S: if P(si) = Pi = 1, no information is conveyed ("no surprise"). Convention: if Pi = 0, then I = ∞.

8 I is the "information" function. Desired properties: (1) I(si) = 0 if P(si) = Pi = 1; (2) I(si sj) = I(si) + I(sj), assuming the emissions of si and sj are independent events, so that P(si sj) = P(si) * P(sj).

9 The form of I is I(si) = log(1/Pi), where Pi = P(si); I(si) = ∞ for Pi = 0. Information can be read as the amount of surprise, or as the length of the code needed to send the message.

10 Average information = expected value of information = E[I(si)] = Σ Pi log(1/Pi), called the entropy of S.
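A general version of this average as a small Python sketch (the function name and base argument are ours):

from math import log

def entropy(probs, base=2.0):
    # E(S) = sum_i P_i log(1/P_i); terms with P_i = 0 contribute 0 by the usual convention
    return sum(p * log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits for four equally likely symbols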

11 Properties of entropy. Minimum value: if any Pi = 1, then Pj = 0 for all j ≠ i, and E(S) = 0, the minimum value.

12 Maximum value. By convention, a term Pi log(1/Pi) is taken to be 0 when Pi = 0, since lim_{Pi → 0} Pi log(1/Pi) = 0.

13 Example: tossing a coin. S = {s1, s2} with probabilities P1, P2 and P1 + P2 = 1. E(S) = -[P1 log P1 + P2 log P2]; with P1 = P2 = 0.5, E(S) = 1.0 (bit, using log base 2).
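A quick numerical check of the coin example (log base 2, so the answer is in bits); h is just an illustrative helper valid for 0 < p < 1:

from math import log2
h = lambda p: -(p * log2(p) + (1 - p) * log2(1 - p))  # two-symbol entropy in bits
print(h(0.5))  # 1.0: fair coin
print(h(0.9))  # about 0.469: a biased coin carries less surprise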

14 If all events are equally likely, then entropy is maximal. We expect that, for S = {s1, s2, ..., sq}, E(S) will be maximum when each Pi = 1/q.

15 Theorem: E(S) is maximum when each Pi = 1/q, for S = {s1, s2, ..., sq}. Lemma: ln(x) = log_e(x) <= x - 1 for x > 0. Consider f(x) = x - 1 - ln x; then f(1) = 0.

16 df(x)/dx = 1 - 1/x; equating this to 0 gives x = 1, so f(x) has an extremum at x = 1. d²f(x)/dx² = 1/x² > 0 for x > 0, so x = 1 is a minimum.

17 f(x) has its minimum at x = 1, so f(x) = x - 1 - ln x >= f(1) = 0 for all x > 0, which gives ln x <= x - 1, proving the lemma.
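A small numerical sanity check of the lemma (not in the slides; the sample points are arbitrary):

from math import log
for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    assert log(x) <= x - 1 + 1e-12  # ln x <= x - 1, with equality only at x = 1
print("ln x <= x - 1 holds at all sampled points")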

18 Corollary. Let Σ_{i=1 to m} xi = 1 and Σ_{i=1 to m} yi = 1, with xi >= 0 and yi > 0; that is, the xi and yi are probability distributions. Then Σ_{i=1 to m} xi ln(1/xi) <= Σ_{i=1 to m} xi ln(1/yi).

19 Proof: Σ_{i=1 to m} xi ln(1/xi) - Σ_{i=1 to m} xi ln(1/yi) = Σ_{i=1 to m} xi ln(yi/xi) <= Σ_{i=1 to m} xi (yi/xi - 1), by the lemma applied to yi/xi, = Σ_{i=1 to m} yi - Σ_{i=1 to m} xi = 0.

20 This proves that Σ_{i=1 to m} xi ln(1/xi) <= Σ_{i=1 to m} xi ln(1/yi). The proof of the theorem follows from this corollary.
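A numerical illustration of the corollary (also known as Gibbs' inequality); the two distributions x and y below are made up for the check:

from math import log

def weighted_log_inverse(x, y):
    # sum_i x_i ln(1/y_i); terms with x_i = 0 contribute 0
    return sum(xi * log(1.0 / yi) for xi, yi in zip(x, y) if xi > 0)

x = [0.5, 0.3, 0.2]
y = [0.2, 0.2, 0.6]
print(weighted_log_inverse(x, x))  # about 1.03 (left-hand side)
print(weighted_log_inverse(x, y))  # about 1.39 (right-hand side), never smaller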

21 S = {s1, s2, ..., sq}, P(si) = Pi for i = 1, ..., q. Choose xi = Pi and yi = 1/q. Then Σ_{i=1 to q} xi ln(1/xi) <= Σ_{i=1 to q} xi ln(1/yi), i.e., Σ_{i=1 to q} Pi ln(1/Pi) <= Σ_{i=1 to q} Pi ln q.

22 So E(S) <= Σ_{i=1 to q} Pi ln q = ln q · Σ_{i=1 to q} Pi = ln q; that is, E(S) is upper bounded by ln q, and this value is reached when each Pi = 1/q. This establishes the maximum value of the entropy.
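Checking the bound numerically with natural logs (the distributions are illustrative):

from math import log
q = 4
ent = lambda p: sum(pi * log(1.0 / pi) for pi in p if pi > 0)  # natural-log entropy
print(ent([1.0 / q] * q), log(q))  # both about 1.386: the bound ln q is attained
print(ent([0.7, 0.1, 0.1, 0.1]))   # about 0.940: any skew stays strictly below ln q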

23 E(S) is defined as Σ_{i=1 to q} Pi log_r(1/Pi). The transformation from log_r x to ln x is only a constant multiplying factor: log_r x = log_r e * log_e x.
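The change of base can be checked in one line (x and r are chosen arbitrarily):

from math import e, log
x, r = 7.5, 2
print(log(x, r))            # log_r x computed directly
print(log(e, r) * log(x))   # log_r e * ln x gives the same value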

24 Summary / review:
- Established the intuition for the information function I(si), related to 'surprise'.
- Average information is called entropy.
- Minimum value of E is 0.
- Maximum of E: via the lemma ln x <= x - 1 and the corollary Σ xi ln(1/xi) <= Σ xi ln(1/yi), with Σ xi = Σ yi = 1.
- Max E is ln q times a constant k (the change-of-base factor log_r e) and is reached when Pi = 1/q for each i.

25 Shannon asked: what is the "entropy of the English language"? S = {a, b, c, ..., ',', ':', ...}. P(a) = relative frequency of 'a' in a large corpus, P(b) = ..., and so on. This gives the Pi's, and E(English) = Σ Pi log(1/Pi) ≈ 4.08 (bits per character, from single-character frequencies).
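A minimal sketch of this kind of estimate; the sample string below is only a tiny stand-in, and a real estimate needs a large corpus:

from collections import Counter
from math import log2

def char_entropy(text):
    # first-order character entropy: sum_i P_i log2(1/P_i) from relative frequencies
    counts = Counter(text)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog "  # stand-in for a large corpus
print(char_entropy(sample))  # a large English corpus gives a value around 4 bits per character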

26 The maximum entropy for the tossing of a coin is 1.0, reached when the coin is unbiased. Interest of the reader/listener: a novel is more interesting than a scientific paper for some people.

27 Summary: application of the noisy channel model to ASR (automatic speech recognition), formulated as Bayesian decision making. Studied phonetic problems. Why is a probability model needed?

