Chapter 5: A Measure of Information
Outline
5.1 Axioms for the uncertainty measure
5.2 Two Interpretations of the uncertainty function
5.3 Properties of the uncertainty function
5.4 Entropy and Coding
5.5 Shannon-Fano Coding
5.1 Axioms for the uncertainty measure
X: a discrete random variable taking the values x_1, x_2, ..., x_M with probabilities p_1, p_2, ..., p_M.
h(p): the uncertainty of an event with probability p.
h(p_i): the uncertainty of the event {X = x_i}.
The average uncertainty of X: H(p_1, ..., p_M) = Σ_i p_i h(p_i).
If p_1 = p_2 = ... = p_M = 1/M, we write f(M) = H(1/M, ..., 1/M).
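As a quick numerical illustration (not part of the original slides), the average uncertainty can be computed directly from a probability vector; the function name and the two example distributions below are my own.

```python
import math

def uncertainty(probs, base=2.0):
    """Average uncertainty H(p_1, ..., p_M) = sum_i p_i * log(1/p_i)."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Two illustrative distributions (chosen for this sketch, not taken from the slides):
print(uncertainty([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
print(uncertainty([0.25, 0.25, 0.25, 0.25]))    # 2.0 bits = f(4) = log2(4), the equally likely case
```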
Axiom 1: f(M) should be a monotonically increasing function of M; that is, M < M′ implies f(M) < f(M′). For example, f(2) < f(6).
Axiom 2: Let X take the M equally likely values (x_1, ..., x_M) and let Y take the L equally likely values (y_1, ..., y_L), with X and Y independent. The joint experiment (X, Y) then has M·L equally likely outcomes, and f(M·L) = f(M) + f(L).
Axiom 3 (Group Axiom): Let X take the values x_1, x_2, ..., x_r, x_{r+1}, ..., x_M with probabilities p_1, ..., p_M. Construct a compound experiment: first decide whether the outcome lies in group A = {x_1, ..., x_r} (probability p_A = p_1 + ... + p_r) or in group B = {x_{r+1}, ..., x_M} (probability p_B = p_{r+1} + ... + p_M), then select the particular outcome within the chosen group. The uncertainty is required to satisfy
H(p_1, ..., p_M) = H(p_A, p_B) + p_A · H(p_1/p_A, ..., p_r/p_A) + p_B · H(p_{r+1}/p_B, ..., p_M/p_B).
Axiom 4: H(p, 1-p) is a continuous function of p; that is, a small change in p corresponds to a small change in uncertainty.
We can use the four axioms above to determine the H function.
Thm 5.1: The only function satisfying the four given axioms is
H(p_1, ..., p_M) = -C Σ_i p_i log p_i,
where C > 0 and the logarithm base is greater than 1.
For example, take C = 1 and logarithm base 2, so H is measured in bits.
[Figure: the binary uncertainty H(p, 1-p) plotted against p for a coin {tail, head}; H reaches its maximum of 1 at p = 1/2 (maximum uncertainty) and equals 0 at p = 0 and p = 1 (minimum uncertainty).]
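A small sketch (mine, not from the slides) that evaluates the binary uncertainty at a few points, confirming the shape of the curve described above.

```python
import math

def H2(p):
    """Binary uncertainty H(p, 1-p) in bits (C = 1, logarithm base 2)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}   H(p, 1-p) = {H2(p):.3f}")
# Output is 0 at p = 0 and p = 1 (minimum uncertainty) and 1 at p = 1/2 (maximum uncertainty).
```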
5.2 Two Interpretations of the uncertainty function
(1) H(p_1, ..., p_M) may be interpreted as the expectation of a random variable W = w(X), where w(x_i) = log(1/p_i); then H(p_1, ..., p_M) = E[W].
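To make the "expectation of W" reading concrete, here is a sketch (my own code, using the five-outcome distribution from the next example) that compares H with the empirical mean of W = log2(1/p(X)) over simulated draws.

```python
import math
import random

probs = {"x1": 0.3, "x2": 0.2, "x3": 0.2, "x4": 0.15, "x5": 0.15}

# Exact uncertainty H = E[W], where W = w(X) = log2(1 / p(X)).
H = sum(p * math.log2(1.0 / p) for p in probs.values())

# Empirical mean of W over many independent draws of X.
random.seed(0)
draws = random.choices(list(probs), weights=list(probs.values()), k=100_000)
empirical = sum(math.log2(1.0 / probs[x]) for x in draws) / len(draws)

print(f"H = {H:.3f}, empirical E[W] = {empirical:.3f}")   # both close to 2.27
```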
(2) H(p_1, ..., p_M) may be interpreted as the minimum average number of 'yes'/'no' questions required to specify the value of X.
For example, H(X) = H(0.3, 0.2, 0.2, 0.15, 0.15) ≈ 2.27.
[Figure: a question tree. First ask "Is X = x_1 or x_2?" If yes, ask "Is X = x_1?" (yes → x_1, no → x_2). If no, ask "Is X = x_3?" (yes → x_3); otherwise ask "Is X = x_4?" (yes → x_4, no → x_5).]
Outcome   # of questions   Probability
x_1       2                0.3
x_2       2                0.2
x_3       2                0.2
x_4       3                0.15
x_5       3                0.15

Average # of questions = 2·(0.3 + 0.2 + 0.2) + 3·(0.15 + 0.15) = 2·0.7 + 3·0.3 = 2.3 > 2.27.

H.W.: Let X = {x_1, x_2} with p(x_1) = 0.7 and p(x_2) = 0.3. How many questions (on average) are required to specify the outcome of a joint experiment involving 2 independent observations of X?
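The slide's arithmetic can be checked with a few lines (my sketch; the probabilities and question counts are taken from the tree above).

```python
import math

probs     = [0.3, 0.2, 0.2, 0.15, 0.15]   # p(x1), ..., p(x5)
questions = [2,   2,   2,   3,    3]      # questions used by the tree for each outcome

H   = sum(p * math.log2(1.0 / p) for p in probs)
avg = sum(p * q for p, q in zip(probs, questions))

print(f"H(X) = {H:.2f} bits")                  # about 2.27
print(f"average # of questions = {avg:.2f}")   # 2.30 > 2.27
```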
5.3 Properties of the uncertainty function
Lemma 5.2: Let p_1, ..., p_M and q_1, ..., q_M be arbitrary positive numbers with Σ_i p_i = Σ_i q_i = 1. Then
-Σ_i p_i log p_i ≤ -Σ_i p_i log q_i,
with equality iff p_i = q_i for all i.
[Figure: the curves y = x - 1 and y = ln x; the proof rests on the inequality ln x ≤ x - 1.]
Thm 5.3: H(p_1, ..., p_M) ≤ log M, with equality iff p_i = 1/M for all i.
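A quick numerical check of Thm 5.3 (my sketch, reusing the distribution from Section 5.2): the uncertainty never exceeds log M, and the bound is attained by the equally likely distribution.

```python
import math

def H(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

M = 5
print(H([0.3, 0.2, 0.2, 0.15, 0.15]), "<=", math.log2(M))   # 2.27 <= 2.32
print(H([1.0 / M] * M), "==", math.log2(M))                 # equality when every p_i = 1/M
```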
5.4 Entropy and Coding
Noiseless Coding Theorem setup:
X takes the values x_1, x_2, ..., x_M with probabilities p_1, p_2, ..., p_M.
Codewords: w_1, w_2, ..., w_M, with lengths n_1, n_2, ..., n_M.
Code alphabet: {a_1, a_2, ..., a_D}. Ex. D = 2: {0, 1}.
Minimize the average codeword length n̄ = Σ_i p_i n_i.
Thm (Noiseless Coding Thm): If n̄ is the average codeword length of a uniquely decodable code for X, then n̄ ≥ H_D(X), with equality iff p_i = D^(-n_i) for i = 1, 2, ..., M.
Note: H_D(X) = -Σ_i p_i log_D p_i is the uncertainty of X computed by using the base D.
pf: Apply Lemma 5.2 with q_i = D^(-n_i) / Σ_j D^(-n_j), and use the Kraft inequality Σ_j D^(-n_j) ≤ 1, which every uniquely decodable code satisfies:
H_D(X) = -Σ_i p_i log_D p_i ≤ -Σ_i p_i log_D q_i = Σ_i p_i n_i + log_D(Σ_j D^(-n_j)) ≤ n̄.
Equality holds iff p_i = q_i and Σ_j D^(-n_j) = 1, i.e., iff p_i = D^(-n_i) for every i.
A code is called "absolutely optimal" if it achieves the lower bound of the noiseless coding thm.
Ex.
X     Prob.   Codeword
x_1   1/2     0
x_2   1/4     10
x_3   1/8     110
x_4   1/8     111
H(X) = 7/4 = n̄.
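The example can be verified directly (my sketch): each code length matches log2(1/p_i) exactly, so the average length equals the entropy and the Kraft sum is 1.

```python
import math

p = [1/2, 1/4, 1/8, 1/8]
n = [1,   2,   3,   3]          # lengths of the codewords 0, 10, 110, 111

H     = sum(pi * math.log2(1.0 / pi) for pi in p)
n_bar = sum(pi * ni for pi, ni in zip(p, n))
kraft = sum(2.0 ** -ni for ni in n)

print(H, n_bar, kraft)          # 1.75, 1.75, 1.0: n_bar = H(X), absolutely optimal
```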
5.5 Shannon-Fano Coding
Select the integer n_i such that log_D(1/p_i) ≤ n_i < log_D(1/p_i) + 1, i.e., n_i = ⌈log_D(1/p_i)⌉. Then D^(-n_i) ≤ p_i, so Σ_i D^(-n_i) ≤ Σ_i p_i = 1 and the Kraft inequality holds.
⇒ An instantaneous code can be constructed with the lengths n_1, n_2, ..., n_M obtained from Shannon-Fano coding.
Thm: Given a random variable X with uncertainty H_D(X), the Shannon-Fano code for X satisfies H_D(X) ≤ n̄ < H_D(X) + 1.
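A sketch of the length-selection step (the helper name is mine): it picks n_i = ⌈log_D(1/p_i)⌉ and checks that the resulting average length lands between H and H + 1, using the five-outcome distribution from earlier in the chapter.

```python
import math

def shannon_fano_lengths(probs, D=2):
    """Select n_i = ceil(log_D(1/p_i)); these lengths satisfy the Kraft inequality."""
    return [math.ceil(math.log(1.0 / p, D)) for p in probs]

probs = [0.3, 0.2, 0.2, 0.15, 0.15]
n     = shannon_fano_lengths(probs)
H     = sum(p * math.log2(1.0 / p) for p in probs)
n_bar = sum(p * ni for p, ni in zip(probs, n))

print(n)                                     # [2, 3, 3, 3, 3]
print(H, "<=", n_bar, "<", H + 1)            # 2.27 <= 2.70 < 3.27
print(sum(2.0 ** -ni for ni in n), "<= 1")   # Kraft inequality: 0.75 <= 1
```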
In fact, we can always approach the lower bound as closely as desired if we are allowed to use "block coding". Take a series of s observations of X and let Y = (x_1, x_2, ..., x_s); assign a codeword to each value of Y.
⇒ Block coding decreases the average codeword length per value of X.
Ex. (the source from the earlier homework: p(x_1) = 0.7, p(x_2) = 0.3)

X     P_i    Codeword
x_1   0.7    0
x_2   0.3    1

The average length is 1, but H(X) = H(p) with p = 0.3 or p = 0.7, i.e., H(X) ≈ 0.881 < 1.

Look-up table for blocks of two observations:

Y = (X_1, X_2)   P_i
x_1 x_1          0.49
x_1 x_2          0.21
x_2 x_1          0.21
x_2 x_2          0.09

Codewords for these pairs are assigned by the Shannon-Fano procedure described below.
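Why block coding helps can be read off the theorem applied to blocks: for s independent observations, H(Y) = s·H(X), so the Shannon-Fano guarantee per source symbol is H(X) + 1/s. The sketch below (mine, using p = 0.3 from the example) prints this guarantee for a few block sizes.

```python
import math

p = 0.3                                                  # p(x2) = 0.3, p(x1) = 0.7
H = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)   # H(X), about 0.881 bits

# Shannon-Fano coding on blocks of s symbols gives an average length per source
# symbol strictly below H(X) + 1/s, which approaches H(X) as s grows.
for s in (1, 2, 4, 8, 16):
    print(f"s = {s:2d}   per-symbol length < {H + 1.0 / s:.3f}")
print(f"H(X) = {H:.3f}")
```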
How do we find the actual code symbols?
– We simply assign them in order.
– By S-F coding we obtain the lengths n_1 ≤ n_2 ≤ ... ≤ n_M (after reordering the outcomes by decreasing probability).
– We then assign to each outcome a codeword of its prescribed length having none of the previously assigned codewords as a prefix; since the lengths satisfy the Kraft inequality, this is always possible.
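Since the slides' own assignment formulas are not reproduced here, the following is only a sketch of one standard in-order construction (the helper name is mine): process the lengths from shortest to longest and give each outcome the next available codeword with no earlier codeword as a prefix, which the Kraft inequality guarantees exists. The example lengths [2, 3, 3, 4] are the Shannon-Fano lengths for the pair probabilities 0.49, 0.21, 0.21, 0.09.

```python
def assign_codewords(lengths):
    """Assign binary codewords with the given (Kraft-feasible) lengths, in order of
    increasing length, so that no codeword is a prefix of another."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words = [None] * len(lengths)
    code, prev_len = 0, lengths[order[0]]
    for i in order:
        code <<= lengths[i] - prev_len          # extend the counter to the new length
        words[i] = format(code, "b").zfill(lengths[i])
        code += 1
        prev_len = lengths[i]
    return words

# Shannon-Fano lengths for the pair distribution (0.49, 0.21, 0.21, 0.09):
print(assign_codewords([2, 3, 3, 4]))           # ['00', '010', '011', '1000']
```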
How bad is Shannon-Fano coding? By the theorem above, its average length always stays within one code symbol of the entropy lower bound, H_D(X) ≤ n̄ < H_D(X) + 1, although it need not achieve the minimum possible average length.