Published by Georgina Weaver. Modified over 9 years ago.
1
Information Theory Basics crandall@cs.unm.edu
2
What is information theory? A way to quantify information. A lot of the theory comes from two worlds: channel coding and compression. Useful for lots of other things. Claude Shannon, mid-to-late 1940s.
3
Requirements “This data will compress to at most N bits.” “This channel will allow us to transmit N bits per second.” “This plaintext will require at least N bans of ciphertext.” Here N is a number for the amount of information (equivalently, uncertainty or entropy) of a random variable X; that is, H(X) = N.
4
??? What are the requirements for such a measure? E.g., Continuity: changing the probabilities a small amount should change the measure by only a small amount.
5
Maximum Which distribution should have the maximum entropy? For equiprobable events, what should happen if we increase the number of outcomes?
6
Maximum
7
Symmetry The measure should be unchanged if the outcomes are re-ordered
8
Additivity The amount of entropy should be independent of how we divide the process into parts.
9
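Entropy of Discrete RVs Expected value of the amount of information for an event: H(X) = -Σ p(x) lg p(x), summed over the outcomes x of X, where lg is the base-2 logarithm and -lg p(x) is the information, in bits, of outcome x.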
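The definition can be tried out directly. Below is a minimal Python sketch, assuming the standard Shannon form H(X) = -Σ p(x) lg p(x) with base-2 logarithms; the function name `entropy` and the convention that zero-probability outcomes contribute nothing are choices made for this illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    # Zero-probability outcomes are skipped (0 * lg 0 is taken to be 0).
    return sum(-p * log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
three_fair_coins = [1 / 8] * 8  # eight equally likely outcomes

print(entropy(fair_coin))         # 1.0
print(entropy(three_fair_coins))  # 3.0
```

Each outcome contributes its information -lg p(x), weighted by how often it occurs, which is exactly the "expected value" reading of the definition.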
10
Flip a fair coin (-0.5 lg 0.5) + (-0.5 lg 0.5) = 1.0 bit. What about flipping three fair coins?
11
Flip three fair coins
Summing the entropies of the three independent flips (two terms per flip): 6 × (-0.5 lg 0.5) = 3.0
Treating the three flips as one experiment with eight equiprobable outcomes: 8 × (-0.125 lg 0.125) = 3.0
12
Flip biased coin A: 60% heads
13
Biased coin A (-0.6 lg 0.6) + (-0.4 lg 0.4) = 0.970950594
14
Biased coin B: 95% heads. (-0.95 lg 0.95) + (-0.05 lg 0.05) = 0.286396957 Why is there less information in biased coins?
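One way to get a feel for the question is to tabulate the entropy of a coin as its bias grows. This is a small sketch, assuming the usual two-outcome entropy formula; `binary_entropy` is a name chosen here for illustration.

```python
from math import log2

def binary_entropy(p_heads):
    """Entropy, in bits, of a coin that lands heads with probability p_heads."""
    if p_heads in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    p_tails = 1.0 - p_heads
    return -p_heads * log2(p_heads) - p_tails * log2(p_tails)

# Fair coin, coin A, coin B, and a coin that always lands heads.
for p in (0.5, 0.6, 0.95, 1.0):
    print(f"P(heads) = {p:.2f}  H = {binary_entropy(p):.6f} bits")
```

The entropy falls from 1 bit for the fair coin toward 0 as the coin becomes more predictable: the more confident we already are about the result, the less we learn from seeing it.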
15
Information=uncertainty=entropy
16
Flip A, then flip B
A: (-0.6 lg 0.6) + (-0.4 lg 0.4) = 0.970950594
B: (-0.95 lg 0.95) + (-0.05 lg 0.05) = 0.286396957
Sum of the two entropies: ((-0.6 lg 0.6) + (-0.4 lg 0.4)) + ((-0.95 lg 0.95) + (-0.05 lg 0.05)) = 0.970950594 + 0.286396957 = 1.25734755
Entropy of the four joint outcomes: (-(0.6*0.95) lg(0.6*0.95)) + (-(0.6*0.05) lg(0.6*0.05)) + (-(0.4*0.95) lg(0.4*0.95)) + (-(0.4*0.05) lg(0.4*0.05)) = 1.25734755
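The agreement between the two calculations can be checked numerically. The sketch below assumes the two flips are independent, so each joint probability is the product of the individual ones; the dictionary layout and variable names are illustrative only.

```python
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

coin_a = {"heads": 0.6, "tails": 0.4}
coin_b = {"heads": 0.95, "tails": 0.05}

# Independence: P(a, b) = P(a) * P(b) for each of the four joint outcomes.
joint = {
    (a, b): pa * pb
    for (a, pa), (b, pb) in product(coin_a.items(), coin_b.items())
}

h_a = entropy(coin_a.values())
h_b = entropy(coin_b.values())
h_joint = entropy(joint.values())

print(f"H(A) + H(B) = {h_a + h_b:.8f}")
print(f"H(A, B)     = {h_joint:.8f}")
```

For independent variables the joint entropy equals the sum of the individual entropies, which is why both routes give the same figure.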
17
Entropy (summary) Continuity, maximum, symmetry, additivity
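The four requirements can be spot-checked against the Shannon entropy formula. This is only a numerical illustration on hand-picked distributions, not a proof; the specific probabilities below are arbitrary choices for the example.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Continuity: a small change in the probabilities gives a small change in H.
continuity_gap = abs(entropy([0.6, 0.4]) - entropy([0.6001, 0.3999]))

# Maximum: n equiprobable outcomes give H = lg n, which grows with n,
# and a non-uniform distribution over 4 outcomes stays below lg 4 = 2.
uniform_entropies = [entropy([1 / n] * n) for n in (2, 4, 8)]
skewed_entropy = entropy([0.7, 0.1, 0.1, 0.1])

# Symmetry: re-ordering the outcomes leaves H unchanged.
symmetry_gap = abs(entropy([0.5, 0.3, 0.2]) - entropy([0.2, 0.5, 0.3]))

# Additivity: choosing among {1/2, 1/3, 1/6} in one step has the same entropy
# as a fair first choice followed, half of the time, by a {2/3, 1/3} choice.
one_step = entropy([1 / 2, 1 / 3, 1 / 6])
two_steps = entropy([1 / 2, 1 / 2]) + 0.5 * entropy([2 / 3, 1 / 3])

print(continuity_gap, uniform_entropies, skewed_entropy)
print(symmetry_gap, one_step, two_steps)
```

The additivity check uses a two-stage decomposition of a three-outcome choice: the second stage only happens with probability 1/2, so its entropy is weighted by 1/2.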
18
Example: Maximum Entropy Wikipedia: “Maximum-likelihood estimators can lack asymptotic normality and can be inconsistent if there is a failure of one (or more) of the below regularity conditions... Estimate on boundary, Data boundary parameter-dependent, Nuisance parameters, Increasing information...” “Subject to known constraints, the probability distribution which best represents the current state of knowledge is the one with largest entropy.” What distribution maximizes entropy?
19
Beyond Entropy
Flip a fair coin for X. If heads, flip coin A for Y; if tails, flip coin B for Y.
H(X) = 1.0
H(Y) = (-(0.5*0.6 + 0.5*0.95) lg(0.5*0.6 + 0.5*0.95)) + (-(0.5*0.4 + 0.5*0.05) lg(0.5*0.4 + 0.5*0.05)) = 0.769192829
Joint entropy H(X,Y) = (-(0.5*0.6) lg(0.5*0.6)) + (-(0.5*0.95) lg(0.5*0.95)) + (-(0.5*0.4) lg(0.5*0.4)) + (-(0.5*0.05) lg(0.5*0.05)) = 1.62867378
H(X) + H(Y) = 1.769192829, so where is the other 1.769192829 - 1.62867378 = 0.140519049 bits of information?
20
Mutual Information I(X;Y) = H(X) + H(Y) - H(X,Y) = 0.140519049 What are H(X|Y) and H(Y|X)?
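These quantities can all be computed from the joint distribution of the two-stage experiment (a fair coin for X, then coin A at 60% heads or coin B at 95% heads for Y). The sketch below uses the chain rule H(X,Y) = H(Y) + H(X|Y) = H(X) + H(Y|X) for the conditional entropies; the helper `marginal` and the dictionary encoding are illustrative choices.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Sum a joint distribution over everything except one coordinate."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out

# P(Y = heads | X): coin A after heads, coin B after tails.
p_y_heads_given_x = {"heads": 0.6, "tails": 0.95}

joint = {}
for x, p_heads in p_y_heads_given_x.items():
    joint[(x, "heads")] = 0.5 * p_heads
    joint[(x, "tails")] = 0.5 * (1 - p_heads)

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())

mutual_info = h_x + h_y - h_xy
h_x_given_y = h_xy - h_y  # chain rule: H(X,Y) = H(Y) + H(X|Y)
h_y_given_x = h_xy - h_x  # chain rule: H(X,Y) = H(X) + H(Y|X)

print(f"I(X;Y) = {mutual_info:.9f}")
print(f"H(X|Y) = {h_x_given_y:.9f}")
print(f"H(Y|X) = {h_y_given_x:.9f}")
```

Under these assumptions H(X|Y) comes out to about 0.859 bits and H(Y|X) to about 0.629 bits; the latter is the average of the entropies of coins A and B, as expected when X picks the coin.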
21
Example: sufficient statistics Students asked to flip a coin 100 times and record the result How to detect the cheaters?
22
Example: sufficient statistics f_θ(x) is a family of probability mass functions indexed by θ; X is a sample from a distribution in this family. T(X) is a statistic: a function of the sample, like the sample mean or the sample variance. I(θ;T(X)) ≤ I(θ;X) (the data processing inequality). Equality holds only if no information about θ is lost, in which case T(X) is a sufficient statistic.
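The inequality can be illustrated on a toy model. The setup below is an assumption made for this sketch, not part of the slides: θ is the heads probability of a coin, equally likely to be 0.6 or 0.95, and X is the sequence of three flips. The number of heads is compared against a lossy statistic that keeps only the first flip.

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(A;B), in bits, from a dict mapping (a, b) to probability."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())

def joint_theta_statistic(statistic, n_flips=3, biases=(0.6, 0.95)):
    """Joint distribution of (theta, statistic(X)) under a uniform prior on theta."""
    joint = defaultdict(float)
    for theta in biases:
        for flips in product((1, 0), repeat=n_flips):  # 1 = heads, 0 = tails
            heads = sum(flips)
            p_sample = theta ** heads * (1 - theta) ** (n_flips - heads)
            joint[(theta, statistic(flips))] += p_sample / len(biases)
    return joint

i_full = mutual_information(joint_theta_statistic(lambda flips: flips))      # T(X) = X
i_count = mutual_information(joint_theta_statistic(sum))                     # number of heads
i_first = mutual_information(joint_theta_statistic(lambda flips: flips[0]))  # first flip only

print(f"I(theta; X)           = {i_full:.6f}")
print(f"I(theta; heads count) = {i_count:.6f}")
print(f"I(theta; first flip)  = {i_first:.6f}")
```

In this model the head count preserves all the information the full sequence carries about θ, while keeping only the first flip discards some of it.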
23
Kullback-Leibler divergence (a.k.a. relative entropy)
Process 1: Flip an unbiased coin; if heads, flip biased coin A (60% heads); if tails, flip biased coin B (95% heads).
Process 2: Roll a fair die. 1, 2, or 3 = (tails, heads). 4 = (heads, heads). 5 = (heads, tails). 6 = (tails, tails).
Process 3: Flip two fair coins and just record the results.
Which, out of 2 and 3, is a better approximate model of 1?
24
Kullback-Leibler divergence (a.k.a. relative entropy) In Dkl(P||Q), P is the true distribution and Q is the model. Dkl(P1||P2) = 0.48080, Dkl(P1||P3) = 0.37133. Note that Dkl is not symmetric: Dkl(P2||P1) = 0.52898, Dkl(P3||P1) = 0.61371.
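The divergences can be recomputed from the three process descriptions. The sketch below assumes that each pair is read as (first result, second result) and uses base-2 logarithms. Process 2 is encoded twice: once as written, and once under an alternative reading, assumed here, in which die faces 1, 2, and 3 give (heads, heads) instead of (tails, heads).

```python
from math import log2

def kl_divergence(p, q):
    """D_KL(P || Q), in bits; p and q map the same outcomes to probabilities."""
    return sum(p[o] * log2(p[o] / q[o]) for o in p if p[o] > 0)

# Outcomes are (first result, second result); H = heads, T = tails.
p1 = {("H", "H"): 0.5 * 0.6, ("H", "T"): 0.5 * 0.4,
      ("T", "H"): 0.5 * 0.95, ("T", "T"): 0.5 * 0.05}
p3 = {outcome: 0.25 for outcome in p1}

# Process 2 as written: faces 1-3 give (tails, heads).
p2_as_written = {("T", "H"): 3 / 6, ("H", "H"): 1 / 6,
                 ("H", "T"): 1 / 6, ("T", "T"): 1 / 6}
# Assumption for comparison: faces 1-3 give (heads, heads) instead.
p2_alternative = {("H", "H"): 3 / 6, ("T", "H"): 1 / 6,
                  ("H", "T"): 1 / 6, ("T", "T"): 1 / 6}

for name, model in [("P3", p3), ("P2 as written", p2_as_written),
                    ("P2 alternative", p2_alternative)]:
    print(f"{name}: D(P1||Q) = {kl_divergence(p1, model):.5f}, "
          f"D(Q||P1) = {kl_divergence(model, p1):.5f}")
```

With this encoding the P3 figures match the quoted 0.37133 and 0.61371. The quoted P2 figures, 0.48080 and 0.52898, are reproduced by the alternative reading; the as-written reading gives roughly 0.203 and 0.308 instead. Which of Process 2 and Process 3 is the better model of Process 1 therefore depends on which die mapping is intended.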
25
Conditional mutual information I(X;Y|Z) is the expected value, taken over the values of Z, of the mutual information between X and Y conditioned on Z.
26
Interaction information I(X;Y;Z) is the information bound up in a set of variables beyond that which is present in any subset.
I(X;Y;Z) = I(X;Y|Z) - I(X;Y) = I(X;Z|Y) - I(X;Z) = I(Y;Z|X) - I(Y;Z)
Negative interaction information: X is rain, Y is dark, Z is clouds.
Positive interaction information: X is fuel pump blocked, Y is battery dead, Z is car starts.
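Two tiny distributions show both signs under the convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y). They are stand-ins chosen for this sketch, not the rain or car examples themselves: a common effect (Z is the XOR of two independent fair bits) and a common cause (one fair bit copied into X, Y, and Z). Conditional mutual information is computed from entropies as I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).

```python
from collections import defaultdict
from math import log2

def entropy_of(joint, axes):
    """Entropy, in bits, of the marginal of `joint` over the given coordinates."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += p
    return sum(-p * log2(p) for p in marginal.values() if p > 0)

def mutual_info(joint, x, y):
    return entropy_of(joint, [x]) + entropy_of(joint, [y]) - entropy_of(joint, [x, y])

def conditional_mutual_info(joint, x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return (entropy_of(joint, [x, z]) + entropy_of(joint, [y, z])
            - entropy_of(joint, [x, y, z]) - entropy_of(joint, [z]))

def interaction_info(joint, x=0, y=1, z=2):
    return conditional_mutual_info(joint, x, y, z) - mutual_info(joint, x, y)

# Common effect: X and Y are independent fair bits, Z = X xor Y.
xor_model = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# Common cause: a single fair bit copied into X, Y and Z.
copy_model = {(b, b, b): 0.5 for b in (0, 1)}

print(interaction_info(xor_model))   # positive: knowing Z makes X and Y dependent
print(interaction_info(copy_model))  # negative: knowing Z explains away the X-Y link
```

In the XOR model X and Y share nothing until Z is known, giving +1 bit. In the copy model the 1 bit shared by X and Y is fully accounted for by Z, giving -1 bit.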
27
Other fun things you should look into if you're interested: writing on dirty paper, wire-tap channels, algorithmic complexity, Chaitin's constant, Goldbach's conjecture, the Riemann hypothesis, portfolio theory.