1 Chapter 5 A Measure of Information

2 Outline
5.1 Axioms for the uncertainty measure
5.2 Two Interpretations of the uncertainty function
5.3 Properties of the uncertainty function
5.4 Entropy and Coding
5.5 Shannon-Fano Coding

3 5.1 Axioms for the uncertainty measure
x: a discrete random variable taking the values x_1, x_2, ..., x_M with probabilities p_1, p_2, ..., p_M.
h(p): the uncertainty of an event with probability p; h(p_i) is the uncertainty of the event {x = x_i}.
The average uncertainty of x: H(p_1, ..., p_M) = Σ_i p_i h(p_i).
If p_1 = p_2 = ... = p_M = 1/M, we write H(1/M, ..., 1/M) = f(M).

4 Axiom 1: f(M) is a monotonically increasing function of M; that is, M < M' implies f(M) < f(M'). For example, f(2) < f(6).
Axiom 2: Let X take the values (x_1, ..., x_M) and Y the values (y_1, ..., y_L), each equally likely, with X and Y independent. The joint experiment (X, Y) then has M·L equally likely outcomes, and f(M·L) = f(M) + f(L).

5 Axiom 3 (Group Axiom): x takes the values x_1, x_2, ..., x_r, x_{r+1}, ..., x_M. Construct a compound experiment: first choose between group A = {x_1, ..., x_r} and group B = {x_{r+1}, ..., x_M}, then choose the particular value within the chosen group.
[Tree diagram: x splits into A and B; A branches into x_1, ..., x_r and B branches into x_{r+1}, ..., x_M.]

6 With the two-stage experiment above, the group axiom requires
H(p_1, ..., p_M) = H(p_A, p_B) + p_A H(p_1/p_A, ..., p_r/p_A) + p_B H(p_{r+1}/p_B, ..., p_M/p_B),
where p_A = p_1 + ... + p_r and p_B = p_{r+1} + ... + p_M.

7 Axiom 5: H(p, 1-p) is a continuous function of p, i.e., a small change in p corresponds to a small change in uncertainty.
We can use the four axioms above to find the H function.
Thm 5.1: The only function satisfying the four given axioms is H(p_1, ..., p_M) = -C Σ_i p_i log p_i, where C > 0 and the logarithm base is > 1.

8 For example, with C = 1 and base 2: H(p, 1-p) = -p log_2 p - (1-p) log_2 (1-p).
[Plot of H(p, 1-p) versus p on [0, 1]: the curve rises from 0 at p = 0 to its maximum of 1 at p = 1/2 and falls back to 0 at p = 1.]
Coin {tail, head}: p = 1/2 gives maximum uncertainty (1 bit); p = 0 or p = 1 gives minimum uncertainty (0).
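To make the formula concrete, here is a small Python sketch (not from the slides) that evaluates H with C = 1 and base 2 and checks the coin example:

```python
import math

def entropy(probs, base=2):
    """Uncertainty H(p_1, ..., p_M) = sum_i p_i * log(1/p_i), taking 0 * log(1/0) = 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Binary case H(p, 1-p): maximum at p = 1/2, minimum at p = 0 or p = 1.
print(entropy([0.5, 0.5]))   # 1.0   -> maximum uncertainty (fair coin)
print(entropy([1.0, 0.0]))   # 0.0   -> minimum uncertainty (outcome certain)
print(entropy([0.9, 0.1]))   # ~0.469, between the two extremes
```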

9 5.2 Two Interpretations of the uncertainty function
(1) H(p_1, ..., p_M) may be interpreted as the expectation of a random variable W = w(x), where w(x_i) = log(1/p_i); that is, H(p_1, ..., p_M) = E[W] = Σ_i p_i log(1/p_i).
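A brief sketch of interpretation (1) in Python, assuming w(x_i) = log_2(1/p_i) and using the five-outcome distribution from the example on the next slide: sampling x repeatedly and averaging W approaches H.

```python
import math
import random

p = [0.3, 0.2, 0.2, 0.15, 0.15]           # distribution of x over x_1 ... x_5
w = [math.log2(1.0 / pi) for pi in p]     # W = w(x_i) = log2(1/p_i)

# H as the exact expectation of W ...
H = sum(pi * wi for pi, wi in zip(p, w))

# ... and as a sample average of W over many independent draws of x.
random.seed(0)
samples = random.choices(range(len(p)), weights=p, k=100_000)
H_estimate = sum(w[i] for i in samples) / len(samples)

print(H, H_estimate)   # both close to 2.27
```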

10 (2) H(p_1, ..., p_M) may be interpreted as the minimum average number of 'yes'/'no' questions required to specify the value of x.
For example, H(x) = H(0.3, 0.2, 0.2, 0.15, 0.15) ≈ 2.27.
[Question tree: first ask "Is x = x_1 or x_2?". If yes, ask "Is x = x_1?" to separate x_1 from x_2. If no, ask "Is x = x_3?"; if the answer is again no, ask "Is x = x_4?" to separate x_4 from x_5.]

11
Symbol | # of questions | Probability
x_1 | 2 | 0.3
x_2 | 2 | 0.2
x_3 | 2 | 0.2
x_4 | 3 | 0.15
x_5 | 3 | 0.15
Average # of questions = 2·0.7 + 3·0.3 = 2.3 > 2.27.
H.W.: X = {x_1, x_2} with p(x_1) = 0.7 and p(x_2) = 0.3. How many questions (on average) are required to specify the outcome of a joint experiment involving 2 independent observations of x?
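A quick numerical check of the average above (a Python sketch, not from the slides); the question counts come from the tree on the previous slide:

```python
import math

probs     = [0.30, 0.20, 0.20, 0.15, 0.15]  # p(x_1), ..., p(x_5)
questions = [2, 2, 2, 3, 3]                 # questions the tree needs for each x_i

avg_questions = sum(p * q for p, q in zip(probs, questions))
entropy       = sum(p * math.log2(1.0 / p) for p in probs)

print(avg_questions)  # 2.3
print(entropy)        # ~2.27, so the questioning scheme uses slightly more than H(x)
```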

12 5.3 Properties of the uncertainty function
Lemma 5.2: Let p_1, ..., p_M and q_1, ..., q_M be arbitrary positive numbers with Σ_i p_i = Σ_i q_i = 1. Then Σ_i p_i log(1/p_i) ≤ Σ_i p_i log(1/q_i), with equality iff p_i = q_i for all i.
[Plot: the curves y = x - 1 and y = ln x, illustrating ln x ≤ x - 1 with equality only at x = 1.]

13
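A sketch of the standard proof of Lemma 5.2, using the inequality ln x ≤ x - 1 quoted on the previous slide (a reconstruction assumed to match the slide's argument):

```latex
% Apply ln x <= x - 1 with x = q_i / p_i:
\[
  \sum_{i=1}^{M} p_i \ln\frac{q_i}{p_i}
  \;\le\; \sum_{i=1}^{M} p_i\left(\frac{q_i}{p_i}-1\right)
  \;=\; \sum_{i=1}^{M} q_i \;-\; \sum_{i=1}^{M} p_i \;=\; 1 - 1 \;=\; 0 .
\]
% Rearranging gives \sum_i p_i \ln(1/p_i) <= \sum_i p_i \ln(1/q_i);
% dividing by ln 2 (or ln D) restates the lemma in any logarithm base.
% Equality forces q_i / p_i = 1 for every i, because ln x = x - 1 only at x = 1.
```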

14 Thm 5.3: H(p_1, ..., p_M) ≤ log M, with equality iff p_i = 1/M for all i. (Proof: apply Lemma 5.2 with q_i = 1/M.)

15 5.4 Entropy and Coding (Noiseless Coding Theorem)
Source X takes the values x_1, x_2, ..., x_M with probabilities p_1, p_2, ..., p_M.
Codewords: w_1, w_2, ..., w_M with lengths n_1, n_2, ..., n_M.
Code alphabet: {a_1, a_2, ..., a_D}; e.g., D = 2 gives {0, 1}.
Goal: minimize the average codeword length n̄ = Σ_i p_i n_i.

16 Thm (Noiseless Coding Thm): If n̄ = Σ_i p_i n_i is the average codeword length of a uniquely decodable code for X, then n̄ ≥ H_D(X), with equality iff p_i = D^(-n_i) for i = 1, 2, ..., M.
Note: H_D(X) = Σ_i p_i log_D(1/p_i) is the uncertainty of X computed using base-D logarithms.

17 pf:
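A sketch of the standard argument via the Kraft inequality and Lemma 5.2 (assumed here; the slide's own derivation is not reproduced in the transcript):

```latex
% Any uniquely decodable code satisfies the Kraft inequality: \sum_i D^{-n_i} <= 1.
% Put q_i = D^{-n_i} / \sum_j D^{-n_j}, so that \sum_i q_i = 1, and apply Lemma 5.2:
% (requires \usepackage{amsmath} for align*)
\begin{align*}
H_D(X) = \sum_{i} p_i \log_D \frac{1}{p_i}
  &\le \sum_{i} p_i \log_D \frac{1}{q_i} \\
  &= \sum_{i} p_i\, n_i + \log_D\!\Bigl(\sum_{j} D^{-n_j}\Bigr)
   \;\le\; \bar{n},
\end{align*}
% where the last step uses \sum_j D^{-n_j} <= 1, so its logarithm is <= 0.
% Equality requires p_i = q_i and \sum_j D^{-n_j} = 1, i.e. p_i = D^{-n_i} for all i.
```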

18 A code is called "absolutely optimal" if it achieves the lower bound of the noiseless coding theorem, i.e., n̄ = H_D(X).
Ex.
X | Prob. | Codeword
x_1 | 1/2 | 0
x_2 | 1/4 | 10
x_3 | 1/8 | 110
x_4 | 1/8 | 111
H(x) = 7/4 = n̄.
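A small check of the example in Python (not part of the slides):

```python
import math
from fractions import Fraction

probs   = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
lengths = [1, 2, 3, 3]                      # lengths of 0, 10, 110, 111

H    = sum(p * math.log2(1 / p) for p in probs)           # entropy in bits
nbar = sum(p * n for p, n in zip(probs, lengths))          # average codeword length

print(H, nbar)                              # both equal 7/4 = 1.75
print(all(p == Fraction(1, 2) ** n for p, n in zip(probs, lengths)))  # True: p_i = D^(-n_i)
```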

19 5.5 Shannon-Fano Coding
Select the integer n_i such that log_D(1/p_i) ≤ n_i < log_D(1/p_i) + 1, i.e., n_i = ⌈log_D(1/p_i)⌉.
These lengths satisfy the Kraft inequality Σ_i D^(-n_i) ≤ Σ_i p_i = 1, so an instantaneous code can be constructed with the lengths n_1, n_2, ..., n_M obtained from Shannon-Fano coding.
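A minimal sketch of the length-selection rule in Python (assuming the D-ary rule n_i = ⌈log_D(1/p_i)⌉; not taken from the slides):

```python
import math

def shannon_fano_lengths(probs, D=2):
    """Shannon-Fano rule: n_i is the smallest integer with n_i >= log_D(1/p_i)."""
    return [math.ceil(math.log(1.0 / p, D)) for p in probs]

probs   = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_fano_lengths(probs)

kraft_sum = sum(2.0 ** -n for n in lengths)
print(lengths)          # [2, 2, 3, 4]
print(kraft_sum <= 1)   # True: an instantaneous (prefix) code with these lengths exists
```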

20 Thm: Given a random variable X with uncertainty H_D(X), the average codeword length n̄ of the Shannon-Fano code satisfies H_D(X) ≤ n̄ < H_D(X) + 1.
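The bound follows directly from the length-selection rule; a short derivation (a reconstruction consistent with the previous slide):

```latex
% Multiply the defining inequality  log_D(1/p_i) <= n_i < log_D(1/p_i) + 1  by p_i and sum over i:
\[
  \sum_i p_i \log_D \frac{1}{p_i}
  \;\le\; \sum_i p_i n_i
  \;<\; \sum_i p_i \log_D \frac{1}{p_i} + \sum_i p_i ,
\]
% i.e.  H_D(X) <= \bar{n} < H_D(X) + 1.
```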

21 In fact, we can always approach the lower bound as closely as desired if we are allowed to use "block coding": take a series of s observations of X, let Y = (x_1, x_2, ..., x_s), and assign a codeword to each value of Y. Block coding decreases the average codeword length per value of X; see the derivation below.
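To see why block coding helps, apply the previous theorem to the block Y = (x_1, ..., x_s) of s independent observations of X (a sketch following the standard argument; the slide's own derivation is not reproduced here):

```latex
% For s independent, identically distributed observations the uncertainties add,
% so H_D(Y) = s * H_D(X). Shannon-Fano coding of Y gives an average length \bar{n}_s with
\[
  s\,H_D(X) \;=\; H_D(Y) \;\le\; \bar{n}_s \;<\; H_D(Y) + 1 \;=\; s\,H_D(X) + 1 .
\]
% Dividing by s, the average length per value of X satisfies
\[
  H_D(X) \;\le\; \frac{\bar{n}_s}{s} \;<\; H_D(X) + \frac{1}{s},
\]
% which approaches the lower bound H_D(X) as the block length s grows.
```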

22 Ex. Encoding X one symbol at a time:
X | P_i | Codeword
x_1 | 0.7 |
x_2 | 0.3 |
But H(X) = H(p), with p = 0.3 or p = 0.7. Instead, build a look-up table for pairs:
Y = (x_1, x_2) | P_i | Codeword
x_1 x_1 | 0.49 |
x_1 x_2 | 0.21 |
x_2 x_1 | 0.21 |
x_2 x_2 | 0.09 |
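The pair codewords themselves are not shown above; as one possible assignment (an assumption for illustration, not necessarily the slides' table), the prefix code 0, 10, 110, 111 for the pairs gives:

```python
import math

p1, p2 = 0.7, 0.3
H = p1 * math.log2(1 / p1) + p2 * math.log2(1 / p2)   # H(X) ~ 0.881 bits

# Coding one symbol at a time with codewords 0 and 1 costs exactly 1 bit per symbol.
single_rate = 1.0

# Hypothetical prefix code for pairs: x1x1 -> 0, x1x2 -> 10, x2x1 -> 110, x2x2 -> 111.
pair_probs   = [p1 * p1, p1 * p2, p2 * p1, p2 * p2]    # 0.49, 0.21, 0.21, 0.09
pair_lengths = [1, 2, 3, 3]
pair_rate    = sum(p * n for p, n in zip(pair_probs, pair_lengths)) / 2  # bits per value of X

print(H, single_rate, pair_rate)   # ~0.881  1.0  ~0.905: pairs get closer to H(X)
```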

23

24 How do we find the actual code symbols?
– We simply assign them in order.
– By S-F coding we first obtain the lengths n_1, n_2, ..., n_M.
– We then assign codewords of those lengths in order; see the sketch below.
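One concrete way to "assign them in order" is the canonical assignment sketched below in Python (an illustration; the slides' exact assignment rule is assumed, not quoted):

```python
import math

def canonical_codewords(lengths):
    """Assign binary codewords in counting order, shortest lengths first (canonical prefix code).
    Works whenever the lengths satisfy the Kraft inequality."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    code = 0
    prev_len = lengths[order[0]]
    for idx in order:
        code <<= (lengths[idx] - prev_len)      # pad with zeros up to the new length
        codewords[idx] = format(code, "0{}b".format(lengths[idx]))
        prev_len = lengths[idx]
        code += 1                               # next codeword in counting order
    return codewords

probs   = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(math.log2(1.0 / p)) for p in probs]   # Shannon-Fano lengths: 2, 2, 3, 4
print(list(zip(lengths, canonical_codewords(lengths))))
# [(2, '00'), (2, '01'), (3, '100'), (4, '1010')] -- a prefix code with the required lengths
```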

25 How bad is Shannon-Fano Coding?
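As a small numerical illustration of the usual answer (a sketch, not from the slides): the Shannon-Fano average length always stays below H_D(X) + 1, as the theorem guarantees, but it can be noticeably longer than what an optimal code achieves.

```python
import math

def sf_lengths(probs):
    return [math.ceil(math.log2(1.0 / p)) for p in probs]

probs = [0.99, 0.01]                       # a very skewed binary source
H     = sum(p * math.log2(1.0 / p) for p in probs)
sf    = sum(p * n for p, n in zip(probs, sf_lengths(probs)))
best  = 1.0                                # any binary single-symbol code needs >= 1 bit/symbol

print(H)     # ~0.081 bits
print(sf)    # 1.06 bits: within H + 1, as guaranteed ...
print(best)  # ... but an optimal code (codewords 0 and 1) already achieves 1.0
```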