Presentation transcript:

ENTROPY. Entropy measures the uncertainty in a random experiment. Let X be a discrete random variable with range S_X = {1, 2, 3, ..., K} and pmf p_k = P(X = k). The uncertainty of outcome k is -log p_k: if p_k = 1 the uncertainty is 0, and as p_k → 0 the uncertainty → ∞.

Entropy of X ≡ expected uncertainty of the outcomes: H(X) = -Σ_k p_k log p_k = E[-log p(X)]. If log base 2 is used the units are bits; with ln, the units are nats. By convention -0·log(0) = 0, i.e., outcomes with P(X = x) = 0 contribute nothing; likewise a certain outcome with P(X = x) = 1 contributes -1·log(1) = 0.
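A minimal sketch of this definition in Python (the function name and the example pmfs are my own choices):

```python
import math

def entropy(pmf, base=2):
    """H(X) = -sum_k p_k log(p_k); terms with p_k = 0 contribute nothing."""
    return sum(-p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))           # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))           # 0.0: a certain outcome, using the 0*log 0 = 0 convention
print(entropy([0.5, 0.5], math.e))   # ~0.693 nats: the same uncertainty in natural units
```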

For a binary random variable X ∈ {0, 1}, let p ≡ P(X = 1), so H(X) = -p log p - (1-p) log(1-p). H(X) is maximum when p = 0.5 ↔ 0 and 1 are equally probable ↔ maximum uncertainty. For p = 1 or p = 0 there is no uncertainty → H(X) = 0. (Figure: plot of the binary entropy function H(p) versus p.)
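A quick check of these claims (the sample values of p are illustrative choices of mine):

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with 0 log 0 = 0."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.3f} bits")
# H peaks at 1 bit for p = 0.5 and falls to 0 at p = 0 or p = 1.
```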

In general, H(X) for 2^n equally probable outcomes is n bits; e.g., an n-bit equiprobable number carries n bits of entropy. As each bit is specified, H(X) decreases by 1 bit; when all n bits are specified, H(X) = 0.

Relative Entropy: Let p = (p_1, p_2, ..., p_K) with X ~ p and q = (q_1, q_2, ..., q_K) with Y ~ q be two pmfs over the same K outcomes. The relative entropy of q with respect to p is H(p; q) ≡ Σ_k p_k log(p_k / q_k).

H(p; q) is often used as a measure of distance between probability distributions and is called the Kullback–Leibler distance (it is not a true metric, since it is not symmetric). It satisfies H(p; q) ≥ 0, with equality iff p = q. To prove these assertions, use the inequality log x ≤ x - 1, with equality iff x = 1.
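A small numerical illustration of these assertions (the pmfs p and q are arbitrary choices of mine):

```python
import math

def relative_entropy(p, q):
    """H(p; q) = sum_k p_k log2(p_k / q_k); assumes q_k > 0 wherever p_k > 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(relative_entropy(p, q))  # > 0: the pmfs differ
print(relative_entropy(q, p))  # also > 0 but a different value: H(p;q) is not symmetric
print(relative_entropy(p, p))  # 0.0: equality iff p = q
```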

Thus 0 ≤ H(X) ≤ log K: H(X) = 0 when there is only one possible outcome, and H(X) = log K when all K outcomes are equally probable. The equally-probable case is called the maximum entropy (ME) or minimum relative entropy (MRE) situation.

Differential Entropy: For a continuous random variable, every exact value has probability zero, so all outcomes are maximally uncertain and entropy cannot be defined as for discrete random variables. Instead, the differential entropy is used: h(X) = -∫ f_X(x) log f_X(x) dx, where the integral extends only over the region where f_X(x) > 0.
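As a small illustration of the definition (a uniform density on [0, a]; the values of a are my own choices):

```python
import math

def uniform_differential_entropy(a):
    """h(X) = -∫_0^a (1/a) log2(1/a) dx = log2(a); the integral runs only where f_X > 0."""
    return math.log2(a)

print(uniform_differential_entropy(4.0))  #  2.0 bits
print(uniform_differential_entropy(0.5))  # -1.0: unlike H(X), differential entropy can be negative
```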

Information Theory. Let X be a random variable with S_X = {x_1, ..., x_k}. Information about the outcomes of X is to be sent over a channel. How can the outcomes {x_1, ..., x_k} be coded so that all the information is carried with maximal efficiency? (Diagram: Source X → Channel → Receiver/Destination.)

Best code → minimum expected codeword length. The code must be instantaneously decodable, i.e., no codeword is a prefix of any other → construct a code tree. E.g., for S = {x_1, x_2, x_3, x_4, x_5}: x_1 = 00, x_2 = 01, x_3 = 10, x_4 = 110, x_5 = 111. (Figure: the corresponding binary code tree with x_1, ..., x_5 at the leaves.)
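A sketch of why such a code is instantaneously decodable, using the example code above (the bit string is an arbitrary choice of mine):

```python
# Decode bit by bit: because no codeword is a prefix of another, a symbol can be
# emitted the instant its codeword is completed, without looking ahead.
code = {"x1": "00", "x2": "01", "x3": "10", "x4": "110", "x5": "111"}
decode_map = {cw: sym for sym, cw in code.items()}

def decode(bits):
    symbols, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in decode_map:
            symbols.append(decode_map[buffer])
            buffer = ""
    return symbols

print(decode("00110111"))  # ['x1', 'x4', 'x5']
```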

If l_k = length of the codeword for x_k, the expected codeword length is L = Σ_k p_k l_k. For instantaneous binary codes the Kraft inequality holds: Σ_k 2^(-l_k) ≤ 1; for a D-ary code, Σ_k D^(-l_k) ≤ 1.
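A quick check of these formulas for the example code above (the pmf is a hypothetical choice of mine):

```python
lengths = [2, 2, 2, 3, 3]                 # codeword lengths of x1..x5 above
print(sum(2 ** -l for l in lengths))      # 1.0 <= 1: the Kraft inequality holds

pmf = [0.25, 0.25, 0.25, 0.125, 0.125]    # hypothetical probabilities for x1..x5
print(sum(p * l for p, l in zip(pmf, lengths)))  # expected codeword length: 2.25 bits
```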

I.e.: 1. The minimum average codeword length is the entropy of X: L ≥ H(X), with equality achievable in the best case. 2. The most efficient code is obtained when length(x_k) = -log p_k, i.e., more probable symbols get shorter codewords. From 1, the bits of information in X = entropy of X. From 2, a maximally efficient code (L = H(X)) can always be found when all p_k are powers of 2; otherwise the best achievable average length satisfies H(X) ≤ L < H(X) + 1. One such optimal code is the Huffman code, constructed by a Huffman tree.

E.g., let S_X = {A, B, C, D, E} with pmf {0.1, 0.3, 0.25, 0.2, 0.15}. At every step, combine the two nodes with minimal probability sum: 1) A + E = 0.25; 2) D + C = 0.45; 3) B + (A+E) = 0.55; 4) (D+C) + (B+A+E) = 1. Reading the branch labels off the resulting tree gives A = 000, E = 001, B = 01, C = 10, D = 11. (Figure: the four merging steps and the final Huffman tree.)
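A sketch of the Huffman construction in Python (using heapq; tie-breaking may assign different, but equally optimal, bit patterns than the slide):

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Repeatedly merge the two least-probable nodes, prepending 0/1 to their codewords."""
    tie = count()                       # tie-breaker so the heap never compares the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

pmf = {"A": 0.1, "B": 0.3, "C": 0.25, "D": 0.2, "E": 0.15}
code = huffman_code(pmf)
print(code)                                     # codeword lengths: A, E -> 3; B, C, D -> 2
print(sum(pmf[s] * len(code[s]) for s in pmf))  # average length 2.25 bits, vs. H(X) ≈ 2.23
```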

Proof sketch (Kraft inequality): consider any binary code tree with each codeword at a leaf. Let l_max = length of the longest codeword, so its leaf is at level l_max (root = level 0). If all leaves were at level l_max, the number of leaves would be 2^l_max. A codeword leaf at level l_k < l_max eliminates 2^(l_max - l_k) of those leaves from the full tree. Summing over codewords, Σ_k 2^(l_max - l_k) ≤ 2^l_max, i.e., Σ_k 2^(-l_k) ≤ 1.

(Remember: each leaf of the full tree is eliminated by exactly one codeword, so the eliminated sets do not overlap.) (Figure: example tree with codewords A, B, C, D.)

In general, Σ_k 2^(-l_k) = 1 if the code tree is complete; if not, the sum is < 1. (Figure: example code tree with codewords A, B, C.)

Maximum Entropy Method. Given a random variable X with S_X = {x_1, ..., x_k}, an unknown pmf p_k = p(x_k), and the constraint E(g(X)) = r ... (1), estimate p_k. Hypothesis: the pmf of the form p_k = C e^(-λ g(x_k)), with C and λ chosen to satisfy (1), is the maximum entropy pmf. Proof: suppose a pmf q ≠ p also satisfies (1); then, since H(q; p) ≥ 0 and both pmfs satisfy the same constraint, H(q) ≤ H(p).

In general, given n constraints E(g_1(X)) = r_1 ... (1-1), ..., E(g_n(X)) = r_n ... (1-n), the ME pmf has the form p_k = C exp(-λ_1 g_1(x_k) - ... - λ_n g_n(x_k)), with C and λ_1, ..., λ_n chosen to satisfy the constraints.
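A sketch of this result for a single constraint with g(x) = x, using the classic "loaded die" with mean 4.5 (the solver and the numbers are my own illustration):

```python
import math

def max_entropy_pmf(values, target_mean, lo=-50.0, hi=50.0, iters=100):
    """Find the ME pmf p_k = C * exp(-lam * x_k) whose mean equals target_mean.
    lam is found by bisection (the mean is decreasing in lam); C normalizes the pmf."""
    def mean_for(lam):
        w = [math.exp(-lam * v) for v in values]
        return sum(v * wi for v, wi in zip(values, w)) / sum(w)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid                     # mean too large -> need a larger lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * v) for v in values]
    return [wi / sum(w) for wi in w]

faces = list(range(1, 7))
p = max_entropy_pmf(faces, 4.5)
print([round(pk, 3) for pk in p])                        # weights grow geometrically toward face 6
print(round(sum(k * pk for k, pk in zip(faces, p)), 3))  # 4.5, the imposed constraint
```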