Lecture 2: Basic Information Theory TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency (FOI)

Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.

Part 1: Information Theory
Claude Shannon: A Mathematical Theory of Communication, The Bell System Technical Journal, 1948. Be careful! Sometimes referred to as Shannon-Weaver, since the standalone publication has a foreword by Weaver.

Quotes about Shannon
"What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."
"Today, Shannon's insights help shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."
"Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."

[Block diagram:] Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver).
– Source: any source of information.
– Source coder: change to an efficient representation, i.e., data compression.
– Channel coder: change to a representation efficient for transmission, i.e., error control coding.
– Channel: anything transmitting or storing information: a radio link, a cable, a disk, a CD, a piece of paper, …
– Channel decoder: recover from channel distortion.
– Source decoder: uncompress.

Fundamental Entities
[Same block diagram as above, annotated with H, R and C.]
– H: the information content of the source.
– R: the rate from the source coder.
– C: the channel capacity.

Fundamental Theorems
[Same block diagram as above, annotated with H, R and C.]
– Shannon 1: error-free transmission is possible if R ≥ H (the source coding theorem, simplified) and C ≥ R (the channel coding theorem, simplified).
– Shannon 2: source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format. Assumption: arbitrarily long delays.

Part 2: Stochastic Sources
– A source outputs symbols X1, X2, …
– Each symbol takes its value from an alphabet A = (a1, a2, …).
– Model: P(X1, …, XN) is assumed to be known for all combinations.
Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).
Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).

Two Special Cases
1. The memoryless source: each symbol is independent of the previous ones,
   P(X1, X2, …, Xn) = P(X1) · P(X2) · … · P(Xn).
2. The Markov source: each symbol depends on the previous one,
   P(X1, X2, …, Xn) = P(X1) · P(X2 | X1) · P(X3 | X2) · … · P(Xn | Xn-1).

The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
[State diagram with states a, b, c: a ternary source with alphabet A = (a, b, c).]

The Markov Source
Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0

The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1

The Markov Source
– If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities (see the sketch below).
– Markov models can also represent sources with dependencies more than one step back: use a state diagram with several symbols in each state.
Stationary probabilities? Those are the probabilities πi = P(Xk = ai) for any k when Xk-1, Xk-2, … are not given.
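As a small illustration of the first point above, here is a sketch (in Python) that computes the stationary probabilities from a transition matrix for the ternary example. The full matrix is an assumption: only the transitions out of state a (0.3, 0.7, 0) and P(c | b) = 1 appear on the slides above; the row for state c is invented for the example.

    import numpy as np

    # Assumed transition matrix P[k, l] = P(next = l | current = k), states ordered (a, b, c).
    # The 'a' row and P(c | b) = 1 come from the slides; the 'c' row is invented for illustration.
    P = np.array([[0.3, 0.7, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.5]])

    # The stationary distribution pi solves pi = pi P with sum(pi) = 1,
    # i.e. it is a left eigenvector of P for the eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()
    print({s: round(float(v), 4) for s, v in zip("abc", pi)})
    # roughly {'a': 0.3226, 'b': 0.2258, 'c': 0.4516}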

Analysis and Synthesis
– Stochastic models can be used for analysing a source: find a model that represents the real-world source well, and then analyse the model instead of the real world.
– Stochastic models can be used for synthesizing a source: use a random number generator in each step of a Markov model to generate a sequence simulating the source (a minimal sketch follows below).
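A minimal sketch of the synthesis idea, reusing the assumed transition matrix from the previous sketch: a random number generator draws the next state at every step, producing a sequence that simulates the source.

    import numpy as np

    rng = np.random.default_rng(0)
    states = "abc"
    P = np.array([[0.3, 0.7, 0.0],          # assumed transition matrix, as in the previous sketch
                  [0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.5]])

    def synthesize(n, start=0):
        """Generate n symbols from the Markov model by sampling one transition per step."""
        seq, state = [], start
        for _ in range(n):
            seq.append(states[state])
            state = rng.choice(3, p=P[state])   # pick the next state according to row 'state'
        return "".join(seq)

    print(synthesize(30))   # a simulated source output, e.g. something like 'abcabcaabcc...'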

Show plastic slides!

Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
– If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
– If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
– If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
– Intuitively, the amount of information received is the same whether P(heads) = 0.9 or P(heads) = 0.1.

Self Information
So, let's look at it the way Shannon did. Assume a memoryless source with
– alphabet A = (a1, …, an),
– symbol probabilities (p1, …, pn).
How much information do we get when finding out that the next symbol is ai? According to Shannon, the self-information of ai is −log(pi).

Why? Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB. For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied. Logarithms satisfy this! Also, we want the information to increase with decreasing probability, so let's use the negative logarithm.
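A one-line numeric check of the additivity argument, with assumed probabilities pA = 0.5 and pB = 0.25: the information of both events together equals the sum of the individual informations.

    import math

    pA, pB = 0.5, 0.25                          # assumed probabilities of two independent events
    print(-math.log2(pA * pB))                  # 3.0 bits for both events happening
    print(-math.log2(pA) + -math.log2(pB))      # 1.0 + 2.0 = 3.0 bits, added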

Self Information
Example 1: … Example 2: …
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.
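A quick numeric illustration of the three units, assuming self-information −log p: the same probability gives different numbers depending only on the base of the logarithm, and a likely outcome carries little information.

    import math

    p = 0.5
    print(-math.log2(p))      # 1.0 bit
    print(-math.log(p))       # about 0.693 nats
    print(-math.log10(p))     # about 0.301 Hartleys

    p = 0.9                   # a likely outcome
    print(-math.log2(p))      # about 0.152 bits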

Self Information
On average over all the symbols, we get
H(X) = − Σi pi log pi.
H(X) is called the first-order entropy of the source. This can be regarded as the degree of uncertainty about the following symbol.

Entropy
Example: the binary memoryless source (BMS). Let P(1) = p. Then
H(X) = −p log2 p − (1 − p) log2(1 − p),
often denoted Hb(p). The uncertainty (information) is greatest when p = 0.5.
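A small sketch of the binary entropy function Hb(p), confirming that the uncertainty peaks at p = 0.5 and vanishes at p = 0 and p = 1.

    import math

    def Hb(p):
        """Binary entropy in bits; the limit 0*log 0 is taken as 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(p, round(Hb(p), 4))   # maximum of 1 bit at p = 0.5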

Entropy: Three Properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N − H is called the redundancy of the source.
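To make the three properties concrete, a sketch with an assumed four-symbol source: H stays between 0 and log N, reaches log N only for equiprobable symbols, and the gap log N − H is the redundancy.

    import math

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    N = 4
    uniform = [0.25, 0.25, 0.25, 0.25]      # equiprobable symbols
    skewed  = [0.7, 0.15, 0.1, 0.05]        # assumed example distribution

    for probs in (uniform, skewed):
        h = H(probs)
        print(round(h, 4), "bits, redundancy:", round(math.log2(N) - h, 4))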

Part 4: Entropy for Memory Sources
– Assume a block of source symbols (X1, …, Xn) and define the block entropy
Hn = H(X1, …, Xn) = − Σ P(X1, …, Xn) log P(X1, …, Xn),
where the summation is done over all possible combinations of n symbols.
– The entropy for a memory source is defined as
H = lim (n → ∞) Hn / n,
that is, let the block length go towards infinity and divide by n to get the number of bits per symbol.
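A sketch of the definition for the assumed ternary Markov chain used earlier: compute the block entropy Hn by enumerating all n-symbol blocks (starting the chain in its stationary distribution, approximately (0.3226, 0.2258, 0.4516) for that matrix), and watch Hn/n decrease towards the entropy rate as n grows.

    import itertools, math
    import numpy as np

    P = np.array([[0.3, 0.7, 0.0],             # assumed transition matrix, as before
                  [0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.5]])
    pi = np.array([0.3226, 0.2258, 0.4516])    # its stationary distribution (approximate)

    def block_entropy(n):
        """H(X1, ..., Xn) in bits, with X1 drawn from the stationary distribution."""
        h = 0.0
        for block in itertools.product(range(3), repeat=n):
            p = pi[block[0]]
            for a, b in zip(block, block[1:]):
                p *= P[a, b]
            if p > 0:
                h -= p * math.log2(p)
        return float(h)

    for n in (1, 2, 4, 8):
        print(n, round(block_entropy(n) / n, 4))   # decreases towards roughly 0.74 bits/symbol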

Entropy for a Markov Source
The entropy for a state Sk can be expressed as
H(Sk) = − Σl Pkl log Pkl,
where Pkl is the transition probability from state k to state l. Averaging over all states, we get the entropy for the Markov source as
H = Σk πk H(Sk).
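Evaluating that formula for the same assumed chain: the entropy of each state is the entropy of its outgoing transition probabilities, and the source entropy is their average weighted by the stationary probabilities.

    import math
    import numpy as np

    P  = np.array([[0.3, 0.7, 0.0],            # assumed transition matrix, as before
                   [0.0, 0.0, 1.0],
                   [0.5, 0.0, 0.5]])
    pi = np.array([0.3226, 0.2258, 0.4516])    # its stationary distribution (approximate)

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    state_entropies = [float(H(row)) for row in P]    # H(S_k) for k = a, b, c
    H_markov = float(np.dot(pi, state_entropies))     # sum over k of pi_k * H(S_k)
    print([round(h, 4) for h in state_entropies])     # [0.8813, 0.0, 1.0]
    print(round(H_markov, 4))                         # about 0.74 bits/symbol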

The Run-length Source
– Certain sources generate long runs or bursts of equal symbols.
– Example: a two-state source with symbols A and B.
– Probability of a burst of length r: P(r) = (1 − ρ)^(r−1) · ρ, where ρ is the probability that a run ends at each symbol.
– Entropy: H_R = − Σ (r = 1 to ∞) P(r) log P(r).
– If the average run length is r̄, then H_R / r̄ = H_M.
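A numeric sanity check of the run-length relation, assuming a run ends at each symbol independently with probability ρ = 0.1 (so the run lengths are geometric with mean 1/ρ): H_R divided by the average run length equals the per-symbol binary entropy Hb(ρ).

    import math

    rho = 0.1                                   # assumed probability that a run ends at each symbol

    def Hb(p):
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # Run-length distribution P(r) = (1 - rho)^(r - 1) * rho, truncated at a large r.
    H_R = 0.0
    for r in range(1, 1000):
        pr = (1 - rho) ** (r - 1) * rho
        H_R -= pr * math.log2(pr)

    avg_run = 1 / rho                           # mean run length of the geometric distribution
    print(round(H_R, 4))                        # entropy per run, about 4.69 bits
    print(round(H_R / avg_run, 4), round(Hb(rho), 4))   # per symbol: the two values agree (0.469)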

Part 5: The Source Coding Theorem
The entropy is the smallest number of bits allowing error-free representation of the source. Why is this? Let's take a look at typical sequences!

Typical Sequences
– Assume a long sequence from a binary memoryless source with P(1) = p.
– Among n bits, there will be approximately w = n · p ones.
– Thus, there are M = (n over w) such typical sequences!
– Only these sequences are interesting; all other sequences appear with smaller probability the larger n is.

How many are the typical sequences?
Using Stirling's approximation, (1/n) log M = (1/n) log (n over w) ≈ Hb(p) = H(X) bits/symbol. Enumeration thus needs log M bits, i.e., H(X) bits per symbol!

How many bits do we need? Thus, we need H(X) bits per symbol to code any typical sequence!
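A numeric sketch of the counting argument for an assumed binary memoryless source with P(1) = 0.2: count the typical sequences M = (n over w) with w ≈ n·p ones, and watch (1/n) log2 M approach Hb(0.2) ≈ 0.7219 as n grows.

    import math

    p = 0.2                                     # assumed P(1) of the binary memoryless source

    def Hb(p):
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for n in (10, 100, 1000, 10000):
        w = round(n * p)                        # approximately n*p ones in a typical sequence
        M = math.comb(n, w)                     # number of such typical sequences
        print(n, round(math.log2(M) / n, 4))    # approaches Hb(0.2) = 0.7219 bits/symbol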

The Source Coding Theorem
Does tell us
– that we can represent the output from a source X using H(X) bits/symbol,
– that we cannot do better.
Does not tell us
– how to do it.

Summary
– The mathematical model of communication: source, source coder, channel coder, channel, …; rate, entropy, channel capacity.
– Information-theoretic entities: information, self-information, uncertainty, entropy.
– Sources: BMS, Markov, RL.
– The Source Coding Theorem.