DCSP-8: Minimal length coding I Jianfeng Feng Department of Computer Science Warwick Univ., UK

Information sources. Analog signals we have discussed in the previous weeks (ADC, sampling); many daily-life events are what we are going to discuss now.

X = (lie on bed at 11 am today, in university at 11 am today, attend a lecture at 11 am today), with p = (1/2, 1/4, 1/4); the entropy is H(X) = (1/2)log2(2) + (1/4)log2(4) + (1/4)log2(4) = 1.5 bits/symbol. The unit of entropy is bits per symbol.
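A minimal sketch, not from the original slides, of how such an entropy value can be checked numerically (the function name entropy_bits is an illustrative choice):

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The three activities at 11 am, with p = (1/2, 1/4, 1/4)
print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5 bits/symbol
```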

Information source coding. It seems intuitively reasonable that an information source of entropy H needs on average only H binary bits to represent each symbol. Indeed, the equiprobable BSS generates on average 1 bit of information per symbol bit. However, consider the prime minister example again.

Suppose the probability of a naked run is 0.1 (N) and that of the office is 0.9 (O). We have already noted that this source has an entropy of 0.47 bits/symbol. Suppose we identify naked run with 1 and office with 0. What is the relationship between entropy and average length?
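For reference, the 0.47 figure follows directly from the entropy formula: H = -(0.1*log2(0.1) + 0.9*log2(0.9)) ≈ 0.33 + 0.14 ≈ 0.47 bits/symbol.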

Shannon's first theorem. An instantaneous code can be found that encodes a source of entropy H(X) with an average number of bits per symbol B_s such that B_s >= H(X).
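(A standard refinement of this statement, not spelled out on the slide, is that the bound can also be approached from above: an instantaneous code always exists with H(X) <= B_s < H(X) + 1.)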

The replacement of the symbols naked run/office with a binary representation is termed source coding. In any coding operation we replace the symbol with a codeword. The purpose of source coding is to reduce the number of bits required to convey the information provided by the information source: minimize the average length of the codewords.
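Concretely, if codeword i has length l_i and occurs with probability p_i, the quantity to be minimized is the average codeword length L = Σ_i p_i * l_i.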

Central to source coding is the use of sequences. By this, we mean that codewords are not simply associated with a single outcome, but with a sequence of outcomes.

To see why this is useful, let us return to the problem of the prime minister. Suppose we group the outcomes in blocks of three, according to their probability, and assign binary codewords to these grouped outcomes.

Table 1 shows such a code, and the probability of each codeword occurring. It is easy to compute that this code will on average use 1.2 bits per three-symbol sequence.

The entropy per sequence of three symbols is
-(0.729*log2(0.729) + 3*0.081*log2(0.081) + 3*0.009*log2(0.009) + 0.001*log2(0.001)) ≈ 1.4 bits.
The average length of the coding is given by
0.729*1 + 0.081*1 + 2*0.081*2 + 2*0.009*2 + 0.009*3 + 0.001*3 = 1.2 bits per sequence.
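A small sketch (not part of the original slides) that reproduces both numbers; the codeword lengths listed below are inferred from the average-length sum above, since the Table 1 codewords themselves are not reproduced in the transcript:

```python
import math

# Probabilities of the eight three-symbol sequences from a source with
# P(office) = 0.9 and P(naked run) = 0.1
probs = [0.729, 0.081, 0.081, 0.081, 0.009, 0.009, 0.009, 0.001]
# Codeword lengths of the Table 1 code, as inferred from the sum above
lengths = [1, 1, 2, 2, 2, 2, 3, 3]

entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(round(entropy, 2), round(avg_len, 2))  # ~1.41 bits and 1.2 bits per sequence
```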

This example shows how using sequences permits us to decrease the average number of bits per symbol. Moreover, without difficulty, we have found a code that has an average bit usage less than the source entropy.

However, there is a difficulty with the code in Table 1. Before a codeword can be decoded, it must be parsed. Parsing describes the activity of breaking the message string into its component codewords.

After parsing, each codeword can be decoded into its symbol sequence. An instantaneously parsable code is one that can be parsed as soon as the last bit of a codeword is received.

An instantaneous code must satisfy the prefix condition: no codeword may be a prefix of any other codeword. This condition is not satisfied by the code in Table 1.
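The prefix condition is easy to check mechanically; here is a minimal sketch (the codewords used are illustrative only, since the actual Table 1 and Table 2 codewords are not reproduced in the transcript):

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (instantaneous code)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# Illustrative examples, not the actual Table 1 / Table 2 codes
print(is_prefix_free(["0", "10", "110", "111"]))  # True: a prefix code
print(is_prefix_free(["0", "01", "10"]))          # False: "0" is a prefix of "01"
```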

Huffman coding. The code in Table 2, however, is an instantaneously parsable code. It satisfies the prefix condition.

Its average length is 0.729*1 + 3*0.081*3 + 3*0.009*5 + 0.001*5 = 1.598 ≈ 1.6 bits per sequence.

Decoding Why do we require a code with the shortest average length?

The derivation of the Huffman code tree is shown in Fig. , and the tree itself is shown in Fig. . In both these figures, the letters A to H have been used in place of the sequences in Table 2 to make them easier to read.
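For completeness, a compact sketch of the Huffman construction itself (not from the slides); the labels A to H mirror the figures, and the probabilities are those of the eight three-symbol sequences:

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# Symbols A..H stand for the eight three-symbol sequences, as in the figures
probs = {"A": 0.729, "B": 0.081, "C": 0.081, "D": 0.081,
         "E": 0.009, "F": 0.009, "G": 0.009, "H": 0.001}
code = huffman_code(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(round(avg, 3))  # ~1.598 bits per sequence, matching the 1.6 figure above
```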

Like many theorems of information theory, this theorem tells us nothing about how to find the code. However, it is a useful result.

For example, the code in Table 2 uses 1.6 bits per sequence, which is only about 0.2 bits per sequence more than the theorem tells us is the best we can do (the entropy of roughly 1.4 bits per sequence). We might conclude that there is little point in expending further effort to find a code that comes closer to the bound given by the inequality above.
An example