DCSP-8: Minimal length coding I Jianfeng Feng Department of Computer Science Warwick Univ., UK


X = (lie in bed at 10 am today, be in the university at 10 am today, attend a lecture at 10 am today), with p = (1/2, 1/4, 1/4). The entropy is H(X) = (1/2)log_2(2) + (1/4)log_2(4) + (1/4)log_2(4) = 1.5 bits. The unit of entropy is bits per symbol.
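
To make the numbers concrete, here is a minimal Python sketch (not part of the original slides); the helper name entropy is purely illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum p_i * log2(p_i)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# The three outcomes at 10 am with p = (1/2, 1/4, 1/4)
print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits/symbol
```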

Information source coding. It seems intuitively reasonable that an information source of entropy H needs on average only H binary bits to represent each symbol. Indeed, the equiprobable BSS generates on average 1 bit of information per symbol (binary digit). However, consider the prime minister example again.

Suppose the probability of a naked run (N) is 0.1 and that of going to the office (O) is 0.9. We have already noted that this source has an entropy of about 0.47 bits/symbol. Suppose we identify naked run with 1 and office with 0.
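
As a quick check, the 0.47 bits/symbol figure can be reproduced with a short Python calculation (added here for illustration).

```python
from math import log2

p_naked, p_office = 0.1, 0.9
H = -(p_naked * log2(p_naked) + p_office * log2(p_office))
print(round(H, 3))   # 0.469, i.e. about 0.47 bits/symbol
```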

This representation uses 1 binary bit per symbol, which is more binary bits per symbol than the entropy suggests is necessary.

Shannon's first theorem: an instantaneous code can be found that encodes a source of entropy H(X) with an average number of bits per symbol Bs such that Bs >= H(X). In other words, no uniquely decodable code can use fewer than H(X) bits per symbol on average.

The replacement of the symbols naked run/office with a binary representation is termed source coding. In any coding operation we replace a symbol with a codeword. The purpose of source coding is to reduce the number of bits required to convey the information provided by the information source: to minimize the average length of the codes.

Central to source coding is the use of sequences. By this we mean that codewords are not associated with a single outcome, but with a sequence of outcomes.

To see why this is useful, let us return to the problem of the prime minister. Suppose we group the outcomes in threes, according to their probability, and assign binary codewords to these grouped outcomes.

Table 1 shows such a code, and the probability of each codeword occurring. It is easy to compute that this code will on average use only 1.2 bits per three-symbol sequence (0.4 bits per source symbol).

The entropy per three-symbol sequence is -[0.729 log_2(0.729) + 3*0.081 log_2(0.081) + 3*0.009 log_2(0.009) + 0.001 log_2(0.001)] = 1.41 bits, i.e. three times the per-symbol entropy of 0.47. The average length of the code in Table 1 is 0.729*1 + 0.081*1 + 2*0.081*2 + 2*0.009*2 + 0.009*3 + 0.001*3 = 1.2 bits per sequence.
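
The Python sketch below (added for illustration) reproduces both figures. The transcript does not show Table 1 itself, so the codeword assignment used here, the shortest binary strings 0, 1, 00, 01, 10, 11, 000, 001 given to the sequences in order of decreasing probability, is an assumption chosen to match the codeword lengths in the 1.2-bit calculation above.

```python
from itertools import product
from math import log2

p = {'O': 0.9, 'N': 0.1}

# Probabilities of the eight three-symbol sequences OOO, OON, ..., NNN
seq_probs = {''.join(s): p[s[0]] * p[s[1]] * p[s[2]] for s in product('ON', repeat=3)}

# Entropy of a block of three symbols: about 1.41 bits (= 3 * 0.47)
H_block = -sum(q * log2(q) for q in seq_probs.values())

# Assumed Table-1-style code: the shortest binary strings, assigned in
# order of decreasing sequence probability (lengths 1,1,2,2,2,2,3,3)
codewords = ['0', '1', '00', '01', '10', '11', '000', '001']
ranked = sorted(seq_probs.values(), reverse=True)
avg_len = sum(q * len(c) for q, c in zip(ranked, codewords))

print(round(H_block, 3), round(avg_len, 3))   # 1.407 1.2
```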

This example shows how using sequences permits us to decrease the average number of bits per symbol. Moreover, without difficulty, we have found a code that has an average bit usage less than the source entropy.

However, there is a difficulty with the code in Table 1. Before a codeword can be decoded, it must be parsed. Parsing describes the activity of breaking the message string into its component codewords.

After parsing, each codeword can be decoded into its symbol sequence. An instantaneously parsable code is one that can be parsed as soon as the last bit of a codeword is received.
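
The sketch below (added for illustration, with a hypothetical codebook rather than the one in the tables) shows what instantaneous parsing means in practice: the decoder can emit a symbol the moment the accumulated bits match a codeword, with no lookahead.

```python
def decode_instantaneous(bits, codebook):
    """Greedy, bit-by-bit decoding of a prefix (instantaneous) code.

    A symbol is emitted as soon as the accumulated bits form a codeword;
    codebook maps codeword -> symbol.
    """
    symbols, current = [], ''
    for bit in bits:
        current += bit
        if current in codebook:          # last bit of a codeword received
            symbols.append(codebook[current])
            current = ''
    if current:
        raise ValueError('bit string ended in the middle of a codeword')
    return symbols

# A hypothetical prefix code, used only to illustrate the parsing step
codebook = {'1': 'OOO', '011': 'OON', '010': 'ONO', '001': 'NOO', '000': 'NNN'}
print(decode_instantaneous('1010011', codebook))   # ['OOO', 'ONO', 'OON']
```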

An instantaneous code must satisfy the prefix condition: no codeword may be a prefix of any other codeword. This condition is not satisfied by the code in Table 1.
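
A small Python check (added for illustration) makes the prefix condition concrete. The bit patterns below are assumed assignments consistent with the codeword lengths used in this lecture, not copies of the actual tables.

```python
def satisfies_prefix_condition(codewords):
    """True iff no codeword is a prefix of any other codeword."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

# Shortest-string code assumed for Table 1: fails, e.g. '0' is a prefix
# of '00', so a decoder cannot tell where one codeword ends.
table1 = ['0', '1', '00', '01', '10', '11', '000', '001']
print(satisfies_prefix_condition(table1))   # False

# One possible assignment with the Table 2 lengths (1,3,3,3,5,5,5,5): passes.
table2 = ['1', '011', '010', '001', '00011', '00010', '00001', '00000']
print(satisfies_prefix_condition(table2))   # True
```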

Huffman coding. The code in Table 2, however, is an instantaneously parsable code: it satisfies the prefix condition.

Its average length is 0.729*1 + 3*0.081*3 + 3*0.009*5 + 0.001*5 = 1.598, i.e. about 1.6 bits per sequence.
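
As a cross-check, the sketch below (added for illustration) runs the standard greedy Huffman construction on the eight sequence probabilities using Python's heapq. It recovers the codeword lengths 1, 3, 3, 3, 5, 5, 5, 5 and the 1.6-bit average; it is only a sketch of the procedure, not a reconstruction of Table 2 itself.

```python
import heapq
from itertools import product

# Probabilities of the eight three-symbol sequences (P(O) = 0.9, P(N) = 0.1)
p = {'O': 0.9, 'N': 0.1}
probs = {''.join(s): p[s[0]] * p[s[1]] * p[s[2]] for s in product('ON', repeat=3)}

def huffman_lengths(probs):
    """Codeword length of each symbol under a binary Huffman code.

    Repeatedly merge the two least probable nodes; each merge adds one
    bit to the codewords of every leaf beneath the merged node.
    """
    heap = [(q, i, [sym]) for i, (sym, q) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)                  # tie-breaker for equal probabilities
    while len(heap) > 1:
        q1, _, syms1 = heapq.heappop(heap)
        q2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (q1 + q2, counter, syms1 + syms2))
        counter += 1
    return lengths

lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(sorted(lengths.values()))   # [1, 3, 3, 3, 5, 5, 5, 5]
print(round(avg, 3))              # 1.598, i.e. about 1.6 bits per sequence
```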

Decoding. Why do we require a code with the shortest average length?

The derivation of the Huffman code tree is shown in Fig. and the tree itself is shown in Fig. In both these figures, the letters A to H have been used in place of the sequences in Table 2 to make them easier to read.

Like many theorems of information theory, the theorem tells us nothing about how to find the code. However, it is a useful result.

For example, the code in Table 2 uses 1.6 bits per three-symbol sequence, which is only about 0.2 bits per sequence more than the 1.41-bit lower bound the theorem tells us is the best we can do. We might conclude that there is little point in expending further effort on finding a code with an even shorter average length that still satisfies the inequality above.