
1 Source Coding
Hafiz Malik
Dept. of Electrical & Computer Engineering
The University of Michigan-Dearborn
hafiz@umich.edu
http://www-perosnal-engin.umd.umich.edu/~hafiz

2 Overview
Transmitter blocks: Information Source, Source Encode, Channel Encode, Pulse Modulate, Encrypt, Bandpass Modulate, Multiplex, Multiple Access, Frequency Spreading, XMT
Source Coding – eliminate redundancy in the data, send the same information in fewer bits
Channel Coding – detect/correct errors in signaling and improve BER

3 What is information?
– Student slips on ice in winter in Southfield, MI
– Professor slips on ice, breaks arm, class cancelled
– Student slips on ice in July in Southfield, MI
Which statement contains the most information? The statement with the most unpredictability conveys the most new information and surprise.
Information I is related to -log(Pr{event}): as Pr{event} → 1, I → 0, and as Pr{event} → 0, I → ∞.
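A quick numerical illustration of this -log relation (a minimal sketch; the probabilities below are placeholders chosen for illustration, not values from the slide):

```python
import math

def self_information(p):
    """Self-information I = -log2(p) in bits; rarer events carry more information."""
    return -math.log2(p)

print(self_information(0.99))   # ~0.014 bits: a near-certain event, almost no surprise
print(self_information(0.001))  # ~9.97 bits: a rare event, highly surprising
```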

4 Measure of Randomness
Entropy: average information per source output, H = -Σ p_i log2(p_i).
Let there be two events E1 and E2 with Pr{E1} = p and Pr{E2} = q, where E1 ∪ E2 = S and q = 1 - p.
[Plot: entropy H (bits) versus probability p, rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and back down to 0 at p = 1]
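A minimal sketch of the curve on this slide, assuming the standard binary entropy definition H(p) = -p log2(p) - (1-p) log2(1-p):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p) for a two-event source."""
    if p in (0.0, 1.0):
        return 0.0                      # a certain outcome carries no information
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.3f} bits")
# Maximum of 1 bit at p = 0.5; drops to 0 as the outcome becomes certain.
```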

5 Comments on Entropy
When the log is taken to base 2, the unit of H is bits/event; for base e, the unit is nats.
Entropy is maximum for a uniformly distributed random variable.
Example: for M equally likely outcomes, p_i = 1/M and H = log2(M), the largest value H can take.

6 Example: Average Information Content in English Language
Calculate the average information in bits/character for English, assuming each letter is equally likely.
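Assuming the 26-letter alphabet the exercise implies, the calculation is a single line:

```python
import math

# 26 equally likely letters: H = log2(26)
print(f"H = {math.log2(26):.2f} bits/character")   # ~4.70 bits/character
```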

7 Real-world English
But English characters do not appear with the same frequency in real-world English text. Assume the probabilities of the characters are:
Pr{a} = Pr{e} = Pr{o} = Pr{t} = 0.10
Pr{h} = Pr{i} = Pr{n} = Pr{r} = Pr{s} = 0.07
Pr{c} = Pr{d} = Pr{f} = Pr{l} = Pr{m} = Pr{p} = Pr{u} = Pr{y} = 0.02
Pr{b} = Pr{g} = Pr{j} = Pr{k} = Pr{q} = Pr{v} = Pr{w} = Pr{x} = Pr{z} = 0.01
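A sketch of the entropy calculation for this distribution; completing the letter list with m at 0.02 and q at 0.01 (the two entries the transcript garbled) so the probabilities sum to 1 is an assumption:

```python
import math

# Probabilities from the slide; m (0.02) and q (0.01) are assumed so the total is 1.
probs = {}
for ch in "aeot":       probs[ch] = 0.10
for ch in "hinrs":      probs[ch] = 0.07
for ch in "cdflmpuy":   probs[ch] = 0.02
for ch in "bgjkqvwxz":  probs[ch] = 0.01

assert abs(sum(probs.values()) - 1.0) < 1e-9

H = -sum(p * math.log2(p) for p in probs.values())
print(f"H = {H:.2f} bits/character")   # ~4.17, below the 4.70 bits of the uniform case
```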

8 Example
Binary sequence X with P(X=1) = P(X=0) = 1/2
Pb = 0.01 (1 error per 100 bits)
R = 1000 binary symbols/sec

9 Source Coding
Goal is to find an efficient description of information sources
– Reduce required bandwidth
– Reduce memory to store
Memoryless – symbols from the source are independent; one symbol does not depend on the next
Memory – elements of a sequence depend on one another, e.g. UNIVERSIT_?; a 10-tuple of dependent letters carries less information than 10 independent symbols

10 Source Coding (II)
This means that it is more efficient to code information sources with memory as groups of symbols

11 Desirable Properties
Length
– Fixed Length – ASCII
– Variable Length – Morse Code, JPEG
Uniquely Decodable – allows the user to invert the mapping back to the original
Prefix-Free – no codeword may be a prefix of any other codeword
Average Code Length: n̄ = Σ P(X_i) n_i, where n_i is the code length of the i-th symbol

12 Uniquely Decodable and Prefix-Free Codes
Symbol probabilities:
Xi   P(Xi)
a    0.73
b    0.25
c    0.02
[Table: six candidate codes, Code 1 through Code 6, assigning codewords to a, b, c]
Uniquely decodable?
– Not Code 1
– If "10111" is sent, is Code 3 'babbb' or 'bacb'? Not Code 3 or 6
Prefix-Free
– Not Code 4 – prefix contains '1'
Average Code Length
– Code 2: n̄ = 2
– Code 5: n̄ = 1.23
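A small sketch showing how the prefix-free test and the average-length formula n̄ = Σ P(X_i) n_i can be checked in code; the codeword assignment below is a hypothetical prefix-free code for a, b, c, not one of the slide's Codes 1 through 6:

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    return not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

def average_length(code, probs):
    """n_bar = sum over symbols of P(X_i) * n_i."""
    return sum(probs[s] * len(code[s]) for s in code)

probs = {"a": 0.73, "b": 0.25, "c": 0.02}
code  = {"a": "0", "b": "10", "c": "11"}   # hypothetical prefix-free code, not from the slide

print(is_prefix_free(code))            # True
print(average_length(code, probs))     # 0.73*1 + 0.25*2 + 0.02*2 = 1.27 bits/symbol
```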

13 Huffman Code
Characteristics of Huffman Codes:
– Prefix-free, variable-length code that can achieve the shortest average code length for an alphabet
– Most frequent symbols have short codes
Procedure
– List all symbols and probabilities in descending order
– Merge the two branches with the lowest probabilities, combining their probabilities
– Repeat until one branch is left

14 Huffman Code Example
Symbols and probabilities: a = 0.4, b = 0.2, c = d = e = f = 0.1
[Huffman tree: the two lowest-probability branches are merged repeatedly, each merge labelled 1/0, until the total probability reaches 1.0]
The Code: A 11, B 00, C 101, D 100, E 011, F 010
Compression Ratio: 3.0/2.4 = 1.25
Entropy: 2.32
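A sketch of the merge procedure using Python's heapq. The 0/1 branch labels, and hence the exact codewords, can come out differently from the slide's tree, but the codeword lengths and the 2.4-bit average length are the same:

```python
import heapq
import itertools

def huffman(probs):
    """Build a Huffman code by repeatedly merging the two least-probable branches."""
    counter = itertools.count()      # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)     # two lowest probabilities
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}        # label one branch 0 ...
        merged.update({s: "1" + c for s, c in code2.items()})  # ... and the other 1
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}
code = huffman(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(f"average length = {avg:.1f} bits, ratio vs. 3-bit fixed length = {3/avg:.2f}")
```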

15 Example
Consider a random variable X with symbols {a, b, c} and the probabilities listed in the table
– Calculate the entropy of this symbol set
– Find the Huffman code for this symbol set
– Find the compression ratio and efficiency of this code
Xi   P(Xi)
a    0.73
b    0.25
c    0.02
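One way to work the exercise (a sketch, assuming the compression ratio is taken against a 2-bit fixed-length code for three symbols and efficiency is H divided by the average code length):

```python
import math

probs = {"a": 0.73, "b": 0.25, "c": 0.02}

# Entropy of the symbol set
H = -sum(p * math.log2(p) for p in probs.values())      # ~0.94 bits/symbol

# Huffman procedure: merge b and c (the two lowest), then merge the result with a.
# One valid label assignment (the 0/1 labels may differ): a -> 0, b -> 10, c -> 11.
code = {"a": "0", "b": "10", "c": "11"}
n_bar = sum(probs[s] * len(code[s]) for s in probs)      # 1.27 bits/symbol

compression_ratio = 2 / n_bar                            # ~1.57 vs. a 2-bit fixed code
efficiency = H / n_bar                                   # ~0.74

print(f"H = {H:.2f}, n_bar = {n_bar:.2f}, "
      f"ratio = {compression_ratio:.2f}, efficiency = {efficiency:.2f}")
```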

16 Extension Codes
Combine alphabet symbols to increase variability
Try to combine very common 2- and 3-letter combinations, e.g.: th, sh, ed, and, the, ing, ion
Xi   P(Xi)    Code       ni   ni·P(Xi)
aa   0.5329   1          1    0.5329
ab   0.1825   00         2    0.3650
ba   0.1825   011        3    0.5475
bb   0.0625   0101       4    0.2500
ac   0.0146   01000      5    0.0730
ca   0.0146   010011     6    0.0876
bc   0.0050   0100100    7    0.0350
cb   0.0050   01001011   8    0.0400
cc   0.0004   01001010   8    0.0032
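A short check of what pairing the symbols buys, using the lengths and probabilities from the table above (the cc probability is taken as 0.02² = 0.0004 so the pair probabilities of the a/b/c source sum to 1, an assumption where the transcript shows 0.0002):

```python
# (probability, codeword length) per symbol pair, from the table on this slide
pairs = {
    "aa": (0.5329, 1), "ab": (0.1825, 2), "ba": (0.1825, 3),
    "bb": (0.0625, 4), "ac": (0.0146, 5), "ca": (0.0146, 6),
    "bc": (0.0050, 7), "cb": (0.0050, 8), "cc": (0.0004, 8),
}

n_per_pair = sum(p * n for p, n in pairs.values())
print(f"{n_per_pair:.2f} bits per pair = {n_per_pair / 2:.3f} bits per symbol")
# ~1.93 bits per pair, i.e. ~0.97 bits per symbol: closer to the ~0.94-bit entropy
# than the 1.27 bits/symbol achieved by coding single symbols.
```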

17 Lempel-Ziv (ZIP) Codes
Huffman codes have some shortcomings
– Symbol probability information must be known a priori
– The coding tree must be known at both coder and decoder
The Lempel-Ziv algorithm uses the text itself to iteratively construct a sequence of variable-length code words
Used in gzip, UNIX compress, and the LZW algorithm

18 Lempel-Ziv Algorithm
Look through a code dictionary of already-coded segments
– If the current segment matches a dictionary entry, send the entry's address together with the next (new) character, and store segment + new character in the dictionary
– If there is no match, store the new character in the dictionary and send it

19 LZ Coding: Example
Encode [a b a a b a b b b b b b b a b b b b b a]
Code Dictionary:
Address   Contents
1         a
2         b
3         aa
4         ba
5         bb
6         bbb
7         bba
8         bbbb
[Encoded packets shown alongside the dictionary on the slide]
Note: 9 code words, 3-bit address, 1 bit for the new character
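A sketch of an LZ78-style encoder consistent with this example; the (address, new symbol) packet format is the standard LZ78 convention and is an assumption about what the slide's encoded-packets column contained. Running it on the slide's sequence reproduces the eight dictionary entries and nine code words:

```python
def lz78_encode(symbols):
    """LZ78-style parse: each packet is (dictionary address, new symbol);
    address 0 means no previously stored segment was matched."""
    dictionary = {}                  # segment -> address
    packets = []
    segment = ""
    for s in symbols:
        if segment + s in dictionary:
            segment += s                             # keep extending the match
        else:
            packets.append((dictionary.get(segment, 0), s))
            dictionary[segment + s] = len(dictionary) + 1
            segment = ""
    if segment:                                      # input ended on an existing segment
        packets.append((dictionary[segment], None))
    return dictionary, packets

dictionary, packets = lz78_encode("abaababbbbbbbabbbbba")
print(dictionary)   # {'a': 1, 'b': 2, 'aa': 3, 'ba': 4, 'bb': 5, 'bbb': 6, 'bba': 7, 'bbbb': 8}
print(packets)      # 9 packets, matching the slide's "9 code words"
```

With a 3-bit address and 1 bit per new character, the 9 packets cost roughly 36 bits to describe a 20-symbol binary sequence, which is why the next slide notes that compression gains only appear for long sequences.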

20 Notes on Lempel-Ziv Codes
Compression gains occur for long sequences
Other variations exist that encode data using packets and run-length encoding

