Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn


Overview
Blocks in a digital transmitter: Information Source, Source Encode, Channel Encode, Pulse Modulate, Encrypt, Bandpass Modulate, Multiplex, Multiple Access, Frequency Spreading, XMT.
– Source Coding: eliminate redundancy in the data; send the same information in fewer bits.
– Channel Coding: detect/correct errors in signaling and improve BER.

What is information?
– Student slips on ice in winter in Southfield, MI
– Professor slips on ice, breaks arm, class cancelled
– Student slips on ice in July in Southfield, MI
Which statement contains the most information? The statement with the greatest unpredictability conveys the most new information and surprise.
Information I is related to -log(Pr{event}): as Pr{event} → 1, I → 0, and as Pr{event} → 0, I → ∞.
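As a small illustration of this relationship (my own sketch; the event probabilities below are invented for the example, not taken from the slides), the self-information of each event can be computed directly:

    import math

    def self_information(p):
        """Self-information in bits: I = -log2(p)."""
        return -math.log2(p)

    # Hypothetical probabilities for the three statements above.
    events = {
        "slips on ice in winter": 0.5,                    # common, little surprise
        "slips, breaks arm, class cancelled": 0.05,
        "slips on ice in July": 0.001,                    # very unlikely, most surprise
    }

    for description, p in events.items():
        print(f"{description}: Pr = {p}, I = {self_information(p):.2f} bits")

The rarest event carries the largest self-information, matching the intuition above.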

Measure of Randomness
Entropy: average information per source output.
Suppose we have two events, E1 and E2, with Pr{E1} = p and Pr{E2} = q, where E1 ∪ E2 = S and q = 1 - p.
For this binary source, H(p) = -p log2(p) - (1 - p) log2(1 - p).
[Figure: binary entropy H (bits) plotted against probability p.]
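A short sketch (my own, not from the slides) that evaluates the binary entropy function at a few values of p, reproducing the shape of the curve in the figure: zero at p = 0 and p = 1, with a maximum of 1 bit at p = 0.5.

    import math

    def binary_entropy(p):
        """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
        print(f"p = {p:4.2f}  H = {binary_entropy(p):.3f} bits")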

Comments on Entropy
When the log is taken to base 2, the unit of H is bits/event; when taken to base e, the unit is nats.
Entropy is maximum for a uniformly distributed random variable. Example: a fair coin gives H = 1 bit per toss.

Example: Average Information Content in the English Language
Calculate the average information in bits/character in English assuming each letter is equally likely: H = log2(26) ≈ 4.7 bits/character.

Real-world English
But English characters do not appear with the same frequency in real-world English text. Assume the probabilities of the characters are:
Pr{a} = Pr{e} = Pr{o} = Pr{t} = 0.10
Pr{h} = Pr{i} = Pr{n} = Pr{r} = Pr{s} = 0.07
Pr{c} = Pr{d} = Pr{f} = Pr{l} = Pr{m} = Pr{p} = Pr{u} = Pr{y} = 0.02
Pr{b} = Pr{g} = Pr{j} = Pr{k} = Pr{q} = Pr{v} = Pr{w} = Pr{x} = Pr{z} = 0.01
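A quick check of both cases (a sketch of my own, using the probabilities listed above and comparing against the uniform-letter result from the previous slide):

    import math

    # Letter probabilities from the slide above.
    probs = {}
    for letters, p in [("aeot", 0.10), ("hinrs", 0.07),
                       ("cdflmpuy", 0.02), ("bgjkqvwxz", 0.01)]:
        for ch in letters:
            probs[ch] = p

    assert abs(sum(probs.values()) - 1.0) < 1e-9        # probabilities sum to 1

    H_uniform = math.log2(26)                                   # equally likely letters
    H_real = -sum(p * math.log2(p) for p in probs.values())     # weighted entropy

    print(f"Uniform letters:  H = {H_uniform:.2f} bits/character")
    print(f"Real-world model: H = {H_real:.2f} bits/character")

With these numbers the weighted entropy comes out to roughly 4.17 bits/character, lower than the 4.7 bits/character of the equally-likely case.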

Example: Binary Sequence
X with P(X=1) = P(X=0) = ½; Pb = 0.01 (1 error per 100 bits); R = 1000 binary symbols/sec.

Source Coding
The goal is to find an efficient description of information sources:
– Reduce required bandwidth
– Reduce the memory needed to store
Memoryless: symbols from the source are independent; one symbol does not depend on the next.
Memory: elements of a sequence depend on one another, e.g. UNIVERSIT_?; a 10-tuple contains less information than 10 independent symbols, since the letters are dependent.

Source Coding (II)
This means that it is more efficient to code information with memory as groups of symbols.

Desirable Properties
Length:
– Fixed length, e.g. ASCII
– Variable length, e.g. Morse code, JPEG
Uniquely decodable: allows the user to invert the mapping back to the original.
Prefix-free: no codeword can be a prefix of any other codeword.
Average code length: n̄ = Σ P(Xi) · ni, where ni is the code length of the i-th symbol.

Uniquely Decodable and Prefix-Free Codes
Source symbols and probabilities: P(a) = 0.73, P(b) = 0.25, P(c) = 0.02.
[Table: six candidate codes (Code 1 – Code 6) assigning codewords to a, b, c; the codeword entries appear in the slide graphic.]
Uniquely decodable?
– Not code 1
– If "10111" is sent, is code 3 'babbb' or 'bacb'? So not code 3 or code 6.
Prefix-free?
– Not code 4: one codeword is a prefix ('1') of another.
Average code length:
– Code 2: n̄ = 2
– Code 5: n̄ = 1.23
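Since the codeword table itself was lost in the transcript, here is a small sketch (with made-up example codes, not the slide's Code 1–6) of how the prefix-free property and the average code length can be checked programmatically:

    def is_prefix_free(code):
        """True if no codeword is a proper prefix of another codeword."""
        words = list(code.values())
        return not any(w1 != w2 and w2.startswith(w1)
                       for w1 in words for w2 in words)

    def average_length(code, probs):
        """Average code length: n_bar = sum_i P(x_i) * n_i."""
        return sum(probs[s] * len(cw) for s, cw in code.items())

    probs = {"a": 0.73, "b": 0.25, "c": 0.02}

    fixed_code = {"a": "00", "b": "01", "c": "10"}   # hypothetical fixed-length code
    var_code   = {"a": "0",  "b": "10", "c": "11"}   # hypothetical prefix-free variable-length code

    for name, code in [("fixed", fixed_code), ("variable", var_code)]:
        print(name, is_prefix_free(code), round(average_length(code, probs), 2))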

Huffman Code
Characteristics of Huffman codes:
– A prefix-free, variable-length code that achieves the shortest average code length for an alphabet
– The most frequent symbols have the shortest codes
Procedure:
– List all symbols and probabilities in descending order
– Merge the two branches with the lowest probabilities and combine their probabilities
– Repeat until one branch is left, then read the codewords back off the tree (see the sketch below)
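The merge procedure above can be sketched in a few lines of Python (my own illustration, not code from the lecture); a heap repeatedly pulls out the two least-probable branches:

    import heapq
    from itertools import count

    def huffman_code(probs):
        """Build a binary Huffman code for a dict {symbol: probability}.

        A minimal sketch of the merge procedure described above; ties are
        broken arbitrarily, so the exact codewords (but not the average
        length) may differ from the slides.
        """
        tie = count()                                   # tie-breaker so dicts are never compared
        heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, code1 = heapq.heappop(heap)          # two lowest-probability branches
            p2, _, code2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, next(tie), merged))
        return heap[0][2]

    probs = {"a": 0.73, "b": 0.25, "c": 0.02}
    code = huffman_code(probs)
    print(code)
    print("average length:", sum(probs[s] * len(c) for s, c in code.items()))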

Huffman Code Example
Symbols a–f (their probabilities appear in the slide graphic). The resulting code:
a = 11, b = 00, c = 101, d = 100, e = 011, f = 010
Compression ratio: 3.0 / 2.4 = 1.25 (3 bits per symbol for a fixed-length code vs. 2.4 bits average for the Huffman code). Entropy: 2.32 bits.

Example: Consider a random variable X over the symbols {a, b, c} with probabilities P(a) = 0.73, P(b) = 0.25, P(c) = 0.02.
– Calculate the entropy of this symbol set
– Find the Huffman code for this symbol set
– Find the compression ratio and efficiency of this code
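A worked sketch of this exercise (my own solution; the slides do not show the numbers): the Huffman procedure merges c and b first, which yields a code such as a = 0, b = 10, c = 11.

    import math

    probs = {"a": 0.73, "b": 0.25, "c": 0.02}
    code = {"a": "0", "b": "10", "c": "11"}   # one valid Huffman code for these probabilities

    H = -sum(p * math.log2(p) for p in probs.values())        # entropy, about 0.94 bits/symbol
    n_bar = sum(probs[s] * len(c) for s, c in code.items())   # average length, 1.27 bits/symbol

    compression_ratio = 2 / n_bar    # vs. a 2-bit fixed-length code for three symbols
    efficiency = H / n_bar           # how close the code gets to the entropy bound

    print(f"H = {H:.3f} bits, average length = {n_bar:.2f} bits")
    print(f"compression ratio ~ {compression_ratio:.2f}, efficiency ~ {efficiency:.2f}")

The ratio is taken against a 2-bit fixed-length code, since three symbols need at least 2 bits each with fixed-length coding.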

Extension Codes
Combine alphabet symbols to increase variability. Try to combine very common 2- and 3-letter combinations, e.g.: th, sh, ed, and, the, ing, ion.
[Table: pair symbols Xi = aa, ab, ba, bb, ac, ca, bc, cb, cc with columns P(Xi), Code, ni, and ni·P(Xi); the numeric entries appear in the slide graphic.]
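To illustrate the idea (my own sketch; the slide's table values are not available in the transcript, and independent symbols are assumed), one can form the second-order extension of the earlier {a, b, c} source, Huffman-code the pairs, and compare bits per original symbol:

    import heapq, itertools, math

    def huffman(probs):
        """Compact Huffman builder (same sketch as on the earlier slide)."""
        tie = itertools.count()
        heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, next(tie), merged))
        return heap[0][2]

    single = {"a": 0.73, "b": 0.25, "c": 0.02}
    pairs = {x + y: px * py for (x, px), (y, py) in
             itertools.product(single.items(), repeat=2)}   # assumes independent symbols

    for name, probs, per_symbol in [("single symbols", single, 1), ("symbol pairs", pairs, 2)]:
        code = huffman(probs)
        n_bar = sum(probs[s] * len(code[s]) for s in probs)
        print(f"{name}: {n_bar / per_symbol:.3f} bits per original symbol")

Coding pairs brings the average bits per original symbol closer to the entropy bound than coding single symbols does.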

Lempel-Ziv (ZIP) Codes
Huffman codes have some shortcomings:
– The symbol probability information must be known a priori
– The coding tree must be known at both coder and decoder
The Lempel-Ziv algorithm uses the text itself to iteratively construct a sequence of variable-length code words.
Used in gzip, UNIX compress, and LZW algorithms.

Lempel-Ziv Algorithm
Look through a code dictionary of already-coded segments:
– If the current segment matches a dictionary entry, send the dictionary address plus the new character, and store segment + new character as a new dictionary entry
– If there is no match, store the new segment in the dictionary and send it

LZ Coding: Example
Encode [a b a a b a b b b b b b b a b b b b b a]
Code dictionary (address: contents): 1: a, 2: b, 3: aa, 4: ba, 5: bb, 6: bbb, 7: bba, 8: bbbb. (The encoded-packets column appears in the slide graphic.)
Note: 9 code words, 3-bit address, 1 bit for the new character.
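A small LZ78-style encoder sketch (my own; details such as how the final phrase is flagged are assumptions) that reproduces the dictionary shown above when run on the example sequence:

    def lz78_encode(symbols):
        """LZ78-style parse: emit (dictionary address, new symbol) packets."""
        dictionary = {}          # phrase -> address (1-based)
        packets = []
        phrase = ""
        for s in symbols:
            if phrase + s in dictionary:
                phrase += s                              # keep extending the match
            else:
                packets.append((dictionary.get(phrase, 0), s))
                dictionary[phrase + s] = len(dictionary) + 1
                phrase = ""
        if phrase:                                       # sequence ended on a known phrase
            packets.append((dictionary[phrase], ""))
        return packets, dictionary

    seq = "abaababbbbbbbabbbbba"
    packets, dictionary = lz78_encode(seq)
    print("dictionary:", {addr: phrase for phrase, addr in dictionary.items()})
    print("packets:", packets, "->", len(packets), "code words")

Running this on the example string yields the eight dictionary entries a, b, aa, ba, bb, bbb, bba, bbbb and nine packets, consistent with the note above.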

Notes on Lempel-Ziv Codes
Compression gains occur for long sequences.
Other variations exist that encode data using packets and run-length encoding.