Lecture 4 (week 2) Source Coding and Compression

Lecture 4 (week 2): Source Coding and Compression
Dr.-Ing. Khaled Shawky Hassan
Room: C3-222, ext: 1204, Email: khaled.shawky@guc.edu.eg

Huffman Coding
Proposed by Dr. David A. Huffman in 1952: “A Method for the Construction of Minimum Redundancy Codes”
Applicable to many forms of data transmission
Most common example: text files

Huffman Coding: What We Will Discuss
- Huffman coding algorithm: the procedure to build Huffman codes
- Extended Huffman codes (new)
- Adaptive Huffman coding (new): the update procedure and the decoding procedure

Shannon-Fano Coding
The second code based on Shannon's theory; it is a suboptimal code (it took a graduate student, Huffman, to fix it!)
Algorithm (a code sketch follows below):
1. Start with empty codes
2. Compute frequency statistics for all symbols
3. Order the symbols in the set by frequency
4. Split the set into two subsets (as close to half-half as possible!) so as to minimize the difference in total frequency
5. Add '0' to the codes in the first set and '1' to the rest
6. Recursively assign the rest of the code bits within the two subsets, until the sets cannot be split further
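
A minimal sketch of the procedure above in Java; the class and method names are illustrative only, and the split rule is a simple "first point past half the total" heuristic rather than an exhaustive minimization:

    import java.util.*;

    public class ShannonFano {
        // Recursively split freq[lo..hi] (sorted highest-first) and append
        // one bit to the code of every symbol in each half.
        static void assign(int[] freq, StringBuilder[] codes, int lo, int hi) {
            if (lo >= hi) return;                  // single symbol: code complete
            int total = 0;
            for (int i = lo; i <= hi; i++) total += freq[i];
            int acc = 0, split = lo;
            for (int i = lo; i < hi; i++) {        // find a near half-half split
                acc += freq[i];
                if (acc * 2 >= total) { split = i; break; }
            }
            for (int i = lo; i <= hi; i++)         // '0' for first set, '1' for the rest
                codes[i].append(i <= split ? '0' : '1');
            assign(freq, codes, lo, split);
            assign(freq, codes, split + 1, hi);
        }

        public static void main(String[] args) {
            int[] freq = {9, 8, 6, 5, 4, 2};       // the example weights for a..f
            StringBuilder[] codes = new StringBuilder[freq.length];
            for (int i = 0; i < codes.length; i++) codes[i] = new StringBuilder();
            assign(freq, codes, 0, freq.length - 1);
            for (int i = 0; i < codes.length; i++)
                System.out.println((char) ('a' + i) + ": " + codes[i]);
        }
    }

On the example below this yields a = 00, b = 01, c = 100, d = 101, e = 110, f = 111, i.e. an average length of 85/34 = 2.5 bits/symbol.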

Shannon-Fano Coding
Example: Assume an alphabet A = {a, b, c, d, e, f} with the occurrence weights {9, 8, 6, 5, 4, 2}, respectively.
Apply Shannon-Fano coding and discuss the sub-optimality.

Shannon-Fano Coding
[Slides 6-11: step-by-step diagrams of the recursive splits for this example; the final split separates e (weight 4) and f (weight 2).]

Shannon-Fano Coding
Shannon-Fano does not always produce optimal prefix codes; the ordering is performed only once, at the beginning!!
Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraint that each symbol is represented by a code formed of an integral number of bits.
For this reason, prefix-free codes are sometimes simply called Huffman codes.
Symbol-by-symbol Huffman coding (the easiest one) reaches the entropy exactly only if the probabilities of the symbols are independent and each is a power of a half, i.e. (1/2)^n.
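
As a quick worked illustration (not from the slides) of the power-of-a-half condition, take a dyadic source; the Huffman code lengths l_i = -log2 P(s_i) then make the average length match the entropy exactly:

    P = \{\tfrac12, \tfrac14, \tfrac18, \tfrac18\}, \qquad \text{code lengths } \{1, 2, 3, 3\}
    \bar{l} = \tfrac12 \cdot 1 + \tfrac14 \cdot 2 + \tfrac18 \cdot 3 + \tfrac18 \cdot 3 = 1.75 = -\sum_i P(s_i)\log_2 P(s_i) = H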

Huffman Coding
Proposed by Dr. David A. Huffman in 1952: “A Method for the Construction of Minimum Redundancy Codes”
Applicable to many forms of data transmission; our example: text files
In general, Huffman coding is a form of statistical coding, as not all characters occur with the same frequency!

Huffman Coding: The Same Idea
Why Huffman coding (likewise all entropy coding)?
Code word lengths are no longer fixed like ASCII; they vary and will be shorter for the more frequently used characters, i.e., an overall shorter average code length!

Huffman Coding: The Algorithm
1. Scan the text to be compressed and count the occurrences of all characters.
2. Sort or prioritize the characters based on their number of occurrences in the text (from low to high).
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and encode the characters into a new coded file using the Huffman codes.
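
A small sketch (not from the slides) of step 1, counting character occurrences; the class name is illustrative:

    import java.util.*;

    public class CharCount {
        public static void main(String[] args) {
            String text = "Eerie eyes seen near lake.";   // the example text below
            Map<Character, Integer> freq = new TreeMap<>();
            for (char c : text.toCharArray())
                freq.merge(c, 1, Integer::sum);           // increment the count of c
            System.out.println(freq);   // e.g. { =4, .=1, E=1, a=2, e=8, ...}
        }
    }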

Huffman Coding: Building The Tree
Example: Consider the following short text: “Eerie eyes seen near lake.”
Count up the occurrences of all characters in the text: E e r i e e y e s s e e n n e a r l a k e .
The distinct characters are: E, e, r, i, space, y, s, n, a, l, k, '.'

Huffman Coding: Building The Tree
Example: “Eerie eyes seen near lake.” What is the frequency of each character in the text?

Char   Freq.    Char    Freq.    Char   Freq.
E      1        y       1        k      1
e      8        s       2        .      1
r      2        n       2        i      1
a      2        space   4        l      1

Huffman Coding: Building The Tree
1. Create binary tree nodes with the character and frequency of each character
2. Place the nodes in a priority queue: the lower the occurrence, the higher the priority in the queue

Huffman Coding: Building The Tree
Uses binary tree nodes (OOP-like view; second bonus assignment: construct the Huffman tree as follows!!)

    public class HuffNode {
        public char myChar;               // the character stored at this node (leaves)
        public int myFrequency;           // occurrence count (sum for inner nodes)
        public HuffNode myLeft, myRight;  // children; null for leaf nodes
    }

    PriorityQueue<HuffNode> myQueue;

Huffman Coding: Building The Tree
The queue after inserting all nodes (null pointers are not shown):
E:1, i:1, y:1, l:1, k:1, .:1, r:2, s:2, n:2, a:2, sp:4, e:8

Huffman Coding: Building The Tree
While the priority queue contains two or more nodes:
- Create a new node
- Dequeue a node and make it the left sub-tree
- Dequeue the next node and make it the right sub-tree
- The frequency of the new node equals the sum of the frequencies of its left and right children
- Enqueue the new node back into the queue in the right order!!
(A code sketch of this loop follows.)
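
A minimal sketch of this loop (an assumption building on the HuffNode class above, using java.util.PriorityQueue):

    import java.util.*;

    public class HuffTreeBuilder {
        static HuffNode buildTree(Map<Character, Integer> freq) {
            // Lowest frequency = highest priority in the queue.
            PriorityQueue<HuffNode> queue = new PriorityQueue<>(
                Comparator.comparingInt((HuffNode n) -> n.myFrequency));
            for (Map.Entry<Character, Integer> e : freq.entrySet()) {
                HuffNode leaf = new HuffNode();
                leaf.myChar = e.getKey();
                leaf.myFrequency = e.getValue();
                queue.add(leaf);
            }
            while (queue.size() >= 2) {            // two or more nodes left
                HuffNode parent = new HuffNode();  // create new node
                parent.myLeft = queue.poll();      // dequeue -> left sub-tree
                parent.myRight = queue.poll();     // dequeue -> right sub-tree
                parent.myFrequency =               // sum of children's frequencies
                    parent.myLeft.myFrequency + parent.myRight.myFrequency;
                queue.add(parent);                 // enqueue back in the right order
            }
            return queue.poll();                   // the root of the Huffman tree
        }
    }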

Huffman Coding: Building The Tree
[Slides 22-43: step-by-step diagrams of this merging process, repeatedly dequeuing the two lowest-frequency nodes, combining them under a new parent, and enqueueing the result; the last two subtrees, of weight 10 and 16, are merged into the root of weight 26.]
What is happening to the characters with a low number of occurrences?
After enqueueing the final merged node there is only one node left in the priority queue.

Huffman Coding: Building The Tree
Dequeue the single node left in the queue. This tree contains the new code words for each character.
The frequency of the root node should equal the number of characters in the text: “Eerie eyes seen near lake.” ----> 26 characters, and the root indeed has weight 26.

Encoding the File: Traverse the Tree
Perform a traversal of the tree to obtain the new code words: going left is a 0, going right is a 1; a code word is only completed when a leaf node is reached. (A traversal sketch follows.)
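
A sketch of this traversal (assumed, building on HuffNode above): walk the tree, appending '0' on every left edge and '1' on every right edge, and record a code word at each leaf:

    import java.util.*;

    public class CodeTable {
        static void collectCodes(HuffNode node, String prefix,
                                 Map<Character, String> table) {
            if (node == null) return;
            if (node.myLeft == null && node.myRight == null) {
                table.put(node.myChar, prefix);   // leaf: code word complete
                return;
            }
            collectCodes(node.myLeft,  prefix + "0", table);  // left  = 0
            collectCodes(node.myRight, prefix + "1", table);  // right = 1
        }
    }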

Huffman Coding (contd.)
The resulting code words:

Char   Code     Char    Code
E      0000     space   011
i      0001     e       10
y      0010     r       1100
l      0011     s       1101
k      0100     n       1110
.      0101     a       1111
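
A short usage sketch (assumed, not from the slides): encoding the 26-character example text with this table and comparing against fixed 8-bit ASCII:

    import java.util.*;

    public class EncodeDemo {
        public static void main(String[] args) {
            Map<Character, String> table = new HashMap<>();
            table.put('E', "0000"); table.put('i', "0001");
            table.put('y', "0010"); table.put('l', "0011");
            table.put('k', "0100"); table.put('.', "0101");
            table.put(' ', "011");  table.put('e', "10");
            table.put('r', "1100"); table.put('s', "1101");
            table.put('n', "1110"); table.put('a', "1111");

            String text = "Eerie eyes seen near lake.";
            StringBuilder bits = new StringBuilder();
            for (char c : text.toCharArray()) bits.append(table.get(c));
            // 84 bits with the Huffman code vs 26 * 8 = 208 bits in ASCII
            System.out.println(bits.length() + " bits vs " + text.length() * 8);
        }
    }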

Huffman Coding: Example (2)
Init: Create a priority set out of each letter
1. Sort the sets according to probability (lowest first)
2. Insert prefix '1' into the codes of the letters in the top (lowest-probability) set
3. Insert prefix '0' into the codes of the letters in the second set
4. Merge the top two sets

Huffman Coding: Example (2)
[Slides 52-63 repeat steps 1-4 on the reduced collection of sets, three more times, until only one set containing all letters remains; a code sketch of this set-merging view follows.]
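
A hedged sketch of this bottom-up "set merging" view (an assumption, not the lecturer's code): rather than building an explicit tree, prepend a bit to every code word in the two lowest-probability sets and merge them. Run on the weights of the summary below, it reproduces an average length of 2.2 bits/symbol:

    import java.util.*;

    public class SetMergeHuffman {
        public static void main(String[] args) {
            char[] syms = {'a', 'b', 'c', 'd', 'e'};
            double[] p  = {0.2, 0.4, 0.2, 0.1, 0.1};
            StringBuilder[] code = new StringBuilder[syms.length];
            for (int i = 0; i < code.length; i++) code[i] = new StringBuilder();

            // A "set" is the list of symbol indices it contains plus its weight.
            List<List<Integer>> sets = new ArrayList<>();
            List<Double> weight = new ArrayList<>();
            for (int i = 0; i < syms.length; i++) {
                sets.add(new ArrayList<>(List.of(i)));
                weight.add(p[i]);
            }

            while (sets.size() > 1) {
                // 1. Find the two lowest-weight sets (a full sort would also do).
                int lo1 = 0, lo2 = 1;
                if (weight.get(lo2) < weight.get(lo1)) { lo1 = 1; lo2 = 0; }
                for (int i = 2; i < sets.size(); i++) {
                    if (weight.get(i) < weight.get(lo1)) { lo2 = lo1; lo1 = i; }
                    else if (weight.get(i) < weight.get(lo2)) { lo2 = i; }
                }
                // 2./3. Prefix '1' into the lowest set's codes, '0' into the next.
                for (int s : sets.get(lo1)) code[s].insert(0, '1');
                for (int s : sets.get(lo2)) code[s].insert(0, '0');
                // 4. Merge the two sets.
                sets.get(lo1).addAll(sets.get(lo2));
                weight.set(lo1, weight.get(lo1) + weight.get(lo2));
                sets.remove(lo2);
                weight.remove(lo2);
            }
            for (int i = 0; i < syms.length; i++)
                System.out.println(syms[i] + ": " + code[i]);
            // prints a: 001, b: 1, c: 000, d: 011, e: 010 (average 2.2 bits/symbol)
        }
    }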

Example Summary
Average code length: l = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol
Entropy: H = −Σ_{s=a..e} P(s) log2 P(s) ≈ 2.122 bits/symbol
Redundancy: l − H ≈ 0.078 bits/symbol
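
A small verification sketch (not from the slides) of the numbers above:

    public class ExampleSummary {
        public static void main(String[] args) {
            double[] p = {0.4, 0.2, 0.2, 0.1, 0.1};   // P(b), P(a), P(c), P(d), P(e)
            int[] len  = {1, 2, 3, 4, 4};             // code word lengths from the tree
            double l = 0, h = 0;
            for (int i = 0; i < p.length; i++) {
                l += p[i] * len[i];                          // average code length
                h -= p[i] * Math.log(p[i]) / Math.log(2);    // entropy in bits
            }
            System.out.printf("l = %.3f, H = %.3f, redundancy = %.3f%n", l, h, l - h);
            // prints l = 2.200, H = 2.122, redundancy = 0.078
        }
    }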

Example: Tree 1
Average code length: l = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol

Huffman Coding: Example (3)
Symbols: {a, b, c, d, e}; weights: {0.2, 0.4, 0.2, 0.1, 0.1}
Required: the maximum-variance tree!
Merge order (from the slide's flattened diagram): d + e → de (0.2), de + a → dea (0.4), dea + c → deac (0.6), deac + b → deacb (1.0), giving code lengths 1 (b), 2 (c), 3 (a), 4 (d, e).

Huffman Coding: Example (3)
Symbols: {a, b, c, d, e}; weights: {0.2, 0.4, 0.2, 0.1, 0.1}
Required: the minimum-variance tree!
Merge order (from the slide's flattened diagram): d + e → de (0.2), a + c → ac (0.4), de + b → deb (0.6), deb + ac → deacb (1.0), giving code lengths 2 (a, b, c) and 3 (d, e).

Example: Minimum Variance Tree
Codes (one assignment consistent with the tree): b = 00, d = 010, e = 011, c = 10, a = 11
Average code length: l = 0.4×2 + (0.1 + 0.1)×3 + (0.2 + 0.2)×2 = 2.2 bits/symbol

Example: Yet Another Tree
Average code length: l = 0.4×1 + (0.2 + 0.2 + 0.1 + 0.1)×3 = 2.2 bits/symbol

Min Variance Huffman Trees
Huffman codes are not unique, and all versions yield the same average length. Which one should we choose? The one with the minimum variance in code word lengths, i.e., the minimum-height tree. Why? It ensures the least amount of variability in the encoded stream. (A tie-breaking sketch follows.)
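
One common way to obtain the minimum-variance tree (a sketch under the assumption of a sequence-number tie-break, not the lecturer's code) is to dequeue older nodes first among equal frequencies, so freshly merged subtrees stay as shallow as possible:

    import java.util.*;

    public class MinVarianceQueue {
        static int nextSeq = 0;

        static class Node {
            int frequency;
            int seq = nextSeq++;   // leaves get small numbers, merged nodes larger
            Node left, right;
        }

        static PriorityQueue<Node> newQueue() {
            return new PriorityQueue<>(
                Comparator.comparingInt((Node n) -> n.frequency)
                          .thenComparingInt(n -> n.seq));  // ties: older node first
        }
    }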

Another Example!
Consider the source A = {a, b, c} with P(a) = 0.8, P(b) = 0.02, P(c) = 0.18; H = 0.816 bits/symbol.
Huffman code: a → 0, b → 11, c → 10; l = 1.2 bits/symbol.
Redundancy = l − H = 0.384 bits/symbol on average (47% of the entropy!).
Q: Could we do better?

Extended Huffman Codes
Example 1: Consider encoding sequences of two letters from the source above. A Huffman code over the nine pairs gives
l = (0.64×1 + 0.144×2 + 0.144×3 + 0.0324×4 + 0.016×5 + 0.016×6 + 0.0036×7 + 0.0004×8 + 0.0036×8) / 2 = 1.7228 / 2 ≈ 0.8614 bits/symbol
Redundancy = l − H ≈ 0.046 bits/symbol
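
A quick check (not from the slides) of the pair probabilities and the per-symbol rate, using the code word lengths listed above for the pairs sorted by probability:

    import java.util.*;

    public class ExtendedHuffman {
        public static void main(String[] args) {
            double[] p = {0.8, 0.02, 0.18};            // P(a), P(b), P(c)
            // All 3^2 = 9 pair probabilities (products of the singles).
            List<Double> pairs = new ArrayList<>();
            for (double x : p) for (double y : p) pairs.add(x * y);
            pairs.sort(Collections.reverseOrder());    // highest probability first
            int[] len = {1, 2, 3, 4, 5, 6, 7, 8, 8};   // lengths from the slide
            double rate = 0;
            for (int i = 0; i < len.length; i++) rate += pairs.get(i) * len[i];
            System.out.printf("rate = %.4f bits per pair = %.4f bits/symbol%n",
                              rate, rate / 2);         // ~1.7228 and ~0.8614
        }
    }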

Extended Huffman Codes (Remarks)
The idea can be extended further: consider all possible n^m sequences (we did 3^2 = 9).
In theory, by considering more sequences we can improve the coding!! (But is it applicable?)
In reality, the exponential growth of the alphabet makes this impractical, e.g., for length-3 ASCII sequences: 256^3 = 2^24 = 16M.
Practical consideration: most sequences would have zero frequency → other methods are needed (Adaptive Huffman Coding).