© Jalal Kawash 2010 Trees & Information Coding: 3 Peeking into Computer Science.

Slides:



Advertisements
Similar presentations
Lecture 4 (week 2) Source Coding and Compression
Advertisements

Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Greedy Algorithms Amihood Amir Bar-Ilan University.
Greedy Algorithms (Huffman Coding)
Lecture 10 : Huffman Encoding Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University Lecture notes : courtesy.
Lecture04 Data Compression.
Compression & Huffman Codes
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
Huffman Coding: An Application of Binary Trees and Priority Queues
Optimal Merging Of Runs
A Data Compression Algorithm: Huffman Compression
Lecture 6: Greedy Algorithms I Shang-Hua Teng. Optimization Problems A problem that may have many feasible solutions. Each solution has a value In maximization.
DL Compression – Beeri/Feitelson1 Compression דחיסה Introduction Information theory Text compression IL compression.
Compression & Huffman Codes Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Data Structures – LECTURE 10 Huffman coding
Chapter 9: Huffman Codes
CSE 143 Lecture 18 Huffman slides created by Ethan Apter
Fundamentals of Multimedia Chapter 7 Lossless Compression Algorithms Ze-Nian Li and Mark S. Drew 건국대학교 인터넷미디어공학부 임 창 훈.
Huffman Codes Message consisting of five characters: a, b, c, d,e
Data Structures and Algorithms Huffman compression: An Application of Binary Trees and Priority Queues.
Algorithm Design & Analysis – CS632 Group Project Group Members Bijay Nepal James Hansen-Quartey Winter
MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro.
Huffman Codes. Encoding messages  Encode a message composed of a string of characters  Codes used by computer systems  ASCII uses 8 bits per character.
Huffman Encoding Veronica Morales.
CS-2852 Data Structures LECTURE 13B Andrew J. Wozniewicz Image copyright © 2010 andyjphoto.com.
ICS 220 – Data Structures and Algorithms Lecture 11 Dr. Ken Cosh.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 13.
© Jalal Kawash 2010 Trees & Information Coding Peeking into Computer Science.
© Jalal Kawash 2010 Introduction Peeking into Computer Science 1.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
© Jalal Kawash 2010 Graphs Peeking into Computer Science.
© Jalal Kawash 2010 Graphs Peeking into Computer Science.
Introduction to Algorithms Chapter 16: Greedy Algorithms.
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
Huffman coding Content 1 Encoding and decoding messages Fixed-length coding Variable-length coding 2 Huffman coding.
Huffman Code and Data Decomposition Pranav Shah CS157B.
CSCE350 Algorithms and Data Structure Lecture 19 Jianjun Hu Department of Computer Science and Engineering University of South Carolina
Huffman Codes Juan A. Rodriguez CS 326 5/13/2003.
CPS 100, Spring Huffman Coding l D.A Huffman in early 1950’s l Before compressing data, analyze the input stream l Represent data using variable.
CS654: Digital Image Analysis Lecture 34: Different Coding Techniques.
Huffman’s Algorithm 11/02/ Weighted 2-tree A weighted 2-tree T is an extended binary tree with n external nodes and each of the external nodes is.
Foundation of Computing Systems
Bahareh Sarrafzadeh 6111 Fall 2009
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
1 Algorithms CSCI 235, Fall 2015 Lecture 30 More Greedy Algorithms.
Lossless Decomposition and Huffman Codes Sophia Soohoo CS 157B.
1 Data Compression Hae-sun Jung CS146 Dr. Sin-Min Lee Spring 2004.
Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5.
Lecture 12 Huffman Algorithm. In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly.
CSE 143 Lecture 22 Huffman slides created by Ethan Apter
D ESIGN & A NALYSIS OF A LGORITHM 12 – H UFFMAN C ODING Informatics Department Parahyangan Catholic University.
Design & Analysis of Algorithm Huffman Coding
Huffman Codes ASCII is a fixed length 7 bit code that uses the same number of bits to define each character regardless of how frequently it occurs. Huffman.
HUFFMAN CODES.
Compression & Huffman Codes
Algorithms for iSNE Dr. Kenneth Cosh Week 13.
ISNE101 – Introduction to Information Systems and Network Engineering
Huffman Coding Based on slides by Ethan Apter & Marty Stepp
Data Compression If you’ve ever sent a large file to a friend, you may have compressed it into a zip archive like the one on this slide before doing so.
Chapter 9: Huffman Codes
Math 221 Huffman Codes.
Huffman Coding CSE 373 Data Structures.
Huffman Encoding Huffman code is method for the compression for standard text documents. It makes use of a binary tree to develop codes of varying lengths.
Data Structure and Algorithms
Greedy Algorithms Alexandra Stefan.
Podcast Ch23d Title: Huffman Compression
Algorithms CSCI 235, Spring 2019 Lecture 30 More Greedy Algorithms
Huffman Coding Greedy Algorithm
Algorithms CSCI 235, Spring 2019 Lecture 31 Huffman Codes
Presentation transcript:

© Jalal Kawash 2010 Trees & Information Coding: 3 Peeking into Computer Science

© Jalal Kawash 2010Peeking into Computer Science Reading Assignment Mandatory: Chapter 3 – Section 3.5 2

Information Coding An application of trees 3

© Jalal Kawash 2010Peeking into Computer Science Objectives At the end of this section, the student will be able to: 1. Understand the need for variable-length coding (VLC) 2. Understand how compression works 3. Calculate the space savings with VLC 4. Understand decoding problems with VLC 5. Define non-prefix VLC 6. Use Huffman’s algorithm and binary trees to assign optimized non-prefix VLC 7. Represent BL trees as nested lists

© Jalal Kawash 2010Peeking into Computer Science Back to Coding (JT: Review) Assume we have a file that contains data composed of 6 symbols only: A, I, C, D, E, and S (for space) ACE DICE AIDE CAID EAD DAICED … 5

© Jalal Kawash 2010Peeking into Computer Science Back to Coding Assume we have a file that contains data composed of 6 symbols only: A, I, C, D, E, and S (for space) ACESDICESAIDESCAID EADSDAICED … 6

© Jalal Kawash 2010Peeking into Computer Science Coding If the file has 1000 characters, how many bits (0s and 1s) are needed to code the file? 7

© Jalal Kawash 2010Peeking into Computer Science Coding (JT: Review) The first question is How many symbols do we need to represent each character? The objective is to keep the size of the file as small as possible We have 6 characters (messages) and two symbols (0 and 1) 2 is not enough, since 2 2 is 4 8

© Jalal Kawash 2010Peeking into Computer Science 2 bits are not enough (JT: Review) 00 for A 01 for S 10 for I 11 for E We cannot represent the rest C and D 3 works, since 2 3 is 8, so we can represent up to 8 characters and we only have 6 9

© Jalal Kawash 2010Peeking into Computer Science 3 bits are more than enough (JT: Review) Say 000 for A 001 for S 010 for I 011 for E 100 for C 101 for D 110 not used 111 not used 10

© Jalal Kawash 2010Peeking into Computer Science Coding (JT: Review) If the file has 1000 characters, how many bits (0s and 1s) are needed to code the file? Each character needs 3 bits Hence, we need 3x1000 = 3000 bits 11

© Jalal Kawash 2010Peeking into Computer Science Compression How does file compression work? Huffman’s codes reduce the size of a file by using variable-length codes for characters 12

© Jalal Kawash 2010Peeking into Computer Science Compression Analyzing the file we find that 35% are S 28% of the characters are A 20% are E 7% are I 6% are C 4% are D Some characters are more frequent than others 13

© Jalal Kawash 2010Peeking into Computer Science Variable-Length Codes To reduce the size of the file, we can use shorter codes for frequent characters ◦We can use longer codes for infrequent characters 35% are S (use 2 bits for S) 28% are A (use 2 bits for A) 20% are E (use 2 bits for E) 7% are I (use 3 bits for I) 6% are C (use 4 bits for C) 4% are D (use 4 bits for D) 14

© Jalal Kawash 2010Peeking into Computer Science Compressed File Size If we use this coding, what is the size of the file? 350 S’s (35% of 1000) require 700 bits (2 bits for each S) 200 E’s require 400 bits (2 bits) 280 A’s require 560 bits (2 bits) 70 I’s require 210 bits (3 bits) 60 C’s require 240 bits (4 bits) 40 D’s require 160 bits (4 bits) Total is 2270 bits Recall that with fixed codes, the size is 3000 bits Compressed file size is about 76% of original size 15

© Jalal Kawash 2010Peeking into Computer Science Problems with Variable-Length Codes Not any variable-length code works Assume A’s code is 0 C’s code is 1 E’s code is 01 The code 0101 could correspond to ACE, EAC, ACAC, or EE 16

© Jalal Kawash 2010Peeking into Computer Science Prefix Codes Codes that work must have the property: No code can be the prefix of another code Called Non-Prefix Codes 0 is a prefix of 01, this is why our coding failed Another example: 0101 is a prefix of

© Jalal Kawash 2010Peeking into Computer Science Non-Prefix Codes Non-Prefix codes can be generated using a binary tree Start from a binary tree Label edges to left children with 0 Label edges to right children with 1 Record the labels on the path from the root to the leaves Each path corresponds to a non-prefix code 18

© Jalal Kawash 2010Peeking into Computer Science Non-Prefix Codes from a Binary Tree

© Jalal Kawash 2010Peeking into Computer Science Non-Prefix Codes from a Binary Tree A’s code: 00 B’s 0100 C’s 0101 D’s 011 E’s 10 F’s 11 EF A D CB

© Jalal Kawash 2010Peeking into Computer Science No code is a prefix of another EF A D CB

© Jalal Kawash 2010Peeking into Computer Science No Confusion A:00, B:0100, C:0101, D:011, E:10, F: No other interpretation ◦Parsing left to right ◦(JT’s extra: parsing refers to ‘reading’ or ‘breaking into meaningful portions’) BAD 22

© Jalal Kawash 2010Peeking into Computer Science No Confusion A:00, B:0100, C:0101, D:011, E:10, F: So How do we generate such codes? FEDE 23

© Jalal Kawash 2010Peeking into Computer Science Huffman’s Coding Build a binary tree The characters in a file are the leaves The most frequent characters should be closer to the root, generating shorter codes Assign the codes, based on this tree 24

© Jalal Kawash 2010Peeking into Computer Science Huffman's Coding – Step 1 1. Assign to each symbol its weight (frequency) Each of these is a tree of size one! ACDESI

© Jalal Kawash 2010Peeking into Computer Science Huffman's Coding – Step 2 2. Choose two trees that have the minimum weights ◦Replace these two trees with a new tree with new root ◦Make the tree with the smaller weight a right child ◦The weight of the new tree is the sum of old weights (JT: next slide) ACDESI DC New tree New tree weight 6+4 = 10 26

© Jalal Kawash 2010Peeking into Computer Science Huffman's Coding – Step 2 2. Choose two trees that have the minimum weights ◦Replace these two trees with a new tree with new root ◦Make the tree with the smaller weight a right child ◦The weight of the new tree is the sum of old weights AESI DC 10 27

© Jalal Kawash 2010Peeking into Computer Science Huffman's Coding – Repeat Repeat Step 2 until we have a single tree AESI DC 10 28

© Jalal Kawash 2010Peeking into Computer Science Step 2 Again 2. Choose two trees that have the minimum weights ◦Replace these two trees with a new tree with new root ◦Make the tree with the smaller weight a right child ◦The weight of the new tree is the sum of old weights AESI DC 10 DC I 29

© Jalal Kawash 2010Peeking into Computer Science Step 2 Again 2. Choose two trees that have the minimum weights ◦Replace these two trees with a new tree with new root ◦Make the tree with the smaller weight a right child ◦The weight of the new tree is the sum of old weights AESI DC 10 I DC New tree weight 10+7 = 17 30

© Jalal Kawash 2010Peeking into Computer Science A S E 20 I DC 17 31

© Jalal Kawash 2010Peeking into Computer Science Step 2 Again A S E 20 I DC 17 32

© Jalal Kawash 2010Peeking into Computer Science S 35 A 28 E 20 I DC 17 33

© Jalal Kawash 2010Peeking into Computer Science S 35 A 28 E I DC Weight is = 37 E

© Jalal Kawash 2010Peeking into Computer Science S 35 A 28 E I DC 37 Weight is = 37 35

© Jalal Kawash 2010Peeking into Computer Science S 35 A 28 E I DC 37 Weight is =

© Jalal Kawash 2010Peeking into Computer Science S A E I DC

© Jalal Kawash 2010Peeking into Computer Science S A E I DC

© Jalal Kawash 2010Peeking into Computer Science S A E I DC 39

© Jalal Kawash 2010Peeking into Computer Science 3. Label left edges with 0 and right edges with 1 Step 3 SA E I DC

© Jalal Kawash 2010Peeking into Computer Science Read the labels on the path from the root to each leaf, this is its code SA E I DC

© Jalal Kawash 2010Peeking into Computer Science A’s code is 01 SA E I DC

© Jalal Kawash 2010Peeking into Computer Science A’s code is 01 C’s is 1100 S’s is 00 E’s is 10 D’s is1101 I’s is 111 SA E I DC

© Jalal Kawash 2010Peeking into Computer Science More Frequent Characters have shorter codes 35% are S (00, 2 bits) 28% are A (01, 2 bits) 20% are E (10, 2 bits) 7% are I (111, 3 bits) 6% are C (1100, 4 bits) 4% are D (1101, 4 bits) 44

© Jalal Kawash 2010Peeking into Computer Science Recall This Analysis? If we use this coding, what is the size of the file? 350 S’s (35% of 1000) require 700 bits (2 bits for each S) 280 A’s require 560 bits 200 E’s require 400 bits 70 I’s require 210 bits 60 C’s require 240 bits 40 D’s require 160 bits Total is 2270 bits Recall that with fixed codes, the size is 3000 bits Compressed file size is about 76% of original size 45

© Jalal Kawash 2010Peeking into Computer Science David Huffman US Electrical Engineer Contributions in coding theory, signal design for radar and communications, and logic circuits He wrote his coding algorithm as a graduate students at MIT 46

© Jalal Kawash 2010Peeking into Computer Science Concluding Notes Even though coding gave C and D codes of length 4 (compared to 3 in fixed coding), it was beneficial No code generated by Huffman’s method can be a prefix of another code Many compression tools use a combination of different coding methods, Huffman’s is among them 47

© Jalal Kawash 2010Peeking into Computer Science JT’s Extra: Uses Of Compression Image and video files Hi-res: JPEG 322 KB Hi-res: TIF 14.1 MB Images: Courtesy of James Tam