9.4 Huffman Trees
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna

Encoding
In computer technology, encoding is the process of putting a sequence of characters into a special format for transmission or storage. The term is also used for analog-to-digital conversion, and it applies to any type of data: text, images, audio, video, or multimedia.

Huffman Code
Huffman coding is an entropy-encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the code table has been derived from the estimated probability of occurrence of each possible value of the source symbol.

Huffman Code
Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits produces a smaller average output size when the actual symbol frequencies agree with those used to create the code.

Huffman Code
Huffman's method is fairly efficient: constructing the tree takes O(n log n) operations for n symbols. A method was later found that builds a Huffman code in linear time when the input probabilities (also known as weights) are already sorted.
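
As an illustration of the sorted case, here is a minimal Python sketch of the standard two-queue trick (the function name and its return value are my own choices, not from the slides). Leaves are consumed from one FIFO queue and merge results from another; because merge sums come out in nondecreasing order, each step finds the two smallest weights without a heap.

from collections import deque

def huffman_cost_sorted(weights):
    # Two-queue Huffman merging for weights sorted in ascending order: O(n).
    # Returns the weighted path length of the tree (the sum of all merge sums);
    # for probabilities summing to 1, this equals the average code length.
    leaves = deque(weights)   # unmerged leaf weights, ascending
    merged = deque()          # merge results; produced in nondecreasing order
    total = 0.0

    def pop_smallest():
        # Take from whichever queue currently holds the smaller front weight.
        if merged and (not leaves or merged[0] < leaves[0]):
            return merged.popleft()
        return leaves.popleft()

    while len(leaves) + len(merged) > 1:
        s = pop_smallest() + pop_smallest()   # merge the two smallest trees
        total += s
        merged.append(s)
    return total

print(huffman_cost_sorted([0.1, 0.15, 0.15, 0.2, 0.4]))  # ≈ 2.2, the example used on the next slides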

Huffman Trees
Initialize n single-node trees, one labeled with each symbol of the given alphabet, and record the frequency of each symbol in its tree's root to indicate the tree's weight. Then repeat the following steps until a single tree is obtained: find the two trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight.

Huffman Trees
Example: construct a Huffman tree from the following data.

Symbol      A     B     C     D     _
Frequency   0.4   0.1   0.2   0.15  0.15
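
Tracing the algorithm from the previous slide on this data makes the construction concrete. Each step below merges the two smallest weights, and the sums are exactly the internal-node weights of the tree on the next slide (the pairing of the two 0.15 weights follows that figure):

0.10 (B) + 0.15 (D) = 0.25
0.15 (_) + 0.20 (C) = 0.35
0.25 + 0.35 = 0.60
0.40 (A) + 0.60 = 1.00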

In order to generate binary prefix-free codes for each symbol in our alphabet, we label each left edge of the tree with 0 and every right edge with 1.

[Figure: the completed Huffman tree. The root (weight 1.0) has leaf A (0.4) as its 0-child and an internal node of weight 0.6 as its 1-child; that node's children are internal nodes of weights 0.25 (leaves B 0.1 and D 0.15) and 0.35 (leaves _ 0.15 and C 0.2).]

Huffman Trees
Resulting codewords:

Symbol      A     B     C     D     _
Frequency   0.4   0.1   0.2   0.15  0.15
Codeword    0     100   111   101   110

Example: CAB_ is encoded as 1110100110.
Average number of bits per symbol = 1 ∙ 0.4 + 3 ∙ 0.1 + 3 ∙ 0.2 + 3 ∙ 0.15 + 3 ∙ 0.15 = 2.2
Compression ratio = (3 − 2.2)/3 ∙ 100 ≈ 27% less memory used than fixed-length encoding (5 symbols need 3 bits each in a fixed-length code).
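
A quick sanity check of those two numbers, using nothing beyond the table above:

avg = 1*0.4 + 3*0.1 + 3*0.2 + 3*0.15 + 3*0.15   # 2.2 bits per symbol on average
ratio = (3 - avg) / 3 * 100                     # ≈ 26.7% savings over 3-bit fixed-length codes
print(avg, ratio)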

Pseudocode

HuffmanTree(B[0..n − 1])
// Constructs Huffman's tree
// Input: An array B[0..n − 1] of weights
// Output: A Huffman tree with the given weights assigned to its leaves
initialize priority queue S of size n with single-node trees
    and priorities equal to the elements of B[0..n − 1]
while S has more than one element do
    Tl ← the smallest-weight tree in S
    delete the smallest-weight tree in S
    Tr ← the smallest-weight tree in S
    delete the smallest-weight tree in S
    create a new tree T with Tl and Tr as its left and right subtrees
        and with weight equal to the sum of Tl's and Tr's weights
    insert T into S
return T
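
Below is a minimal runnable Python version of this pseudocode, using heapq as the priority queue. The names (Node, huffman_tree, codewords) are my own, and ties between equal weights may be broken differently than in the slides, so individual codewords can differ from the table while their lengths remain optimal.

import heapq

class Node:
    # A Huffman tree node: a leaf stores a symbol, an internal node stores two subtrees.
    def __init__(self, weight, symbol=None, left=None, right=None):
        self.weight = weight
        self.symbol = symbol
        self.left = left
        self.right = right

    def __lt__(self, other):          # lets heapq order nodes by weight
        return self.weight < other.weight

def huffman_tree(weights):
    # Build a Huffman tree from a dict mapping symbols to weights.
    heap = [Node(w, symbol=s) for s, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:              # while S has more than one element
        t_l = heapq.heappop(heap)     # smallest-weight tree
        t_r = heapq.heappop(heap)     # next-smallest-weight tree
        heapq.heappush(heap, Node(t_l.weight + t_r.weight, left=t_l, right=t_r))
    return heap[0]

def codewords(tree, prefix=""):
    # Label left edges 0 and right edges 1; read each leaf's codeword off its path.
    if tree.symbol is not None:       # leaf: its accumulated path is its codeword
        return {tree.symbol: prefix or "0"}
    codes = {}
    codes.update(codewords(tree.left, prefix + "0"))
    codes.update(codewords(tree.right, prefix + "1"))
    return codes

freq = {"A": 0.4, "B": 0.1, "C": 0.2, "D": 0.15, "_": 0.15}
codes = codewords(huffman_tree(freq))
print(codes)                              # codeword lengths: A is 1 bit, the rest 3 bits
print("".join(codes[s] for s in "CAB_"))  # one valid encoding of CAB_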

Huffman Encoding
Huffman encoding is optimal among all encodings that assign a codeword to each individual letter based on its frequency. Algorithms that encode more than one letter at a time can achieve better compression, but they require more analysis of the file. The idea is that if some letters are more frequent than others, we represent the frequent ones with fewer bits.

Compression Ratio
The compression ratio is a standard measure used to compare encodings:

    compression ratio = (fixed bit length − coded bit length) / (fixed bit length) ∙ 100

This gives the percentage of memory saved by the encoding relative to a fixed-length encoding (one where all code strings have the same length). To compare two different algorithms, we would substitute their average bit lengths in place of the fixed length.

Compression Ratio cont.

Event Name   Probability   Code   Length
A            0.3           00     2
B            0.3           01     2
C            0.13          100    3
D            0.12          101    3
E            0.1           110    3
F            0.05          111    3

Huffman avg bits = (.3 * 2) + (.3 * 2) + (.13 * 3) + (.12 * 3) + (.1 * 3) + (.05 * 3) = 2.4
Compression ratio = ((3 - 2.4) / 3) * 100 = 20%
This Huffman encoding uses 20% less memory than its fixed-length equivalent. Extensive testing with Huffman coding has shown that the savings typically fall between 20% and 80%, depending on the text.

Real-Life Application
Huffman trees aren't used only in encoding; they can be applied to any problem involving "yes or no" decision making, that is, asking a series of questions with only two possible answers (true or false, heads or tails, etc.). By breaking a problem down into a series of yes-or-no questions, you can build a binary tree of the possible outcomes. This is called a decision tree.

But What Will A Huffman Tree Do?
Huffman's algorithm builds, from a given set of weights, the tree with the minimum weighted path length. This means that the paths from the root to the leaves of the resulting binary tree are, on average, as short as possible.

[Figure: two four-leaf trees compared side by side. In the first, reaching a given leaf takes an average path length of 2; in the second, the average path length to a given leaf is 1.5.]

So…?
A Huffman tree is essentially an optimal binary tree that gives frequently accessed nodes shorter paths and less frequently accessed nodes longer paths. When applied to a decision tree, this means that we will reach the solution, on average, with fewer "questions". Hence, we solve the problem faster.

Let’s Try It!
Consider the problem of guessing which cup a marble is under. There are 4 cups (c1, c2, c3, and c4), and you do not get to see which cup the marble is placed under. One decision tree asks "Is it c1?", then "Is it c2?", then "Is it c3?", taking the "yes" branch to a cup as soon as the answer is yes; a final "no" means c4.

[Figure: a chain-shaped decision tree; each question's "yes" branch leads directly to a cup and its "no" branch leads to the next question.]

With each cup equally likely, the average path length is (.25)(1) + (.25)(2) + (.25)(3) + (.25)(3) = 2.25.

Let’s Try It!
But what if the person hiding the marble, who cannot be truly random, is more likely to put it under c4? Assume that c4 now has a 40% chance, rather than 25%, and the other cups each have a 20% chance. A better tree asks "Is it c4?" first, then "Is it c3?", then "Is it c2?".

[Figure: a chain-shaped decision tree with c4 (0.4) reached after one question, c3 (0.2) after two, and c2 and c1 (0.2 each) after three.]

Why is this better than the balanced tree, which has an average of 2, when this chain shape seemingly averages 2.25? Because of c4's weight, we are more likely to pick c4 on the first question. Therefore, we save time, and the weighted average is (.4)(1) + (.2)(2) + (.2)(3) + (.2)(3) = 2.
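
With the huffman_tree and codewords sketches from the pseudocode slide on hand, the expected number of questions can be checked directly (again a sketch; tie-breaking may produce a different but equally good tree):

cups = {"c1": 0.2, "c2": 0.2, "c3": 0.2, "c4": 0.4}
codes = codewords(huffman_tree(cups))           # each "bit" corresponds to one yes/no question
expected = sum(cups[c] * len(code) for c, code in codes.items())
print(expected)                                 # 2.0 questions on average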