CSE 143 Lecture 24 Priority Queues; Huffman Encoding slides created by Marty Stepp and Daniel Otero

Slides:



Advertisements
Similar presentations
Lecture 4 (week 2) Source Coding and Compression
Advertisements

Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
Priority Queues. 2 Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-first-out The “smallest” element.
Trees Chapter 8.
Huffman Coding: An Application of Binary Trees and Priority Queues
Fall 2007CS 2251 Trees Chapter 8. Fall 2007CS 2252 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information.
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Version TCSS 342, Winter 2006 Lecture Notes Priority Queues Heaps.
1 TCSS 342, Winter 2005 Lecture Notes Priority Queues and Heaps Weiss Ch. 21, pp
Priority Queues. Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-first-out –The “smallest” element.
Priority Queues. Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-first-out –The “smallest” element.
CSE 143 Lecture 18 Huffman slides created by Ethan Apter
Building Java Programs Priority Queues, Huffman Encoding.
CSE 373 Data Structures and Algorithms Lecture 13: Priority Queues (Heaps)
Priority Queues, Heaps & Leftist Trees
Building Java Programs
1 CSC 427: Data Structures and Algorithm Analysis Fall 2010 transform & conquer  transform-and-conquer approach  balanced search trees o AVL, 2-3 trees,
CSE Lectures 22 – Huffman codes
Maps A map is an object that maps keys to values Each key can map to at most one value, and a map cannot contain duplicate keys KeyValue Map Examples Dictionaries:
CS 46B: Introduction to Data Structures July 30 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak.
1 HEAPS & PRIORITY QUEUES Array and Tree implementations.
Data Structures Arrays both single and multiple dimensions Stacks Queues Trees Linked Lists.
Trees. Tree Terminology Chapter 8: Trees 2 A tree consists of a collection of elements or nodes, with each node linked to its successors The node at the.
CSE 143 Lecture 22 Priority Queues; Huffman Encoding slides created by Marty Stepp, Hélène Martin, and Daniel Otero
MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro.
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
Data Structures Week 6: Assignment #2 Problem
Information and Computer Sciences University of Hawaii, Manoa
CSE 143 Lecture 23 Priority Queues and Huffman Encoding
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Spring 2010CS 2251 Trees Chapter 6. Spring 2010CS 2252 Chapter Objectives Learn to use a tree to represent a hierarchical organization of information.
data ordered along paths from root to leaf
1 Heaps and Priority Queues Starring: Min Heap Co-Starring: Max Heap.
Priority Queues Dr. David Matuszek
Priority Queues, Trees, and Huffman Encoding CS 244 This presentation requires Audio Enabled Brent M. Dingle, Ph.D. Game Design and Development Program.
1 Heaps and Priority Queues v2 Starring: Min Heap Co-Starring: Max Heap.
Building Java Programs Priority Queues, Huffman Encoding.
Main Index Contents 11 Main Index Contents Complete Binary Tree Example Complete Binary Tree Example Maximum and Minimum Heaps Example Maximum and Minimum.
Chapter 13 Priority Queues. 2 Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-in-first-out The “smallest”
CSE 373: Data Structures and Algorithms Lecture 11: Priority Queues (Heaps) 1.
1. What is it? It is a queue that access elements according to their importance value. Eg. A person with broken back should be treated before a person.
CS 367 Introduction to Data Structures Lecture 8.
Properties: -The value in each node is greater than all values in the node’s subtrees -Complete tree! (fills up from left to right) Max Heap.
CSE 143 Lecture 22 Huffman slides created by Ethan Apter
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Lecture on Data Structures(Trees). Prepared by, Jesmin Akhter, Lecturer, IIT,JU 2 Properties of Heaps ◈ Heaps are binary trees that are ordered.
Design & Analysis of Algorithm Huffman Coding
HUFFMAN CODES.
COMP261 Lecture 22 Data Compression 2.
Tries 07/28/16 11:04 Text Compression
Priority Queues and Heaps Suggested reading from Weiss book: Ch. 21
Huffman Coding Based on slides by Ethan Apter & Marty Stepp
CSE 143 Lecture 24 Priority Queues; Huffman Encoding
Data Compression If you’ve ever sent a large file to a friend, you may have compressed it into a zip archive like the one on this slide before doing so.
Prioritization problems
Chapter 8 – Binary Search Tree
Part-D1 Priority Queues
CSE 373: Data Structures and Algorithms
Lecture 24: Priority Queues and Huffman Encoding
Priority Queues.
Huffman Coding CSE 373 Data Structures.
Priority Queues & Heaps
Heaps and Priority Queues
Priority Queues.
CSE 373 Priority queue implementation; Intro to heaps
Priority Queues.
Podcast Ch23d Title: Huffman Compression
Presentation transcript:

CSE 143 Lecture 24 Priority Queues; Huffman Encoding slides created by Marty Stepp and Daniel Otero

2 Prioritization problems The CSE lab printers constantly accept and complete jobs from all over the building. Suppose we want them to print faculty jobs before staff before student jobs, and grad students before undergraduate students, etc.? You are in charge of scheduling patients for treatment in the ER. A gunshot victim should probably get treatment sooner than that one guy with a sore neck, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive? Why can't we solve these problems efficiently with the data structures we have (list, sorted list, map, set, BST, etc.)?

3 Some poor choices list : store customers/jobs in a list; remove min/max by searching (O(N)) –problem: expensive to search sorted list : store in sorted list; binary search it in O(log N) time –problem: expensive to add/remove (O(N)) binary search tree : store in BST, search in O(log N) time for min element –problem: tree could be unbalanced 

4 Priority queue ADT priority queue: a collection of ordered elements that provides fast access to the minimum (or maximum) element –usually implemented using a tree structure called a heap priority queue operations: –add adds in order;O(log N) worst –peek returns minimum value;O(1) always –remove removes/returns minimum value;O(log N) worst –isEmpty, clear, size, iterator O(1) always

5 Java's PriorityQueue class public class PriorityQueue implements Queue Queue pq = new PriorityQueue (); pq.add("Stuart"); pq.add("Marty");... Method/ConstructorDescriptionRuntime PriorityQueue () constructs new empty queue O(1) add( E value) adds value in sorted order O(log N ) clear() removes all elements O(1) iterator() returns iterator over elements O(1) peek() returns minimum element O(1) remove() removes/returns min element O(log N )

6 Inside a priority queue Usually implemented as a heap: a kind of binary tree. Instead of sorted left  right, it's sorted top  bottom –guarantee: each child is greater (lower priority) than its ancestors –add/remove causes elements to "bubble" up/down the tree –(take CSE 332 or 373 to learn about implementing heaps!)

7 Exercise: Firing Squad We have decided that TA performance is unacceptably low. –We must fire all TAs with  2 quarters of experience. Write a class FiringSquad. –Its main method should read a list of TAs from a file, find all with sub-par experience, and fire/replace them. –Print the final list of TAs to the console, sorted by experience. –Input format: name quarters Lisa 0 name quarters Kasey 5 name quarters Stephanie 2

8 Priority queue ordering For a priority queue to work, elements must have an ordering –in Java, this means implementing the Comparable interface Reminder: public class Foo implements Comparable { … public int compareTo(Foo other) { // Return positive, zero, or negative number... }

9 Homework 8 (Huffman Tree)

10 File compression compression: Process of encoding information in fewer bits. –But isn't disk space cheap? Compression applies to many things: –store photos without exhausting disk space –reduce the size of an attachment –make web pages smaller so they load faster –reduce media sizes (MP3, DVD, Blu-Ray) –make voice calls over a low-bandwidth connection (cell, Skype) Common compression programs: –WinZip or WinRAR for Windows –Stuffit Expander for Mac

11 ASCII encoding ASCII: Mapping from characters to integers (binary bits). –Maps every possible character to a number ( 'A'  65) –uses one byte (8 bits) for each character –most text files on your computer are in ASCII format CharASCII valueASCII (binary) ' 'a' 'b' 'c' 'e' 'z'

12 Huffman Encoding Huffman encoding: Uses variable lengths for different characters to take advantage of their relative frequencies. –Some characters occur more often than others. If those characters used < 8 bits each, the file would be smaller. –Other characters need > 8, but that's OK; they're rare. CharASCII valueASCII (binary) Hypothetical Huffman ' 'a' 'b' 'c' 'e' 'z'

13 Huffman compression The following major steps can be used to compress a file: 1. Count the occurrences of each character in the file. 2. Place the characters and their counts into a priority queue. 3. Use the priority queue to create a special binary tree called a Huffman tree. 4. Traverse the Huffman tree to find a (character -> binary) mapping from each character to its binary encoding. 5. For each character in the original file, convert it into its compressed binary encoding.

14 1) Counting characters step 1: count occurrences of characters into a map (provided) –example input file contents: ab ab cab counts: {' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1} –every file actually ends with a single invisible character indicating the end-of-file (EOF) byte char 'a''b'' 'a''b'' 'c''a''b'EOF ASCII binary N/A

15 2) Create priority queue step 2: place characters and counts into a priority queue –store a single character and its count as a Huffman node object –the priority queue will organize them into ascending order

16 3) Build Huffman tree step 2: create "Huffman tree" from the node counts algorithm: Put all node counts into a priority queue. while P.Q. size > 1: –Remove two rarest characters. –Combine into a single node with these two as its children.

17 Build tree example

18 4) Tree -> binary encodings The Huffman tree you have created tells you the compressed binary encoding to use for every character in the file. –left branches mean 0, right branches mean 1 –example: 'b' is 10 –What are the binary encodings of: EOF, ' ', 'b', 'a' ?

19 5) compress the actual file Based on the preceding tree, we have the following encodings: {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011} –Using this map, we can encode the file into a shorter binary representation. The text ab ab cab would be encoded as: –Overall: , (22 bits, ~3 bytes) char 'a''b'' 'a''b'' 'c''a''b'EOF binary byte 123 char a b ab c a b EOF binary

20 Decompressing How do we decompress a file of Huffman-compressed bits? useful "prefix property" –No encoding A is the prefix of another encoding B –I.e. never will have x  011 and y  the algorithm: –Read each bit one at a time from the input. –If the bit is 0, go left in the tree; if it is 1, go right. –If you reach a leaf node, output the character at that leaf and go back to the tree root.

21 Decompressing Use the tree to decompress a compressed file with these bits: the algorithm: –Read each bit one at a time. –If it is 0, go left; if 1, go right. –If you reach a leaf, output the character there and go back to the tree root. Output: bac aca

22 Public methods to write HuffmanTree(Map counts) –Given a Map of counts/char in a file, creates its Huffman tree. public Map createEncodings() –Traverses Huffman tree to produce a mapping from each character in the tree to its binary representation. Example: {' '=010, 'a'=11, 'b'=00, 'd'=011, 'n'=10} public void compress(InputStream in, BitOutputStream out) throws IOException –Reads text data from given input file and uses encodings to write a compressed version of the data to the given output file public public void decompress(BitInputStream in, OutputStream out) throws IOException –Reads compressed binary data from given input and uses Huffman tree to write decompressed version to the output.

23 Provided Bit I/O Streams Java's input/output streams read/write 1 byte (8 bits) at a time. –We want to read/write one single bit at a time. BitInputStream : Reads one bit at a time from input. BitOutputStream : Writes one bit at a time to output. public BitInputStream(InputStream in) Creates stream to read bits from given input public int readBit() Reads a single 1 or 0; returns -1 at end of file public boolean hasNextBit() Returns true iff another bit can be read public void close() Stops reading from the stream public BitOutputStream(OutputStream out) Creates stream to write bits to given output public void writeBit(int bit) Writes a single bit public void writeBits(String bits) Treats each character of the given string as a bit ( '0' or '1' ) and writes each of those bits to the output public void close() Stops reading from the stream