Lecture 24: Priority Queues and Huffman Encoding


CSE 143 Lecture 24: Priority Queues and Huffman Encoding

Prioritization problems
  - ER scheduling: You are in charge of scheduling patients for treatment in the ER. A gunshot victim should probably get treatment sooner than that one guy with a sore neck, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive?
  - print jobs: The CSE lab printers constantly accept and complete jobs from all over the building. Suppose we want them to print faculty jobs before staff jobs before student jobs, grad student jobs before undergraduate jobs, and so on.
  - What would be the runtime of solutions to these problems using the data structures we know (list, sorted list, map, set, BST, etc.)?

Inefficient structures
  - list: store jobs in a list; remove min/max by searching (O(N))
      problem: expensive to search
  - sorted list: store in sorted list; binary search it in O(log N) time
      problem: expensive to add/remove (O(N))
  - binary search tree: store in BST, go right for max in O(log N)
      problem: tree becomes unbalanced

Priority queue ADT
  - priority queue: a collection of ordered elements that provides fast access to the minimum (or maximum) element
  - priority queue operations:
      add       adds in order; O(log N) worst
      peek      returns minimum value; O(1) always
      remove    removes/returns minimum value; O(log N) worst
      isEmpty, clear, size, iterator    O(1) always

Java's PriorityQueue class
    public class PriorityQueue<E> implements Queue<E>

    Queue<String> pq = new PriorityQueue<String>();
    pq.add("Stuart");
    pq.add("Allison");
    ...

  Method/Constructor      Description                      Runtime
  PriorityQueue<E>()      constructs new empty queue       O(1)
  add(E value)            adds value in sorted order       O(log N)
  clear()                 removes all elements
  iterator()              returns iterator over elements
  peek()                  returns minimum element
  remove()                removes/returns min element
  size()                  number of elements in queue
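The calls in the table above can be tried out with a short program. This is only a minimal sketch: the class name PriorityQueueDemo, the helper drain, and the third name "Marty" are made up for illustration. Strings are removed in their natural (alphabetical) order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class PriorityQueueDemo {
    // Hypothetical helper: removes every element and returns them in removal order.
    static List<String> drain(Queue<String> pq) {
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.remove());   // remove() always hands back the current minimum
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<String> pq = new PriorityQueue<>();
        pq.add("Stuart");
        pq.add("Allison");
        pq.add("Marty");                 // extra name, not from the slides
        System.out.println(pq.peek());   // looks at the minimum without removing it
        System.out.println(drain(pq));   // elements come out smallest first
    }
}
```

Note that printing or iterating over a PriorityQueue directly does not promise sorted order; only repeated remove() calls do.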

Inside a priority queue
  - Usually implemented as a heap, a kind of binary tree.
  - Instead of sorted left → right, it's sorted top → bottom
      guarantee: each child is greater (lower priority) than its ancestors
      add/remove causes elements to "bubble" up/down the tree
      (take CSE 332 or 373 to learn about implementing heaps!)
  - example heap, listed level by level from the root: 10 20 80 40 60 85 90 50 99 65
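To make the "bubble up/down" idea concrete, here is a rough sketch of a min-heap of ints stored in a list, where the children of index i sit at 2i+1 and 2i+2. The class name MinHeapSketch and its layout are assumptions for illustration; this is not how the lecture or java.util.PriorityQueue is actually coded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class MinHeapSketch {
    private final List<Integer> data = new ArrayList<>();

    boolean isEmpty() {
        return data.isEmpty();
    }

    // Adds at the bottom, then bubbles the value up while it is smaller than its parent.
    void add(int value) {
        data.add(value);
        int i = data.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (data.get(parent) <= data.get(i)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Removes the root (minimum), moves the last element to the root, and bubbles it down.
    int remove() {
        if (data.isEmpty()) {
            throw new NoSuchElementException("heap is empty");
        }
        int min = data.get(0);
        int last = data.remove(data.size() - 1);
        if (!data.isEmpty()) {
            data.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1;
                int right = 2 * i + 2;
                int smallest = i;
                if (left < data.size() && data.get(left) < data.get(smallest)) {
                    smallest = left;
                }
                if (right < data.size() && data.get(right) < data.get(smallest)) {
                    smallest = right;
                }
                if (smallest == i) {
                    break;
                }
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int a, int b) {
        int tmp = data.get(a);
        data.set(a, data.get(b));
        data.set(b, tmp);
    }
}
```

Each add or remove follows one root-to-leaf path, which is where the O(log N) bound comes from.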

Exercise: Fire the TAs
  - We have decided that novice TAs should all be fired.
  - Write a class TAManager that reads a list of TAs from a file.
  - Find all with ≤ 2 quarters experience, and replace them.
  - Print the final list of TAs to the console, sorted by experience.
  - Input format: one "name quarters" pair per TA, for example:
      Will 3
      Ying 2
      Andrew 1

Priority queue ordering
  - For a priority queue to work, elements must have an ordering
      in Java, this means implementing the Comparable interface
  - Reminder:
      public class Foo implements Comparable<Foo> {
          ...
          public int compareTo(Foo other) {
              // Return positive, zero, or negative integer
          }
      }
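As one possible illustration of the Comparable reminder, the sketch below orders TAs by quarters of experience so that a PriorityQueue removes the least experienced first. The TA class, its fields, and the in-memory data are assumptions; the exercise itself reads from a file and is not solved here.

```java
import java.util.PriorityQueue;

// Hypothetical element type: a TA ordered by quarters of experience.
class TA implements Comparable<TA> {
    final String name;
    final int quarters;

    TA(String name, int quarters) {
        this.name = name;
        this.quarters = quarters;
    }

    // Negative if this TA has less experience, zero if equal, positive if more.
    @Override
    public int compareTo(TA other) {
        return Integer.compare(quarters, other.quarters);
    }

    @Override
    public String toString() {
        return name + " (" + quarters + ")";
    }

    public static void main(String[] args) {
        PriorityQueue<TA> pq = new PriorityQueue<>();
        pq.add(new TA("Will", 3));
        pq.add(new TA("Ying", 2));
        pq.add(new TA("Andrew", 1));
        while (!pq.isEmpty()) {
            System.out.println(pq.remove());   // least experienced TA comes out first
        }
    }
}
```

Integer.compare is used instead of subtracting the two counts, which sidesteps overflow for extreme values.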

Homework 8 (Huffman Coding)

File compression
  - compression: Process of encoding information in fewer bits.
  - But isn't disk space cheap? Compression applies to many things:
      store photos without exhausting disk space
      reduce the size of an e-mail attachment
      make web pages smaller so they load faster
      reduce media sizes (MP3, DVD, Blu-Ray)
      make voice calls over a low-bandwidth connection (cell, Skype)
  - Common compression programs:
      WinZip or WinRAR for Windows
      Stuffit Expander for Mac

ASCII encoding
  - ASCII: Mapping from characters to integers (binary bits).
      Maps every possible character to a number ('A' → 65)
      uses one byte (8 bits) for each character
      most text files on your computer are in ASCII format

  Char    ASCII value    ASCII (binary)
  ' '     32             00100000
  'a'     97             01100001
  'b'     98             01100010
  'c'     99             01100011
  'e'     101            01100101
  'z'     122            01111010

Huffman encoding
  - Huffman encoding: Uses variable lengths for different characters to take advantage of their relative frequencies.
      Some characters occur more often than others. If those characters use < 8 bits each, the file will be smaller.
      Other characters need > 8, but that's OK; they're rare.

  Char    ASCII value    ASCII (binary)    Hypothetical Huffman
  ' '     32             00100000          10
  'a'     97             01100001          0001
  'b'     98             01100010          01110100
  'c'     99             01100011          001100
  'e'     101            01100101          1100
  'z'     122            01111010          00100011110

Huffman's algorithm
  - The idea: Create a "Huffman Tree" that will tell us a good binary representation for each character.
      Left means 0, right means 1. example: 'b' is 10
      More frequent characters will be "higher" in the tree (have a shorter binary value).
  - To build this tree, we must do a few steps first:
      Count occurrences of each unique character in the file.
      Use a priority queue to order them from least to most frequent.

Huffman compression
  1. Count the occurrences of each character in file
       {' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1}
  2. Place characters and counts into priority queue
  3. Use priority queue to create Huffman tree
  4. Traverse tree to find (char → binary) map
       {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}
  5. For each char in file, convert to compressed binary version
       11 10 00 11 10 00 010 11 10 011, followed by 00 to fill out the last byte

1) Count characters
  - step 1: count occurrences of characters into a map
  - example input file contents: ab ab cab (9 bytes, numbered 1 to 9)
  - counts array: (in HW8, we do this part for you)

  char      'a'         'b'         ' '         'c'
  ASCII     97          98          32          99
  binary    01100001    01100010    00100000    01100011
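The counting step can be sketched in a few lines. This is an illustration only, since HW8 supplies this part: the class name CharCounts and the choice of a 256-entry array indexed by character code are assumptions, and the sketch expects plain ASCII text.

```java
class CharCounts {
    // Returns an array where counts[c] is the number of times the character
    // with code c occurs. Assumes every character code is below 256.
    static int[] countChars(String text) {
        int[] counts = new int[256];
        for (int i = 0; i < text.length(); i++) {
            counts[text.charAt(i)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countChars("ab ab cab");
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                System.out.println("'" + (char) c + "' = " + counts[c]);
            }
        }
    }
}
```

For the example file this gives 3 for 'a', 3 for 'b', 2 for the space, and 1 for 'c'; the EOF entry in the slides is a pseudo-character added separately.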

2) Create priority queue
  - step 2: place characters and counts into a priority queue
      store a single character and its count as a Huffman node object
      the priority queue will organize them into ascending order

3) Build Huffman tree
  - step 3: create "Huffman tree" from the node counts
  - algorithm:
      Put all node counts into a priority queue.
      while P.Q. size > 1:
          Remove two rarest characters.
          Combine into a single node with these two as its children.
          Put the combined node back into the priority queue.
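The loop above might look roughly like the following in Java. The names HuffmanBuildSketch, Node, build, and encodedBits are illustrative, and giving EOF the code counts.length with a count of 1 is an assumption. Because several nodes tie on count, the tree shape may differ from the one in the slides, but the total number of encoded bits comes out the same.

```java
import java.util.PriorityQueue;

class HuffmanBuildSketch {
    // A leaf holds a character code; a branch holds -1 and two children.
    static class Node implements Comparable<Node> {
        final int ch;
        final int count;
        final Node left;
        final Node right;

        Node(int ch, int count, Node left, Node right) {
            this.ch = ch;
            this.count = count;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(count, other.count);   // rarest first
        }
    }

    static Node build(int[] counts) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                pq.add(new Node(c, counts[c], null, null));
            }
        }
        // Assumption: EOF is a pseudo-character with code counts.length and count 1.
        pq.add(new Node(counts.length, 1, null, null));
        while (pq.size() > 1) {
            Node a = pq.remove();   // two rarest nodes
            Node b = pq.remove();
            pq.add(new Node(-1, a.count + b.count, a, b));
        }
        return pq.remove();
    }

    // Total bits needed to encode the file: sum of count * depth over all leaves.
    static int encodedBits(Node node, int depth) {
        if (node.isLeaf()) {
            return node.count * depth;
        }
        return encodedBits(node.left, depth + 1) + encodedBits(node.right, depth + 1);
    }

    public static void main(String[] args) {
        int[] counts = new int[256];
        counts[' '] = 2;
        counts['a'] = 3;
        counts['b'] = 3;
        counts['c'] = 1;
        Node root = build(counts);
        System.out.println("total count = " + root.count);
        System.out.println("encoded bits = " + encodedBits(root, 0));
    }
}
```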

Build tree example

4) Tree to binary encodings
  - The Huffman tree tells you the binary encodings to use.
      left means 0, right means 1
      example: 'b' is 10
  - What are the binary encodings of: EOF, ' ', 'c', 'a'?
  - What is the relationship between tree branch height, binary representation length, character frequency, etc.?
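One way to read the encodings off a tree is a recursive traversal that appends "0" when going left and "1" when going right. The sketch below hand-builds a tree that matches the codes used in these slides ({' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}); the class names and the use of 256 as the EOF code are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

class HuffmanCodesSketch {
    static final int EOF = 256;   // assumed pseudo-character code

    static class Node {
        final int ch;             // character code, or -1 for a branch node
        final Node left;
        final Node right;

        Node(int ch) {
            this(ch, null, null);
        }

        Node(int ch, Node left, Node right) {
            this.ch = ch;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Hand-built tree consistent with the codes in the slides.
    static Node exampleTree() {
        Node left = new Node(-1, new Node(' '), new Node(-1, new Node('c'), new Node(EOF)));
        Node right = new Node(-1, new Node('b'), new Node('a'));
        return new Node(-1, left, right);
    }

    // Maps each leaf's character code to its path from the root.
    static Map<Integer, String> codes(Node root) {
        Map<Integer, String> result = new TreeMap<>();
        collect(root, "", result);
        return result;
    }

    private static void collect(Node node, String path, Map<Integer, String> result) {
        if (node.isLeaf()) {
            result.put(node.ch, path);
        } else {
            collect(node.left, path + "0", result);    // left means 0
            collect(node.right, path + "1", result);   // right means 1
        }
    }

    public static void main(String[] args) {
        System.out.println(codes(exampleTree()));
    }
}
```

A leaf's depth is exactly the length of its code, which is why frequent characters (placed near the root) get shorter encodings.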

5) Compress the actual file
  - Based on the preceding tree, we have the following encodings:
      {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}
  - Using this map, we can encode the file into a shorter binary representation.
  - The text ab ab cab would be encoded as:
      Overall: 1110001110000101110011 (22 bits, ~3 bytes)
  - Encode.java does this for us using our codes file.
  - How would we go back in the opposite direction (decompress)?

  char      'a'    'b'    ' '    'c'    EOF
  binary    11     10     00     010    011

  byte    chars                       binary
  1       a b ' ' a                   11 10 00 11
  2       b ' ' c a (first bit)       10 00 010 1
  3       a (second bit) b EOF pad    1 10 011 00
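The encoding step itself is a lookup per character. The following sketch is not Encode.java; it builds the bits as a String of '0' and '1' characters purely for readability, and the class name, the method names, and the use of 256 as the EOF key are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanEncodeSketch {
    static final int EOF = 256;   // assumed pseudo-character code

    // The code map from the slides, keyed by character code.
    static Map<Integer, String> slideCodes() {
        Map<Integer, String> codes = new HashMap<>();
        codes.put((int) ' ', "00");
        codes.put((int) 'a', "11");
        codes.put((int) 'b', "10");
        codes.put((int) 'c', "010");
        codes.put(EOF, "011");
        return codes;
    }

    // Looks up each character's code, then appends the EOF code at the end.
    static String encode(String text, Map<Integer, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String code = codes.get((int) text.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException("no code for '" + text.charAt(i) + "'");
            }
            bits.append(code);
        }
        bits.append(codes.get(EOF));
        return bits.toString();
    }

    public static void main(String[] args) {
        String bits = encode("ab ab cab", slideCodes());
        System.out.println(bits + " (" + bits.length() + " bits)");
    }
}
```

The 9 characters take 72 bits in ASCII, compared with 22 bits here (24 once padded to whole bytes).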

Decompressing
  - How do we decompress a file of Huffman-compressed bits?
  - useful "prefix property"
      No encoding A is the prefix of another encoding B
      I.e. never will have x → 011 and y → 011100110
  - the algorithm:
      Read each bit one at a time from the input.
      If the bit is 0, go left in the tree; if it is 1, go right.
      If you reach a leaf node, output the character at that leaf and go back to the tree root.

Decompressing
  - Use the tree to decompress a compressed file with these bits:
      1011010001101011011
  - Read each bit one at a time.
  - If it is 0, go left; if 1, go right.
  - If you reach a leaf, output the character there and go back to the tree root.
  - Output: bac aca

  bits    10    11    010    00    11    010    11    011
  char    b     a     c      _     a     c      a     EOF
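The decoding walk can be sketched as below. It is not the HW8 decode method: it reads bits from a String instead of a BitInputStream, uses a hand-built tree consistent with the codes in these slides, and assumes 256 as the EOF code. All names are illustrative.

```java
class HuffmanDecodeSketch {
    static final int EOF = 256;   // assumed pseudo-character code

    static class Node {
        final int ch;             // character code, or -1 for a branch node
        final Node left;
        final Node right;

        Node(int ch) {
            this(ch, null, null);
        }

        Node(int ch, Node left, Node right) {
            this.ch = ch;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Tree for {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}.
    static Node exampleTree() {
        Node left = new Node(-1, new Node(' '), new Node(-1, new Node('c'), new Node(EOF)));
        Node right = new Node(-1, new Node('b'), new Node('a'));
        return new Node(-1, left, right);
    }

    // Follows one bit at a time; at a leaf, emits its character and restarts at the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                if (node.ch == EOF) {
                    break;        // anything after EOF is padding
                }
                out.append((char) node.ch);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(exampleTree(), "1011010001101011011"));
    }
}
```

The prefix property is what lets the loop emit a character the moment it reaches a leaf, with no lookahead.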

Public methods to write
  - public HuffmanTree(int[] counts)
      Given character frequencies for a file, create Huffman tree (Steps 2-3)
  - public void write(PrintStream output)
      Write mappings between characters and binary to a .code file (Step 4)
  - public HuffmanTree(Scanner input)
      Reconstruct the tree from a .code file
  - public void decode(BitInputStream in, PrintStream out, int eof)
      Use the Huffman tree to decode characters

Bit I/O streams
  - Java's input/output streams read/write 1 byte (8 bits) at a time.
  - We want to read/write one single bit at a time.
  - BitInputStream: Reads one bit at a time from input.
      public BitInputStream(String file)    Creates stream to read bits from given file
      public int readBit()                  Reads a single 1 or 0
      public void close()                   Stops reading from the stream
  - BitOutputStream: Writes one bit at a time to output.
      public BitOutputStream(String file)   Creates stream to write bits to given file
      public void writeBit(int bit)         Writes a single bit
      public void close()                   Stops writing to the stream
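To give a feel for what a bit-level writer has to do, here is a rough in-memory sketch that collects bits into bytes. It is not the course's BitOutputStream: the class name BitPackerSketch, the most-significant-bit-first order, and the zero padding of the final byte are all assumptions.

```java
import java.io.ByteArrayOutputStream;

class BitPackerSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private int current = 0;   // bits collected so far for the byte in progress
    private int filled = 0;    // how many bits of that byte are filled

    // Appends one bit; writes out a byte every time 8 bits have been collected.
    void writeBit(int bit) {
        if (bit != 0 && bit != 1) {
            throw new IllegalArgumentException("bit must be 0 or 1");
        }
        current = (current << 1) | bit;
        filled++;
        if (filled == 8) {
            bytes.write(current);
            current = 0;
            filled = 0;
        }
    }

    // Pads the last partial byte with 0 bits and returns everything written.
    byte[] finish() {
        while (filled != 0) {
            writeBit(0);
        }
        return bytes.toByteArray();
    }

    // Convenience helper: packs a string of '0' and '1' characters.
    static byte[] pack(String bits) {
        BitPackerSketch packer = new BitPackerSketch();
        for (int i = 0; i < bits.length(); i++) {
            packer.writeBit(bits.charAt(i) - '0');
        }
        return packer.finish();
    }

    public static void main(String[] args) {
        for (byte b : pack("1110001110000101110011")) {
            System.out.print(String.format("%02X ", b & 0xFF));
        }
        System.out.println();
    }
}
```

Under these assumptions, the 22 bits for ab ab cab become the three bytes 11100011, 10000101, and 11001100, with two padding zeros at the end.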