Building Java Programs Priority Queues, Huffman Encoding.

Slides:



Advertisements
Similar presentations
Lecture 4 (week 2) Source Coding and Compression
Advertisements

Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Heaps1 Part-D2 Heaps Heaps2 Recall Priority Queue ADT (§ 7.1.3) A priority queue stores a collection of entries Each entry is a pair (key, value)
Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
Priority Queues. 2 Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-first-out The “smallest” element.
Greedy Algorithms (Huffman Coding)
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
Trees Chapter 8.
Huffman Coding: An Application of Binary Trees and Priority Queues
Fall 2007CS 2251 Trees Chapter 8. Fall 2007CS 2252 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information.
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Version TCSS 342, Winter 2006 Lecture Notes Priority Queues Heaps.
Binary Trees A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees,
1 TCSS 342, Winter 2005 Lecture Notes Priority Queues and Heaps Weiss Ch. 21, pp
Priority Queues. Priority queue A stack is first in, last out A queue is first in, first out A priority queue is least-first-out –The “smallest” element.
CSE 143 Lecture 18 Huffman slides created by Ethan Apter
Building Java Programs Priority Queues, Huffman Encoding.
Data Compression and Huffman Trees (HW 4) Data Structures Fall 2008 Modified by Eugene Weinstein.
CSE 373 Data Structures and Algorithms Lecture 13: Priority Queues (Heaps)
Priority Queues, Heaps & Leftist Trees
Building Java Programs
CSE Lectures 22 – Huffman codes
Data Structures and Algorithms Huffman compression: An Application of Binary Trees and Priority Queues.
Maps A map is an object that maps keys to values Each key can map to at most one value, and a map cannot contain duplicate keys KeyValue Map Examples Dictionaries:
CS 46B: Introduction to Data Structures July 30 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak.
Data Structures Arrays both single and multiple dimensions Stacks Queues Trees Linked Lists.
CSE 143 Lecture 22 Priority Queues; Huffman Encoding slides created by Marty Stepp, Hélène Martin, and Daniel Otero
MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro.
Huffman Codes. Encoding messages  Encode a message composed of a string of characters  Codes used by computer systems  ASCII uses 8 bits per character.
Data Compression1 File Compression Huffman Tries ABRACADABRA
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
CSE 143 Lecture 23 Priority Queues and Huffman Encoding
Trees Chapter 8. Chapter 8: Trees2 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information To learn how.
Spring 2010CS 2251 Trees Chapter 6. Spring 2010CS 2252 Chapter Objectives Learn to use a tree to represent a hierarchical organization of information.
Huffman Coding. Huffman codes can be used to compress information –Like WinZip – although WinZip doesn’t use the Huffman algorithm –JPEGs do use Huffman.
1 Heaps and Priority Queues Starring: Min Heap Co-Starring: Max Heap.
Introduction to Algorithms Chapter 16: Greedy Algorithms.
Priority Queues, Trees, and Huffman Encoding CS 244 This presentation requires Audio Enabled Brent M. Dingle, Ph.D. Game Design and Development Program.
1 Heaps and Priority Queues v2 Starring: Min Heap Co-Starring: Max Heap.
CSE 143 Lecture 24 Priority Queues; Huffman Encoding slides created by Marty Stepp and Daniel Otero
Main Index Contents 11 Main Index Contents Complete Binary Tree Example Complete Binary Tree Example Maximum and Minimum Heaps Example Maximum and Minimum.
CSE 373: Data Structures and Algorithms Lecture 11: Priority Queues (Heaps) 1.
1. What is it? It is a queue that access elements according to their importance value. Eg. A person with broken back should be treated before a person.
CS 367 Introduction to Data Structures Lecture 8.
Properties: -The value in each node is greater than all values in the node’s subtrees -Complete tree! (fills up from left to right) Max Heap.
CSE 143 Lecture 22 Huffman slides created by Ethan Apter
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
CSE373: Data Structures & Algorithms Priority Queues
CSE373: Data Structures & Algorithms
Design & Analysis of Algorithm Huffman Coding
Data Structures and Design in Java © Rick Mercer
HUFFMAN CODES.
Assignment 6: Huffman Code Generation
Huffman Coding Based on slides by Ethan Apter & Marty Stepp
The Greedy Method and Text Compression
CSE 143 Lecture 24 Priority Queues; Huffman Encoding
The Greedy Method and Text Compression
Heaps, Priority Queues, Compression
Prioritization problems
Chapter 8 – Binary Search Tree
Part-D1 Priority Queues
CSE 373: Data Structures and Algorithms
Lecture 24: Priority Queues and Huffman Encoding
Priority Queues.
Ch. 8 Priority Queues And Heaps
Huffman Coding CSE 373 Data Structures.
CSE 373 Priority queue implementation; Intro to heaps
Priority Queues.
Podcast Ch23d Title: Huffman Compression
Presentation transcript:

Building Java Programs Priority Queues, Huffman Encoding

2 Prioritization problems Emergency room Gunshot victim should be treated before guy with a sore neck Treat urgent cases first Printing Print faculty jobs before student jobs Print grad student jobs before undergrad jobs Homework Work on things that are due soonest, even if given more recently What would be the runtime of solutions to these problems using the data structures we know (list, sorted list, map, set, BST, etc.)?

3 Inefficient structures List Remove min/max by searching (O(N)) Problem: expensive to search Sorted list Binary search it in O(log N) time Problem: expensive to add/remove (O(N)) Binary search tree Go right for max in O(log N) Problem: tree becomes unbalanced

4 Priority queue ADT priority queue: a collection of ordered elements that provides fast access to the minimum (or maximum) element Useful when we want to deal with things unequally Works like a queue: priority queue operations: add adds in order;O(log N) worst peek returns minimum value;O(1) always remove removes/returns minimum value;O(log N) worst isEmpty, clear, size, iterator O(1) always

5 Java's PriorityQueue class public class PriorityQueue implements Queue Queue pq = new PriorityQueue (); pq.add("Helene"); pq.add("Melissa");... Method/ConstructorDescriptionRuntime PriorityQueue () constructs new empty queue O(1) add( E value) adds value in sorted order O(log N ) clear() removes all elements O(1) iterator() returns iterator over elements O(1) peek() returns minimum element O(1) remove() removes/returns min element O(log N ) size() number of elements in queue O(1)

6 Inside a priority queue Usually implemented as a heap, a kind of binary tree. Instead of sorted left  right, it's sorted top  bottom guarantee: each child is greater (lower priority) than its ancestors add/remove causes elements to "bubble" up/down the tree (take CSE 332 or 373 to learn about implementing heaps!) Another use for this? Heap sort

7 Exercise: Fire the TAs We have decided that novice TAs should all be fired. Write a class TAManager that reads a list of TAs from a file. Find all with  2 quarters experience and fire them. Print the final list of TAs to the console, sorted by experience.

8 Priority queue ordering For a priority queue to work, elements must have an ordering in Java, this means implementing the Comparable interface Reminder: public class Foo implements Comparable { … public int compareTo(Foo other) { // Return positive, zero, or negative integer }

9 Homework 8 (Huffman Coding)

10 File compression compression: Process of encoding information in fewer bits. But isn't disk space cheap? Compression applies to many things: store photos without exhausting disk space reduce the size of an attachment make web pages smaller so they load faster reduce media sizes (MP3, DVD, Blu-Ray) make voice calls over a low-bandwidth connection (cell, Skype) Common compression programs: WinZip or WinRAR for Windows Stuffit Expander for Mac

11 ASCII encoding ASCII: Mapping from characters to integers (binary bits). Maps every possible character to a number ( 'A'  65) uses one byte (8 bits) for each character most text files on your computer are in ASCII format CharASCII valueASCII (binary) ' 'a' 'b' 'c' 'e' 'z'

12 Huffman encoding Huffman encoding: Uses variable lengths for different characters to take advantage of their relative frequencies. Some characters occur more often than others. If those characters use < 8 bits each, the file will be smaller. Other characters need > 8, but that's OK; they're rare. CharASCII valueASCII (binary) Hypothetical Huffman ' 'a' 'b' 'c' 'e' 'z'

13 Compressing text Key insight: characters occur unevenly Common characters should use fewer bits Uncommon characters should use more bits Then average length of a file would decrease How can we come up with these encodings? Hint: for each character we make a sequence of choices (0 or 1), kind of like “yes” or “no” answers in 20 Questions.

14 Huffman's algorithm The idea: Create a "Huffman Tree" that will tell us a good binary representation for each character. Left means 0, right means 1. example: 'b' is 10 More frequent characters will be "higher" in the tree (have a shorter binary value). To build this tree, we must do a few steps first: Count occurrences of each unique character in the file. Use a priority queue to order them from least to most frequent.

15 What you will write HuffmanNode Binary tree node (à la 20 Questions) Each storing a character and a count of its occurrences HuffmanTree Two ways to build a Huffman-based tree Output the Huffman codes to a file Decode a sequence of bits into characters

16 Huffman compression 1. Count the occurrences of each character in file ' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1 2. Place characters and counts into priority queue 3. Use priority queue to create Huffman tree  4. Traverse tree to find (char  binary) map {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}

17 Make code: “a dad cab” i counts [0,0,0,...,2,..., 3, 1, 1, 2,..., 0, 0]

18 Output encoding | 9 | / \ | 4 | | 5 | / \ / \ | 2 | | 2 || 2 | | 3 | ‘ ‘ d / \ a | 1 | | 1 | b c

19 Encoding For each character in the file Determine its binary Huffman encoding Output the bits to an output file Already implemented for you Problem: how does one read and write bits?

20 Bit I/O streams Java's input/output streams read/write 1 byte (8 bits) at a time. We want to read/write one single bit at a time. BitInputStream : Reads one bit at a time from input. BitOutputStream : Writes one bit at a time to output. public BitInputStream(String file) Creates stream to read bits from given file public int readBit() Reads a single 1 or 0 public void close() Stops reading from the stream public BitOutputStream(String file) Creates stream to write bits to given file public void writeBit(int bit) Writes a single bit public void close() Stops reading from the stream

21 Encode (you don’t do this) a d a d c a b 32 (‘ ‘) (d) (b) (c) (a) 11

22 EOF We need a special character to say “STOP” Otherwise, we may read extra characters (can only write whole bytes – 8 bits – at a time) We call this the EOF – end-of-file character Add to the tree manually when we construct from the int[] counts What value will it have? Can’t represent any existing character pq --> | 1 | | 1 | | 1 | | 2 | | 2 | | 3 | b c eof ‘ ‘ d a

23 Tree with EOF | 10| / \ 1 / \ | 4 | | 6 | / \ 1 0 / \ | 2 | | 2 | | 3 | | 3 | ‘ ‘ d 0 / \ 1 a | 1 | | 2 | eof 0 / \ | 1 | | 1 | b c 32 (‘ ‘) (d) (eof) (b) (c) (a) 11

24 Remaking the tree 32 (‘ ‘) (d) (eof) (b) (c) (a) 11

25 Decompressing How do we decompress a file of Huffman-compressed bits? useful "prefix property" No encoding A is the prefix of another encoding B I.e. never will have x  011 and y  the algorithm: Read each bit one at a time from the input. If the bit is 0, go left in the tree; if it is 1, go right. If you reach a leaf node, output the character at that leaf and go back to the tree root.

26 Decoding | 10| / \ 1 / \ | 4 | | 6 | / \ 1 0 / \ | 2 | | 2 | | 3 | | 3 | ‘ ‘ d 0 / \ 1 a | 1 | | 2 | eof 0 / \ | 1 | | 1 | b c

27 Public methods to write public HuffmanTree(int[] counts) Given character chounts for a file, create Huffman tree public void write(PrintStream output) Write the character-encoding pairs to the output file public HuffanTree(Scanner input) Reconstruct the Huffman tree from a code file public void decode (BitInputStream input, PrintStream output, int eof) Use Huffman tree to decompress the input into the given output