Published by Beverly Haynes. Modified over 9 years ago.
1
Homework #5
New York University, Computer Science Department
Data Structures, Fall 2008
Eugene Weinstein
2
Homework #4 Review
Huffman coding is a variable-length binary encoding for text
We implemented Huffman's algorithm for finding an optimal code (textbook pp. 389-395)
  o It builds a tree representing an optimal prefix code, i.e., a code with the shortest total encoded length for the given frequencies
Input for HW #4: letters and their frequencies
  o A 20 E 24...
Construct the Huffman tree
Navigate the tree to find the code:
  o c: 0, a: 10, b: 11
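"Navigating the tree" means walking from the root to every leaf and recording the path: a left step contributes a 0 and a right step a 1. The sketch below builds, by hand, a tree that yields the slide's example code (c: 0, a: 10, b: 11) and recovers the code words recursively. The class names `CodeFromTree` and `Node` are hypothetical stand-ins, not the course's HuffmanTree/HuffmanNode classes, and the 256-entry `code[]` array is an assumption.

```java
public class CodeFromTree {
    // Hypothetical minimal node type; the course's HuffmanNode class may look different.
    static class Node {
        final char ch;          // meaningful only at leaves
        final Node left, right; // both null at a leaf
        Node(char ch) { this(ch, null, null); }
        Node(Node left, Node right) { this('\0', left, right); }
        private Node(char ch, Node left, Node right) {
            this.ch = ch;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Append '0' when stepping left and '1' when stepping right;
    // store the accumulated path when a leaf is reached.
    // Note: a tree that is a single leaf (one-character document) gets "" here,
    // so that case needs special handling.
    static void treeToCode(Node node, String path, String[] code) {
        if (node.isLeaf()) {
            code[node.ch] = path;
            return;
        }
        treeToCode(node.left, path + "0", code);
        treeToCode(node.right, path + "1", code);
    }

    public static void main(String[] args) {
        // 'c' is the root's left child; 'a' and 'b' hang under the root's right child.
        Node root = new Node(new Node('c'), new Node(new Node('a'), new Node('b')));
        String[] code = new String[256];
        treeToCode(root, "", code);
        System.out.println("c: " + code['c'] + ", a: " + code['a'] + ", b: " + code['b']);
    }
}
```

Running `main` prints `c: 0, a: 10, b: 11`, matching the slide.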
3
Homework #5 Overview
Given a document:
  o Calculate letter frequencies
  o Construct a Huffman code
  o Encode the document
  o Calculate the memory savings of the Huffman binary encoding vs. 8-bit ASCII
  o Correctly decode the document
We can reuse the Huffman code-building algorithm from HW #4
  o So we will keep the HuffmanTree and HuffmanNode classes
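The memory-savings step is simple arithmetic: 8-bit ASCII costs 8 bits per character, while the Huffman encoding costs one bit per '0'/'1' in the encoded string. The sketch below assumes the encoded message is held as a String of '0' and '1' characters; the class and method names are hypothetical.

```java
public class Savings {
    // Bits needed to store the document as 8-bit ASCII: 8 per character.
    static long asciiBits(String contents) {
        return 8L * contents.length();
    }

    // Assumes the encoded message is a String of '0'/'1' characters,
    // so each character of that String stands for one bit.
    static long huffmanBits(String encoded) {
        return encoded.length();
    }

    // Percentage of space saved relative to ASCII (NaN for an empty document).
    static double savingsPercent(String contents, String encoded) {
        long ascii = asciiBits(contents);
        return 100.0 * (ascii - huffmanBits(encoded)) / ascii;
    }

    public static void main(String[] args) {
        String contents = "abc";
        String encoded = "10110"; // using the code a: 10, b: 11, c: 0
        System.out.printf("ASCII: %d bits, Huffman: %d bits, savings: %.1f%%%n",
                asciiBits(contents), huffmanBits(encoded), savingsPercent(contents, encoded));
    }
}
```

For "abc" this compares 24 ASCII bits with 5 Huffman bits, a saving of 19 bits.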
4
Organization
The new code for this assignment should go into HuffmanConverter.java
  o The name of the file to encode is passed as a command-line argument
  o So if my file is foo.txt, I should be able to run java HuffmanConverter foo.txt
  o Then "foo.txt" shows up in args[0]
  o If you use an IDE, specify command-line arguments through its menus
Test inputs and outputs are linked from the assignment page (2007 version)
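Reading the file name from `args[0]` is worth guarding, since running the program without an argument would otherwise crash with an ArrayIndexOutOfBoundsException. A minimal sketch, assuming a hypothetical helper `fileNameFromArgs`; in the assignment this logic would live in HuffmanConverter's `main`.

```java
public class ArgsDemo {
    // Returns the file name given on the command line, or fails with a usage message.
    static String fileNameFromArgs(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: java HuffmanConverter <file>");
        }
        return args[0];
    }

    public static void main(String[] args) {
        try {
            String fileName = fileNameFromArgs(args);
            System.out.println("Encoding " + fileName);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

Running `java ArgsDemo foo.txt` prints `Encoding foo.txt`; running it with no arguments prints the usage line and exits with status 1.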
5
HuffmanConverter Instance Variables
String contents - stores the file to process
  o Lines are separated by '\n', the line-break character
  o e.g., twoLines = line1 + '\n' + line2;
HuffmanTree huffmanTree - the Huffman tree built with the HW #4 code
int count[] - character frequencies in the input file
  o Indexed by the ASCII value of each character, e.g., count[(int)'a'] is the frequency of 'a'
String code[] - the binary string for each character
  o Also indexed by ASCII value, e.g., code[(int)'a'] might be "10001"
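Indexing `count[]` by character value means a single pass over `contents` fills in every frequency: in Java a `char` used as an array index is promoted to its `int` code, so `count[c]++` is the same as `count[(int) c]++`. The sketch assumes an array size of 256 (every 8-bit character value); `FrequencyDemo` and `ALPHABET_SIZE` are hypothetical names.

```java
public class FrequencyDemo {
    // Assumption: 256 slots, one for every 8-bit character value.
    static final int ALPHABET_SIZE = 256;

    // Count each character, indexed by its character code,
    // so count[(int) 'a'] ends up holding the number of 'a's.
    static int[] recordFrequencies(String contents) {
        int[] count = new int[ALPHABET_SIZE];
        for (int i = 0; i < contents.length(); i++) {
            char c = contents.charAt(i);
            if (c >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("character outside 0-255: " + (int) c);
            }
            count[c]++;
        }
        return count;
    }

    public static void main(String[] args) {
        String twoLines = "banana" + '\n' + "bandana";
        int[] count = recordFrequencies(twoLines);
        System.out.println("a: " + count[(int) 'a'] + ", newline: " + count[(int) '\n']);
    }
}
```

Here `main` prints `a: 6, newline: 1`: the '\n' separating the two lines is counted like any other character.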
6
To Implement
readContents() - reads in a file and stores it in String contents
recordFrequencies() - processes the file stored in contents and stores the frequencies in count[]
frequenciesToTree() - uses the HW #4 code to produce a Huffman tree
treeToCode() - a slight modification of HW #4: traverses the Huffman tree and populates code[]
encodeMessage() - uses code[] to encode contents
decodeMessage() - uses the inverse of code[] to decode the encoded message
7
Implementation Notes
readContents() can use Scanner
  o Read a line at a time and append it to contents, inserting '\n' to separate lines
recordFrequencies(): iterate over contents one character at a time, incrementing that character's entry in count[]
frequenciesToTree()
  o Very similar to the main() method of HW #4
  o Create a BinaryHeap object
  o For every letter with a nonzero count, create a HuffmanNode object and insert it into the heap
  o Then run the Huffman algorithm
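The sketch below shows readContents() and frequenciesToTree() under two stated substitutions: `java.util.PriorityQueue` stands in for the course's BinaryHeap, and a hypothetical `Node` class stands in for HuffmanNode. `readContents` takes a `Scanner` so it can be fed a String for testing; for a real file you would pass `new Scanner(new File(fileName))`, which can throw FileNotFoundException.

```java
import java.util.PriorityQueue;
import java.util.Scanner;

public class TreeBuildDemo {
    // Hypothetical node type standing in for the course's HuffmanNode.
    static class Node implements Comparable<Node> {
        final char ch;          // meaningful only at leaves
        final int weight;       // frequency (sum of the children for internal nodes)
        final Node left, right; // null at leaves
        Node(char ch, int weight) { this(ch, weight, null, null); }
        Node(Node left, Node right) { this('\0', left.weight + right.weight, left, right); }
        private Node(char ch, int weight, Node left, Node right) {
            this.ch = ch; this.weight = weight; this.left = left; this.right = right;
        }
        public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    // Read line by line and re-insert the '\n' that nextLine() strips.
    // This also appends '\n' after the last line.
    static String readContents(Scanner in) {
        StringBuilder contents = new StringBuilder();
        while (in.hasNextLine()) {
            contents.append(in.nextLine()).append('\n');
        }
        return contents.toString();
    }

    // Huffman's algorithm: one leaf per character that occurs, then
    // repeatedly merge the two lightest trees until a single tree remains.
    static Node frequenciesToTree(int[] count) {
        PriorityQueue<Node> heap = new PriorityQueue<Node>();
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                heap.add(new Node((char) c, count[c]));
            }
        }
        if (heap.isEmpty()) {
            return null; // empty document: no tree
        }
        while (heap.size() > 1) {
            Node first = heap.poll();  // lightest tree becomes the left child
            Node second = heap.poll(); // next lightest becomes the right child
            heap.add(new Node(first, second));
        }
        return heap.poll();
    }

    public static void main(String[] args) {
        String contents = readContents(new Scanner("abracadabra\nbar"));
        int[] count = new int[256];
        for (int i = 0; i < contents.length(); i++) {
            count[contents.charAt(i)]++;
        }
        Node root = frequenciesToTree(count);
        System.out.println("root weight = " + root.weight + ", characters = " + contents.length());
    }
}
```

The root's weight always equals the number of characters in contents (16 in this example), which is a cheap intermediate check.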
8
Implementation Notes, Cont'd
treeToCode()
  o Similar to printCode() of HW #4
  o Instead of printing each code, store it in code[]
encodeMessage()
  o For each character of contents, look up its binary string in code[] and append it to the output
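encodeMessage() is a lookup-and-append loop. This sketch uses a StringBuilder, since repeated `String +=` copies the whole string each time and becomes slow on large documents. The `EncodeDemo` class and the hard-coded code words are hypothetical examples.

```java
public class EncodeDemo {
    // Concatenate each character's code word; fails loudly if a character has no code.
    static String encodeMessage(String contents, String[] code) {
        StringBuilder encoded = new StringBuilder();
        for (int i = 0; i < contents.length(); i++) {
            String word = code[contents.charAt(i)];
            if (word == null) {
                throw new IllegalStateException("no code for character " + (int) contents.charAt(i));
            }
            encoded.append(word);
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        String[] code = new String[256];
        code['c'] = "0";
        code['a'] = "10";
        code['b'] = "11";
        System.out.println(encodeMessage("abcab", code)); // prints 101101011
    }
}
```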
9
Implementation Notes, Cont'd
decodeMessage()
  o Needs the inverse mapping of code[]: binary strings to characters
  o Several implementations are possible, e.g.:
      - Traverse the Huffman tree as you read the binary string; output a character when you reach a leaf, then restart at the root
      - Build a HashMap mapping binary strings to the ASCII values of characters
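The tree-traversal option can be sketched as follows: each bit moves one step down the tree, and reaching a leaf emits a character and resets to the root. The `TreeDecodeDemo` and `Node` names are hypothetical; the sketch assumes the tree has at least two leaves and rejects a bit string that ends partway through a code word.

```java
public class TreeDecodeDemo {
    // Hypothetical minimal node; the course's HuffmanNode may differ.
    static class Node {
        final char ch;          // meaningful only at leaves
        final Node left, right; // both null at a leaf
        Node(char ch) { this.ch = ch; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.ch = '\0'; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // '0' follows the left edge, '1' the right edge; at a leaf, emit its
    // character and jump back to the root for the next code word.
    static String decodeMessage(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                out.append(node.ch);
                node = root;
            }
        }
        if (node != root) {
            throw new IllegalArgumentException("bit string ends in the middle of a code word");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same shape as the slide's example code: c = 0, a = 10, b = 11.
        Node root = new Node(new Node('c'), new Node(new Node('a'), new Node('b')));
        System.out.println(decodeMessage("101101011", root)); // prints abcab
    }
}
```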
10
HashMap
An array maps integer indices to elements
  o e.g., String args[]: args[i] returns the ith String
A HashMap maps key Objects to value Objects
Access it with put() and get(), e.g.:
    HashMap<String, Integer> ids = new HashMap<String, Integer>();
    ids.put("Alice", 123456789);
    ids.put("Ben", 321654987);
    int id = ids.get("Alice");
    // id gets 123456789
For decoding, map bit Strings to characters
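The HashMap option can be sketched by inverting code[] into a map from code word to ASCII value and then growing a prefix of the bit string one bit at a time. Because a Huffman code is prefix-free, the first prefix found in the map is a complete code word. `MapDecodeDemo`, `invert`, and this `decodeMessage` signature are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MapDecodeDemo {
    // Invert code[]: map each code word back to the ASCII value of its character.
    static Map<String, Integer> invert(String[] code) {
        Map<String, Integer> decode = new HashMap<String, Integer>();
        for (int c = 0; c < code.length; c++) {
            if (code[c] != null) {
                decode.put(code[c], c);
            }
        }
        return decode;
    }

    // Extend the current prefix bit by bit; once it matches a code word,
    // emit that character and start a new prefix.
    static String decodeMessage(String bits, Map<String, Integer> decode) {
        StringBuilder out = new StringBuilder();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            prefix.append(bits.charAt(i));
            Integer value = decode.get(prefix.toString());
            if (value != null) {
                out.append((char) value.intValue());
                prefix.setLength(0);
            }
        }
        if (prefix.length() > 0) {
            throw new IllegalArgumentException("leftover bits: " + prefix);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] code = new String[256];
        code['c'] = "0";
        code['a'] = "10";
        code['b'] = "11";
        System.out.println(decodeMessage("101101011", invert(code))); // prints abcab
    }
}
```

Compared with walking the tree, this version needs only code[], at the cost of building a String for every prefix it looks up.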
11
Homework #5 Tips
Keep checking intermediate results
  o Make use of the sample outputs linked from the assignment page
  o Print out intermediate results!
You might need special cases for the newline character ('\n')
Your encoding might differ from the examples
  o It depends on the BinaryHeap implementation
  o Same-frequency items are returned in arbitrary order (e.g., in love_poem_58, 'N', '-', '.', 'W', and 'p' all have frequency one)
However, the total length of your Huffman encoding must match!
  o Huffman's algorithm is guaranteed to produce a shortest-length encoding, so every tie-breaking choice gives the same total length
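One way to check the "length must match" rule without comparing code words is to compute the total encoded length, the sum over characters of frequency times code-word length. The sketch uses hypothetical frequencies a: 1, b: 1, c: 2; the tie between 'c' (2) and the merged {a, b} subtree (2) can be broken either way, giving two different codes with the same total. `EncodedLengthCheck` and `totalBits` are hypothetical names.

```java
public class EncodedLengthCheck {
    // Total encoded length in bits: sum of count[c] * (length of c's code word).
    // Assumes code[c] is non-null for every character with a nonzero count.
    static long totalBits(int[] count, String[] code) {
        long bits = 0;
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) {
                bits += (long) count[c] * code[c].length();
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] count = new int[256];
        count['a'] = 1;
        count['b'] = 1;
        count['c'] = 2;

        // Two codes a heap could produce, depending on whether 'c' or the
        // merged {a, b} subtree is removed from the heap first.
        String[] code1 = new String[256];
        code1['c'] = "0"; code1['a'] = "10"; code1['b'] = "11";
        String[] code2 = new String[256];
        code2['a'] = "00"; code2['b'] = "01"; code2['c'] = "1";

        System.out.println(totalBits(count, code1) + " bits vs " + totalBits(count, code2) + " bits");
    }
}
```

Both codes give 6 bits, so comparing this total against the sample outputs is a check that does not depend on heap tie-breaking.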