Homework #5 New York University Computer Science Department Data Structures Fall 2008 Eugene Weinstein.

Homework #4 Review
Huffman coding is a variable-length binary encoding for text
We implemented Huffman's optimal code-finding algorithm (see book)
  o Builds a tree representing the code with the shortest possible total encoded length
Input for HW#4: letters and their frequencies:
  o A 20 E ...
Construct the Huffman tree
Navigate the tree to find each letter's code:
  o c: 0, a: 10, b: 11
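
As a quick check on the optimality claim, here is a minimal sketch that computes only the total encoded length, not the tree itself. It uses java.util.PriorityQueue in place of the course's BinaryHeap. Each merge of the two lightest weights adds one bit to every character beneath the new node, so the sum of all merged weights equals the total number of encoded bits. The class name HuffmanCost and the frequencies a = 2, b = 1, c = 4 are hypothetical, chosen so that the code lengths match the example c: 0, a: 10, b: 11.

```java
import java.util.PriorityQueue;

class HuffmanCost {
    // Total bits of an optimal (Huffman) encoding, from frequencies alone.
    // Every merge adds one bit to each character under the merged node,
    // so summing the merged weights gives sum(frequency * code length).
    static long totalBits(int[] freqs) {
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (int f : freqs) {
            if (f > 0) heap.add((long) f);   // skip characters that never occur
        }
        long total = 0;
        while (heap.size() > 1) {
            long merged = heap.poll() + heap.poll();
            total += merged;
            heap.add(merged);
        }
        return total; // a one-letter input yields 0 here; real code often special-cases it
    }

    public static void main(String[] args) {
        // Hypothetical frequencies a = 2, b = 1, c = 4.
        // Code c: 0, a: 10, b: 11 costs 4*1 + 2*2 + 1*2 = 10 bits.
        System.out.println(totalBits(new int[] {2, 1, 4})); // prints 10
    }
}
```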

Homework #5 Overview
Given a document:
  o Calculate letter frequencies
  o Construct the Huffman code
  o Encode the document
  o Calculate the memory savings of the Huffman binary encoding vs. 8-bit ASCII
  o Correctly decode the document
We can reuse the Huffman code-building algorithm from HW#4
  o So we will keep HuffmanTree and HuffmanNode
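
The memory-savings step is plain arithmetic over count[] and code[]: 8-bit ASCII costs 8 bits per character, while the Huffman encoding costs, for each character, its frequency times the length of its binary string. A minimal sketch follows; the class and method names are hypothetical, the 256-entry arrays are an assumption, and the document "cacbcac" is made up to fit the HW4 code c: 0, a: 10, b: 11.

```java
class MemorySavings {
    // Bits needed to store the document as 8-bit ASCII: 8 per character.
    static long asciiBits(int[] count) {
        long chars = 0;
        for (int c : count) chars += c;
        return 8 * chars;
    }

    // Bits needed by the Huffman encoding: frequency times code length,
    // summed over every character that actually occurs.
    static long huffmanBits(int[] count, String[] code) {
        long bits = 0;
        for (int i = 0; i < count.length; i++) {
            if (count[i] > 0) bits += (long) count[i] * code[i].length();
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] count = new int[256];
        String[] code = new String[256];
        // Hypothetical document "cacbcac" encoded with c: 0, a: 10, b: 11
        count['c'] = 4; count['a'] = 2; count['b'] = 1;
        code['c'] = "0"; code['a'] = "10"; code['b'] = "11";
        long ascii = asciiBits(count), huff = huffmanBits(count, code);
        System.out.println(ascii + " vs " + huff + " bits, saving " + (ascii - huff));
        // prints "56 vs 10 bits, saving 46"
    }
}
```

How the savings are reported (bits, bytes, or a percentage) depends on the assignment's sample outputs.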

Organization
The new code for this assignment should go into HuffmanConverter.java
  o The name of the file to encode is passed as a parameter on the command line
  o So if my file is foo.txt, I should be able to run: java HuffmanConverter foo.txt
  o Then "foo.txt" shows up in args[0]
  o If you use an IDE, specify command-line arguments through its menus
Test inputs and outputs are linked from the assignment page (2007 version)
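
A minimal sketch of how args[0] reaches the program and becomes a Scanner over the file. The class name HuffmanConverterMain, the helper inputFileName, and the usage message are hypothetical; in the assignment this logic would live in HuffmanConverter's own main().

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class HuffmanConverterMain {
    // Returns the file name given on the command line, or null if none was given.
    static String inputFileName(String[] args) {
        return args.length > 0 ? args[0] : null;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "java HuffmanConverterMain foo.txt" puts "foo.txt" in args[0]
        String filename = inputFileName(args);
        if (filename == null) {
            System.err.println("usage: java HuffmanConverter <file>");
            return;
        }
        Scanner in = new Scanner(new File(filename)); // throws if the file is missing
        System.out.println("First line of " + filename + ": "
                + (in.hasNextLine() ? in.nextLine() : ""));
        in.close();
    }
}
```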

HuffmanConverter Instance Variables
String contents - stores the file to process
  o Lines are separated by '\n', the line-break character
  o e.g., twoLines = line1 + '\n' + line2;
HuffmanTree huffmanTree - output of HW4
int count[] - character frequencies in the input file
  o Indexed by the ASCII value of each character, e.g., count[(int)'a'] is the frequency of 'a'
String code[] - binary string for each character
  o Also indexed by ASCII value, e.g., code[(int)'a'] might be "10001"
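
The sketch below shows how the ASCII-indexed arrays behave, using a two-line string built the same way as twoLines. The class name ConverterFields and the array size of 256 are assumptions (256 covers every 8-bit character value; plain ASCII needs only 128). The huffmanTree field is left out because HuffmanTree comes from HW4.

```java
class ConverterFields {
    // Assumed size: one slot per 8-bit character value.
    static final int ALPHABET = 256;

    String contents;                       // whole file, lines separated by '\n'
    int[] count = new int[ALPHABET];       // count[(int) 'a'] = frequency of 'a'
    String[] code = new String[ALPHABET];  // code[(int) 'a'] = binary string for 'a'

    // Tally each character, using its ASCII value as the array index.
    // Assumes every character is below ALPHABET.
    void recordFrequencies() {
        for (int i = 0; i < contents.length(); i++) {
            count[(int) contents.charAt(i)]++;
        }
    }

    public static void main(String[] args) {
        ConverterFields f = new ConverterFields();
        String line1 = "ab", line2 = "ba";
        f.contents = line1 + '\n' + line2;   // "ab\nba"
        f.recordFrequencies();
        System.out.println(f.count[(int) 'a'] + " " + f.count[(int) '\n']); // prints "2 1"
    }
}
```

Note that '\n' gets a slot in count[] like any other character, which is why the newline needs a code of its own.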

To Implement
readContents() - reads in a file and stores it in String contents
recordFrequencies() - processes the file stored in contents and stores the frequencies in count[]
frequenciesToTree() - uses HW4 code to produce the Huffman tree
treeToCode() - slight modification of HW4: traverses the Huffman tree and populates code[]
encodeMessage() - uses code[] to encode the document
decodeMessage() - uses the inverse of code[] to decode the document

Implementation Notes
readContents() can use Scanner
  o Read a line at a time and append it to contents, inserting '\n' to separate lines
recordFrequencies(): iterate over contents one character at a time
frequenciesToTree()
  o Very similar to the main() method of HW4
  o Create a BinaryHeap object
  o For every letter with a non-zero count, create a HuffmanNode object and insert it into the heap
  o Then run the Huffman algorithm
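
A minimal sketch of readContents() and frequenciesToTree() under stated assumptions: java.util.PriorityQueue stands in for the course's BinaryHeap, and the nested Node class is a hypothetical stand-in for HW4's HuffmanNode. readContents() takes a Scanner so it can be fed either a file or a literal string; since Scanner.nextLine() strips line terminators, '\n' is inserted between lines (whether the last line should also get one depends on the sample outputs).

```java
import java.util.PriorityQueue;
import java.util.Scanner;

class TreeBuilding {
    // Hypothetical stand-in for HuffmanNode: leaves hold a character,
    // internal nodes hold two children; both carry a total weight.
    static class Node implements Comparable<Node> {
        final char ch;
        final int weight;
        final Node left, right;
        Node(char ch, int weight) { this(ch, weight, null, null); }
        Node(char ch, int weight, Node left, Node right) {
            this.ch = ch; this.weight = weight; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
        public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    // Read a line at a time, separating lines with '\n'.
    static String readContents(Scanner in) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        while (in.hasNextLine()) {
            if (!first) sb.append('\n');
            sb.append(in.nextLine());
            first = false;
        }
        return sb.toString();
    }

    // Huffman's algorithm: one leaf per non-zero count, then repeatedly
    // merge the two lightest trees until a single tree remains.
    static Node frequenciesToTree(int[] count) {
        PriorityQueue<Node> heap = new PriorityQueue<>(); // stands in for BinaryHeap
        for (int c = 0; c < count.length; c++) {
            if (count[c] > 0) heap.add(new Node((char) c, count[c]));
        }
        while (heap.size() > 1) {
            Node a = heap.poll(), b = heap.poll();
            heap.add(new Node('\0', a.weight + b.weight, a, b));
        }
        return heap.poll(); // null if the input was empty
    }

    public static void main(String[] args) {
        // For a real file: new Scanner(new java.io.File(args[0]))
        String contents = readContents(new Scanner("ab\nba"));
        int[] count = new int[256];
        for (char ch : contents.toCharArray()) count[ch]++;
        Node root = frequenciesToTree(count);
        System.out.println(root.weight); // prints 5: every character counted once
    }
}
```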

Implementation Notes, Cont'd
treeToCode()
  o Similar to printCode() of HW4
  o Instead of printing the code, store it in code[]
encodeMessage()
  o For each character of contents, look up its binary string in code[] and append it to the output
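
A minimal sketch of treeToCode() and encodeMessage(), assuming the usual convention that a left branch contributes '0' and a right branch '1'. The Node class is a hypothetical minimal stand-in for HuffmanNode, and the tree is built by hand to reproduce the HW4 example c: 0, a: 10, b: 11.

```java
class TreeToCode {
    // Hypothetical minimal node: leaves carry a character.
    static class Node {
        final char ch;
        final Node left, right;
        Node(char ch) { this(ch, null, null); }
        Node(char ch, Node left, Node right) { this.ch = ch; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Like printCode(), but store each root-to-leaf path in code[].
    // Going left appends '0', going right appends '1'.
    static void treeToCode(Node node, String path, String[] code) {
        if (node.isLeaf()) {
            code[(int) node.ch] = path;
            return;
        }
        treeToCode(node.left, path + "0", code);
        treeToCode(node.right, path + "1", code);
    }

    // Replace each character of contents by its binary string.
    static String encodeMessage(String contents, String[] code) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < contents.length(); i++) {
            bits.append(code[(int) contents.charAt(i)]);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // The HW4 example tree: c: 0, a: 10, b: 11
        Node root = new Node('\0', new Node('c'), new Node('\0', new Node('a'), new Node('b')));
        String[] code = new String[256];
        treeToCode(root, "", code);
        System.out.println(encodeMessage("cab", code)); // prints 01011
    }
}
```

A StringBuilder is used because appending to a String in a loop copies the whole string each time, which gets slow for long documents.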

Implementation Notes, Cont'd
decodeMessage()
  o Need to implement the inverse mapping of code[]: binary strings to characters
  o Several possible implementations:
      - Traverse the Huffman tree as you read the binary string; output a character when you reach a leaf, then restart at the root
      - Build a HashMap mapping binary strings to the ASCII values of characters
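
The first option, tree traversal, can be sketched as follows. It assumes the same hypothetical minimal Node class and the same left = '0', right = '1' convention as the encoder, and that the tree has at least two leaves (a single-character document would need a special case).

```java
class TreeDecode {
    // Hypothetical minimal node (HuffmanNode plays this role in the assignment).
    static class Node {
        final char ch;
        final Node left, right;
        Node(char ch) { this(ch, null, null); }
        Node(char ch, Node left, Node right) { this.ch = ch; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Walk down from the root: '0' goes left, '1' goes right.
    // On reaching a leaf, output its character and restart at the root.
    static String decodeMessage(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                out.append(node.ch);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The HW4 example tree: c: 0, a: 10, b: 11
        Node root = new Node('\0', new Node('c'), new Node('\0', new Node('a'), new Node('b')));
        System.out.println(decodeMessage("01011", root)); // prints cab
    }
}
```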

HashMap
An array maps integers to Objects
  o e.g., String args[]: args[i] returns the i-th String
A HashMap maps Objects to Objects
Access with put() and get(), e.g.:
  o HashMap ids = new HashMap();
  o ids.put("Alice", ...);
  o ids.put("Ben", ...);
  o int id = (Integer) ids.get("Alice");
  o // id gets the value stored for "Alice"
For decode, map bit Strings to characters
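
The second decoding option can be sketched with a generic HashMap<String, Character>, which removes the need for the (Integer)-style cast used with the raw HashMap (the diamond `new HashMap<>()` requires Java 7 or later). The class and method names are hypothetical. Because Huffman codes are prefix-free, the decoder can grow a prefix one bit at a time and emit a character the moment the prefix matches a key.

```java
import java.util.HashMap;

class HashMapDecode {
    // Inverse of code[]: binary string -> character.
    static HashMap<String, Character> invert(String[] code) {
        HashMap<String, Character> inverse = new HashMap<>();
        for (int c = 0; c < code.length; c++) {
            if (code[c] != null) inverse.put(code[c], (char) c);
        }
        return inverse;
    }

    // No code is a prefix of another, so the first prefix that appears
    // in the map is exactly one character's code.
    static String decodeMessage(String bits, HashMap<String, Character> inverse) {
        StringBuilder out = new StringBuilder();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            prefix.append(bits.charAt(i));
            Character ch = inverse.get(prefix.toString()); // null if no match yet
            if (ch != null) {
                out.append(ch.charValue());
                prefix.setLength(0);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] code = new String[256];
        code['c'] = "0"; code['a'] = "10"; code['b'] = "11"; // HW4 example code
        System.out.println(decodeMessage("01011", invert(code))); // prints cab
    }
}
```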

Homework #5 Tips
Keep checking intermediate results
  o Make use of the sample outputs here
  o Print out intermediate results!
You might need special cases for newline ('\n')
Your encoding might differ from the examples
  o It depends on the BinaryHeap implementation
  o Same-frequency items are returned in arbitrary order (e.g., in love_poem_58, 'N', '-', '.', 'W', and 'p' all have frequency one)
However, the total Huffman encoding length must match!
  o Huffman's algorithm is guaranteed to produce the shortest-length encoding
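
To see why individual codes can differ while the total length cannot, the sketch below builds two Huffman trees for the same frequencies, differing only in how the heap breaks ties between equal weights. java.util.PriorityQueue with two comparators stands in for two different BinaryHeap behaviors; the class name TieBreaking and the frequencies a = 1, b = 1, c = 2, d = 2 are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TieBreaking {
    static class Node {
        final char ch;
        final int weight, order; // order = creation sequence, used only for ties
        final Node left, right;
        Node(char ch, int weight, int order, Node left, Node right) {
            this.ch = ch; this.weight = weight; this.order = order;
            this.left = left; this.right = right;
        }
    }

    // Two heap orders that agree on weight but break ties differently.
    static final Comparator<Node> OLDEST_FIRST =
            Comparator.comparingInt((Node n) -> n.weight).thenComparingInt(n -> n.order);
    static final Comparator<Node> NEWEST_FIRST =
            Comparator.comparingInt((Node n) -> n.weight).thenComparingInt(n -> -n.order);

    // Huffman's algorithm with a caller-chosen tie-breaking rule.
    static Node build(char[] chars, int[] freqs, Comparator<Node> heapOrder) {
        PriorityQueue<Node> heap = new PriorityQueue<>(heapOrder);
        int order = 0;
        for (int i = 0; i < chars.length; i++) {
            heap.add(new Node(chars[i], freqs[i], order++, null, null));
        }
        while (heap.size() > 1) {
            Node x = heap.poll(), y = heap.poll();
            heap.add(new Node('\0', x.weight + y.weight, order++, x, y));
        }
        return heap.poll();
    }

    // A character's code length is the depth of its leaf.
    static void depths(Node n, int depth, int[] len) {
        if (n.left == null) { len[n.ch] = depth; return; }
        depths(n.left, depth + 1, len);
        depths(n.right, depth + 1, len);
    }

    // Total encoded bits: sum of frequency * code length.
    static int totalBits(char[] chars, int[] freqs, int[] len) {
        int total = 0;
        for (int i = 0; i < chars.length; i++) total += freqs[i] * len[chars[i]];
        return total;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c', 'd'};
        int[] freqs = {1, 1, 2, 2}; // hypothetical frequencies with ties
        int[] lenA = new int[256], lenB = new int[256];
        depths(build(chars, freqs, OLDEST_FIRST), 0, lenA);
        depths(build(chars, freqs, NEWEST_FIRST), 0, lenB);
        System.out.println("code length of c: " + lenA['c'] + " vs " + lenB['c']); // 2 vs 1
        System.out.println("total bits: " + totalBits(chars, freqs, lenA)
                + " vs " + totalBits(chars, freqs, lenB));                         // 12 vs 12
    }
}
```

Here 'c' gets a 2-bit code under one tie-breaking rule and a 1-bit code under the other, yet both trees encode the text in 12 bits, which is the kind of agreement to check against the sample outputs.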