15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 12, 2004 More LZW / Practicum.


Quizzes There will be an on-line quiz next week (Thursday). It will be on Blackboard, and you will have an hour to complete it, any time within a 36-hour window. We'll post a practice exam shortly. Don't blow this off.

Last Time…

Last Time: Lempel & Ziv

Fred’s Improvements F.H. has several bright ideas on how to improve LZW. First off, let’s forget alphabets: the input can always be assumed to be in binary. Second, we will try to reuse code numbers: once a node in the trie is interior, we reuse its number. Third, we’ll be aggressive about growing the trie: when we add one child, we’ll also add the other.

Fred’s Improvements Lastly, for good measure we don’t transmit the actual integer sequence produced. Instead, we use Huffman coding as a secondary compression step and send the Huffman-compressed file. That’s it for the moment.

Reminder: Compression We scan a sequence of symbols A = a1 a2 a3 … ak, where each prefix is in the dictionary. We stop when we fall out of the dictionary: A b.

Reminder: Compressing Then send the code for A = a1 a2 a3 … ak and update the dictionary. This is the classical algorithm. How about Fred’s variant? Here is an example from the point of view of the binary trie.
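For concreteness, here is a minimal sketch of the classical compression loop in Java. It assumes a plain ASCII alphabet and uses a HashMap in place of the trie; the class and method names are illustrative, not from the course code.

```java
import java.util.*;

public class Lzw {
    // Classical LZW: codes 0-255 are the single characters,
    // new dictionary entries get codes from 256 upward.
    public static List<Integer> compress(String text) {
        Map<String, Integer> dict = new HashMap<>();
        for (int c = 0; c < 256; c++) dict.put(String.valueOf((char) c), c);
        int nextCode = 256;
        List<Integer> out = new ArrayList<>();
        String a = "";                        // current match A
        for (char b : text.toCharArray()) {
            String ab = a + b;
            if (dict.containsKey(ab)) {
                a = ab;                       // A b is still in the dictionary
            } else {
                out.add(dict.get(a));         // fell out: send code(A)
                dict.put(ab, nextCode++);     // enter A b
                a = String.valueOf(b);        // restart the scan at b
            }
        }
        if (!a.isEmpty()) out.add(dict.get(a));
        return out;
    }
}
```

On "ABABABA" this emits the codes for A, B, then 256 (= "AB") and 258 (= "ABA"), which is the behavior the trie pictures trace by hand.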

Fred’s Binary LZW [A run of slides steps Fred’s variant through a binary example, showing the input with a cursor (^), the trie dictionary over {0, 1}, and the output codes after each step. The trie figures did not survive transcription; only the labels Input, Dictionary, and Output (with one visible output, 10) remain.]

Back to Fred’s Improvements Recall Fred’s ideas:
- Binary alphabets.
- Code number reuse.
- Fast-growing trie.
- Secondary compression.
How about it?

LZW Correctness How do we know that decompression always works? (Note that compression is not an issue here.) Formally we have two maps: comp : texts → int sequences, and decomp : int sequences → texts. We need, for all texts T: decomp(comp(T)) = T.

Getting Personal Think about Ann: compresses T, sends the int sequence. Bob: decompresses the int sequence, tries to reconstruct T. Question: Can Bob always succeed? Assuming, of course, that the int sequence is real (the map decomp is not total).

How? How do we prove that Bob can always succeed? Think of Ann and Bob working in parallel. Time 0: both initialize their dictionaries. Time t: Ann determines next code number c, sends it to Bob. Bob must be able to convert c back into the corresponding word.

Induction We can use induction on t. The problem is: What property should we establish by induction? It has to be a claim about Bob’s dictionary and his ability to decompress the integer sequence. How do the two dictionaries compare over time?

The Claim At time t = 0 both Ann and Bob have the same dictionary. But at any time t > 0 we have Claim: Bob’s dictionary misses exactly the last entry in Ann’s dictionary. Bob is always behind, but just by one entry.

Why? Suppose at time t Ann enters A b with code number C and sends c = code(A). Easy case: c < C-1. By the IH, c is already in Bob’s dictionary. So Bob can decode and now knows A. But then Bob can update his dictionary: all he needs is the first letter of A.

The Hard Case Now suppose c = C-1. But then Ann must have entered A into her dictionary at the previous step, and must have read A again. So we have: Last step: A' b' = A. Now: A b, where a1 = b'. But then A' = b' w.

The Hard Case In other words, the text actually looked like this: … b' w b' w b' b … But Bob already knows A' and thus can reconstruct A = A' b' = b' w b'. QED. Ponder deeply.
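The proof translates directly into Bob's decompression loop: the hard case is exactly the branch where the received code is not yet in Bob's dictionary, and the missing entry must be A' followed by its own first letter. A hedged sketch in Java, again with a HashMap instead of a trie and illustrative names:

```java
import java.util.*;

public class LzwDecode {
    // Bob's side of classical LZW; assumes codes 0-255 are single ASCII chars.
    public static String decompress(List<Integer> codes) {
        Map<Integer, String> dict = new HashMap<>();
        for (int c = 0; c < 256; c++) dict.put(c, String.valueOf((char) c));
        int nextCode = 256;
        String prev = dict.get(codes.get(0));   // first code is a single char
        StringBuilder out = new StringBuilder(prev);
        for (int i = 1; i < codes.size(); i++) {
            int c = codes.get(i);
            String entry;
            if (dict.containsKey(c)) {
                entry = dict.get(c);            // easy case: Bob already has c
            } else {
                entry = prev + prev.charAt(0);  // hard case: A = A' b' = b' w b'
            }
            out.append(entry);
            dict.put(nextCode++, prev + entry.charAt(0)); // catch up by one entry
            prev = entry;
        }
        return out.toString();
    }
}
```

On input [65, 256, 65] (Ann's output for "AAAA"), code 256 arrives before Bob has entered it, and the hard-case branch reconstructs "AA" exactly as the argument predicts.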

Car Dodging How does one organize code for a problem like this? First Step: Make sure you understand the algorithm. If at all possible, draw some pictures, get to know the data structures, get a feel for the necessary operations, … Don’t worry about efficiency at this point.

Car Dodging Second Step: Design an implementation for the algorithm:
- data structures
- operations
- interaction
Don’t get paralysed by efficiency at this point, but keep thinking about it.

Car Dodging Third Step: Code and test each component. This means: do NOT put the whole system together first and then start testing it by running real examples. That is a colossal waste of time; you will have to backtrack in any case.

Car Dodging Fourth Step: Assemble the pieces and test them together. Start with easy cases and work your way up to hard cases. The more the merrier. Make sure your code has a debugging facility: write debugging output to a text file, then massage it with perl, sed, awk, emacs, … to get at the information you need.

Debugging Symbolic debuggers are mostly useful in the first round of debugging (local data structures and operations). For a whole system they are essentially worthless. Megabytes of debugging text is the way to go. And never, ever remove your debugging code.

Efficiency Once you are reasonably certain that your code is correct, you may want to spend some time streamlining it. This is about real physical time (and memory consumption) and shouldn’t produce any improvements by orders of magnitude, but you may be able to shave off a few constants. Inner loops are a classical target here.

CarDodging: 1 For CarDodger, the algorithm is a sweep-line method: We want to compute a polygonal subset R of space-time (really: 2-dim Euclidean space). We sweep a line L parallel to the x-axis across it and record the intersection of R and L. These intersections are disjoint intervals, and change significantly only when a new obstacle appears (a boundary segment of R) or when two lines intersect, a merge event. So we take action only at these times.

CarDodging: 2 There are only two central data structures:
- event queue
- interval structure
Execution is controlled by a while loop:

    while( event queue not empty )
        get next event
        update interval structure

CarDodging: 2 The event queue is just a heap. The interval structure ought to be a balanced tree, but that is a bit hard to implement. A good mockup is a plain list/array implementation: instead of O(log n) access to the critical intervals, we smash our way through the data structure in linear time. The bottleneck should be strictly localized, not propagated throughout the code.
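Under those choices, the driver might be sketched like this in Java: a PriorityQueue as the heap for the event queue, and a plain ArrayList scanned linearly as the interval-structure mockup. The Event record and the merge-on-insert update are hypothetical stand-ins for the assignment's real classes.

```java
import java.util.*;

public class Sweep {
    // Hypothetical event: at the given time, the interval [lo, hi] joins R.
    public record Event(double time, double lo, double hi) {}

    // Interval-structure mockup: linear scan, merging any overlapping intervals.
    static void insert(List<double[]> ivs, double lo, double hi) {
        List<double[]> merged = new ArrayList<>();
        for (double[] iv : ivs) {
            if (iv[1] < lo || hi < iv[0]) merged.add(iv);  // disjoint: keep as-is
            else { lo = Math.min(lo, iv[0]); hi = Math.max(hi, iv[1]); } // merge
        }
        merged.add(new double[] { lo, hi });
        merged.sort(Comparator.comparingDouble(iv -> iv[0]));
        ivs.clear();
        ivs.addAll(merged);
    }

    public static List<double[]> sweep(List<Event> events) {
        PriorityQueue<Event> queue =                       // the heap
            new PriorityQueue<>(Comparator.comparingDouble(Event::time));
        queue.addAll(events);
        List<double[]> ivs = new ArrayList<>();
        while (!queue.isEmpty()) {                         // event queue not empty
            Event e = queue.poll();                        // get next event
            insert(ivs, e.lo(), e.hi());                   // update interval structure
        }
        return ivs;
    }
}
```

The O(n) insert is exactly the localized bottleneck described above: swapping the list for a balanced tree would change only this one method.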

CarDodging: 3 To build and test the heap and the IS we need to decide one more thing: what an event is, and what an interval is. Many options here; two small class hierarchies seem prudent. Could use interfaces, but abstract classes may be more natural.

Abstract Classes
- Typically has at least one abstract (unimplemented) method.
- Cannot be instantiated.
- Can have data (in interfaces: only static final fields).
- Can inherit from only one abstract class (no multiple inheritance).
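As a toy illustration of these rules in Java (the names are hypothetical, not the assignment's actual event hierarchy):

```java
// An abstract base class: carries data, declares one abstract method,
// and cannot itself be instantiated.
abstract class SweepEvent {
    final double time;                        // abstract classes can carry data
    SweepEvent(double time) { this.time = time; }
    abstract String handle();                 // abstract (unimplemented) method
    // new SweepEvent(0) would not compile: cannot be instantiated
}

class MergeEvent extends SweepEvent {         // may extend only one class
    MergeEvent(double time) { super(time); }
    String handle() { return "merge two intervals at t = " + time; }
}
```

An interface could declare handle() just as well, but only the abstract class lets the shared time field live in the base type.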

CarDodging: 4 Clearly the debugging output must at the very least provide information about how each event is handled (dump the IS after each update). Test cases are quite hard to come by here; they really lead back to Step 1: understanding the algorithm. Correctness is a very elusive goal for this type of program.

Lastly … Submit your code and score 100 points … Make sure you have a back-up, though, on some safe filesystem. Sometimes things do get lost.