1
15-211 Fundamental Data Structures and Algorithms
Klaus Sutner
February 12, 2004
More LZW / Practicum
2
Quizzes There will be an on-line quiz next week (Thursday). It will be on Bb, and you will have an hour to complete it, any time within a 36-hour interval. We’ll post a practice exam shortly. Don’t blow this off.
3
Last Time…
4
Last Time: Lempel & Ziv
5
Fred’s Improvements F.H. has several bright ideas on how to improve LZW.
- First off, let’s forget alphabets: the input can always be assumed to be in binary.
- Second, we will try to reuse code numbers: once a node in the trie is interior, we reuse its number.
- Third, we’ll be aggressive about growing the trie: when we add one child, we’ll also add the other.
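To make this concrete, here is a minimal Java sketch of one possible node layout for such a binary dictionary trie; the class and field names are invented for illustration and are not part of Fred’s actual scheme.

```java
// One node of a binary dictionary trie: two children (for bits 0 and 1),
// a code number, and a flag recording whether the node has become interior
// (so that, following Fred's second idea, its number could be reused).
class TrieNode {
    TrieNode zero, one;   // children for input bits 0 and 1
    int code;             // code number currently assigned to this node
    boolean interior;     // true once the node has children

    TrieNode(int code) {
        this.code = code;
    }

    // Fred's "aggressive growth": when we add one child, add the other too.
    void addBothChildren(int codeZero, int codeOne) {
        zero = new TrieNode(codeZero);
        one  = new TrieNode(codeOne);
        interior = true;
    }
}
```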
6
Fred’s Improvements Lastly, for good measure we don’t transmit the actual integer sequence produced. Instead, we use Huffman coding as a secondary compression step and send the Huffman-compressed file. That’s it for the moment.
7
Reminder: Compression We scan a sequence of symbols A = a_1 a_2 a_3 … a_k where each prefix is in the dictionary. We stop when we fall out of the dictionary: A b
8
Reminder: Compressing Then send the code for A = a_1 a_2 a_3 … a_k and update the dictionary. This is the classical algorithm. How about Fred’s variant? Here is an example from the point of view of the binary trie.
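For reference, here is a minimal Java sketch of the classical compression loop just described (the classical algorithm, not Fred’s variant): scan A while every prefix stays in the dictionary, send code(A) when we fall out at b, then enter A b. The HashMap dictionary and the initialization from the symbols occurring in the text are simplifying assumptions made only for this sketch.

```java
import java.util.*;

class ClassicalLZW {
    // Compress a text over single-character symbols into a list of codes.
    static List<Integer> compress(String text) {
        Map<String, Integer> dict = new HashMap<>();
        int nextCode = 0;
        // Initialize the dictionary with the single symbols occurring in the
        // text (a real implementation would pre-load the whole alphabet).
        for (char ch : text.toCharArray()) {
            String s = String.valueOf(ch);
            if (!dict.containsKey(s)) dict.put(s, nextCode++);
        }
        List<Integer> output = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Scan A = a_1 a_2 ... a_k: the longest prefix still in the dictionary.
            String A = String.valueOf(text.charAt(i));
            int j = i + 1;
            while (j < text.length() && dict.containsKey(A + text.charAt(j))) {
                A = A + text.charAt(j);
                j++;
            }
            output.add(dict.get(A));                       // send code(A)
            if (j < text.length())                         // we fell out at b = text[j]
                dict.put(A + text.charAt(j), nextCode++);  // enter A b
            i += A.length();
        }
        return output;
    }
}
```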
9
Fred’s Binary LZW [trie diagram] Input: 10010110011 (the caret marks the current position). Dictionary: the initial binary trie with edges 0 and 1. Output: (empty so far)
10
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 1
11
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 10
12
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 103
13
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 1034
14
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 10340
15
Binary LZW [trie diagram] Input: 10010110011. Dictionary: the growing binary trie. Output so far: 103402
16
Binary LZW [trie diagram] Input: 103402 (decompression of the code sequence begins; the dictionary trie is re-initialized).
17
Binary LZW [trie diagram] Input: 103402. Dictionary: the growing binary trie. Output so far: 1
18
Binary LZW [trie diagram] Input: 103402. Dictionary: the growing binary trie. Output so far: 10
19
Binary LZW [trie diagram] Input: 103402. Dictionary: the growing binary trie. Output so far: 1001
20
Binary LZW [trie diagram] Input: 103402. Dictionary: the growing binary trie. Output so far: 1001011
21
Binary LZW [trie diagram] Input: 103402. Dictionary: the growing binary trie. Output so far: 100101100
22
Back to Fred’s Improvements Recall Fred’s ideas:
- Binary alphabets.
- Code number reuse.
- Fast-growing trie.
- Secondary compression.
How about it?
23
LZW Correctness How do we know that decompression always works? (Note that compression is not an issue here.) Formally, we have two maps: comp : texts → int sequences, decomp : int sequences → texts. We need, for all texts T: decomp(comp(T)) = T
24
Getting Personal Think about Ann: she compresses T and sends the int sequence. Bob: he decompresses the int sequence and tries to reconstruct T. Question: Can Bob always succeed? Assuming, of course, that the int sequence is real (the map decomp() is not total).
25
How? How do we prove that Bob can always succeed? Think of Ann and Bob working in parallel. Time 0: both initialize their dictionaries. Time t: Ann determines next code number c, sends it to Bob. Bob must be able to convert c back into the corresponding word.
26
Induction We can use induction on t. The problem is: What property should we establish by induction? It has to be a claim about Bob’s dictionary and his ability to decompress the integer sequence. How do the two dictionaries compare over time?
27
The Claim At time t = 0 both Ann and Bob have the same dictionary. But at any time t > 0 we have Claim: Bob’s dictionary is missing exactly the last entry in Ann’s dictionary. Bob is always behind, but only by one entry.
28
Why? Suppose at time t Ann enters A b with code number C and sends c = code(A). Easy case: c < C-1. By the induction hypothesis, c is already in Bob’s dictionary, so Bob can decode it and now knows A. But then Bob can also update his dictionary: to complete the entry he was missing, all he needs is the first letter of A.
29
The Hard Case Now suppose c = C-1. But then Ann must have entered A into her dictionary at the previous step, and must have read A again. So we have: last step: A’ b’ = A; now: A b, where a_1 = b’. But then A’ = b’ w for some word w.
30
The Hard Case In other words, the text actually looked like this: … b’ w b’ w b’ b …. But Bob already knows A’ and can thus reconstruct A. QED. Ponder deeply.
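The argument translates almost directly into code. Here is a minimal Java sketch of classical LZW decompression (again the classical algorithm, not Fred’s variant), assuming the decoder is given the same initial single-symbol dictionary as the encoder; the branch where the received code equals the current dictionary size is exactly the hard case c = C-1, and the entry added on each round is the one Bob was missing.

```java
import java.util.*;

class ClassicalLZWDecoder {
    // Decompress a code sequence, given the initial single-symbol dictionary
    // shared with the encoder. Illustrative sketch only.
    static String decompress(List<Integer> codes, List<String> initialDict) {
        if (codes.isEmpty()) return "";
        List<String> dict = new ArrayList<>(initialDict);
        StringBuilder text = new StringBuilder();

        String prev = dict.get(codes.get(0));   // the first code is always known
        text.append(prev);

        for (int i = 1; i < codes.size(); i++) {
            int c = codes.get(i);
            String A;
            if (c < dict.size()) {
                // Easy case: c is already in Bob's dictionary.
                A = dict.get(c);
            } else {
                // Hard case (c = C-1): the code refers to the entry Ann created
                // on the previous step, which Bob does not have yet. As argued
                // above, A = A' b' where A' = prev and b' = first letter of A'.
                A = prev + prev.charAt(0);
            }
            text.append(A);
            // Complete the entry Bob was missing: prev plus the first letter of A.
            dict.add(prev + A.charAt(0));
            prev = A;
        }
        return text.toString();
    }
}
```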
31
Car Dodging How does one organize code for a problem like this? First Step: Make sure you understand the algorithm. If at all possible, draw some pictures, get to know the data structures, get a feel for the necessary operations, … Don’t worry about efficiency at this point.
32
Car Dodging Second Step: Design an implementation for the algorithm:
- data structures
- operations
- interaction
Don’t get paralysed by efficiency at this point, but keep thinking about it.
33
Car Dodging Third Step: Code and test each component. This means: do NOT put the whole system together first, and then start testing it by running real examples. This is a colossal waste of time; you will have to backtrack in any case.
34
Car Dodging Fourth Step: Assemble the pieces and test them together. Start with easy cases and work your way up to hard cases. The more the merrier. Make sure your code has a debugging facility: write debugging output to a text file, then massage it with perl, sed, awk, emacs, … to get at the information you need.
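One possible shape for such a facility, sketched in Java (the class name and the log-file name are made up for illustration): append tagged lines to a plain text file that you can later grep or massage with perl/sed/awk.

```java
import java.io.*;

// Minimal debugging facility: append tagged lines to a text file.
// Leave the calls in place; just flip DEBUG off for timing runs.
class Debug {
    static final boolean DEBUG = true;     // flip to false to silence the log
    private static PrintWriter log;

    static void msg(String tag, String text) {
        if (!DEBUG) return;
        try {
            if (log == null)
                log = new PrintWriter(new FileWriter("debug.log", true), true);
            log.println(tag + ": " + text);   // e.g. "EVENT: merge at t=3.5"
        } catch (IOException e) {
            System.err.println("debug log failed: " + e);
        }
    }
}
```

A call such as Debug.msg("EVENT", e.toString()) after every update keeps the log easy to grep.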
35
Debugging Symbolic debuggers are mostly useful in the first round of debugging (local data structures and operations). For a whole system they are essentially worthless. Megabytes of debugging text is the way to go. And never, ever remove your debugging code.
36
Efficiency Once you are reasonably certain that your code is correct, you may want to spend some time streamlining it. This is about real physical time (and memory consumption); it shouldn’t produce improvements by orders of magnitude, but you may be able to shave off a few constant factors. Inner loops are a classical target here.
37
CarDodging: 1 For CarDodger, the algorithm is a sweep-line method: we want to compute a polygonal subset R of space-time (really: 2-dimensional Euclidean space). We sweep a line L parallel to the x-axis across it and record the intersection of R and L. These intersections are disjoint intervals, and they change significantly only when a new obstacle appears (a boundary segment of R) or when two boundary lines intersect (a merge event). So we take action only at these times.
38
CarDodging: 2 There are only two central data structures:
- event queue
- interval structure
Execution controlled by a while loop:
while( event queue not empty )
    get next event
    update interval structure
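In Java, that loop might look like the sketch below; Event and IntervalStructure are placeholder types standing in for whatever you design in Step 2, and java.util.PriorityQueue stands in for the heap.

```java
import java.util.PriorityQueue;

// Placeholder event type: ordered by time so the heap delivers events in sweep order.
abstract class Event implements Comparable<Event> {
    double time;                                       // y-coordinate of the event
    public int compareTo(Event o) { return Double.compare(time, o.time); }
    abstract void handle(IntervalStructure s, PriorityQueue<Event> q);
}

// Placeholder for the interval structure (see the list-based mockup below).
class IntervalStructure { }

class Sweep {
    static void run(PriorityQueue<Event> queue, IntervalStructure intervals) {
        while (!queue.isEmpty()) {          // while( event queue not empty )
            Event e = queue.poll();         //   get next event
            e.handle(intervals, queue);     //   update interval structure;
        }                                   //   handlers may enqueue new merge events
    }
}
```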
39
CarDodging: 2 The event queue is just a heap. The interval structure ought to be a balanced tree, but that is a bit hard to implement. A good mockup is a plain list/array implementation: instead of O(log n) access to the critical intervals we smash our way through the data structure in linear time. The bottleneck should be strictly localized, not propagated throughout the code.
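A possible mockup along those lines, as a sketch only (the Interval fields and method names are invented): keep the intervals in an ArrayList sorted by left endpoint and simply scan it. All the O(n) work is confined to this one class, so it can later be swapped for a balanced tree without touching the rest of the code.

```java
import java.util.ArrayList;
import java.util.List;

// Linear-time mockup of the interval structure: a plain sorted list.
// Every operation just scans the list; the O(n) cost stays inside this class.
class IntervalList {
    static class Interval {
        double lo, hi;                       // endpoints on the sweep line
        Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }
    }

    private final List<Interval> intervals = new ArrayList<>();

    // Find the interval containing x, or null -- by brute-force scan.
    Interval find(double x) {
        for (Interval iv : intervals)
            if (iv.lo <= x && x <= iv.hi) return iv;
        return null;
    }

    // Insert, keeping the list sorted by left endpoint (linear scan again).
    void insert(Interval iv) {
        int i = 0;
        while (i < intervals.size() && intervals.get(i).lo < iv.lo) i++;
        intervals.add(i, iv);
    }

    // Remove a previously found interval (linear scan, identity-based).
    void remove(Interval iv) { intervals.remove(iv); }
}
```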
40
CarDodging: 3 To build and test the heap and the IS we need to decide one more thing: what is an event, and what is an interval? Many options here; two small class hierarchies seem prudent. Could use interfaces, but abstract classes may be more natural.
41
Abstract Classes
- Typically has at least one abstract (unimplemented) method (in Java a class may be declared abstract even without one).
- Cannot be instantiated.
- Can have data fields (in interfaces: only static final constants).
- A subclass can extend only one class, abstract or not (no multiple inheritance of classes).
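A tiny Java illustration of these points, using a hypothetical event hierarchy for CarDodger (the names are invented for illustration):

```java
// Abstract base class: it has data and a concrete method,
// plus one abstract method that every subclass must implement.
abstract class SweepEvent {
    protected double time;                    // data fields are allowed

    protected SweepEvent(double time) { this.time = time; }

    double time() { return time; }            // ordinary implemented method

    abstract void handle();                   // the abstract (unimplemented) method
}

// A concrete subclass: a class extends exactly one superclass.
class ObstacleEvent extends SweepEvent {
    ObstacleEvent(double time) { super(time); }

    void handle() {
        // here: insert the new obstacle's boundary into the interval structure
    }
}

class Demo {
    public static void main(String[] args) {
        // SweepEvent e = new SweepEvent(1.0);  // does not compile: cannot instantiate
        SweepEvent e = new ObstacleEvent(1.0);  // fine: use a concrete subclass
        e.handle();
    }
}
```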
42
CarDodging: 4 Clearly, the debugging output must at the very least provide information about how each event is handled (dump the IS after each update). Test cases are quite hard to come by here; they really lead back to Step 1: understanding the algorithm. Correctness is a very elusive goal for this type of program.
43
Lastly … Submit your code and score 100 points … Make sure you have a back-up, though, on some safe filesystem. Sometimes things do get lost.