15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 1 March 2005 More LZW / Midterm Review.



2 Midterm: Thursday, 3 March 2005, 12:00 noon, WeH 7500. Worth a total of 125 points. Closed book, but you may have one page of notes. If you have a question, raise your hand and stay in your seat.

Last Time…

4 Last Time: Lempel & Ziv

5 Reminder: Compressing We scan a sequence of symbols A = a_1 a_2 a_3 … a_k where each prefix is in the dictionary. We stop when we fall out of the dictionary: A b.

6 Reminder: Compressing Then send the code for A = a_1 a_2 a_3 … a_k. This is the classical algorithm.
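The scanning loop in the reminder above can be sketched as follows. This is a minimal sketch, not the course's reference code: the function name `lzw_compress`, the `alphabet` parameter, and the choice to number the initial single-symbol codes from 0 are all assumptions made for illustration.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text.

    Assumes every symbol of text occurs in alphabet (hypothetical parameter).
    """
    # Initial dictionary: one code per single symbol, numbered 0, 1, 2, ...
    codes = {symbol: i for i, symbol in enumerate(alphabet)}
    output = []
    word = ""  # the longest prefix A found in the dictionary so far
    for symbol in text:
        if word + symbol in codes:
            word += symbol                     # still inside the dictionary
        else:
            output.append(codes[word])         # fell out at A b: send code(A)
            codes[word + symbol] = len(codes)  # enter A b under the next code
            word = symbol                      # the next scan starts at b
    if word:
        output.append(codes[word])             # flush the final match
    return output
```

With a three-symbol alphabet `"abc"`, `lzw_compress("aabbbaabbaaabaababb", "abc")` yields the code sequence 0, 0, 1, 5, 3, 6, 7, 9, 5.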

7 LZW: Compress bad case. Input: …sΣ…sΣsΣsb… Dictionary: α: sΣ. Output: … (Σ is a word, possibly empty.)

8 LZW: Compress bad case (time t). Input: …sΣ…sΣsΣsb… Dictionary: α: sΣ. Output: …α

9 LZW: Uncompress bad case (time t). Input: …α Dictionary: α: sΣ. Output: …sΣ

10 LZW: Compress bad case (time t+1). Input: …sΣ…sΣsΣsb… Dictionary: α: sΣ; β: sΣs. Output: …αβ

11 LZW: Uncompress bad case (time t+1). Input: …αβ Dictionary: α: sΣ. Output: …sΣ

12 LZW: Compress bad case (time t+2). Input: …sΣ…sΣsΣsb… Dictionary: α: sΣ; β: sΣs; β+1: sΣsb. Output: …αβ

13 LZW: Uncompress bad case (time t+2). Input: …αβ Dictionary: α: sΣ. Output: …sΣ What is β??

14 LZW: Uncompress bad case (time t+2). Input: …αβ Dictionary: α: sΣ; β: sΣs. Output: …sΣsΣs What is β?? It codes for sΣs!

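15 Example: aabbbaabbaaabaababb (bad-case pattern …sΣ…sΣsΣsb… with s = a, Σ = ab)

Input  Output  Add to D
0      a
0      +a      3: aa
1      +b      4: ab
5      -bb     5: bb
3      +aa     6: bba
6      +bba    7: aab
7      +aab    8: bbaa
9      -aaba   9: aaba
5      +bb     10: aabab

("-" marks the bad case: the received code is the one about to be added to the dictionary.)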
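The decoding trace in the example can be reproduced with a short sketch. It assumes the initial dictionary is 0: a, 1: b, 2: c; the third symbol is a guess, made only because the first added code in the example is 3. The name `lzw_decompress` is hypothetical.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the text from a list of LZW codes (sketch, see assumptions above)."""
    words = {i: symbol for i, symbol in enumerate(alphabet)}
    if not codes:
        return ""
    previous = words[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in words:
            current = words[code]
        elif code == len(words):
            # Bad case: the code names the entry that is about to be added.
            # That entry is previous + first symbol of itself, i.e. s Sigma s.
            current = previous + previous[0]
        else:
            raise ValueError("invalid code sequence")
        # Add the entry the compressor made one step earlier.
        words[len(words)] = previous + current[0]
        pieces.append(current)
        previous = current
    return "".join(pieces)
```

Feeding it the input column of the example, `[0, 0, 1, 5, 3, 6, 7, 9, 5]` with alphabet `"abc"`, gives back `aabbbaabbaaabaababb`; codes 5 and 9 go through the bad-case branch.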

16 LZW Correctness So we know that when this case occurs, decompression works. Is this the only bad case? How do we know that decompression always works? (Note that compression is not an issue here.) Formally, we have two maps: comp : texts → int seq. and decomp : int seq. → texts. We need, for all texts T: decomp(comp(T)) = T.

17 Getting Personal Think about Ann: compresses T, sends int sequence. Bob: decompresses int sequence, tries to reconstruct T. Question: Can Bob always succeed? Assuming, of course, that the int sequence is valid (the map decomp() is not total).

18 How? How do we prove that Bob can always succeed? Think of Ann and Bob working in parallel. Time 0: both initialize their dictionaries. Time t: Ann determines next code number c, sends it to Bob. Bob must be able to convert c back into the corresponding word.

19 Induction We can use induction on t. The problem is: What property should we establish by induction? It has to be a claim about Bob’s dictionary. How do the two dictionaries compare over time?

20 The Claim At time t = 0 both Ann and Bob have the same dictionary. But at any time t > 0 we have Claim: Bob’s dictionary misses exactly the last entry in Ann’s dictionary after processing the last code Ann sends. (Ann can add Wx to the dictionary, but Bob won’t know x until the next message he receives.)

21 The Easy Case Suppose at time t Ann enters A b with code number C and sends c = code(A). Easy case: c < C-1. By the inductive hypothesis, Bob has codes up to and including C-2 in his dictionary. That is, c is already in Bob’s dictionary. So Bob can decode and now knows A. But then Bob can update his dictionary: all he needs is the first letter of A.

22 The Easy Case [Diagram: the text … A b …, with c the code sent for A, C the code entered for A b, and C-1 the previous entry; here c < C-1.]

23 The Hard Case Now suppose c = C-1. Recall that at time t Ann entered A b with code number C and sent c = code(A). [Diagram: the text … A b …; sent: c; entered: C, just after entry C-1.]

24 The Hard Case Since c = C-1, the word A is the entry Ann made at time t-1: A = A’ s’, where A’ is the word sent at time t-1 and s’ is the symbol that followed it. That symbol is the first symbol of A: a_1 = s’.

25 The Hard Case A begins with A’, so A’ also begins with s’: A’ = s’ W for some word W.

26 The Hard Case The text therefore reads … s’ W s’ W s’ b …

27 The Hard Case Now suppose c = C-1. Recall, at time t Ann had entered A b with code number C and sent c = code(A). So we have: Time t-1: entered c = code(A), sent code(A’), where A = A’ s’. Time t: entered C = code(A b), sent c = code(A), where a_1 = s’. But then A’ = s’ W.

28 The Hard Case In other words, the text must have looked like this: … s’ W s’ W s’ b … But Bob already knows A’, and A = A’ s’ where s’ is the first symbol of A’, so he can reconstruct A. QED
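The induction claim can also be checked mechanically by running Ann and Bob in lockstep. The sketch below is an illustration under assumptions, not part of the lecture: `check_lockstep` and its `alphabet` parameter are hypothetical names, and initial codes are numbered from 0.

```python
def check_lockstep(text, alphabet):
    """Run Ann (compressor) and Bob (decompressor) step by step.

    After each code, assert that Bob decoded Ann's word and that his
    dictionary lacks exactly Ann's newest entry (or nothing at the end).
    """
    ann = {s: i for i, s in enumerate(alphabet)}  # word -> code
    bob = {i: s for i, s in enumerate(alphabet)}  # code -> word
    decoded, previous, position = [], None, 0
    while position < len(text):
        # Ann: extend the match while the longer word is in her dictionary.
        end = position + 1
        while end < len(text) and text[position:end + 1] in ann:
            end += 1
        word = text[position:end]
        code = ann[word]
        ann_entered = end < len(text)
        if ann_entered:
            ann[text[position:end + 1]] = len(ann)  # Ann enters A b
        # Bob: decode the code, using the hard-case rule when it is unknown.
        if code in bob:
            current = bob[code]
        else:
            assert code == len(bob)                 # this is c = C-1
            current = previous + previous[0]
        if previous is not None:
            bob[len(bob)] = previous + current[0]   # Bob catches up one entry
        assert current == word
        assert len(ann) - len(bob) == (1 if ann_entered else 0)
        assert all(ann[w] == c for c, w in bob.items())
        decoded.append(current)
        previous, position = current, end
    return "".join(decoded)
```

If any assertion failed for some text, that text would be a counterexample to the claim; for example, `check_lockstep("aabbbaabbaaabaababb", "abc")` runs through both hard cases and returns the original text.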

Midterm Review

30 Basic Data Structures • List • Persistence • Tree • Height of tree, depth of node, level • Perfect, complete, full • Min & max number of nodes

31 Recurrence Relations • E.g., T(n) = T(n-1) + n/2 • Solve by repeated substitution • Solve resulting series • Prove by guessing and substitution • Master Theorem • T(N) = aT(N/b) + f(N)

32 Solving recurrence equations Repeated substitution: t(n) = n + t(n-1) = n + (n-1) + t(n-2) = n + (n-1) + (n-2) + t(n-3) and so on… = n + (n-1) + (n-2) + (n-3) + … + 1
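The series at the end of the substitution has a closed form. A short derivation, assuming the base case t(0) = 0 (the slide stops at the term 1, which suggests this but does not state it):

```latex
\begin{align*}
t(n) &= n + (n-1) + (n-2) + \cdots + 1 \\
     &= \sum_{i=1}^{n} i \\
     &= \frac{n(n+1)}{2} \;=\; O(n^2).
\end{align*}
```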

33 Incrementing series • This is an arithmetic series that comes up over and over again, because it characterizes many nested loops:
    for (i=1; i<n; i++) {
        for (j=1; j<i; j++) {
            f();
        }
    }
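To see the arithmetic series in the nested loop, one can count the calls to f() directly. The sketch below mirrors the loop bounds exactly as written on the slide (i from 1 to n-1, j from 1 to i-1); `count_calls` is a hypothetical helper name.

```python
def count_calls(n):
    """Count how many times the inner body of the nested loop runs."""
    calls = 0
    for i in range(1, n):        # i = 1 .. n-1
        for j in range(1, i):    # j = 1 .. i-1
            calls += 1           # stands in for f()
    return calls
```

With these bounds the count is 0 + 1 + … + (n-2) = (n-1)(n-2)/2, which is quadratic in n.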

34 “Big-Oh” notation: T(N) = O(f(N)), “T(N) is order f(N)”. [Plot: running time against N, with T(N) staying below c·f(N) for all N beyond n_0.]

35 Upper And Lower Bounds Big-Oh: f(n) = O(g(n)) if f(n) ≤ c g(n) for some constant c and all n > n_0. Big-Omega: f(n) = Ω(g(n)) if f(n) ≥ c g(n) for some constant c and all n > n_0. Theta: f(n) = Θ(g(n)) if f(n) = O(g(n)) and f(n) = Ω(g(n)).

36 Upper And Lower Bounds Big-Oh, f(n) = O(g(n)): can only be used for upper bounds. Big-Omega, f(n) = Ω(g(n)): can only be used for lower bounds. Theta, f(n) = Θ(g(n)): pins down the running time exactly (up to a multiplicative constant).

37 Big-O characteristic • Low-order terms “don’t matter”: • Suppose T(N) = 20n^3 + 10n log n + 5 • Then T(N) = O(n^3) • Question: • What constants c and n_0 can be used to show that the above is true? • Answer: c = 35, n_0 = 1
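One way to check the stated constants. This reads the polynomial on the slide as T(n) = 20n^3 + 10 n log n + 5, an assumption chosen because its coefficients sum to the stated c = 35, and takes the logarithm to base 2:

```latex
\begin{align*}
T(n) &= 20n^3 + 10\,n\log n + 5 \\
     &\le 20n^3 + 10n^3 + 5n^3
        && \text{since } n\log n \le n^3 \text{ and } 1 \le n^3 \text{ for } n \ge 1 \\
     &= 35\,n^3 .
\end{align*}
```

So T(n) ≤ c n^3 holds for every n ≥ n_0 with c = 35 and n_0 = 1.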

38 Big-O characteristic • The bigger task always dominates eventually. • If T1(N) = O(f(N)) and T2(N) = O(g(N)), • then T1(N) + T2(N) = O(max(f(N), g(N))). • Also: • T1(N) × T2(N) = O(f(N) × g(N)).

39 Dictionary • Operations: • Insert • Delete • Find • Implementations: • Binary Search Tree • AVL Tree • Splay • Trie • Hash

40 Binary search trees • Simple binary search trees can have bad behavior for some insertion sequences. • Average case O(log N), worst case O(N). • AVL trees maintain a balance invariant to prevent this bad behavior. • Accomplished via rotations during insert. • Splay trees achieve amortized running time of O(log N). • Accomplished via rotations during find.

41 AVL trees • Definition • Min number of nodes of height H • F_{H+3} - 1, where F_n is the nth Fibonacci number • Insert - single & double rotations. How many? • Delete - lazy. How bad?
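The Fibonacci bound can be checked numerically from the recurrence N(H) = N(H-1) + N(H-2) + 1 for the sparsest AVL tree. The sketch assumes the convention that a single node has height 0 and that F_1 = F_2 = 1; the slide states neither, and the function names are hypothetical.

```python
def min_avl_nodes(height):
    """Fewest nodes in an AVL tree of the given height (single node = height 0)."""
    smaller, current = 0, 1          # N(-1) = 0 (empty tree), N(0) = 1
    for _ in range(height):
        # Sparsest tree: a root plus minimal subtrees of heights H-1 and H-2.
        smaller, current = current, smaller + current + 1
    return current


def fib(n):
    """n-th Fibonacci number with F_0 = 0, F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Under these conventions `min_avl_nodes(h)` equals `fib(h + 3) - 1` for every height tried, which is why the height of an AVL tree grows only logarithmically in the number of nodes.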

42 Single rotation • For the case of insertion into the left subtree of the left child: the deepest node of X has depth 2 greater than the deepest node of Z. [Diagram: subtrees X, Y, Z before and after the rotation; the depth of X is reduced by 1 and the depth of Z is increased by 1.]

43 Double rotation • For the case of insertion into the right subtree of the left child. [Diagram: subtrees X, Y1, Y2, Z before and after the rotation.]
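The two rotations for the left-heavy cases can be sketched on a bare node structure. This is only an illustration: the `Node` class and function names are hypothetical, and the height bookkeeping a real AVL tree needs is left out.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(node):
    """Single rotation (left-left case): the left child becomes the subtree root."""
    child = node.left
    node.left = child.right   # subtree Y changes parent
    child.right = node
    return child


def rotate_left(node):
    """Mirror image of rotate_right."""
    child = node.right
    node.right = child.left
    child.left = node
    return child


def rotate_left_right(node):
    """Double rotation (left-right case): rotate the left child left, then node right."""
    node.left = rotate_left(node.left)
    return rotate_right(node)


def inorder(node):
    """Keys in sorted order; rotations must never change this list."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

Both rotations return the new subtree root, so the caller reattaches it with something like `parent.left = rotate_right(parent.left)`.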

44 Splay trees • Splay trees provide a guarantee that any sequence of M operations (starting from an empty tree) will require O(M log N) time. • Hence, each operation has amortized cost of O(log N). • It is possible that a single operation requires O(N) time. • But there are no bad sequences of operations on a splay tree.

45 Splaying, case 3 • Case 3: Zig-zag (left). • Perform an AVL double rotation. [Diagram: nodes a, b and subtrees X, Y1, Y2, Z before and after.]

46 Splaying, case 4 • Case 4: Zig-zig (left). • Special rotation. [Diagram: nodes a, b and subtrees W, X, Y, Z before and after.]

47 Tries • Good for unequal length keys or sequences • Find O(m), m sequence length • But: few to many children [Diagram: a trie holding the words I, like, love, lovely, you, …, with values such as 5 and 9 stored at nodes.]
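A minimal trie sketch shows why find costs O(m) for a key of length m: each symbol moves one level down. The names `TrieNode`, `trie_insert` and `trie_find` are hypothetical, and children are kept in a Python dict, which is one of several possible answers to the "few to many children" problem.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # symbol -> TrieNode
        self.is_key = False   # True if a key ends at this node
        self.value = None     # value stored for that key


def trie_insert(root, key, value):
    node = root
    for symbol in key:                      # one step per symbol
        if symbol not in node.children:
            node.children[symbol] = TrieNode()
        node = node.children[symbol]
    node.is_key = True
    node.value = value


def trie_find(root, key):
    node = root
    for symbol in key:                      # at most len(key) steps
        node = node.children.get(symbol)
        if node is None:
            return None                     # fell off the trie
    return node.value if node.is_key else None
```

Keys that share a prefix, such as "love" and "lovely", share the path for that prefix, and a prefix that was never inserted as a key ("lov") is reported as absent.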

48 Hash Tables • Hash function h: h(key) = index • Desirable properties: • Approximate random distribution • Easy to calculate • E.g., Division: h(k) = k mod m • Perfect hashing • When all input keys are known in advance

49 Collisions • Separate chaining • Linked list: ordered vs unordered • Open addressing • Linear probing - clustering very bad with high load factor • *Quadratic probing - secondary clustering, table size must be prime • Double hashing - table size must be prime, too complex
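Open addressing with linear probing can be sketched in a few lines. The class below is a hypothetical illustration for integer keys using the division hash h(k) = k mod m; it has no delete and no rehashing, and it refuses to fill the last slot so that every probe sequence ends at an empty slot.

```python
class LinearProbingTable:
    def __init__(self, size=11):
        self.slots = [None] * size   # each slot is None or a (key, value) pair
        self.count = 0

    def _probe(self, key):
        """Index of the slot holding key, or of the first empty slot after h(key)."""
        index = key % len(self.slots)              # division method
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % len(self.slots)  # try the next slot, wrapping
        return index

    def insert(self, key, value):
        index = self._probe(key)
        if self.slots[index] is None:
            if self.count + 1 == len(self.slots):
                raise OverflowError("table full; a real table would rehash first")
            self.count += 1
        self.slots[index] = (key, value)

    def find(self, key):
        slot = self.slots[self._probe(key)]
        return None if slot is None else slot[1]
```

In a table of size 7, the keys 10, 17 and 24 all hash to slot 3 and end up in slots 3, 4 and 5. That run of occupied slots is the clustering the slide warns about.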

50 Hash Tables • Delete? • Rehash when the load factor is high: double the table size (amortized cost is constant) • Find & insert are near constant time! • But: no min, max, next, … operation • Trade space for time: load factors < 75%

Priority Queues

52 Priority Queues • Operations: • Insert • FindMin • DeleteMin • Implementations: • Linked list • Search tree • Heap

53 Possible priority queue implementations • Linked list (sorted / unsorted): deleteMin O(1) / O(N); insert O(N) / O(1) • Search trees: all operations O(log N) • Heaps: deleteMin O(log N); insert O(log N); buildheap O(N) (or N inserts)

54 Heaps • Properties: 1. Complete binary tree in an array 2. Heap order property • Insert: push up • DeleteMin: push down • Heapify: starting at bottom, push down • Heapsort: BuildHeap + DeleteMin

55 Insert - Push up • Insert leaf to establish complete tree property. • Bubble inserted leaf up the tree until the heap order property is satisfied.

56 DeleteMin - Push down • Move last leaf to root to restore complete tree property. • Bubble the transplanted leaf value down the tree until the heap order property is satisfied.

57 Heapify - Push down • Start at bottom subtrees. • Bubble subtree root down until the heap order property is satisfied.
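The push-up and push-down operations from the last few slides can be sketched on a Python list used as a 0-based array, so that the children of index i sit at 2i+1 and 2i+2. The indexing convention and all function names are assumptions of this sketch, not taken from the slides.

```python
def push_up(heap, index):
    """Bubble heap[index] up until its parent is no larger."""
    while index > 0:
        parent = (index - 1) // 2
        if heap[parent] <= heap[index]:
            break
        heap[parent], heap[index] = heap[index], heap[parent]
        index = parent


def push_down(heap, index):
    """Bubble heap[index] down until both children are no smaller."""
    while True:
        smallest = index
        for child in (2 * index + 1, 2 * index + 2):
            if child < len(heap) and heap[child] < heap[smallest]:
                smallest = child
        if smallest == index:
            return
        heap[smallest], heap[index] = heap[index], heap[smallest]
        index = smallest


def heap_insert(heap, value):
    heap.append(value)              # new last leaf keeps the tree complete
    push_up(heap, len(heap) - 1)


def delete_min(heap):
    minimum = heap[0]
    last = heap.pop()               # remove the last leaf
    if heap:
        heap[0] = last              # transplant it to the root
        push_down(heap, 0)
    return minimum


def build_heap(items):
    """Heapify: push down every internal node, bottom subtrees first."""
    heap = list(items)
    for index in range(len(heap) // 2 - 1, -1, -1):
        push_down(heap, index)
    return heap


def heapsort(items):
    heap = build_heap(items)
    return [delete_min(heap) for _ in range(len(heap))]
```

This `heapsort` returns a new sorted list for clarity; the in-place variant usually taught uses a max-heap and stores each deleted element in the slot the heap just vacated.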

Sorting

59 Simple sorting algorithms Several simple, quadratic algorithms (worst case and average). - Bubble Sort - Insertion Sort - Selection Sort Only Insertion Sort is of practical interest: its running time is linear in the number of inversions of the input sequence. Constants small. Stable?
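The link between Insertion Sort and inversions can be made concrete by counting element shifts: each shift removes exactly one inversion. A small sketch, with `insertion_sort` returning the shift count alongside the sorted list purely for illustration:

```python
def insertion_sort(values):
    """Return (sorted copy, number of element shifts performed)."""
    result = list(values)
    shifts = 0
    for i in range(1, len(result)):
        item = result[i]
        j = i - 1
        # Strict '>' leaves equal elements in their original order (stable).
        while j >= 0 and result[j] > item:
            result[j + 1] = result[j]   # shift one larger element right
            j -= 1
            shifts += 1
        result[j + 1] = item
    return result, shifts
```

An already sorted input causes no shifts and takes linear time, while a reversed input of length n causes n(n-1)/2 shifts.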

60 Sorting Review Asymptotically optimal O(n log n) algorithms (worst case and average). - Merge Sort - Heap Sort Merge Sort purely sequential and stable. But requires extra memory: 2n + O(log n).

61 Quick Sort Overall fastest. In place. BUT: Worst case quadratic. Average case O(n log n). Not stable. Implementation details tricky.

62 Radix Sort Used by old computer-card-sorting machines. Linear time: b passes on b-bit elements, or b/m passes with m bits per pass. Each pass must be stable. BUT: Uses 2n + 2^m space. May only beat Quick Sort for very large arrays.
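The b/m passes can be sketched as a least-significant-digit radix sort on non-negative integers. The parameter names `bits` (b) and `bits_per_pass` (m) are chosen for this illustration, and the bucket lists stand in for the counting array a tuned implementation would use.

```python
def radix_sort(values, bits=32, bits_per_pass=8):
    """Sort non-negative integers below 2**bits, bits_per_pass bits at a time."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, bits, bits_per_pass):   # about b/m passes
        buckets = [[] for _ in range(1 << bits_per_pass)]   # 2^m buckets
        for value in values:
            # Appending preserves the order from earlier passes: each pass is stable.
            buckets[(value >> shift) & mask].append(value)
        values = [value for bucket in buckets for value in bucket]
    return values
```

Stability is what makes the scheme work: after the pass on the high-order bits, elements that agree there are still ordered by their lower bits from the earlier passes.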

Data Compression

64 Data Compression • Huffman • Optimal prefix-free codes • Priority queue on “tree” frequency • LZW • Dictionary of codes for previously seen patterns • When a pattern is found, increase its length by one • Trie

65 Huffman • Full: every node • is a leaf, or • has exactly 2 children. • Build tree bottom up: • Use a priority queue of trees, weight = sum of frequencies. • New tree from the two lowest-weight trees. [Diagram: code tree with leaves c, a, b, d.] a=1, b=001, c=000, d=01
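The bottom-up construction can be sketched with Python's `heapq` as the priority queue of trees. The function name `huffman_codes` and the representation of an internal node as a `(left, right)` tuple are choices made for this sketch; symbols are assumed not to be tuples themselves.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Map each symbol to its Huffman code string, given symbol -> frequency."""
    tiebreak = count()   # unique counter so the heap never compares two trees
    heap = [(weight, next(tiebreak), symbol)
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # lowest-weight tree
        w2, _, t2 = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):       # internal node: exactly two children
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # a lone symbol still gets one bit

    if heap:
        assign(heap[0][2], "")
    return codes
```

The slide does not give the frequencies behind its tree. With the made-up frequencies a: 8, d: 4, b: 2, c: 1, this sketch produces the same codes as the slide: a=1, b=001, c=000, d=01.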

66 Summary of LZW LZW is an adaptive, dictionary based compression method. Incrementally builds the dictionary (trie) as it encodes the data. Building the dictionary while decoding is slightly more complicated, but requires no special data structures.