1 Algorithmic Aspects of Searching in the Past Thomas Ottmann Institut für Informatik, Universität Freiburg, Germany (Lecture 13: Persistence and Oblivious Data Structures) Advanced Algorithms & Data Structures
2 Overview Motivation: Oblivious and persistent structures. Examples: Arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: Structure-copying, path-copying, and DSST methods. Application: Point location. Application: Time-evolving data: capture and replay of whiteboard data, in particular handwriting traces. Oblivious structures: Randomized and uniquely represented structures, c-level jump lists.
3 Motivation A structure storing a set of keys is called oblivious if its generation history cannot be inferred from its current shape. A structure is called persistent if it supports access to multiple versions. Partially persistent: All versions can be accessed, but only the newest version can be modified. Fully persistent: All versions can be accessed and modified. Confluently persistent: Two or more old versions can be combined into one new version.
4 Example: Arrays Sorted array: Uniquely represented structure, hence oblivious! Access: In time O(log n) by binary search. Update (insertion, deletion): Θ(n). Caution: The storage structure may still depend on the generation history!
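The uniqueness claim can be checked with a small sketch. It assumes the array is a Python list kept in ascending order without duplicates; the helper names `insert`, `delete`, and `build` are illustrative and not part of the lecture.

```python
import bisect

def insert(arr, key):
    """Insert key into the sorted list arr (duplicates are ignored)."""
    pos = bisect.bisect_left(arr, key)      # binary search: O(log n)
    if pos == len(arr) or arr[pos] != key:
        arr.insert(pos, key)                # shifting the tail: O(n)

def delete(arr, key):
    """Remove key from the sorted list arr if it is present."""
    pos = bisect.bisect_left(arr, key)
    if pos < len(arr) and arr[pos] == key:
        arr.pop(pos)                        # shifting the tail: O(n)

def build(ops):
    """Replay a history of ("ins" | "del", key) operations on an empty array."""
    arr = []
    for op, key in ops:
        if op == "ins":
            insert(arr, key)
        else:
            delete(arr, key)
    return arr

# Two different histories that end with the same key set.
a = build([("ins", 1), ("ins", 3), ("ins", 5), ("ins", 7)])
b = build([("ins", 5), ("ins", 9), ("ins", 1), ("del", 9), ("ins", 7), ("ins", 3)])
print(a == b)   # the final arrays are identical, so the history is not visible
```

The logical content is the same for both histories; where the list lives in memory is exactly the kind of storage detail the caution on the slide refers to.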
5 Example: Natural search trees Only partially oblivious! The insertion history can sometimes be reconstructed. Deleted keys are not visible. Access, insertion, and deletion of keys may take time Ω(n). [Figure: trees for the insertion orders 1, 3, 5, 7 and 5, 1, 3, …]
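A short sketch of why natural search trees leak their history. It assumes an unbalanced binary search tree stored as nested tuples `(left, key, right)`; the function names are illustrative.

```python
def bst_insert(tree, key):
    """Insert key into an unbalanced BST given as (left, key, right) or None."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key < k:
        return (bst_insert(left, key), k, right)
    if key > k:
        return (left, k, bst_insert(right, key))
    return tree                              # key already present

def build_bst(keys):
    """Insert the keys one by one, in the given order."""
    tree = None
    for key in keys:
        tree = bst_insert(tree, key)
    return tree

def height(tree):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if tree is None else 1 + max(height(tree[0]), height(tree[2]))

t1 = build_bst([1, 3, 5, 7])   # sorted input degenerates into a chain
t2 = build_bst([5, 1, 3, 7])   # same keys, different order, different shape
print(t1 == t2, height(t1), height(t2))
```

Since the root is always the first key inserted, the shape reveals part of the insertion order, and the chain produced by sorted input is the case in which access takes linear time.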
6 Example: Balanced search tree Problem: Updates come in sudden bursts (Example: Recording ink-traces from pen input) Not enough time to serialize insertions and rebalancing transformations Solution: Relaxed balancing: Carry out updates and rebalancing transformations concurrently!
7 Stratified search trees [Figure: search tree divided into strata]
8 Example
10 Insertion Insert the new key among the leaves at the expected position and deposit a "push-up-request". [Figure labels: x, p]
11 Iterative sequence of insertions
12 Handling of push-up-requests (1) A push-up-request leads either to a local structural change followed by a halt, which can be carried out in time O(1) (Case 1), or to a recursive shift of the push-up-request to the next higher stratum without any structural change (Case 2). Case 1 [There is still room on the next higher stratum]
13 Handling of push-up-requests (2) Case 2 [Next higher stratum is full] Append a new apex if a node is pushed over the topmost stratum border.
14 Deletion Locate x among the leaves. Deposit a removal request at x. Handle the removal request. [Figure]
15 Handling removal requests Case 1 [Enough nodes at bottommost stratum] Case 2 [Bottommost stratum too sparse] Deposit a "pull-down-request". [Figure labels: p, q]
16 Handling of pull-down-requests (1) Case 1 [There are enough nodes on the next higher stratum] Finite structural change and halt! [Figure: local transformations involving the request node p]
17 Handling of pull-down-requests (2) Case 2 [Not enough nodes on the next higher stratum] Recursively shift the pull-down-request to the next higher stratum, but no structural change! [Figure labels: p, q]
18 Z-stratified search trees: Observations Insertions, deletions, and rebalancing transformations (removal of push-up- and pull-down-requests) can be arbitrarily interleaved. The amortized restructuring costs per insertion or deletion are constant. The generation history of the current version may be partially reconstructed (the sequence of insertions and deletions is partially visible). But: Update operations are always applied to the current version, so Z-stratified search trees are not persistent.
20 Simple methods for making structures persistent Structure-copying method: Copy the structure and apply the update operation to the copy. This yields full persistence at the price of Θ(n) time per update and Θ(m·n) space for m updates applied to structures of size n. Alternatively, do nothing but store a log file of all updates! To access version i, start with the initial structure and carry out the first i updates to generate version i. This costs Θ(i) time per access and Θ(m) space for m operations. Hybrid method: Store the complete sequence of updates and additionally every k-th version for a suitably chosen k. Result: Time and space requirements increase by a factor of at least √m! Are there any better methods? … for search trees …
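The hybrid method can be sketched as follows. This is a minimal illustration, assuming the structure is a plain set of keys and that version i denotes the state after the first i updates; the class name `CheckpointedSet` and its methods are hypothetical.

```python
class CheckpointedSet:
    """Update log plus a full snapshot of every k-th version."""

    def __init__(self, k):
        self.k = k
        self.log = []                        # every update: ("ins" | "del", key)
        self.snapshots = {0: frozenset()}    # version number -> full copy
        self.current = set()

    @staticmethod
    def _apply(state, op, key):
        if op == "ins":
            state.add(key)
        else:
            state.discard(key)

    def update(self, op, key):
        self._apply(self.current, op, key)
        self.log.append((op, key))
        if len(self.log) % self.k == 0:      # structure-copying for every k-th version
            self.snapshots[len(self.log)] = frozenset(self.current)

    def version(self, i):
        """Rebuild version i (0 <= i <= number of updates) from the nearest snapshot."""
        base = (i // self.k) * self.k
        state = set(self.snapshots[base])
        for op, key in self.log[base:i]:     # replay at most k - 1 logged updates
            self._apply(state, op, key)
        return state

s = CheckpointedSet(k=3)
for op, key in [("ins", 1), ("ins", 2), ("ins", 3), ("del", 2),
                ("ins", 4), ("ins", 5), ("del", 1)]:
    s.update(op, key)
print(s.version(4), s.version(7))
```

With m updates there are about m/k snapshots and at most k - 1 replayed updates per access, so choosing k near √m balances the two costs.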
21 Persistent search trees (1) Path-copying method version 0:
22 Persistent search trees (1) Path-copying method version 1: Insert (2)
23 Persistent search trees (1) Path-copying method version 1: Insert (2) version 2: Insert (4)
24 Persistent search trees (1) Path-copying method Restructuring costs: O(log n) per update operation version 1: Insert (2) version 2: Insert (4)
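A minimal sketch of path copying, assuming an unbalanced binary search tree with insertions only; the initial keys 5, 1, 7, 3 are a made-up example, not the tree from the figure. With a balanced tree the copied path has length O(log n), which gives the bound stated on the slide.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root                              # key already present: reuse the version

def keys(root):
    """In-order list of the keys stored in the version rooted at root."""
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)

v0 = None
for k in (5, 1, 7, 3):                       # hypothetical version 0
    v0 = insert(v0, k)

versions = [v0]
versions.append(insert(versions[-1], 2))     # version 1: Insert(2)
versions.append(insert(versions[-1], 4))     # version 2: Insert(4)

print([keys(v) for v in versions])
print(versions[2].right is versions[0].right)   # the untouched subtree is shared
```

Every version keeps its own root, so old versions stay readable, and since any version can serve as the starting point of a further insertion, the method also supports full persistence.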
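25 Persistent search trees (2) DSST method (Driscoll, Sarnak, Sleator, Tarjan): Extend each node by a time-stamped modification box. A node with key k keeps its original pointers lp and rp; a box entry "t: rp" means that all versions before time t use the original pointer and all versions after time t use the pointer stored in the box. Modification boxes are initially empty and are filled bottom-up.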
26 DSST method version 0
27 DSST method version 0:
28 DSST method version 1: Insert (2) version 2: Insert (4)
29 DSST method The amortized costs (time and space) per update operation are O(1). version 1: Insert (2) version 2: Insert (4)
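The modification-box idea can be sketched for partial persistence. This is a simplified illustration, not the full DSST construction: it assumes an unbalanced search tree, insertions only, and a single box per node. The names `Node`, `PersistentBST`, and `_set` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.mod = None                      # modification box: (version, field, value)

    def get(self, field, version):
        """Value of 'left' or 'right' as seen by the given version."""
        if self.mod and self.mod[1] == field and self.mod[0] <= version:
            return self.mod[2]
        return getattr(self, field)

class PersistentBST:
    def __init__(self):
        self.roots = [None]                  # roots[v] is the root of version v

    def _set(self, path, idx, field, value, version):
        """Redirect path[idx].field to value; return the root of the new version."""
        node = path[idx]
        if node.mod is None:                 # box is empty: fill it and halt
            node.mod = (version, field, value)
            return path[0]
        # Box is full: copy the node with its newest pointers and an empty box.
        copy = Node(node.key, node.get("left", version), node.get("right", version))
        setattr(copy, field, value)
        if idx == 0:
            return copy                      # the copy becomes the new root
        parent = path[idx - 1]
        side = "left" if parent.get("left", version) is node else "right"
        return self._set(path, idx - 1, side, copy, version)

    def insert(self, key):
        version = len(self.roots)
        root = self.roots[-1]
        path, node = [], root
        while node is not None:              # search in the newest version
            if key == node.key:
                self.roots.append(root)
                return
            path.append(node)
            node = node.get("left" if key < node.key else "right", version)
        leaf = Node(key)
        if not path:
            self.roots.append(leaf)
            return
        side = "left" if key < path[-1].key else "right"
        self.roots.append(self._set(path, len(path) - 1, side, leaf, version))

    def keys(self, version):
        out = []
        def walk(node):
            if node is not None:
                walk(node.get("left", version))
                out.append(node.key)
                walk(node.get("right", version))
        walk(self.roots[version])
        return out

t = PersistentBST()
for k in (5, 1, 7, 3, 2, 4):                 # hypothetical insertion sequence
    t.insert(k)
print([t.keys(v) for v in range(len(t.roots))])
```

A node is copied only when its box is already full, and the copy starts with an empty box; this is the intuition behind the amortized O(1) bound, which the sketch itself does not prove.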
31 Application: Planar point location Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints. Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine for each query point the polygon containing it. Measure an algorithm by three parameters: 1) The preprocessing time. 2) The space required for the data structure. 3) The time per query.
32 Planar point location -- example
33 Planar point location -- example
34 Solving planar point location Partition the plane into vertical slabs by drawing a vertical line through each endpoint. Within each slab the lines are totally ordered. Allocate one search tree per slab that stores the lines at its leaves; with each line, associate the polygon above it. Allocate another search tree on the x-coordinates of the vertical lines.
35 Solving planar point location (cont.) To answer a query, first find the appropriate slab, then search the slab to find the polygon.
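The slab method can be sketched as follows. The sketch makes several assumptions: segments are non-vertical, given as `((x1, y1), (x2, y2))` with x1 < x2, and do not cross; sorted lists with binary search stand in for the search trees; and the query returns the segment directly below the point, with which the polygon above it would be associated. All function names are illustrative.

```python
import bisect

def y_at(seg, x):
    """y-coordinate of the non-vertical segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """Return the sorted endpoint abscissas and, per slab, its segments bottom to top."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        crossing = [s for s in segments if s[0][0] <= left and right <= s[1][0]]
        crossing.sort(key=lambda s: y_at(s, mid))    # total order within the slab
        slabs.append(crossing)
    return xs, slabs

def segment_below(xs, slabs, qx, qy):
    """Segment directly below (or through) the query point, or None."""
    i = bisect.bisect_right(xs, qx) - 1              # first search: find the slab
    if i < 0 or i >= len(slabs):
        return None
    slab = slabs[i]
    lo, hi = 0, len(slab)
    while lo < hi:                                   # second search: inside the slab
        m = (lo + hi) // 2
        if y_at(slab[m], qx) <= qy:
            lo = m + 1
        else:
            hi = m
    return slab[lo - 1] if lo > 0 else None

bottom = ((0, 0), (4, 0))
top = ((0, 4), (4, 4))
diag = ((1, 1), (3, 3))
xs, slabs = build_slabs([bottom, top, diag])
print(segment_below(xs, slabs, 2, 1), segment_below(xs, slabs, 2, 3))
```

Both searches are binary searches, which gives the O(log n) query time; each slab stores its own complete list, which is where the space problem of this approach comes from.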
36 Planar point location -- example
37 Planar point location -- analysis Query time is O(log n). How about the space? Θ(n²) in the worst case, and so could be the preprocessing time.
38 Planar point location -- bad example Total # lines O(n), and number of lines in each slab is O(n).
39 Planar point location & persistence So how do we improve the space bound? Key observation: The lists of the lines in adjacent slabs are very similar. Create the search tree for the first slab. Then obtain the next one by deleting the lines that end at the corresponding vertex and adding the lines that start at that vertex. How many insertions/deletions are there altogether? 2n
40 Planar point location & persistence (cont.) Updates should be persistent since we need all search trees at the end. Partial persistence is enough. Well, we already have the path-copying method, so let's use it. What do we get? O(n log n) space and O(n log n) preprocessing time. We can improve the space bound to O(n) by using the DSST method.
42 Time-evolving data: Presentation recording [Figure labels: Author, Audience, Data sources, Lightweight content creation, Recorded learning module, Document, Input media, Whiteboard, Touchscreen, Tablet PC]
43 Cintiq Tablet (Wacom) Pen input, large display Eye contact with audience
44 Random access facility Access to an ink-object s_j corresponding to time t_j requires the immediate presentation of s_j and of all ink-objects since t_0.
45 Whiteboard data The whiteboard data stream requires: Fast insertion and deletion of graphical objects (lines, circles, pen traces, …) in large quantities. Partially persistent storage which allows fast access (display and "rendering") of all data for a given time stamp. Synchronisability (as slave) with the audio stream (master). Problem: Find a suitable method for storing the whiteboard action stream!
46 Postprocessing The whiteboard stream is made persistent by the structure-copying method: For each time stamp t, a complete list of all objects visible on the board at time t is (pre-)computed and stored for random access. Disadvantage: Highly redundant, very large data volume. Advantage: Visible scrolling. Storage and representation of freehand ink traces: Find a suitable compromise between conflicting goals: Data volume. Access cost (time) and dynamic replay (visible scrolling). Individual, personal style. Scalability (vector- vs. raster-based representation).
48 Methods for making structures oblivious Unique representation of the structure: Set/size uniqueness: For each set of n keys there is exactly one structure which can store such a set. The storage is order-unique, i.e. the nodes of the structure are ordered and the keys are stored in ascending order in nodes with ascending numbers. Randomise the structure: Ensure that the probability that a given structure occurs for a set M of keys is independent of how M was generated. Observation: The address assignment of pointers must itself be subject to a randomised regime!
49 Example of a randomised structure Z-stratified search tree: On each stratum, randomly choose the distribution of trees from Z. Insertion? Deletion? [Figure]
50 Uniquely represented structures (a) Generation history determines structure. (b) Set-uniqueness: Set determines structure. [Figure: structures for the insertion orders 1, 3, 5, 7 and 5, 1, 3, 7]
51 Uniquely represented structures (c) Size-uniqueness: Size determines structure. [Figure: the key sets 1, 3, 5, 7 and 2, 4, 5, 8 stored in a common structure] Order-uniqueness: Fixed ordering of nodes determines where the keys are to be stored.
52 Set- and order-unique structures Lower bounds? Assumptions: A dictionary of size n is represented by a graph of n nodes. The node degree is finite (fixed). The order of the nodes is fixed; the i-th node stores the i-th largest key. Operations allowed to change a graph: creation or removal of a node, pointer change, exchange of keys. Theorem: For each set- and order-unique representation of a dictionary with n keys, at least one of the operations access, insertion, or deletion must require time Ω(n^(1/3)).
53 Uniquely represented dictionaries Problem: Find set-unique or size-unique representations of the ADT "dictionary". Known solutions: (1) Set-unique, order-unique: Aragon/Seidel, FOCS 1989: Randomized search trees. Each key s is stored as the pair (s, h(s)), where the priority h(s) is given by a universal hash function. Update as for priority search trees! Search, insert, and delete can be carried out in O(log n) expected time.
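A minimal sketch of why hashed priorities give a set-unique tree. It assumes SHA-256 of the key as a deterministic stand-in for the hash function h (not a universal hash family) and stores the tree as nested tuples `(left, key, right)`; all names are illustrative.

```python
import hashlib

def priority(key):
    """Deterministic stand-in for h(key); an assumption of this sketch."""
    return hashlib.sha256(repr(key).encode()).hexdigest()

def insert(tree, key):
    """Search-tree order on keys, max-heap order on priority(key)."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key == k:
        return tree
    if key < k:
        left = insert(left, key)
        if priority(left[1]) > priority(k):          # rotate right
            ll, lk, lr = left
            return (ll, lk, (lr, k, right))
        return (left, k, right)
    right = insert(right, key)
    if priority(right[1]) > priority(k):             # rotate left
        rl, rk, rr = right
        return ((left, k, rl), rk, rr)
    return (left, k, right)

def merge(a, b):
    """Join two trees where every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if priority(a[1]) > priority(b[1]):
        return (a[0], a[1], merge(a[2], b))
    return (merge(a, b[0]), b[1], b[2])

def delete(tree, key):
    if tree is None:
        return None
    left, k, right = tree
    if key < k:
        return (delete(left, key), k, right)
    if key > k:
        return (left, k, delete(right, key))
    return merge(left, right)

def build(ops):
    tree = None
    for op, key in ops:
        tree = insert(tree, key) if op == "ins" else delete(tree, key)
    return tree

t1 = build([("ins", k) for k in (1, 3, 5, 7)])
t2 = build([("ins", 5), ("ins", 9), ("ins", 1), ("del", 9), ("ins", 3), ("ins", 7)])
print(t1 == t2)
```

With distinct priorities the root must be the key of maximum priority, and the same argument applies recursively to both subtrees, so the shape depends only on the key set and not on the order of the updates.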
54 The Jelly Fish (2) L. Snyder, 1976, set-unique, order-unique. Upper bound: Jelly Fish; search, insert, and delete in time O(√n). Body: √n nodes; √n tentacles of length √n each.
55 Lower bound for tree-based structures (set-unique, order-unique) Lower bound: For "tree-based" structures the following holds: Update-time · Search-time = Ω(n). For a tree with n nodes, L leaves, and height h: n ≤ h·L + 1, hence L ≥ (n − 1)/h. Deleting x_1 and inserting x_(n+1) > x_n forces at least L − 1 keys to move from leaves to internal nodes. Therefore, an update requires time Ω(L).
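The steps of this argument can be chained as follows. This is a sketch under the assumption that h denotes the height of the tree, L its number of leaves, and that a search may have to descend to the deepest node, so that the search time is Ω(h); the symbols T_update and T_search are introduced here for illustration.

```latex
\begin{align*}
n \le h \cdot L + 1 \;&\Longrightarrow\; L \ge \frac{n-1}{h},\\
T_{\text{update}} = \Omega(L) &= \Omega\!\left(\frac{n-1}{h}\right),
\qquad T_{\text{search}} = \Omega(h),\\
T_{\text{update}} \cdot T_{\text{search}} &= \Omega\!\left(\frac{n-1}{h} \cdot h\right) = \Omega(n).
\end{align*}
```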
56 Cons-structures (3) Sundar/Tarjan, STOC 1990. Upper bound: (Nearly) full binary search trees. Only operation allowed for updates: Cons(L, x, R), which builds a tree with root x, left subtree L, and right subtree R. Search time O(log n). Insertion and deletion are possible in time O(√n).
57 Jump-lists (half-dynamic) 2-level jump-list of size n. Search: O(i) = O(√n) time. Insertion, deletion: O(√n) time. [Figure: list with jump nodes 0, i, 2i, …, ⌊(n-1)/i⌋·i, the last node n, and a tail pointer]
58 Jump-lists: Dynamization 2-level jump-list of size n. Search: O(i) = O(√n) time. Insert, delete: O(√n) time. Can be made fully dynamic. [Figure: number line marking (i-1)², i², n, (i+1)², (i+2)²]
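The search in a 2-level jump-list can be sketched on top of a sorted Python list. This is an illustration under assumptions: the level-1 pointers are modelled by index arithmetic instead of stored pointers, the jump length is taken as i = ⌊√n⌋, and the function name `jump_search` is hypothetical.

```python
import math

def jump_search(keys, x):
    """Return (position of x or None, number of pointers followed)."""
    n = len(keys)
    if n == 0:
        return None, 0
    i = max(1, math.isqrt(n))                     # jump length, about sqrt(n)
    pos, steps = 0, 0
    while pos + i < n and keys[pos + i] <= x:     # follow level-1 pointers
        pos += i
        steps += 1
    while pos < n and keys[pos] < x:              # follow level-0 pointers
        pos += 1
        steps += 1
    found = pos if pos < n and keys[pos] == x else None
    return found, steps

keys = list(range(0, 200, 2))                     # 100 keys, so i = 10
print(jump_search(keys, 198), jump_search(keys, 37))
```

At most about n/i pointers are followed on level 1 and at most i on level 0, so the search takes O(√n) steps when i is close to √n.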
59 3-level jump-lists Search(x): Locate x by following level-2 pointers, identifying i² keys among which x may occur; then level-1 pointers, identifying i keys among which x may occur; then level-0 pointers, identifying x. Time: O(i) = O(n^(1/3)). [Figure: nodes 0, i, 2i, …, i², i²+i, …, 2·i², with level-2 pointers on top]
60 3-level jump-lists An update requires: changing 2 pointers on level 0, changing i pointers on level 1, and changing all i pointers on level 2. Update time: O(i) = O(n^(1/3)). [Figure: nodes 0, i, 2i, …, i², i²+i, …, 2·i², with level-2 pointers on top]
61 c-level jump-lists Let … Lower levels: level 0: all pointers of length 1: … level j: all pointers of length i^(j-1): … level c/2: … Upper levels: level j: connect in a list all nodes 1, 1·i^(j-1)+1, 2·i^(j-1)+1, 3·i^(j-1)+1, … level c: …
62 c-level jump-lists Theorem: For each c ≥ 3, the c-level jump-list is a size- and order-unique representation of dictionaries with the following characteristics: Space requirement O(c·n). Access time O(c·n^(1/c)). Update time: …, if n is even; …, if n is odd.
63 Shared search trees: Reduction of search time 1 top-level tree with n leaves. All low-level trees for each sequence of n consecutive keys. The top-level tree directs the search to the root of the currently active low-level tree. Semi-dynamic structure: low-level tree size s+1 = top-level tree size.
64 Shared search trees Pointers at: Level 0: from p to p − 2^0 and p + 2^0. Level 1: from p to p − 2^1 and p + 2^1. … Level k-2: from p to p − 2^(k-2) and p + 2^(k-2). 2(k-1) pointers per node p, k = O(log n). Search time O(log n). Space O(n log n).
65 Insertion: Determine the insertion position; change all pointers jumping over the insertion position; add 2 new pointers per level; completely rebuild the top-level tree.
66 Number of pointer changes: level 0: 2·2^0; level 1: 2·2^1; …; level k-2: 2·2^(k-2). Total: 2·(2^(k-1) − 1) + 2(k-1) = …
67 Shared search trees: Summary Theorem: Shared search trees are a size- and order-unique representation of dictionaries with the following characteristics: Space requirement: O(n log n). Search time: O(log n). Update time: O(√n). Open problem: Is there a size- and order-unique representation of dictionaries by graphs with bounded node degree, search time O(log n), and update time o(n) (e.g. O(√n))?