1 Algorithmic Aspects of Searching in the Past Thomas Ottmann Institut für Informatik, Universität Freiburg, Germany

Presentation transcript:

1 Algorithmic Aspects of Searching in the Past Thomas Ottmann Institut für Informatik, Universität Freiburg, Germany (Lecture 13: Persistence and Oblivious Data Structures) Advanced Algorithms & Data Structures

2 Overview. Motivation: oblivious and persistent structures. Examples: arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: structure-copying, path-copying, and DSST methods. Application: point location. Application: time-evolving data (capture and replay of whiteboard data, in particular handwriting traces). Oblivious structures: randomized and uniquely represented structures, c-level jump lists.

3 Motivation A structure storing a set of keys is called oblivious, if it is not possible to infer its generation history from its current shape. A structure is called persistent, if it supports access to multiple versions. Partially persistent: All versions can be accessed but only the newest version can be modified. Fully persistent: All versions can be accessed and modified. Confluently persistent: Two or more old versions can be combined into one new version.

4 Example: Arrays. A sorted array is a uniquely represented structure and hence oblivious! Access: in time O(log n) by binary search. Update (insertion, deletion): Θ(n). Caution: the storage structure may still depend on the generation history!
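A minimal sketch of the array example, assuming a Python list stands in for the array; the class name `SortedArraySet` and the helper `build` are illustrative and not from the slides. It shows that the key layout depends only on the current set, not on the order of the updates.

```python
import bisect

class SortedArraySet:
    """Set of keys kept in one sorted list (a stand-in for the array)."""
    def __init__(self):
        self.keys = []

    def contains(self, key):
        # Binary search: O(log n) comparisons.
        pos = bisect.bisect_left(self.keys, key)
        return pos < len(self.keys) and self.keys[pos] == key

    def insert(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            self.keys.insert(pos, key)   # may shift up to n elements

    def delete(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]           # may shift up to n elements

def build(order):
    s = SortedArraySet()
    for k in order:
        s.insert(k)
    return s

a = build([1, 3, 5, 7])
b = build([5, 1, 3, 7])
b.insert(9)
b.delete(9)                              # a deleted key leaves no trace
same_layout = a.keys == b.keys
```

Both histories end in the same list, so nothing about the insertion order or the deleted key 9 can be read off the key sequence. What the sketch does not model is the caution on the slide: the memory behind the list may still reflect earlier states.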

5 Example: Natural search trees. Only partially oblivious! The insertion history can sometimes be reconstructed; deleted keys are not visible. Access, insertion, and deletion of keys may take time Θ(n). (Figure: trees for the insertion orders 1, 3, 5, 7 and 5, 1, 3, …)
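The following sketch illustrates why a natural (unbalanced) search tree leaks part of its history. Trees are modelled as nested tuples `(left, key, right)`; this representation and the function names are assumptions made for illustration.

```python
def bst_insert(tree, key):
    """Insert key into an unbalanced BST given as nested tuples (left, key, right)."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key < k:
        return (bst_insert(left, key), k, right)
    if key > k:
        return (left, k, bst_insert(right, key))
    return tree

def build_bst(order):
    tree = None
    for key in order:
        tree = bst_insert(tree, key)
    return tree

def height(tree):
    return 0 if tree is None else 1 + max(height(tree[0]), height(tree[2]))

chain = build_bst([1, 3, 5, 7])   # sorted input degenerates into a path
bushy = build_bst([5, 1, 3, 7])   # same key set, different shape
```

The two trees store the same set but differ in shape, and the root of `bushy` reveals that 5 was inserted first. The path-shaped `chain` is also the case in which a single access costs time linear in n.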

6 Example: Balanced search tree Problem: Updates come in sudden bursts (Example: Recording ink-traces from pen input) Not enough time to serialize insertions and rebalancing transformations Solution: Relaxed balancing: Carry out updates and rebalancing transformations concurrently!

7 Stratified search trees

8 Example

9

10 Insertion. Insert the new key among the leaves at the expected position and deposit a "push-up-request". (Figure labels: x, p.)

11 Iterative sequence of insertions

12 Handling of push-up-requests (1). A push-up-request leads either to a local structural change followed by a halt, which can be carried out in time O(1) (Case 1), or to a recursive shift of the push-up-request to the next higher stratum without any structural change (Case 2). Case 1: there is still room on the next higher stratum.

13 Handling of push-up-requests (2). Case 2: the next higher stratum is full. Append a new apex if a node is pushed over the topmost stratum border.

14 Deletion. Locate x among the leaves. Deposit a removal request at x. Handle the removal request.

15 Handling removal requests. Case 1: enough nodes at the bottommost stratum. Case 2: the bottommost stratum is too sparse; deposit a "pull-down-request". (Figure labels: p, q.)

16 Handling of pull-down-requests (1). Case 1: there are enough nodes on the next higher stratum. Finite structural change and halt!

17 Handling of pull-down-requests (2). Case 2: not enough nodes on the next higher stratum. Recursively shift the pull-down-request to the next higher stratum, but without any structural change!

18 Z-stratified search trees: Observations. Insertions, deletions, and rebalancing transformations (removal of requests) can be arbitrarily interleaved. The amortized restructuring costs per insertion or deletion are constant. The generation history of a current version may be partially reconstructed (the sequence of insertions and deletions is partially visible). But: update operations are always applied to the current version, so Z-stratified search trees are not persistent.

19 Overview. Motivation: oblivious and persistent structures. Examples: arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: structure-copying, path-copying, and DSST methods. Application: point location. Application: time-evolving data (capture and replay of whiteboard data, in particular handwriting traces). Oblivious structures: randomized and uniquely represented structures, c-level jump lists.

20 Simple methods for making structures persistent. Structure-copying method: copy the structure and apply the update operation to the copy. This yields full persistence at the price of Θ(n) time per update and Θ(m·n) space for m updates applied to structures of size n. Log file: do nothing, but store a log file of all updates! In order to access version i, first carry out i updates, starting with the initial structure, and so generate version i. This costs Θ(i) time per access and Θ(m) space for m operations. Hybrid method: store the complete sequence of updates and additionally each k-th version for a suitably chosen k. Result: time and space requirements increase by at least a factor of √m! Are there any better methods, at least for search trees?
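The three simple methods can be sketched in one small class, under the assumption that the structure is a plain set of keys; `LoggedSet`, `apply_update`, and the parameter `k` are hypothetical names. With `k = 1` every version is copied (structure-copying); with `k` larger than the number of updates only the log is kept.

```python
def apply_update(state, op, key):
    """Apply one logged update to a mutable set."""
    if op == "insert":
        state.add(key)
    else:
        state.discard(key)

class LoggedSet:
    """Versioned set: full update log plus a snapshot of every k-th version."""
    def __init__(self, k):
        self.k = k
        self.log = []                      # log[j] turns version j into j + 1
        self.snapshots = {0: frozenset()}  # version number -> full copy
        self.current = set()

    def update(self, op, key):
        apply_update(self.current, op, key)
        self.log.append((op, key))
        version = len(self.log)
        if version % self.k == 0:
            self.snapshots[version] = frozenset(self.current)
        return version

    def version(self, i):
        # Start from the nearest snapshot at or before i, then replay the log.
        base = (i // self.k) * self.k
        state = set(self.snapshots[base])
        for op, key in self.log[base:i]:
            apply_update(state, op, key)
        return state

s = LoggedSet(k=3)
for key in [1, 2, 3, 4, 5]:
    s.update("insert", key)
s.update("delete", 2)
s.update("insert", 6)
```

Accessing a version replays fewer than `k` log entries after loading a snapshot, while about m/k snapshots are stored; choosing `k` trades one cost against the other.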

21 Persistent search trees (1) Path-copying method version 0:

22 Persistent search trees (1) Path-copying method version 1: Insert (2)

23 Persistent search trees (1) Path-copying method version 1: Insert (2) version 2: Insert (4)

24 Persistent search trees (1) Path-copying method Restructuring costs: O(log n) per update operation version 1: Insert (2) version 2: Insert (4)
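A minimal sketch of path copying, assuming an unbalanced binary search tree and the hypothetical names `PNode`, `pc_insert`, and `pc_keys`; the keys of version 0 are chosen for illustration, since the figure is not part of the transcript. Only the nodes on the search path are copied; all other subtrees are shared between versions.

```python
class PNode:
    """Immutable-by-convention tree node; old versions are never modified."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def pc_insert(root, key):
    """Return the root of a new version; nodes off the search path are shared."""
    if root is None:
        return PNode(key)
    if key < root.key:
        return PNode(root.key, pc_insert(root.left, key), root.right)
    if key > root.key:
        return PNode(root.key, root.left, pc_insert(root.right, key))
    return root

def pc_keys(root):
    """In-order key list of one version."""
    if root is None:
        return []
    return pc_keys(root.left) + [root.key] + pc_keys(root.right)

v0 = None
for k in [3, 1, 5]:
    v0 = pc_insert(v0, k)
v1 = pc_insert(v0, 2)   # version 1: Insert (2)
v2 = pc_insert(v1, 4)   # version 2: Insert (4)
```

Each version is identified by its own root, so all versions stay accessible. The O(log n) bound on the slide additionally requires a balanced tree, which this sketch omits.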

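25 Persistent search trees (2). DSST method: extend each node by a time-stamped modification box. A box stamped with time t separates all versions before time t from all versions after time t. Modification boxes are initially empty and are filled bottom-up. (Figure: node with key k, pointers lp and rp, and a modification box "t: rp".)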

26 DSST method version 0

27 DSST method version 0:

28 DSST method version 1: Insert (2) version 2: Insert (4)

29 DSST method The amortized costs (time and space) per update operation are O(1). version 1: Insert (2) version 2: Insert (4)
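The modification-box idea can be sketched as follows. This is a simplified, partially persistent, unbalanced tree with insertions only; the names `BoxNode` and `BoxTree` are hypothetical, and rebalancing as well as deletions are left out. A pointer change is written into the parent's box if it is empty; otherwise the parent is copied with its newest pointers and the change moves one level up.

```python
class BoxNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.ptr = {"left": left, "right": right}  # original pointers
        self.mod = None                            # (version, field, value)

    def get(self, field, version):
        """Pointer `field` as seen in the given version."""
        if self.mod is not None:
            mod_version, mod_field, value = self.mod
            if mod_field == field and mod_version <= version:
                return value
        return self.ptr[field]

class BoxTree:
    def __init__(self):
        self.roots = [None]                        # roots[v] is the root of version v

    def contains(self, key, version):
        node = self.roots[version]
        while node is not None and node.key != key:
            node = node.get("left" if key < node.key else "right", version)
        return node is not None

    def insert(self, key):
        """Insert into the newest version; returns the new version number."""
        new_version = len(self.roots)
        newest = new_version - 1
        root, node, path = self.roots[-1], self.roots[-1], []
        while node is not None:
            if key == node.key:                    # key present: version unchanged
                self.roots.append(root)
                return new_version
            field = "left" if key < node.key else "right"
            path.append((node, field))
            node = node.get(field, newest)
        child = BoxNode(key)
        while path:
            parent, field = path.pop()
            if parent.mod is None:                 # box empty: fill it and halt
                parent.mod = (new_version, field, child)
                self.roots.append(root)
                return new_version
            # Box full: copy the parent with its newest pointers, continue upward.
            copy = BoxNode(parent.key, parent.get("left", newest),
                           parent.get("right", newest))
            copy.ptr[field] = child
            child = copy
        self.roots.append(child)                   # the root itself was copied
        return new_version

t = BoxTree()
for k in [5, 1, 3, 7, 2, 4]:
    t.insert(k)
```

A copied node starts with an empty box, which is what makes the amortized argument work: a copy is paid for by the earlier update that filled the box.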

30 Overview. Motivation: oblivious and persistent structures. Examples: arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: structure-copying, path-copying, and DSST methods. Application: point location. Application: time-evolving data (capture and replay of whiteboard data, in particular handwriting traces). Oblivious structures: randomized and uniquely represented structures, c-level jump lists.

31 Application: Planar point location. Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints. Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine for each query point the polygon containing it. Measure an algorithm by three parameters: 1) the preprocessing time, 2) the space required for the data structure, 3) the time per query.

32 Planar point location -- example

33 Planar point location -- example

34 Solving planar point location (cont.). Partition the plane into vertical slabs by drawing a vertical line through each endpoint. Within each slab the lines are totally ordered. Allocate a search tree per slab containing the lines at the leaves, and with each line associate the polygon above it. Allocate another search tree on the x-coordinates of the vertical lines.

35 Solving planar point location (cont.). To answer a query, first find the appropriate slab, then search the slab to find the polygon.
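The slab method can be sketched as follows, with sorted lists and binary search standing in for the search trees. The function names, the segment format `((x1, y1), (x2, y2))` with the left endpoint first, and the return value (slab index plus the number of segments below the query) are assumptions; mapping that pair to a polygon label is left out.

```python
import bisect

def y_at(seg, x):
    """Height of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """One sorted list of spanning segments per slab (one slab per x-interval)."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for lo, hi in zip(xs, xs[1:]):
        mid = (lo + hi) / 2
        spanning = [s for s in segments if s[0][0] <= lo and s[1][0] >= hi]
        spanning.sort(key=lambda s: y_at(s, mid))   # total order inside the slab
        slabs.append(spanning)
    return xs, slabs

def locate(xs, slabs, q):
    """Return (slab index, number of segments below q), or None outside all slabs."""
    qx, qy = q
    i = bisect.bisect_right(xs, qx) - 1             # first search: find the slab
    if i < 0 or i >= len(slabs):
        return None
    slab = slabs[i]
    lo, hi = 0, len(slab)
    while lo < hi:                                  # second search: inside the slab
        mid = (lo + hi) // 2
        if y_at(slab[mid], qx) < qy:
            lo = mid + 1
        else:
            hi = mid
    return i, lo

# A triangle with corners (0, 0), (4, 0), (2, 2).
triangle = [((0, 0), (4, 0)), ((0, 0), (2, 2)), ((2, 2), (4, 0))]
xs, slabs = build_slabs(triangle)
inside = locate(xs, slabs, (1, 0.5))
above = locate(xs, slabs, (3, 2))
outside = locate(xs, slabs, (5, 0))
```

Both searches are binary searches, which gives the logarithmic query time. Every slab stores its own full list of spanning segments, and that is where the space is spent.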

36 Planar point location -- example

37 Planar point location -- analysis. Query time is O(log n). How about the space? Θ(n²) in the worst case, and so could be the preprocessing time.

38 Planar point location -- bad example Total # lines O(n), and number of lines in each slab is O(n).

39 Planar point location & persistence. So how do we improve the space bound? Key observation: the lists of the lines in adjacent slabs are very similar. Create the search tree for the first slab. Then obtain the next one by deleting the lines that end at the corresponding vertex and adding the lines that start at that vertex. How many insertions/deletions are there altogether? 2n.
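A small worked count makes the gap concrete. The family of nested horizontal segments below is one possible bad input, chosen for illustration and not necessarily the one in the figure: segment i spans the x-range [i, 2n-1-i]. The function name is hypothetical.

```python
def slab_entries_and_updates(n):
    """Compare per-slab storage with the number of sweep updates.

    Segment i is horizontal at height i and spans x in [i, 2n-1-i],
    so the segments are nested and never intersect.
    """
    segments = [(i, 2 * n - 1 - i) for i in range(n)]
    xs = sorted(x for seg in segments for x in seg)
    entries = 0
    for lo, hi in zip(xs, xs[1:]):
        # Number of segments that must be stored in the slab between lo and hi.
        entries += sum(1 for left, right in segments if left <= lo and right >= hi)
    updates = 2 * len(segments)   # one insertion and one deletion per segment
    return entries, updates

entries, updates = slab_entries_and_updates(10)
```

For this family the slabs together hold n² entries, whereas sweeping from left to right performs only 2n insertions and deletions, which is what a persistent search tree has to record.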

40 Planar point location & persistence (cont.). Updates should be persistent since we need all search trees at the end. Partial persistence is enough. Well, we already have the path-copying method, so let's use it. What do we get? O(n log n) space and O(n log n) preprocessing time. We can improve the space bound to O(n) by using the DSST method.

41 Overview. Motivation: oblivious and persistent structures. Examples: arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: structure-copying, path-copying, and DSST methods. Application: point location. Application: time-evolving data (capture and replay of whiteboard data, in particular handwriting traces). Oblivious structures: randomized and uniquely represented structures, c-level jump lists.

42 Time-evolving data: presentation recording. (Diagram labels: author, audience, data sources, lightweight content creation, recorded learning module, document. Input media: whiteboard, touchscreen, tablet PC.)

43 Cintiq Tablet (Wacom) Pen input, large display Eye contact with audience

44 Random access facility. Access to an ink-object s_j corresponding to time t_j requires the immediate presentation of s_j and of all ink-objects since t_0.

45 Whiteboard data. The whiteboard data stream requires: fast insertion and deletion of graphical objects (lines, circles, pen traces, …) in large quantities; partially persistent storage that allows fast access (display and "rendering") to all data for a given time stamp; synchronisability (as slave) with the audio stream (master). Problem: find a suitable method for storing the whiteboard action stream!
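To make the requirement concrete, here is a sketch of the simplest conceivable store, a time-stamped action log that is replayed on access. It is not the method used in the lecture; the class name `InkLog`, the actions "draw" and "erase", and the object identifiers are assumptions.

```python
import bisect

class InkLog:
    """Append-only log of (time, action, object id); times must be non-decreasing."""
    def __init__(self):
        self.times = []
        self.events = []

    def record(self, t, action, obj):
        assert not self.times or t >= self.times[-1]
        self.times.append(t)
        self.events.append((action, obj))

    def visible_at(self, t):
        """Objects on the board at time t, in drawing order."""
        shown = {}                                   # dicts keep insertion order
        upto = bisect.bisect_right(self.times, t)    # all events with time <= t
        for action, obj in self.events[:upto]:
            if action == "draw":
                shown[obj] = True
            else:
                shown.pop(obj, None)
        return list(shown)

board = InkLog()
board.record(0, "draw", "line1")
board.record(1, "draw", "circle1")
board.record(2, "erase", "line1")
board.record(3, "draw", "trace1")
```

Replay cost grows with the number of events since t_0, which is too slow for immediate random access in a long recording and motivates the search for a better persistent store.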

46 Postprocessing. The whiteboard stream is made persistent by the structure-copying method: for each time stamp t, a complete list of all objects visible on the board at time t is (pre-)computed and stored for random access. Disadvantage: highly redundant, very large data volume. Advantage: visible scrolling. Storage and representation of freehand ink traces: find a suitable compromise between conflicting goals: data volume; access cost (time) and dynamic replay (visible scrolling); individual, personal style; scalability (vector- vs. raster-based representation).

47 Overview. Motivation: oblivious and persistent structures. Examples: arrays, search trees, Z-stratified search trees, relaxation. Making structures persistent: structure-copying, path-copying, and DSST methods. Application: point location. Application: time-evolving data (capture and replay of whiteboard data, in particular handwriting traces). Oblivious structures: randomized and uniquely represented structures, c-level jump lists.

48 Methods for making structures oblivious. Unique representation of the structure. Set/size uniqueness: for each set of n keys there is exactly one structure which can store such a set. The storage is order-unique, i.e. the nodes of the structure are ordered and the keys are stored in ascending order in nodes with ascending numbers. Randomise the structure: ensure that the probability with which a structure occurs as the representation of a set M of keys is independent of the way M was generated. Observation: the assignment of addresses to pointers must also be subject to a randomised regime!

49 Example of a randomised structure: Z-stratified search tree. On each stratum, randomly choose the distribution of trees from Z. Insertion? Deletion?

50 Uniquely represented structures. (a) Generation history determines structure. (b) Set-uniqueness: the set determines the structure. (Figure labels: 1, 3, 5, 7; 5, 1, 3, 7; 1, 3, 5, …)

51 Uniquely represented structures. (c) Size-uniqueness: the size determines the structure. (Figure: the key sets 1, 3, 5, 7 and 2, 4, 5, 8 share a common structure.) Order-uniqueness: a fixed ordering of the nodes determines where the keys are to be stored.

52 Set- and order-unique structures: lower bounds? Assumptions: a dictionary of size n is represented by a graph of n nodes; the node degree is finite (fixed); the order of the nodes is fixed, and the i-th node stores the i-th largest key. Operations allowed to change a graph: creation or removal of a node, pointer change, exchange of keys. Theorem: for each set- and order-unique representation of a dictionary with n keys, at least one of the operations access, insertion, or deletion must require time Ω(n^{1/3}).

53 Uniquely represented dictionaries. Problem: find set-unique or size-unique representations of the ADT "dictionary". Known solutions: (1) Set-unique, order-unique: Aragon/Seidel, FOCS 1989, Randomized Search Trees. Each key s ∈ X is stored as a pair (s, h(s)), where the priority h(s) is given by a universal hash function. Update as for priority search trees! Search, insert, and delete can be carried out in O(log n) expected time.
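A sketch of the idea behind hashed priorities, assuming SHA-256 as a deterministic stand-in for the universal hash function h and nested tuples `(left, key, right)` as the tree representation; the function names are illustrative. Because the priority is a function of the key alone, the heap-ordered search tree is determined by the key set.

```python
import hashlib

def priority(key):
    """Deterministic stand-in for h(s); distinct keys get distinct priorities in practice."""
    return hashlib.sha256(repr(key).encode()).hexdigest()

def treap_insert(tree, key):
    """Insert into a treap: search-tree order on keys, max-heap order on priority."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key == k:
        return tree
    if key < k:
        new_left = treap_insert(left, key)
        if priority(new_left[1]) > priority(k):      # rotate right
            return (new_left[0], new_left[1], (new_left[2], k, right))
        return (new_left, k, right)
    new_right = treap_insert(right, key)
    if priority(new_right[1]) > priority(k):         # rotate left
        return ((left, k, new_right[0]), new_right[1], new_right[2])
    return (left, k, new_right)

def build_treap(order):
    tree = None
    for key in order:
        tree = treap_insert(tree, key)
    return tree

def inorder(tree):
    return [] if tree is None else inorder(tree[0]) + [tree[1]] + inorder(tree[2])

t1 = build_treap([1, 3, 5, 7])
t2 = build_treap([5, 1, 3, 7])
t3 = build_treap([7, 5, 3, 1])
```

All three insertion orders produce the same tree, in contrast to the natural search tree, whose shape depends on the order of insertion.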

54 The Jelly Fish. (2) L. Snyder, 1976: set-unique, order-unique. Upper bound: Jelly Fish; search, insert, and delete in time O(√n). Body: √n nodes; √n tentacles of length √n each.

55 Lower bound for tree-based structures (set-unique, order-unique). Lower bound: for "tree-based" structures the following holds: update time · search time = Ω(n). Number of nodes: n ≤ h·L + 1, hence L ≥ (n - 1)/h. Delete x_1 and insert x_{n+1} > x_n: at least L - 1 keys must have moved from leaves to internal nodes. Therefore, an update requires time Ω(L). (Figure labels: h, L leaves, x_1, …, x_n.)

56 Cons-structures. (3) Sundar/Tarjan, STOC 1990. Upper bound: (nearly) full binary search trees. The only operation permitted for updates is Cons, which combines a left subtree L, a key x, and a right subtree R into one tree. Search time O(log n); insertion and deletion are possible in time O(√n).

57 Jump-lists. (Half-dynamic) 2-level jump-list of size n. Search: O(i) = O(√n) time. Insertion, deletion: O(√n) time. (Figure: nodes 0, i, 2i, …, ⌊(n-1)/i⌋·i, n, tail.)

58 Jump-lists: Dynamization. 2-level jump-list of size n. Search: O(i) = O(√n) time. Insert, delete: O(√n) time. Can be made fully dynamic. (Figure: list sizes (i-1)², i², n, (i+1)², (i+2)².)

59 3-level jump-lists. Search(x): locate x by following level-2 pointers, identifying i² keys among which x may occur; then level-1 pointers, identifying i keys among which x may occur; then level-0 pointers, identifying x. Time: O(i) = O(n^{1/3}). (Figure: nodes 0, i, 2i, …, i², i²+i, …, 2·i².)

60 3-level jump-lists. An update requires changing 2 pointers on level 0, changing i pointers on level 1, and changing all i pointers on level 2. Update time: O(i) = O(n^{1/3}). (Figure: nodes 0, i, 2i, …, i², i²+i, …, 2·i².)
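The search procedure can be sketched over a sorted Python list whose positions play the role of the list nodes. The model is an assumption: level-j pointers have length i^j and start at positions that are multiples of i^j, with i the smallest integer such that i^levels ≥ n. The function name and the step counter are illustrative.

```python
def jump_search(keys, x, levels=3):
    """Search sorted `keys` with pointers of length i**j on level j.

    Returns (found, steps), where steps counts the pointers followed.
    """
    n = len(keys)
    i = 1
    while i ** levels < n:          # smallest integer i with i**levels >= n
        i += 1
    pos, steps = 0, 0
    for level in range(levels - 1, -1, -1):
        stride = i ** level
        # Follow pointers on this level while they do not overshoot x.
        while pos + stride < n and keys[pos + stride] <= x:
            pos += stride
            steps += 1
    found = n > 0 and keys[pos] == x
    return found, steps

keys = list(range(0, 2000, 2))      # n = 1000, so i = 10 for three levels
hit = jump_search(keys, 1998)
miss = jump_search(keys, 7)
two_level = jump_search(keys, 1998, levels=2)
```

At most i pointers are followed on each level, so a search follows at most levels · i pointers. With `levels=2` the same routine behaves like the 2-level jump-list with i close to √n.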

61 c-level jump-lists. Let … Lower levels: level 0: all pointers of length 1; …; level j: all pointers of length i^{j-1}; …; level c/2: … Upper levels: level j: connect in a list all nodes 1, 1·i^{j-1}+1, 2·i^{j-1}+1, 3·i^{j-1}+1, …; level c: …

62 c-level jump-lists. Theorem: for each c ≥ 3, the c-level jump-list is a size- and order-unique representation of dictionaries with the following characteristics: space requirement O(c·n); access time O(c·n^{1/c}); update time … if n is even, … if n is odd.

63 Shared search trees: reduction of search time. One top-level tree with √n leaves. All low-level trees, one for each sequence of √n consecutive keys. The top-level tree directs the search to the root of the currently active low-level tree. Semi-dynamic structure: low-level tree size s+1 = top-level tree size.

64 Shared search trees. Pointers at level 0: (p - 2^0) ← p → (p + 2^0); level 1: (p - 2^1) ← p → (p + 2^1); …; level k-2: (p - 2^{k-2}) ← p → (p + 2^{k-2}). 2(k-1) pointers per node p, k = O(log n). Search time O(log n). Space O(n log n).

65 Insertion: determine the insertion position; change all pointers jumping over the insertion position; add 2 new pointers per level; completely rebuild the top-level tree.

66 Number of pointerchanges:level 0: 2 · level 1: 2 · … level k-2:2 · 2 k · (2 k-1 -1)+2(k-1) =

67 Shared search trees: Summary. Theorem: shared search trees are a size- and order-unique representation of dictionaries with the following characteristics: space requirement O(n log n); search time O(log n); update time O(√n). Open problem: is there a size- and order-unique representation of dictionaries by graphs with bounded node degree, search time O(log n), and update time o(n) (e.g. O(√n))?