1 Algorithmic Aspects of Searching in the Past
Christine Kupich, Institut für Informatik, Universität Freiburg
Lecture 1: Persistent Data Structures
Advanced Topics in Algorithms & Data Structures

2 Overview
- Motivation
- Example: natural search trees
- Making data structures partially persistent
- Example: partially persistent red-black trees
- An application: point location
- An application: grounded 2-dimensional range searching
- Making data structures fully persistent

3 Motivation
- Ephemeral: there is no mechanism to revert to previous states.
- A structure is called persistent if it supports access to multiple versions.
- Partially persistent: all versions can be accessed, but only the newest version can be modified.
- Fully persistent: all versions can be accessed and modified.
- Confluently persistent: two or more old versions can be combined into one new version.
- Oblivious: the data structure yields no knowledge about the sequence of operations that have been applied to it, other than the final result of the operations.

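4 Example: Natural search trees
Natural (unbalanced) search trees are only partially oblivious. The insertion history can sometimes be reconstructed from the shape of the tree (for example, in a tree built by insertions only, the root is the first key inserted), whereas deleted keys leave no visible trace.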
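A minimal Python sketch makes the leak concrete. It assumes a natural search tree with insertions only and represents a tree as `None` or a tuple `(left, key, right)`; the names `insert` and `build` are illustrative. Different insertion orders can give different shapes, which reveals part of the history, while some different orders give identical trees.

```python
def insert(tree, key):
    """Insert key into a natural (unbalanced) search tree.

    A tree is None or a tuple (left, key, right); a new tuple is returned.
    """
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key < k:
        return (insert(left, key), k, right)
    if key > k:
        return (left, k, insert(right, key))
    return tree  # duplicate key: tree unchanged


def build(keys):
    """Build a tree by inserting the keys in the given order."""
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return tree


# The shape leaks the order: the root is always the first inserted key ...
print(build([1, 2, 3]) == build([2, 1, 3]))  # False
# ... but some different orders are indistinguishable.
print(build([2, 1, 3]) == build([2, 3, 1]))  # True
```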

5 Simple methods for making structures persistent
- Structure-copying method: make a copy of the data structure each time it is changed. This yields full persistence at the price of Θ(n) time and space per update to a structure of size n.
- Log-file method: store a log of all updates. To access version i, start with the initial structure and carry out the first i updates to generate version i. This costs Θ(i) time per access, but only O(1) space and time per update.
- Hybrid method: store the complete sequence of updates and, additionally, every k-th version for a suitably chosen k. Result: any choice of k causes a blowup in either storage space or access time.
Are there any better methods?
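The hybrid method can be sketched in Python as follows. For illustration only, it assumes the structure is a set of keys and that every update inserts one key; `HybridVersions` is a hypothetical name. Accessing a version replays fewer than k logged updates on top of the newest snapshot at or before it.

```python
class HybridVersions:
    """Hybrid method: log every update and snapshot every k-th version.

    Illustrative assumption: the structure is a set of keys and every
    update inserts one key.
    """

    def __init__(self, k):
        self.k = k
        self.log = []                      # log[j] is the update creating version j + 1
        self.snapshots = {0: frozenset()}  # version number -> full copy

    def insert(self, key):
        """Apply one update to the newest version and return the new version number."""
        new_version = len(self.log) + 1
        if new_version % self.k == 0:
            # Full copy: Theta(n) space, paid only for every k-th version.
            self.snapshots[new_version] = self.version(new_version - 1) | {key}
        self.log.append(key)
        return new_version

    def version(self, i):
        """Rebuild version i from the newest snapshot at or before i."""
        base = i - i % self.k
        keys = set(self.snapshots[base])
        for key in self.log[base:i]:       # replay fewer than k logged updates
            keys.add(key)
        return frozenset(keys)
```

For example, after `h = HybridVersions(k=3)` and inserting 5, 3, 8, 1, 9, only versions 0 and 3 are stored in full, and `h.version(4)` replays one logged update on top of version 3. A larger k means fewer snapshots but longer replays; a smaller k means the reverse.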

6 Making data structures persistent
Several constructions to make various data structures persistent have been devised, but no general approach was taken until the seminal paper by Driscoll, Sarnak, Sleator and Tarjan. They propose methods to make linked data structures partially as well as fully persistent. Let's first have a look at how to make structures partially persistent.

7 Fat node method - partial persistence
Record all changes made to node fields in the nodes themselves. Each fat node contains the same fields as an ephemeral node, plus a version stamp. Add a modification history to every node: each field in a node holds a list of version-value pairs.

8 Fat node method - partial persistence
Modifications:
- Ephemeral update step i creates a new node: create a new fat node with version stamp i and the original field values.
- Ephemeral update step i changes a field: append the new field value together with version stamp i to that field's history.
Each node thus knows what its values were at any previous point in time.
Access field f of version i: choose the value with the maximum version stamp no greater than i.

9 Fat node method - analysis
Time cost per access: an O(log m) slowdown per node, where m is the number of modifications recorded (binary search on the modification history).
Time and space cost per update step: O(1) (append the modification together with its version stamp at the end of the modification history).
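A single fat-node field can be sketched in Python as two parallel lists of version stamps and values. This is a minimal sketch under partial persistence (only the newest version changes, so stamps arrive in increasing order); the class name `FatField` is hypothetical. Access uses the standard-library `bisect` module for the binary search.

```python
from bisect import bisect_right


class FatField:
    """One field of a fat node: parallel lists of version stamps and values."""

    def __init__(self, version, value):
        self.stamps = [version]  # increasing version stamps
        self.values = [value]

    def set(self, version, value):
        """Record a change made by update step `version` (newest version only)."""
        if version < self.stamps[-1]:
            raise ValueError("partial persistence: only the newest version may change")
        if version == self.stamps[-1]:
            self.values[-1] = value      # repeated change within the same step
        else:
            self.stamps.append(version)  # amortized O(1) append of stamp and value
            self.values.append(value)

    def get(self, version):
        """Value with the largest stamp no greater than `version` (binary search)."""
        i = bisect_right(self.stamps, version) - 1
        if i < 0:
            raise KeyError(f"field did not exist in version {version}")
        return self.values[i]
```

For example, a field created in version 1 with value "a" and changed to "b" in version 3 reads "a" in version 2 and "b" in versions 3 and later.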

10 Fat node method - Example
A partially persistent search tree. Insertions: 5, 3, 13, 15, 1, 9, 7, 11, 10, followed by deletion of item

11 Path-copying method - partial persistence
Make a copy of a node before changing it (e.g., to point to a new child). Cascade the change back until the root is reached: the parent of every copied node must itself be copied so that it can point to the copy. Restructuring costs O(height of the tree) per update operation. Every modification creates a new root; maintain an array of roots indexed by timestamp.
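Path copying for insertion into a natural search tree can be sketched in a few lines of Python. This is a minimal sketch (no rebalancing, illustrative names); it uses the insertions 5, 3, 13, 15, 1 from the example slides. Each insertion copies only the nodes on its search path and shares every other subtree with the previous version.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(node, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)  # share right subtree
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))  # share left subtree
    return node  # key already present: the new version shares the whole tree


def keys(node):
    return [] if node is None else keys(node.left) + [node.key] + keys(node.right)


roots = [None]  # array of roots indexed by version (timestamp)
for k in [5, 3, 13, 15, 1]:
    roots.append(insert(roots[-1], k))
```

After the loop, `keys(roots[2])` still returns `[3, 5]`, and the newest root shares the whole right subtree (13, 15) with the previous version because inserting 1 only walks down the left side.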

12-15 Path-copying method - Example (figure: version 0; version 1: Insert(2); version 2: Insert(4))

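16 Node-copying method - partial persistence
Extend each node by a time-stamped modification box (initially empty). (figure: a node with key k, pointers lp and rp, and a modification box "t: rp" holding a new right pointer stamped with time t; versions before modification time t use the original fields, versions at or after time t use the value in the box)
Searching in version j: follow the entry pointer with the largest version number i such that i ≤ j. Then compare keys and, at each node, follow the newest pointer whose version stamp is no greater than j.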

17-20 Node-copying method - partial persistence, example (figure: version 0; version 1: Insert(2); version 2: Insert(4))

21 Node-copying method - partial persistence
Modification:
- If the node's modification box is empty, fill it.
- Otherwise, make a copy of the node that uses only the latest values, i.e. the value in the modification box plus the value we want to store, and leave the copy's modification box empty.
- Cascade this change to the node's parent, which must now point to the copy.
- If the node is a root, add the new root to a sorted array of roots.
Access time gets an O(1) slowdown per node, plus an additive O(log m) cost for finding the correct root.
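The modification rules can be sketched in Python for a natural search tree with insertions only and distinct keys (no rebalancing). This is a hypothetical, minimal sketch: `Node`, `NodeCopyingBST`, and the box layout `(version, field_name, new_value)` are illustrative choices, and each node has exactly one modification box as on the slides. The sorted array of roots is searched with `bisect`.

```python
from bisect import bisect_right


class Node:
    """Search-tree node with one modification box (node-copying method)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.box = None  # None or (version, field_name, new_value)

    def get(self, field, version):
        """Value of 'left' or 'right' as seen in the given version."""
        if self.box is not None:
            stamp, boxed_field, value = self.box
            if boxed_field == field and stamp <= version:
                return value
        return getattr(self, field)


class NodeCopyingBST:
    """Partially persistent (unbalanced) search tree, insertions only."""

    def __init__(self):
        self.stamps = [0]    # sorted version numbers that own an entry pointer
        self.roots = [None]  # roots[j] is the entry pointer created at stamps[j]
        self.newest = 0

    def _root(self, version):
        # Entry pointer with the largest version number <= version: O(log m).
        return self.roots[bisect_right(self.stamps, version) - 1]

    def _set(self, path, idx, field, value, version):
        """Set path[idx].field = value in `version`, copying nodes as needed."""
        while True:
            node = path[idx]
            if node.box is None:
                node.box = (version, field, value)  # empty box: fill it
                return
            # Full box: copy the node with its latest values and apply the change.
            copy = Node(node.key, node.get("left", version), node.get("right", version))
            setattr(copy, field, value)
            if idx == 0:
                self.stamps.append(version)  # the copied root is a new entry pointer
                self.roots.append(copy)
                return
            # Cascade: the parent must now point to the copy.
            idx -= 1
            field = "left" if copy.key < path[idx].key else "right"
            value = copy

    def insert(self, key):
        """Insert a new key into the newest version; return the new version number."""
        version = self.newest + 1
        self.newest = version
        node = self._root(version)
        if node is None:
            self.stamps.append(version)
            self.roots.append(Node(key))
            return version
        path = []
        while node is not None:
            path.append(node)
            field = "left" if key < node.key else "right"
            node = node.get(field, version)
        self._set(path, len(path) - 1, field, Node(key), version)
        return version

    def keys(self, version):
        """In-order key list of the given version."""
        out = []

        def walk(node):
            if node is not None:
                walk(node.get("left", version))
                out.append(node.key)
                walk(node.get("right", version))

        walk(self._root(version))
        return out
```

Inserting 5, 3, 7, 1 fills the root's box at version 2, copies the root at version 3 (its box is already full), and fills the box of node 3 at version 4; every old version still reads correctly through `keys`.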

22 Node-copying method - Example A partially persistent search tree. Insertions: 5,3,13,15,1,9,7,11,10, followed by deletion of item

23 Node-copying method - partial persistence The amortized costs (time and space) per modification are O(1). Proof: Using the potential technique

24 Potential technique
The potential is a function of the entire data structure.
Definition (potential function): a measure of a data structure whose change after an operation corresponds to the time cost of the operation.
The initial potential must be zero, and the potential must be non-negative for all versions.
The amortized cost of an operation is its actual cost plus the change in potential.
Different potential functions lead to different amortized bounds.
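In symbols (the notation here is assumed, not taken from the slides): let $c_i$ be the actual cost of the i-th operation and $T_i$ the structure after it. The amortized cost and the telescoping sum over m operations are

$$
\hat c_i = c_i + \Phi(T_i) - \Phi(T_{i-1}),
\qquad
\sum_{i=1}^{m} c_i = \sum_{i=1}^{m} \hat c_i - \Phi(T_m) + \Phi(T_0) \le \sum_{i=1}^{m} \hat c_i ,
$$

where the inequality uses $\Phi(T_0) = 0$ and $\Phi(T_m) \ge 0$. This is why the two conditions on the potential guarantee that the total amortized cost bounds the total actual cost.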

25 Node-copying method - partial persistence
Definitions:
- Live nodes: nodes that form the latest version (reachable from the root of the most recent version); all other nodes are dead.
- Full live nodes: live nodes whose modification boxes are full.

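26 Node-copying method - potential paradigm
The potential function $\Phi(T)$ is the number of full live nodes in T (initially zero).
The amortized cost of an operation is its actual cost plus the change in potential $\Delta\Phi$.
Each modification involves some number k of copies, each with O(1) space and time cost, and at most one change to a modification box with O(1) time cost.
Change in potential after update operation i: each copy turns a full live node into a dead node and creates a live copy with an empty modification box, decreasing the potential by 1; filling a modification box at the end of the cascade increases it by at most 1. Hence $\Delta\Phi \le 1 - k$.
Space: $O(k + \Delta\Phi)$, time: $O(k + \Delta\Phi)$.
Hence a modification takes O(1) amortized space and O(1) amortized time.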
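The bound can be checked term by term, assuming unit cost for a node copy, a box fill, and appending a root to the root array, with $\Phi$ the number of full live nodes. Each of the k copies kills one full live node and creates a live node with an empty box (change of $-1$ each). If the cascade ends by filling an empty box (change of $+1$), the actual cost is $k + 1$; if it ends by copying the root, the actual cost is also $k + 1$ (k copies plus one root append) and no box is filled:

$$
\hat c = (k + 1) + (1 - k) = 2
\qquad\text{or}\qquad
\hat c = (k + 1) + (-k) = 1 .
$$

Either way the amortized cost of a modification is O(1), independently of k.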

27 Red-black trees
Constraints:
- All missing nodes are regarded as black.
- Any red node has a black parent.
- From any node, all paths to a missing node contain the same number of black nodes.
- The root is colored black.
Consequently, the depth of an n-node red-black tree is at most 2 log2(n + 1).
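The coloring constraints can be checked mechanically. The following Python sketch is illustrative: nodes are `collections.namedtuple` records with a color "R" or "B", and `None` stands for a missing node. It verifies only the color constraints listed above, not the search-tree order of the keys.

```python
from collections import namedtuple

# Illustrative node representation: color is "R" or "B"; None is a missing node.
RB = namedtuple("RB", "color left key right")


def black_height(node, parent_red=False):
    """Number of black nodes on every path from node to a missing node.

    Raises ValueError if a red node has a red parent or if two paths differ.
    """
    if node is None:
        return 1  # missing nodes are regarded as black
    red = node.color == "R"
    if red and parent_red:
        raise ValueError(f"red node {node.key!r} has a red parent")
    left_height = black_height(node.left, red)
    right_height = black_height(node.right, red)
    if left_height != right_height:
        raise ValueError(f"unequal black heights below {node.key!r}")
    return left_height + (0 if red else 1)


def is_red_black(root):
    """Check the coloring constraints (not the key order)."""
    if root is not None and root.color != "B":
        return False  # the root must be black
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

For example, a black root with two red leaves passes, while a red node below a red node, or a black node whose two subtrees have different black heights, is rejected.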

28 Red-black trees
Rebalancing transformations - insertion (figure: recoloring; the violation bubbles up the tree)

29 Red-black trees
Rebalancing transformations - insertion (figure: cases 3 and 4, resolved by a rotation (rr) or a double rotation (lr) plus recoloring of parent and grandparent, leaving no inconsistency)
An insertion requires O(log n) recolorings plus at most 2 rotations.

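30 Red-black trees - partial persistence
A red-black tree can be made partially persistent using the node-copying method, at an amortized space cost of O(1) per insertion or deletion and a worst-case time cost of O(log n) per access, insertion or deletion.
Each node contains: a key, 2 pointers to its children, a color bit, and one extra pointer (with a version stamp and a direction bit) that serves as the modification box.
Colors are not used in access operations, so old colors need not be preserved and can be overwritten in place. Only pointer changes must be recorded persistently, and an insertion or deletion makes O(1) of them (attaching or removing a node plus a constant number of rotations), so the amortized space cost per update stays constant.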
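For comparison, here is a persistent red-black tree in Python that uses a different technique: path copying with Okasaki's functional rebalancing, not the node-copying method of this slide. It therefore costs O(log n) new nodes per insertion rather than O(1) amortized space, and its rebalancing cases differ from the textbook cases above, so intermediate shapes may differ from the E, C, M, O, N example on the next slides. `RB`, `insert`, and `keys` are illustrative names.

```python
from collections import namedtuple

RB = namedtuple("RB", "color left key right")  # color is "R" or "B"; None = missing node


def _is_red(node):
    return node is not None and node.color == "R"


def _balance(color, left, key, right):
    """Okasaki's rebalancing: rewrite a black node that has a red-red pair below it."""
    if color == "B":
        if _is_red(left) and _is_red(left.left):
            a = left.left
            return RB("R", RB("B", a.left, a.key, a.right), left.key,
                      RB("B", left.right, key, right))
        if _is_red(left) and _is_red(left.right):
            b = left.right
            return RB("R", RB("B", left.left, left.key, b.left), b.key,
                      RB("B", b.right, key, right))
        if _is_red(right) and _is_red(right.left):
            b = right.left
            return RB("R", RB("B", left, key, b.left), b.key,
                      RB("B", b.right, right.key, right.right))
        if _is_red(right) and _is_red(right.right):
            c = right.right
            return RB("R", RB("B", left, key, right.left), right.key,
                      RB("B", c.left, c.key, c.right))
    return RB(color, left, key, right)


def insert(root, key):
    """Return the root of a new version containing key; `root` itself is unchanged."""
    def ins(node):
        if node is None:
            return RB("R", None, key, None)
        if key < node.key:
            return _balance(node.color, ins(node.left), node.key, node.right)
        if key > node.key:
            return _balance(node.color, node.left, node.key, ins(node.right))
        return node

    new_root = ins(root)
    return RB("B", new_root.left, new_root.key, new_root.right)  # root is black


def keys(node):
    return [] if node is None else keys(node.left) + [node.key] + keys(node.right)


versions = [None]  # versions[t] = root after t insertions
for k in "ECMON":
    versions.append(insert(versions[-1], k))
```

All six versions remain accessible, and subtrees off the search path are shared between consecutive versions.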

31 Red-black trees - partial persistence An Example: insert E, C, M, O, N

32-34 Red-black trees - partial persistence, example continued (figure: Insert C, Insert M, Insert O with recoloring; Insert N in version 5, rebalanced with rotations (RR, LR) plus recoloring)

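35 Application: Grounded 2-Dimensional Range Searching
Given a set of points in the plane and a query triple (a, b, i), report the set of points (x, y) with a ≤ x ≤ b and y < i. (figure: the query region is bounded by a and b on the x-axis and by i on the y-axis)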

36 Application: Grounded 2-Dimensional Range Searching
Use a partially persistent red-black tree with the x-coordinates as keys, and insert the points in order of increasing y-coordinate, so that version i contains every point with y < i.
To answer a query, report all points in version i whose x-coordinates are in [a, b].
Query time? Space and preprocessing time of the persistent red-black tree?
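The sweep can be sketched in Python under simplifying assumptions: the persistent structure is an unbalanced path-copying search tree keyed by x (not a node-copying red-black tree), so the slides' space and time bounds do not hold here. The names `preprocess` and `query` are hypothetical. `bisect_left` on the sorted y-coordinates counts the points with y < i, which is exactly the version to search.

```python
from bisect import bisect_left


class Node:
    __slots__ = ("point", "left", "right")

    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right


def insert(node, point):
    """Path copying on x-coordinates: return a new root, leave `node` untouched."""
    if node is None:
        return Node(point)
    if point[0] < node.point[0]:
        return Node(node.point, insert(node.left, point), node.right)
    return Node(node.point, node.left, insert(node.right, point))


def report(node, a, b, out):
    """Append every point with a <= x <= b to out, in x order."""
    if node is None:
        return
    x = node.point[0]
    if a < x:                 # the left subtree may still hold keys >= a
        report(node.left, a, b, out)
    if a <= x <= b:
        out.append(node.point)
    if x <= b:                # equal and larger keys are stored to the right
        report(node.right, a, b, out)


def preprocess(points):
    """Insert points by increasing y; versions[v] holds the v lowest points."""
    points = sorted(points, key=lambda p: p[1])
    ys = [p[1] for p in points]
    versions = [None]
    for p in points:
        versions.append(insert(versions[-1], p))
    return ys, versions


def query(ys, versions, a, b, i):
    """Report points with a <= x <= b and y < i."""
    out = []
    report(versions[bisect_left(ys, i)], a, b, out)  # number of points with y < i
    return out
```

For the points (1, 1), (2, 5), (3, 2), (5, 3), (6, 0), the query (a, b, i) = (1, 5, 3) searches the version containing the three lowest points and reports (1, 1) and (3, 2).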

37 1-Dimensional Range Search

38 Application: Planar point location
Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints. Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine, for each query point, the polygon containing it. We measure an algorithm by three parameters:
1) the preprocessing time,
2) the space required for the data structure,
3) the time per query.

39 Planar point location - example

40 Solving planar point location (cont.)
Dobkin-Lipton: partition the plane into vertical slabs by drawing a vertical line through each endpoint. Within each slab, the segments crossing it are totally ordered from bottom to top. Allocate a search tree per slab containing these segments, and associate with each segment the polygon above it. Allocate another search tree on the x-coordinates of the vertical lines.

41 Planar point location -- example

42 Solving planar point location (cont.)
To answer a query, first find the appropriate slab, then search within the slab to find the polygon. Query time is O(log n). How about the space?
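A minimal Python sketch of the slab method, assuming non-vertical segments given as `(p, q, label)` with `p[0] < q[0]`, where `label` names the polygon above the segment; `build_slabs` and `locate` are hypothetical names. It replaces both kinds of search tree by sorted Python lists searched by bisection, which gives the same O(log n) query but stores every slab explicitly.

```python
from bisect import bisect_right


def y_at(p, q, x):
    """y-coordinate of segment pq at abscissa x (segment assumed non-vertical)."""
    (x1, y1), (x2, y2) = p, q
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slabs(segments):
    """segments: list of (p, q, label) with p[0] < q[0]; label = polygon above.

    Returns the sorted slab boundaries and, per slab, the crossing segments
    ordered bottom to top. Worst case Theta(n^2) space.
    """
    xs = sorted({pt[0] for p, q, _ in segments for pt in (p, q)})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        crossing = [s for s in segments if s[0][0] <= left and s[1][0] >= right]
        crossing.sort(key=lambda s: y_at(s[0], s[1], mid))
        slabs.append(crossing)
    return xs, slabs


def locate(xs, slabs, point, outside="outer face"):
    """Return the label of the polygon containing point."""
    x, y = point
    j = bisect_right(xs, x) - 1          # slab search on x
    if j < 0 or j >= len(slabs):
        return outside
    slab = slabs[j]
    lo, hi = 0, len(slab)
    while lo < hi:                        # count segments lying below the point
        mid = (lo + hi) // 2
        p, q, _ = slab[mid]
        if y_at(p, q, x) < y:
            lo = mid + 1
        else:
            hi = mid
    return slab[lo - 1][2] if lo > 0 else outside
```

For a triangle with vertices (0, 0), (4, 0), (2, 2), labeled "A" above its bottom edge, the point (1, 0.5) is located in "A" and the point (1, 1.5) in the outer face.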

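43 Planar point location -- bad example
The total number of segments is O(n), but the number of segments in each slab can also be Θ(n). With O(n) slabs, the slab structure therefore needs Θ(n²) space in the worst case.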

44 Planar point location & persistence
So how do we improve the space bound? Key observation: the lists of segments in adjacent slabs are very similar. Create the search tree for the first slab. Then obtain the next one by deleting the segments that end at the corresponding vertex and adding the segments that start at that vertex. How many insertions and deletions are there altogether? 2n (one insertion and one deletion per segment).

45 Planar point location & persistence (cont.)
The updates should be persistent, since we need all the search trees at the end. Partial persistence is enough. We already have the path-copying method, so let's use it. What do we get? O(n log n) space and O(n log n) preprocessing time, since each of the 2n updates copies O(log n) nodes of a balanced tree. Using the node-copying method, we can improve the space bound to O(n).

46 Making data structures fully persistent
With this type of persistence, the versions do not form a simple linear path; they form a version tree (since we can also update in the past). Because the versions lack a linear ordering, we impose a total ordering on them (the version list). The version list defines a preorder on the version tree (for navigation): for any version i, the descendants of i in the version tree occur consecutively in the version list, starting with i. (figure: a version tree and its version list)

47 Making data structures fully persistent (figure: search tree versions arranged in a version tree; edges are labeled with updates such as iA, iC, iM (insert A, C, M) and dE, dM (delete E, M))

48 Full persistence
It must be possible to:
- perform insertions in the version list, and
- given two versions i and j, determine whether i precedes or follows j in the version list.
This list order problem has been addressed by Dietz and Sleator: order queries are answered in O(1) worst-case time, with an O(1) amortized time bound for insertion.
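The two required operations can be sketched naively in Python with a plain list, at O(n) cost per operation rather than the Dietz-Sleator bounds. The sketch assumes the simple rule that a new version is inserted immediately after its parent, which keeps the descendants of every version consecutive; `VersionList` and its methods are hypothetical names.

```python
class VersionList:
    """Version list as a plain Python list (naive sketch).

    Insertion and order queries cost O(n) here; the Dietz-Sleator structure
    achieves O(1) amortized insertion and O(1) worst-case order queries.
    """

    def __init__(self):
        self.order = [0]         # version 0: the initial version (tree root)
        self.parent = {0: None}

    def new_version(self, parent):
        """Create a child of `parent` and insert it immediately after its parent."""
        version = len(self.parent)
        self.parent[version] = parent
        self.order.insert(self.order.index(parent) + 1, version)
        return version

    def precedes(self, i, j):
        """Order query: does version i come before version j in the list?"""
        return self.order.index(i) < self.order.index(j)

    def descendants(self, i):
        """Versions in the subtree of i (following parent links)."""
        out = []
        for v in self.parent:
            u = v
            while u is not None and u != i:
                u = self.parent[u]
            if u == i:
                out.append(v)
        return out
```

Creating versions 1 and 2 from version 0, then 3 from 1 and 4 from 3, gives the list 0, 2, 1, 3, 4: the descendants of version 1 (namely 1, 3, 4) occupy one consecutive block that starts with 1.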

49 Fat node method - full persistence
Each fat node contains the same fields as an ephemeral node, plus space for extra field values (each with a field name and a version stamp). Each field in a node holds a list of version-value pairs.
Access: versions are compared by their position in the version list, not by their numeric values. To access a field in version i, search for the version stamp that is rightmost in the version list but not to the right of i.

50 Fat node method - Example (figure: a fully persistent search tree for the version tree of slide 47, with field values labeled by version ranges such as 1-10 and 12)
Version list: 1, 6, 7, 10, 11, 2, 8, 9, 3, 4, 5, 12

51 Fat node method - full persistence
Update operation i:
- Add i to the version list.
- Update step creates a new node: create a new fat node with the original field values (stamp i).
- Update step changes a field f: we have to guarantee that the new value of f will be used only in version i, and not in the versions that follow i in the version list.
Time cost per access and update step: O(log m), provided each set of field values is stored in a search tree ordered by version stamp (i.e. by position in the version list).
Space cost: worst-case O(1) per update step.
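A Python sketch of one fully persistent fat-node field. One way to provide the guarantee above, and the assumption this sketch makes, is the following: when version i sets f, also record the value that f had for the successor of i in the version list (unless that successor already has its own stamp), so the new value cannot leak to the right. `VersionList` and `FatField` are hypothetical names; lookups scan linearly instead of using a search tree ordered by list position, and the field is assumed to exist in every queried version.

```python
class VersionList:
    """Naive version list: a Python list with O(n) insertion and position lookup."""

    def __init__(self):
        self.order = [0]  # version 0 is the root of the version tree

    def add(self, version, parent):
        """A new version is a child of `parent` and goes right after it in the list."""
        self.order.insert(self.order.index(parent) + 1, version)

    def pos(self, version):
        return self.order.index(version)

    def successor(self, version):
        p = self.pos(version) + 1
        return self.order[p] if p < len(self.order) else None


class FatField:
    """One field of a fat node; stamps are compared by version-list position."""

    def __init__(self, vlist, version, value):
        self.vlist = vlist
        self.values = {version: value}  # stamp -> value

    def get(self, version):
        """Value of the rightmost stamp in the version list not to the right of version."""
        p = self.vlist.pos(version)
        best = max((s for s in self.values if self.vlist.pos(s) <= p),
                   key=self.vlist.pos)
        return self.values[best]

    def set(self, version, value):
        """Change the field in a freshly added version only."""
        succ = self.vlist.successor(version)
        if succ is not None and succ not in self.values:
            self.values[succ] = self.get(succ)  # freeze the old value for succ
        self.values[version] = value
```

In the test scenario, version 1 is derived from 0 without touching the field, then version 2 is derived from 0 and sets it. Version 2 lands between 0 and 1 in the list, so without the frozen entry for version 1, `get(1)` would wrongly return the new value.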

52 Applications
- Partially persistent balanced search trees give a simple solution to the planar point location problem, the grounded 2-dimensional range searching problem, …, and can be used as a substitute for Chazelle's hive graph (geometric retrieval).
- Fully persistent data structures can be used for the binary dispatching problem (object-oriented languages: find, for an invocation, the most specific applicable method) and for text editing.
- Oblivious data structures: cryptography.

53 References
- J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan: Making data structures persistent. Journal of Computer and System Sciences, 38:86-124. Final version.
- N. Sarnak and R. E. Tarjan: Planar point location using persistent search trees. Communications of the ACM, 29:669-679, July.
- D. Micciancio: Oblivious data structures: applications to cryptography. 1997.