I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.

Slides:



Advertisements
Similar presentations
Temporal Databases S. Srinivasa Rao April 12, 2007
Advertisements

Spatial Indexing SAMs. Spatial Indexing Point Access Methods can index only points. What about regions? Z-ordering and quadtrees Use the transformation.
Orthogonal range searching. The problem (1-D) Given a set of points S on the line, preprocess them to build structure that allows efficient queries of.
Chapter 4: Trees Part II - AVL Tree
Transform and Conquer Chapter 6. Transform and Conquer Solve problem by transforming into: a more convenient instance of the same problem (instance simplification)
An Optimal Dynamic Interval Stabbing-Max Data Structure? Pankaj K. Agarwal, Lars Arge and Ke Yi Department of Computer Science Duke University.
Augmenting Data Structures Advanced Algorithms & Data Structures Lecture Theme 07 – Part I Prof. Dr. Th. Ottmann Summer Semester 2006.
Dynamic Planar Convex Hull Operations in Near- Logarithmic Amortized Time TIMOTHY M. CHAN.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
External Memory Geometric Data Structures
Multiversion Access Methods - Temporal Indexing. Basics A data structure is called : Ephemeral: updates create a new version and the old version cannot.
I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
I/O-Algorithms Lars Arge Spring 2009 April 28, 2009.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
Geometric Data Structures Computational Geometry, WS 2007/08 Lecture 13 Prof. Dr. Thomas Ottmann Algorithmen & Datenstrukturen, Institut für Informatik.
I/O-Algorithms Lars Arge Aarhus University March 5, 2008.
I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries Second Year Project Presentation Ke Yi Advisor: Lars Arge Committee: Pankaj.
I/O-Algorithms Lars Arge Aarhus University April 16, 2008.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
I/O-Algorithms Lars Arge Aarhus University March 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
1 Geometric index structures April 15, 2004 Based on GUW Chapter , [Arge01] Sections 1, 2.1 (persistent B- trees), 3-4 (static versions.
Augmenting Data Structures Advanced Algorithms & Data Structures Lecture Theme 07 – Part II Prof. Dr. Th. Ottmann Summer Semester 2006.
Lecture 11 : More Geometric Data Structures Computational Geometry Prof. Dr. Th. Ottmann 1 Geometric Data Structures 1.Rectangle Intersection 2.Segment.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
CS4432: Database Systems II
1 Geometric Intersection Determining if there are intersections between graphical objects Finding all intersecting pairs Brute Force Algorithm Plane Sweep.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Chapter 19: Binary Trees. Objectives In this chapter, you will: – Learn about binary trees – Explore various binary tree traversal algorithms – Organize.
External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)
B-trees and kd-trees Piotr Indyk (slides partially by Lars Arge from Duke U)
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Trevor Brown – University of Toronto B-slack trees: Space efficient B-trees.
Mehdi Mohammadi March Western Michigan University Department of Computer Science CS Advanced Data Structure.
Lars Arge Presented by Or Ozery. I/O Model Previously defined: N = # of elements in input M = # of elements that fit into memory B = # of elements per.
1 Trees 4: AVL Trees Section 4.4. Motivation When building a binary search tree, what type of trees would we like? Example: 3, 5, 8, 20, 18, 13, 22 2.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
Bin Yao (Slides made available by Feifei Li) R-tree: Indexing Structure for Data in Multi- dimensional Space.
(2,4) Trees1 What are they? –They are search Trees (but not binary search trees) –They are also known as 2-4, trees.
External Memory Geometric Data Structures Lars Arge Duke University June 27, 2002 Summer School on Massive Datasets.
X1x1 x2x2 top-k y 3-sided x1x1 x2x2 External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates Gerth Stølting Brodal Aarhus.
Michal Balas1 I/O-efficient Point Location using Persistent B-Trees Lars Arge, Andrew Danner, and Sha-Mayn Teh Department of Computer Science, Duke University.
arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
Spatial Data Management
Multiway Search Trees Data may not fit into main memory
B+ Tree.
Segment tree and Interval Tree
Advanced Topics in Data Management
Orthogonal Range Searching and Kd-Trees
Wednesday, April 18, 2018 Announcements… For Today…
B+-Trees and Static Hashing
STACS arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Spatial Indexing I R-trees
8th Workshop on Massive Data Algorithms, August 23, 2016
Lecture 20: Indexes Monday, February 27, 2006.
Tree-Structured Indexes
Presentation transcript:

I/O-Algorithms Lars Arge Fall 2014 September 25, 2014

Lars Arge I/O-algorithms 2 I/O-Model Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory T = # output size in searching problem We often assume that M>B 2 I/O: Movement of block between memory and disk D P M Block I/O

Lars Arge I/O-algorithms 3 Fundamental Bounds Internal External Scanning: N Sorting: N log N Permuting Searching:

Lars Arge I/O-algorithms 4 Fundamental Data Structures B-trees: Node degree  (B)  queries in –Rebalancing using split/fuse  updates in Weight-balanced B-tress: Weight rather than degree constraint  Ω(w(v)) updates below v between rebalancing operations on v Persistent B-trees: –Update in current version in –Search in all previous versions in Buffer trees –Batching of operations to obtain bounds  construction algorithms

Lars Arge I/O-algorithms 5 Problem: –Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently As in (one-dimensional) B-tree case we are interested in – space – update – query Interval Management x

Lars Arge I/O-algorithms 6 Interval Management: Static Solution Sweep from left to right maintaining persistent B-tree –Insert interval when left endpoint is reached –Delete interval when right endpoint is reached Query x answered by reporting all intervals in B-tree at “time” x – space – query – construction using buffer technique x

Lars Arge I/O-algorithms 7 Base tree on endpoints – “slab” X v associated with each node v Interval stored in highest node v where it contains midpoint of X v Intervals I v associated with v stored in –Left slab list sorted by left endpoint (search tree) –Right slab list sorted by right endpoint (search tree)  Linear space and O(log N) update (assuming fixed endpoint set) Internal Interval Tree

Lars Arge I/O-algorithms 8 Query with x on left side of midpoint of X root –Search left slab list left-right until finding non-stabbed interval –Recurse in left child  O(log N+T) query bound x Internal Interval Tree

Lars Arge I/O-algorithms 9 Externalizing Interval Tree Natural idea: –Block tree –Use B-tree for slab lists Number of stabbed intervals in large slab list may be small (or zero) –We can be forced to do I/O in each of O(log N) nodes

Lars Arge I/O-algorithms 10 Externalizing Interval Tree Idea: –Decrease fan-out to  height remains – slabs define multislabs –Interval stored in two slab lists (as before) and one multislab list –Intervals in small multislab lists collected in underflow structure –Query answered in v by looking at 2 slab lists and not O(log B) multislab

Lars Arge I/O-algorithms 11 Base tree: Weight-balanced B-tree with branching parameter and leaf parameter B on endpoints –Interval stored in highest node v where it contains slab boundary Each internal node v contains: –Left slab list for each of slabs –Right slab lists for each of slabs – multislab lists –Underflow structure Interval in set I v of intervals associated with v stored in –Left slab list of slab containing left endpoint –Right slab list of slab containing right endpoint –Widest multislab list it spans If < B intervals in multislab list they are instead stored in underflow structure (  contains ≤ B 2 intervals) External Interval Tree v $m$ blocks v

Lars Arge I/O-algorithms 12 External Interval tree Each leaf contains <B/2 intervals (unique endpoint assumption) –Stored in one block Slab lists implemented using B-trees – query –Linear space *We may “wasted” a block for each of the lists in node *But only internal nodes Underflow structure implemented using static structure – query –Linear space  Linear space v

Lars Arge I/O-algorithms 13 External Interval Tree Query with x –Search down tree for x while in node v reporting all intervals in I v stabbed by x In node v –Query two slab lists –Report all intervals in relevant multislab lists –Query underflow structure Analysis: –Visit nodes –Query slab lists –Query multislab lists –Query underflow structure $m$ blocks v

Lars Arge I/O-algorithms 14 External Interval Tree Update – ignoring base tree update/rebalancing: –Search for relevant node –Update two slab lists –Update multislab list or underflow structure Update of underflow structure in O(1) I/Os amortized: –Maintain update block with ≤ B updates –Check of update block adds O(1) I/Os to query bound –Rebuild structure when B updates have been collected using I/Os (Global rebuilding)  Update in I/Os amortized v

Lars Arge I/O-algorithms 15 External Interval Tree Note: –Insert may increase number of intervals in underflow structure for some multislab to B –Delete may decrease number of intervals in multislab to B  Need to move B intervals to/from multislab/underflow structure We only move –Intervals from multislab list when decreasing to size B/2 –Intervals to multislab list when increasing to size B  O(1) I/Os amortized used to move intervals

Lars Arge I/O-algorithms 16 Base Tree Update Before inserting new interval we insert new endpoints in base tree using I/Os –Leads to rebalancing using splits  Boundary in v becomes boundary in parent(v)  Intervals need to be moved Move intervals (update secondary structures) in O(w(v)) I/Os  O(1) amortized split bound (weight balanced B-tree)  amortized insert bound v v’’ v’

Lars Arge I/O-algorithms 17 Splitting Interval Tree Node When v splits we may need to move O(w(v)) intervals –Intervals in v containing boundary –Intervals in parent(v) with endpoints in X v containing boundary Intervals move to two new slab and multislab lists in parent(v)

Lars Arge I/O-algorithms 18 Splitting Interval Tree Node Moving intervals in v in O(w(v)) I/Os –Collect in left order (and remove) by scanning left slab lists –Collect in right order (and remove) by scanning right slab lists –Remove multislab lists containing boundary –Remove from underflow structure by rebuilding it –Construct lists and underflow structure for v’ and v’’ similarly

Lars Arge I/O-algorithms 19 Splitting Interval Tree Node Moving intervals in parent(v) in O(w(v)) I/Os –Collect in left order by scanning left slab list –Collect in right order by scanning right slab list –Merge with intervals collected in v  two new slab lists –Construct new multislab lists by splitting relevant multislab list –Insert intervals in small multislab lists in underflow structure

Lars Arge I/O-algorithms 20 External Interval Tree Split in O(1) I/Os amortized –Space: O(N/B) –Query: –Insert: I/Os amortized Deletes in I/Os amortized using global rebuilding: –Delete interval as previously using I/Os –Mark relevant endpoint as deleted –Rebuild structure in after N/2 deletes Note: Deletes can also be handled using fuse operations $m$ blocks v

Lars Arge I/O-algorithms 21 External Interval Tree External interval tree –Space: O(N/B) –Query: –Updates: I/Os amortized Removing amortization: –Moving intervals to/from underflow structure –Delete global rebuilding –Underflow structure update –Base node tree splits v Perform operations/construction lazily Move lazily – complicated: Interference Queries

Lars Arge I/O-algorithms 22 Summary/Conclusion: Interval Management Interval management corresponds to simple form of 2d range search –Diagonal corner queries We obtained the same bounds as for the 1d case –Space: O(N/B) –Query: –Updates: I/Os (x,x) (x 1,x 2 ) x x1x1 x2x2

Lars Arge I/O-algorithms 23 Summary/Conclusion: Interval Management Main problem in designing structure: –Binary  large fan-out Large fan-out resulted in the need for –Multislabs and multislab lists –Underflow structure to avoid O(B)-cost in each node General solution techniques: –Filtering: Charge part of query cost to output –Bootstrapping: *Use O(B 2 ) size structure in each internal node *Constructed using persistence *Dynamic using global rebuilding –Weight-balanced B-tree: Split/fuse in amortized O(1)

Lars Arge I/O-algorithms 24 Three-Sided Range Queries Interval management: “1.5 dimensional” search More general 2d problem: Dynamic 3-sidede range searching –Maintain set of points in plane such that given query (q 1, q 2, q 3 ), all points (x,y) with q 1  x  q 2 and y  q 3 can be found efficiently (x,x) (x 1,x 2 ) x x1x1 x2x2 q3q3 q2q2 q1q1

Lars Arge I/O-algorithms 25 Three-Sided Range Queries Report all points (x,y) with q 1  x  q 2 and y  q 3 Static solution: –Sweep top-down inserting x in persistent B-tree at (x,y) –Answer query by performing range query with [q 1,q 2 ] in B-tree at q 3 Optimal: –O(N/B) space –O(log B N+T/B) query – construction Dynamic? … in internal memory: priority search tree

Lars Arge I/O-algorithms 26 Base tree on x-coordinates with nodes augmented with points Heap on y-coordinates –Decreasing y values on root-leaf path –(x,y) on path from root to leaf holding x –If v holds point then parent(v) holds point Internal Priority Search Tree , , ,3 4 5,6 5 9,4 1 1, ,1 1

Lars Arge I/O-algorithms 27 Linear space Insert of (x,y) (assuming fixed x-coordinate set): –Compare y with y-coordinate in root –Smaller: Recursively insert (x,y) in subtree on path to x –Bigger: Insert in root and recursively insert old point in subtree  O(log N) update Internal Priority Search Tree , , ,3 4 5,6 5 9,4 1 1, ,1 1 Insert (10,21) 10,21

Lars Arge I/O-algorithms 28 Internal Priority Search Tree Query with (q 1, q 2, q 3 ) starting at root v: –Report point in v if satisfying query –Visit both children of v if point reported –Always visit child(s) of v on path(s) to q 1 and q 2  O(log N+T) query , , ,3 4 5,6 5 9,4 1 1, ,

Lars Arge I/O-algorithms 29 Natural idea: Block tree Problem: – I/Os to follow paths to to q 1 and q 2 –But O(T) I/Os may be used to visit other nodes (“overshooting”)  query Externalizing Priority Search Tree , , ,3 4 5,6 5 9,4 1 1, ,1 1

Lars Arge I/O-algorithms 30 Externalizing Priority Search Tree Solution idea: –Store B points in each node  *O(B 2 ) points stored in each supernode *B output points can pay for “overshooting” –Bootstrapping: *Store O(B 2 ) points in each supernode in static structure , , ,3 4 5,6 5 9,4 1 1, ,1 1

Lars Arge I/O-algorithms 31 External Priority Search Tree Base tree: Weight-balanced B-tree with branching parameter B/4 and leaf parameter B on x-coordinates Points in “heap order”: –Root stores B top points for each of the child slabs –Remaining points stored recursively Points in each node stored in “B 2 -structure” –Persistent B-tree structure for static problem  Linear space

Lars Arge I/O-algorithms 32 External Priority Search Tree Query with (q 1, q 2, q 3 ) starting at root v: –Query B 2 -structure and report points satisfying query –Visit child v if *v on path to q 1 or q 2 *All points corresponding to v satisfy query

Lars Arge I/O-algorithms 33 External Priority Search Tree Analysis: – I/Os used to visit node v – nodes on path to q 1 or q 2 –For each node v not on path to q 1 or q 2 visited, B points reported in parent(v)  query

Lars Arge I/O-algorithms 34 External Priority Search Tree Insert (x,y) (ignoring insert in base tree - rebalancing): –Find relevant node u: *Query B 2 -structure to find B points in root corresponding to node u on path to x *If y smaller than y-coordinates of all B points then recursively search in u –Insert (x,y) in B 2 -structure of v –If B 2 -structure contains >B points for child u, remove lowest point and insert recursively in u Delete: Similarly u

Lars Arge I/O-algorithms 35 Analysis: –Update visits nodes –B 2 -structure queried/updated in each node *One query *One insert and one delete B 2 -structure analysis: –Query: –Update: O(1) using global rebuilding *Store updates in update block *Rebuild after B updates using I/Os  I/O updates External Priority Search Tree u

Lars Arge I/O-algorithms 36 Dynamic Base Tree Deletion: –Delete point as previously –Delete x-coordinate from base tree using global rebuilding  I/Os amortized Insertion: –Insert x-coordinate in base tree and rebalance (using splits) –Insert point as previously Split: Boundary in v becomes boundary in parent(v) v v’’ v’

Lars Arge I/O-algorithms 37 Dynamic Base Tree Split: When v splits B new points needed in parent(v) One point obtained from v’ (v’’) using “bubble-up” operation: –Find top point p in v’ –Insert p in B 2 -structure –Remove p from B 2 -structure of v’ –Recursively bubble-up point to v’ Bubble-up in I/Os –Follow one path from v to leaf –Uses O(1) I/O in each node  Split in I/Os v’’ v’

Lars Arge I/O-algorithms 38 Dynamic Base Tree O(1) amortized split cost: –Cost: O(w(v)) –Weight balanced base tree: inserts below v between splits  External Priority Search Tree –Space: O(N/B) –Query: –Updates: I/Os amortized Amortization can be removed from update bound in several ways –Utilizing lazy rebuilding v’’ v’

Lars Arge I/O-algorithms 39 Summary/Conclusion: Priority Search Tree We have now discussed structures for special cases of two- dimensional range searching –Space: O(N/B) –Query: –Updates: Cannot be obtained for general (4-sided) 2d range searching: – query requires space – space requires query q3q3 q2q2 q1q1 q q q3q3 q2q2 q1q1 q4q4

Lars Arge I/O-algorithms 40 References External Memory Geometric Data Structures Lecture notes by Lars Arge. –Section 6-7