I/O-Algorithms Lars Arge Aarhus University March 9, 2006.

Presentation transcript:


Lars Arge I/O-algorithms 2 I/O-Model
Parameters:
–N = # elements in problem instance
–B = # elements that fit in disk block
–M = # elements that fit in main memory
–K = output size in searching problem (denoted T on the following slides)
We often assume that M > B^2
I/O: Movement of block between memory and disk

Lars Arge I/O-algorithms 3 Fundamental Bounds
            Internal    External
Scanning:   N           N/B
Sorting:    N log N     (N/B) log_{M/B} (N/B)
Permuting:  N           min{N, (N/B) log_{M/B} (N/B)}
Searching:  log_2 N     log_B N
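
To get a feel for the gap between the internal and external bounds, the standard I/O-model formulas can be evaluated for concrete parameters. The sketch below is illustrative only: the function names are hypothetical, constant factors are ignored, and the formulas used are the usual textbook ones (scanning N/B, sorting (N/B) log_{M/B} (N/B), permuting min{N, sort}, searching log_B N).

```python
import math


def scan_ios(n, b):
    """Blocks touched when reading n elements with block size b."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """(N/B) * log_{M/B}(N/B), ignoring constant factors."""
    blocks = n / b
    # At least one pass over the data is always needed.
    return blocks * max(1.0, math.log(blocks, m / b))


def permute_ios(n, m, b):
    """min{N, sort(N)}: either move elements one by one or sort."""
    return min(n, sort_ios(n, m, b))


def search_ios(n, b):
    """log_B N: height of a tree with fan-out b over n elements."""
    return math.log(n, b)
```

With N = 10^9, M = 10^6 and B = 10^3, sorting needs on the order of 2 * 10^6 I/Os, whereas running an internal-memory algorithm that touches a random block per comparison step would need on the order of N log N I/Os.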

Lars Arge I/O-algorithms 4 Fundamental Data Structures
B-trees: Node degree Θ(B) ⇒ queries in O(log_B N + T/B)
–Rebalancing using split/fuse ⇒ updates in O(log_B N)
Weight-balanced B-trees: Weight rather than degree constraint ⇒ Ω(w(v)) updates below v between rebalancing operations on v
Persistent B-trees:
–Update in current version in O(log_B N)
–Search in all previous versions in O(log_B N + T/B)
Buffer trees
–Batching of operations to obtain O((1/B) log_{M/B} (N/B)) amortized bounds ⇒ construction algorithms

Lars Arge I/O-algorithms 5 Last time: Interval management
Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently
Static solution: Persistent B-tree
–Linear space and O(log_B N + T/B) query
Dynamic solution: External interval tree
–O(log_B N) update

Lars Arge I/O-algorithms 6 Internal Interval Tree
Base tree on endpoints – “slab” X_v associated with each node v
Interval stored in highest node v where it contains midpoint of X_v
Intervals I_v associated with v stored in
–Left slab list sorted by left endpoint (search tree)
–Right slab list sorted by right endpoint (search tree)
⇒ Linear space and O(log N) update

Lars Arge I/O-algorithms 7 Internal Interval Tree
Query with x on left side of midpoint of X_root
–Search left slab list left-right until finding non-stabbed interval
–Recurse in left child
⇒ O(log N + T) query bound
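
The internal interval tree and its stabbing query can be sketched as follows. This is a minimal static version under simplifying assumptions: the slab lists are plain sorted Python lists rather than search trees (so updates are not supported), intervals are closed `(lo, hi)` tuples, and the names `IntervalNode`, `build_interval_tree` and `stab` are made up for the illustration.

```python
class IntervalNode:
    def __init__(self, intervals):
        # Split value: median of all endpoints below this node.
        endpoints = sorted(p for iv in intervals for p in iv)
        self.mid = endpoints[len(endpoints) // 2]
        here = [iv for iv in intervals if iv[0] <= self.mid <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.mid]
        right = [iv for iv in intervals if iv[0] > self.mid]
        # Left slab list: increasing left endpoint.
        self.by_left = sorted(here, key=lambda iv: iv[0])
        # Right slab list: decreasing right endpoint.
        self.by_right = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None


def build_interval_tree(intervals):
    return IntervalNode(list(intervals)) if intervals else None


def stab(node, x):
    """Return all intervals containing x."""
    out = []
    while node is not None:
        if x <= node.mid:
            # Every interval here reaches mid >= x, so only the left
            # endpoint decides; stop at the first non-stabbed interval.
            for iv in node.by_left:
                if iv[0] > x:
                    break
                out.append(iv)
            node = node.left
        else:
            for iv in node.by_right:
                if iv[1] < x:
                    break
                out.append(iv)
            node = node.right
    return out
```

Each visited node costs time proportional to the number of intervals it reports plus one, which is where the O(log N + T) bound comes from when the tree is balanced.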

Lars Arge I/O-algorithms 8 Externalizing Interval Tree Natural idea: –Block tree –Use B-tree for slab lists Number of stabbed intervals in large slab list may be small (or zero) –We can be forced to do I/O in each of O(log N) nodes

Lars Arge I/O-algorithms 9 Externalizing Interval Tree
Idea:
–Decrease fan-out to Θ(√B) ⇒ height remains O(log_B N)
–Θ(√B) slabs define Θ(B) multislabs
–Interval stored in two slab lists (as before) and one multislab list
–Intervals in small multislab lists collected in underflow structure
–Query answered in v by looking at 2 slab lists and not O(log N) multislab

Lars Arge I/O-algorithms 10 External Interval Tree
Linear space, O(log_B N + T/B) query, O(log_B N) update
General solution techniques:
–Filtering: Charge part of query cost to output
–Bootstrapping:
*Use O(B^2) size structure in each internal node
*Constructed using persistence
*Dynamic using global rebuilding
–Weight-balanced B-tree: Split/fuse in amortized O(1)

Lars Arge I/O-algorithms 11 Last time: Three-Sided Range Queries
Interval management: “1.5 dimensional” search
More general 2d problem: Dynamic 3-sided range searching
–Maintain set of points in plane such that given query (q_1, q_2, q_3), all points (x,y) with q_1 ≤ x ≤ q_2 and y ≥ q_3 can be found efficiently
Linear space, O(log_B N + T/B) query static solution using persistence
–Dynamic: External priority search tree

Lars Arge I/O-algorithms 12 Internal Priority Search Tree
Base tree on x-coordinates with nodes augmented with points
Heap on y-coordinates
–Decreasing y values on root-leaf path
–(x,y) on path from root to leaf holding x
–If v holds point then parent(v) holds point
⇒ Linear space and O(log N) update

Lars Arge I/O-algorithms 13 Internal Priority Search Tree
Query with (q_1, q_2, q_3) starting at root v:
–Report point in v if satisfying query
–Visit both children of v if point reported
–Always visit child(s) of v on path(s) to q_1 and q_2
⇒ O(log N + T) query
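
A 3-sided query on a priority search tree can be sketched as below. This is a static variant chosen for brevity, not necessarily the exact layout on the slide: each node stores the highest remaining point and splits the rest by x, instead of augmenting a separate base tree on x-coordinates. The names `PSTNode`, `build_pst` and `three_sided` are hypothetical.

```python
class PSTNode:
    def __init__(self, pts):
        # pts is non-empty and sorted by x-coordinate.
        top = max(range(len(pts)), key=lambda i: pts[i][1])
        self.point = pts[top]                  # heap order on y
        rest = pts[:top] + pts[top + 1:]
        mid = len(rest) // 2
        # Left subtree holds x <= split, right subtree holds x >= split.
        self.split = rest[mid][0] if rest else None
        self.left = PSTNode(rest[:mid]) if mid > 0 else None
        self.right = PSTNode(rest[mid:]) if rest else None


def build_pst(points):
    return PSTNode(sorted(points)) if points else None


def three_sided(node, q1, q2, q3, out=None):
    """Report all points (x, y) with q1 <= x <= q2 and y >= q3."""
    if out is None:
        out = []
    # Heap order: if this point is below q3, so is the whole subtree.
    if node is None or node.point[1] < q3:
        return out
    x, _ = node.point
    if q1 <= x <= q2:
        out.append(node.point)
    if node.split is not None:
        if q1 <= node.split:
            three_sided(node.left, q1, q2, q3, out)
        if q2 >= node.split:
            three_sided(node.right, q1, q2, q3, out)
    return out
```

The search only descends past a node whose point has y >= q3, so nodes visited off the two search paths to q_1 and q_2 are paid for by reported points.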

Lars Arge I/O-algorithms 14 Externalizing Priority Search Tree
Natural idea: Block tree
Problem:
–O(log_B N) I/Os to follow paths to q_1 and q_2
–But O(T) I/Os may be used to visit other nodes (“overshooting”)
⇒ O(log_B N + T) query

Lars Arge I/O-algorithms 15 Externalizing Priority Search Tree
Solution idea:
–Store B points in each node ⇒
*O(B^2) points stored in each supernode
*B output points can pay for “overshooting”
–Bootstrapping:
*Store O(B^2) points in each supernode in static structure

Lars Arge I/O-algorithms 16 External Priority Search Tree
We have now discussed structures for special cases of two-dimensional range searching
–Space: O(N/B)
–Query: O(log_B N + T/B)
–Updates: O(log_B N)
Cannot be obtained for general (4-sided) 2d range searching:
–O(log_B N + T/B) query requires Ω((N/B) log_B N / log_B log_B N) space
–O(N/B) space requires Ω(√(N/B) + T/B) query

Lars Arge I/O-algorithms 17 External Range Tree
Base tree: Weight-balanced tree with branching parameter log_B N and leaf parameter B on x-coordinates ⇒ height O(log_B N / log_B log_B N)
Points below each node stored in 4 linear space secondary structures:
–“Right” priority search tree
–“Left” priority search tree
–B-tree on y-coordinates
–Interval (priority search) tree
⇒ O((N/B) log_B N / log_B log_B N) space

Lars Arge I/O-algorithms 18 External Range Tree
Secondary interval tree:
–Connect points in each slab in y-order
–Project obtained segments onto y-axis
–Intervals stored in priority search tree
*Interval augmented with pointer to corresponding points in y-coordinate B-tree in corresponding child node

Lars Arge I/O-algorithms 19 External Range Tree
Query with (q_1, q_2, q_3, q_4) answered in top node with q_1 and q_2 in different slabs v_1 and v_2
Points in slab v_1
–Found with 3-sided query in v_1 using right priority search tree
Points in slab v_2
–Found with 3-sided query in v_2 using left priority search tree
Points in slabs between v_1 and v_2
–Answer stabbing query with q_3 using interval tree ⇒ first point above q_3 in each of the slabs
–Find points using y-coordinate B-tree in slabs

Lars Arge I/O-algorithms 20 External Range Tree
Query analysis:
–O(log_B N) I/Os to find relevant node
–O(log_B N + T/B) I/Os to answer two 3-sided queries
–O(log_B N) I/Os to query interval tree
–O(log_B N + T/B) I/Os to traverse B-trees
⇒ O(log_B N + T/B) I/Os

Lars Arge I/O-algorithms 21 External Range Tree
Insert:
–Insert x-coordinate in weight-balanced B-tree
*Split of v can be performed in O(w(v) log_B w(v)) I/Os ⇒ O(log_B^2 N / log_B log_B N) I/Os amortized
–Update secondary structures in all nodes on one root-leaf path
*Update priority search trees
*Update interval tree
*Update B-tree
⇒ O(log_B^2 N / log_B log_B N) I/Os
Delete:
–Similar and using global rebuilding

Lars Arge I/O-algorithms 22 Summary: External Range Tree
2d range searching in O((N/B) log_B N / log_B log_B N) space
–O(log_B N + T/B) I/O query
–O(log_B^2 N / log_B log_B N) I/O update
Optimal among O(log_B N + T/B) query structures

Lars Arge I/O-algorithms 23 kdB-tree
kd-tree:
–Recursive subdivision of point-set into two halves using vertical/horizontal line
–Horizontal line on even levels, vertical on odd levels
–One point in each leaf
⇒ Linear space and logarithmic height

Lars Arge I/O-algorithms 24 kd-Tree: Query
Query
–Recursively visit nodes corresponding to regions intersecting query
–Report points in trees/nodes completely contained in query
Query analysis
–Horizontal line intersects Q(N) = 2 + 2Q(N/4) = O(√N) regions
–Query covers T regions
⇒ O(√N + T) I/Os worst-case
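
The kd-tree construction and query can be sketched in a few lines. This is an in-memory illustration with assumptions of its own: nodes are plain dictionaries, construction re-sorts at every level instead of using linear-time median finding, and the query always descends to the leaves rather than reporting fully contained subtrees wholesale. The names `build_kd` and `range_query` are hypothetical.

```python
def build_kd(points, depth=0):
    """Split by y (horizontal line) on even levels, by x on odd levels."""
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0]}            # one point per leaf
    axis = 1 if depth % 2 == 0 else 0
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],               # lo side <= split <= hi side
        "lo": build_kd(pts[:mid], depth + 1),
        "hi": build_kd(pts[mid:], depth + 1),
    }


def range_query(node, x1, x2, y1, y2, out=None):
    """Report all points with x1 <= x <= x2 and y1 <= y <= y2."""
    if out is None:
        out = []
    if node is None:
        return out
    if "point" in node:
        x, y = node["point"]
        if x1 <= x <= x2 and y1 <= y <= y2:
            out.append(node["point"])
        return out
    q_lo, q_hi = (x1, x2) if node["axis"] == 0 else (y1, y2)
    # Visit only children whose region can intersect the query.
    if q_lo <= node["split"]:
        range_query(node["lo"], x1, x2, y1, y2, out)
    if q_hi >= node["split"]:
        range_query(node["hi"], x1, x2, y1, y2, out)
    return out
```

The recurrence on the slide counts how many regions a single axis-parallel line can cross: after two levels the line meets at most two of the four grandchildren, each holding N/4 points, which gives Q(N) = 2 + 2Q(N/4) = O(√N).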

Lars Arge I/O-algorithms 25 kdB-tree
kdB-tree:
–Stop subdivision when leaf contains between B/2 and B points
–BFS-blocking of internal nodes
Query as before
–Analysis as before but each region now contains Θ(B) points
⇒ O(√(N/B) + T/B) I/O query

Lars Arge I/O-algorithms 26 Construction of kdB-tree
Simple algorithm
–Find median of y-coordinates (construct root)
–Distribute points based on median
–Recursively build subtrees
–Construct BFS-blocking top-down
Idea in improved algorithm
–Construct Θ(log (M/B)) levels at a time using O(N/B) I/Os

Lars Arge I/O-algorithms 27 Construction of kdB-tree
Sort N points by x- and by y-coordinates using O((N/B) log_{M/B} (N/B)) I/Os
Building Θ(log (M/B)) levels (Θ(√(M/B)) nodes) in O(N/B) I/Os:
1. Construct √(M/B) by √(M/B) grid with N/√(M/B) points in each slab
2. Count number of points in each grid cell and store in memory
3. Find slab s with median x-coordinate
4. Scan slab s to find median x-coordinate and construct node
5. Split slab containing median x-coordinate and update counts
6. Recurse on each side of median x-coordinate using grid (step 3)
⇒ Grid grows to O(M/B) cells during algorithm
⇒ Each node constructed in O(N/(B·√(M/B))) I/Os

Lars Arge I/O-algorithms 28 kdB-tree
kdB-tree:
–Linear space
–Query in O(√(N/B) + T/B) I/Os
–Construction in O((N/B) log_{M/B} (N/B)) I/Os
–Point search in O(log_B N) I/Os
Dynamic?
–Deletions relatively easy (partial rebuilding)

Lars Arge I/O-algorithms 29 kdB-tree Insertion using Logarithmic Method
Partition pointset S into subsets S_0, S_1, …, S_log N, |S_i| = 2^i or |S_i| = 0
Build kdB-tree D_i on S_i
Query: Query each D_i ⇒ O(√(N/B) + T/B) I/Os
Insert: Find first empty D_i and construct D_i out of elements in S_0, S_1, …, S_i-1
–O((|S_i|/B) log_{M/B} (|S_i|/B)) I/Os ⇒ O((1/B) log_{M/B} (N/B)) per moved point
–Point moved O(log N) times ⇒ O((1/B) log_{M/B} (N/B) · log N) = O(log_B^2 N) I/Os amortized
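
The logarithmic method itself is independent of the static structure it wraps. The sketch below shows the mechanics with a deliberately simple stand-in: each D_i is a sorted Python list answering 1d range queries, not a kdB-tree. The class name `LogMethod` and its methods are hypothetical.

```python
import bisect


class LogMethod:
    """Insert-only structure built from static pieces of size 2**i."""

    def __init__(self):
        # levels[i] is either empty or a sorted list of exactly 2**i keys.
        self.levels = []

    def insert(self, key):
        carry = [key]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            if not self.levels[i]:
                # First empty level: rebuild it from the new key plus
                # everything collected from the smaller levels.
                self.levels[i] = sorted(carry)
                return
            carry.extend(self.levels[i])
            self.levels[i] = []
            i += 1

    def range_query(self, lo, hi):
        """Query every non-empty level and concatenate the answers."""
        out = []
        for level in self.levels:
            a = bisect.bisect_left(level, lo)
            b = bisect.bisect_right(level, hi)
            out.extend(level[a:b])
        return out
```

The level sizes behave like the bits of a binary counter: after 13 insertions the occupied levels have sizes 1, 4 and 8. Each key takes part in at most one rebuild per level, which is the source of the O(log N) factor in the amortized insertion bound.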

Lars Arge I/O-algorithms 30 kdB-tree Insertion and Deletion
Insert: Use logarithmic method ignoring deletes
Delete: Simply delete point p from relevant D_i
–i can be calculated based on # insertions since p was inserted
–# insertions calculated by storing insertion number of each point in separate B-tree ⇒ O(log_B N) extra update cost
To maintain O(log N) structures D_i
–Perform global rebuild after every Θ(N) updates ⇒ extra update cost

Lars Arge I/O-algorithms 31 Summary: kdB-tree
2d range searching in O(N/B) space
–Query in O(√(N/B) + T/B) I/Os
–Construction in O((N/B) log_{M/B} (N/B)) I/Os
–Updates in O((1/B) log_{M/B} (N/B) · log N) = O(log_B^2 N) I/Os
Optimal query among linear space structures

Lars Arge I/O-algorithms 32 O-Tree Structure
O-tree:
–B-tree on Θ(√(N/B) / log_B N) vertical slabs
–B-tree on Θ(√(N/B) / log_B N) horizontal slabs in each vertical slab
–kdB-tree on Θ(B log_B^2 N) points in each leaf

Lars Arge I/O-algorithms 33 O-Tree Query
Perform range search with q_1 and q_2 in vertical B-tree
–Query all kdB-trees in leaves of two horizontal B-trees with x-interval intersected but not spanned by query
–Perform range search with q_3 and q_4 in horizontal B-trees with x-interval spanned by query
*Query all kdB-trees with range intersected by query

Lars Arge I/O-algorithms 34 O-Tree Query Analysis
Vertical B-tree query: O(log_B N)
Query of all kdB-trees in leaves of two horizontal B-trees: O(√(N/B) + T/B)
Query horizontal B-trees: O(√(N/B))
Query kdB-trees not completely in query: O(√(N/B) + T/B)
Query in kdB-trees completely contained in query: O(T/B)
⇒ O(√(N/B) + T/B) I/Os

Lars Arge I/O-algorithms 35 O-Tree Update
Insert:
–Search in vertical B-tree: O(log_B N) I/Os
–Search in horizontal B-tree: O(log_B N) I/Os
–Insert in kdB-tree: O(log_B N) I/Os
Use global rebuilding when structures grow too big/small
–B-trees do not contain Θ(√(N/B) / log_B N) elements
–kdB-trees do not contain Θ(B log_B^2 N) elements
⇒ O(log_B N) I/Os
Deletes can be handled in O(log_B N) I/Os similarly

Lars Arge I/O-algorithms 36 Summary: O-Tree
2d range searching in linear space
–O(√(N/B) + T/B) I/O query
–O(log_B N) I/O update
Optimal among structures using linear space
Can be extended to work in d dimensions with optimal query bound O((N/B)^(1-1/d) + T/B)

Lars Arge I/O-algorithms 37 Summary/Conclusion: 3 and 4-sided Queries
3-sided 2d range searching: External priority search tree
–O(log_B N + T/B) query, O(N/B) space, O(log_B N) update
General (4-sided) 2d range searching:
–External range tree: O(log_B N + T/B) query, O((N/B) log_B N / log_B log_B N) space, O(log_B^2 N / log_B log_B N) update
–O-tree: O(√(N/B) + T/B) query, O(N/B) space, O(log_B N) update

Lars Arge I/O-algorithms 38 Summary/Conclusion: Tools and Techniques
Tools:
–B-trees
–Persistent B-trees
–Buffer trees
–Logarithmic method
–Weight-balanced B-trees
–Global rebuilding
Techniques:
–Bootstrapping
–Filtering

Lars Arge I/O-algorithms 39 Other results
Many other results for e.g.
–Higher dimensional range searching
–Range counting, range/stabbing max, and stabbing queries
–Halfspace range searching (and other special cases of range searching)
–Queries on moving objects
–Proximity queries (closest pair, nearest neighbor, point location)
–Structures for objects other than points (bounding rectangles)
Many heuristic structures in database community
Implementation efforts:
–TPIE (Duke/Aarhus –
–STXXL (Karlsruhe)
–LEDA-SM (MPI)

Lars Arge I/O-algorithms 40 Point Enclosure Queries
Dual of planar range searching problem
–Report all rectangles containing query point (x,y)
Internal memory:
–Can be solved in O(N) space and O(log N + T) time

Lars Arge I/O-algorithms 41 Point Enclosure Queries
Similarity between internal and external results (space, query)
–In general tradeoff between space and query I/O
                          Internal                          External
1d range search           (N, log N + T)                    (N/B, log_B N + T/B)
3-sided 2d range search   (N, log N + T)                    (N/B, log_B N + T/B)
2d range search           (N log N / log log N, log N + T)  ((N/B) log_B N / log_B log_B N, log_B N + T/B)
2d point enclosure        (N, log N + T)                    (N/B, log_B N + T/B)?
                                                            (N/B, log_B^2 N + T/B)
                                                            (N/B^(1-ε), log_B N + T/B)

Lars Arge I/O-algorithms 42 Rectangle Range Searching Report all rectangles intersecting query rectangle Q Often used in practice when handling complex geometric objects Q

Lars Arge I/O-algorithms 43 R-Tree Most common practical rectangle range searching structure [G84] –Rectangles in leaves (in any order) –Hierarchy of bounding rectangles in internal nodes –Query by recursively visiting relevant nodes (as in kdB-tree)

Lars Arge I/O-algorithms 44 R-Tree
R-tree is simple and space efficient (and multi purpose)
–But in worst case query can take Ω(N/B) I/Os!
Many R-tree construction/maintenance heuristics (leaf orderings) have been proposed, e.g.
–R+-tree, R*-tree, Hilbert, TGS
Only PR-tree beats Ω(N/B) query bound
–O(√(N/B) + T/B) query
–Optimal for R-trees (and in other models)

Lars Arge I/O-algorithms 45 References External Memory Geometric Data Structures Lecture notes by Lars Arge. –Section 8+9

Lars Arge I/O-algorithms 46 Geometric Algorithms
We will now (briefly) look at geometric algorithms
–Solve problem on set of objects
Example: Orthogonal line segment intersection
–Given set of axis-parallel line segments, report all intersections
In internal memory many problems are solved using sweeping

Lars Arge I/O-algorithms 47 Plane Sweeping Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates) –Top endpoint of vertical segment: Insert in T –Bottom endpoint of vertical segment: Delete from T –Horizontal segment: Perform range query with x-interval on T
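
The sweep for orthogonal line segment intersection can be sketched as follows. The representation is an assumption made for the example: vertical segments are `(x, y_lo, y_hi)` tuples, horizontal segments are `(x_lo, x_hi, y)` tuples, and segments are closed. The structure T is a sorted Python list maintained with `bisect`, which keeps the code short but makes each insert and delete linear time; a balanced search tree would be needed for the O(N log N + T) bound. The function name is hypothetical.

```python
import bisect

INSERT, QUERY, DELETE = 0, 1, 2


def orthogonal_intersections(verticals, horizontals):
    """Return (vertical index, horizontal index) for every intersection."""
    events = []
    for i, (x, y_lo, y_hi) in enumerate(verticals):
        events.append((-y_hi, INSERT, i))      # top endpoint
        events.append((-y_lo, DELETE, i))      # bottom endpoint
    for j, (x_lo, x_hi, y) in enumerate(horizontals):
        events.append((-y, QUERY, j))
    # Sweep top-down; at equal height insert before query before delete,
    # so that touching segments count as intersecting.
    events.sort()

    active = []                                # sorted (x, vertical index)
    out = []
    for _, kind, idx in events:
        if kind == INSERT:
            bisect.insort(active, (verticals[idx][0], idx))
        elif kind == DELETE:
            pos = bisect.bisect_left(active, (verticals[idx][0], idx))
            del active[pos]
        else:
            x_lo, x_hi, _ = horizontals[idx]
            a = bisect.bisect_left(active, (x_lo, -1))
            b = bisect.bisect_right(active, (x_hi, len(verticals)))
            out.extend((v, idx) for _, v in active[a:b])
    return out
```

Every vertical segment in the active list crosses the sweep line, so a horizontal segment only has to select the active segments whose x-coordinate lies in its x-interval.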

Lars Arge I/O-algorithms 48 Plane Sweeping
In internal memory algorithm runs in optimal O(N log N + T) time
In external memory algorithm performs badly (>N I/Os) if |T| > M
–Even if we implement T as B-tree ⇒ O(N log_B N + T/B) I/Os
Solution: Distribution sweeping

Lars Arge I/O-algorithms 49 Distribution Sweeping Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each Sweep plane top-down while reporting intersections between –part of horizontal segment spanning slab(s) and vertical segments Distribute data to M/B-1 slabs –vertical segments and non-spanning parts of horizontal segments Recurse in each slab

Lars Arge I/O-algorithms 50 Distribution Sweeping
Sweep performed in O(N/B + T’/B) I/Os ⇒ O((N/B) log_{M/B} (N/B) + T/B) I/Os
Maintain active list of vertical segments for each slab (<B in memory)
–Top endpoint of vertical segment: Insert in active list
–Horizontal segment: Scan through all relevant active lists
*Removing “expired” vertical segments
*Reporting intersections with “non-expired” vertical segments
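
The control flow of distribution sweeping can be illustrated in memory. The sketch below is a simulation under stated assumptions, not an external-memory implementation: lists stand in for disk-resident streams, `fanout` plays the role of M/B, slab boundaries are chosen from the x-coordinates of the vertical segments only, and horizontal segments are passed unclipped to the (at most two) slabs holding their endpoints. Segments use the same hypothetical tuple layout as before, `(x, y_lo, y_hi)` for vertical and `(x_lo, x_hi, y)` for horizontal, and all names are made up for the example.

```python
import bisect


def brute_force(verticals, horizontals):
    """Base case: check every pair directly."""
    return [(v, h) for v in verticals for h in horizontals
            if h[0] <= v[0] <= h[1] and v[1] <= h[2] <= v[2]]


def distribution_sweep(verticals, horizontals, fanout=4, base=8):
    """Return (vertical, horizontal) pairs of intersecting segments."""
    xs = sorted({v[0] for v in verticals})
    if len(verticals) + len(horizontals) <= base or len(xs) < 2:
        return brute_force(verticals, horizontals)

    # Cut the x-axis into k slabs; slab s covers [cuts[s-1], cuts[s]).
    k = min(fanout, len(xs))
    cuts = [xs[i * len(xs) // k] for i in range(1, k)]

    sub_v = [[] for _ in range(k)]             # verticals per slab
    sub_h = [[] for _ in range(k)]             # horizontal end pieces
    active = [[] for _ in range(k)]            # active list per slab
    out = []

    # Sweep top-down; vertical tops come before horizontals of equal y.
    events = [(-v[2], 0, v) for v in verticals]
    events += [(-h[2], 1, h) for h in horizontals]
    events.sort()
    for _, kind, seg in events:
        if kind == 0:
            s = bisect.bisect_right(cuts, seg[0])
            sub_v[s].append(seg)
            active[s].append(seg)
        else:
            lo = bisect.bisect_right(cuts, seg[0])
            hi = bisect.bisect_right(cuts, seg[1])
            # Slabs strictly between lo and hi are completely spanned.
            for s in range(lo + 1, hi):
                # Drop expired verticals, report the remaining ones.
                active[s] = [v for v in active[s] if v[1] <= seg[2]]
                out.extend((v, seg) for v in active[s])
            sub_h[lo].append(seg)
            if hi != lo:
                sub_h[hi].append(seg)

    for s in range(k):
        out.extend(distribution_sweep(sub_v[s], sub_h[s], fanout, base))
    return out
```

A vertical segment is removed from an active list only when a scan finds it expired, and every other segment touched by that scan is reported, so the scanning cost is charged to the output. Each pair is found exactly once: either at the level where the horizontal segment spans the vertical segment's slab, or in the base case.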

Lars Arge I/O-algorithms 51 Distribution Sweeping Other example: Rectangle intersection –Given set of axis-parallel rectangles, report all intersections.

Lars Arge I/O-algorithms 52 Distribution Sweeping
Divide plane into M/B-1 slabs with O(N/(M/B)) endpoints each
Sweep plane top-down while reporting intersections between
–part of rectangles spanning slab(s) and other rectangles
Distribute data to M/B-1 slabs
–Non-spanning parts of rectangles
Recurse in each slab

Lars Arge I/O-algorithms 53 Distribution Sweeping
Seems hard to perform sweep in O(N/B + T’/B) I/Os
Solution: Multislabs
–Reduce fanout of distribution to Θ(√(M/B))
–Recursion height still O(log_{M/B} (N/B))
–Room for block from each multislab (active list) in memory

Lars Arge I/O-algorithms 54 Distribution Sweeping
Sweep while maintaining rectangle active list for each multislab
–Top side of spanning rectangle: Insert in active multislab list
–Each rectangle: Scan through all relevant multislab lists
*Removing “expired” rectangles
*Reporting intersections with “non-expired” rectangles
⇒ O((N/B) log_{M/B} (N/B) + T/B) I/Os

Lars Arge I/O-algorithms 55 Distribution Sweeping
Distribution sweeping can relatively easily be used to solve a number of other problems in the plane I/O-efficiently
By decreasing distribution fanout to (M/B)^(1/c) for c ≥ 1, a number of higher-dimensional problems can also be solved I/O-efficiently

Lars Arge I/O-algorithms 56 Other Results
Other geometric algorithms results include:
–Red-blue line segment intersection (using distribution sweep, buffer trees/batched filtering, external fractional cascading)
–General planar line segment intersection (as above and external priority queue)
–2d and 3d convex hull (complicated deterministic 3d and simpler randomized)
–2d Delaunay triangulation