The Buffer Tree
Lars Arge
Presented by Or Ozery



I/O Model
Previously defined:
- N = # of elements in the input
- M = # of elements that fit into memory
- B = # of elements per block
Measuring in terms of # of blocks:
- n = N / B
- m = M / B
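In code, the block-measured sizes are just the element counts divided by the block size. A minimal sketch (the concrete values of N, M and B below are made up for illustration):

```python
# Toy illustration of the I/O-model parameters defined above.
from math import ceil

N = 1_000_000_000   # elements in the input
M = 64_000_000      # elements that fit into memory
B = 4_000           # elements per block

n = ceil(N / B)     # input size measured in blocks
m = M // B          # memory size measured in blocks

print(n, m)         # 250000 16000
```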

I/O Model vs. RAM Model

Operation               RAM Model       I/O Model
Scanning                Θ(N)            Θ(n)
List merging            Θ(N)            Θ(n)
Sorting                 Θ(N log_2 N)    Θ(n log_m n)
Searching               Θ(log_2 N)      Θ(log_B N)
Sorting using a B-tree  Θ(N log_2 N)    Θ(N log_B N)
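To get a feel for the gap between the last two rows, one can plug toy values into the two bounds. A rough sketch that ignores all constant factors (the values of N, M and B are made up):

```python
# Compare the I/O-model sorting bound n*log_m(n) with the cost of
# sorting by N insertions into a B-tree, N*log_B(N), for toy parameters.
from math import ceil, log

N = 1_000_000_000
M = 64_000_000
B = 4_000
n, m = ceil(N / B), M // B

sort_bound = n * log(n, m)      # optimal external sorting, up to constants
btree_sort = N * log(N, B)      # N B-tree insertions, up to constants

print(round(sort_bound))        # ≈ 3.2e5 block I/Os
print(round(btree_sort))        # ≈ 2.5e9 block I/Os
print(btree_sort / sort_bound)  # several thousand times more I/Os
```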

Online vs. Batched
Online problems:
- A single command is given each time.
- Must be processed before other commands are given.
- Should be performed in good worst-case time.
- Example: searching.
Batched problems:
- A stream of commands is given.
- Commands can be performed in any legal order.
- Should be performed in good amortized time.
- Example: sorting.

Motivation
We've seen that using an online-efficient data structure (a B-tree) for a batched problem (sorting) is inefficient. We thus would like to design a data structure for efficient use on batched problems, such as:
- Sorting
- Minimum reporting (priority queue)
- Range searching
- Interval stabbing

The Main Idea
There are two reasons why B-tree sorting is inefficient:
- We work element-wise instead of block-wise.
- We don't take advantage of the memory size m.
We can fix both problems by using buffers:
- Buffers let us accumulate elements into full blocks.
- Using buffers of size Θ(m), we fully utilize the memory.

The Buffer Tree
- An (m/4, m)-tree ⇒ branching factor Θ(m).
- Elements are stored in leaves, in blocks ⇒ O(n) leaves.
- Each internal node has a buffer of size m.

Basic Properties
- The height of the tree is O(log_m n).
- The number of internal nodes is O(n/m).
From now on define:
- Leaf nodes: nodes whose children are leaves.
- Internal nodes: nodes that are not leaf nodes.
The buffer tree uses linear space:
- Each leaf takes O(1) space ⇒ O(n) space in total.
- Each node's buffer takes O(m) space ⇒ O(n) space in total.

Processing Commands
- We wait until we have a full block of commands, then insert it into the buffer of the root.
- Because we process commands lazily, we need to time-stamp them.
- When the buffer of the root gets full, we empty it using a buffer-emptying process (BEP):
  - We distribute the elements to the buffers one level down.
  - If any of the child buffers gets full, we continue recursively.

Internal Node BEP
1. Sort the elements in the buffer, cancelling matching insert and delete elements.
2. Scan through the sorted buffer and distribute the elements to the appropriate buffers one level down.
3. If any of the child buffers is now full, run the appropriate BEP recursively.
An internal-node BEP takes O(x + m) I/Os, where x is the number of blocks in the buffer.
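The steps above can be sketched in code. This is a purely in-memory toy: the names (Node, empty_buffer, BUFFER_CAP) are invented, and a real buffer tree moves whole blocks on disk, which the sketch ignores.

```python
# In-memory sketch of the internal-node buffer-emptying process (BEP).
from bisect import bisect_right

BUFFER_CAP = 8  # stand-in for the m-block buffer capacity

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # routing keys separating the children
        self.children = children or []  # child Nodes (empty at the leaf level)
        self.buffer = []                # (key, timestamp, op), op in {'i', 'd'}
        self.leaves = []                # sorted keys; used only at the leaf level

def empty_buffer(node):
    # 1. Sort by key then timestamp, cancelling insert/delete pairs.
    node.buffer.sort()
    ops = []
    for key, ts, op in node.buffer:
        if ops and ops[-1][0] == key and ops[-1][2] == 'i' and op == 'd':
            ops.pop()                   # a later delete cancels the insert
        else:
            ops.append((key, ts, op))
    node.buffer = []
    if not node.children:               # leaf level: apply the ops directly
        for key, _, op in ops:
            if op == 'i':
                node.leaves.insert(bisect_right(node.leaves, key), key)
            else:
                node.leaves.remove(key)
        return
    # 2. Distribute each element to the child whose range contains its key.
    for key, ts, op in ops:
        node.children[bisect_right(node.keys, key)].buffer.append((key, ts, op))
    # 3. Recurse into children whose buffers overflowed.
    for child in node.children:
        if len(child.buffer) >= BUFFER_CAP:
            empty_buffer(child)

# Tiny usage example: a root with two leaf-level children split at key 10.
root = Node(keys=[10], children=[Node(), Node()])
for ts, (key, op) in enumerate([(3, 'i'), (12, 'i'), (7, 'i'), (3, 'd'), (15, 'i')]):
    root.buffer.append((key, ts, op))
empty_buffer(root)                      # the (3,'i')/(3,'d') pair cancels
empty_buffer(root.children[0])          # force-empty, as in a final flush
empty_buffer(root.children[1])
print(root.children[0].leaves, root.children[1].leaves)  # [7] [12, 15]
```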

Leaf Node BEP
1. Sort the elements in the buffer as for internal nodes.
2. Merge the sorted buffer with the leaves of the node.
3. If the number of leaves increased:
   1. Place the smallest elements in the leaves of the node.
   2. Repeatedly insert one block of the remaining elements and rebalance.
4. If the number of leaves decreased:
   1. Place the elements in sorted order in the leaves, and append "dummy blocks" at the end.
   2. Repeatedly delete one dummy block and rebalance.

Rebalancing - Fission (figure slide)

Rebalancing - Fusion (figure slide)

Rebalancing Cost
- Rebalancing starts when a block is inserted or deleted at the leaves.
- The leaf node that sparked the rebalancing will not cause rebalancing again for the next O(m) block inserts/deletes.
- Thus the total number of rebalancing operations started at leaf nodes is O(n/m).
- Each such operation can span O(log_m n) rebalancing operations up the tree.
- So there are O((n/m) log_m n) rebalancing operations, each costing O(m) ⇒ rebalancing takes O(n log_m n) in total.

Summing Up
- We've seen that rebalancing takes O(n log_m n).
- BEP cost: emptying a full buffer is linear in the number of blocks in the buffer ⇒ each element pays O(1/B) to be pushed one level down the tree.
- Because there are O(log_m n) levels in the tree, each element pays O(log_m n / B) ⇒ all BEPs together take O(n log_m n).
- Therefore, a sequence of N operations on an empty buffer tree takes O(n log_m n) I/Os.

Sorting
- After inserting all N items into the tree, we need to empty all the buffers. We do this in BFS order.
- How much does emptying all buffers cost? Emptying a non-full buffer takes O(m) amortized, and there are O(n/m) buffers ⇒ total cost O(n).
- Thus sorting using a buffer tree takes O(n log_m n) I/Os.

Priority Queue
- We can easily transform our buffer tree into a PQ by adding support for a delete-min operation:
- The smallest element is found on the path from the root to the leftmost leaf.
- Therefore a delete-min operation will empty all the buffers on that path, at a cost of O(m log_m n).
- To make up for that cost, we delete the M/4 smallest elements and keep them in memory; this way we can answer the next M/4 delete-mins for free.
- Thus our PQ supports a sequence of N operations in O(n log_m n) I/Os.
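The batching trick can be illustrated with a small in-memory stand-in. Everything here is invented for illustration: the class name, the REFILL constant playing the role of M/4, and a plain list playing the role of disk; distinct keys are assumed.

```python
# Toy analogue of the buffer-tree PQ's amortization: pay once to fetch
# the REFILL smallest elements, then answer the next delete-mins for free.
import heapq
from math import inf

class RefillPQ:
    REFILL = 4   # stand-in for M/4

    def __init__(self):
        self.disk = []     # "external" elements, all >= self.bound
        self.cache = []    # in-memory min-heap, all elements < self.bound
        self.bound = inf

    def insert(self, x):
        if x >= self.bound:
            self.disk.append(x)            # cheap lazy insert
            return
        heapq.heappush(self.cache, x)      # must compete with cached minima
        if len(self.cache) > 2 * self.REFILL:
            # Memory full: keep only the smallest REFILL elements cached.
            kept = sorted(self.cache)
            self.cache, spill = kept[:self.REFILL], kept[self.REFILL:]
            self.disk += spill
            self.bound = spill[0]          # distinct keys assumed

    def delete_min(self):
        if not self.cache:
            # Expensive refill, standing in for emptying the buffers on the
            # root-to-leftmost-leaf path at cost O(m log_m n).
            self.disk.sort()
            self.cache = self.disk[:self.REFILL]
            self.disk = self.disk[self.REFILL:]
            heapq.heapify(self.cache)
            self.bound = self.disk[0] if self.disk else inf
        return heapq.heappop(self.cache)

pq = RefillPQ()
for v in [5, 17, 3, 11, 8, 14, 2, 20, 9, 6, 1, 13, 4, 19, 7, 16, 10, 18, 12, 15]:
    pq.insert(v)
print([pq.delete_min() for _ in range(20)])  # 1 .. 20 in order
```

The invariant is that everything on "disk" is at least self.bound, so as long as the cache is non-empty its minimum is the global minimum.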

Time-Forward Processing
The problem:
- We are given a topologically ordered DAG.
- For each vertex v there is a function f_v which depends on f_u for every predecessor u of v.
- The goal is to compute f_v for all v.

TFP Using Our PQ
For each vertex v (in topological order):
1. Extract the minimum d^-(v) elements from the PQ, where d^-(v) is the in-degree of v.
2. Use the extracted elements to compute f_v.
3. For each edge (v, u), insert f_v into the PQ with priority u.
The above works in O(n log_m n) I/Os.
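The same procedure can be sketched with an ordinary binary heap standing in for the buffer-tree PQ. The DAG and the concrete function f_v = 1 + sum of the predecessors' values are made-up examples:

```python
# Time-forward processing: values travel "forward in time" through a PQ
# keyed by the receiving vertex.
import heapq

def time_forward(n_vertices, edges):
    """Vertices are 0..n_vertices-1, already in topological order;
    edges is a list of (v, u) pairs with v before u in that order."""
    out = {v: [u for (x, u) in edges if x == v] for v in range(n_vertices)}
    indeg = [0] * n_vertices
    for _, u in edges:
        indeg[u] += 1
    pq, f = [], [0] * n_vertices
    for v in range(n_vertices):
        # Extract exactly d^-(v) entries: they all carry priority v, since
        # entries for earlier vertices have already been extracted.
        incoming = [heapq.heappop(pq)[1] for _ in range(indeg[v])]
        f[v] = 1 + sum(incoming)          # made-up example function
        for u in out[v]:
            heapq.heappush(pq, (u, f[v])) # priority = receiving vertex
    return f

print(time_forward(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [1, 2, 2, 5]
```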

Buffered Range Tree
We want to extend our tree to support range queries: given an interval [x_1, x_2], report all elements of the tree that are contained in it.
How do we distribute the query elements when emptying a buffer?
- As long as the interval is contained in a single sub-tree, send the query element to the root buffer of that sub-tree.
- Otherwise, we split the query into its two endpoint query elements, and report the elements in the relevant sub-trees.

Time Order Representation
We say that a list of elements is in time order representation (TOR) if it is of the form D-S-I, where:
- D is a sorted list of delete elements.
- S is a sorted list of query elements.
- I is a sorted list of insert elements.
Lemma 1: A non-full buffer can be brought into TOR in O(m + r), where r·B is the number of query results reported in the process.

Merging of TOR Lists
Lemma 2: Let S_1 and S_2 be TOR lists such that all elements of S_2 are older than the elements of S_1. Then S_1 and S_2 can be merged into one TOR list in O(s_1 + s_2 + r), where s_1 and s_2 are the sizes in blocks of S_1 and S_2, and r·B is the number of query results reported in the process.

Proof of Lemma 2
Let S_j = d_j - s_j - i_j. The merge proceeds by commuting adjacent sublists across the time boundary:
  d_2 s_2 i_2 | d_1 s_1 i_1
  d_2 s_2 d_1 i_2 s_1 i_1    (d_1 moved past i_2: newer deletes cancel matching older inserts)
  d_2 d_1 s_2 i_2 s_1 i_1    (d_1 moved past s_2)
  d_2 d_1 s_2 s_1 i_2 i_1    (s_1 moved past i_2: queries report matching older inserts)
  d - s - i                  (merge the adjacent d, s and i pairs)
Each step is a linear scan, giving O(s_1 + s_2 + r) in total.

Full Sub-Tree Reporting
Lemma 3: All buffers of a sub-tree with x leaves can be emptied and collected into one TOR list in O(x + r).
Proof:
1. For each level, prepare a TOR list of its elements.
2. Merge the TOR lists of all levels.
(The slide illustrates the state after each of the two steps.)

Internal Node BEP
1. Compute the TOR of the buffer.
2. Scan the delete elements and distribute them.
3. Scan the range-search elements and determine which sub-trees should have their elements reported.
4. For each such sub-tree:
   1. Remove the delete elements distributed in step 2 and store them in a temporary place.
   2. Collect the elements of the sub-tree into a TOR list.
   3. Merge this TOR with the TOR of the removed delete elements.
   4. Distribute the insert and delete elements to leaf buffers.
   5. Merge a copy of the leaves with the TOR.
   6. Remove the range-search elements from the TOR.
   7. Report the resulting elements to the queries that need them.
5. Distribute the range-search elements.
6. Distribute the insert elements (if the sub-tree was emptied, to leaf buffers).
7. If any child buffer got full, apply the BEP recursively.

Leaf Node BEP
1. Construct the TOR of the elements in the buffer.
2. Merge the TOR with the leaves.
3. Remove all range-search elements and continue the BEP as in the normal buffer tree.

Analysis
- The main difference from the normal buffer tree is the action of reporting all elements of a sub-tree.
- By Lemma 3, this action has a linear cost, which we can split between the delete elements and the query elements, since each element involved is either deleted or reported.
- Thus, a sequence of N operations on our buffered range tree costs O(n log_m n + r).

Orthogonal Line Intersection
The problem: given N line segments parallel to the axes, report all intersections between pairs of orthogonal segments.

OLI Using Our Range Tree
1. Sort the segments once by their top y-coordinate, and once by their bottom y-coordinate.
2. Merge the two sorted lists of segments:
   1. When encountering the top coordinate of a vertical segment, insert its x-coordinate into the tree.
   2. When encountering the bottom coordinate of a vertical segment, delete its x-coordinate from the tree.
   3. When encountering a horizontal segment, insert a range query for its endpoints.
The above takes an optimal O(n log_m n + r).
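The same sweep can be sketched in memory, with a sorted list of active x-coordinates standing in for the buffered range tree. All names are invented, and coordinates are assumed distinct so that event ordering at equal y-values does not matter:

```python
# In-memory sweep analogue of the OLI algorithm above.
from bisect import bisect_left, bisect_right, insort

def orthogonal_intersections(vertical, horizontal):
    """vertical: list of (x, y_bottom, y_top); horizontal: list of (x1, x2, y).
    Returns the (x, y) intersection points."""
    events = []
    for x, yb, yt in vertical:
        events.append((yt, 0, x))        # top endpoint: activate x
        events.append((yb, 2, x))        # bottom endpoint: deactivate x
    for x1, x2, y in horizontal:
        events.append((y, 1, (x1, x2)))  # range query over [x1, x2]
    events.sort(reverse=True)            # sweep from top to bottom
    active, out = [], []
    for y, kind, data in events:
        if kind == 0:
            insort(active, data)
        elif kind == 2:
            active.remove(data)
        else:
            x1, x2 = data
            for x in active[bisect_left(active, x1):bisect_right(active, x2)]:
                out.append((x, y))
    return out

print(orthogonal_intersections([(2, 0, 5), (6, 1, 4)], [(1, 7, 3)]))
# → [(2, 3), (6, 3)]
```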

Buffered Segment Tree
We switch the roles of points and intervals:
- We insert and delete intervals from the tree.
- We use points as queries, reporting all intervals stabbed by the point.
We assume the intervals have (distinct) endpoints from a fixed given set E of size N.
- The elements in the leaves are the points of E.
- We build our tree bottom-up in O(n).

Buffered Segment Tree
Define: slabs, multi-slabs, short/long segments. (The slide shows a figure with slab boundaries labeled A through F.)

Internal Node BEP
1. Repeatedly load m/2 blocks of elements into memory, and perform the following:
   1. For every multi-slab list, insert the relevant long segments.
   2. For every multi-slab list that is stabbed by a point, report its intervals and remove the expired ones.
   3. Distribute segments and queries.
2. If there is a full child buffer, apply the BEP recursively.
The above costs O(m + x + r) = O(x + r) amortized.

Analysis
- Because the tree structure is static, there is no rebalancing, and also no emptying of non-full buffers.
- Therefore the only cost is emptying full buffers, which is linear.
- Thus a series of N operations on our segment tree takes O(n log_m n + r).
- A write (flush) operation takes O(n log_m n) ⇒ overall we get the desired O(n log_m n + r).

Batched Range Searching
The problem: given N points and N axis-parallel rectangles in the plane, report all points inside each rectangle.

BRS Using Our Segment Tree
1. Sort the points and rectangles by their top y-coordinate.
2. Scan the sorted list:
   1. For each rectangle, insert the interval that corresponds to its horizontal side, with a delete time matching its bottom y-coordinate.
   2. For each point, insert a stabbing query.
3. Flush the tree (empty all buffers).
The above takes an optimal O(n log_m n + r).
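The sweep can be mirrored by a naive in-memory sketch, where a brute-force scan of the active x-intervals stands in for the buffered segment tree. Names are invented, and y-coordinates are assumed distinct:

```python
# Naive sweep for batched range searching: a rectangle's x-interval is
# active between its top and bottom edges; each point is a stabbing query.

def batched_range_search(points, rects):
    """points: list of (x, y); rects: list of (x1, x2, y_bottom, y_top).
    Returns (point, rect_index) pairs with the point inside the rectangle."""
    events = []
    for i, (x1, x2, yb, yt) in enumerate(rects):
        events.append((yt, 0, i))   # top edge: x-interval becomes active
        events.append((yb, 2, i))   # bottom edge: x-interval expires
    for p in points:
        events.append((p[1], 1, p))
    events.sort(reverse=True)       # sweep from top to bottom
    active, out = set(), []
    for y, kind, data in events:
        if kind == 0:
            active.add(data)
        elif kind == 2:
            active.discard(data)
        else:
            px, _ = data
            for i in active:        # stabbing query, by brute force
                if rects[i][0] <= px <= rects[i][1]:
                    out.append((data, i))
    return out

result = batched_range_search([(3, 2), (9, 5)], [(1, 5, 0, 4), (2, 10, 1, 6)])
print(sorted(result))  # [((3, 2), 0), ((3, 2), 1), ((9, 5), 1)]
```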

Pairwise Rectangle Intersection
The problem: given N axis-parallel rectangles in the plane, report all intersecting pairs.

PRI Using Our Segment Tree
Two rectangles in the plane intersect ⇔ one of the following holds:
1. They have intersecting edges.
2. One contains the other ⇒ one contains the other's midpoint.
We have shown an O(n log_m n + r) solution for both (1) (orthogonal line intersection) and (2) (batched range searching). Therefore we have an optimal O(n log_m n + r) solution for the PRI problem.
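The case analysis can be sanity-checked mechanically. The snippet below (all names invented) tests, on random axis-parallel rectangles, that overlapping is equivalent to "some pair of orthogonal edges intersects, or one rectangle contains the other's midpoint":

```python
# Sanity check of the PRI case analysis on random rectangles (x1, x2, y1, y2).
import random

def overlap(a, b):
    (ax1, ax2, ay1, ay2), (bx1, bx2, by1, by2) = a, b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def edges_intersect(a, b):
    # A vertical edge of one rectangle crosses a horizontal edge of the other.
    def cross(v, h):  # v = (x, y1, y2) vertical; h = (x1, x2, y) horizontal
        return h[0] <= v[0] <= h[1] and v[1] <= h[2] <= v[2]
    def verts(r): return [(r[0], r[2], r[3]), (r[1], r[2], r[3])]
    def hors(r): return [(r[0], r[1], r[2]), (r[0], r[1], r[3])]
    return any(cross(v, h) for v in verts(a) for h in hors(b)) or \
           any(cross(v, h) for v in verts(b) for h in hors(a))

def contains_midpoint(a, b):
    mx, my = (b[0] + b[1]) / 2, (b[2] + b[3]) / 2
    return a[0] <= mx <= a[1] and a[2] <= my <= a[3]

random.seed(1)
def rect():
    x1, x2 = sorted(random.sample(range(20), 2))
    y1, y2 = sorted(random.sample(range(20), 2))
    return (x1, x2, y1, y2)

for _ in range(2000):
    a, b = rect(), rect()
    claim = edges_intersect(a, b) or contains_midpoint(a, b) or contains_midpoint(b, a)
    assert claim == overlap(a, b)
print("equivalence holds on 2000 random pairs")
```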