X1x1 x2x2 top-k y 3-sided x1x1 x2x2 External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates Gerth Stølting Brodal Aarhus.

Slides:



Advertisements
Similar presentations
Augmenting Data Structures Advanced Algorithms & Data Structures Lecture Theme 07 – Part I Prof. Dr. Th. Ottmann Summer Semester 2006.
Advertisements

I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
B+-trees. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B = n pages I/O complexity:
External Memory Geometric Data Structures
I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
6/14/2015 6:48 AM(2,4) Trees /14/2015 6:48 AM(2,4) Trees2 Outline and Reading Multi-way search tree (§3.3.1) Definition Search (2,4)
Cache-Oblivious B-Trees
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
I/O-Algorithms Lars Arge Aarhus University March 5, 2008.
Fully Persistent B-Trees 23 rd Annual ACM-SIAM Symposium on Discrete Algorithms, Kyoto, Japan, January 18, 2012 Gerth Stølting Brodal Konstantinos Tsakalidis.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
Nick Harvey & Kevin Zatloukal
I/O-Algorithms Lars Arge Aarhus University March 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency Gerth Stølting Brodal Allan Grønlund Jørgensen Thomas Mølhave University of Aarhus.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
© 2004 Goodrich, Tamassia (2,4) Trees
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
(B+-Trees, that is) Steve Wolfman 2014W1
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Tirgul 6 B-Trees – Another kind of balanced trees.
Cache-Oblivious Dynamic Dictionaries with Update/Query Tradeoff Gerth Stølting Brodal Erik D. Demaine Jeremy T. Fineman John Iacono Stefan Langerman J.
Splay Trees and B-Trees
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)
B-trees and kd-trees Piotr Indyk (slides partially by Lars Arge from Duke U)
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Lars Arge Presented by Or Ozery. I/O Model Previously defined: N = # of elements in input M = # of elements that fit into memory B = # of elements per.
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
Review for Exam 2 Topics covered (since exam 1): –Splay Tree –K-D Trees –RB Tree –Priority Queue and Binary Heap –B-Tree For each of these data structures.
Depth First Search Maedeh Mehravaran Big data 1394.
Equivalence Between Priority Queues and Sorting in External Memory
External Memory Geometric Data Structures Lars Arge Duke University June 27, 2002 Summer School on Massive Datasets.
Internal Memory Pointer MachineRandom Access MachineStatic Setting Data resides in records (nodes) that can be accessed via pointers (links). The priority.
More Trees. Outline Tree B-Tree 2-3 Tree Tree Red-Black Tree.
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
Topics covered (since exam 1):
Red-Black Trees v z Red-Black Trees 1 Red-Black Trees
Binary Search Tree Chapter 10.
Advanced Topics in Data Management
(2,4) Trees (2,4) Trees 1 (2,4) Trees (2,4) Trees
Topics covered (since exam 1):
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
B-Trees.
STACS arxiv.org/abs/ y 3-sided x1 x2 x1 x2 top-k
B-Tree.
(2,4) Trees (2,4) Trees (2,4) Trees.
(2,4) Trees 2/15/2019 (2,4) Trees (2,4) Trees.
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
8th Workshop on Massive Data Algorithms, August 23, 2016
(2,4) Trees (2,4) Trees (2,4) Trees.
Topics covered (since exam 1):
CS210- Lecture 20 July 19, 2005 Agenda Multiway Search Trees 2-4 Trees
Presentation transcript:

x1x1 x2x2 top-k y 3-sided x1x1 x2x2 External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates Gerth Stølting Brodal Aarhus University arxiv.org/abs/ k = 3 “the result is obtained by combining already existing techniques (and no new techniques are introduced)” - anonymous reviewer

Updates O(log n) 3-sided O(log n + k) Updates O(log n) 3-sided & top-k O(log n + k) A ++ ++ ++ ++ ++ ++ y x1x1 x2x2 Internal Memory – Priority Search Trees Properties  leaves x-sorted  point p stored on leaf p-to-root path  y-values satisfy heap-order McCreight 1985 Frederickson 1993 BCDEFGHIJKLMNOP B EHIK O P DGJN CM L F 3-sided M D IJKL IK J N M L EFGH EH G F s = O(log n) heap-ordered top-k select k+s largest top-9 -- -- -- -- -- -- -- -- QA Q A B C D E F G H I J K L M N O P A Q in O(k+s) time

External Memory Model CPU Internal External M IO of B consecutive cells Aggarwal & Vitter 1988 B-tree Bayer & McCreight 1972 height O(log Δ (n/B)) degree Δ ≤ B Δ-1 search keys leaves Θ(B) elements Insert, Delete, Search O(log Δ (n/B)) IOs Buffered B-tree Arge 1995 Search O(log Δ (n/B)) IOs Updates amortized O(Δ/B·log Δ (n/B)) IOs buffer of ≤ B delayed updates

External Memory Results UpdatesQuery Ramaswamy, Subramanian 1995 O A (log n·log B)O(log B n + k/B) Subramanian, Ramaswamy 1995 O A (log B n + (log B n) 2 /B)O(log B n + k/B + log ** B) Arge et al. 1999O(log B n)O(log B n + k/B) NEWO A (1/(εB 1-ε ) · log B n)O A (1/ε · log B n + k/B) Afshani et al. 2011(static)O(log B n + k/B) Sheng, Tao 2012O A ((log B n) 2 )O(log B n + k/B) Tao 2014O A (log B n)O(log B n + k/B) NEWO A (1/(εB 1-ε ) · log B n)O A (1/ε · log B n + k/B) O A = amortized 3-sided top-k k = 3 NEW result : Combination of Arge 1995, Arge et al. 1999, Frederickson 1993, Blum et al. 1973

External Memory 3-sided Data Structure x-sorted degree Δ = B ε 3-sided external memory priority search tree root stores topmost Θ(B) points PvPv buffers of O(B) delayed insertions and deletions below v IvIv DvDv v CvCv efficient structure for accessing children’s P sets y-sorted  Insertions / deletions : Update root P v or add to delayed update buffer I v / D v  Update buffer overflow : Flush recursively to child with most updates (≥ B 1-ε )  Leaf overflow : split leaf, and recursively split ancestors of degree Δ+1  Underflowing point buffer P v : pull elements recursively from children using C v  3-sided query : i) Identify nodes to visit using C v structures. ii) flush updates down from ancestors of visited nodes. iii) report from nodes using P v, C v and update buffers x1x1 x2x2

Child Structure C v degree Δ = B ε 3-sided PvPv IvIv DvDv v CvCv  Capacity : B 1+ε  Insetion /deletion buffer O(B) points  O(B ε ) blocks  Catalog block  y-samples block (new) Arge et al y b1b1 b8b8 b7b7 b6b6 b5b5 b4b4 b3b3 b2b2 b 6,7 b 1,2 b 6,8 b 3,4 b 1,8 b 1,5 b 3,5 Insert / delete s points : O(1 + s/B 1-ε ) IOs 3-sided query : O(1 + k/B) IOs y-samples for range [x 1,x 2 ] : O(1) IOs (new)

External Memory Top-k – Overall Approach 3-sided top-k k = 3 x1x1 x2x2 x1x1 x2x2 y find magic y 3-sided query select k-largest Superset O(B·log Δ (n/B) + k) Answer Blum et al Construct (on demand) a binary heap over the samples of every Θ(B)’th element in the C v structures – and select the Θ(log Δ (n/B) + k/B)’th element using Frederickson 1993 All steps require O(log Δ (n/B) + k/B) IOs The 3-sided query is amortized

Summary O A = amortized 3-sided top-k k = 3 Open problem : Remove amortization ? – The End UpdatesQuery Ramaswamy, Subramanian 1995 O A (log n·log B)O(log B n + k/B) Subramanian, Ramaswamy 1995 O A (log B n + (log B n) 2 /B)O(log B n + k/B + log ** B) Arge et al. 1999O(log B n)O(log B n + k/B) NEWO A (1/(εB 1-ε ) · log B n)O A (1/ε · log B n + k/B) Afshani et al. 2011(static)O(log B n + k/B) Sheng, Tao 2012O A ((log B n) 2 )O(log B n + k/B) Tao 2014O A (log B n)O(log B n + k/B) NEWO A (1/(εB 1-ε ) · log B n)O A (1/ε · log B n + k/B)