8th Workshop on Massive Data Algorithms, August 23, 2016

Presentation transcript:

External Memory Three-Sided Range Reporting and Top-k Queries with Sublogarithmic Updates
Gerth Stølting Brodal, Aarhus University
[Figure: a 3-sided query with sides x1, x2 and y, and a top-k query over [x1, x2] with k = 3]
An anonymous reviewer wrote: “the result is obtained by combining already existing techniques (and no new techniques are introduced)”. The same reviewer also said “I like the paper” and “the combination is quite intricate (because of how different pieces interact with each other) and non-trivial”.

Internal Memory – Priority Search Trees
McCreight 1985: updates O(log n), 3-sided queries O(log n + k).
With Frederickson 1993: updates O(log n), 3-sided and top-k queries O(log n + k).
Properties: the leaves are sorted by x; each point p is stored on the path from leaf p to the root; the y-values satisfy heap order.
Grey nodes are empty nodes (they store no point).
Top-k (example: top-9): the candidate nodes are heap-ordered, so the k + s largest can be selected in O(k + s) time, where s = O(log n).
[Figure: priority search tree over the points A to Q, with a 3-sided query and a top-k query over [x1, x2]]
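The two query types on this slide can be illustrated with a minimal internal-memory sketch. It rests on several assumptions: it uses the classic variant in which every node stores the highest remaining point (the slide's variant keeps the leaves x-sorted and allows empty nodes), it is static (no updates), and the top-k routine uses a binary heap from `heapq` in place of Frederickson's linear-time heap selection, so it runs in O((k + log n) · log(k + log n)) time rather than O(k + log n). The names `Node`, `build`, `query_3sided` and `top_k` are illustrative only.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)


@dataclass
class Node:
    point: Point                    # point with the largest y in this subtree
    split: float                    # left subtree has x <= split, right has x >= split
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point]) -> Optional[Node]:
    """Build a static priority search tree; `points` must be sorted by x."""
    if not points:
        return None
    top = max(range(len(points)), key=lambda i: points[i][1])
    rest = points[:top] + points[top + 1:]
    mid = len(rest) // 2
    split = rest[mid][0] if rest else points[top][0]
    return Node(points[top], split, build(rest[:mid]), build(rest[mid:]))


def query_3sided(node: Optional[Node], x1: float, x2: float,
                 y_min: float, out: List[Point]) -> None:
    """Append every point with x1 <= x <= x2 and y >= y_min to `out`."""
    if node is None or node.point[1] < y_min:
        return  # heap order: nothing below this node can reach y_min
    if x1 <= node.point[0] <= x2:
        out.append(node.point)
    if x1 <= node.split:
        query_3sided(node.left, x1, x2, y_min, out)
    if x2 >= node.split:
        query_3sided(node.right, x1, x2, y_min, out)


def top_k(root: Optional[Node], x1: float, x2: float, k: int) -> List[Point]:
    """Return the k points with the largest y among those with x1 <= x <= x2."""
    result: List[Point] = []
    heap = []
    counter = 0  # tie-breaker so heapq never has to compare Node objects
    if root is not None:
        heap.append((-root.point[1], counter, root))
    while heap and len(result) < k:
        _, _, node = heapq.heappop(heap)  # highest unvisited candidate
        if x1 <= node.point[0] <= x2:
            result.append(node.point)
        candidates = ((node.left, x1 <= node.split), (node.right, x2 >= node.split))
        for child, overlaps_range in candidates:
            if child is not None and overlaps_range:
                counter += 1
                heapq.heappush(heap, (-child.point[1], counter, child))
    return result
```

Only nodes on the two boundary search paths can be popped without being reported, which is where the additive s = O(log n) term on the slide comes from.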

External Memory Model, B-tree, Buffered B-tree
External memory model (Aggarwal & Vitter 1988): a CPU with an internal memory of size M and an external memory; one IO transfers B consecutive cells.
B-tree (Bayer & McCreight 1972): degree Δ ≤ B, Δ-1 search keys per node, height O(log_Δ(n/B)), leaves store Θ(B) elements. Insert, Delete, Search: O(log_Δ(n/B)) IOs.
Buffered B-tree (Arge 1995): each node has a buffer of ≤ B delayed updates. Search: O(log_Δ(n/B)) IOs. Updates: amortized O(Δ/B · log_Δ(n/B)) IOs.
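To get a feel for the bounds on this slide, the following sketch plugs concrete numbers into them. It assumes all constants hidden by the O-notation are 1, so the results are orders of magnitude, not measured IO counts; the function names are illustrative.

```python
def btree_height(n: int, B: int, delta: int) -> int:
    """Number of levels above the leaves: ceil(log_delta(n / B)), computed with integers."""
    if delta < 2:
        raise ValueError("the degree must be at least 2")
    nodes = -(-n // B)              # number of leaves, each holding B elements
    height = 0
    while nodes > 1:
        nodes = -(-nodes // delta)  # each parent covers up to delta children
        height += 1
    return height


def search_ios(n: int, B: int, delta: int) -> int:
    """Search cost in IOs: one IO per level (constants ignored)."""
    return btree_height(n, B, delta)


def buffered_update_ios(n: int, B: int, delta: int) -> float:
    """Amortized update cost of a buffered B-tree: delta / B IOs per level (constants ignored)."""
    return delta / B * btree_height(n, B, delta)
```

For example, with n = 2^30, B = 2^10 and Δ = 2^5, a search touches 4 levels, while an update costs an amortized 32/1024 · 4 = 0.125 IOs under these assumptions.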

External Memory Results
3-sided:
Ramaswamy, Subramanian 1995: updates O_A(log n · log B), query O(log_B n + k/B)
Subramanian, Ramaswamy 1995: updates O_A(log_B n + (log_B n)^2 / B), query O(log_B n + k/B + log** B)
Arge et al. 1999: updates O(log_B n)
top-k:
Afshani et al. 2011: static
Sheng, Tao 2012: updates O_A((log_B n)^2)
Tao 2014: updates O_A(log_B n)
NEW (3-sided and top-k): updates O_A(1/(ε B^(1-ε)) · log_B n), query O_A(1/ε · log_B n + k/B)
O_A = amortized.
The NEW result is a combination of:
Arge 1995 = buffering of updates in a B-tree
Arge et al. 1999 = external memory priority search tree
Frederickson 1993 = binary heap selection
Blum et al. 1973 = linear time selection, which also works in the external memory setting

External Memory 3-sided Data Structure
The structure is an external memory priority search tree of degree Δ = B^ε. Each node v stores:
P_v: a point buffer; the root stores the topmost Θ(B) points.
I_v, D_v: buffers of O(B) delayed insertions and deletions for the subtree below v.
C_v: an efficient structure for accessing the children's P sets.
[Figure labels: y-sorted, x-sorted, query range [x1, x2]]
Insertions / deletions: update the root's P_v, or add to the delayed update buffer I_v / D_v.
Update buffer overflow: flush recursively to the child with the most updates (≥ B^(1-ε)).
Leaf overflow: split the leaf, and recursively split ancestors of degree Δ+1.
Underflowing point buffer P_v: pull elements recursively from the children using C_v.
3-sided query: (i) identify the nodes to visit using the C_v structures; (ii) flush updates down from the ancestors of the visited nodes; (iii) report from the nodes using P_v, C_v and the update buffers.
Queries: visiting v means that all points in P_v should be reported. Deletion buffers can remove at most a constant fraction of the content found in the P_v sets.
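The buffer-overflow rule ("flush to the child with the most updates") can be sketched as follows. This is a simplified illustration, not the structure from the talk: it assumes the children of v are separated by sorted x-values, that a key equal to a separator is routed to the right, and that the buffer is a plain list of keys. By the pigeonhole principle, a buffer of B updates spread over Δ = B^ε children leaves at least B/Δ = B^(1-ε) updates for the fullest child.

```python
import bisect
from collections import Counter
from typing import List, Tuple


def fullest_child(buffer_keys: List[float], separators: List[float]) -> Tuple[int, int]:
    """Return (child index, number of buffered updates routed to that child).

    `separators` are the sorted x-values separating the children of a node;
    `buffer_keys` must be non-empty.
    """
    counts = Counter(bisect.bisect_right(separators, key) for key in buffer_keys)
    return counts.most_common(1)[0]
```

Flushing only the fullest child moves many updates one level down per flush, which is what keeps the amortized cost per update low.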

Child Structure C_v (Arge et al. 1999)
Node v has degree Δ = B^ε and stores P_v, I_v, D_v and C_v.
Capacity: B^(1+ε) points.
Insert / delete s points: O(1 + s/B^(1-ε)) IOs.
3-sided query: O(1 + k/B) IOs.
y-samples for a range [x1, x2]: O(1) IOs (new).
Components: an insertion / deletion buffer of O(B) points, O(B^ε) blocks, a catalog block, and a y-samples block (new).
A block is drawn as a horizontal line and stores all points above that line.
[Figure: blocks b1 to b8 and the combined blocks b1,2, b3,4, b6,7, b3,5, b6,8, b1,5, b1,8]
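The insertion cost of C_v connects to the update bound claimed for the new structure. The derivation below is a back-of-the-envelope sketch: it assumes that each flush moves s ≥ B^(1-ε) updates one level down, and it ignores the cost of leaf splits and of refilling underflowing point buffers, which the full analysis has to account for separately.

```latex
\begin{align*}
\text{cost of moving } s \ge B^{1-\varepsilon} \text{ updates one level down}
  &= O\!\left(1 + \frac{s}{B^{1-\varepsilon}}\right)
   = O\!\left(\frac{s}{B^{1-\varepsilon}}\right) \\
\text{cost per update per level}
  &= O\!\left(\frac{1}{B^{1-\varepsilon}}\right) \\
\text{number of levels}
  &= O\!\left(\log_{\Delta} \frac{n}{B}\right)
   = O\!\left(\frac{1}{\varepsilon} \log_B n\right),
   \qquad \Delta = B^{\varepsilon} \\
\text{amortized cost per update}
  &= O\!\left(\frac{1}{\varepsilon B^{1-\varepsilon}} \log_B n\right)
\end{align*}
```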

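External Memory Top-k – Overall Approach
Step 1: find a “magic” y-value.
Step 2: run a 3-sided query with x1, x2 and y; it returns a superset of the answer of size O(B · log_Δ(n/B) + k).
Step 3: select the k largest points from the superset (Blum et al. 1973) to obtain the answer.
[Figure: top-k query over [x1, x2] with k = 3, turned into a 3-sided query]
All steps require O(log_Δ(n/B) + k/B) IOs. The bound for the 3-sided query is amortized.
Finding y: construct (on demand) a binary heap over the samples of every Θ(B)'th element in the C_v structures, and select the Θ(log_Δ(n/B) + k/B)'th element using Frederickson 1993.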
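The last step of this approach, selecting the k largest points from the superset, relies on linear-time selection (Blum et al. 1973). The following is a minimal in-memory sketch of that technique (median of medians with groups of five), applied to the y-values of a point list. It does not model IOs or block transfers, and the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def select_kth_largest(values: List[float], k: int) -> float:
    """Return the k-th largest value (k is 1-indexed) in deterministic linear time."""
    assert 1 <= k <= len(values)
    if len(values) <= 5:
        return sorted(values, reverse=True)[k - 1]
    # Pivot: the median of the medians of groups of five.
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    medians = [sorted(group)[len(group) // 2] for group in groups]
    pivot = select_kth_largest(medians, (len(medians) + 1) // 2)
    larger = [v for v in values if v > pivot]
    equal = [v for v in values if v == pivot]
    if k <= len(larger):
        return select_kth_largest(larger, k)
    if k <= len(larger) + len(equal):
        return pivot
    smaller = [v for v in values if v < pivot]
    return select_kth_largest(smaller, k - len(larger) - len(equal))


def k_largest_by_y(points: List[Point], k: int) -> List[Point]:
    """Return k points with the largest y-values (in no particular order)."""
    if k <= 0:
        return []
    if k >= len(points):
        return list(points)
    threshold = select_kth_largest([y for _, y in points], k)
    above = [p for p in points if p[1] > threshold]
    ties = [p for p in points if p[1] == threshold]
    return above + ties[:k - len(above)]
```

Because the superset has size O(B · log_Δ(n/B) + k), a linear scan over it fits within the stated IO bound once the points are read in blocks of B.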

Summary – The End
The bounds are those listed under External Memory Results. The new structure supports updates in O_A(1/(ε B^(1-ε)) · log_B n) IOs and 3-sided and top-k queries in O_A(1/ε · log_B n + k/B) IOs, where O_A = amortized.
log** is the iterated log*, i.e. the number of times log* must be applied to get down to a constant.
Open problem: remove the amortization? If the bounds were worst-case, they would be an optimal query-update trade-off (Brodal-Fagerberg 2003).
Gerth Stølting Brodal, gerth@cs.au.dk