COMP 5704 Project Presentation Parallel Buffer Trees and Searching Cory Fraser School of Computer Science Carleton University, Ottawa, Canada

Outline: Computational Model; Parallel Buffer Trees; Implementation; Results.

Computational Model: The sequential buffer tree operates in the External Memory (EM) model, which minimizes transfers from hard disk to RAM. The parallel buffer tree operates in the Parallel External Memory (PEM) model, which minimizes transfers from RAM to CPU cache.

Related Search Data Structures: Binary search trees are usually analyzed in the RAM/PRAM model, with $O(n \log n)$ build time and $O(\log n)$ operation time. B-trees are analyzed in the EM/PEM model, with $O(n \log_B n)$ build time and $O(\log_B n)$ operation time. The buffer tree has $O\left(\frac{n}{B} \log_B \frac{n}{B}\right)$ build time.

What is a Parallel Buffer Tree? An offline data structure and an (a,b)-tree variant. It performs tree operations in batches to reduce I/Os, which pays off when there is a continual large flow of operations to execute.

Parallel Buffer Tree Complexity: For a sequence of N insert/delete/find (and optionally range) operations, the tree needs $O(\mathrm{sort}_P(N))$ I/Os without range searches and $O(\mathrm{sort}_P(N) + K/(PB))$ I/Os with range searches, where $\mathrm{sort}_P(N) = O\left(\frac{N}{PB} \log_B \frac{N}{B}\right)$ I/Os. A parallel B-tree needs $O\left(\frac{N}{P} \log_B N\right)$ I/Os.
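To get a feel for the gap between the two bounds, the short sketch below plugs sample values into them. It ignores the constants hidden by big-O, and the function names and the chosen N, P, and B are illustrative assumptions, not measurements from the project.

```python
import math


def log_base(x, base):
    """Logarithm of x in the given base, computed via log2."""
    return math.log2(x) / math.log2(base)


def sort_p_ios(n, p, b):
    """sort_P(N) = (N / (P*B)) * log_B(N / B), constants dropped."""
    return n / (p * b) * log_base(n / b, b)


def parallel_btree_ios(n, p, b):
    """Parallel B-tree bound (N / P) * log_B(N), constants dropped."""
    return n / p * log_base(n, b)


if __name__ == "__main__":
    # Hypothetical sizes: about a million operations, 4 processors, blocks of 1024.
    n, p, b = 2**20, 4, 2**10
    print("buffer tree:", sort_p_ios(n, p, b))
    print("B-tree     :", parallel_btree_ios(n, p, b))
```

With these sample values the buffer tree bound is smaller by a factor of roughly B, since each I/O moves a whole block of batched operations instead of serving a single one.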

Required Parallel Algorithms: Parallel sorting for batch operations; a parallel merge sort is used. Parallel prefix sums, which are needed for range query support and which distribute batched operations into buckets.
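As a rough illustration of how prefix sums place operations into buckets, the sketch below counts how many values fall into each bucket, takes an exclusive prefix sum of the counts to get each bucket's starting offset, and then scatters the values. It runs sequentially for clarity; in a parallel version each processor would count and scatter its own block. The function names, and the convention that bucket i holds values less than or equal to routing[i], are assumptions made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def exclusive_prefix_sums(counts):
    """Return [0, c0, c0+c1, ...] without the final total."""
    return list(accumulate(counts, initial=0))[:-1]


def distribute(values, routing):
    """Scatter values into contiguous per-bucket ranges of one output array."""
    # Pass 1: count the values destined for each bucket.
    counts = [0] * (len(routing) + 1)
    for v in values:
        counts[bisect_left(routing, v)] += 1

    # Prefix sums turn the counts into the starting offset of each bucket.
    offsets = exclusive_prefix_sums(counts)

    # Pass 2: write each value at the next free slot of its bucket.
    out = [None] * len(values)
    cursor = list(offsets)
    for v in values:
        bucket = bisect_left(routing, v)
        out[cursor[bucket]] = v
        cursor[bucket] += 1
    return counts, offsets, out
```

Because the offsets are known before any value is written, no two writers ever target the same slot, which is what makes the scatter step safe to run in parallel.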

Implementation Overview: The Intel Cilk++ SDK with GCC was used, available at http://software.intel.com/en-us/articles/download-intel-cilk-sdk. The parallel merge sort from class was used. The range query extension was not implemented.

Implementation Details: The buffer tree is an (a,b)-tree with a = f/4, b = f, and f >= PB. Each leaf stores up to B elements. Each non-leaf node has a buffer of size 2fB. Internal nodes have k-1 routing elements to direct values to children, where k is the number of children.
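The parameters above can be captured in a small sketch like the following. The class and method names are hypothetical and are not taken from the project's Cilk++ code, and routing a value to the first child whose routing element is greater than or equal to it is an assumed convention.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BufferTreeParams:
    """Derived sizes for an (a,b)-tree with a = f/4, b = f, f >= P*B."""
    P: int  # number of processors
    B: int  # block size in elements
    f: int  # fan-out parameter

    def __post_init__(self):
        if self.f < self.P * self.B:
            raise ValueError("f must be at least P*B")

    @property
    def min_children(self):
        return self.f // 4

    @property
    def max_children(self):
        return self.f

    @property
    def buffer_capacity(self):
        return 2 * self.f * self.B

    @property
    def leaf_capacity(self):
        return self.B


@dataclass
class InternalNode:
    """A non-leaf node: k children, k-1 routing elements, one buffer."""
    routing: list = field(default_factory=list)
    children: list = field(default_factory=list)
    buffer: list = field(default_factory=list)

    def child_for(self, value):
        # Assumed convention: child i covers values <= routing[i].
        return bisect_left(self.routing, value)
```

For example, `BufferTreeParams(P=2, B=4, f=8)` gives between 2 and 8 children per node, a buffer capacity of 64 operations, and leaves of up to 4 elements.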

Implementation Details - Operations: The tree builds up batches of PB operations before executing them. An operation consists of its type, value, and timestamp. The PB batched operations are split into P blocks and sent to the root in parallel.
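A minimal sketch of this batching step might look as follows. The `Op` tuple and the `OperationBatcher` class are illustrative names, a simple counter stands in for the timestamp, and the blocks are returned to the caller instead of being sent to the root by parallel workers.

```python
from collections import namedtuple

# An operation is its type ("insert", "delete", or "find"), value, and timestamp.
Op = namedtuple("Op", ["kind", "value", "ts"])


class OperationBatcher:
    """Collects operations until P*B are pending, then emits P blocks of B."""

    def __init__(self, P, B):
        self.P = P
        self.B = B
        self.pending = []
        self.clock = 0  # monotonically increasing timestamp

    def submit(self, kind, value):
        """Queue one operation; return the P blocks once the batch is full."""
        self.pending.append(Op(kind, value, self.clock))
        self.clock += 1
        if len(self.pending) == self.P * self.B:
            return self.flush()
        return None

    def flush(self):
        """Split the pending batch into blocks of B operations each."""
        batch, self.pending = self.pending, []
        return [batch[i:i + self.B] for i in range(0, len(batch), self.B)]
```

The timestamps matter later: once operations from different blocks are sorted by value, the timestamp restores the order in which they were issued.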

Emptying Non-fringe Buffers: Sort the buffer by value and timestamp. Answer Find operations with matching Insert/Delete operations. Cancel out matching Insert/Delete operations. Distribute buffer elements to children based on the routing elements. Recursively empty children buffers with more than fB operations.
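The steps above can be sketched sequentially as shown below. This is a simplified illustration, not the project's implementation. It assumes that an Insert followed by a Delete of the same value cancels because the value is not already stored deeper in the tree, that a Find is answered only when an earlier Insert or Delete of the same value is pending in the same buffer, and that child i receives values less than or equal to routing[i]. The recursive emptying of children holding more than fB operations is left out.

```python
from bisect import bisect_left
from collections import namedtuple

Op = namedtuple("Op", ["kind", "value", "ts"])  # kind: insert, delete, find


def empty_internal_buffer(buffer, routing):
    """Return (answers, per-child operation lists) for one non-fringe buffer."""
    # Step 1: sort by value, breaking ties by timestamp.
    ordered = sorted(buffer, key=lambda op: (op.value, op.ts))

    answers = {}    # timestamp of a Find -> True/False
    survivors = []  # operations that still have to travel down the tree
    for op in ordered:
        same = survivors and survivors[-1].value == op.value
        last = survivors[-1] if same else None
        if op.kind == "find":
            if last is not None and last.kind != "find":
                # Step 2: a pending Insert/Delete decides the answer.
                answers[op.ts] = last.kind == "insert"
            else:
                survivors.append(op)  # unresolved, keeps moving down
        elif op.kind == "delete" and last is not None and last.kind == "insert":
            survivors.pop()  # Step 3: Insert followed by Delete cancels out
        else:
            survivors.append(op)

    # Step 4: distribute the surviving operations using the routing elements.
    children = [[] for _ in range(len(routing) + 1)]
    for op in survivors:
        children[bisect_left(routing, op.value)].append(op)
    return answers, children
```

Sorting first means every operation on the same value sits next to the others in timestamp order, so one linear pass is enough to answer and cancel.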

Emptying Fringe Buffers: Convert all values within the child nodes into Insert operations with a negative-infinity timestamp. Sort the buffer by value and timestamp. Answer Find operations and cancel out Insert/Delete operations. Based on the remaining operations: if there are at most fB, remake the child nodes; if there are more than fB, create a new sibling for each fB/2 operations. Tree rebalancing may be required.
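A simplified sequential sketch of the fringe case follows. Giving the stored values a negative-infinity timestamp makes them sort ahead of every buffered operation on the same value, so one pass can resolve all Finds. The function name is hypothetical, and the sketch only rebuilds the leaves in blocks of up to B elements; creating new siblings and rebalancing the tree are left out.

```python
import math
from collections import namedtuple

Op = namedtuple("Op", ["kind", "value", "ts"])  # kind: insert, delete, find


def empty_fringe_buffer(leaf_values, buffer, B):
    """Return (answers, rebuilt leaves) for a node directly above the leaves."""
    # Stored values become Inserts that precede every buffered operation.
    ops = [Op("insert", v, -math.inf) for v in leaf_values] + list(buffer)
    ops.sort(key=lambda op: (op.value, op.ts))

    answers = {}  # timestamp of a Find -> True/False
    present = {}  # value -> whether it exists after the operations so far
    for op in ops:
        if op.kind == "find":
            answers[op.ts] = present.get(op.value, False)
        else:
            present[op.value] = op.kind == "insert"

    # Rebuild the leaves from the surviving values, B elements per leaf.
    remaining = sorted(v for v, alive in present.items() if alive)
    leaves = [remaining[i:i + B] for i in range(0, len(remaining), B)]
    return answers, leaves
```

At the fringe every Find can be answered, because the merged sequence contains the complete history of each value.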

Node Rebalancing

Results: Test System Specs: Quad-core running Fedora GB of RAM. Sequential comparison structures: C++ std::set (an online structure) and the Parallel Buffer Tree with 1 worker.

Results So Far - Build Times

Conclusions: Parallel speedup over the sequential version is high with enough input. Performance is not yet competitive with equivalent online data structures; about 12 cores would be needed to match std::set. The structure may be practical for high-volume external memory applications.

Questions: What is an offline data structure? What kind of I/O operations is the Parallel External Memory (PEM) model concerned with? Why can a buffer tree be loaded with N elements faster than a B-tree according to big-O?

References: N. Sitchinava, N. Zeh, A Parallel Buffer Tree. L. Arge, External Memory Data Structures - c.dk/research/algorithms/Kurser/AA/2002F/Uge 11/handbook.pdf