CS 295: Modern Systems Efficient Use of High Performance Storage

Slides:



Advertisements
Similar presentations
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Advertisements

CS4432: Database Systems II
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b TR Lee, Sin-Min.
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Storage CMSC 461 Michael Wilson. Database storage  At some point, database information must be stored in some format  It’d be impossible to store hundreds.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
SUYASH BHARDWAJ FACULTY OF ENGINEERING AND TECHNOLOGY GURUKUL KANGRI VISHWAVIDYALAYA, HARIDWAR.
Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.
COMP261 Lecture 23 B Trees.
CPS216: Data-intensive Computing Systems
CS522 Advanced database Systems
CS 540 Database Management Systems
Multiway Search Trees Data may not fit into main memory
Tree-Structured Indexes
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Tree Indices Chapter 11.
B-Trees 7/5/2018 4:26 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Database System Implementation CSE 507
Extra: B+ Trees CS1: Java Programming Colorado State University
B+-Trees.
B+ Tree.
Hashing Exercises.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B+ Trees Similar to B trees, with a few slight differences
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
B+ Trees Similar to B trees, with a few slight differences
B-Trees.
B+-Trees and Static Hashing
CS222/CS122C: Principles of Data Management Notes #07 B+ Trees
Tree-Structured Indexes
Tree-Structured Indexes
B-Tree.
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
(2,4) Trees (2,4) Trees (2,4) Trees.
Tree-Structured Indexes
Multiway Trees Searching and B-Trees Advanced Tree Structures
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
(2,4) Trees 2/15/2019 (2,4) Trees (2,4) Trees.
Database Design and Programming
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
CPS216: Advanced Database Systems
Tree-Structured Indexes
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.
CS210- Lecture 20 July 19, 2005 Agenda Multiway Search Trees 2-4 Trees
CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees
Presentation transcript:

CS 295: Modern Systems Efficient Use of High Performance Storage Sang-Woo Jun Spring, 2019

Queue Depth and Performance For high bandwidth, enough requests must be in flight to keep many chips busy With fread/read/mmap, need to spawn many threads to have concurrent requests Traditionally with thread pool that makes synchronous requests (POSIX AIO library and many others) Billy Tallis, “Intel Optane SSD DC P4800X 750GB Hands-On Review,” AnandTech 2017

Some Background – Page Cache Linux keeps a page cache in the kernel that stores some pages previously read from storage Automatically tries to expand into unused memory space Page cache hit results in high performance Data reads involve multiple copies (Device → Kernel → User) Tip: Write “3” to /proc/sys/vm/drop_caches to flush all caches Page cache can be bypasses via “direct mode” “open” syscall with O_DIRECT Lower operating system overhead, but no benefit of page cache hits Useful if application performs own caching, or knows there is zero reuse

Asynchronous I/O Many in-flight requests created via non-blocking requests Generate a lot of I/O requests from a single thread Storage Host Synchronous Asynchronous with queue depth >= 7

Asynchronous I/O Option 1: POSIX AIO library Creates thread pool to offload blocking I/O operations – Queue depth limited by thread count Part of libc, so easily portable Can work with page caches Option 2: Linux kernel AIO library (libaio) Asynchrony management offloaded to kernel (not dependent on thread pool) Despite efforts, does not support page cache yet (Only O_DIRECT) Especially good for applications that manage own cache (e.g., DBMSs)

Linux Kernel libaio Basic flow aio_context_t created via io_setup struct iocb created for each io request, and submitted via io_submit Check for completion using io_getevents Multiple aio_context_t may be created for multiple queues Best performance achieved by multiple contexts across threads, each with large nr_events Multi thread not because of aio overhead, but actual data processing overhead

libaio Example Create context Send request Poll results Arguments to recognize results Poll results Recognize results with arguments

Significant Performance Benefits! Typically single thread can saturate NVMe storage If the thread is not doing any processing Let’s look at performance numbers Surprisingly few applications use libaio, but many do

Data Structures for Storage Most of what we learned about cache-oblivious data structures also work here

B-Tree Generalization of a binary search tree, where each node can have more than two children Typically enough children for each node to fill a file system page (Data loaded from storage is not wasted) If page size is known, very effective data structure Remember the performance comparison with van Emde Boas tree Brodal et.al., “Cache Oblivious Search Trees via Binary Trees of Small Height,” SODA 02

B-Tree – Quick Recap Self-balancing structure! Insertion is always done at a leaf If the leaf is full, it is split If leaf splitting results in a parent overflow, split parent, repeat upwards If root overflows, create a new root, and split old root Tree height always increases from the root, balancing the tree

B-Tree – Quick Recap Deletion is also always done on the leaf If leaf underflows (according to a set parameter), perform rotation to keep balance Rotation always maintains balance If left sibling has enough children, take largest from it Element separating siblings in the parent is moved to current node to separate the new child If right sibling has enough children, take largest from it The same If no siblings have enough children, merge two siblings Remove the separating element in the parent. If this causes an underflow, repeat rotation at parent

B-Tree – Quick Recap – Rotation Dr. Fang Tang, “CS241: Data Structures and Algorithms II,” 2017

B+Tree B-Tree modified to efficiently deal with key-value pairs Two separate types of nodes: internal and leaf B-Tree had elements in both intermediate nodes and leaves Internal nodes only contain keys for keeping track of children Values are only stored in leaf nodes All leaves are also connected in a linked list, for efficient range querying-

Implicit B+Tree An implicit tree is when the structure of the tree is implicitly determined from the layout in memory For example, an implicit binary tree can be represented as an array, where elements are laid out in breadth-first order Wasted space if tree is not balanced B-Tree and variants are self-balancing Implicit B+Tree node uses less memory Useful for when data is in storage, but index is in memory Useful when data is static! opendatastructures.org

Log-Structured Merge (LSM) Tree Storage-optimized tree structure Key component of many modern DBMSs (RocksDB,Bigtable,Cassandra, …) Consists of mutable in-memory data structure, and multiple immutable external (in-storage) data structures Updates applied to in-memory data structure In-memory data structure regularly flushed to new instance in storage Lookups must search the in-memory structure, and potentially all instances in storage if not

Log-Structured Merge (LSM) Tree In-memory: mutable, search-optimized data structure like B-Tree After it reaches a certain size (or some time limit reached), flushed to storage and starts new External component: many immutable trees Typically search optimized external structure like Sorted String Tables New one created every time memory flushes Updates are determined by timestamp, deletions by placeholder markers Search from newest file to old Like clustered indices Alex Petrov, “Algorithms Behind Modern Storage Systems,” ACM Queue, 2018

Log-Structured Merge (LSM) Tree Because external structures are immutable and only increase, periodic compaction is required Since efficient external data structures are sorted, typically simple merge-sort is efficient Key collisions are handled by only keeping new data

Some Performance Numbers Data from iibench for MongoDB Small Datum, “Read-modify-write optimized,” 2014 (http://smalldatum.blogspot.com/2014/12/read-modify-write-optimized.html)

Bloom Filter Probabilistic data structures to test if an element is a member of a set Not actually storage-aware algorithm Maintain a relatively small m array of bits Insert: use k hash functions and set all hashed locations to 1 Check: use k hash functions, if a hash location is not 1, the element definitely does not exist in the set Effective with even small m False positives, and becomes less useful with more insertions Source: Wikipedia