Join Processing for Flash SSDs: Remembering Past Lessons

Presentation transcript:

Join Processing for Flash SSDs: Remembering Past Lessons
Jaeyoung Do, Jignesh M. Patel
Department of Computer Sciences, University of Wisconsin-Madison

All right. Welcome to my presentation. My name is Jaeyoung Do. Today I’m going to talk about the performance of join algorithms on flash SSDs.

Flash Solid State Drives (SSDs)
▶ Benefits of Flash SSDs: no seek time, small latency
▶ Flash Densities
1995: 16 MB NAND Flash
2005: 16 GB NAND Flash
2010: 1 TB NAND Flash (predicted by Samsung)
▶ Flash Prices: decreasing continuously
[Graphs: flash density (GB) and flash price ($/MB) over time]

What are flash solid state drives? Flash SSDs are data storage devices that use NAND flash memory chips. Currently, the most widely available and popular storage device is the magnetic hard disk drive. However, since flash SSDs have many benefits compared to magnetic hard disk drives, they are expected to gradually replace hard disks as the primary permanent storage media in large data centers. These two graphs show how flash densities and flash prices have changed over time. As capacity continues to increase and prices drop, Jim Gray’s prediction that “Flash is disk, disk is tape, and tape is dead” is coming close to reality in many applications. Then, what are the advantages of flash SSDs over magnetic hard disks?
DaMoN 09

Flash SSDs for DBMSs
▶ Many previous works with flash SSDs
Flash-based DBMS [Gray ACM Queue 08, Graefe DaMoN07]
In-Page Logging [Lee VLDB07]
Transaction Processing [Lee VLDB08]
B+ Tree Index [Li ICDE09]
I/O Benchmarks [Bouganim CIDR09]
▶ We focus on join algorithms
New join algorithms [Shah DaMoN08, Tsirogiannis SIGMOD09]
▶ What lessons learnt for magnetic HDDs still apply to flash SSDs?

So, the use of flash SSDs for DBMSs is being actively studied in various areas, including transaction processing and B+ tree indexes. In this work, we focus on joins, which are common but expensive query processing operations. There is already some research on developing new join algorithms for flash SSDs. Rather than developing a new join algorithm, the aim of our research is to see which join techniques that worked for magnetic hard disk drives continue to work with flash SSDs. So we want to see what happens if we simply replace magnetic HDDs with flash SSDs, and examine the effects of tuning parameters on join performance. We already have plenty of lessons learnt over three decades of designing and tuning join algorithms for hard-disk-based systems. If joins were done with flash SSDs, would these lessons be meaningless? As we will see later, the answer is no. Many lessons learnt for magnetic HDDs still work for flash SSDs.

Goals
1. Not inventing new join algorithms
2. Providing better insights for join performance on flash SSDs
▶ Demonstrate the importance of recalling past lessons about efficient joins in magnetic HDDs
▶ Explore the effects of various parameters for joins on flash SSDs

Therefore, the first goal of our research is to recall some of the important lessons about efficient join processing on magnetic hard disks and to determine whether these lessons also apply to joins using flash SSDs. In addition, we want to see how parameters tuned for the characteristics of magnetic HDDs work for joins on flash SSDs. These insights should give better starting points when designing new join algorithms for flash SSDs.

Our Approach
▶ Investigate four popular ad hoc join algorithms
Block Nested Loops Join (BNL)
Sort-Merge Join (SM)
Grace Hash Join (GH)
Hybrid Hash Join (HH)
▶ Conduct experiments while varying several parameters
Memory buffer pool size
Page size
I/O unit size

To achieve these goals, we investigated four ad hoc join algorithms that are popular in both the literature and industry, namely BNL, SM, GH, and HH. Then we conducted experiments to see the effect of tuned parameters such as …

Roadmap
▶ Introduction / Goals
▶ Ad hoc join algorithms
▶ Experimental Results
▶ Conclusion / Future Work

Assumptions
▶ Blocked I/O is available
▶ We use the buffer allocation strategy tuned for magnetic HDDs [Haas et al. VLDB97]

Before explaining the join algorithms considered in our work, let me state some assumptions. First, rather than fetching one page in each I/O operation, we use blocked I/O to sequentially read and write multiple pages per I/O operation. In the case of magnetic hard disk drives, blocked I/O is useful because it amortizes the cost of expensive disk seeks and rotational delays. Second, a lot is known about how to optimize joins with magnetic HDDs to use the available memory buffer pool effectively. Specifically, Haas et al. showed that the right buffer pool allocation strategy can have a huge impact, up to 400% improvement in some cases. In this work, we use the same buffer allocation method for both flash SSDs and magnetic HDDs. While these allocations may not be optimal for flash SSDs, our goal here is to start with the best allocation strategy for magnetic HDDs and explore what happens if we simply use the same settings when replacing a magnetic HDD with a flash SSD.
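To make the blocked I/O assumption concrete, here is a minimal Python sketch that scans a file page by page while fetching several pages per read call. It is only an illustration: the names `CountingFile`, `read_pages`, and `count_reads` are hypothetical, the 8 KB page size is borrowed from the experiments, and an in-memory buffer stands in for the disk, so it shows the reduction in the number of I/O calls rather than real timings.

```python
import io

PAGE_SIZE = 8 * 1024  # 8 KB pages, matching the page size used in the experiments


class CountingFile:
    """Wraps a binary file object and counts read() calls (hypothetical helper)."""

    def __init__(self, raw):
        self.raw = raw
        self.read_calls = 0

    def read(self, n):
        self.read_calls += 1
        return self.raw.read(n)


def read_pages(f, pages_per_io):
    """Yield fixed-size pages, fetching pages_per_io pages per read() call."""
    while True:
        block = f.read(PAGE_SIZE * pages_per_io)
        if not block:  # an empty read signals end of file
            return
        # Split the fetched block back into individual pages.
        for offset in range(0, len(block), PAGE_SIZE):
            yield block[offset:offset + PAGE_SIZE]


def count_reads(data, pages_per_io):
    """Return (pages seen, read() calls issued) for one full scan of data."""
    f = CountingFile(io.BytesIO(data))
    pages = sum(1 for _ in read_pages(f, pages_per_io))
    return pages, f.read_calls
```

With `pages_per_io=1` every page costs one call; with a larger value the same pages arrive in far fewer calls, which is the effect that amortizes seeks and rotational delays on a magnetic disk.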

Block Nested Loops Join
[Diagram: relations R and S on disk; the buffer pool holds an input buffer for R, an input buffer for S, and a result buffer. Buffer size labels: IS= , IR=B-IS]
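The block nested loops join on this slide can be sketched in Python as follows. This is a minimal in-memory illustration, not the engine used in the talk: relations are lists of tuples, `block_size` stands in for the input buffer for R, and building a hash index over each block is one common variant that is assumed here. The function and parameter names are hypothetical.

```python
def block_nested_loops_join(r, s, key_r, key_s, block_size):
    """Join tuple lists r and s on r[key_r] == s[key_s], reading r in blocks."""
    result = []
    for start in range(0, len(r), block_size):
        # Load one block of the outer relation and index it by join key.
        block = {}
        for r_row in r[start:start + block_size]:
            block.setdefault(r_row[key_r], []).append(r_row)
        # Scan the whole inner relation once per outer block.
        for s_row in s:
            for r_row in block.get(s_row[key_s], []):
                result.append(r_row + s_row)
    return result
```

The inner relation is scanned once per block of the outer one, so a larger buffer for R means fewer passes over S.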

Sort-Merge Join
[Diagram: two buffer pool layouts for R and S on disk. One shows a working space with an input buffer and an output buffer; the other shows several input buffers and a result buffer.]
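A sort-merge join can be sketched in Python along the same lines. This is an in-memory illustration under stated assumptions: `run_size` stands in for the working space used to form sorted runs, the runs are combined with the standard library's `heapq.merge`, and the merged inputs are materialized as lists for simplicity where a real engine would stream them from disk. The function names are hypothetical.

```python
import heapq


def sorted_runs(rows, key, run_size):
    """Sort phase: cut the input into runs of run_size rows and sort each run."""
    return [sorted(rows[i:i + run_size], key=key)
            for i in range(0, len(rows), run_size)]


def sort_merge_join(r, s, key_r, key_s, run_size):
    """Join tuple lists r and s on r[key_r] == s[key_s] by sorting and merging."""
    kr = lambda row: row[key_r]
    ks = lambda row: row[key_s]
    # Merge phase: combine the sorted runs of each input into key order.
    r_sorted = list(heapq.merge(*sorted_runs(r, kr, run_size), key=kr))
    s_sorted = list(heapq.merge(*sorted_runs(s, ks, run_size), key=ks))
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        a, b = kr(r_sorted[i]), ks(s_sorted[j])
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s rows sharing this key value.
            j_end = j
            while j_end < len(s_sorted) and ks(s_sorted[j_end]) == a:
                j_end += 1
            # Pair every r row with this key against that group.
            while i < len(r_sorted) and kr(r_sorted[i]) == a:
                for s_row in s_sorted[j:j_end]:
                    result.append(r_sorted[i] + s_row)
                i += 1
            j = j_end
    return result
```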

Grace Hash Join
[Diagram: two buffer pool layouts for R and S on disk. One shows an input buffer and a hash function h routing rows to buckets 1 to k; the other shows an input buffer for R, an input buffer for S, and a result buffer.]
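The two phases of a Grace hash join can be sketched in Python as below. This is a minimal in-memory illustration: Python lists play the role of the on-disk buckets, the built-in `hash` stands in for the hash function h, and the names `partition` and `grace_hash_join` are hypothetical.

```python
def partition(rows, key, k):
    """Partition phase: route each row to one of k buckets by hashing its key."""
    buckets = [[] for _ in range(k)]
    for row in rows:
        buckets[hash(row[key]) % k].append(row)
    return buckets


def grace_hash_join(r, s, key_r, key_s, k):
    """Join tuple lists r and s on r[key_r] == s[key_s] using k buckets."""
    r_buckets = partition(r, key_r, k)
    s_buckets = partition(s, key_s, k)
    result = []
    # Join phase: matching rows can only be in buckets with the same index.
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        table = {}
        for r_row in r_bucket:  # build a hash table on the R bucket
            table.setdefault(r_row[key_r], []).append(r_row)
        for s_row in s_bucket:  # probe it with the S bucket
            for r_row in table.get(s_row[key_s], []):
                result.append(r_row + s_row)
    return result
```

Both inputs are written out once as buckets and read back once, which is why blocked I/O matters for the partition phase as well as the join phase.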

Hybrid Hash Join
[Diagram: two buffer pool layouts for R and S on disk. One shows an input buffer, a hash function h, an in-memory bucket, buckets 1 to k, and a result buffer; the other shows an input buffer for R, an input buffer for S, and a result buffer.]
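A hybrid hash join differs from the Grace variant in that one bucket stays in memory during partitioning. The Python sketch below illustrates that idea under simplifying assumptions: bucket 0 is the in-memory bucket, Python lists stand in for the spilled on-disk buckets, and the built-in `hash` plays the role of h. The function name and parameters are hypothetical.

```python
def hybrid_hash_join(r, s, key_r, key_s, k):
    """Join tuple lists r and s using k buckets, keeping bucket 0 of r in memory."""
    in_memory = {}                            # hash table for bucket 0 of r
    r_spilled = [[] for _ in range(k - 1)]    # stand-ins for on-disk buckets
    s_spilled = [[] for _ in range(k - 1)]
    result = []

    # Partition r: bucket 0 is built in memory, the other buckets are spilled.
    for r_row in r:
        b = hash(r_row[key_r]) % k
        if b == 0:
            in_memory.setdefault(r_row[key_r], []).append(r_row)
        else:
            r_spilled[b - 1].append(r_row)

    # Partition s: rows in bucket 0 are probed immediately, never spilled.
    for s_row in s:
        b = hash(s_row[key_s]) % k
        if b == 0:
            for r_row in in_memory.get(s_row[key_s], []):
                result.append(r_row + s_row)
        else:
            s_spilled[b - 1].append(s_row)

    # Join the spilled bucket pairs as in a Grace hash join.
    for r_bucket, s_bucket in zip(r_spilled, s_spilled):
        table = {}
        for r_row in r_bucket:
            table.setdefault(r_row[key_r], []).append(r_row)
        for s_row in s_bucket:
            for r_row in table.get(s_row[key_s], []):
                result.append(r_row + s_row)
    return result
```

Rows that fall into the in-memory bucket are joined without being written out and read back, so the saving grows with the share of the buffer pool given to that bucket.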

Roadmap
▶ Introduction / Goals
▶ Ad hoc join algorithms
▶ Experimental Results
▶ Conclusion / Future Work

Experimental Setup
▶ A single-threaded, lightweight database engine
▶ Flash SSD and magnetic HDD
SSD: OCZ Core Series 2.5” SATA, 60 GB
HDD: TOSHIBA 5400 RPM, 320 GB
▶ Data Set: TPC-H
Customer: 730 MB
Orders: 5 GB
▶ Platform
Dual Core 3.2 GHz Intel Pentium, Red Hat
Max. DB buffer pool size: 600 MB

                           SSD                     HDD
Avg. Seek Time             None                    12 ms
Avg. Latency               0.35 ms                 5.56 ms
Read Data Transfer Rate    120 MB/sec              34 MB/sec
Write Data Transfer Rate   80 MB/sec
Price                      230.99 $ (3.85 $/GB)    129.99 $ (0.36 $/GB)
Source: OCZ and TOSHIBA

Effect of Blocked I/O (HDD)
500 MB Buffer Pool, 8 KB Page Size
[Bar chart: join time (sec), split into CPU time and I/O time, for BNL, SM, GH, and HH with non-blocked and blocked I/O on the HDD. Speedup labels on the chart: 2.1X, 2.2X, 2.0X, 2.3X.]

Effect of Blocked I/O (SSD)
500 MB Buffer Pool, 8 KB Page Size
[Bar chart: join time (sec), split into CPU time and I/O time, for BNL, SM, GH, and HH with non-blocked and blocked I/O on the SSD. Speedup labels on the chart: 1.7X, 1.6X, 1.7X, 1.9X. Other values shown: 184, 346, 530, 124, 155, 279.]
Using blocked I/O is critical.

Joins are I/O Bound? (HDD)
8 KB Page Size, Blocked I/O
[Bar charts: join time (sec), split into CPU time and I/O time, for BNL, SM, GH, and HH on the HDD with buffer pool sizes of 200 MB and 500 MB. Values shown on the charts: 0.68, 0.65, 0.69, 0.62, 0.58, 0.34, 0.61, 0.31. Slide note: "GH SM is IO bound".]

Joins are I/O Bound? (SSD)
8 KB Page Size, Blocked I/O
[Bar charts: join time (sec), split into CPU time and I/O time, for BNL, SM, GH, and HH on the SSD with buffer pool sizes of 200 MB and 500 MB. Values shown on the charts: 1.78, 1.09, 0.70, 1.35, 1.69, 0.64, 1.79, 0.8.]
Joins may become CPU-bound sooner.

Effect of Varying the Page Size (SSD)
500 MB Buffer Pool, Blocked I/O
[Bar chart: join time (sec), split into CPU time and I/O time, for BNL, SM, GH, and HH at page sizes of 2, 8, and 32 KB.]
When using blocked I/O, the page size has a small impact on join performance.
More details in our DaMoN’09 paper.

Conclusions / Future Work
▶ Traditional join optimizations continue to be important with flash SSDs
Blocked I/O dramatically improves join performance
Buffer allocation strategy has an impact on join performance
It is even more critical to consider both CPU and I/O costs
▶ Future Work
Expand the range of hardware, and consider other HDD-based configurations
Derive detailed cost models for existing join algorithms, and explore the optimal buffer allocations for flash SSDs
Provide better insights for join performance on flash SSDs

Before you go on to develop new join algorithms for flash SSDs, make sure that you can get large performance improvements with basic join algorithms by using the many traditional join optimizations.