Join Processing for Flash SSDs: Remembering Past Lessons

Presentation transcript:

Join Processing for Flash SSDs: Remembering Past Lessons
Jaeyoung Do, Jignesh M. Patel, Department of Computer Sciences, University of Wisconsin-Madison

All right, welcome to my presentation. My name is Jaeyoung Do. Today I'm going to talk about the performance of join algorithms on flash SSDs.

Flash Solid State Drives (SSDs)
▶ Benefits of flash SSDs: low power consumption, high shock resistance, fast random access
▶ Flash densities: 1995, 16 MB NAND flash; 2005, 16 GB NAND flash; 2010, 1 TB NAND flash (predicted by Samsung)
▶ Flash prices: decreasing continuously
"Flash is disk, disk is tape, and tape is dead"
[Slide charts: flash density (GB) and flash price ($/MB) over time.]

What are flash solid state drives? Flash SSDs are data storage devices that use NAND flash memory chips. Currently, the most widely available and popular storage device is the magnetic hard disk drive. However, since flash SSDs have many benefits compared to magnetic hard disk drives, they are expected to gradually replace hard disks as the primary permanent storage medium in large data centers. The two graphs show how flash density and flash price have changed over time. As capacity continues to increase and price drops, Jim Gray's prediction that "flash is disk, disk is tape, and tape is dead" is coming close to reality in many applications. So then, what are the advantages of flash SSDs over magnetic hard disks?

Flash SSDs for DBMSs
▶ Many previous works on flash SSDs: flash-based DBMSs [Gray ACM Queue 08, Graefe DaMoN 07], in-page logging [Lee VLDB 07], transaction processing [Lee VLDB 08], B+-tree indexing [Li ICDE 09], I/O benchmarks [Bouganim CIDR 09]
▶ We focus on join algorithms: new join algorithms [Shah DaMoN 08, Tsirogiannis SIGMOD 09]
What lessons learnt for magnetic HDDs still apply to flash SSDs?

The use of flash SSDs for DBMSs is being actively studied in various areas, including transaction processing and B+-tree indexing. In this work, we focus on joins, which are common but expensive query processing operations. There are already a few research efforts that develop new join algorithms for flash SSDs. Rather than developing yet another new join algorithm, our aim is to see which of the join techniques that worked for magnetic hard disk drives continue to work with flash SSDs. That is, we want to see what happens if we simply replace magnetic HDDs with flash SSDs, and examine the effects of tuning parameters on join performance. For joins, we have plenty of lessons learnt over three decades of designing and tuning join algorithms for hard-disk-based systems. If joins were run on flash SSDs, would these lessons be meaningless? As we will see later, the answer is no: many of them are still important with flash SSDs.

Goals
1. Not inventing new join algorithms
2. Providing better insights into join performance on flash SSDs
▶ Demonstrate the importance of recalling past lessons about efficient joins on magnetic HDDs
▶ Explore the effects of various parameters on joins with flash SSDs

The first goal of our research is to recall some of the important lessons about efficient join processing on magnetic hard disks and to determine whether these lessons also apply to joins using flash SSDs. In addition, we want to see how parameters tuned for the characteristics of magnetic HDDs work for joins on flash SSDs. This should give better starting points when designing new join algorithms for flash SSDs.

Our Approach
▶ Investigate four popular ad hoc join algorithms: Block Nested Loops Join (BNL), Sort-Merge Join (SM), Grace Hash Join (GH), Hybrid Hash Join (HH)
▶ Conduct experiments varying several parameters: memory buffer pool size, page size, I/O unit size

To achieve these goals, we investigated four ad hoc join algorithms that are popular in both the literature and industry, namely BNL, SM, GH, and HH. We then conducted experiments to see the effects of tuned parameters such as the buffer pool size, the page size, and the I/O unit size.

Roadmap
▶ Introduction / Goals
▶ Ad hoc join algorithms
▶ Experimental Results
▶ Conclusion / Future Work

Assumptions
▶ Blocked I/O is available
▶ We use the buffer allocation strategy tuned for magnetic HDDs [Haas et al. VLDB 97]

Before explaining the join algorithms considered in our work, let me give some assumptions. First, rather than fetching one page per I/O operation, we used blocked I/O to sequentially read and write multiple pages per I/O operation. In the case of magnetic hard disk drives, blocked I/O is useful because it amortizes the cost of expensive disk seeks and rotational delays. Second, the buffer allocation strategy determines how the buffer pool is divided up over the course of join processing, and a lot is known about how to optimize joins on magnetic HDDs to use the available memory buffer pool effectively. Specifically, Haas et al. showed that the right buffer pool allocation strategy can have a huge impact, with up to 400% improvements in some cases. In this work, we use the same buffer allocation method for both flash SSDs and magnetic HDDs. While these allocations may not be optimal for flash SSDs, our goal here is to start with the best allocation strategy for magnetic HDDs and explore what happens if we simply keep those settings when replacing a magnetic HDD with a flash SSD.
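To make the blocked I/O assumption concrete, here is a minimal Python sketch (not from the paper; the page size and pages-per-block values are illustrative) contrasting page-at-a-time reads with blocked reads that fetch several pages per request:

```python
PAGE_SIZE = 8 * 1024     # 8 KB pages, as in the experiments
PAGES_PER_BLOCK = 32     # illustrative blocked-I/O unit (256 KB)

def read_pages(path):
    """One I/O request per page: many seeks on an HDD."""
    with open(path, "rb") as f:
        while page := f.read(PAGE_SIZE):
            yield page

def read_blocked(path):
    """One I/O request per multi-page block: amortizes seek and
    rotational delay on an HDD, and reduces per-request overhead
    on an SSD."""
    with open(path, "rb") as f:
        while chunk := f.read(PAGE_SIZE * PAGES_PER_BLOCK):
            # Split the large transfer back into pages for the join code.
            for off in range(0, len(chunk), PAGE_SIZE):
                yield chunk[off:off + PAGE_SIZE]
```

Both generators hand the join the same stream of pages; only the number of I/O requests issued to the device differs.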

Block Nested Loops Join
[Slide diagram: relations R and S on disk; the buffer pool of B pages holds an input buffer for R of I_R = B - I_S pages, an input buffer for S of I_S pages, and an output buffer for the result.]
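Since the slide's diagram did not survive transcription, here is a minimal in-memory sketch of block nested loops join in Python (illustrative only: relations are lists of (key, payload) tuples and block_size plays the role of the R input buffer):

```python
def block_nested_loops_join(R, S, block_size):
    """For each block of R that fits in the buffer pool, scan all of S
    once and emit matching tuples. With |R|/block_size blocks, the scan
    of S is repeated that many times."""
    results = []
    for i in range(0, len(R), block_size):
        r_block = R[i:i + block_size]        # fill the input buffer for R
        # Index the block in memory to avoid a per-tuple scan
        # (a common refinement of the naive algorithm).
        index = {}
        for key, payload in r_block:
            index.setdefault(key, []).append(payload)
        for s_key, s_payload in S:           # one full scan of S per block
            for r_payload in index.get(s_key, ()):
                results.append((s_key, r_payload, s_payload))
    return results
```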

Sort-Merge Join
[Slide diagram: in the sort phase, R and S are each read through an input buffer, sorted in the working space of the buffer pool, and written back to disk as sorted runs through an output buffer; in the merge phase, the runs are read through input buffers and merged to produce the join result. Optimizations of the basic algorithm are out of scope here.]
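As a stand-in for the lost diagram, here is a minimal in-memory sketch of the merge logic in Python (illustrative; a real external sort-merge join would form disk-resident runs first):

```python
def sort_merge_join(R, S):
    """Sort both inputs on the join key, then merge them.
    Duplicate keys in R are handled by rescanning the matching
    range of S."""
    R = sorted(R, key=lambda t: t[0])
    S = sorted(S, key=lambda t: t[0])
    results, j = [], 0
    for r_key, r_payload in R:
        while j < len(S) and S[j][0] < r_key:   # advance S to r_key
            j += 1
        k = j
        while k < len(S) and S[k][0] == r_key:  # emit all matches
            results.append((r_key, r_payload, S[k][1]))
            k += 1
    return results
```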

Grace Hash Join
[Slide diagram: in the partition phase, R and S are each hash-partitioned by a hash function h into buckets 1..k on disk; in the join phase, each bucket of R is read through an input buffer to build an in-memory hash table, which is probed with the corresponding bucket of S to produce the result.]
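A minimal in-memory sketch of Grace hash join in Python (illustrative; the partitions would live on disk in the real algorithm):

```python
def grace_hash_join(R, S, k):
    """Phase 1: hash-partition both inputs into k buckets.
    Phase 2: join each bucket pair with an in-memory hash table."""
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for key, payload in R:
        r_parts[hash(key) % k].append((key, payload))
    for key, payload in S:
        s_parts[hash(key) % k].append((key, payload))

    results = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for key, payload in r_part:              # build
            table.setdefault(key, []).append(payload)
        for key, s_payload in s_part:            # probe
            for r_payload in table.get(key, ()):
                results.append((key, r_payload, s_payload))
    return results
```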

Hybrid Hash Join
[Slide diagram: as in Grace hash join, R and S are partitioned by a hash function h into buckets 1..k, but bucket 1 of R is kept as an in-memory hash table, so matching S tuples are joined on the fly while the remaining buckets are written to disk and joined in a second phase.]
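And a matching in-memory sketch of hybrid hash join in Python (illustrative; memory_buckets is a hypothetical parameter for how many of the k buckets fit in the buffer pool):

```python
def hybrid_hash_join(R, S, k, memory_buckets):
    """Like Grace hash join, except buckets 0..memory_buckets-1 of R
    stay in memory, so their S partners are joined immediately and
    never spilled."""
    in_mem = {}                                  # resident buckets of R
    r_spill = [[] for _ in range(k)]             # buckets bound for disk
    for key, payload in R:
        b = hash(key) % k
        if b < memory_buckets:
            in_mem.setdefault(key, []).append(payload)
        else:
            r_spill[b].append((key, payload))

    results = []
    s_spill = [[] for _ in range(k)]
    for key, s_payload in S:
        b = hash(key) % k
        if b < memory_buckets:                   # probe on the fly
            for r_payload in in_mem.get(key, ()):
                results.append((key, r_payload, s_payload))
        else:
            s_spill[b].append((key, s_payload))

    for b in range(memory_buckets, k):           # join spilled buckets
        table = {}
        for key, payload in r_spill[b]:
            table.setdefault(key, []).append(payload)
        for key, s_payload in s_spill[b]:
            for r_payload in table.get(key, ()):
                results.append((key, r_payload, s_payload))
    return results
```

Setting memory_buckets = 0 degenerates to Grace hash join; the larger the buffer pool, the more buckets stay resident and the less I/O is needed.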

Roadmap
▶ Introduction / Goals
▶ Ad hoc join algorithms
▶ Experimental Results
▶ Conclusion / Future Work

Experimental Setup
▶ A single-threaded, lightweight database engine
▶ Flash SSD: OCZ Core Series 2.5" SATA, 60 GB; magnetic HDD: TOSHIBA 5400 RPM, 320 GB
▶ Data set: TPC-H (Customer: 730 MB; Orders: 5 GB)
▶ Platform: dual-core 3.2 GHz Intel Pentium, Red Hat Linux; max DB buffer pool size 600 MB

|                          | SSD                | HDD                |
| Avg. Seek Time           | None               | 12 ms              |
| Avg. Latency             | 0.35 ms            | 5.56 ms            |
| Read Data Transfer Rate  | 120 MB/sec         | 34 MB/sec          |
| Write Data Transfer Rate | 80 MB/sec          |                    |
| Price                    | $230.99 ($3.85/GB) | $129.99 ($0.36/GB) |

Source: OCZ and TOSHIBA

Effect of Blocked I/O (HDD)
500 MB buffer pool, 8 KB page size
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH with non-blocked vs. blocked I/O on the HDD. Blocked I/O speeds the four joins up by 2.0X to 2.3X.]

Effect of Blocked I/O (SSD)
500 MB buffer pool, 8 KB page size
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH with non-blocked vs. blocked I/O on the SSD. Blocked I/O speeds the joins up by 1.6X to 1.9X; the bars are labeled with times including 184, 346, and 530 sec without blocking against 124, 155, and 279 sec with blocking.]
Using blocked I/O is critical.

Joins are I/O Bound? (HDD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB and 500 MB
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH at each buffer pool size on the HDD, annotated per bar with 0.68, 0.65, 0.69, 0.62, 0.58, 0.34, 0.61, and 0.31. SM and GH are I/O bound.]

Joins are I/O Bound? (SSD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB and 500 MB
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH at each buffer pool size on the SSD, annotated per bar with 1.78, 1.09, 0.70, 1.35, 1.69, 0.64, 1.79, and 0.8.]
Joins may become CPU-bound sooner.

Effect of Varying the Page Size (SSD)
500 MB buffer pool, blocked I/O
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH at page sizes of 2, 8, and 32 KB.]
When using blocked I/O, the page size has a small impact on join performance.

Performance Tendency (SSD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB, 400 MB, and 600 MB
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH at each buffer pool size.]
1. Superiority of HH

Performance Tendency (SSD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB, 400 MB, and 600 MB
[Slide chart: same setup as above.]
2. Competitiveness of BNL

Performance Tendency (SSD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB, 400 MB, and 600 MB
[Slide chart: same setup as above, annotated with gaps of 1.6X, 2.8X, and 1.7X at the three buffer pool sizes.]
3. GH is not the winner! [Shah et al. DaMoN 08]
More details are in our DaMoN'09 paper.

Conclusions / Future Work
▶ Traditional join optimizations continue to be important with flash SSDs: blocked I/O dramatically improves join performance; the buffer allocation strategy has an impact on join performance; it is even more critical to consider both CPU and I/O costs
▶ Future work: expand the range of hardware and consider other HDD-based configurations; derive detailed cost models for existing join algorithms and explore optimal buffer allocations for flash SSDs; provide better insights into join performance on flash SSDs

Before you go on to develop new join algorithms for flash SSDs, make sure that you can get large performance improvements with the basic join algorithms by applying the traditional join optimizations.

Backup Slides

This message is for research developers, not vendors; we are not trying to tell Oracle what to do. Look: we took the old standard algorithms, modified some parameters, and got real performance differences. Before you go develop new join algorithms for flash, make sure you have really tuned the basic algorithms; don't just throw them away. You can improve performance by almost a factor of two just by using blocking. If you look at other papers that claim new joins, they don't do this blocking and they don't use this buffer allocation strategy.

Joins are I/O Bound? (HDD)
8 KB page size, blocked I/O; buffer pool sizes 200 MB and 500 MB
Dell 500 GB 7200 RPM SATA HDD:
| Avg. Seek Time           | 8.6 msec  |
| Avg. Latency             | 4.2 msec  |
| Read Data Transfer Rate  | 45 MB/sec |
| Write Data Transfer Rate | 44 MB/sec |
[Slide chart: join time (sec), broken into CPU time and I/O time, for BNL, SM, GH, and HH at each buffer pool size, annotated per bar with 0.78, 0.73, 0.76, 0.79, 0.65, 0.40, 0.80, and 0.34.]

CPU Times of Block Nested Loops Join
| Page size | HDD User  | HDD Kernel | SSD User  | SSD Kernel |
| 2 KB      | 187.5 sec | 108.4 sec  | 204.4 sec | 324.0 sec  |
| 4 KB      | 192.8 sec | 49.0 sec   | 205.5 sec | 157.9 sec  |
| 8 KB      | 190.5 sec | 27.9 sec   | 183.6 sec | 80.6 sec   |
| 16 KB     | 186.6 sec | 15.5 sec   | 188.8 sec | 39.8 sec   |
| 32 KB     | 187.2 sec | 8.6 sec    | 185.9 sec | 22.4 sec   |

Buffer Allocations

Block Nested Loops Join
▶ BNL shows the biggest performance improvement

Improvement by buffer pool size:
| Algorithm | 100 MB | 200 MB | 300 MB | 400 MB | 500 MB | 600 MB |
| BNL       | 1.64X  | 1.59X  | 1.72X  | 1.73X  | 1.67X  | 1.65X  |
| SM        | 1.41X  | 1.45X  | 1.44X  | 1.43X  | 1.48X  |        |
| GH        | 1.34X  | 1.29X  | 1.33X  | 1.39X  | 1.30X  |        |
| HH        | 1.55X  | 1.35X  | 1.51X  | 1.50X  |        |        |

SM and GH
▶ Random writes show poor performance with flash SSDs
8 KB page size, blocked I/O; buffer pool sizes 200 MB and 500 MB
[Slide chart: join time (sec), broken into CPU time and I/O time, for SM and GH on the HDD and on the SSD at each buffer pool size.]

An Internal Structure of Flash SSDs
Each flash chip is divided into flash blocks, and each flash block into flash pages.

Operation times of NAND flash (Samsung Electronics, 2005):
| Page read   | 20 μs  |
| Page write  | 200 μs |
| Block erase | 1.5 ms |

Example:
Case 1) Writing 256 KB in one I/O operation = one flash block erase + 64 flash page writes
Case 2) Writing 256 KB in 64 I/O operations = (at worst) 64 flash block erases + 64 flash page writes
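Plugging the slide's timings into the two cases makes the gap concrete (a rough back-of-the-envelope check, assuming 4 KB flash pages so that a 256 KB flash block holds 64 of them):

```python
PAGE_WRITE = 200e-6    # 200 microseconds per flash page write
BLOCK_ERASE = 1.5e-3   # 1.5 ms per flash block erase
PAGES = 64             # 256 KB block / 4 KB flash pages (assumed geometry)

case1 = 1 * BLOCK_ERASE + PAGES * PAGE_WRITE       # one large write
case2 = PAGES * BLOCK_ERASE + PAGES * PAGE_WRITE   # worst case, 64 writes

print(f"one 256 KB write : {case1 * 1e3:.1f} ms")  # ~14.3 ms
print(f"64 x 4 KB writes : {case2 * 1e3:.1f} ms")  # ~108.8 ms
```

The worst case for small scattered writes comes out roughly 7.6 times slower, which is why write granularity matters so much on flash.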

An Internal Structure of Flash SSDs
The flash translation layer keeps a mapping table from logical block addresses (LBAs) to physical block addresses (PBAs).

Example:
Sequential writes to pages 1, 2, 3 = one flash block erase + 3 flash page writes
Random writes to pages 3, 100, 99, 1, 2 = (at worst) two flash block erases + 3 flash page moves + 3 flash page writes
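A toy sketch of why the mapping makes random writes costlier (a hypothetical FTL model, not the drive's actual firmware): the worst-case erase count grows with the number of distinct flash blocks that the written logical pages map to.

```python
PAGES_PER_BLOCK = 64   # assumed flash-block geometry

def blocks_touched(logical_pages):
    """Worst-case flash block erases ~ number of distinct
    flash blocks the written logical pages fall into."""
    return len({lpn // PAGES_PER_BLOCK for lpn in logical_pages})

print(blocks_touched([1, 2, 3]))           # sequential: 1 block
print(blocks_touched([3, 100, 99, 1, 2]))  # random: 2 blocks
```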

Flash SSDs vs. Magnetic HDDs
|              | SSD: Intel X25-E     | SSD: Memoright GT  | HDD: Seagate ST3300655LC | HDD: Seagate Barracuda 7200.12 |
| Drive Type   | Enterprise 2.5" SATA | Consumer 2.5" SATA | 15K RPM 3.5" SCSI 160    | 7200 RPM SATA                  |
| Capacity     | 64 GB                | 32 GB              | 300 GB                   | 750 GB                         |
| Price ($/GB) | $749 ($12)           | $450 ($14)         | $440 ($1.47)             | $94 ($0.13)                    |
▶ Lower power consumption: SSDs
▶ Higher shock resistance: SSDs

To see the advantages, we compare two flash SSDs and two magnetic hard disks that are widely used in research. As the table shows, current offerings of flash SSDs are more expensive than magnetic hard disk drives. But flash SSDs offer fascinating advantages such as low power consumption and high shock resistance. And perhaps the most crucial benefit of flash SSDs over magnetic HDDs is that they have no mechanically moving parts.

Flash SSDs vs. Magnetic HDDs
|                     | SSD: Intel X25-E | SSD: Memoright GT | HDD: Seagate ST3300655LC | HDD: Seagate Barracuda 7200.12 |
| Avg. Seek Time      | None             | None              | 4.0 ms                   | 8.9 ms                         |
| Avg. Latency        | 0.085 ms         | 0.1 ms            | 2 ms                     | 4.17 ms                        |
| Read Transfer Rate  | 222 MB/sec       | 87 MB/sec         | 57 MB/sec                | 47 MB/sec                      |
| Write Transfer Rate | 178 MB/sec       | 85 MB/sec         | 55 MB/sec                | 44 MB/sec                      |
▶ Fast access time, fast random reads: SSDs

Therefore, with flash SSDs there are no seek times, and latency is very small compared to magnetic hard disk drives. Consequently, they provide larger sequential read and write bandwidths and offer much faster access times. (Some of you might be wondering why; I'll explain this later.)