1/25 Flash Device Support for Database Management
Luc Bouganim, INRIA, Paris – Rocquencourt, France
Philippe Bonnet, ITU Copenhagen, Denmark
CIDR 2011
This work is partially supported by the Danish Strategic Research Council.

2/25 Outline
- Motivation
- Flash device behavior
- The Good, the Bad and the FTL
- Minimal FTL
- Bimodal FTL
- Example: Hash join on Bimodal FTL
- Conclusion
Note: These slides are an extended version of the slides shown at CIDR 2011.

3/25 DBMS on (or using) flash devices
NAND flash performance is impressive:
- Flash devices are part of the memory hierarchy
- They replace or complement hard disks
DBMS design = 3 decades of optimization based on the (initial) hard disk behavior.
Should we revisit DBMS design with respect to flash device behavior?
We first need to understand the behavior of flash devices.

4/25 Some examples of behavior (Samsung)
SR (sequential reads), SW (sequential writes) and RR (random reads) have similar (good) performance.
RW (random writes), not shown, are much more expensive: 10-30 ms.
[Figure: response time (μs) vs. IO size (KB)]

5/25 Some examples of behavior (Samsung)
Average performance can vary by an order of magnitude depending on the device state.
[Figure: random writes (16 KB), out of the box vs. after filling the device]

6/25 Some examples of behavior (Intel X25-E)
RW (16 KB) performance varies from 100 μs to 100 ms (a factor of 1000)!
SR, SW and RW have similar performance; RR are more costly!

7/25 Some examples of behavior (Fusion IO)
Capacity vs. performance tradeoff; sensitivity to device state.
[Figure: response time (μs) for 4 KB IOs, low-level formatted vs. fully written device]

8/25 Flash device behavior (1)
Understanding flash behavior [uFLIP, CIDR 2009]:
- Flash devices (e.g., SSDs) do not behave like flash chips
- Flash device performance is difficult to measure (device state): an adequate methodology is needed
- We proposed a wide benchmark to cover current and future devices
- We also observed a common behavior and deduced design hints, but this is no longer true on recent devices!
Making assumptions about flash behavior:
- Consider the behavior of flash chips (embedded context)
- Consider the behavior of a given device or of a class of devices

9/25 Flash device behavior (2)
What is actually the behavior of flash devices?
- Are updates in place inefficient?
- Are random writes slower than sequential ones?
- Is it better not to fill the whole device if we want good performance?
➪ Behavior varies across devices and firmware updates.
Should we keep chasing flash technology as it evolves?
In this talk, we propose another way to include flash devices in the DBMS landscape.

10/25 The Good
Flash device performance is impressive!
A single flash chip offers great performance:
- e.g., 40 MB/s read, 10 MB/s write
- Random access is as fast as sequential access
- Low energy consumption
A flash device contains many (e.g., 16 or 32) flash chips and provides inter-chip parallelism.
Flash devices include some (power-failure resistant) cache, on the order of MB of RAM.

11/25 The Bad
Flash chips have severe constraints!
- C1: Write granularity: writes must be performed at flash page granularity (e.g., 4 KB)
- C2: A block (e.g., 64 pages) must be erased before rewriting a page
- C3: Writes must be sequential within a flash block
- C4: Limited lifetime (from 10^4 up to 10^6 erase operations)
[Figure: write granularity is a page (4 KB); writes must be sequential within the block (64 pages); erase granularity is a block (256 KB)]

12/25 And... the FTL
The Flash Translation Layer (FTL) emulates a classical block device, handling the flash constraints:
- Distribute erases across flash (wear leveling): addresses C4 (limited lifetime)
- Make out-of-place updates (using reserved flash blocks): addresses C2 (erase before write) and C1 (writes smaller than a page become updates)
- Maintain a logical-to-physical address mapping: necessary for out-of-place updates and wear leveling; addresses C3 (sequential writes)
A garbage collector is necessary!
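To make the out-of-place update idea concrete, here is a minimal sketch of a page-mapped write path. It is only an illustration under simple assumptions (4 KB pages, 64-page blocks, naive wear leveling); it is not the FTL of any actual device.

```python
# Minimal sketch of an out-of-place, page-mapped FTL write path.
# Sizes and structures are illustrative assumptions, not a vendor design.

PAGE_SIZE = 4096          # C1: write granularity
PAGES_PER_BLOCK = 64      # C3: writes must be sequential within a block

class ToyFTL:
    def __init__(self, num_blocks):
        self.mapping = {}                      # logical page -> (block, page offset)
        self.free_blocks = list(range(num_blocks))
        self.active_block = self.free_blocks.pop(0)
        self.write_pos = 0                     # next free page in the active block
        self.erase_counts = [0] * num_blocks   # C4: track wear for wear leveling

    def write(self, logical_page, data):
        # Out-of-place update: always append to the active block (respects C2/C3),
        # then remap the logical page; the old physical page becomes garbage.
        if self.write_pos == PAGES_PER_BLOCK:
            # Active block full: pick the least-worn free block (naive wear leveling).
            # A real FTL would also trigger garbage collection when free blocks run low.
            self.free_blocks.sort(key=lambda b: self.erase_counts[b])
            self.active_block = self.free_blocks.pop(0)
            self.write_pos = 0
        self._program(self.active_block, self.write_pos, data)
        self.mapping[logical_page] = (self.active_block, self.write_pos)
        self.write_pos += 1

    def _program(self, block, page, data):
        pass  # stand-in for the actual flash program operation
```

The garbage collector (not shown) must later copy the still-valid pages out of blocks full of stale data, erase those blocks, and increment their erase counts.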

13/25 Logical to physical mapping
Two extremes:
- Page mapping: one entry per page; the mapping table is large (about 900 MB for a 1 TB flash)
- Block mapping: one entry per block; the mapping table is small (about 12 MB for a 1 TB flash), but the FTL must then search for the correct page within a block
Beside these two extremes, many techniques were designed, using temporal/spatial locality, caching, detecting "hotness" of data, distinguishing RW and SW, grouping blocks, etc.
The FTL is a complex piece of software, generally kept secret by flash device manufacturers.
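A back-of-the-envelope check of the table sizes quoted above. The 4 KB page, 256 KB block and per-entry widths are assumptions chosen to reproduce the slide's numbers, not specifications of a particular device.

```python
# Rough arithmetic behind the "12 MB" and "900 MB" mapping-table sizes.

CAPACITY   = 1 << 40          # 1 TB
PAGE_SIZE  = 4 * 1024         # 4 KB
BLOCK_SIZE = 256 * 1024       # 256 KB (64 pages of 4 KB)

num_pages  = CAPACITY // PAGE_SIZE    # 268,435,456 pages
num_blocks = CAPACITY // BLOCK_SIZE   # 4,194,304 blocks

# Page mapping: one entry per page, ~28 bits (3.5 bytes) to address 2^28 pages.
page_table_mb = num_pages * 3.5 / (1 << 20)     # ~896 MB, i.e. the "900 MB"

# Block mapping: one entry per block, ~3 bytes to address 2^22 blocks.
block_table_mb = num_blocks * 3 / (1 << 20)     # 12 MB

print(f"page-mapping table:  ~{page_table_mb:.0f} MB")
print(f"block-mapping table: ~{block_table_mb:.0f} MB")
```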

14/25 FTL designers vs. DBMS designers: goals
Flash device designers' goals:
- Hide the flash device constraints (usability)
- Improve performance for the most common workloads
- Make the device auto-adaptive
- Mask design decisions to protect their advantage (black-box approach)
DBMS designers' goals:
- Have a model for IO performance (and behavior): predictable, with a clear distinction between efficient and inefficient IO patterns ➪ needed to design the storage model and the query processing/optimization strategies
- Reach best performance, even at the price of higher complexity (having full control over actual IOs)
These goals are conflicting!

15/25 Minimal FTL: take the FTL out of the equation!
The FTL provides only wear leveling, using block mapping to address C4 (limited lifetime).
Pros:
- Maximal performance for SR, RR, SW and semi-random writes
- Maximal control for the DBMS
Cons:
- All complexity is handled by the DBMS
- All IOs must follow C1-C3: the whole DBMS must be rewritten and the flash device is dedicated
[Figure: minimal flash device; the DBMS issues constrained patterns only (C1: write granularity, C2: erase before write, C3: sequential writes within a block); the device performs block mapping and wear leveling (C4: limited lifetime) over the flash chips]

16/25 Semi-random writes (uFLIP [CIDR 2009])
- Inter-block: random
- Intra-block: sequential
Example with 3 blocks of 10 pages (see the sketch below).
[Figure: IO address over time for a semi-random write pattern]
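One way to read the pattern, as a sketch assuming the slide's 3 blocks of 10 pages (the numbering and the generator itself are illustrative): at each step a block is chosen at random among those not yet full, and its next page is written sequentially, so constraint C3 is respected.

```python
import random

# Sketch of a semi-random write pattern: block choice is random,
# page order inside each block is sequential (respects C3).
# 3 blocks of 10 pages as in the slide's example; numbering is illustrative.

def semi_random_pattern(num_blocks, pages_per_block):
    cursor = [0] * num_blocks              # next page to write in each block
    pending = list(range(num_blocks))      # blocks that are not yet full
    while pending:
        b = random.choice(pending)         # inter-block: random
        p = cursor[b]                      # intra-block: sequential
        cursor[b] += 1
        if cursor[b] == pages_per_block:
            pending.remove(b)
        yield b * pages_per_block + p      # logical IO address

print(list(semi_random_pattern(num_blocks=3, pages_per_block=10)))
```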

17/25 Bimodal FTL: a simple idea…
Bimodal flash devices:
- Provide a tunnel for those IOs that respect constraints C1-C3, ensuring maximal performance
- Manage other, unconstrained IOs on a best-effort basis
- Minimize interference between these two modes of operation
Pros:
- Flexible
- Maximal performance and control for the DBMS for constrained IOs
Cons:
- No behavior guarantees for unconstrained IOs
[Figure: bimodal flash device; constrained patterns (C1, C2, C3) bypass update management and garbage collection, unconstrained patterns go through them; block mapping and wear leveling (C4) apply to both, on top of the flash chips]

18/25 Bimodal FTL: easy to implement
Constrained IOs lead to optimal blocks. Optimal blocks can be trivially:
- mapped using a small map table in safe cache (16 MB for a 1 TB device)
- detected using a flag and a cursor in safe cache (see the sketch below)
No interference! No change to the block device interface:
- Only two constants need to be exposed: block size and page size
[Figure: an optimal block, written sequentially as pages 0-5 (Flag = Optimal, CurPos = 6), vs. a non-optimal block containing out-of-place updates 0, 1, 1', 1'', 0', 2 (Flag = Non-Optimal, CurPos = 6)]
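A minimal sketch of the per-block bookkeeping described above (names and structure are assumptions for illustration, not the paper's implementation): a block stays flagged optimal as long as every write lands exactly at its cursor.

```python
# Sketch of the per-block flag/cursor bookkeeping for a bimodal FTL.
# Structure and names are illustrative assumptions, not the paper's design.

PAGES_PER_BLOCK = 64

class BlockState:
    """Per-block metadata kept in the device's safe cache (illustrative)."""
    def __init__(self):
        self.optimal = True   # flag: block has only ever been written sequentially
        self.cur_pos = 0      # next expected page offset within the block

    def on_write(self, page_offset):
        if self.optimal and page_offset == self.cur_pos:
            # Constrained write (C1-C3 respected): fast path, no remapping needed.
            self.cur_pos += 1
        else:
            # Unconstrained write: the block falls back to the regular FTL path
            # (out-of-place update); it still consumes a physical page.
            self.optimal = False
            self.cur_pos = min(self.cur_pos + 1, PAGES_PER_BLOCK)

    def on_trim(self):
        # TRIM erases/frees the block: back to Free (CurPos = 0), optimal again.
        self.optimal = True
        self.cur_pos = 0
```

With this state in safe cache, constrained writes can bypass the regular update-management and garbage-collection machinery entirely.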

19/25 Bimodal FTL: better than Minimal + FTL
Block state transitions:
- Free (CurPos = 0) → Optimal: on writes arriving at CurPos (CurPos++)
- Optimal → Non-optimal: on a write at an address ≠ CurPos
- Non-optimal → Optimal: through garbage collector actions
- Any state → Free: on TRIM
A non-optimal block can become optimal again (thanks to the GC).
[Figure: the GC compacts a non-optimal block (pages 0, 1, 1', 1'', 0', 2; Flag = Non-Optimal, CurPos = 6) into an optimal block holding only the valid pages 0', 1'', 2 (Flag = Optimal, CurPos = 3)]
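Continuing the sketch above (again purely illustrative; compact() and its inputs are hypothetical helpers), a garbage-collection pass can copy only the valid pages of a non-optimal block, in order, into a fresh block, which then satisfies the optimal invariant again:

```python
# Sketch continued: the GC rewrites only the valid (most recent) version of
# each logical page, sequentially, into a fresh block, which becomes optimal.

def compact(valid_pages_in_order, fresh_block_state, program_page):
    # valid_pages_in_order: latest version of each logical page, e.g. [0', 1'', 2]
    # fresh_block_state:    BlockState of an erased block (CurPos = 0)
    # program_page(offset, page): callback standing in for the flash program op
    for offset, page in enumerate(valid_pages_in_order):
        program_page(offset, page)          # sequential writes, so C3 holds
        fresh_block_state.on_write(offset)  # cursor advances, flag stays Optimal
    return fresh_block_state                # Flag = Optimal, CurPos = number of valid pages
```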

20/25 Bimodal FTL does not exist yet!
A simple test: the device must support the TRIM operation (only recent SSDs do).
[Figure: results of the test on an Intel X25-M]

21/25 Impact on DBMS design
Using bimodal flash devices, we have a solid basis for designing efficient DBMSs on flash:
- Which IOs should be constrained? i.e., which parts of the DBMS should be redesigned?
- How to enforce these constraints?
Revisiting the literature:
- Solutions based on flash chip behavior enforce the C1-C3 constraints
- Solutions based on existing classes of devices might not

22/25 Example: hash join on HDD
Tradeoff: IOSize vs. memory consumption.
- IOSize should be as large as possible, e.g., 256 KB - 1 MB, to minimize IO cost when writing or reading partitions
- IOSize should be as small as possible to minimize memory consumption: one-pass partitioning needs 2 × IOSize × NbPartitions of RAM
- With insufficient memory, partitioning becomes multi-pass and performance degrades! (a worked example follows)
[Figure: one-pass partitioning vs. multi-pass partitioning (2 passes)]
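A rough illustration of this tradeoff; the partition count and IO sizes below are made-up numbers, not figures from the talk.

```python
# Illustrative arithmetic for the IOSize vs. memory tradeoff in one-pass
# hash-join partitioning. All numbers are invented for illustration.

def one_pass_ram(io_size, nb_partitions):
    # The slide's rule of thumb: 2 x IOSize x NbPartitions
    # (double buffering of one output buffer per partition).
    return 2 * io_size * nb_partitions

NB_PARTITIONS = 1000
for io_size in (1 << 20, 256 << 10, 4 << 10):   # 1 MB, 256 KB, 4 KB
    mb = one_pass_ram(io_size, NB_PARTITIONS) / (1 << 20)
    print(f"IOSize = {io_size >> 10:>5} KB -> {mb:>7.1f} MB of RAM")

# With 1 MB IOs, one-pass partitioning into 1000 partitions needs ~2 GB of RAM;
# with 4 KB IOs, only ~8 MB.
```

On a hard disk the small IO sizes would be prohibitively slow; on a bimodal SSD they are not, which is the point of the next slide.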

23/25 Hash join on SSD and on bimodal SSD
With non-bimodal SSDs:
- No behavior guarantees, but choosing IOSize = block size (128 - 256 KB) should bring good performance
With bimodal SSDs:
- Maximal performance is guaranteed (constrained patterns)
- Use semi-random writes
- IOSize can be reduced down to the page size (2 - 4 KB) with no penalty
- Memory savings and performance improvement

24/25 Conclusion
Adding bimodality is necessary to efficiently support DBMSs on flash devices:
- The DBMS designer retains control over IO performance
- The DBMS leverages the performance potential of flash chips
Adding bimodality to the FTL does not hinder competition between flash device manufacturers; they can:
- bring down the cost of constrained IO patterns (e.g., using parallelism)
- bring down the cost of unconstrained IO patterns without jeopardizing DBMS design
This study is very preliminary; many issues remain to explore:
- More complex storage systems (e.g., RAID, ASM, etc.)
- What abstraction for a flash device? A memory abstraction (block device interface) or a network abstraction (two systems collaborating)?

25/25 More information
Bimodal flash devices: P. Bonnet, L. Bouganim. Flash Device Support for Database Management. 5th Biennial Conference on Innovative Data Systems Research (CIDR), January 2011.
Benchmark: L. Bouganim, B. Jónsson, P. Bonnet. uFLIP: Understanding Flash IO Patterns. 4th Biennial Conference on Innovative Data Systems Research (CIDR), January 2009 (Best Paper Award).
Energy consumption: M. Bjørling, P. Bonnet, L. Bouganim, B. Þ. Jónsson. uFLIP: Understanding the Energy Consumption of Flash Devices. IEEE Data Engineering Bulletin, vol. 33, no. 4, December 2010.
Demonstration: M. Bjørling, L. Le Folgoc, A. Mseddi, P. Bonnet, L. Bouganim, B. Þ. Jónsson. Performing Sound Flash Device Measurements: The uFLIP Experience. ACM SIGMOD International Conference on Management of Data, June 2010.