Yonsei University: Data Processing Systems for Solid State Drive. Mincheol Shin, Yonsei University, 2015.11.23.

Overview
– Main target: data processing systems with SSDs
– Purpose: improving I/O performance
– Data processing systems
    – Relational database management systems, e.g. Oracle, MySQL, PostgreSQL, SQLite
    – Distributed data processing systems, e.g. Hadoop Distributed File System, MapReduce, Hive, HBase, Tajo, Spark
    – Key-value stores, e.g. Redis

Outline
– Solid State Drive (SSD)
– RDBMS on Solid State Drive
– Big Data Processing for Solid State Drive

Solid State Drive: Flash Memory [VLDB2011Tut2]
– Great performance
    – High I/O performance: 41 MB/s read, 7.5 MB/s program [Micron 2014]
    – Fast random access: under 0.1 ms (HDD: 2.9 to 12 ms)
    – Low energy consumption
– Four constraints of NAND flash memory
    – C1: Program granularity is a page (2 KB ~ 16 KB)
    – C2: A block (256 KB ~ 1 MB) must be erased before any page in it can be updated
    – C3: Pages must be programmed sequentially within a block
    – C4: Limited lifetime (10^4 ~ 10^5 program/erase cycles)
[Figure: a 4 KB page inside a 1 MB erase block]
[VLDB2011Tut2] P. Bonnet, L. Bouganim, I. Koltsidas, S. D. Viglas, VLDB 2011 Tutorial: System Co-Design and Data Management for Flash Devices
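
The four constraints can be made concrete with a toy model of a single erase block. This is a minimal sketch under stated assumptions: the class name `NandBlock`, its methods, and the tiny default sizes (4 pages, 3 erase cycles instead of 4 KB pages, a 1 MB block and 10^4 ~ 10^5 cycles) are hypothetical and chosen only so the rules are easy to exercise.

```python
class NandBlock:
    """Toy model of one NAND erase block enforcing constraints C1 to C4 (hypothetical)."""

    def __init__(self, pages_per_block=4, max_erase_cycles=3):
        self.pages = [None] * pages_per_block   # None means erased, i.e. programmable
        self.next_page = 0                      # C3: the only page that may be programmed next
        self.erase_count = 0
        self.max_erase_cycles = max_erase_cycles

    def program(self, page_no, data):
        # C2: a programmed page cannot be overwritten in place
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        # C3: pages are programmed strictly in order within the block
        if page_no != self.next_page:
            raise ValueError("pages must be programmed sequentially")
        self.pages[page_no] = data              # C1: the whole page is the unit of programming
        self.next_page += 1

    def read(self, page_no):
        return self.pages[page_no]

    def erase(self):
        # C4: every erase consumes one of a limited number of program/erase cycles
        if self.erase_count >= self.max_erase_cycles:
            raise RuntimeError("block worn out")
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1
```

For example, after `blk = NandBlock(); blk.program(0, b"A")`, calling `blk.program(0, b"A2")` raises `ValueError` because of C2: updating page 0 requires erasing the whole block, which is exactly why SSDs need an FTL.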

Solid State Drive
– Solid State Drive (SSD)
    – Definition: persistent data storage with neither disks nor a drive motor
    – Supports the traditional block I/O interface
– Characteristics of SSDs
    – Fast random access (inherited from flash memory)
    – Read/write imbalance (inherited from flash memory)
    – Internal parallelism (from the SSD's internal structure)
    – In-storage processing
[Figure: SSD architecture. The host interface (SATA, SAS, PCIe) receives Read(addr) and Write(addr, data); the internal algorithm (FTL) performs mapping, wear leveling, and garbage collection; the physical storage (flash chips) supports Read, Program, and Erase.]

Solid State Drive: Flash Translation Layer (FTL)
– Flash Translation Layer
    – Converts block I/O operations into internal flash operations
    – Three major components
        – Mapping: maps a logical block address (LBA) to a physical page
        – Garbage collection: reclaims blocks occupied by invalid pages
        – Wear leveling: extends the lifetime of the SSD
[Figure: logical-to-physical mapping across Blocks 1 to 4. An update writes the new data to a free page and marks the old page invalid (v = valid, I = invalid); garbage collection later erases a block of invalid pages.]
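
The mapping and garbage-collection components can be illustrated with a toy page-level FTL. This is a sketch under stated assumptions, not the design of any particular SSD: the class `PageMappingFTL`, its greedy victim policy (erase the block with the most invalid pages), and the single reserved spare block are illustrative choices, and wear leveling is not modeled.

```python
class PageMappingFTL:
    """Toy page-level FTL: out-of-place updates plus greedy garbage collection.

    Hypothetical sketch; wear leveling and persistence of the mapping table are not modeled.
    """

    def __init__(self, num_blocks=4, pages_per_block=4):
        if num_blocks < 2:
            raise ValueError("need at least one spare block for garbage collection")
        self.ppb = pages_per_block
        # Per-page state: "free", "valid" or "invalid".
        self.state = [["free"] * pages_per_block for _ in range(num_blocks)]
        self.data = [[None] * pages_per_block for _ in range(num_blocks)]
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]  # physical page -> LBA
        self.l2p = {}                                  # mapping table: LBA -> (block, page)
        self.active, self.next_page = 0, 0             # block currently receiving writes
        self.free_blocks = list(range(1, num_blocks))  # the last free block is kept as a GC spare
        self.erase_counts = [0] * num_blocks

    def read(self, lba):
        block, page = self.l2p[lba]
        return self.data[block][page]

    def write(self, lba, value):
        if lba in self.l2p:                  # update: the old copy becomes invalid (no in-place write)
            block, page = self.l2p[lba]
            self.state[block][page] = "invalid"
        while self.next_page == self.ppb:    # active block is full: find room first
            if len(self.free_blocks) > 1:
                self.active, self.next_page = self.free_blocks.pop(0), 0
            else:
                self._garbage_collect()
        self._place(self.active, lba, value)

    def _place(self, block, lba, value):
        page = self.next_page                # pages are filled sequentially (C3)
        self.state[block][page] = "valid"
        self.data[block][page] = value
        self.owner[block][page] = lba
        self.l2p[lba] = (block, page)
        self.next_page += 1

    def _garbage_collect(self):
        # Greedy victim selection: the in-use block with the most invalid pages.
        in_use = [b for b in range(len(self.state)) if b not in self.free_blocks]
        victim = max(in_use, key=lambda b: self.state[b].count("invalid"))
        if self.state[victim].count("invalid") == 0:
            raise RuntimeError("device full: no invalid pages to reclaim")
        # Copy the victim's valid pages into the spare block, which becomes active.
        self.active, self.next_page = self.free_blocks.pop(0), 0
        for page in range(self.ppb):
            if self.state[victim][page] == "valid":
                self._place(self.active, self.owner[victim][page], self.data[victim][page])
        # Erase the victim; it becomes the new spare block.
        self.state[victim] = ["free"] * self.ppb
        self.data[victim] = [None] * self.ppb
        self.owner[victim] = [None] * self.ppb
        self.erase_counts[victim] += 1
        self.free_blocks.append(victim)
```

Repeatedly overwriting a few LBAs keeps returning the latest value while `erase_counts` grows, which shows how updates turn into out-of-place writes followed by erases. A real FTL would also pick victims with wear leveling in mind so that `erase_counts` stays balanced across blocks.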

Solid State Drive: Internal Parallelism
– An SSD can read and write data in parallel
    – Channel-level parallelism (N parallel channels)
    – Package-level parallelism (interleaving)
[Figure: host interface (SATA, SAS, PCIe) connected to flash packages over two channels (Packages 1 and 2 on Channel 1, Packages 3 and 4 on Channel 2). Over time, Reads 1 to 8 and their Transfers 1 to 8 are interleaved so that Data 1 to 8 are delivered in parallel across the two channels.]
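
The timing diagram can be approximated with a small scheduling model. This is a sketch under simplifying assumptions: each page read has a cell-read phase (`t_read`, which occupies only its package) followed by a transfer phase (`t_xfer`, which occupies the shared channel bus); pages are striped round-robin over channels and then packages; the function name and the 50/25 time units are hypothetical values, not measurements.

```python
def completion_time(num_pages, channels, packages_per_channel, t_read, t_xfer):
    """Finish time for reading num_pages striped across channels and packages."""
    pkg_free = [[0.0] * packages_per_channel for _ in range(channels)]
    bus_free = [0.0] * channels
    finish = 0.0
    for i in range(num_pages):
        ch = i % channels                                # channel-level striping
        pkg = (i // channels) % packages_per_channel     # package-level interleaving
        read_done = pkg_free[ch][pkg] + t_read           # cell read once the package is idle
        xfer_start = max(read_done, bus_free[ch])        # transfer waits for the channel bus
        xfer_done = xfer_start + t_xfer
        bus_free[ch] = xfer_done
        pkg_free[ch][pkg] = xfer_done                    # package busy until its data leaves
        finish = max(finish, xfer_done)
    return finish


print(completion_time(8, 1, 1, 50, 25))  # 600.0: fully serial
print(completion_time(8, 2, 2, 50, 25))  # 175.0: 2 channels x 2 packages, as in the figure
```

In this model the cell reads on different packages overlap with transfers on the bus, and the two channels run independently, which is the effect the figure depicts.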

Solid State Drive: Internal Parallelism
– Using internal parallelism, an SSD achieves
    – High performance for sequential I/O, similar to striping (RAID 0)
        – Sequential bandwidth of a SATA SSD: write 450 MB/s, read 500 MB/s
    – High performance for concurrent I/O [VLDB2012Roh]
[VLDB2012Roh] H. Roh, S. Park, S. Kim, M. Shin, S.-W. Lee, B+-tree index optimization by exploiting internal parallelism of flash-based Solid State Drives
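
From the host side, concurrent I/O only helps if several requests are outstanding at once so the drive can dispatch them to different channels and packages. The sketch below is an illustration, not the technique of [VLDB2012Roh]: it assumes a POSIX system where Python's `os.pread` is available, and the function name `parallel_pread` and the default of 8 worker threads are hypothetical choices.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_pread(path, offsets, size, workers=8):
    """Issue several positional reads concurrently so the SSD sees a deeper queue."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread does not move a shared file offset, so threads can share fd;
            # pool.map returns the results in the order of `offsets`.
            return list(pool.map(lambda off: os.pread(fd, size, off), offsets))
    finally:
        os.close(fd)
```

A synchronous loop of single reads would keep at most one request in flight; with several threads blocked in `os.pread`, the kernel and the drive's command queue (e.g. SATA NCQ) can serve them in parallel.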

Solid State Drive: In-Storage Processing
– An SSD has its own CPU and memory for running the FTL
– The host interface is the bottleneck
    – The host interface (H/I) has lower bandwidth than the internal bandwidth of the SSD
– Two approaches
    – Light-weight filter in the SSD: filter tuples using predicates so that less data crosses the H/I
    – Sub-modules in the SSD, e.g. transaction management with copy-on-write (COW)
– Implementing in-storage processing (ISP) requires a special SSD, e.g. OpenSSD, SmartSSD, and so on
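
The benefit of a light-weight filter in the SSD is mostly about how many bytes cross the host interface. This sketch only counts those bytes; the fixed tuple size `TUPLE_BYTES` and both function names are hypothetical, and running the predicate "on the SSD" is simulated in plain Python rather than on real ISP hardware.

```python
TUPLE_BYTES = 100  # hypothetical fixed tuple size


def host_side_filter(tuples, predicate):
    """Conventional scan: ship every tuple over the host interface, filter on the host."""
    shipped = len(tuples) * TUPLE_BYTES
    return [t for t in tuples if predicate(t)], shipped


def in_storage_filter(tuples, predicate):
    """ISP-style scan: the SSD's embedded CPU filters, so only matching tuples are shipped."""
    matches = [t for t in tuples if predicate(t)]
    return matches, len(matches) * TUPLE_BYTES


table = [{"id": i, "price": i % 100} for i in range(1000)]
cheap = lambda t: t["price"] < 10
print(host_side_filter(table, cheap)[1])   # 100000 bytes cross the H/I
print(in_storage_filter(table, cheap)[1])  # 10000 bytes cross the H/I
```

Both scans return the same result; with a 10% selectivity, the in-storage version moves a tenth of the data over the bottleneck interface.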

DBMS on Solid State Drive
– Main research areas
    – Buffer management
    – Index management
    – Query processing
    – Transaction management
– Most research on DBMSs over SSDs has focused on storage I/O

DBMS on Solid State Drive: Index Management
– FD-tree
    – Exploits the sequential bandwidth of SSDs
    – B-tree + sorted runs
– PIO B-tree
    – Exploits the internal parallelism of SSDs
    – Accesses multiple B-tree nodes along multiple paths at once
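
The "B-tree + sorted runs" idea behind FD-tree can be sketched as a small head structure plus levels of sorted runs whose capacity grows geometrically. This is a simplified illustration, not FD-tree itself: the real design uses a head B+-tree and fence keys between levels, while this sketch (the hypothetical class `FDTreeSketch`, with keys only, no deletions, and arbitrary `head_capacity`/`ratio` defaults) keeps only the core point that inserts are absorbed in memory and pushed down by merging whole runs, i.e. large sequential writes instead of random in-place node updates.

```python
import bisect
import heapq


class FDTreeSketch:
    """Simplified FD-tree-like index: small head run plus geometrically growing sorted runs."""

    def __init__(self, head_capacity=4, ratio=4):
        self.head = []              # small in-memory sorted run (stands in for the head tree)
        self.levels = []            # levels[i]: sorted run stored on flash, rewritten sequentially
        self.head_capacity = head_capacity
        self.ratio = ratio

    def insert(self, key):
        bisect.insort(self.head, key)
        if len(self.head) > self.head_capacity:
            self._merge_down(0, self.head)
            self.head = []

    def _merge_down(self, i, run):
        if i == len(self.levels):
            self.levels.append([])
        merged = list(heapq.merge(self.levels[i], run))    # sequential merge of two sorted runs
        if len(merged) > self.head_capacity * self.ratio ** (i + 1):
            self.levels[i] = []                            # level overflows: push everything down
            self._merge_down(i + 1, merged)
        else:
            self.levels[i] = merged                        # whole run written sequentially

    def search(self, key):
        # Binary search in the head, then in each level from newest to oldest.
        for run in [self.head] + self.levels:
            j = bisect.bisect_left(run, key)
            if j < len(run) and run[j] == key:
                return True
        return False
```

Lookups pay one binary search per level, which is the price this design pays for turning random writes into sequential ones.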

DBMS on Solid State Drive: Query Processing
– FlashJoin: PAX-based query processing
    – NSM layout
        – The most typical page layout
        – Each tuple is stored in a contiguous region
    – PAX layout
        – The values of each column are stored in a contiguous region (minipage)
        – PAX was originally designed to reduce CPU cache misses
    – FlashScan reads only the minipages it needs
    – FlashJoin joins the minipages read by FlashScan
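
The difference between the two layouts, and why FlashScan can skip data on flash, can be shown on a single page. This sketch is illustrative only: the function names, the column widths in `widths`, and the byte counting are hypothetical, and real PAX pages also hold headers and offsets that are ignored here.

```python
def to_nsm(rows, columns):
    """NSM page: each tuple's fields stored contiguously, tuple after tuple."""
    return [tuple(row[c] for c in columns) for row in rows]


def to_pax(rows, columns):
    """PAX page: the same tuples, but each column gets its own contiguous minipage."""
    return {c: [row[c] for row in rows] for c in columns}


def flash_scan(pax_page, wanted, widths):
    """FlashScan-style projection: touch only the minipages of the wanted columns."""
    minipages = {c: pax_page[c] for c in wanted}
    bytes_read = sum(widths[c] * len(pax_page[c]) for c in wanted)
    return minipages, bytes_read


columns = ["id", "name", "price"]
widths = {"id": 4, "name": 32, "price": 8}          # hypothetical fixed widths in bytes
rows = [{"id": i, "name": f"item{i}", "price": 10 * i} for i in range(4)]
pax = to_pax(rows, columns)
minipages, n = flash_scan(pax, ["id", "price"], widths)
print(n)                                            # 48 bytes instead of 176 for the full page
```

Because flash has fast random reads, fetching just the needed minipages is cheap, whereas on an HDD the extra seeks would usually outweigh the savings.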

DBMS on Solid State Drive: Query Processing
– FMSort
    – Exploits the internal parallelism of SSDs
    – During the merge phase, …

DBMS on Solid State Drive: Transaction Mgmt.
– X-FTL: shadow paging in the SSD
    – SSD writes already resemble copy-on-write: when a page is updated, the modified page is written to an empty page and the old page is then invalidated
    – X-FTL keeps the old pages until the transaction commits
    – The original pages never need to be copied
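
The shadow-paging behavior described above can be sketched as a committed mapping table plus a per-transaction table of new pages. This is a hypothetical illustration in the spirit of X-FTL, not its actual interface: the class `TxFTLSketch` and its `write`/`read`/`commit`/`abort` methods are invented for the example, and persisting the mapping table atomically at commit (which a real device must do) is not modeled.

```python
import itertools


class TxFTLSketch:
    """Toy transactional FTL in the spirit of X-FTL (hypothetical API, not X-FTL's)."""

    def __init__(self):
        self.flash = {}                  # physical page number -> data (pages are write-once)
        self.mapping = {}                # committed mapping: LBA -> physical page
        self.pending = {}                # txid -> {LBA: new physical page}
        self.invalid = set()             # physical pages that garbage collection may reclaim
        self._next_ppn = itertools.count()

    def write(self, txid, lba, data):
        ppn = next(self._next_ppn)       # out-of-place write to a fresh page
        self.flash[ppn] = data
        txn = self.pending.setdefault(txid, {})
        if lba in txn:                   # rewritten within the same txn: drop the earlier copy
            self.invalid.add(txn[lba])
        txn[lba] = ppn                   # the committed old page stays valid: no copying needed

    def read(self, lba, txid=None):
        if txid is not None and lba in self.pending.get(txid, {}):
            return self.flash[self.pending[txid][lba]]   # a txn sees its own writes
        return self.flash[self.mapping[lba]]             # others see the committed version

    def commit(self, txid):
        for lba, ppn in self.pending.pop(txid, {}).items():
            if lba in self.mapping:
                self.invalid.add(self.mapping[lba])      # the old version is now reclaimable
            self.mapping[lba] = ppn

    def abort(self, txid):
        for ppn in self.pending.pop(txid, {}).values():
            self.invalid.add(ppn)        # shadow pages discarded; committed state untouched
```

Since the FTL already writes out of place, the old version of every updated page still exists until commit. Atomicity therefore reduces to deciding which mapping entries win, which is why no page has to be copied.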

Big Data on Solid State Drive
– Three approaches to improving performance with SSDs
    – Complete replacement: higher cost per capacity
    – Selective replacement, e.g. intermediate results on SSDs and HDFS data on HDDs
    – SSD as a cache
        – Commercial and non-commercial cache software exists
        – Open source: bcache, flashcache, EnhanceIO, dm-cache
– Project with SK Telecom
– Archival Storage of HDFS
    – Stores replicas on 4 tiers of storage
        – ARCHIVE: the slowest storage with the biggest capacity (petabytes of storage)
        – DISK, SSD, RAM_DISK
    – hdfs/ArchivalStorage.html#Storage_Types:_ARCHIVE_DISK_SSD_and_RAM_DISK
– Issues
    – Industry leads the Big Data processing platform area
    – There is no standard model
    – CPU overhead is too high
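
How the four storage tiers are used per file is governed by HDFS storage policies. The table below mirrors the block-placement rules listed in the HDFS Archival Storage documentation as we understand them (first replica vs. remaining replicas). It is a plain-Python illustration, not HDFS code, and it ignores the fallback storage types HDFS uses when the preferred tier is full. The names `POLICIES` and `replica_placement` are hypothetical.

```python
# policy name: (storage type of the first replica, storage type of the remaining replicas)
POLICIES = {
    "HOT": ("DISK", "DISK"),
    "WARM": ("DISK", "ARCHIVE"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "ONE_SSD": ("SSD", "DISK"),
    "ALL_SSD": ("SSD", "SSD"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),
}


def replica_placement(policy, replication=3):
    """Storage type chosen for each replica of a block under the given policy."""
    first, rest = POLICIES[policy]
    return [first] + [rest] * (replication - 1)


print(replica_placement("ONE_SSD"))  # ['SSD', 'DISK', 'DISK']
```

ONE_SSD is effectively the "selective replacement" approach applied per file: one replica on flash serves fast reads while the others stay on cheaper disks.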