Design of Flash-Based DBMS: An In-Page Logging Approach

Design of Flash-Based DBMS: An In-Page Logging Approach (SIGMOD’07)
Sang-Won Lee, School of Info & Comm Eng, Sungkyunkwan University, Suwon, Korea 440-746, wonlee@ece.skku.ac.kr
Bongki Moon, Department of Computer Science, University of Arizona, Tucson, AZ 85721, U.S.A., bkmoon@cs.arizona.edu
COMPUTER SCIENCE DEPARTMENT

Outline
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- Reviews

Flash Memory
- Flash memory is a type of electrically-erasable programmable read-only memory (EEPROM).
- A page is the unit of read and write operations. Typical value: 2KB.
- A write operation can only clear bits (change their value from 1 to 0). The only way to change a bit from 0 to 1 is to erase an entire region of memory.
- This fixed-size region is called an erase unit, erase block, or simply a block. Typical value: 128KB for large flash memory.
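The bit-level write rule above can be illustrated with a toy model. This is a minimal sketch, not a device driver: `EraseBlock` and its methods are hypothetical names, and real NAND parts also limit how many times a page may be programmed between erases, which this model ignores.

```python
ERASED = 0xFF  # erased NAND cells read back as all 1 bits


class EraseBlock:
    """Toy model of one erase unit: programming can only clear bits (1 -> 0)."""

    def __init__(self, pages_per_block=64, page_size=2048):
        self.page_size = page_size
        self.pages = [bytearray([ERASED] * page_size) for _ in range(pages_per_block)]
        self.erase_count = 0

    def program(self, page_no, data):
        """Write `data` into a page; each new byte is ANDed into the old one."""
        page = self.pages[page_no]
        for i, b in enumerate(data):
            page[i] &= b  # a 0 bit can never become 1 here

    def erase(self):
        """The only way to turn 0 bits back into 1s: reset the whole block."""
        for page in self.pages:
            page[:] = bytes([ERASED] * self.page_size)
        self.erase_count += 1


blk = EraseBlock(pages_per_block=2, page_size=4)
blk.program(0, b"\x0f\x0f\x0f\x0f")
blk.program(0, b"\xf0\xff\xff\xff")  # "overwrite" attempt without erasing
print(blk.pages[0].hex())  # 000f0f0f
blk.erase()
print(blk.pages[0].hex())  # ffffffff
```

The second `program` call cannot restore the high bits of the first byte; only `erase` can, and it resets every page in the block at once.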

Characteristics of Flash
- No in-place update: a data item must be erased before it can be written again, and an erase unit (16KB or 128KB) is much larger than a sector (512 bytes).
- No mechanical latency: flash memory is an electronic device without moving parts, so it provides uniform random access speed without seek or rotational latency.
- Asymmetric read and write speed: reads are typically at least twice as fast as writes, so write (and erase) optimization is critical.

Magnetic Disk vs Flash Memory
                 Read time    Write time    Erase time
Magnetic Disk    12.7 ms      13.7 ms       N/A
NAND Flash       80 μs        200 μs        1.5 ms
- Magnetic Disk: Seagate Barracuda 7200.7 ST380011A
- NAND Flash: Samsung K9WAG08U1A 16 Gbits SLC NAND
- Unit of read/write: 2KB; unit of erase: 128KB
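To see why rewriting an erase unit is expensive, the sketch below combines the NAND latencies on this slide (80 μs read and 200 μs write per 2KB page, 1.5 ms erase per 128KB unit). It assumes that relocating a full erase unit means reading all 64 pages, programming them into a clean block, and erasing the old block once. Under that assumption the total is about 19.4 ms, in line with the 20 ms merge cost used in the TPC-C simulation settings, although the slides do not say how that parameter was derived.

```python
# NAND latencies from the disk-vs-flash table, in microseconds.
READ_US = 80      # read one 2KB page
WRITE_US = 200    # program one 2KB page
ERASE_US = 1500   # erase one 128KB erase unit

FLASH_PAGE_KB = 2
ERASE_UNIT_KB = 128


def erase_unit_copy_us(pages):
    """Assumed cost of relocating `pages` live pages: read each one, program it
    into a clean block, and erase the old block once."""
    return pages * (READ_US + WRITE_US) + ERASE_US


pages_per_unit = ERASE_UNIT_KB // FLASH_PAGE_KB
print(pages_per_unit)                             # 64
print(erase_unit_copy_us(pages_per_unit) / 1000)  # 19.42 (ms)
```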

Category of Flash Memory
- NAND vs. NOR flash
  - NOR: high erase cost (several seconds); directly addressable (access is by bit or byte)
  - NAND: relatively low erase cost (several ms); access is by pages
- MLC vs. SLC NAND flash
  - MLC (Multi-Level Cell): stores multiple bits per cell, but has significantly slower read and write speeds and a roughly 10x shorter lifetime
  - SLC (Single-Level Cell): stores only one bit per cell
  - SLC flash has much better performance, lifetime, and reliability than MLC

Small Flash vs. Large Flash
- Small flash memory has been widely used in PDAs, MP3 players, mobile phones, sensor networks, ...
  - Advantages: size, weight, shock resistance, power consumption, noise, ...
  - Typical size: a few gigabytes
- Recently, some vendors have developed large flash memory called Flash SSD (Solid State Disk)
  - Mainly used in notebook PCs, e.g., Apple MacBook Air, ThinkPad X300
  - Typical size: > 16GB

NAND Flash System Architecture
- Flash Translation Layer (FTL): a software layer that lets NAND flash fully emulate a magnetic disk. Its responsibilities include:
  - Logical-to-physical mapping
  - Garbage collection
  - Power-off recovery
  - Wear-leveling
  - Bad block management
  - Error correction code (ECC)
  - Power management

Different FTLs for Large and Small Flash Memory
- Page-mapping FTL (used for small flash memory)
  - Maintains the mapping between each logical page and its physical page individually
  - Log-structured architecture
  - Needs a large amount of memory for its mapping information, which must be reconstructed by scanning the whole flash memory at start-up; this may result in a long mount time
- Block-mapping FTL
  - Needs only a small amount of memory for its mapping information
  - Any update causes a whole-block rewrite (that is why random writes are so slow!)
  - In real products, there are some optimizations to improve the performance of concentrated updates
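A minimal sketch of the page-mapping idea, assuming a hypothetical `PageMappingFTL` class that leaves out garbage collection, wear-leveling, and power-off recovery. It shows out-of-place updates through an in-RAM mapping table; the helper `table_entries` does the arithmetic behind the memory-footprint difference between page-level and block-level mapping for an assumed 16GB device.

```python
class PageMappingFTL:
    """Hypothetical page-mapping FTL: any logical page may live in any physical page."""

    def __init__(self, num_blocks, pages_per_block):
        self.mapping = {}   # logical page number -> physical page number (kept in RAM)
        self.valid = set()  # physical pages that currently hold live data
        # Clean physical pages, consumed in log-structured (append-only) order.
        self.free = list(range(num_blocks * pages_per_block))

    def write(self, lpn):
        """Out-of-place update: program a clean page and invalidate the old copy."""
        if not self.free:
            raise RuntimeError("no clean pages left; garbage collection would run here")
        ppn = self.free.pop(0)
        old = self.mapping.get(lpn)
        if old is not None:
            self.valid.discard(old)  # the stale copy becomes garbage
        self.mapping[lpn] = ppn
        self.valid.add(ppn)
        return ppn


def table_entries(capacity_bytes, unit_bytes):
    """Number of mapping entries when mapping at `unit_bytes` granularity."""
    return capacity_bytes // unit_bytes


GiB, KiB = 2**30, 2**10
print(table_entries(16 * GiB, 2 * KiB))    # page mapping: 8388608 entries
print(table_entries(16 * GiB, 128 * KiB))  # block mapping: 131072 entries
```

The 64x smaller table is why block mapping suits large flash memory, at the cost of the whole-block rewrites described above.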

Flash Memory for Server Applications
- More recently, because of the advantages of flash memory and its increasing capacity, there is a new trend toward using large flash memory for database server applications
- Jim Gray said: "Tape is Dead, Disk is Tape, Flash is Disk!"

Outline (current section: Disk-Based DBMS on Flash Memory)
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews

Disk-Based DBMS on Flash Memory
- What happens if a disk-based DBMS runs on flash memory?
  - Because there is no in-place update, each dirty block write copies the whole block into another clean block
  - This consumes free blocks quickly, causing frequent garbage collection and erase operations
[Figure: SQL Update / Insert / Delete → Buffer Mgr. (page: 4KB) → dirty block write → Flash Memory data block area (erase unit: 128KB)]
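The cost of this behavior can be put in numbers with a small sketch. It assumes the worst case the slide describes: every flush of a 4KB dirty page copies its whole 128KB erase unit into a clean block, leaving one stale block to erase. `block_rewrite_overhead` is a hypothetical helper, and real FTL optimizations would reduce these figures.

```python
def block_rewrite_overhead(dirty_flushes, page_kb=4, erase_unit_kb=128):
    """Assumed worst case: every dirty-page flush copies its whole erase unit to a
    clean block, so the old block must later be erased.

    Returns (write amplification, total KB written, erases triggered).
    """
    amplification = erase_unit_kb / page_kb  # KB written per KB of changed data
    kb_written = dirty_flushes * erase_unit_kb
    erases = dirty_flushes                   # one stale block per flush
    return amplification, kb_written, erases


print(block_rewrite_overhead(1000))  # (32.0, 128000, 1000)
```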

Disk-Based DBMS Performance
- Run SQL queries on a commercial DBMS
  - Sequential scan or update of a table
  - Non-sequential read or update of a table (via a B-tree index)
- Experimental settings
  - Storage: magnetic disk vs. M-Tron SSD (Samsung flash chip)
  - Data page size: 8KB
  - 10 tuples per page, 640,000 tuples in a table (64,000 pages, 512MB)

Disk-Based DBMS Performance: Read
                 Disk               Flash
Sequential       14.0 sec           11.0 sec
Non-sequential   61.1 ~ 172.0 sec   12.1 ~ 13.1 sec
- The result is not surprising at all
- Hard disk: read performance is poor for non-sequential access, mainly because of seek and rotational latency
- Flash memory: read performance is insensitive to access patterns

Disk-Based DBMS Performance: Write
                 Disk                Flash
Sequential       34.0 sec            26.0 sec
Non-sequential   151.9 ~ 340.7 sec   61.8 ~ 369.9 sec
- Hard disk: write performance is poor for non-sequential access, mainly because of seek and rotational latency
- Flash memory: write performance for non-sequential access is poor (and can be worse than disk) due to out-of-place updates and erase operations
- This demonstrates the need for write optimization in a DBMS running on flash

Outline (current section: Flash-Based DBMS: In-Page Logging approach)
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews

In-Page Logging (IPL) Approach: Design Principles
- Take advantage of the characteristics of flash memory: fast read speed
- Overcome the “erase-before-write” limitation of flash memory
- Minimize changes to the DBMS architecture: limited to the buffer manager and storage manager

Design of the IPL
- Logging on a per-page basis in both memory and flash
- An in-memory log sector can be associated with a buffer frame in memory; it is allocated on demand when a page becomes dirty
- An in-flash log segment is allocated in each erase unit; the log area is shared by all the data pages in an erase unit
[Figure: database buffer holding an in-memory data page (8KB, updated in place) with its in-memory log sector (512B); in flash memory, each 128KB erase unit holds 15 data pages (8KB each) plus an 8KB log area of 16 sectors]
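The erase-unit layout described above can be checked with a little arithmetic. `IPLLayout` is a hypothetical helper whose defaults are the sizes on this slide; changing `log_area_kb` shows how a bigger log segment displaces data pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPLLayout:
    """Hypothetical description of one IPL erase unit (defaults from the slide)."""
    erase_unit_kb: int = 128
    data_page_kb: int = 8
    log_area_kb: int = 8
    log_sector_bytes: int = 512

    @property
    def data_pages(self) -> int:
        # Everything that is not log area holds whole data pages.
        return (self.erase_unit_kb - self.log_area_kb) // self.data_page_kb

    @property
    def log_sectors(self) -> int:
        return self.log_area_kb * 1024 // self.log_sector_bytes

    @property
    def log_overhead(self) -> float:
        # Fraction of each erase unit reserved for the log area.
        return self.log_area_kb / self.erase_unit_kb


layout = IPLLayout()
print(layout.data_pages, layout.log_sectors, layout.log_overhead)  # 15 16 0.0625
```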

IPL Write
- Whenever an update is performed on a data page, the in-memory copy of the data page is updated immediately (update-in-place). In addition, the IPL buffer manager adds a physiological log record to the in-memory log sector.
- When a dirty page is evicted by the replacement policy, or when its in-memory log sector is full, the content of the data page is not written to flash memory. Instead, the in-memory log sector is written to the in-flash log segment.
[Figure: Update / Insert / Delete → Buffer Mgr. (page: 8KB, update-in-place; 512B sector of physiological log) → Flash Memory data block area (block: 128KB)]
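A minimal sketch of the IPL write path. It assumes a simplified log record that is just a byte-range overwrite `(offset, new_bytes)` with an assumed 4-byte header; the paper's physiological records are richer. `BufferFrame`, `IPLBufferManager`, and the dict standing in for the in-flash log area are all hypothetical. The point it shows: evicting a dirty page writes only its 512B log sector, never the 8KB page.

```python
SECTOR_BYTES = 512       # size of an in-memory log sector
RECORD_HEADER_BYTES = 4  # assumed per-record overhead (illustrative only)


class BufferFrame:
    """Hypothetical buffer frame: a data page plus its in-memory log sector."""

    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = bytearray(data)
        self.log = []        # pending (offset, new_bytes) records
        self.log_bytes = 0   # bytes used in the in-memory log sector


class IPLBufferManager:
    def __init__(self, flash_log_area):
        # Stand-in for in-flash log segments: page_id -> list of flushed sectors.
        self.flash_log_area = flash_log_area

    def update(self, frame, offset, new_bytes):
        # 1) Apply the change to the in-memory page right away.
        frame.data[offset:offset + len(new_bytes)] = new_bytes
        # 2) Log it; if the record does not fit, flush the full sector first.
        size = RECORD_HEADER_BYTES + len(new_bytes)
        if frame.log_bytes + size > SECTOR_BYTES:
            self.flush_log_sector(frame)
        frame.log.append((offset, bytes(new_bytes)))
        frame.log_bytes += size

    def evict(self, frame):
        # Eviction writes only the log sector; the 8KB data page stays unwritten.
        if frame.log:
            self.flush_log_sector(frame)

    def flush_log_sector(self, frame):
        sectors = self.flash_log_area.setdefault(frame.page_id, [])
        sectors.append(list(frame.log))
        frame.log.clear()
        frame.log_bytes = 0


flash_log = {}
mgr = IPLBufferManager(flash_log)
frame = BufferFrame(page_id=3, data=bytes(8192))
mgr.update(frame, 0, b"abc")
mgr.evict(frame)
print(flash_log)  # {3: [[(0, b'abc')]]}
```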

IPL Read
- When a page is read from flash, its current version is computed on the fly:
  - Read the original copy of Pi and all log records belonging to Pi from flash (I/O overhead)
  - Apply the “physiological actions” to the copy read from flash (CPU overhead) to reconstruct the current in-memory copy
[Figure: a flash erase unit with a data area (120KB: 15 pages) and a log area (8KB: 16 sectors); the buffer manager reads Pi and its log records and rebuilds the current page]
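The on-the-fly reconstruction can be sketched as follows, using a simplified `(offset, new_bytes)` record format that is an assumption here, not the paper's format. The log area is modeled as a hypothetical list of `(page_id, records)` sectors in write order; because the log area is shared by all pages of an erase unit, the reader must skip sectors that belong to other pages.

```python
def read_page(page_id, data_area, log_area):
    """Rebuild the current version of `page_id`.

    data_area: dict page_id -> bytes as stored in the erase unit's data area.
    log_area:  list of (page_id, records) sectors in the order they were written,
               where records is a list of (offset, new_bytes).
    """
    page = bytearray(data_area[page_id])   # I/O: original copy of the page
    for owner, records in log_area:        # I/O: scan the shared log area
        if owner != page_id:
            continue                       # sector belongs to another page
        for offset, new_bytes in records:  # CPU: replay in write order
            page[offset:offset + len(new_bytes)] = new_bytes
    return bytes(page)


data = {0: b"aaaa", 1: b"bbbb"}
log = [(0, [(1, b"X")]), (1, [(0, b"Y")]), (0, [(1, b"Z"), (3, b"Q")])]
print(read_page(0, data, log))  # b'aZaQ'
print(read_page(1, data, log))  # b'Ybbb'
```

Replaying in write order matters: the later record at offset 1 (`Z`) must win over the earlier one (`X`).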

IPL Merge
- When all free log sectors in an erase unit are consumed:
  - Log records are applied to the corresponding data pages
  - The current data pages are copied into a new erase unit
[Figure: physical flash block Bold (15 data pages plus a full 8KB log area of 16 sectors) is merged into Bnew (15 up-to-date data pages plus a clean log area)]
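A sketch of the merge step under the same simplified, hypothetical model: `data_area` maps page ids to page images and `log_area` is a list of `(page_id, records)` sectors. Copying into a new erase unit and erasing the old one are represented only by returning fresh structures.

```python
LOG_SECTORS_PER_UNIT = 16  # 8KB log area / 512B sectors


def merge(data_area, log_area):
    """Hypothetical IPL merge of one erase unit.

    Applies every logged change to its data page and returns
    (new_data_area, new_log_area) for the fresh erase unit; the old unit
    would then be erased.
    """
    if len(log_area) < LOG_SECTORS_PER_UNIT:
        raise ValueError("merge is triggered only when all log sectors are used")
    new_data = {}
    for page_id, image in data_area.items():
        page = bytearray(image)
        for owner, records in log_area:
            if owner == page_id:
                for offset, new_bytes in records:
                    page[offset:offset + len(new_bytes)] = new_bytes
        new_data[page_id] = bytes(page)  # up-to-date copy for the new unit
    return new_data, []                  # the new unit starts with a clean log area


data = {0: b"aa", 1: b"bb"}
log = [(0, [(0, b"X")])] + [(1, [(1, b"Y")])] * 15
print(merge(data, log))  # ({0: b'Xa', 1: b'bY'}, [])
```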

Why Can IPL Improve the Write Performance of a DBMS?
- The number of flash writes does not decrease; it may actually increase, because:
  - A log sector is written out whenever it becomes full, which introduces extra writes
  - The merge operation introduces overhead
- Then why can IPL improve write performance?
  - IPL overcomes the erase-before-write property of flash
  - It reduces the number of erasures

IPL Simulation with TPC-C
- TPC-C log data generation
  - Run a commercial DBMS to generate reference streams of the TPC-C benchmark
  - The HammerOra utility is used for TPC-C workload generation
  - Each trace contains log records of physiological updates as well as physical page writes
  - Average length of a log record: 20 ~ 50B
- TPC-C traces
  - 100M.20M.10u: 100MB DB, 20MB buffer, 10 simulated users
  - 1G.20M.100u: 1GB DB, 20MB buffer, 100 simulated users
  - 1G.40M.100u: 1GB DB, 40MB buffer, 100 simulated users
- Parameter settings
  - Write (2KB): 200 μs
  - Merge (128KB): 20 ms

Log Segment Size vs. Merges
- TPC-C write frequencies are highly skewed (with low temporal locality)
  - Erase units containing hot pages consume their log sectors quickly, which could cause a large number of erase operations
- More log sectors per erase unit mean more storage overhead but less frequent merges
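A toy model of this trade-off, assuming each merge consumes exactly one full log segment and ignoring carried-over records: the number of merges for an erase unit is roughly its sector writes divided by the sectors in its log segment. `estimated_merges` is a hypothetical helper, and the per-unit write counts are made-up numbers chosen only to mimic a skewed workload; they are not TPC-C measurements.

```python
SECTOR_BYTES = 512


def estimated_merges(sector_writes_per_unit, log_area_kb):
    """Toy estimate: each erase unit merges once per full log segment it fills."""
    sectors = log_area_kb * 1024 // SECTOR_BYTES
    return sum(writes // sectors for writes in sector_writes_per_unit)


# Made-up skewed workload: one hot erase unit, one warm, one cold.
workload = [400, 40, 4]
for log_kb in (4, 8, 16):
    overhead = log_kb / 128  # share of a 128KB erase unit spent on the log
    print(log_kb, estimated_merges(workload, log_kb), overhead)
# 4 55 0.03125
# 8 27 0.0625
# 16 13 0.125
```

Nearly all merges come from the hot unit, which is why doubling the log segment roughly halves the merge count while doubling the storage overhead.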

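Estimated Write Performance
- Performance trend with varying buffer sizes; the log segment size was fixed at 8KB
- Estimated write time:
  - With IPL: $T_{\text{IPL}} = N_{\text{sector}} \times 200\,\mu\text{s} + N_{\text{merge}} \times 20\,\text{ms}$
  - Without IPL: $T_{\text{no IPL}} = \alpha \times N_{\text{page}} \times 20\,\text{ms}$
  - where $N_{\text{sector}}$ is the number of sector writes, $N_{\text{merge}}$ the number of merges, $N_{\text{page}}$ the number of page writes, and $\alpha$ the probability that a page write causes an erase operation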
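The two estimates translate directly into code. The sketch uses the simulation parameters (200 μs per sector write, 20 ms per merge) and keeps times in microseconds so the arithmetic stays exact. The function names are hypothetical, and the input counts in the usage lines are illustrative, not results from the paper.

```python
SECTOR_WRITE_US = 200  # simulation parameter: one write
MERGE_US = 20_000      # simulation parameter: one 128KB merge (20 ms)


def write_time_ipl_us(sector_writes, merges):
    """T_IPL = (# sector writes) x 200 us + (# merges) x 20 ms."""
    return sector_writes * SECTOR_WRITE_US + merges * MERGE_US


def write_time_no_ipl_us(page_writes, alpha):
    """T_noIPL = alpha x (# page writes) x 20 ms, where alpha is the probability
    that a page write triggers an erase."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a probability")
    return alpha * page_writes * MERGE_US


# Hypothetical counts, only to show how the formulas are used.
print(write_time_ipl_us(sector_writes=1000, merges=10))   # 400000
print(write_time_no_ipl_us(page_writes=1000, alpha=0.5))  # 10000000.0
```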

Support for Recovery
- IPL helps realize a lean recovery mechanism
  - Additional logging: a transaction log and a list of dirty pages
- Transaction commit
  - Similar to flushing the log tail: an in-memory log sector is forced out to flash if it contains at least one log record of a committing transaction
  - No explicit REDO action is required at system restart
- Transaction abort
  - De-apply the log records of an aborting transaction
  - Because a regular merge is irreversible, use a selective merge instead:
    - If committed, merge the log record
    - If aborted, discard the log record
    - If active, carry the log record over to the new erase unit
  - To avoid thrashing, allow an erase unit to have overflow log sectors
  - No explicit UNDO action is required
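The selective-merge rule can be sketched as below. Log records are modeled as hypothetical `(tx_id, page_id, offset, new_bytes)` tuples and transaction status comes from a plain dict; overflow log sectors and the transaction log itself are not modeled.

```python
from enum import Enum


class TxState(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    ACTIVE = "active"


def selective_merge(data_area, log_records, tx_state):
    """Merge only committed changes; drop aborted ones; carry active ones over.

    log_records: list of (tx_id, page_id, offset, new_bytes) in log order.
    tx_state:    dict tx_id -> TxState.
    Returns (new_data_area, carried_over_records).
    """
    pages = {pid: bytearray(img) for pid, img in data_area.items()}
    carried = []
    for tx_id, page_id, offset, new_bytes in log_records:
        state = tx_state[tx_id]
        if state is TxState.COMMITTED:
            pages[page_id][offset:offset + len(new_bytes)] = new_bytes
        elif state is TxState.ACTIVE:
            carried.append((tx_id, page_id, offset, new_bytes))
        # TxState.ABORTED: the record is simply discarded (no UNDO needed).
    return {pid: bytes(p) for pid, p in pages.items()}, carried


records = [("t1", 0, 0, b"X"), ("t2", 0, 1, b"Y"), ("t3", 0, 2, b"Z")]
states = {"t1": TxState.COMMITTED, "t2": TxState.ABORTED, "t3": TxState.ACTIVE}
print(selective_merge({0: b"aaa"}, records, states))
# ({0: b'Xaa'}, [('t3', 0, 2, b'Z')])
```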

Conclusion
- Clear and present evidence that flash can replace disk
- The IPL approach demonstrates its potential for TPC-C-type database applications by:
  - Overcoming the “erase-before-write” limitation
  - Exploiting fast and uniform random access
- IPL also helps realize a lean recovery mechanism

Outline (current section: Reviews)
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- Reviews

Reviews
- IPL hurts read performance
  - For each read operation, IPL has to read both the data page and its log sectors
  - Read performance will be about 2X slower
- No experimental results
  - The authors only give results from an I/O access simulation
- Simulation
  - The simulated database is too small (1GB)
  - The overall performance of TPC-C is not shown (most operations in TPC-C are read operations)
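The 2X read estimate follows from simple arithmetic if one assumes that an IPL read fetches the full 8KB log area in addition to the 8KB data page, at 80 μs per 2KB flash page (the NAND read latency in the disk-vs-flash table). Reading only the relevant log sectors, or exploiting parallelism inside an SSD, would change the ratio; this is a back-of-the-envelope sketch with a hypothetical `read_us` helper, not a measurement.

```python
READ_US = 80        # NAND read latency per 2KB flash page
FLASH_PAGE_KB = 2


def read_us(kb):
    """Time to read `kb` kilobytes, rounded up to whole flash pages."""
    pages = -(-kb // FLASH_PAGE_KB)  # ceiling division
    return pages * READ_US


plain = read_us(8)             # conventional read: the 8KB data page only
ipl = read_us(8) + read_us(8)  # IPL read: data page plus the whole 8KB log area
print(plain, ipl, ipl / plain)  # 320 640 2.0
```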

Any Questions? Q & A