Repairing Write Performance on Flash Devices

Presentation transcript:

Repairing Write Performance on Flash Devices
Radu Stoica‡, Manos Athanassoulis‡, Ryan Johnson‡§, Anastasia Ailamaki‡
‡École Polytechnique Fédérale de Lausanne  §Carnegie Mellon

Tape is Dead, Disk is Tape, Flash is Disk*
Flash is slowly replacing HDDs (prices dropping, capacities growing)
Fast, reliable, efficient; potentially huge impact
But: slow random writes and read/write asymmetry -> not an HDD drop-in replacement
*Jim Gray, CIDR 2007

DBMS I/O today
The DBMS turns its data requirements into an HDD-optimized I/O pattern and issues requests through the block-device API; the flash device must then map that pattern onto flash memory accesses, which favor a very different, flash-optimized I/O pattern.
Inadequate device abstraction: flash devices are not HDD drop-in replacements

Random Writes – Fusion ioDrive
Microbenchmark – 8 kiB random writes
[Plot: throughput (MiB/s) over time (hours), 1 s averages and a moving average. Throughput drops by 94% and becomes unpredictable.]

Stabilizing Random Writes
- Change data placement: flash-friendly I/O pattern
- Avoid all random writes
- Minimal changes to the database engine
- 6-9x speedup for OLTP-like access patterns

Overview
- Random writes: how big a problem?
- Random writes: why still a problem?
- Append-pack data placement
- Experimental results

Related work
Flash-optimized DB algorithms work at the DBMS level; data placement schemes and flash file systems sit between the DBMS and the block-device API; FTLs remap accesses inside the flash device. None of them turns an HDD-optimized OLTP request stream into a flash-optimized I/O pattern.
No solution for OLTP workloads

Random Write – Other devices
Vendor-advertised performance: random writes are far slower than random reads.
[Plot: per-I/O response time (ms, log scale) on a Mtron SSD; sequential reads stay flat while random writes show long pauses. Graph from uFLIP, Bouganim et al., CIDR 2009]
Random writes cause unpredictability

Random Writes – Fusion ioDrive
Microbenchmark – 8 kiB random writes
[Plot: throughput (MiB/s) over time (hours), 1 s averages and a moving average.]

Sequential Writes – Fusion ioDrive
Microbenchmark – 128 kiB sequential writes
[Plot: throughput (MiB/s) over time (s).]
Sequential writing: good and stable performance

Idea – Change Data Placement
Flash-friendly I/O pattern:
- Avoid all random writes
- Write in big chunks
Tradeoffs – additional work:
- Give up sequential reads (sequential and random reads perform similarly on flash)
- More sequential writing
- Other overheads

Overview
- Random writes: how big a problem?
- Random writes: why still a problem?
- Append-pack data placement
- Theoretical model
- Experimental results

Append-Pack Algorithm
Write the hot dataset sequentially, as a log, with no in-place updates: updating a page appends a new copy at the log's end and invalidates the old one. When there is no more space, reclaim it from the log's start: still-valid pages are re-appended, cold pages are filtered out and written to a separate cold dataset, and invalid pages are discarded for free.
How much additional work does reclaiming cost? (A code sketch follows.)
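A minimal sketch of the scheme in Python. The names and toy parameters (AppendPack, DEVICE_PAGES, RECLAIM_CHUNK) are invented here for illustration; this is not the authors' implementation, and it omits the cold-dataset filtering:

import random

DEVICE_PAGES = 8    # physical log capacity (toy value)
RECLAIM_CHUNK = 4   # entries reclaimed from the log head at a time

class AppendPack:
    def __init__(self):
        self.log = []    # physical log: list of (logical_page, data)
        self.where = {}  # logical page -> index of its latest (valid) copy

    def write(self, lpage, data):
        # Sequential append only; the old copy is merely invalidated.
        while len(self.log) >= DEVICE_PAGES:
            self._reclaim()
        self.where[lpage] = len(self.log)
        self.log.append((lpage, data))

    def read(self, lpage):
        return self.log[self.where[lpage]][1]

    def _reclaim(self):
        # Free RECLAIM_CHUNK entries at the log head. An entry is valid
        # only if it is still the latest copy of its logical page; valid
        # entries are re-appended, invalidated ones are dropped for free.
        head = self.log[:RECLAIM_CHUNK]
        valid = [(lp, d) for i, (lp, d) in enumerate(head) if self.where[lp] == i]
        self.log = self.log[RECLAIM_CHUNK:] + valid
        # Rebuild the map; the last occurrence of a page is its valid copy.
        self.where = {lp: i for i, (lp, _) in enumerate(self.log)}

# Over-provisioning (device larger than the hot set, i.e. alpha > 1)
# keeps stale entries near the head, so reclaiming always frees space.
ap = AppendPack()
for n in range(100):
    ap.write(random.randrange(3), f"version-{n}")  # 3 hot logical pages
print([ap.read(p) for p in range(3)])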

Theoretical Page Reclaiming Overhead
Assume pages are updated uniformly, i.e., every page is equally likely to be overwritten. How many pages are still valid when a region is reclaimed? With α = sizeof(disk) / sizeof(hotset), the model gives prob(valid) = f(α) ≈ e^(-α).
Worst case (α = 1): 36% of reclaimed pages are still valid. Easily achievable: 6-11%.
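A quick numeric check of this model (illustrative Python; the intermediate α values are assumptions chosen to bracket the slide's 6-11% range):

import math

# Slide's model: alpha = sizeof(disk) / sizeof(hotset),
# prob(valid) ~ e^(-alpha) under uniform updates.
for alpha in (1.0, 2.2, 2.8):
    print(f"alpha = {alpha}: prob(valid) ~ {math.exp(-alpha):.1%}")
# alpha = 1.0 -> 36.8% (worst case: the device barely fits the hot set)
# alpha = 2.2 -> 11.1%, alpha = 2.8 -> 6.1% (the "easily achievable" 6-11%)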

Theoretical Speedup
Traditional random-write I/O latency: TRW. New amortized latency: TSW + prob(valid)∙(TRR + TSW), since every write is sequential and each still-valid reclaimed page costs one extra read and re-write. Conservative assumption: TRW = 10∙TSW, with α = sizeof(device) / sizeof(data).
Up to 7x speedup
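Evaluating the model numerically (a sketch; setting TRR ≈ TSW is an assumption made here for concreteness, since the slides only state that random and sequential reads perform similarly):

import math

T_SW = 1.0         # sequential-write latency (normalized)
T_RW = 10 * T_SW   # slide's conservative assumption: random write = 10x
T_RR = T_SW        # assumed: a random read costs about one sequential write

def predicted_speedup(alpha):
    p = math.exp(-alpha)                      # prob(valid), previous slide
    return T_RW / (T_SW + p * (T_RR + T_SW))  # slide's latency model

for alpha in (1.0, 2.0, 3.0):
    print(f"alpha = {alpha}: speedup ~ {predicted_speedup(alpha):.1f}x")
# alpha = 1 -> ~5.8x; alpha = 2 -> ~7.9x; alpha = 3 -> ~9.1x.
# With slightly costlier reads this lands near the slide's "up to 7x".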

Overview
- Random writes: how big a problem?
- Random writes: why still a problem?
- Append-pack data layout
- Experimental results

Experimental setup
- 4x quad-core Opteron, x86_64 Linux v2.6.18
- Fusion ioDrive, 160 GB, PCIe
- 8 kiB I/Os, direct I/O
- ≥ 16 parallel threads
- Firmware runs on the host
- Append-pack implemented as a shim library

OLTP microbenchmark
Microbenchmark – 50% random writes / 50% random reads
[Plot: throughput (MiB/s) over time (s), 1 s averages and a moving average; switching to append-pack yields a 9x improvement, with an initial transient annotated "FTL?".]

OLTP Microbenchmark Overview
Performance is better than predicted.

What to remember
- Flash ≠ HDD
- We leverage sequential writing to avoid random writing
- Random reading is as good as sequential reading
- Append-pack eliminates random writes
- 6-9x speedup

Thank you! http://dias.epfl.ch

Backup

FTLs
- Fully-associative sector translation [Lee et al. '07]
- Superblock FTL [Kang et al. '06]
- Locality-Aware Sector Translation [Lee et al. '08]
No solution for all workloads:
- Static tradeoffs and workload independence
- Lack of semantic knowledge
Wrong I/O patterns -> complicated software layers destroy predictability

Other Flash Devices - Backup
Vendor-advertised performance:

Device            RR (IOPS)   RW (IOPS)            SW (MB/s)   SR (MB/s)
Intel X25-E       35,000      3,300                170         250
Memoright GT      10,000      500                  130         120
Solidware         1,000       110
Fusion ioDrive    116,046     93,199 (75/25 mix)   750         670

Experimental Results - Backup

RR/RW   Baseline    Append-Pack   Speedup   Prediction
50/50   38 MiB/s    349 MiB/s     9.1x      6.2x
75/25   48 MiB/s    397 MiB/s     8.3x      4.3x
90/10   131 MiB/s   541 MiB/s     4.1x      2.5x

(α = 2 in all experiments)

OLTP microbenchmark - Backup
[Plots: 50% RW / 50% RR workload, before and after append-pack.]

OLTP Microbenchmark - Backup
[Plot: traditional I/O.]

OLTP Microbenchmark - Backup
[Plot: append-pack.]