Repairing Write Performance on Flash Devices
Radu Stoica‡, Manos Athanassoulis‡, Ryan Johnson‡§, Anastasia Ailamaki‡
‡École Polytechnique Fédérale de Lausanne  §Carnegie Mellon
Tape is Dead, Disk is Tape, Flash is Disk*
- Slowly replacing HDDs (price falling, capacity growing)
- Fast, reliable, efficient
- Potentially huge impact
- But: slow random writes, read/write asymmetry -> not a HDD drop-in replacement
*Jim Gray, CIDR 2007
DBMS I/O Today
- DBMS: data requirements drive an HDD-optimized I/O pattern
- Block Device API: inadequate device abstraction
- Flash device: expects a flash-optimized I/O pattern for flash memory access
Flash devices are not HDD drop-in replacements
Random Writes – Fusion ioDrive
Microbenchmark – 8 KiB random writes
[Figure: throughput (MiB/s) vs. time (hours); average over 1 s and moving average. After sustained random writing, throughput collapses: a 94% performance drop and high unpredictability.]
Stabilizing Random Writes
- Change data placement for a flash-friendly I/O pattern
- Avoid all random writes
- Minimal changes to the database engine
- 6-9x speedup for OLTP-like access patterns
Overview
- Random writes: how big of a problem?
- Random writes: why still a problem?
- Append-Pack data placement
- Experimental results
Related Work
- DBMS layer: flash-optimized DB algorithms, data placement (HDD-optimized I/O pattern)
- Block Device API layer: flash file systems (flash-optimized I/O pattern)
- Flash device layer: FTLs (flash memory access)
No solution for OLTP workloads
Random Writes – Other Devices
Vendor-advertised performance: random write « random read
[Figure*: Mtron SSD response time (ms, log scale) per I/O; sequential reads stay uniformly fast, while random writes show long pauses.]
Random writes cause unpredictability
*Graph from uFlip, Bouganim et al., CIDR 2009
Random Writes – Fusion ioDrive
Microbenchmark – 8 KiB random writes
[Figure repeated from the introduction: throughput (MiB/s) vs. time (hours); average over 1 s and moving average.]
Sequential Writes – Fusion ioDrive
Microbenchmark – 128 KiB sequential writes
[Figure: throughput (MiB/s) vs. time (s).]
Sequential writing: good & stable performance
Idea – Change Data Placement
Flash-friendly I/O pattern:
- Avoid all random writes
- Write in big chunks
Tradeoffs – additional work:
- Give up sequential reads (sequential and random reads perform similarly on flash)
- More sequential writing
- Other overheads
Overview
- Random writes: how big of a problem?
- Random writes: why still a problem?
- Append-Pack data placement: theoretical model
- Experimental results
Append-Pack Algorithm
- No in-place updates: each page update is appended at the log end; the old copy becomes invalid
- No more space: reclaim from the log start
- Filter cold pages: still-valid (cold) pages are copied forward sequentially; invalid pages are discarded
- Hot dataset is rewritten by updates, cold dataset is written out by reclamation; all writes are sequential
How much additional work? (see the sketch below)
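Below is a minimal, in-memory sketch of the append-pack idea, assuming a segment-granular log. The class and method names (AppendPack, write_page, _reclaim) are illustrative only, not the actual shim-library API from the talk:

```python
from collections import deque
import random

class AppendPack:
    """In-memory sketch of append-pack data placement (illustrative API)."""

    def __init__(self, num_segments, pages_per_segment):
        self.num_segments = num_segments        # device capacity in segments
        self.pages_per_segment = pages_per_segment
        self.segments = deque()                 # sealed segments, oldest first
        self.open_seg = []                      # segment the log head is filling
        self.location = {}                      # logical page -> (segment, slot)
        self.gc_copies = 0                      # valid pages copied by reclamation

    def write_page(self, logical):
        """Update a logical page: invalidate the old copy, append the new one."""
        old = self.location.get(logical)
        if old is not None:
            seg, slot = old
            seg[slot] = None                    # never update in place on flash
        self._append(logical)

    def _append(self, logical):
        if len(self.open_seg) == self.pages_per_segment:
            self.segments.append(self.open_seg) # seal the full segment
            self.open_seg = []
            if len(self.segments) == self.num_segments:
                self._reclaim()                 # device full: make room first
        self.open_seg.append(logical)
        self.location[logical] = (self.open_seg, len(self.open_seg) - 1)

    def _reclaim(self):
        """Free the oldest segment, copying still-valid pages to the log head.
        Assumes the device is larger than the hot dataset (alpha > 1)."""
        victim = self.segments.popleft()
        for logical in victim:
            if logical is not None:             # page not overwritten since
                self.gc_copies += 1
                self._append(logical)

# Uniform updates over a 256-page hot set on a ~512-page device (alpha ~ 2):
random.seed(1)
ap = AppendPack(num_segments=8, pages_per_segment=64)
for _ in range(200_000):
    ap.write_page(random.randrange(256))
print(f"reclaim copies per user write: {ap.gc_copies / 200_000:.2f}")
```

Driving it with uniform updates over a hot set half the device size, as in the demo lines, keeps the reclamation overhead small; the next slide quantifies exactly how small.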
Theoretical Page Reclaiming Overhead
- Pages updated uniformly: equal probability of replacing any page
- How many pages are still valid when space is reclaimed?
- α = sizeof(disk) / sizeof(hotset)
- prob(valid) = f(α) ≈ e^-α
Worst case: 36%; easily achievable: 6-11%
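As a quick check of the numbers above, the approximation prob(valid) ≈ e^-α (uniform updates, as assumed on this slide) reproduces both the 36% worst case and the 6-11% range:

```python
import math

# prob(valid) ~ e^-alpha under uniform page updates
for alpha in (1.0, 2.2, 2.8):
    print(f"alpha = {alpha}: prob(valid) ~ {math.exp(-alpha):.1%}")
# alpha = 1.0 -> 36.8%  (worst case: device barely larger than hot set)
# alpha = 2.2 -> 11.1%, alpha = 2.8 -> 6.1%  (the "easily achievable" range)
```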
Theoretical Speedup
- Traditional random-write I/O latency: T_RW
- New latency: T_SW + prob(valid)∙(T_RR + T_SW)
- Conservative assumption: T_RW = 10∙T_SW
- α = sizeof(device) / sizeof(data)
Up to 7x speedup
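Plugging the model together, here is one way to reproduce the "up to 7x" figure. T_RW = 10∙T_SW is the slide's assumption; T_RR ≈ T_SW is an extra assumption made here purely for illustration:

```python
import math

T_SW = 1.0               # sequential-write latency (arbitrary unit)
T_RW = 10 * T_SW         # slide's conservative assumption
T_RR = T_SW              # illustrative: random read ~ sequential write

for alpha in (1.0, 1.5, 2.0):
    p = math.exp(-alpha)                  # prob(valid) from the previous slide
    t_new = T_SW + p * (T_RR + T_SW)      # append + amortized reclaim copy
    print(f"alpha = {alpha}: speedup ~ {T_RW / t_new:.1f}x")
# roughly 6-8x for alpha between 1.5 and 2, in line with "up to 7x"
```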
Overview
- Random writes: how big of a problem?
- Random writes: why still a problem?
- Append-Pack data placement
- Experimental results
Experimental Setup
- 4x quad-core Opteron, x86_64 Linux v2.6.18
- Fusion ioDrive 160 GB, PCIe; firmware runs on the host
- 8 KiB I/Os, Direct I/O, ≥ 16 parallel threads
- Append-Pack implemented as a shim library
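For reference, a minimal single-threaded sketch of such a microbenchmark (the experiments used ≥ 16 parallel threads; the device path and write span below are placeholders, not the actual configuration):

```python
import mmap, os, random, time

DEV = "/dev/fioa"          # placeholder device path
IO_SIZE = 8 * 1024         # 8 KiB I/Os, as in the experiments
SPAN = 16 * 2**30          # placeholder: region of the device to hit

# O_DIRECT requires an aligned buffer; an anonymous mmap is page-aligned.
buf = mmap.mmap(-1, IO_SIZE)
buf.write(os.urandom(IO_SIZE))

fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
done, start = 0, time.time()
while time.time() - start < 1.0:                        # one 1 s sample window
    off = random.randrange(SPAN // IO_SIZE) * IO_SIZE   # aligned random offset
    os.pwrite(fd, buf, off)                             # bypasses the page cache
    done += IO_SIZE
os.close(fd)
print(f"{done / 2**20:.1f} MiB/s")
```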
OLTP Microbenchmark
Microbenchmark – 50% random writes / 50% random reads
[Figure: throughput (MiB/s) vs. time (s), average over 1 s and moving average; throughput jumps once Append-Pack replaces traditional I/O. Occasional residual dips annotated "FTL?".]
9x improvement
OLTP Microbenchmark Overview
Performance better than predicted
What to Remember
- Flash ≠ HDD
- Leverage sequential writing to avoid random writing
- Random reading is as good as sequential reading
- Append-Pack eliminates random writes
- 6-9x speedup
Thank you! http://dias.epfl.ch
Backup
FTLs
- Fully-associative sector translation [Lee et al. '07]
- Superblock FTL [Kang et al. '06]
- Locality-aware sector translation [Lee et al. '08]
No solution for all workloads:
- Static tradeoffs & workload independence
- Lack of semantic knowledge
- Wrong I/O patterns -> complicated software layers destroy predictability
Other Flash Devices – Backup
Vendor-advertised performance:
Device         | RR (IOPS) | RW (IOPS)          | SW (MB/s) | SR (MB/s)
Intel X25-E    | 35,000    | 3,300              | 170       | 250
Memoright GT   | 10,000    | 500                | 130       | 120
Solidware      | –         | 1,000              | 110       | –
Fusion ioDrive | 116,046   | 93,199 (75/25 mix) | 750       | 670
Experimental Results – Backup
RR/RW | Baseline  | Append-Pack | Speedup | Prediction
50/50 | 38 MiB/s  | 349 MiB/s   | 9.1x    | 6.2x
75/25 | 48 MiB/s  | 397 MiB/s   | 8.3x    | 4.3x
90/10 | 131 MiB/s | 541 MiB/s   | 4.1x    | 2.5x
(α = 2 in all experiments)
OLTP Microbenchmark – Backup
[Figures: 50% RW / 50% RR, before and after Append-Pack.]
OLTP Microbenchmark – Backup
[Figure: traditional I/O.]
OLTP Microbenchmark – Backup
[Figure: Append-Pack.]