Migrating Server Storage to SSDs: Analysis of Tradeoffs

Presentation transcript:

Migrating Server Storage to SSDs: Analysis of Tradeoffs
Dushyanth Narayanan, Eno Thereska, Austin Donnelly, Sameh Elnikety, Antony Rowstron
Microsoft Research Cambridge, UK

Solid-state drive (SSD)
- Block storage interface
- Persistent
- Flash Translation Layer (FTL)
- Random-access
- NAND Flash memory
- Low power
- Cost, Parallelism, FTL complexity
- USB drive, Laptop SSD, “Enterprise” SSD

Enterprise storage is different
Laptop storage:
- Low speed disks
- Form factor
- Single-request latency
- Ruggedness
- Battery life
Enterprise storage:
- High-end disks, RAID
- Fault tolerance
- Throughput under load (deep queues)
- Capacity
- Energy ($)

Replacing disks with SSDs Match performance Match capacity Disks $$ Flash $$$$$ Flash $

SSD as intermediate tier? DRAM buffer cache Capacity Performance Read cache + write-ahead log $ $$$$

Other options?
- Hybrid drives?
  - Flash inside the disk can pin hot blocks
  - Volume-level tier more sensible for enterprise
- Modify file system?
  - Put metadata in the SSD?
- We want to plug in SSDs transparently
  - Replace disks by SSDs
  - Add SSD tier for caching and/or write logging

Challenge
- Given a workload
  - Which device type, how many, 1 or 2 tiers?
- We traced many real enterprise workloads
- Benchmarked enterprise SSDs, disks
- And built an automated provisioning tool
  - Takes workload, device models
  - And computes best configuration for workload

Roadmap
- Introduction
- Devices and workloads
- Solving for best configuration
- Results

High-level design

Devices (2008)

Device                  Price  Size    Sequential throughput  Random-access throughput
Seagate Cheetah 10K     $123   146 GB  85 MB/s                288 IOPS
Seagate Cheetah 15K     $172   146 GB  88 MB/s                384 IOPS
Memoright MR25.2        $739   32 GB   121 MB/s               6450 IOPS
Intel X25-E (2009)      $415   32 GB   250 MB/s               35000 IOPS
Seagate Momentus 7200   $53    160 GB  64 MB/s                102 IOPS

The first two are enterprise disks, the next two are enterprise SSDs, and the last is a low-power disk not usually used for enterprise storage. It is there for the power analysis, which is discussed in the paper but not covered in detail in this talk, so we won’t be looking at the Momentus here. Scaled by dollar cost, disks win on capacity, SSDs win big on IOPS/$, and sequential throughput is about comparable. The X25-E was not available at the time ... Its performance numbers are marketing numbers from Intel, so we won’t be showing results for it, but we will come back to it later in the talk when we discuss cost/capacity of SSDs, since it is significantly cheaper than the Memoright per gigabyte.
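The "scaled by dollar cost" comparison in the notes can be reproduced with a few divisions. The sketch below is illustrative only: it uses the Cheetah 10K and Memoright figures from the Devices (2008) slide, and the dictionary layout and function name are assumptions made for this example, not part of the authors' tool.

```python
# Price, size, sequential throughput and random-access throughput
# copied from the "Devices (2008)" slide.
devices = {
    "Seagate Cheetah 10K": {"price": 123, "gb": 146, "mbps": 85, "iops": 288},
    "Memoright MR25.2": {"price": 739, "gb": 32, "mbps": 121, "iops": 6450},
}

def per_dollar(dev):
    """Return capacity, sequential throughput and random IOPS per dollar."""
    return {
        "gb_per_dollar": dev["gb"] / dev["price"],
        "mbps_per_dollar": dev["mbps"] / dev["price"],
        "iops_per_dollar": dev["iops"] / dev["price"],
    }

scaled = {name: per_dollar(dev) for name, dev in devices.items()}
for name, s in scaled.items():
    print(f"{name}: {s['gb_per_dollar']:.3f} GB/$, "
          f"{s['mbps_per_dollar']:.3f} MB/s/$, "
          f"{s['iops_per_dollar']:.2f} IOPS/$")
```

On these numbers the disk offers roughly 27 times more capacity per dollar, while the SSD offers several times more random IOPS per dollar.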

Characterizing devices
- Sequential vs random, read vs write
- Some SSDs have slow random writes
  - Newer SSDs remap internally to sequential
  - We model both “vanilla” and “remapped”
- Multiple capacity versions per device
  - Different cost/capacity/performance tradeoffs
  - We consider several versions when solving

Device metrics

Metric                     Unit   Source
Price                      $      Retail
Capacity                   GB     Vendor
Random-access read rate    IOPS   Measured
Random-access write rate   IOPS   Measured
Sequential read rate       MB/s   Measured
Sequential write rate      MB/s   Measured
Power                      W      Measured

Enterprise workload traces
- I/O traces from live production servers
  - Exchange server (5000 users): 24 hr trace
  - MSN back-end file store: 6 hr trace
  - 13 servers from small DC (MSRC)
    - File servers, web server, web cache, etc.
    - 1 week trace
- 15 servers, 49 volumes, 313 disks, 14 TB
  - Volumes are RAID-1, RAID-10, or RAID-5

More details in the paper.

Enterprise workload traces
- Traces are at volume (block device) level
  - Below buffer cache, above RAID controller
  - Timestamp, LBN, size, read/write
- Each volume’s trace is a workload
  - We consider each volume separately

Workload metrics

Metric                                        Unit
Capacity                                      GB
Peak random-access read rate                  IOPS
Peak random-access write rate                 IOPS
Peak random-access I/O rate (reads+writes)    IOPS
Peak sequential read rate                     MB/s
Peak sequential write rate                    MB/s
Fault tolerance                               Redundancy level

Workload trace → metrics
- Capacity: largest LBN accessed in trace
- Performance = peak (or 99th percentile) load
  - Highest observed IOPS of random I/Os
  - Highest observed transfer rate (MB/s)
- Fault tolerance
  - Set to same as current configuration
  - 1 redundant device
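As a rough illustration of how a block-level trace might be reduced to these metrics, here is a minimal sketch. The record layout (timestamp, LBN, size, read/write) follows the trace description in the talk; the 512-byte sector size, the 60-second measurement window, and the rule that a request is "random" when it does not start where the previous one ended are all assumptions made for this example, not details taken from the slides.

```python
from collections import defaultdict

SECTOR = 512      # assumed bytes per LBN
WINDOW = 60.0     # assumed measurement window, in seconds

def trace_to_metrics(trace):
    """trace: time-ordered iterable of (timestamp_s, lbn, size_bytes, is_write)."""
    max_end = 0                      # highest byte offset touched so far
    random_ios = defaultdict(int)    # window index -> number of random I/Os
    bytes_moved = defaultdict(int)   # window index -> bytes transferred
    prev_end = None
    for ts, lbn, size, _is_write in trace:
        start = lbn * SECTOR
        end = start + size
        max_end = max(max_end, end)
        window = int(ts // WINDOW)
        bytes_moved[window] += size
        if start != prev_end:        # not contiguous with the previous request
            random_ios[window] += 1
        prev_end = end
    return {
        "capacity_gb": max_end / 1e9,
        "peak_random_iops": max(random_ios.values(), default=0) / WINDOW,
        "peak_mb_per_s": max(bytes_moved.values(), default=0) / WINDOW / 1e6,
    }

# Tiny hypothetical trace: three requests in the first minute, one in the second.
trace = [
    (0.0, 0, 4096, False),
    (0.1, 8, 4096, False),           # starts where the previous request ended
    (0.2, 1_000_000, 4096, True),
    (61.0, 2_000_000, 8192, False),
]
metrics = trace_to_metrics(trace)
print(metrics)
```

This sketch does not separate reads from writes or compute a 99th percentile; both would be straightforward extensions of the per-window counters.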

What is the best config?
- Cheapest one that meets requirements
  - Config: device type, #devices, #tiers
  - Requirements: capacity, perf, fault-tolerance
- Re-run/replay trace?
  - Cannot provision h/w just to ask “what if”
  - Simulators not always available/reliable
- First-order models of device performance
  - Based on measured metrics

Workload and volume are interchangeable.

Solver
- For each workload, device type
  - Compute #devices needed in RAID array
  - Throughput, capacity scaled linearly with #devices
  - Must match every workload requirement
  - “Most costly” workload metric determines #devices
  - Add devices needed for fault tolerance
  - Compute total cost
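The solver steps can be sketched in a few lines. This is a simplified illustration, not the authors' tool: it assumes the linear scaling stated on the slide, uses only three metrics (capacity, random IOPS, sequential MB/s) with device figures from the Devices (2008) slide, and the workload numbers and function name are hypothetical.

```python
import math

METRICS = ("gb", "iops", "mbps")

def provision(workload, device, redundant=1):
    """Return (#devices, limiting metric, total cost) for one device type."""
    # Devices needed to satisfy each requirement on its own,
    # assuming capacity and throughput scale linearly with device count.
    per_metric = {m: math.ceil(workload[m] / device[m]) for m in METRICS}
    limiting = max(per_metric, key=per_metric.get)   # the "most costly" metric
    n = max(1, per_metric[limiting]) + redundant     # add fault-tolerance devices
    return n, limiting, n * device["price"]

devices = {
    "Seagate Cheetah 10K": {"price": 123, "gb": 146, "iops": 288, "mbps": 85},
    "Memoright MR25.2": {"price": 739, "gb": 32, "iops": 6450, "mbps": 121},
}
workload = {"gb": 500, "iops": 2000, "mbps": 50}     # hypothetical volume

results = {name: provision(workload, dev) for name, dev in devices.items()}
best = min(results, key=lambda name: results[name][2])
for name, (n, limiting, cost) in results.items():
    print(f"{name}: {n} devices, limited by {limiting}, ${cost}")
print("cheapest:", best)
```

For this made-up workload the disk array is limited by IOPS and the SSD array by capacity, which is the pattern the single-tier results later in the talk describe.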

Two-tier model

Solving for two-tier model
- Feed I/O trace to cache simulator
  - Emits top-tier, bottom-tier trace → solver
- Iterate over cache sizes, policies
  - Write-back, write-through for logging
  - LRU, LTR (long-term random) for caching
- Inclusive cache model
  - Can also model exclusive (partitioning)
  - More complexity, negligible capacity savings
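To make the cache-simulator step concrete, the following sketch replays a sequence of block reads through an LRU cache and splits it into the requests served by the top tier (hits) and those passed to the bottom tier (misses). It is a deliberately reduced illustration: it handles reads only, treats every request as one block, and ignores the write-back, write-through and LTR policies mentioned on the slide. The function name and the sample block sequence are hypothetical.

```python
from collections import OrderedDict

def split_trace_lru(blocks, cache_blocks):
    """Replay block reads through an LRU cache; return (top_tier, bottom_tier)."""
    cache = OrderedDict()            # keys ordered from least to most recently used
    top, bottom = [], []
    for block in blocks:
        if block in cache:
            cache.move_to_end(block) # refresh recency on a hit
            top.append(block)
        else:
            bottom.append(block)     # a miss is served by the bottom tier
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block
    return top, bottom

blocks = [1, 2, 1, 3, 2, 4, 1]
# Iterate over cache sizes, as the solver does, and compare the resulting split.
splits = {size: split_trace_lru(blocks, size) for size in (1, 2, 3)}
for size, (top, bottom) in splits.items():
    print(f"cache={size} blocks: {len(top)} hits, {len(bottom)} misses")
```

Each resulting pair of traces would then be reduced to metrics and provisioned separately, one device type per tier.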

Model assumptions
- First-order models
  - OK for provisioning: coarse-grained
  - Not for detailed performance modelling
- Open-loop traces
  - I/O rate not limited by traced storage h/w
  - Traced servers are well-provisioned with disks
  - So bottleneck is elsewhere: assumption is OK

Roadmap
- Introduction
- Devices and workloads
- Finding the best configuration
- Analysis results

Single-tier results
- Cheetah 10K best device for all workloads!
  - SSDs cost too much per GB
- Capacity or read IOPS determines cost
  - Not read MB/s, write MB/s, or write IOPS
  - For SSDs, always capacity
  - For disks, either capacity or read IOPS
- Read IOPS vs. GB is the key tradeoff

Workload IOPS vs GB

SSD break-even point
- When will SSDs beat disks?
  - When IOPS dominates cost
- Break-even price point (SSD $/GB) is when
  - Cost of GB (SSD) = Cost of IOPS (disk)
- Our tool also computes this point
  - New SSD → compare its $/GB to break-even
  - Then decide whether to buy it
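A worked example may help with the break-even condition. The sketch below assumes the SSD is always limited by capacity (as the single-tier results state), ignores redundancy and per-device granularity on the SSD side, and uses a hypothetical 500 GB, 2000 IOPS workload together with the Cheetah 10K and Memoright prices from the Devices (2008) slide. It is one plausible reading of "Cost of GB (SSD) = Cost of IOPS (disk)", not the authors' exact computation.

```python
import math

def break_even_ssd_price_per_gb(workload_gb, workload_iops, disk):
    """SSD $/GB at which flash costs the same as the disks the workload needs."""
    # Disks needed to meet both capacity and IOPS; when IOPS dominates,
    # the second term decides the count.
    n_disks = max(
        math.ceil(workload_gb / disk["gb"]),
        math.ceil(workload_iops / disk["iops"]),
    )
    disk_cost = n_disks * disk["price"]
    # An SSD tier sized by capacity breaks even when
    # (SSD $/GB) * workload_gb == disk_cost.
    return disk_cost / workload_gb

cheetah_10k = {"price": 123, "gb": 146, "iops": 288}   # Devices (2008) slide
workload_gb, workload_iops = 500, 2000                  # hypothetical workload

break_even = break_even_ssd_price_per_gb(workload_gb, workload_iops, cheetah_10k)
memoright_price_per_gb = 739 / 32                       # Devices (2008) slide
print(f"break-even: ${break_even:.2f}/GB, Memoright: ${memoright_price_per_gb:.2f}/GB")
```

For this workload the Memoright's price per gigabyte is about 13 times the break-even point, so the disks remain the cheaper choice.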

Break-even point CDF

Capacity limits SSD
- On performance, SSD already beats disk
- $/GB too high by 1-3 orders of magnitude
  - Except for small (system boot) volumes
- SSD price has gone down, but
  - This is per-device price, not per-byte price
  - Raw flash $/GB also needs to drop
  - By a lot

At the end of the talk I’ll talk a little bit about the economics of this.

SSD as intermediate tier
- Read caching benefits few workloads
  - Servers already cache in DRAM
  - SSD tier doesn’t reduce disk tier provisioning
- Persistent write-ahead log is useful
  - A small log can improve write latency
  - But does not reduce disk tier provisioning
  - Because writes are not the limiting factor

Power and wear
- SSDs use less power than Cheetahs
  - But overall $ savings are small
  - Cannot justify higher cost of SSD
- Flash wear is not an issue
  - SSDs have finite #write cycles
  - But will last well beyond 5 years
  - Workloads’ long-term write rate not that high
  - You will upgrade before you wear device out
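The wear argument is a back-of-the-envelope calculation: total writable bytes divided by the long-term write rate. The sketch below shows the arithmetic only. The 100,000 write cycles and the sustained 1 MB/s write rate are hypothetical inputs chosen for illustration, and the model assumes perfect wear levelling with no write amplification; none of these figures come from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def wear_out_years(capacity_gb, write_cycles, write_mb_per_s):
    """Years until every cell has used its write cycles, under ideal wear levelling."""
    total_writable_bytes = capacity_gb * 1e9 * write_cycles
    seconds = total_writable_bytes / (write_mb_per_s * 1e6)
    return seconds / SECONDS_PER_YEAR

# 32 GB device (the Memoright's capacity); cycles and write rate are assumed.
years = wear_out_years(capacity_gb=32, write_cycles=100_000, write_mb_per_s=1.0)
print(f"estimated wear-out time: {years:.0f} years")
```

Under these assumptions the device would outlast a typical hardware upgrade cycle by a wide margin; a real estimate would need the workload's measured write rate and the vendor's endurance rating.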

Conclusion
- Capacity limits flash SSD in enterprise
  - Not performance, not wear
- Flash might never get cheap enough
  - If all Si capacity moved to flash today, will only match 12% of HDD production [Hetzler2008]
  - There are more profitable uses of Si capacity
- Need higher density/scale (PCM?)

This space intentionally left blank

What are SSDs good for?
- Mobile, laptop, desktop
- Maybe niche apps for enterprise SSD
  - Too big for DRAM, small enough for flash
  - And huge appetite for IOPS
- Single-request latency
- Power
- Fast persistence (write log)

Assumptions that favour flash
- IOPS = peak IOPS
  - Most of the time, load << peak
  - Faster storage will not help: already underutilized
- Disk = enterprise disk
  - Low power disks have lower $/GB, $/IOPS
- LTR caching uses knowledge of future
  - Looks through entire trace for randomly-accessed blocks

Supply-side analysis [Hetzler2008]
- Disks: 14,000 PB/year, fab cost $1B
- MLC NAND flash: 390 PB/year, $3.4B
- If all Si capacity moved to MLC flash today
  - Will only match 12% of HDD production
- Revenue: $35B HDD, $280B Silicon
  - No economic incentive to use fabs for flash

Steven Hetzler is an IBM Fellow at IBM Almaden lab.

Device characteristics
Devices, in column order: Memoright SSD, Cheetah 10K, Cheetah 15K, Momentus 7200
- Price: $739, $339, $172, $150
- Capacity: 32 GB, 300 GB, 146 GB, 200 GB
- Power: 1.0 W, 10.1 W, 12.5 W, 0.8 W
- Read (seq): 121 MB/s, 85 MB/s, 88 MB/s, 64 MB/s
- Write (seq): 126 MB/s, 84 MB/s, 54 MB/s
- Read (random): 6450 IOPS, 277 IOPS, 384 IOPS, 102 IOPS
- Write (random): 351 IOPS, 256 IOPS, 269 IOPS, 118 IOPS

9 of 49 benefit from caching

Energy savings << SSD cost

Wear-out times