Revisiting Widely Held SSD Expectations and Rethinking System-Level Implications Myoungsoo Jung (UT Dallas), Mahmut Kandemir (PSU). The University of Texas at Dallas, Computer Architecture and Memory Systems Lab.

Outline: Motivation; Evaluation Setup; Testing Expectations on Reads, Writes, and Advanced Schemes

Outline: Motivation; Evaluation Setup; Testing Expectations on Reads, Writes, and Advanced Schemes

We know SSDs! Reads: 10x~100x better than writes, reliable (no erase), fast random accesses, less overhead. Writes: GC impacts, DRAM buffer, faster than HDD.

We are carefully using them! Reads: read cache, memory extension, read-only storage, virtual memory. Writes: burst buffer, checkpointing, swap/hibernation management.

Then, why do we need to rethink? NAND Core: cells have shrunk from 5x nm to 2x nm, are less reliable, require extra operations, and have longer latency. Architecture: multiple channels and pipelining, queue/I/O rescheduling methods, and an internal DRAM buffer cache. Packaging: multiple dies and planes, package-level queues, ECC engines, and fast data movement interfaces. Firmware/OS: advanced mapping algorithms, TRIM, and background task management.

[Roadmap: expectations mapped across the SSD stack (Firmware/OS, Architecture, Packaging, NAND Core) and users (OS, Admin, HPC, App): Read Performance, Reliability on Reads, Write Performance/Background Tasks, OS Supports]

Outline: Motivation; Evaluation Setup; Testing Expectations on Reads, Writes, and Advanced Schemes

SSD Test-Beds: multi-core SSD, DRAM-less SSD, high-reliability SSD, over-provisioned SSD

Tools: Intel Iometer, a LeCroy Sierra M6-1 SATA protocol analyzer, and an in-house AHCI miniport driver
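As a concrete illustration of the evaluation setup, the sketch below shows the kind of per-request random 4KB read-latency probe that Iometer-style tools perform against a raw block device. It is a minimal sketch only: the device path, probed span, and request count are hypothetical placeholders, and this is not how the authors' actual toolchain is driven.

#define _GNU_SOURCE               /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const char *dev = "/dev/sdX";        /* hypothetical target device */
    const size_t io_size = 4096;         /* 4KB requests */
    const off_t span = 1ULL << 30;       /* probe the first 1 GiB */
    const int reps = 1000;

    int fd = open(dev, O_RDONLY | O_DIRECT);   /* bypass the page cache */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, io_size, io_size)) return 1;  /* O_DIRECT needs aligned buffers */

    srand(42);
    for (int i = 0; i < reps; i++) {
        off_t off = ((off_t)(rand() % (int)(span / io_size))) * io_size;  /* random aligned offset */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (pread(fd, buf, io_size, off) != (ssize_t)io_size) { perror("pread"); break; }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("%.1f\n", us);            /* per-request latency in microseconds */
    }
    free(buf);
    close(fd);
    return 0;
}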

[Roadmap: testing Read Performance across Firmware/OS, Architecture, Packaging, and NAND Core]

Are SSDs good for applications that exhibit mostly random reads? Observation: random read performance is worse than that of other access patterns and operations.

Are SSDs good for applications that exhibit mostly random reads? [Figure: random read overheads of 39%, 59%, and 23% across SSD-C, SSD-L, and SSD-Z]

Are SSDs good for applications that exhibit mostly random reads? Latencies with random reads are 23% ~ 59% worse than with other access types [SSD-C] [SSD-L] [SSD-Z]

Are SSDs good for applications that exhibit mostly random reads? No! Why? [Diagram: the host issues requests; inside the SSD, addresses are translated for reads and remapped for writes]

Are SSDs good for applications that exhibit mostly random reads? No! Why? Random writes become sequential writes (by remapping addresses on writes); random reads lack internal parallelism; and random accesses cause resource conflicts.
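To make the write-remapping point concrete, here is a minimal sketch of a page-level, log-structured FTL under the assumptions this slide describes (an illustration, not any vendor's firmware): every write is appended to the next free physical page, so even random writes land sequentially on flash, whereas reads only translate addresses and inherit whatever physical layout the write history produced.

#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES 1024
#define UNMAPPED  UINT32_MAX

static uint32_t l2p[NUM_PAGES];   /* logical-to-physical page map */
static uint32_t write_ptr = 0;    /* next free physical page (log head) */

static uint32_t ftl_write(uint32_t lpn) {
    uint32_t ppn = write_ptr++;   /* append: sequential regardless of the logical address */
    l2p[lpn] = ppn;               /* any old mapping simply becomes invalid */
    return ppn;
}

static uint32_t ftl_read(uint32_t lpn) {
    return l2p[lpn];              /* translation only; reads cannot be "sequentialized" the same way */
}

int main(void) {
    for (uint32_t i = 0; i < NUM_PAGES; i++) l2p[i] = UNMAPPED;

    uint32_t lpns[] = {7, 512, 3, 900, 45};   /* a "random" access pattern */
    for (int i = 0; i < 5; i++)               /* random writes -> consecutive physical pages */
        printf("write lpn %4u -> ppn %u\n", (unsigned)lpns[i], (unsigned)ftl_write(lpns[i]));
    for (int i = 0; i < 5; i++)               /* reads just follow the map; locality depends on write history */
        printf("read  lpn %4u -> ppn %u\n", (unsigned)lpns[i], (unsigned)ftl_read(lpns[i]));
    return 0;
}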

Can we achieve sustained read performance with sequential accesses? Observation: sequential read performance gets worse with aging and as I/O requests are processed.

Can we achieve sustained read performance with seq. accesses? Most I/O requests are served in 200 us [SSD-C] [SSD-L] [SSD-Z]

Can we achieve sustained read performance with seq. accesses? 2x ~ 3x worse than pristine state SSDs [SSD-C] [SSD-L] [SSD-Z]

Can we achieve sustained read performance with sequential accesses? No! Why? We believe that this performance degradation on reads is mainly caused by a fragmented physical data layout.

[Roadmap: testing Reliability on Reads across Firmware/OS, Architecture, Packaging, and NAND Core]

Do program/erase (PE) cycles of SSDs increase during read-only operations? Observation: read requests can shorten the SSD's lifespan, and PE cycles caused by reads are not well managed by the underlying firmware.

Do program/erase (PE) cycles of SSDs increase during read-only operations? [Figure: PE cycles on reads reach 1% ~ 50% of PE cycles on writes; 247x (seq. access pattern) and 12x (rand. access pattern); 1 hour of I/O service per evaluation round]

Do program/erase (PE) cycles of SSDs increase during read-only operations? Unfortunately, yes. Why? [Diagram: during a read, unselected wordlines are driven to Vpass while the selected wordline is at 0V; cells can gain charge, so the block eventually needs to be erased]

[Roadmap: testing OS Supports across Firmware/OS, Architecture, Packaging, and NAND Core]

[Diagram: TRIM. When FILE A and FILE B are deleted and trimmed, their pages are marked INVALID and can be wiped out]
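A minimal sketch of how an FTL might honor a TRIM for a range of logical pages, following the diagram above; the data structures and function names are illustrative assumptions, not a real firmware interface. The trimmed pages are only marked INVALID, so GC can later wipe them out without migrating their contents.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES 256
#define UNMAPPED  UINT32_MAX

static uint32_t l2p[NUM_PAGES];   /* logical-to-physical page map */
static bool     valid[NUM_PAGES]; /* per-physical-page valid bit */

static void ftl_trim(uint32_t lpn_start, uint32_t count) {
    for (uint32_t lpn = lpn_start; lpn < lpn_start + count; lpn++) {
        uint32_t ppn = l2p[lpn];
        if (ppn != UNMAPPED) {
            valid[ppn] = false;   /* GC may now erase this page's block without copying it */
            l2p[lpn]   = UNMAPPED;
        }
    }
}

int main(void) {
    for (uint32_t i = 0; i < NUM_PAGES; i++) l2p[i] = UNMAPPED;
    for (uint32_t i = 10; i < 14; i++) {      /* pretend FILE A occupies logical pages 10..13 */
        l2p[i] = i + 100;
        valid[i + 100] = true;
    }
    ftl_trim(10, 4);                          /* FILE A deleted and trimmed */
    printf("ppn 110 still valid after trim? %d\n", (int)valid[110]);
    return 0;
}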

Can the TRIM command reduce GC overheads? Observation: SSDs do not trim all the data, and SSD performance with the TRIM command is strongly related to the TRIM submission pattern (SEQ-TRIM vs. RND-TRIM).

Can the TRIM command reduce GC overheads? [Figure: SEQ-TRIM performance matches a pristine-state SSD (trimmed SSD = pristine-state SSD?), while RND-TRIM matches NON-TRIM; SSD-C and SSD-Z]

Can the TRIM command reduce GC overheads? [Figures: SSD-C, SSD-Z]

Please take a look! We tested 25 questions in total. The paper includes 59 different types of empirical evaluations, including: overheads of runtime bad block management and ECC, physical data layout performance impact, DRAM caching impact, background tasks, and more.

Thank you!

Backup

[Roadmap: testing Write Performance/Background Tasks across Firmware/OS, Architecture, Packaging, and NAND Core]

How much impact does the worst-case latency have on modern SSDs? The worst-case latencies on fully-utilized SSDs are much worse than those of HDDs.

How much impact does the worst-case latency have on modern SSDs? [Figures: average latency is 2x ~ 173x better than an enterprise-scale HDD; worst-case latency is 12x ~ 17x worse than a 10K HDD]

What is the correlation between the worst-case latency and throughput? Observation: SSD latency and bandwidth become 11x and 3x worse, respectively, compared with normal writes, and the performance degradation on writes is not recovered even after many GCs are executed.

What is the correlation between the worst-case latency and throughput? [Figure: performance is recovered immediately; SSD-C, SSD-L]

What is the correlation between the worst-case latency and throughput? [Figure: write cliff; performance is not recovered; SSD-C, SSD-L]

What is the correlation between the worst-case latency and throughput? Write cliff. Why? The range of random access addresses is not covered by the reclaimed block. [Diagram: data block, update block, and new block with VALID/INVALID pages, and the free block pool]
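The migration cost behind the write cliff can be sketched with the simplified block/page model from the diagram; the structures and numbers below are assumptions for illustration, not a specific drive's GC policy. When the random write addresses are not covered by the reclaimed block, most of its pages are still valid, and every reclamation pays a heavy page-copy cost before the erase.

#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_BLOCK 64

struct block { bool valid[PAGES_PER_BLOCK]; };

/* Reclaims one victim block; returns how many valid pages had to be copied first. */
static int reclaim(struct block *victim, struct block *update_blk, int *next_free) {
    int copies = 0;
    for (int p = 0; p < PAGES_PER_BLOCK; p++) {
        if (victim->valid[p]) {                 /* valid data must be migrated before the erase */
            update_blk->valid[(*next_free)++] = true;
            victim->valid[p] = false;
            copies++;
        }
    }
    /* erase(victim) would happen here; the block then returns to the free block pool */
    return copies;
}

int main(void) {
    struct block victim = {{false}}, update_blk = {{false}};
    int next_free = 0;
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        victim.valid[p] = (p % 8 != 0);         /* only 1 page in 8 was invalidated by the workload */
    printf("page copies paid to free one block: %d\n",
           reclaim(&victim, &update_blk, &next_free));
    return 0;
}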

Could the DRAM buffer help the firmware reduce GC overheads? Observation: DRAM buffers offer 4x shorter latency before the write cliff, but introduce 2x ~ 16x worse latency after the write cliff kicks in.

Could the DRAM buffer help the firmware reduce GC overheads? [Figure: 4x better before the write cliff, 16x worse after; SSD-C, SSD-L]

Could the DRAM buffer help the firmware reduce GC overheads? No! Why? Flushing the buffered data introduces a large number of random accesses, which can in turn accelerate GC invocation at the write cliff.

Can background tasks of current SSDs guarantee sustainable performance? [Figure: 0.1% with 30 idle seconds (SSD-C BFLUSH); 7% with 1 idle hour (SSD-L BGC)]

Why can't BGC help with the foreground tasks? Endurance characteristics, accelerated block erasure, and power consumption problems during idle periods.

Does the TRIM command incur any overheads? Modern SSDs require much longer latencies to trim data than a normal I/O operation would take. I-TRIM: data invalidation based on address and a prompt response. E-TRIM: block erasure in real time.

Does the TRIM command incur any overheads? [Figures: SSD-Z, SSD-L]

Does the TRIM command incur any overheads? [Figures: SSD-C, SSD-A]

What types of background tasks exist in modern SSDs? BFLUSH: flushing in-memory data into the flash medium, creating extra room that can be used to buffer new incoming I/O requests. BGC: GCs performed in the background.
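As a rough illustration of how such tasks might be scheduled, the sketch below gates BFLUSH and BGC on host idle time, matching the definitions above; the idle threshold, timestamps, and function names are hypothetical, since real drives expose none of this.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define IDLE_THRESHOLD_US 500000      /* assumed idle window before background work may start */

static uint64_t last_host_io_us = 0;  /* timestamp of the most recent host request */
static bool     buffer_dirty    = true;
static int      free_blocks     = 3;

static void bflush(void) { printf("BFLUSH: flush buffered data to flash, freeing buffer room\n"); buffer_dirty = false; }
static void bgc(void)    { printf("BGC: reclaim one block in the background\n"); free_blocks++; }

/* Called periodically by the firmware's main loop (simulated below). */
static void background_tick(uint64_t now_us) {
    if (now_us - last_host_io_us < IDLE_THRESHOLD_US)
        return;                       /* host is busy: stay out of the way */
    if (buffer_dirty)
        bflush();                     /* make room for incoming writes first */
    else if (free_blocks < 8)
        bgc();                        /* then rebuild the free block pool */
}

int main(void) {
    for (uint64_t t = 0; t <= 2000000; t += 250000)
        background_tick(t);
    return 0;
}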

What types of background tasks exist in modern SSDs? [Figures: cache-on SSD, cache-off SSD]

What types of background tasks exist in modern SSDs? Excluding BFLUSH, only one SSD (SSD-L) performs BGC. Several published benchmark results assume BGC, and SSD makers have even indicated that they alleviate GC overheads by utilizing idle times.

Read Overheads: ECC recovery, runtime bad block management