1 Stochastic Modeling of Large-Scale Solid-State Storage Systems: Analysis, Design Tradeoffs and Optimization Yongkun Li, Patrick P. C. Lee and John C.S.

Slides:



Advertisements
Similar presentations
Solid State Drive. Advantages Reliability in portable environments and no noise No moving parts Faster start up Does not need spin up Extremely low.
Advertisements

Paper by: Yu Li, Jianliang Xu, Byron Choi, and Haibo Hu Department of Computer Science Hong Kong Baptist University Slides and Presentation By: Justin.
Flash storage memory and Design Trade offs for SSD performance
Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems Yuchong Hu, Patrick P. C. Lee, Kenneth.
Thank you for your introduction.
1 NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System Yuchong Hu 1, Chiu-Man Yu 2, Yan-Kit Li 2 Patrick P. C.
Myoungsoo Jung (UT Dallas) Mahmut Kandemir (PSU)
Trading Flash Translation Layer For Performance and Lifetime
SSDs: advantages exhibit higher speed than disks drive down power consumption offer standard interfaces like HDDs do.
Lecture 36: Chapter 6 Today’s topic –RAID 1. RAID Redundant Array of Inexpensive (Independent) Disks –Use multiple smaller disks (c.f. one large disk)
Experiments on Query Expansion for Internet Yellow Page Services Using Log Mining Summarized by Dongmin Shin Presented by Dongmin Shin User Log Analysis.
FlashVM: Virtual Memory Management on Flash Mohit Saxena and Michael M. Swift Introduction Flash storage is the largest change to memory and storage systems.
International Conference on Supercomputing June 12, 2009
Impact of Data Locality on Garbage Collection in SSDs: A General Analytical Study Yongkun Li, Patrick P. C. Lee, John C. S. Lui, Yinlong Xu The Chinese.
A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of.
FAWN: A Fast Array of Wimpy Nodes Presented by: Aditi Bose & Hyma Chilukuri.
Scalable Adaptive Data Dissemination Under Heterogeneous Environment Yan Chen, John Kubiatowicz and Ben Zhao UC Berkeley.
Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana.
Ji-Yong Shin Cornell University In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell) Gecko: Contention-Oblivious.
Comparing Coordinated Garbage Collection Algorithms for Arrays of Solid-state Drives Junghee Lee, Youngjae Kim, Sarp Oral, Galen M. Shipman, David A. Dillow,
Solid State Drive Feb 15. NAND Flash Memory Main storage component of Solid State Drive (SSD) USB Drive, cell phone, touch pad…
Operating Systems CMPSC 473 I/O Management (2) December Lecture 24 Instructor: Bhuvan Urgaonkar.
Understanding Intrinsic Characteristics and System Implications of Flash Memory based Solid State Drives Feng Chen, David A. Koufaty, and Xiaodong Zhang.
CSE 321b Computer Organization (2) تنظيم الحاسب (2) 3 rd year, Computer Engineering Winter 2015 Lecture #4 Dr. Hazem Ibrahim Shehata Dept. of Computer.
Flash memory Yi-Chang Li
Min Xu1, Yunfeng Zhu2, Patrick P. C. Lee1, Yinlong Xu2
Ohio State University Department of Computer Science and Engineering Automatic Data Virtualization - Supporting XML based abstractions on HDF5 Datasets.
Evaluation of software engineering. Software engineering research : Research in SE aims to achieve two main goals: 1) To increase the knowledge about.
/38 Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling Sungjin Lee, Taejin Kim, Kyungho Kim, and Jihong Kim Seoul.
Speaker: 吳晋賢 (Chin-Hsien Wu) Embedded Computing and Applications Lab Department of Electronic Engineering National Taiwan University of Science and Technology,
2010 IEEE ICECS - Athens, Greece, December1 Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives Maria Varsamou.
Data Storage Systems: A Survey Abdullah Aldhamin July 29, 2013 CMPT 880: Large-Scale Multimedia Systems and Cloud Computing Course Project.
Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.
DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings A. gupta, Y. Kim, B. Urgaonkar, Penn State ASPLOS.
Embedded System Lab. Jung Young Jin The Design and Implementation of a Log-Structured File System D. Ma, J. Feng, and G. Li. LazyFTL:
1 CloudVS: Enabling Version Control for Virtual Machines in an Open- Source Cloud under Commodity Settings Chung-Pan Tang, Tsz-Yeung Wong, Patrick P. C.
Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.
Wei-Shen, Hsu 2013 IEE5011 –Autumn 2013 Memory Systems Solid State Drive with Flash Memory Wei-Shen, Hsu Department of Electronics Engineering National.
1 Enabling Efficient and Reliable Transitions from Replication to Erasure Coding for Clustered File Systems Runhui Li, Yuchong Hu, Patrick P. C. Lee The.
Adaptive Hopfield Network Gürsel Serpen Dr. Gürsel Serpen Associate Professor Electrical Engineering and Computer Science Department University of Toledo.
Container Marking : Combining Data Placement, Garbage Collection and Wear Leveling for Flash MASCOTS '11 Xiao-Yu Hu, Robert Haas, and Eleftheriou Evangelos.
Operating Systems CMPSC 473 I/O Management (3) December 07, Lecture 24 Instructor: Bhuvan Urgaonkar.
A Semi-Preemptive Garbage Collector for Solid State Drives
A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions Youyou Lu 1, Jiwu Shu 1, Jia Guo 1, Shuai Li 1, Onur Mutlu 2 LightTx:
Lecture 22 SSD. LFS review Good for …? Bad for …? How to write in LFS? How to read in LFS?
Carnegie Mellon University, *Seagate Technology
Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group.
대용량 플래시 SSD의 시스템 구성, 핵심기술 및 기술동향
Application-Managed Flash
 The emerged flash-memory based solid state drives (SSDs) have rapidly replaced the traditional hard disk drives (HDDs) in many applications.  Characteristics.
Rethinking RAID for SSD based HPC Systems Yugendra R. Guvvala, Yong Chen, and Yu Zhuang Department of Computer Science, Texas Tech University, Lubbock,
1 Paolo Bianco Storage Architect Sun Microsystems An overview on Hybrid Storage Technologies.
Elastic Parity Logging for SSD RAID Arrays Yongkun Li*, Helen Chan #, Patrick P. C. Lee #, Yinlong Xu* *University of Science and Technology of China #
Solid State Disk Prof. Moinuddin Qureshi Georgia Tech.
Internal Parallelism of Flash Memory-Based Solid-State Drives
QoS-aware Flash Memory Controller
Understanding Modern Flash Memory Systems
Towards Scalable Traffic Management in Cloud Data Centers
Parallel-DFTL: A Flash Translation Layer that Exploits Internal Parallelism in Solid State Drives Wei Xie1 , Yong Chen1 and Philip C. Roth2 1. Texas Tech.
Q-Learning for Policy Improvement in Network Routing
An Adaptive Data Separation Aware FTL for Improving the Garbage Collection Efficiency of Solid State Drives Wei Xie and Yong Chen Texas Tech University.
Repairing Write Performance on Flash Devices
Design Tradeoffs for SSD Performance
Disk scheduling In multiprogramming systems several different processes may want to use the system's resources simultaneously. The disk drive needs some.
HashKV: Enabling Efficient Updates in KV Storage via Hashing
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
TECHNICAL SEMINAR PRESENTATION
Parallel Garbage Collection in Solid State Drives (SSDs)
CS 295: Modern Systems Storage Technologies Introduction
Design Tradeoffs for SSD Performance
Presentation transcript:

1 Stochastic Modeling of Large-Scale Solid-State Storage Systems: Analysis, Design Tradeoffs and Optimization Yongkun Li, Patrick P. C. Lee and John C.S. Lui The Chinese University of Hong Kong, Hong Kong Sigmetrics’13

SSD Storage is Emerging  Solid-state drives (SSDs) are widely adopted in data centers Examples: EMC XtremIO Array, NetApp Sandisk, Micron P420m  Pros of SSDs: High I/O throughput, low power, high reliability  Cons of SSDs: Wear-out 2 EMC XtremIO [Source:

How SSDs Work?  Organized into blocks  Each block has 64 or 128 pages of size 4KB each  Three basic operations: read, write, erase Read, write: per-page basis Erase: per-block basis  Out-of-place write for updates: Write to a clean page and mark it as valid Mark original page as invalid 3

 Garbage collection (GC) needed to reclaim clean pages Choose a block to erase Move valid pages to another clean block Erase the block  Challenges: Blocks can only be erased a finite number of times SLC: 100K; MLC: 10K; 3-bit MLC: several K to several hundred When a block reaches the limit, it wears out Bit error rates increase as blocks wear down 210 Block A Block B 1. write 2. erase 02 Block A Before GC After GC How SSDs Work? 4

Motivation  Design tradeoff of GC algorithms Minimizing cleaning cost reclaim the block with smallest number of valid pages improve I/O throughput and minimize write amplification Maximizing wear-leveling erase all blocks as evenly as possible improve durability  Problems How to model the performance-durability tradeoff of GC algorithms? How to parameterize a GC algorithm to adapt to different tradeoff requirements? 5

Our Work  Develop a Markov model to characterize I/O dynamics  Use mean-field analysis to derive asymptotic steady state  Develop an optimization framework to analyze the tradeoff  Propose a tunable GC algorithm which operates along the optimal tradeoff curve  Conduct trace-driven simulations on DiskSim with SSD add-ons 6 Construct an analytical model that characterizes tradeoff between cleaning cost and wear-leveling for a general class of GC algorithms

Related Work on GC  Empirical analysis: Agrawal et al. (USENIX ATC08) addressed tradeoff between cleaning cost and wear-leveling in GC  Theoretical analysis on GC: focus on write amplification Hu et al. (SYSTOR09), Bux et al. (Performance10), Desnoyers (SYSTOR12): model different greedy algorithms on GC Benny Van Houdt (Sigmetrics13) also models write amplification of GC algorithms using mean field technique Our work analyzes performance tradeoff of different GC algorithms, with more general access pattern and address mapping; also conducts trace-driven simulations 7

Markov Model 8 012

I/O Dynamics Block A Block B 1. write 2. erase 02 Block A Before GC After GC

I/O Dynamics Before 2103 After 210 Before 210 After

State Transition 11 State transition of a block

State Transition 12

Mean Field Analysis 13 Proof in technical report.

Steady-State Solution 14 Proof in technical report.

General Access Pattern 15

Performance Metrics 16

Tradeoff Analysis 17

Tradeoff Analysis 18

Tradeoff Analysis  Optimal tradeoff 19  Solution Greedy algorithm minimizes cleaning cost Random algorithm maximizes wear-leveling tradeoff

Randomized Greedy Algorithm 20 MD Mitzenmacher, “The Power of Two Choices in Randomized Load Balancing”, 1996

Numerical Results  Performance tradeoff 21  Tradeoff indeed exists for GC algorithms  RGA operates along the optimal tradeoff curve

Experimental Results  Environment: DiskSim with SSD add-ons  System configuration 32GB SSD with 8 flash chips, with 16,384 physical blocks each GC is performed independently in each chip Each block has 64 pages of size 4KB each 15% storage space preserved Threshold for triggering GC: free blocks less than 5%  Datasets Financial, Webmail, Online and Webmail+Online Write intensive (around 80% write requests) 22

Cleaning Cost & Wear-leveling 23  Greedy algorithm has the lowest cleaning cost and random algorithm achieves the highest wear-leveling  RGA balances the tradeoff  See our paper for I/O throughput and durability results

Summary  Propose a stochastic model that characterizes tradeoff between cleaning cost (performance) and wear-leveling (durability) of GC algorithms in SSDs Random algorithm and greedy algorithm stand for the two extreme points in the tradeoff curve  Propose a randomized greedy algorithm that operates on the optimal tradeoff curve  Conduct extensive trace-driven simulations  Future work: Hot/cold separation, address mapping, RAID reliability 24