Presentation transcript:

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives
Arash Tavakkol, Mohammad Sadrosadati, Saugata Ghose, Jeremie S. Kim, Yixin Luo, Yaohua Wang, Nika Mansouri Ghiasi, Lois Orosa, Juan Gómez-Luna, Onur Mutlu
Hello, my name is Arash Tavakkol, and I will be talking about FLIN, a new device-level I/O scheduler for modern SSDs.

Motivation
SSDs are widely used as a storage medium. SSDs initially adopted conventional host interface protocols (e.g., SATA) that were designed for magnetic hard disk drives and support only thousands of IOPS per device.
[Slide diagram: Process 1, Process 2, Process 3 → OS block layer (in-DRAM I/O request queue, I/O scheduler, fairness control) → hardware dispatch queue → SSD device]
SSDs are widely used as a storage medium today due to their performance and power consumption benefits. SSDs initially adopted existing host interface protocols that were originally designed for lower-performance HDDs. These protocols rely on the OS to manage I/O requests and data transfers between the host system and the SSD, and the OS block layer is responsible for fairness control to equalize the effects of interference across applications.

Motivation
Modern SSDs use high-performance host interface protocols (e.g., NVMe). These protocols take advantage of SSD throughput, enabling millions of IOPS per device, and bypass OS intervention: the SSD itself must perform scheduling and ensure fairness.
[Slide diagram: Process 1, Process 2, Process 3 with per-application in-DRAM I/O request queues → SSD device. Fairness should be provided by the SSD itself. Do modern SSDs provide fairness?]
Modern SSDs, on the other hand, use new host interface protocols to provide applications with fast access to storage. In these protocols, the overheads of the OS software stack are eliminated, and the SSD has direct access to the application-level I/O request queues. However, these protocols require the SSD device itself to provide fairness. Our fundamental question in this paper is: do modern SSDs provide fairness? And if not, how can we solve this issue?

Motivation
We study fairness control in real state-of-the-art SSDs: an example of two datacenter workloads, tpce and tpcc, running concurrently.
[Slide figure: upper plot shows the slowdown of tpce and tpcc under concurrent execution; lower plot shows the resulting fairness.]
We study fairness control in real state-of-the-art SSDs. Here is an example where two datacenter workloads, namely tpce and tpcc, are executed on a modern SSD. The upper plot shows the slowdown of tpce and tpcc in concurrent execution with respect to their alone execution. As the plot shows, tpce is slowed down significantly in concurrent execution, while the effect of concurrent execution on tpcc is very modest, which leads to very low fairness, as shown in the lower plot. We conclude that modern NVMe SSDs focus on providing high performance at the expense of large amounts of unfairness.
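For reference, the slowdown and fairness numbers discussed here follow the usual definitions from the fairness literature: an application's slowdown is its response time under concurrent execution divided by its response time when running alone, and fairness is the ratio of the minimum to the maximum slowdown (1 means perfectly fair, values near 0 mean very unfair). A minimal sketch, using made-up response times rather than the measured results from the talk:

```python
# Minimal sketch of the slowdown/fairness metrics (standard definitions;
# the response-time numbers below are hypothetical, for illustration only).

def slowdown(shared_response_time, alone_response_time):
    """How much slower an application runs when sharing the SSD."""
    return shared_response_time / alone_response_time

def fairness(slowdowns):
    """min/max slowdown across applications, in (0, 1]; 1 = perfectly fair."""
    return min(slowdowns) / max(slowdowns)

# Hypothetical average response times (ms), alone vs. concurrent execution.
tpce = slowdown(shared_response_time=4.0, alone_response_time=0.5)  # 8.0x
tpcc = slowdown(shared_response_time=0.6, alone_response_time=0.5)  # 1.2x

print(f"fairness = {fairness([tpce, tpcc]):.2f}")  # 0.15 -> highly unfair
```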

Our First Contribution
We perform a comprehensive analysis of inter-application interference in state-of-the-art SSDs, considering the intensity of requests sent by each application, differences in request access patterns, the ratio of reads to writes, and garbage collection.
Our first contribution in this paper is that we perform a comprehensive analysis of inter-application interference in modern SSDs, and we find four major sources of interference: differences in the I/O intensity, access pattern, read/write ratio, and garbage collection demands of the concurrent applications. The toy sketch below illustrates the first of these sources.
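To make the intensity-related interference concrete, here is a back-of-the-envelope toy simulation (my own illustration, not from the paper, and not a model of a real SSD): two flows share a single queue served in FIFO order with a fixed per-request service time. Flow A issues bursts of requests, flow B issues occasional requests. B's requests get stuck behind A's queued bursts, so B is slowed down far more than A relative to running alone.

```python
# Toy illustration (not from the paper): two flows share one FIFO-served queue
# with a fixed service time per request. Flow A is bursty/intensive, flow B is
# light. B queues up behind A's bursts, so B is slowed down much more.

SERVICE_TIME = 1.0  # assumed constant per-request service time

def avg_response_time(arrivals_by_flow):
    """FIFO service of all arrivals; returns average response time per flow."""
    events = sorted((t, flow) for flow, times in arrivals_by_flow.items()
                    for t in times)
    total = {flow: 0.0 for flow in arrivals_by_flow}
    count = {flow: 0 for flow in arrivals_by_flow}
    server_free = 0.0
    for arrival, flow in events:
        start = max(arrival, server_free)     # wait for the server (FIFO)
        server_free = start + SERVICE_TIME
        total[flow] += server_free - arrival  # response = wait + service
        count[flow] += 1
    return {flow: total[flow] / count[flow] for flow in arrivals_by_flow}

# Flow A: bursts of 16 requests every 20 time units; flow B: 1 request every 5.
a = [burst * 20.0 for burst in range(50) for _ in range(16)]
b = [2.5 + i * 5.0 for i in range(200)]

alone_a = avg_response_time({"A": a})["A"]
alone_b = avg_response_time({"B": b})["B"]
shared = avg_response_time({"A": a, "B": b})

print("slowdown A:", shared["A"] / alone_a)  # close to 1: barely affected
print("slowdown B:", shared["B"] / alone_b)  # much larger -> unfair
```

The asymmetry arises because the bursty flow already queues behind itself when running alone, so sharing changes little for it, while the light flow goes from near-zero waiting to waiting behind entire bursts.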

Our Second Contribution
We propose the Flash-Level INterference-aware scheduler (FLIN). FLIN is a lightweight device-level I/O request scheduling mechanism that provides fairness among requests from different applications. FLIN carefully reorders transactions within the SSD controller to balance the slowdowns incurred by concurrent applications.
Our second contribution is that we propose a new device-level I/O scheduler, called FLIN, which provides fairness among requests from different applications. FLIN carefully reorders transactions within the SSD controller to balance the slowdowns incurred by concurrent applications.
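The following is only a highly simplified sketch of the reordering idea, not FLIN's actual mechanism (FLIN uses multiple scheduling stages inside the controller; see the paper for details). The sketch assumes each queued flash transaction is tagged with a flow ID and that the controller can estimate each flow's current slowdown; it then serves the most-slowed-down flow first to balance slowdowns.

```python
# Highly simplified sketch of slowdown-balancing transaction selection
# (illustrative only; this is NOT FLIN's actual multi-stage algorithm).
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Flow:
    """Per-application (per-flow) state tracked inside the controller."""
    queue: deque = field(default_factory=deque)  # pending flash transactions
    shared_time: float = 1.0                     # observed response time (shared)
    alone_time: float = 1.0                      # estimated alone response time

    def slowdown(self) -> float:
        return self.shared_time / self.alone_time

def pick_next_transaction(flows):
    """Serve the flow that is currently slowed down the most (if it has work)."""
    candidates = [f for f in flows.values() if f.queue]
    if not candidates:
        return None
    victim = max(candidates, key=lambda f: f.slowdown())
    return victim.queue.popleft()

# Example with made-up numbers: flow B is more slowed down, so its transaction
# is dispatched first even though flow A has more queued transactions.
flows = {
    "A": Flow(deque(["A-txn1", "A-txn2", "A-txn3"]), shared_time=1.1, alone_time=1.0),
    "B": Flow(deque(["B-txn1"]), shared_time=8.0, alone_time=1.0),
}
print(pick_next_transaction(flows))  # -> "B-txn1"
```

A real device-level scheduler would additionally need to keep this reordering lightweight and avoid starving any flow, which is part of what the actual FLIN design addresses.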

Our Third Contribution
We comprehensively evaluate FLIN using a wide variety of enterprise and datacenter storage workloads. On average, FLIN provides a 70% fairness improvement and a 47% performance improvement over a state-of-the-art device-level I/O request scheduler. FLIN is implemented fully within the SSD firmware with a very modest DRAM overhead (< 0.06%).
As our third contribution, we comprehensively evaluate FLIN using a wide variety of storage workloads. FLIN, on average, improves fairness by 70% and performance by 47% over a state-of-the-art device-level I/O request scheduler. FLIN is implemented fully within the SSD firmware with a very modest DRAM overhead, i.e., less than 0.06%.

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives
Arash Tavakkol, Mohammad Sadrosadati, Saugata Ghose, Jeremie S. Kim, Yixin Luo, Yaohua Wang, Nika Mansouri Ghiasi, Lois Orosa, Juan Gómez-Luna, Onur Mutlu
Please come to my talk at ISCA 2018 if you would like to learn more or have any questions. Thank you very much.