Presentation transcript:

Gecko: Contention-Oblivious Disk Arrays for Cloud Storage (FAST 2013)
Ji-Yong Shin, Cornell University
In collaboration with Mahesh Balakrishnan (MSR SVC), Tudor Marian (Google), and Hakim Weatherspoon (Cornell)

What Happens to Storage? Cloud and Virtualization
Many guest VMs run on a single VMM and share the same disks; each VM's sequential I/O arrives at the shared disks as random I/O.
[Graph: 4-disk RAID-0 throughput under sequential vs. random access]
I/O CONTENTION causes throughput collapse

Existing Solutions for I/O Contention?
I/O scheduling: reordering I/Os
– Entails increased latency for certain workloads
– May still require seeking
Workload placement: positioning workloads to minimize contention
– Requires prior knowledge or dynamic prediction
– Predictions may be inaccurate
– Limits freedom of placing VMs in the cloud

Log-structured File System to the Rescue? [Rosenblum et al. 91]
– Log all writes to the tail of a single log on the shared disk
[Figure: logical address space (0 … N) mapped onto a log on the shared disk; writes append at the log tail]
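To make the mechanism concrete, here is a minimal illustrative sketch (plain Python, not the paper's implementation) of a log-structured block store: every write appends at the log tail, and an in-memory map remembers which log position holds the latest version of each logical block.

```python
# Illustrative sketch of a log-structured block store (not Gecko's actual code).
# Every write is appended at the log tail; a map records, for each logical
# block address, the physical log position holding its latest version.

class LogStore:
    def __init__(self, num_physical_blocks):
        self.log = [None] * num_physical_blocks   # physical log, one slot per block
        self.tail = 0                              # next physical position to write
        self.l2p = {}                              # logical addr -> physical addr

    def write(self, logical_addr, data):
        pos = self.tail
        self.log[pos] = (logical_addr, data)       # sequential append, no seek
        self.l2p[logical_addr] = pos               # old version (if any) becomes garbage
        self.tail += 1

    def read(self, logical_addr):
        pos = self.l2p.get(logical_addr)
        return None if pos is None else self.log[pos][1]

store = LogStore(num_physical_blocks=16)
store.write(3, b"A")
store.write(3, b"B")                               # overwrites logically, appends physically
assert store.read(3) == b"B"
```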

Challenges of Log-Structured File System
Garbage collection is the Achilles' Heel of LFS [Seltzer et al. 93, 95; Matthews et al. 97]
[Figure: garbage collection (GC) reads live data from the log head while new writes append at the log tail on the same shared disk]
GC and reads still cause the disk to seek

Challenges of Log-Structured File System
Garbage collection is the Achilles' Heel of LFS
– 2-disk RAID-0 setting of LFS
– GC under write-only synthetic workload
[Graph: RAID-0 + LFS throughput over time vs. max sequential throughput]
Throughput falls by 10X during GC

Problem: Increased virtualization leads to increased disk seeks and kills performance. RAID and LFS do not solve the problem.

Rest of the Talk
Motivation
Gecko: contention-oblivious disk storage
– Sources of I/O contention
– New technique: Chained logging
– Implementation
Evaluation
Conclusion

What Causes Disk Seeks?
Contention between write-write, read-read, and write-read I/Os
[Diagram: two VMs issuing interleaved reads and writes to the same disk]

What Causes Disk Seeks? (with logging)
Write-write, read-read, and write-read contention
Logging adds:
– Write-GC read contention
– Read-GC read contention
[Diagram: two VMs issuing reads and writes while GC reads from the same disk]

Principle: A single sequentially accessed disk is better than multiple randomly seeking disks

Gecko's Chained Logging Design
Separating the log tail from the body
– GC reads do not interrupt the sequential write stream
– One uncontended drive is faster than N contended drives
[Diagram: physical address space chained across Disk 0, Disk 1, Disk 2; the log tail sits on one disk while GC moves data from the log head toward the tail]
Eliminates write-write and write-GC read contention; reduces write-read contention
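As an illustration of the chained layout (a sketch under assumed parameters, not Gecko's in-kernel code), the global log offset determines which disk in the chain currently holds the tail, so new writes land only on that disk while the body disks are left to GC and reads.

```python
# Illustrative sketch of chained logging (assumed parameters, not Gecko's code):
# the log is laid out across a chain of disks; only the disk holding the log
# tail receives new writes, so GC reads on body disks never interrupt them.

BLOCKS_PER_DISK = 4  # assumption for illustration; real disks hold millions of blocks

class ChainedLog:
    def __init__(self, num_disks):
        self.num_disks = num_disks
        self.tail = 0                      # global log offset of the next write
        self.l2p = {}                      # logical addr -> global log offset

    def disk_of(self, offset):
        return offset // BLOCKS_PER_DISK   # which disk in the chain holds this offset

    def write(self, logical_addr, data, disks):
        d = self.disk_of(self.tail)        # always the current tail disk
        disks[d].append((logical_addr, data))
        self.l2p[logical_addr] = self.tail
        self.tail += 1

chain = ChainedLog(num_disks=3)
disks = [[] for _ in range(3)]
for addr in range(6):
    chain.write(addr, b"x", disks)
# The first 4 writes land on disk 0; once it fills, the tail moves to disk 1.
print([len(d) for d in disks])             # -> [4, 2, 0]
```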

Gecko's Chained Logging Design
Smarter "Compact-In-Body" Garbage Collection
[Diagram: instead of moving live data from the log head to the tail disk, GC compacts it within the body disks, leaving the tail disk free for new writes]
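A simplified side-by-side sketch of the two GC policies (illustrative Python; the data structures are assumptions, not the paper's algorithm): head-to-tail GC relocates live blocks to the tail and therefore competes with new writes, whereas compact-in-body GC folds them into free space on the body disks and leaves the tail disk alone.

```python
# Simplified contrast of the two GC policies (illustrative, not Gecko's code).

def gc_head_to_tail(head_live, tail_log):
    """Classic LFS-style cleaning: copy live blocks from the log head to the
    log tail. Frees the head, but the copies compete with application writes
    for the tail disk."""
    tail_log.extend(head_live)
    moved = list(head_live)
    head_live.clear()
    return moved

def gc_compact_in_body(head_live, body_free_slots):
    """Compact-in-body cleaning (sketched): move live blocks into free space
    on the body disks, so the tail disk keeps serving only new writes."""
    moved = []
    while head_live and body_free_slots:
        moved.append(head_live.pop())
        body_free_slots.pop()          # one free body slot consumed per block
    return moved

tail = []
print(gc_head_to_tail([10, 11, 12], tail), tail)       # live data lands at the tail
print(gc_compact_in_body([10, 11, 12], [0, 1, 2, 3]))  # tail list stays empty
```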

Gecko Caching
What happens to reads going to the tail drive?
– Tail cache (LRU: RAM for hot data, MLC SSD for warm data) absorbs reads to the tail, reducing write-read contention
– Body cache (flash) serves reads to the body disks, reducing read-read contention
[Diagram: tail cache in front of the tail disk; body cache in front of the body disks]
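A minimal sketch of a two-tier LRU tail cache of the kind the slide describes (capacities and the promotion policy are assumptions for illustration): hot data stays in RAM, demoted entries spill to the SSD tier, and only true misses would go to the tail disk.

```python
# Illustrative two-tier tail cache (RAM LRU over SSD LRU); sizes are assumptions.
from collections import OrderedDict

class TwoTierCache:
    def __init__(self, ram_entries, ssd_entries):
        self.ram = OrderedDict()   # hot data
        self.ssd = OrderedDict()   # warm data (demoted from RAM)
        self.ram_cap, self.ssd_cap = ram_entries, ssd_entries

    def put(self, addr, data):
        self.ram[addr] = data
        self.ram.move_to_end(addr)
        if len(self.ram) > self.ram_cap:            # demote coldest RAM entry to SSD
            old_addr, old_data = self.ram.popitem(last=False)
            self.ssd[old_addr] = old_data
            self.ssd.move_to_end(old_addr)
            if len(self.ssd) > self.ssd_cap:        # evict coldest SSD entry entirely
                self.ssd.popitem(last=False)

    def get(self, addr):
        if addr in self.ram:
            self.ram.move_to_end(addr)
            return self.ram[addr], "ram"
        if addr in self.ssd:
            data = self.ssd.pop(addr)               # promote back into RAM
            self.put(addr, data)
            return data, "ssd"
        return None, "miss"                          # would go to the tail disk

cache = TwoTierCache(ram_entries=2, ssd_entries=4)
for a in range(4):
    cache.put(a, f"block{a}")
print(cache.get(0))   # demoted to SSD -> ('block0', 'ssd')
print(cache.get(3))   # still hot in RAM -> ('block3', 'ram')
```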

Gecko Properties Summary
– No write-write contention, no GC-write contention, and reduced read-write contention
– Tail cache and body cache (flash) further reduce read-write and read-read contention
– Mirroring/striping the chain (Disk 0', 1', 2') adds fault tolerance and read performance
– Mirrored body disks can be powered down for power saving without consistency concerns
[Diagram: mirrored three-disk chain with a write tail cache and a read body cache]

Gecko Implementation
– Primary (logical-to-physical) map in memory: 4-byte entries, less than 8 GB of RAM for 8 TB of storage
– Inverse (physical-to-logical) map in flash: 8 GB for 8 TB of storage, written out every 1024 writes
– Data stored as 4 KB pages across the disk chain (Disk 0, Disk 1, Disk 2), filled from head to tail
[Diagram: reads and writes looked up in the in-memory map; writes appended at the log tail]
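The quoted map sizes follow from simple arithmetic; a quick check, assuming the 4 KB pages and 4-byte entries stated on the slide:

```python
# Back-of-the-envelope check of the map sizes quoted on the slide.
TB = 1024 ** 4
GB = 1024 ** 3
KB = 1024

storage = 8 * TB
page_size = 4 * KB
entry_size = 4                      # bytes per map entry (slide: "4-byte entries")

num_pages = storage // page_size    # number of 4 KB pages in 8 TB
map_bytes = num_pages * entry_size  # size of the logical-to-physical map

print(num_pages)                    # 2147483648 entries (~2 billion)
print(map_bytes / GB)               # 8.0 -> on the order of 8 GB of RAM, as on the slide
```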

Evaluation
1. How well does Gecko handle GC?
2. Performance of Gecko under real workloads?
3. Effect of varying Gecko chain length?
4. Effectiveness of the tail cache?
5. Durability of the flash-based tail cache?

Evaluation Setup
In-kernel version
– Implemented as a block device for portability
– Similar to software RAID
Hardware
– WD 600 GB HDD (used 512 GB of 600 GB; 2.5" 10K RPM, SATA-600)
– Intel MLC (multi-level cell) SSD, 240 GB, SATA-600
User-level emulator
– For fast prototyping
– Runs block traces
– Tail cache support

How Well Does Gecko Handle GC?
2-disk setting; write-only synthetic workload
[Graphs: Gecko vs. Log + RAID-0 throughput over time, with the max sequential throughput of 1 disk and of 2 disks marked]
– Gecko: aggregate throughput always remains high, staying at the max throughput
– Log + RAID-0: throughput collapses during GC
– Result: 3X higher aggregate and 4X higher application throughput

How Well Does Gecko Handle GC? (continued)
[Graph: Gecko throughput with compact-in-body GC]
Application throughput can be preserved using smarter GC

MS Enterprise and MSR Cambridge Traces [Narayanan et al. 08, 09]

Trace Name | Estimated Addr Space | Total Data Accessed (GB) | Total Data Read (GB) | Total Data Written (GB) | Total IO Req | Num Read Req | Num Write Req
prxy | 136 | 2,076 | 1,… | … | 181,157,932 | 110,797,984 | 70,359,948
src1 | 820 | 3,107 | 2,… | … | 85,069,608 | 65,172,645 | 19,896,963
proj | 4,102 | 2,279 | 1,… | … | 65,841,031 | 55,809,869 | 10,031,162
Exchange | 4,… | … | … | … | 61,179,165 | 26,324,163 | 34,855,002
usr | 2,461 | 2,625 | 2,… | … | 58,091,915 | 50,906,183 | 7,185,732
LiveMapsBE | 6,737 | 2,344 | 1,… | … | 44,766,484 | 35,310,420 | 9,456,064
MSNFS | 1,… | … | … | … | 29,345,085 | 19,729,611 | 9,615,474
DevDivRelease | 4,… | … | … | … | 18,195,701 | 12,326,432 | 5,869,269
prn | … | … | … | … | 16,819,297 | 9,066,281 | 7,753,…

What Is the Performance of Gecko under Real Workloads?
Setup: 6-disk configuration with 200 GB of data prefilled; mix of 8 workloads (prn, MSNFS, DevDivRelease, proj, Exchange, LiveMapsBE, prxy, and src1)
Gecko
– Mirrored chain of length 3
– Tail cache (2 GB RAM + 32 GB SSD)
– Body cache (32 GB SSD)
Log + RAID10
– Mirrored, log + 3-disk RAID-0
– LRU cache (2 GB RAM + 64 GB SSD)
[Graph: throughput of Gecko vs. Log + RAID10]
Results: Gecko showed less read-write contention and a higher cache hit rate; Gecko's throughput is 2X-3X higher

What Is the Effect of Varying Gecko Chain Length?
Same 8 workloads with 200 GB of data prefilled
[Graph: throughput vs. chain length, with RAID-0 for comparison; error bars = stdev]
A single uncontended disk and separating reads from writes give better performance

How Effective Is the Tail Cache?
Read hit rate of the tail cache (2 GB RAM + 32 GB SSD) on a 512 GB disk
21 combinations of 4 to 8 MSR Cambridge and MS Enterprise traces
[Graph: read hit rates across the trace combinations]
– At least 86% read hit rate; RAM handles most of the hot data
– The amount of data changes the hit rate, but it still averages above 80%
Tail cache can effectively resolve read-write contention

How Durable Is the Flash-Based Tail Cache?
Static analysis of SSD lifetime based on cache hit rate
[Graph: SSD lifetime at a 40 MB/s write rate]
Using 2 GB of RAM in front of the SSD extends SSD lifetime by 2X-8X
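Roughly, such a static analysis boils down to lifetime = (SSD capacity × erase cycles) / SSD write rate, so any writes absorbed by the RAM tier extend lifetime proportionally. A hedged back-of-the-envelope sketch (the endurance and RAM-absorption figures below are assumptions for illustration, not numbers from the paper):

```python
# Illustrative SSD-lifetime estimate; endurance and RAM-absorption figures are
# assumptions for this sketch, not measurements from the Gecko paper.
GB = 1024 ** 3
MB = 1024 ** 2

ssd_capacity = 32 * GB           # tail-cache SSD size used in the evaluation
pe_cycles = 3000                 # assumed MLC program/erase endurance
cache_write_rate = 40 * MB       # bytes/s arriving at the tail cache (rate on the slide)

def lifetime_seconds(fraction_reaching_ssd):
    total_writable = ssd_capacity * pe_cycles            # total bytes the SSD can absorb
    return total_writable / (cache_write_rate * fraction_reaching_ssd)

no_ram = lifetime_seconds(1.0)     # every cached write hits the SSD
with_ram = lifetime_seconds(0.25)  # assume the RAM tier absorbs 75% of writes
print(with_ram / no_ram)           # -> 4.0x lifetime extension, within the 2X-8X range
```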

Conclusion
Gecko enables fast storage in the cloud
– Scales with increasing virtualization and number of cores
– Oblivious to I/O contention
Gecko's technical contributions
– Separates the log tail from its body
– Separates reads and writes
– Tail cache absorbs reads going to the tail
A single sequentially accessed disk is better than multiple randomly seeking disks

Questions?