Memory and Object Management in a Distributed RAM-based Storage System
Thesis Proposal
Steve Rumble
April 23rd, 2012

RAMCloud Introduction
General-purpose datacenter storage system
All data in DRAM at all times
Pushing two boundaries:
– Low latency: 5–10 µs round-trip (small reads)
– Large scale: up to 10,000 servers, ~1 PB of total memory
Goal:
– Enable novel applications with a 100–1,000x increase in sequential storage ops/sec
Problem:
– How to store data while getting high performance, high memory utilization, and durability?

Thesis
Structuring memory as a log and using parallel and two-level cleaning enables high-performance memory allocation without sacrificing utilization or durability.

Contributions
Log-structured memory
– High performance in memory with durability on disk
Parallel cleaning
– Fast memory allocation; overheads kept off the critical path
Two-level cleaning
– Optimizing the utilization/write-cost trade-off
Tombstones
– Delete consistency in the face of recoveries
Tablet migration
– Rebalancing and cluster-wide data management

Outline
RAMCloud Background
Contributions
– Log-structured memory
– Parallel cleaning
– Two-level cleaning
– Tombstones
– Tablet migration
Conclusion
– Status
– Future work
– Summary

Outline
→ RAMCloud Background
Contributions
– Log-structured memory
– Parallel cleaning
– Two-level cleaning
– Tombstones
– Tablet migration
Conclusion
– Status
– Future work
– Summary

RAMCloud Architecture
[Architecture diagram: up to 100,000 application servers (application + client library) connect over a datacenter network to up to 10,000 storage servers, each running a master and a backup, plus a coordinator.]

Distributed Key-Value Store
Data model: key-value
– Keys scoped into tables
– Tables may span multiple servers ("tablets")
– Addressing: the client's tablet map routes a (table, key) request to the owning master; the master's hash table maps the key to the object in RAM
[Diagram: a client issues Read(5, "foo"); the tablet map picks the master server for that key-hash range, and the master's hash table locates the object.]
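A minimal sketch of this addressing path, under assumed names (Master, write, read; the object value is stored inline here rather than in a log). This is illustrative only, not RAMCloud's actual API:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

class Master {
    // Hash table: (tableId, key) -> object stored in DRAM. In RAMCloud the value
    // would be a pointer into the in-memory log rather than an inline string.
    std::unordered_map<std::string, std::string> hashTable;

    static std::string slot(uint64_t tableId, const std::string& key) {
        return std::to_string(tableId) + "/" + key;   // combined lookup key
    }

public:
    void write(uint64_t tableId, const std::string& key, std::string value) {
        hashTable[slot(tableId, key)] = std::move(value);
    }

    // Read path: the client library has already used its tablet map to pick the
    // master owning this key's hash range; the master then consults its hash table.
    std::optional<std::string> read(uint64_t tableId, const std::string& key) const {
        auto it = hashTable.find(slot(tableId, key));
        if (it == hashTable.end()) return std::nullopt;
        return it->second;
    }
};
```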

Outline
RAMCloud Background
→ Contributions
– Log-structured memory
– Parallel cleaning
– Two-level cleaning
– Tombstones
– Tablet migration
Conclusion
– Status
– Future work
– Summary

Log-structured Memory
Log structure gives high disk write bandwidth on backups
– Sequential I/O amortizes seek and rotational latency
– Append only: objects are written to the end of the log (the head)
– Fast allocation: just increment the head pointer
[Diagram: the master's hash table points to objects in log-structured RAM; the log is replicated on remote backup disks.]
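A minimal sketch of append-only allocation under assumed names (object headers, backup replication, and cleaning are omitted), showing why a write is just a copy plus a pointer increment:

```cpp
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

class LogStructuredMemory {
    std::vector<char> log;                              // in-memory log (backed by DRAM)
    size_t head = 0;                                    // next free byte: allocation is a pointer bump
    std::unordered_map<std::string, size_t> hashTable;  // key -> offset of the latest version
public:
    explicit LogStructuredMemory(size_t bytes) : log(bytes) {}

    bool write(const std::string& key, const std::string& value) {
        size_t need = value.size();
        if (head + need > log.size()) return false;     // would trigger cleaning in a real system
        std::memcpy(log.data() + head, value.data(), need);
        hashTable[key] = head;                          // any old copy becomes dead space in the log
        head += need;                                   // fast allocation: increment the head pointer
        return true;
        // In RAMCloud the same append is also replicated to segments on remote backup disks.
    }
};
```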

Benefits of Segments
Log divided into fixed-sized segments
– More efficient garbage collection (cleaning)
– High write bandwidth (striped across backups)
– High read bandwidth for recovery
[Diagram: the master's hash table points into a segmented log whose segments are replicated across multiple backups.]

Object Deletion
Problems with deleting and updating objects:
1. Fragmentation: reclaiming dead space for new writes → solution: cleaning
2. Consistency: skipping dead objects during log replay → solution: tombstones
[Diagram: a recovery master replaying a segment must decide whether each object is still alive.]

Log Cleaning
Problem: deletes and updates create fragmentation
Cleaning is used to reclaim this space
Procedure:
– Select segments (LFS cost-benefit policy)
– Write live data to the head of the log (the hash table is updated to point to each object's new location)
– Free the cleaned segments
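A sketch of the selection step under assumed structures (the copy and free steps are only outlined in comments). The score follows the LFS cost-benefit formula, benefit/cost = (1 − u) · age / (1 + u):

```cpp
#include <algorithm>
#include <vector>

struct Segment {
    double utilization;   // fraction of bytes still live (0.0 to 1.0)
    double age;           // e.g. age of the segment's data, used by cost-benefit
};

// LFS cost-benefit: benefit = free space gained * age of the data;
// cost = roughly (1 + u) segment's worth of I/O. Higher score = clean first.
double costBenefit(const Segment& s) {
    double u = s.utilization;
    return (1.0 - u) * s.age / (1.0 + u);
}

void cleanOnce(std::vector<Segment>& segments /*, Log& log, HashTable& ht */) {
    // 1. Select the best candidate segments.
    std::sort(segments.begin(), segments.end(),
              [](const Segment& a, const Segment& b) {
                  return costBenefit(a) > costBenefit(b);
              });
    // 2. For each selected segment: copy its live objects to the head of the log
    //    and update the hash table so each key points at the relocated copy.
    // 3. Free the cleaned segments so their space can be reused for new heads.
}
```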

Why Cleaning?
Alternative: snapshotting
– Mark the current head position
– Write the live contents to the head
– Reclaim the old space; the log now begins at the snapshot position
✗ Problem: expensive
– Always copies the entire contents of the log
Cleaning can skip segments with low fragmentation
[Diagram: snapshotting causes the log to wrap.]

Minimizing Write Latency
Problem: cleaning contends with regular writes
– Recall our low-latency goal
– In steady state the system must constantly clean
– Interference from cleaning threatens write latency
Solutions:
– Use the cores: run the cleaner in parallel
– Minimize contention: don't clean to the head of the log
[Diagram: client writes go to the head segment while cleaner writes go elsewhere.]

Parallel Cleaning
The cleaner thread writes to segments outside of the log
Cleaned and survivor segments are atomically swapped out of / into the log when the next head is allocated
– Each log head enumerates all segments in the log
[Diagram: the cleaner relocates live data into a survivor segment while new writes proceed in parallel at the new head.]
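One way to picture the swap (an illustrative sketch, not RAMCloud's code): the cleaner publishes its survivor segments and the segments it has emptied, and both sets are applied to the log's segment list only when a new head is allocated, so each head sees a consistent before/after view:

```cpp
#include <algorithm>
#include <mutex>
#include <set>
#include <vector>

struct Segment { int id; };

class Log {
    std::mutex mutex;
    std::vector<Segment*> segmentsInLog;     // enumerated by each log head
    std::vector<Segment*> pendingSurvivors;  // written by the cleaner, not yet in the log
    std::set<Segment*> pendingFree;          // cleaned segments awaiting removal

public:
    // Called by the cleaner thread after it has relocated live data.
    void cleanerFinished(std::vector<Segment*> survivors, std::vector<Segment*> cleaned) {
        std::lock_guard<std::mutex> lock(mutex);
        pendingSurvivors.insert(pendingSurvivors.end(), survivors.begin(), survivors.end());
        pendingFree.insert(cleaned.begin(), cleaned.end());
    }

    // Called on the write path when the current head fills up: the new head's
    // enumeration of segments reflects exactly the swapped set.
    Segment* allocateNewHead(Segment* fresh) {
        std::lock_guard<std::mutex> lock(mutex);
        segmentsInLog.insert(segmentsInLog.end(),
                             pendingSurvivors.begin(), pendingSurvivors.end());
        pendingSurvivors.clear();
        segmentsInLog.erase(
            std::remove_if(segmentsInLog.begin(), segmentsInLog.end(),
                           [&](Segment* s) { return pendingFree.count(s) != 0; }),
            segmentsInLog.end());
        pendingFree.clear();
        segmentsInLog.push_back(fresh);
        return fresh;
    }
};
```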

Parallelism Isn't Sufficient
Parallelism can hide some of the performance impact
However, cleaning still contends for network, disk, and memory bandwidth
– And there are opportunities for contention in other parts of the system
Questions:
– How expensive do we expect cleaning to be?
– If it is not cheap enough, how might we do better?

Cleaning Efficiency
Efficiency depends on the utilization of the selected segments
– The lower the utilization, the cheaper cleaning is
To get one segment's worth of space back, clean:
– 1 segment at 0% utilization, 2 segments at 50%, 4 at 75%, …
– In general, at utilization u, clean 1/(1 − u) segments and write u/(1 − u) segments of live data
[Diagram: cleaning a segment at µ = 0% frees the whole segment; at µ = 50%, only half of the cleaned space is freed.]
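A few lines of arithmetic confirm the rule above (a standalone sketch matching the 0%, 50%, and 75% examples):

```cpp
#include <cstdio>

int main() {
    const double utilizations[] = {0.0, 0.50, 0.75, 0.90};
    for (double u : utilizations) {
        double cleaned = 1.0 / (1.0 - u);   // segments cleaned per segment freed
        double copied  = u / (1.0 - u);     // segments of live data copied per segment freed
        std::printf("u = %2.0f%%: clean %4.1f segments, copy %4.1f segments of live data\n",
                    u * 100, cleaned, copied);
    }
    return 0;
}
```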

Write Cost
"Write cost": the average number of times each byte is copied
– Depends on the utilization of the segments cleaned
– 1.0 is optimal: cleaning always encounters empty segments
– Same as "write amplification" in SSDs
LFS showed how to optimize cleaning for write cost
– Cost-benefit selection, hot/cold segregation

LFS Approach is Too Expensive
[Graph from Rosenblum et al., 1991: cleaning write cost rises steeply as segment utilization increases.]
Conjecture: DRAM expense compels running at higher utilization

Utilization/Efficiency Dilemma
Problem: the disk and memory layouts are coupled
– Cleaning in memory requires cleaning on disk
This forces an unpleasant choice:
– Low memory utilization and cheap cleaning, or
– High memory utilization and expensive cleaning
Can we get the best of both worlds?
– High utilization of precious memory
– Low cleaning overhead
Idea: what if we decouple disk and memory?

Two-level Cleaning
Compact segments in memory without going to disk
– Copy live data to the front; use the MMU to free and reuse the tail
More dead objects accumulate in disk segments than in RAM
→ Lower disk utilization
→ Lower write cost (cheaper to clean)
Result:
– Optimizes memory utilization: use copious RAM bandwidth to aggressively reclaim space
– Optimizes disk write cost: with 2x the data on disk, disk utilization is at most 50% and the LFS write cost is at most 2.0
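In-memory compaction (the first level of cleaning) can be sketched as follows. Names and the entry bookkeeping are assumptions for illustration; the real system would also repoint the hash table and release the freed tail's physical memory:

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

struct Entry {
    size_t offset;   // where the entry lives inside the segment
    size_t length;
    bool live;       // determined by checking the hash table / tombstones
};

// Compacts one segment buffer in place (entries assumed sorted by offset);
// returns the number of bytes still in use. Bytes beyond that point can be
// freed and reused without rewriting the segment's replicas on disk.
size_t compactSegment(char* segment, std::vector<Entry>& entries) {
    size_t writePos = 0;
    for (Entry& e : entries) {
        if (!e.live) continue;                               // dead entry: drop it
        std::memmove(segment + writePos, segment + e.offset, e.length);
        e.offset = writePos;                                 // record the relocated position
        writePos += e.length;
    }
    entries.erase(std::remove_if(entries.begin(), entries.end(),
                                 [](const Entry& e) { return !e.live; }),
                  entries.end());
    return writePos;                                         // bytes [writePos, end) are now free
}
```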

Two-level Cleaning Ramifications
More space used on backups:
– More data to read during recovery: either more resources are needed to recover in the same time, or recovery is slower
– More expense in disk hardware: the cost per GB of hard drives is probably too low for this to be an issue
Must sometimes clean segments on disk before cleaning segments in memory
– Dependent log entries: some entries cannot be freed until another entry is purged from the log

Tombstones: Telling When an Object Was Deleted
Problem: deleted objects must be skipped during recovery
– Objects could otherwise be resurrected after a failure
The master's hash table dictates which objects are alive
– But the hash table is not made durable
– Unlike file systems, there is no persistent indexing structure
Solution: tombstones
– Metadata appended to the log whenever an object is deleted or overwritten
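A sketch of the tombstone record and delete path under assumed names (not RAMCloud's actual formats). One detail worth noting: the tombstone identifies the segment that held the object, since the tombstone itself can only be garbage-collected once that segment has left the log:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Tombstone {
    uint64_t tableId;
    uint64_t keyHash;
    uint64_t segmentId;   // segment that held the deleted object; the tombstone
                          // stays needed until that segment is gone from the log
};

struct Master {
    std::unordered_map<std::string, size_t> hashTable;  // key -> offset of the live object
    std::vector<Tombstone> log;                          // stand-in for appended log entries

    void remove(uint64_t tableId, const std::string& key, uint64_t keyHash,
                uint64_t objectSegmentId) {
        hashTable.erase(key);                                // the object is now dead in memory
        log.push_back({tableId, keyHash, objectSegmentId});  // durable record of the delete,
                                                             // replayed to suppress resurrection
    }
};
```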

Tombstone Issues
Two main issues:
– They use space to free space
– Their garbage collection is tricky
But we have not found a reasonable alternative
– Tombstones are a thorn in our side
– To keep deleted data from being replayed, we must either destroy it (overwrite it) or have some data structure that precludes it (e.g. an index)
– We cannot afford synchronous overwrites, and RAMCloud has no indexing

LFS Comparison
Similarities:
– General structure and nomenclature (log, segments, cleaning)
– Cost-benefit policy for segment selection
Differences:
– The log is memory-based and distributed for durability
– Two-level cleaning: no disk reads during cleaning (lower write cost)
– No fixed block size: reordering can create fragmentation
– Per-object ages for the cost-benefit calculation, rather than per-segment
– File system vs. key-value store: very small objects (~100 bytes) must be supported efficiently
– No tombstones in LFS, but more and different metadata

Cluster Memory Management
Managing memory across servers
Need policies: when to move data between machines
– Server memory utilization too high
– Server request load too high (hot data)
Need mechanisms: how to move data between machines
– Efficiently, failure-tolerantly, and consistently

Mechanism: Tablet Migration
Tablets as the basis for data movement
The log structure makes migration easy
Tricky issues:
– Migrating tablets away and back; merging adjacent tablets
[Diagram: the coordinator issues "split tablet" and "migrate" requests; the source master scans its log and moves the tablet's data to the destination master, and ownership is then transferred.]
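The sequence in the diagram can be sketched as three steps. The helper names are hypothetical stand-ins for the RPCs involved (stubbed out here), not RAMCloud's actual interfaces:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers, stubbed for illustration; each would be an RPC in a real cluster.
static void splitTablet(uint64_t /*tableId*/, uint64_t /*splitKeyHash*/) {}            // coordinator
static void sendLogEntriesInRange(uint64_t /*tableId*/, uint64_t /*first*/,
                                  uint64_t /*last*/, const std::string& /*dest*/) {}   // source master
static void transferOwnership(uint64_t /*tableId*/, uint64_t /*first*/,
                              uint64_t /*last*/, const std::string& /*dest*/) {}        // coordinator

void migrateTablet(uint64_t tableId, uint64_t firstKeyHash, uint64_t lastKeyHash,
                   const std::string& destinationMaster) {
    // 1. Split the source tablet so the migrated key-hash range is a whole tablet.
    splitTablet(tableId, firstKeyHash);
    // 2. The source master scans its log and ships live objects in the range
    //    to the destination master.
    sendLogEntriesInRange(tableId, firstKeyHash, lastKeyHash, destinationMaster);
    // 3. The coordinator transfers ownership so new requests go to the destination.
    transferOwnership(tableId, firstKeyHash, lastKeyHash, destinationMaster);
}
```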

Outline
RAMCloud Background
Contributions
– Log-structured memory
– Parallel cleaning
– Two-level cleaning
– Tombstones
– Tablet migration
→ Conclusion
– Status
– Future work
– Summary

Current Status
"First draft" log and cleaner in place since 2010/2011
– On-disk cleaning only
– Parallel cleaning with cost-benefit selection
– Very little performance measurement
Two-level prototype cleaner exists off of the main branch
Prototype tablet migration mechanism partially implemented
– Full tables can be migrated; no splitting/joining, no failure tolerance

Timeline for Future Work
Goal: graduation in 12–18 months
2012:
– Integrate the revised two-level cleaner
– Measure performance, iterate on the design, write up
– Complete the tablet migration mechanism
– Explore cluster-wide data management policies: when to migrate, which tablet to move, where to move it, etc., and the interaction with cleaning
2013:
– Wrap up; dissertation writing

Summary: Thesis Contributions
Managing memory for high performance, high utilization, and durability via:
– Log-structured memory
– Parallel cleaning
– Two-level cleaning
– Tombstones
– Tablet migration