RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University (with Christos Kozyrakis, David Mazières, Aravind Narayanan, Diego Ongaro, Mendel Rosenblum, Stephen Rumble, and Ryan Stutsman)

February 10, 2010 | RAMCloud | Slide 2

Introduction
● New research project at Stanford
● Create large-scale storage systems entirely in DRAM
● Interesting combination: scale and low latency
● Enable new applications?
● The future of datacenter storage?

February 10, 2010 | RAMCloud | Slide 3

Outline
● Overview of RAMCloud
● Motivation
● Research challenges

February 10, 2010 | RAMCloud | Slide 4

RAMCloud Overview
● Storage for datacenters
● Commodity servers
● 64 GB DRAM/server
● All data always in RAM
● Durable and available
● Performance goals:
  - High throughput: 1M ops/sec/server
  - Low-latency access: 5-10 µs RPC
[Diagram: application servers and storage servers inside a datacenter]
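As a back-of-envelope sketch (mine, not from the deck), the Python below relates the two performance goals on this slide: a client that waits for each 5-10 µs RPC can issue at most 100-200K ops/sec, so roughly 5-10 outstanding requests are needed to drive a single server at 1M ops/sec.

    # Back-of-envelope check: how the 5-10 us RPC goal relates to the
    # 1M ops/sec/server throughput goal (single-threaded client model assumed).
    def sequential_ops_per_sec(rpc_latency_us):
        """Ops/sec one client can issue if it waits for each RPC to complete."""
        return 1e6 / rpc_latency_us

    for lat in (5, 10):
        per_client = sequential_ops_per_sec(lat)
        concurrency = 1_000_000 / per_client   # outstanding requests to saturate a server
        print(f"{lat} us RPC -> {per_client:,.0f} sequential ops/sec per client; "
              f"~{concurrency:.0f} concurrent requests to reach 1M ops/sec/server")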

February 10, 2010 | RAMCloud | Slide 5

Example Configurations
                        Today       5-10 years
# servers                1000             1000
GB/server               64 GB          1024 GB
Total capacity          64 TB             1 PB
Total server cost         $4M              $4M
$/GB                      $60               $4
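The derived rows of this table can be checked with a few lines of arithmetic. The sketch below (mine, not from the deck) recomputes total capacity and $/GB from the per-server figures, assuming the 1000-server count and $4M total cost apply to both columns, as the table suggests; the table's $60 and $4 are rounded.

    # Recomputing the derived rows of the configuration table above.
    configs = {
        "Today":      {"servers": 1000, "gb_per_server": 64,   "total_cost": 4e6},
        "5-10 years": {"servers": 1000, "gb_per_server": 1024, "total_cost": 4e6},
    }

    for name, c in configs.items():
        total_gb = c["servers"] * c["gb_per_server"]
        print(f"{name}: total capacity = {total_gb / 1000:,.0f} TB "
              f"({total_gb / 1e6:.2f} PB), cost/GB = ${c['total_cost'] / total_gb:.1f}")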

February 10, 2010 | RAMCloud | Slide 6

RAMCloud Motivation: Latency
● Large-scale apps struggle with high latency
  - Facebook: can only make 100-150 internal requests per page
[Diagram: Traditional Application (UI, App. Logic, Data Structures) on a single machine, << 1 µs latency; Web Application (UI and Bus. Logic on application servers, data on storage servers) in a datacenter, 0.5-10 ms latency]

February 10, 2010 | RAMCloud | Slide 7

Forms of Scalability
● Computation Power
● Storage Capacity
● Data Access Rate

February 10, 2010 | RAMCloud | Slide 8

Forms of Scalability (cont'd)
● Computation Power
● Storage Capacity
● Data Access Rate (not provided by today's application infrastructure)

February 10, 2010 | RAMCloud | Slide 9

MapReduce
● Sequential data access → high data access rate
● Not all applications fit this model
[Diagram: computation flowing over the data]

February 10, 2010 | RAMCloud | Slide 10

RAMCloud Motivation: Latency
● RAMCloud goal: large scale and low latency
● Enable a new breed of information-intensive applications
[Diagram: Traditional Application on a single machine, << 1 µs latency; Web Application in a datacenter, latency reduced from 0.5-10 ms to 5-10 µs with RAMCloud]
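To make the latency argument concrete, the sketch below (my illustration, with an assumed 100 ms page-generation budget) shows how the number of sequential data accesses per page scales inversely with per-access latency: 0.5-10 ms storage limits a page to on the order of a hundred accesses, while 5-10 µs would allow thousands.

    # Sequential data accesses that fit in one page-generation budget.
    PAGE_BUDGET_MS = 100.0                       # assumed budget, for illustration only

    for label, latency_ms in [("datacenter storage today (0.5 ms best case)", 0.5),
                              ("RAMCloud goal (10 us)", 0.01)]:
        print(f"{label}: ~{PAGE_BUDGET_MS / latency_ms:,.0f} accesses per page")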

February 10, 2010 | RAMCloud | Slide 11

RAMCloud Motivation: Scalability
● Relational databases don't scale
● Every large-scale Web application has problems:
  - Facebook: 4000 MySQL instances + 2000 memcached servers
● New forms of storage starting to appear:
  - Bigtable
  - Dynamo
  - PNUTS
  - Sinfonia
  - H-store
  - memcached

February 10, 2010 | RAMCloud | Slide 12

RAMCloud Motivation: Technology
Disk access rates are not keeping up with capacity:
● Disks must become more archival
● More information must move to memory

                                    Mid-1980's        2009      Change
Disk capacity                          30 MB        500 GB      16667x
Max. transfer rate                     2 MB/s      100 MB/s         50x
Latency (seek & rotate)                20 ms         10 ms           2x
Capacity/bandwidth (large blocks)       15 s        5000 s         333x
Capacity/bandwidth (1KB blocks)        600 s       58 days       8333x
Jim Gray's rule                        5 min        30 hrs        360x
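The capacity/bandwidth rows follow directly from the raw numbers in the table; the sketch below (mine) reproduces them.

    # Reproducing the capacity/bandwidth rows from the raw numbers in the table.
    def full_read_seconds(capacity_mb, transfer_mb_s):
        """Time to read the entire disk sequentially (large blocks)."""
        return capacity_mb / transfer_mb_s

    def random_1kb_read_seconds(capacity_mb, latency_ms):
        """Time to read the entire disk as random 1 KB blocks, one seek per block."""
        blocks = capacity_mb * 1000               # number of 1 KB blocks
        return blocks * latency_ms / 1000.0

    print(full_read_seconds(30, 2), random_1kb_read_seconds(30, 20))       # 15 s, 600 s
    print(full_read_seconds(500_000, 100),                                 # 5000 s
          random_1kb_read_seconds(500_000, 10) / 86400)                    # ~58 days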

February 10, 2010 | RAMCloud | Slide 13

Why Not a Caching Approach?
● Lost performance:
  - 1% misses → 10x performance degradation
● Won't save much money:
  - Already have to keep information in memory
  - Example: Facebook caches ~75% of its data size
● Changes disk management issues:
  - Optimize for reads vs. writes & recovery
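A quick calculation (mine, using the ~10 µs DRAM access and ~10 ms disk latency figures that appear elsewhere in this deck) shows where the "1% misses, 10x degradation" claim comes from: a handful of misses dominates the average access time.

    # Average access time of a DRAM cache in front of disk.
    hit_us, miss_us = 10.0, 10_000.0     # assumed hit/miss costs (10 us DRAM, 10 ms disk)

    for miss_rate in (0.0, 0.01, 0.05):
        avg = (1 - miss_rate) * hit_us + miss_rate * miss_us
        print(f"miss rate {miss_rate:.0%}: average access ~{avg:.0f} us "
              f"({avg / hit_us:.0f}x the all-DRAM case)")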

February 10, 2010 | RAMCloud | Slide 14

Why Not Flash Memory?
● Many candidate technologies besides DRAM
  - Flash (NAND, NOR)
  - PC RAM (phase-change)
  - ...
● DRAM enables lowest latency:
  - 5-10x faster than flash
● Most RAMCloud techniques will apply to other technologies
● Ultimately, choose storage technology based on cost, performance, and energy, not volatility

February 10, 2010 | RAMCloud | Slide 15

Is RAMCloud Capacity Sufficient?
● Facebook: 200 TB of (non-image) data today
● Amazon:
  - Revenues/year: $16B
  - Orders/year: 400M? ($40/order?)
  - Bytes/order: ?
  - Order data/year: ? TB
  - RAMCloud cost: $24-240K?
● United Airlines:
  - Total flights/day: 4000? (30,000 for all airlines in the U.S.)
  - Passenger flights/year: 200M?
  - Bytes/passenger-flight: ?
  - Order data/year: ? TB
  - RAMCloud cost: $13-130K?
● Ready today for all online data; media soon
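The dollar figures above are back-of-envelope. The hypothetical estimator below (mine) shows the arithmetic, using an assumed 1,000-10,000 bytes per order (my assumption, not a value from the slide) and the $60/GB cost from the configuration slide; with those inputs it lands in the slide's quoted $24-240K range for Amazon.

    # Hypothetical estimator in the spirit of the slide's back-of-envelope numbers.
    COST_PER_GB = 60.0                     # from the example configuration slide

    def ramcloud_cost(records_per_year, bytes_per_record):
        gb = records_per_year * bytes_per_record / 1e9
        return gb, gb * COST_PER_GB

    for bytes_per_record in (1_000, 10_000):                 # assumed range
        gb, cost = ramcloud_cost(400e6, bytes_per_record)    # Amazon: ~400M orders/year
        print(f"{bytes_per_record} B/order -> {gb / 1000:.1f} TB, ~${cost / 1000:.0f}K")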

February 10, 2010 | RAMCloud | Slide 16

RAMCloud Research Issues
● Data durability/availability
● Fast RPCs
● Data model, concurrency/consistency model
● Data distribution, scaling
● Automated management
● Multi-tenancy
● Client-server functional distribution
● Node architecture

February 10, 2010 | RAMCloud | Slide 17

Data Durability/Availability
● Data must be durable when the write RPC returns
● Unattractive approaches:
  - Replicate in other memories (too expensive)
  - Synchronous disk write (too slow)
● One possibility: buffered logging
[Diagram: a write goes to the master's DRAM and is logged in the DRAM of backup storage servers; backups write the log to disk asynchronously, in batches]
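To make the buffered-logging idea concrete, here is a minimal sketch (class and method names are mine, not RAMCloud's actual interfaces): the write RPC returns once the update is in the DRAM of the master and of its backups, and backups push their log buffers to disk asynchronously in batches, off the write path.

    # Minimal sketch of buffered logging as described above (illustrative only).
    import threading

    def write_batch_to_disk(batch):
        pass                                  # placeholder for one large sequential log append

    class Backup:
        """Holds log entries in DRAM; flushes them to disk asynchronously, in batches."""
        def __init__(self):
            self.buffer = []                  # log entries not yet on disk
            self.lock = threading.Lock()

        def append(self, entry):              # called synchronously by the master
            with self.lock:
                self.buffer.append(entry)     # survives a master crash once buffered here

        def flush(self):                      # called periodically, off the write path
            with self.lock:
                batch, self.buffer = self.buffer, []
            if batch:
                write_batch_to_disk(batch)

    class Master:
        def __init__(self, backups):
            self.memory = {}                  # all data lives in DRAM
            self.backups = backups

        def write(self, key, value):
            entry = (key, value)
            for b in self.backups:            # replicate to backups' DRAM buffers first
                b.append(entry)
            self.memory[key] = value          # then apply locally
            return "OK"                       # RPC returns: data now sits in several DRAMs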

February 10, 2010 | RAMCloud | Slide 18

Buffered Logging, cont'd
● Potential problem: crash recovery
  - If a master crashes, its data is unavailable until read from disks on backups
  - Read 64 GB from one disk? ~10 minutes
  - Our goal: recover in 1-2 seconds
● Solution: take advantage of system scale
  - Shard backup data across many servers
  - Recover in parallel

February 10, 2010 | RAMCloud | Slide 19

Recovery, First Try
● Scatter log segments randomly across all servers
● After a crash, all backups read their disks in parallel
  (64 GB / (1000 disks × 100 MB/s) ≈ 0.6 sec)
● Collect all backup data on a replacement master
  (64 GB / 10 Gb/s ≈ 60 sec: too slow!)
[Diagram: many backups transferring data to a single replacement master]
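The two time estimates come from numbers earlier in the deck: 1000 servers, 100 MB/s per disk, and a 10 Gb/s link into the replacement master (as the "~60 sec" figure implies). The sketch below redoes the arithmetic; the network step comes out at ~51 s, which the slide rounds up.

    # Numbers behind "first try" recovery: parallel disk reads are fast, but
    # funneling 64 GB into one replacement master's network link is the bottleneck.
    DATA_GB = 64
    DISKS = 1000                 # from the example configuration (1000 servers)
    DISK_MB_S = 100              # per-disk transfer rate from the technology slide
    NIC_GBIT_S = 10              # 10 Gb/s link into the replacement master

    parallel_disk_read_s = DATA_GB * 1000 / (DISKS * DISK_MB_S)     # ~0.64 s
    network_transfer_s = DATA_GB * 8 / NIC_GBIT_S                   # ~51 s

    print(f"parallel disk read: {parallel_disk_read_s:.2f} s")
    print(f"transfer into one master at 10 Gb/s: {network_transfer_s:.0f} s")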

February 10, 2010 | RAMCloud | Slide 20

Recovery, Second Try
● Phase #1:
  - Read all segments into the memories of backups
  - Send only location info to the replacement master
  - Elapsed time depends on # of objects
● Phase #2:
  - System resumes operation:
    ● Replacement master fetches objects on demand from backups (2x performance penalty for reads)
    ● Writes handled on the replacement master
  - Meanwhile, transfer segments from backups to the replacement master
  - Elapsed time: ~1 minute
● Performance returns to normal after Phase #2
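A structural sketch of the two-phase recovery described above; the function and method names are illustrative, not the actual RAMCloud recovery protocol.

    # Sketch of two-phase recovery: Phase 1 restores availability using only an
    # index of object locations; Phase 2 moves the data itself in the background.
    def recover(replacement_master, backups):
        # Phase 1: backups read segments from disk into their own DRAM and report
        # only object locations; the master builds an index without copying data.
        locations = {}
        for b in backups:
            for key, segment_id in b.read_segments_and_index():
                locations[key] = (b, segment_id)
        replacement_master.install_location_index(locations)
        replacement_master.resume_service()      # system is available again

        # Phase 2: pull full segments over to the master while reads are served
        # on demand from backups (roughly a 2x read penalty in the meantime).
        for b in backups:
            for segment in b.segments_in_memory():
                replacement_master.install_segment(segment)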

February 10, 2010 | RAMCloud | Slide 21

Low-Latency RPCs
Achieving 5-10 µs will impact every layer of the system:
● Must reduce network latency:
  - Typical today: 100-300 µs roundtrip (10-30 µs/switch, 5 switches each way)
  - Arista: 0.9 µs/switch → 9 µs roundtrip
  - Need cut-through routing, congestion mgmt
● Tailor OS on the server side:
  - Dedicated cores
  - No interrupts?
  - No virtual memory?
[Diagram: client and server connected through layers of switches]
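The roundtrip figures follow from five switch hops in each direction, i.e. ten switch traversals per roundtrip; the sketch below (mine) redoes the arithmetic for the typical and Arista per-switch latencies quoted above.

    # Switch latency per roundtrip: 5 hops each way = 10 traversals.
    HOPS_EACH_WAY = 5
    for per_switch_us in (10, 30, 0.9):          # typical today (10-30 us), Arista (0.9 us)
        roundtrip = per_switch_us * HOPS_EACH_WAY * 2
        print(f"{per_switch_us} us/switch -> {roundtrip:g} us of switch latency per roundtrip")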

February 10, 2010 | RAMCloud | Slide 22

Low-Latency RPCs, cont'd
● Network protocol stack:
  - TCP too slow (especially with packet loss)
  - Must avoid copies
● Client side: need an efficient path through the VM
  - User-level access to the network interface?
● Preliminary experiments:
  - Roundtrip times in the µs range (direct connection: no switches)
  - Biggest current problem: NICs (3-5 µs each way)

February 10, 2010 | RAMCloud | Slide 23

Interesting Facets
● Use each system property to improve the others
● High server throughput:
  - No replication for performance, only for durability?
  - Simplifies consistency issues
● Low-latency RPC:
  - Cheap to reflect writes to backup servers
  - Stronger consistency?
● 1000's of servers:
  - Sharded backups for fast recovery

February 10, 2010 | RAMCloud | Slide 24

Project Plan
● Build a production-quality RAMCloud implementation
● Status:
[Diagram: progress bars from 0% to 100% for Design and Implementation]

February 10, 2010 | RAMCloud | Slide 25

Conclusion
● Interesting combination of scale and latency
● Enable more powerful uses of information at scale:
  - Many clients
  - 100 TB - 1 PB
  - 5-10 µs latency
● Many interesting research issues

February 10, 2010 | RAMCloud | Slide 26

Questions/Comments