
Assumptions, Hypothesis, Hopes: RAMCloud
Mendel Rosenblum, Stanford University
April 1, 2010

Slide 2: Talk Agenda
● Present assumptions we are making in RAMCloud
  ○ Infrastructure for data centers 5 years in the future
● Hardware and workload assumptions
● Elicit feedback
  ○ Unconscious assumptions
  ○ Hopelessly wrong assumptions
● Disclaimer: Goal #1 of RAMCloud is research
  ○ The research agenda is pushing latency and scale
  ○ Significant risk is OK
  ○ Version 1.0 design decisions (OK to leave stuff out, oversimplify)

Slide 3: Network Assumptions
● Assumption: network latency is going down
  ○ Microsecond latencies require low-latency switches & NICs
  ○ Encouraging: Arista at 0.6 μs per switch
  ○ Lots of discouraging signs: a chicken-and-egg problem
● Assumption: no significant oversubscription
  ○ Scale-based solutions require high bisection bandwidth
  ○ Example: recovery (see the sketch below)
● Actively engaging networking researchers and industry partners
Vote: Are we crazy?
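
A rough back-of-envelope sketch of why recovery demands bisection bandwidth; the DRAM size and link speed are illustrative assumptions, not numbers from the slides:

```python
# Back-of-envelope: aggregate bandwidth needed to rebuild a crashed
# server's DRAM from its backups within the recovery budget.
# All constants are illustrative assumptions, not RAMCloud measurements.

server_dram_gb = 64        # DRAM to reconstruct (assumed)
recovery_budget_s = 2.0    # target from the workload-assumptions slide
link_gbps = 10             # per-host NIC speed (assumed 10 GbE)

aggregate_gbps = server_dram_gb * 8 / recovery_budget_s
hosts_needed = aggregate_gbps / link_gbps

print(f"aggregate: {aggregate_gbps:.0f} Gb/s across ~{hosts_needed:.0f} hosts")
# -> 256 Gb/s over ~26 hosts at full line rate; with significant
#    oversubscription these flows contend and the 2 s budget is blown.
```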

Slide 4: Workload Assumptions
● A recovery time of less than 2 seconds is fast enough
● Applications will tolerate 2-second hiccups
Vote: Will enough important applications tolerate 2-second pauses when failures happen?

Slide 5: Server Assumptions
● Assumption: simultaneous failure of all servers is unlikely (i.e., all DRAMs can't lose their contents at once)
  ○ Requires an uninterruptible power supply
  ○ Example: Google's per-machine battery backup
● Need only enough power to transfer write buffers to non-volatile storage
● Want to avoid non-volatile storage latency in the write request path (see the sketch below)
Vote: What do you think?
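
A minimal sketch, assuming a battery-backed buffered-logging design, of how durability can stay out of the write latency path; the class and method names are hypothetical, not RAMCloud's actual code:

```python
# Sketch: acknowledge a write once it is in DRAM on the master plus
# battery-backed buffers on several backups; flushing those buffers to
# disk happens off the critical path (or on power failure, while the
# battery keeps the machine alive long enough to drain the buffer).
# All names here are illustrative assumptions.

class Backup:
    def __init__(self):
        self.buffer = []                  # battery-backed, survives outage

    def buffer_log_entry(self, key, value):
        self.buffer.append((key, value))  # memory-to-memory, microseconds

    def flush_to_disk(self):
        pass                              # asynchronous, not in write path

class Master:
    def __init__(self, backups):
        self.table = {}                   # primary copy, in DRAM
        self.backups = backups

    def write(self, key, value):
        self.table[key] = value
        for b in self.backups:            # fast replication to buffers
            b.buffer_log_entry(key, value)
        return "ok"                       # ack without touching disk
```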

Slide 6: Assumption: Latency Is Goodness
● High latency is a major application inhibitor
  ○ Increases complexity, decreases innovation
  ○ RAMCloud will enable new applications
● Low latency is underappreciated
  ○ Users won't ask for it but will love it
Vote: Are we right?

Slide 7: Low Latency: Stronger Consistency?
● Low latency might help with scaling challenges
  ○ Example: durability by updating multiple in-memory replicas
● The cost of consistency rises with transaction overlap: O ≈ R × D, where
  ○ O = number of overlapping transactions
  ○ R = arrival rate of new transactions
  ○ D = duration of each transaction
● R increases with system scale
  ○ Eventually, scale makes consistency unaffordable
● But D decreases with lower latency
  ○ Stronger consistency becomes affordable at larger scale (worked example below)
  ○ Is this phenomenon strong enough to matter?
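
A worked instance of O ≈ R × D; the arrival rate and the two transaction durations are illustrative assumptions:

```python
# O ~ R * D (Little's law applied to transactions): how overlap shrinks
# when latency drops. R and the two values of D are assumed numbers.

R = 1_000_000      # new transactions per second at large scale
D_disk = 10e-3     # 10 ms per transaction, disk-based storage
D_ram = 10e-6      # 10 us per transaction, DRAM-based storage

print(f"disk: {R * D_disk:,.0f} overlapping transactions")  # 10,000
print(f"ram:  {R * D_ram:,.0f} overlapping transactions")   # 10
# A 1000x drop in D tolerates a 1000x larger R for the same conflict
# exposure, which is why stronger consistency may scale further.
```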

Slide 8: Locality Assumptions
● Locality is getting harder to find and exploit
  ○ Example: Facebook usage
● Flat storage model rather than tiered storage
Vote: Right?

Slide 9: Read/Write Ratio Assumption
● Read requests dominate writes
● Postulate: fast reads will cause applications to use more information, leading to even more reads relative to writes
Vote: What do you think?

Slide 10: Replication Assumptions
● Replication isn't needed for performance
● High throughput and load balancing eliminate the need for object replication
● Few applications need more than 1M ops/second to a single object
Vote: Right?

Slide 11: WAN Replication Assumption
● There are interesting applications that don't require cross-data-center replication
  ○ The speed of light is not our friend (see the arithmetic below)
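
Why the speed of light rules out synchronous WAN replication at microsecond scale; the distance and the in-datacenter RPC target are illustrative assumptions:

```python
# Speed-of-light floor on a cross-datacenter round trip, versus a
# microsecond-scale RPC inside one datacenter. Numbers are assumptions.

FIBER_KM_PER_MS = 200        # light travels at ~2/3 c in optical fiber
distance_km = 4_000          # e.g. coast-to-coast (assumed)

rtt_us = 2 * (distance_km / FIBER_KM_PER_MS) * 1000
print(f"WAN RTT floor: {rtt_us:,.0f} us")   # ~40,000 us
# Against a ~5-10 us in-datacenter RPC target, that is roughly four
# orders of magnitude: synchronous cross-datacenter writes would erase
# RAMCloud's latency advantage.
```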

Slide 12: Transaction Assumption
● Multi-object update support is required
  ○ Distributed applications need support for concurrent accesses
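
One hypothetical shape such multi-object support could take: an optimistic, version-checked update. The read_with_version and multi_write_if operations are assumptions for illustration, not RAMCloud's actual API:

```python
# Optimistic multi-object update: read versions, then commit both
# writes only if neither object changed in the meantime; otherwise
# retry. The store API used here is hypothetical.

def transfer(store, src, dst, amount):
    while True:
        a, va = store.read_with_version(src)   # -> (value, version)
        b, vb = store.read_with_version(dst)
        if a < amount:
            raise ValueError("insufficient balance")
        # Apply both writes atomically iff both versions are unchanged.
        if store.multi_write_if([(src, a - amount, va),
                                 (dst, b + amount, vb)]):
            return
        # A concurrent access won the race; re-read and retry.
```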

Slide 13: Security Assumptions
● Threat model: RAMCloud servers and the network are physically secure
  ○ No need for encryption on RPCs, DoS defenses, etc.

Slide 14: Data Model Assumption
● A simple table/object model with indexing is sufficient
  ○ Dictated by speed
  ○ A SQL/relational model can be built on top (sketch below)
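
A hedged sketch of what "simple table/object with indexing" could look like; every name here is illustrative, not RAMCloud's actual interface:

```python
# Tables map keys to opaque blobs; optional secondary indexes map an
# attribute value back to keys. A relational layer could translate rows
# to blobs and SQL predicates to lookup() calls. Names are assumptions.

class Table:
    def __init__(self):
        self.objects = {}   # key -> bytes
        self.indexes = {}   # index name -> {attribute value -> set of keys}

    def write(self, key, blob, attrs=None):
        self.objects[key] = blob
        for name, value in (attrs or {}).items():
            self.indexes.setdefault(name, {}).setdefault(value, set()).add(key)

    def read(self, key):
        return self.objects[key]

    def lookup(self, index_name, value):
        return self.indexes.get(index_name, {}).get(value, set())
```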

Slide 15: Thank You