1
RAMCloud Overview and Update
John Ousterhout
Stanford University
SEDCL Retreat, June 2014
2
What is RAMCloud?
General-purpose storage system for large-scale applications:
● All data is stored in DRAM at all times
● As durable and available as disk
● Simple key-value data model
● Large scale: 1000+ servers, 100+ TB
● Low latency: 5-10 µs remote access time
Project goal: enable a new class of data-intensive applications
June 6, 2013 | RAMCloud Update and Overview | Slide 2
3
RAMCloud Architecture
[Figure: 1000 – 100,000 application servers, each running an application linked with the RAMCloud library, connect through the datacenter network to 1000 – 10,000 storage servers, each running a master and a backup; a coordinator is also attached to the network.]
● Commodity servers, 64-256 GB per server
● High-speed networking:
  ● 5 µs round-trip
  ● Full bisection bandwidth
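Multiplying out the cluster-size and per-server memory ranges gives the raw DRAM capacity this architecture implies. This is a rough check only, in decimal units (1 TB = 1000 GB), counting total DRAM rather than usable capacity after space reserved for the operating system and storage management:

```latex
\[
\begin{aligned}
\text{1000 servers:}\quad & 1000 \times (64 \text{ to } 256)\ \text{GB} = 64 \text{ to } 256\ \text{TB} \\
\text{10,000 servers:}\quad & 10{,}000 \times (64 \text{ to } 256)\ \text{GB} = 640\ \text{TB} \text{ to } 2.56\ \text{PB}
\end{aligned}
\]
```

At the 1000-server scale, reaching 100 TB requires at least 100 GB of DRAM per server (100 TB / 1000), which falls inside the 64-256 GB range listed above.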
4
Data Model: Key-Value Store
● Basic operations:
  read(tableId, key) => blob, version
  write(tableId, key, blob) => version
  delete(tableId, key)
● Other operations:
  cwrite(tableId, key, blob, version) => version (only overwrite if version matches)
  Enumerate objects in table
  Efficient multi-read, multi-write
  Atomic increment
● Not in RAMCloud 1.0:
  Atomic updates of multiple objects
  Secondary indexes
[Figure: tables hold objects; each object has a key (≤ 64KB), a version (64b), and a blob (≤ 1MB).]
May 13, 2014 | RAMCloud/VMware RADIO | Slide 4
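To make the versioned API concrete, here is a toy single-process model of read, write, delete, and cwrite. It is a sketch under stated assumptions, not RAMCloud's client library: the class and exception names are hypothetical, versions come from one store-wide counter (so a key never sees a version reused, even after a delete), Python integers stand in for 64-bit versions, and distribution across servers, durability, enumeration, multi-ops, and atomic increment are all omitted.

```python
class VersionMismatch(Exception):
    """Raised by cwrite when the object's current version differs (hypothetical name)."""


class KeyValueStore:
    """Toy model of the slide's read/write/delete/cwrite operations."""

    MAX_KEY = 64 * 1024       # key <= 64KB
    MAX_BLOB = 1024 * 1024    # blob <= 1MB

    def __init__(self):
        self._tables = {}         # tableId -> {key: (blob, version)}
        self._next_version = 1    # one counter, so versions are never reused

    def read(self, table_id, key):
        """Returns (blob, version); raises KeyError if the object does not exist."""
        return self._tables[table_id][key]

    def write(self, table_id, key, blob):
        if len(key) > self.MAX_KEY or len(blob) > self.MAX_BLOB:
            raise ValueError("key or blob exceeds size limit")
        version = self._next_version
        self._next_version += 1
        self._tables.setdefault(table_id, {})[key] = (blob, version)
        return version

    def delete(self, table_id, key):
        self._tables.get(table_id, {}).pop(key, None)

    def cwrite(self, table_id, key, blob, version):
        """Conditional write: overwrite only if the current version matches."""
        current = self._tables.get(table_id, {}).get(key)
        if current is None or current[1] != version:
            raise VersionMismatch(f"expected version {version}")
        return self.write(table_id, key, blob)
```

A typical read-modify-write loop reads an object, computes a new blob, and calls cwrite with the version it read; if another client wrote in between, cwrite raises and the loop retries with fresh data.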
5
Status at June 2013 Retreat
● Close to 1.0 release:
  Core system becoming stable
  Coordinator not yet fault-tolerant
● Original students working on dissertations
● New students starting to think about new projects
6
Progress Since June 2013
● RAMCloud 1.0, January 2014:
  Key-value store
  Low-latency RPC system (4.9 µs reads, 15.3 µs durable writes)
  Log-structured storage management
  1-2 second recovery from storage server crashes
  Coordinator crash recovery
● New projects (see below)
● Application experiments/interest:
  Graph processing: Jonathan Ellithorpe
  ONOS (operating system for software-defined networks): Open Networking Laboratory
  Various projects/experiments at Huawei
  High-energy physics (CERN): Jakob Blomer visiting for summer
  Port to NEC Atom cluster: Satoshi Matsushita
7
Progress, cont’d
● PhD dissertations:
  Ryan Stutsman: “Durability and Crash Recovery in Distributed In-memory Storage Systems” (now at Microsoft Research)
  Steve Rumble: “Memory and Object Management in RAMCloud” (now at Google Zurich)
  Diego Ongaro: “Consensus: Bridging Theory and Practice” (ETA summer 2014)
● Papers published:
  “Log-Structured Memory for DRAM-Based Storage” (FAST, Best Paper Award)
  “In Search of an Understandable Consensus Algorithm” (USENIX ATC)
8
Progress, cont’d
● New papers submitted to OSDI:
  “SLIK: Scalable Low-Latency Indexes for a Key-Value Store” (Ankita, Arjun, Ashish, Zhihao)
  “Experience with Rules-Based Programming for Distributed, Concurrent, Fault-Tolerant Code” (Ryan, Collin)
9
Changing of the Guard
Ryan Stutsman: Graduated (PhD)
Steve Rumble: Graduated (PhD)
Diego Ongaro: Graduating soon (PhD)
Ankita Kejriwal
Arjun Gopalan: Graduating soon (MS)
Behnam Montazeri
Collin Lee: New
Henry Qin: New
Ashish Gupta: New (but leaving with MS)
Seo Jin Park: New
Zhihao Jia: New (rotation only)
Stephen Yang: Rejoining Fall 2014
10
New Projects
Phase I (2009 – 2013): RAMCloud 1.0
● First-generation RPC (based on Infiniband)
● Key-value store
● Log-structured storage management
● Crash recovery
Phase II (2014 – ?):
● Higher-level data model:
  Secondary indexes
  Linearizability
  Multi-object transactions
  Graph support?
● Networking infrastructure:
  Analyze RPC latency
  Driver(s) for 10 GigE
  Clean-slate RPC redesign
11
Secondary Indexes
● SLIK: Scalable, Low-latency Indexes for a Key-value Store
● Requires new object format:
[Figure: Old: key-value store, where each object holds Key, Version, Value. New: multikey-value store, where each object holds Key 0, Key 1, ..., Key N, Version, Value. The primary key is the same as before.]
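One way to picture the new format is an object whose list of keys generalizes the single key of the old format. The sketch below is a hypothetical in-memory model of the logical layout only, not RAMCloud's serialized object format; it assumes key 0 plays the role of the primary key and the remaining keys are secondary keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultiKeyObject:
    """Hypothetical model of a multikey-value object."""

    keys: List[bytes]   # keys[0]: primary key; keys[1..N]: secondary keys
    version: int
    value: bytes

    def __post_init__(self):
        if not self.keys:
            raise ValueError("an object needs at least a primary key")

    @property
    def primary_key(self) -> bytes:
        # Same role as the single key of the old key-value format.
        return self.keys[0]

    def secondary_keys(self) -> List[bytes]:
        return self.keys[1:]


# An old-style key-value object is the special case with exactly one key.
legacy = MultiKeyObject(keys=[b"user:42"], version=7, value=b"profile")
```

Keeping the primary key in slot 0 means lookups by primary key work exactly as before, while objects that need no secondary keys carry no extra structure.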
12
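RAMCloud Operations
[Figure: a client application performs three kinds of operations against masters (hash table and log, replicated to backups) and indexes (index and log, replicated to backups):
● Non-indexed read: the client sends a primary key to the master and receives the value.
● Indexed read: the client sends a secondary key range to the index and receives primary key hash(es); it then sends the primary key hash(es) to the master(s) and receives the object(s).
● Write: the client sends the object to its master, which stores it in its hash table and log and replicates it to backups; an entry containing the secondary key and primary key hash is added to the index, which also logs it and replicates it to its backups.]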
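The two-step indexed read can be sketched with one sorted index and one master, both in a single process. Everything here is a simplification and the names are hypothetical: there are no backups, logs, or RPCs; the 64-bit hash (a truncated SHA-256) is an arbitrary choice; and in this sketch the master inserts the index entry after storing the object, which does not model the real system's ordering. Because the index returns only hashes, the master re-checks each object's secondary key, which filters out hash collisions and stale index entries.

```python
import bisect
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


def key_hash(primary_key: bytes) -> int:
    # 64-bit hash of the primary key (hash function is an assumption).
    return int.from_bytes(hashlib.sha256(primary_key).digest()[:8], "big")


@dataclass
class Obj:
    keys: List[bytes]   # keys[0] = primary key, keys[1] = indexed secondary key
    value: bytes


class Index:
    """Sorted (secondary key, primary key hash) entries."""

    def __init__(self):
        self.entries: List[Tuple[bytes, int]] = []

    def insert(self, skey: bytes, pk_hash: int):
        bisect.insort(self.entries, (skey, pk_hash))

    def range(self, lo: bytes, hi: bytes) -> List[int]:
        start = bisect.bisect_left(self.entries, (lo, -1))   # hashes are >= 0
        out = []
        for skey, h in self.entries[start:]:
            if skey > hi:
                break
            out.append(h)
        return out


class Master:
    """Objects in a hash table keyed by primary-key hash."""

    def __init__(self):
        self.table: Dict[int, List[Obj]] = {}

    def write(self, obj: Obj, index: Index):
        h = key_hash(obj.keys[0])
        bucket = self.table.setdefault(h, [])
        bucket[:] = [o for o in bucket if o.keys[0] != obj.keys[0]] + [obj]
        index.insert(obj.keys[1], h)   # old entries are left behind (stale)

    def lookup(self, hashes: List[int], lo: bytes, hi: bytes) -> List[Obj]:
        # Deduplicate hashes, then keep only objects whose key really is in range.
        return [o for h in dict.fromkeys(hashes) for o in self.table.get(h, [])
                if lo <= o.keys[1] <= hi]


def indexed_read(index: Index, master: Master, lo: bytes, hi: bytes) -> List[Obj]:
    hashes = index.range(lo, hi)           # step 1: index returns key hashes
    return master.lookup(hashes, lo, hi)   # step 2: master returns objects
```

Leaving stale entries in the index and filtering at the master keeps writes simple in this toy; a real system would also need to remove old entries eventually.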
13
SLIK, cont’d
● Status:
  Preliminary implementation of most mechanisms
  Initial performance measurements
● Students involved:
  Ankita Kejriwal (talk later today)
  Arjun Gopalan
  Ashish Gupta
  Zhihao Jia
14
Linearizability
● Holy Grail of consistency for large-scale apps
[Figure: timeline running from when the client sends a request for an operation to when the client receives the response.]
System behaves as if the operation executes exactly once, instantaneously, sometime between when the client sends the request and when it receives the response
15
Linearizability
[Figure: two timelines, each with a write x = 10 followed by a write x = 5.
● The read of x overlaps the write x = 5: OK to return either 5 or 10.
● The read of x starts after the write x = 5 completes: must return 5.]
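The definition can be checked mechanically for tiny histories of a single register: a history is linearizable if some ordering of its operations respects real time (an operation that finished before another started must come first) and is a legal sequential register history. This brute-force sketch is exponential in the number of operations, so it is only for examples like the two timelines above; the `Op` fields and function names are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    start: float
    end: float
    kind: str    # "write" or "read"
    value: int   # value written, or value the read returned


def respects_real_time(order) -> bool:
    # If op b finished before op a started, b may not be placed after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.end < a.start:
                return False
    return True


def legal_register(order, initial=None) -> bool:
    # Each read must return the most recently written value in this order.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=None) -> bool:
    return any(respects_real_time(p) and legal_register(p, initial)
               for p in permutations(history))


w10 = Op(0, 1, "write", 10)
w5 = Op(2, 5, "write", 5)
print(is_linearizable([w10, w5, Op(3, 6, "read", 10)]))  # read overlaps x = 5: True
print(is_linearizable([w10, w5, Op(6, 7, "read", 10)]))  # read starts after x = 5 completes: False
```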
16
Linearizability Failure
[Figure: the client sends the conditional write "if x.version == 22 then x.value = 20" to the master, which currently stores key:x version:22 value:10.]
17
Linearizability Failure
[Figure: the master applies the write and now stores key:x version:23 value:20, replicates it to two backups (each storing key:x version:23 value:20), and sends "success" to the client.]
18
Linearizability Failure
[Figure: the master crashes before the client receives "success". Crash recovery restores key:x version:23 value:20 from the backups onto a recovery master. The client retries "if x.version == 22 then x.value = 20" and gets "error: version mismatch!", even though its original request succeeded.]
Must remember old results, avoid re-executing requests
19
Linearizability Project
● Create general-purpose infrastructure (use log to track RPC results)
● Use it to implement linearizable RPCs:
  Conditional write
  Multi-object transactions
● Students involved:
  Seo Jin Park (talk later today)
  Collin Lee
  Ankita Kejriwal
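The remedy from the failure scenario, remembering old results instead of re-executing, can be sketched in a few lines. This is an illustration of the idea, not RAMCloud's implementation, and several pieces are assumptions: each RPC carries a hypothetical (client_id, rpc_id) pair, a shared `DurableState` object stands in for the log replicated on backups, and a crash is simulated by discarding the `Master` object and building a new one from the durable state. Garbage collection of saved results is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DurableState:
    """Stand-in for state that survives a master crash (hypothetical)."""

    objects: dict = field(default_factory=dict)    # key -> (value, version)
    completed: dict = field(default_factory=dict)  # (client_id, rpc_id) -> saved result


class Master:
    def __init__(self, durable: DurableState):
        # A fresh master, or a recovery master, starts from the durable state.
        self.state = durable

    def cwrite(self, client_id, rpc_id, key, value, expected_version):
        """Conditional write that executes at most once per (client_id, rpc_id)."""
        rpc = (client_id, rpc_id)
        if rpc in self.state.completed:
            # Retry of a request that already ran: return the saved result.
            return self.state.completed[rpc]
        _, version = self.state.objects[key]
        if version != expected_version:
            result = "error: version mismatch"
        else:
            self.state.objects[key] = (value, version + 1)
            result = "success"
        # Save the result alongside the update; a real system would make both
        # durable together (the slide proposes tracking results in the log).
        self.state.completed[rpc] = result
        return result


# The scenario from the "Linearizability Failure" slides.
durable = DurableState(objects={"x": (10, 22)})
first = Master(durable).cwrite("client-1", 1, "x", 20, 22)   # reply lost in the crash
recovered = Master(durable)                                   # recovery master
retry = recovered.cwrite("client-1", 1, "x", 20, 22)          # saved result, no re-execution
```

Without the `completed` table, the retry would re-run the version check against version 23 and return a mismatch error, exactly as in the failure slides.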
20
Latency Analysis
● After 4 years, still little understanding of RAMCloud latency!
  What accounts for current latency?
  How much can it be improved?
  What are the fundamental limits?
  What is the right system structure to minimize latency?
● Henry Qin starting to answer these questions
21
RAMCloud Transports Today
● Infiniband reliable queue pairs (infrc):
  Highest performance; our main workhorse
  Reliable, in-order delivery implemented in hardware
  Doesn’t support Ethernet-style networks
  Driver is old, thrown-together, warty (“temporary solution”)
● Kernel TCP:
  Easy to use
  Too slow for real applications (50-150 µs round-trips)
● FastTransport:
  Custom transport for RAMCloud
  Works with any underlying datagram protocol (e.g. kernel UDP)
  Provides reliable, in-order, flow-controlled delivery
  Not as fast as infrc, too complex, never fully debugged
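What FastTransport provides, reliable in-order delivery on top of a datagram protocol that may drop or reorder packets, can be illustrated with a toy sliding-window simulation. This is a generic sketch, not FastTransport's actual protocol: the names, the round-based retransmission (resend every unacknowledged packet in the window each round), and the loss model are assumptions, and acknowledgments are assumed never to be lost. The receiver buffers out-of-order packets and returns a cumulative acknowledgment; the sender keeps at most `window` unacknowledged packets outstanding, a simple form of flow control.

```python
import random


class Receiver:
    """Delivers packets to the application strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0    # next sequence number the application expects
        self.buffer = {}     # out-of-order packets held until the gap fills
        self.delivered = []

    def on_packet(self, seq, payload):
        if seq >= self.next_seq:
            self.buffer[seq] = payload         # a duplicate just overwrites
        while self.next_seq in self.buffer:    # deliver any contiguous run
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq                   # cumulative ack


def transfer(messages, window=4, loss=0.3, seed=1):
    """Sends messages over a simulated lossy, reordering datagram channel."""
    rng = random.Random(seed)
    rx = Receiver()
    acked = 0      # all sequence numbers below this are acknowledged
    rounds = 0
    while acked < len(messages):
        rounds += 1
        # Flow control: only packets inside the window may be in flight.
        in_flight = list(range(acked, min(acked + window, len(messages))))
        rng.shuffle(in_flight)                 # the channel may reorder packets
        for seq in in_flight:
            if rng.random() < loss:
                continue                       # dropped; resent next round
            acked = max(acked, rx.on_packet(seq, messages[seq]))
    return rx.delivered, rounds
```

A real transport would use timeouts and selective retransmission rather than resending the whole window, but the division of labor is the same: sequence numbers and buffering give ordering, acknowledgments and retransmission give reliability, and the window bounds how much unacknowledged data is outstanding.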
22
Transport Redesign
● Goal: clean-slate replacement for FastTransport:
  Better latency and scalability
  Replace infrc as workhorse transport
  Separable from RAMCloud
  “RPC for future datacenters”
● First steps (Behnam Montazeri):
  Build SolarFlare datagram driver for FastTransport
    ● Kernel bypass for 10 GigE
  Understand FastTransport weaknesses
23
Conclusion
● Several new projects in early stages
● Talks this retreat: mostly work in progress
● Should have many interesting results over the next year