RAMCloud: a Low-Latency Datacenter Storage System John Ousterhout Stanford University


1 RAMCloud: a Low-Latency Datacenter Storage System
John Ousterhout
Stanford University
ouster@cs.stanford.edu

2 Introduction
● RAMCloud: new class of datacenter storage
  ○ All data always in DRAM
● Large scale:
  ○ 100 - 10,000 storage servers
  ○ 100 TB - 1 PB total capacity
● Low latency:
  ○ 5 µs remote access time from anywhere in datacenter
● Durable/available
● Overall goal: enable a new class of applications
Does this make sense for Radio Astronomy apps?
April 3, 2014, Exascale Radio Astronomy Conference

3 Traditional Storage Choices
                             Local DRAM   Local Disk            Local Flash            Network Disk
Latency                      50-100 ns    5-10 ms               50-200 µs              5-10 ms
Bandwidth                    10-50 GB/s   100 MB/s (per disk)   250 MB/s (per drive)   1 GB/s/node (network)
Maximum capacity             1 TB         5-20 TB               1-5 TB                 10-100 PB
Maximum cores                24           24                    24                     scalable
Durable and fault tolerant?  no           partially             partially              yes

4 RAMCloud Architecture
[Diagram: application servers (each an application linked with the RAMCloud library) connect through the datacenter network, which also hosts a coordinator, to storage servers (each running a master and a backup)]
● 1000 – 100,000 Application Servers
● 1000 – 10,000 Storage Servers
● Commodity Servers
  ○ 64-256 GB per server
● High-speed networking:
  ○ 5 µs round-trip
  ○ Full bisection bandwidth
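The server counts and per-server memory on this slide imply a range of aggregate DRAM. The sketch below is only back-of-envelope arithmetic under the assumption that every byte of DRAM holds user data (it ignores any per-object or system overhead); the function name is illustrative.

```python
def total_dram_tb(servers: int, gb_per_server: int) -> float:
    """Aggregate DRAM in decimal terabytes (1 TB = 1000 GB).

    Assumption: all DRAM is usable for data, which overstates real capacity.
    """
    return servers * gb_per_server / 1000


if __name__ == "__main__":
    low = total_dram_tb(1_000, 64)      # smallest cluster on the slide
    high = total_dram_tb(10_000, 256)   # largest cluster on the slide
    print(f"{low:.0f} TB to {high / 1000:.2f} PB")  # prints "64 TB to 2.56 PB"
```

The resulting range brackets the 100 TB - 1 PB total capacity quoted on the Introduction slide.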

5 Data Model: Key-Value Store
● Basic operations:
  ○ read(tableId, key) => blob, version
  ○ write(tableId, key, blob) => version
  ○ delete(tableId, key)
● Other operations:
  ○ cwrite(tableId, key, blob, version) => version (only overwrite if version matches)
  ○ Enumerate objects in table
  ○ Efficient multi-read, multi-write
  ○ Atomic increment
● Under development:
  ○ Secondary indexes
  ○ Atomic updates of multiple objects
[Diagram: Tables hold Objects; each Object has a Key (≤ 64KB), a Version (64b), and a Blob (≤ 1MB)]
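The versioned operations above can be illustrated with a toy in-memory model. This is a sketch of the semantics only, not the RAMCloud client library: the class `ToyTable`, the helper `increment`, the choice to return `None` from a failed `cwrite`, and the table-wide version counter are all assumptions made for illustration. Each `ToyTable` instance stands for one table, so the `tableId` argument is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MAX_KEY_BYTES = 64 * 1024      # key limit from the slide (<= 64KB)
MAX_BLOB_BYTES = 1024 * 1024   # blob limit from the slide (<= 1MB)


@dataclass
class StoredObject:
    blob: bytes
    version: int


class ToyTable:
    """In-memory stand-in for one table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, StoredObject] = {}
        self._last_version = 0  # assumption: versions only ever increase

    def read(self, key: bytes) -> Tuple[bytes, int]:
        obj = self._objects[key]  # raises KeyError if the key is absent
        return obj.blob, obj.version

    def write(self, key: bytes, blob: bytes) -> int:
        if len(key) > MAX_KEY_BYTES or len(blob) > MAX_BLOB_BYTES:
            raise ValueError("key or blob exceeds the size limits")
        self._last_version += 1
        self._objects[key] = StoredObject(blob, self._last_version)
        return self._last_version

    def cwrite(self, key: bytes, blob: bytes, version: int) -> Optional[int]:
        """Overwrite only if the stored version matches; None means rejected."""
        current = self._objects.get(key)
        if current is None or current.version != version:
            return None
        return self.write(key, blob)

    def delete(self, key: bytes) -> None:
        self._objects.pop(key, None)


def increment(table: ToyTable, key: bytes) -> int:
    """Optimistic read-modify-write: retry until the conditional write wins."""
    while True:
        blob, version = table.read(key)
        value = int(blob) + 1
        if table.cwrite(key, str(value).encode(), version) is not None:
            return value
```

The `increment` helper shows why a conditional write is useful: a client can read, compute, and write back without locks, and simply retries if another writer changed the object in between.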

6 Data Durability
● One copy of data in DRAM
● Multiple copies on disk/flash
  ○ Each master’s backup data scattered across cluster
● Fast crash recovery
  ○ Remaining servers work together to recover lost data
  ○ Typical recovery time: 1-2 seconds
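A rough calculation suggests why scattering backup data matters for recovery time. The sketch below assumes a crashed master held 64 GB (the low end of the per-server range on the architecture slide) and that each disk streams at 100 MB/s (the per-disk figure from the storage comparison table). It models disk read time only and ignores network transfer, log replay, and coordination, so it is not a prediction of actual recovery time; the 1000-disk count is a hypothetical cluster size.

```python
def read_time_seconds(data_gb: float, disks: int,
                      mb_per_s_per_disk: float = 100.0) -> float:
    """Time to read data_gb from `disks` disks in parallel (1 GB = 1000 MB).

    Assumption: data is spread evenly and every disk streams at full speed.
    """
    if disks < 1:
        raise ValueError("need at least one disk")
    return data_gb * 1000 / (disks * mb_per_s_per_disk)


if __name__ == "__main__":
    print(read_time_seconds(64, disks=1))      # one backup disk: 640.0 s
    print(read_time_seconds(64, disks=1000))   # scattered over 1000 disks: 0.64 s
```

Reading everything from a single disk would take minutes, while reading in parallel from many disks brings the disk-read portion below a second, which is at least consistent with a recovery time of a few seconds.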

7 RAMCloud Performance
1 client, 1 server
● Using Infiniband networking (24 Gb/s, kernel bypass)
  ○ Other networking also supported, but slower
● Reads:
  ○ 100B objects: 5µs
  ○ 10KB objects: 10µs
  ○ Single-server throughput (100B objects): 700 Kops/sec.
  ○ Small-object multi-reads: 1-2M objects/sec.
● Durable writes:
  ○ 100B objects: 16µs
  ○ 10KB objects: 40µs
  ○ Small-object multi-writes: 400-500K objects/sec.
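The latency and throughput figures above are related by simple arithmetic. The sketch below is idealized: it assumes a client issues one request at a time with no other overhead, and it applies Little's law (concurrency = throughput × latency) under the assumption that latency stays at 5 µs under load. The function names are illustrative.

```python
def sync_ops_per_second(latency_us: float) -> float:
    """Operations per second for one client issuing requests back to back."""
    return 1_000_000 / latency_us


def requests_in_flight(throughput_ops_per_s: float, latency_us: float) -> float:
    """Little's law: average number of outstanding requests."""
    return throughput_ops_per_s * latency_us / 1_000_000


if __name__ == "__main__":
    print(sync_ops_per_second(5))           # 100B reads: 200000.0 per second
    print(sync_ops_per_second(16))          # 100B durable writes: 62500.0 per second
    print(requests_in_flight(700_000, 5))   # to reach 700 Kops/sec: 3.5
```

Under these assumptions a single synchronous reader reaches about 200 Kops/sec, so the 700 Kops/sec single-server figure implies several requests in flight at once.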

8 Comparisons
                             Local DRAM   Network Disk            RAMCloud
Latency                      50-100 ns    5-10 ms                 5 µs
Bandwidth                    10-50 GB/s   1 GB/s/node (network)   1 GB/s/node (network)
Maximum capacity             1 TB         10-100 PB               1-5 PB
Maximum cores                24           scalable                scalable
Durable and fault tolerant?  no           yes                     yes

9 RAMCloud Status
● Ongoing research project at Stanford
● Goal: production-quality system
  ○ Source code freely available
  ○ Version 1.0 tagged in January 2014 (first version suitable for real applications)
  ○ Starting to work with early adopters
● System requirements:
  ○ x86 servers (minimum cluster size: 10-20 servers)
  ○ Linux operating system
  ○ Need networking with kernel-bypass NICs
    ● Built-in support for Mellanox Infiniband
    ● Driver for SolarFlare 10 Gb/s Ethernet NICs under development

10 Is RAMCloud Right for You?
Issues to consider:
● Remote access data model
  ○ Sparse vs. bulk
● Key-value store
● Durability

11 Large-Scale Applications
[Diagram: nodes with computation (C) and data (D) colocated, connected by a network]
● Computation, data colocated
● Works best for:
  ○ Bulk processing (touch all data)
  ○ High locality of access
● Performance dominated by bandwidth
● Examples: analytics
[Diagram: Computation Nodes (C) connected over a network to Storage Nodes (D)]
● Remote data access
● Works best for:
  ○ Sparse and unpredictable data accesses
  ○ No locality
● Performance dominated by latency
● Example: transactional Web applications (Facebook)
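The distinction between latency-dominated and bandwidth-dominated workloads can be made concrete with a simple cost model: total time ≈ accesses × latency + bytes ÷ bandwidth. The sketch below uses latency and bandwidth figures from the Comparisons slide; the workload sizes (10,000 dependent 100-byte reads, a 1 TB scan) are hypothetical, and the model assumes accesses are issued one after another with no overlap.

```python
def access_time_seconds(accesses: int, latency_s: float,
                        total_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Serial cost model: per-access latency plus transfer time."""
    return accesses * latency_s + total_bytes / bandwidth_bytes_per_s


GB = 1e9
TB = 1e12

if __name__ == "__main__":
    # Sparse workload: 10,000 dependent reads of 100 bytes each (hypothetical).
    sparse_bytes = 10_000 * 100
    print(access_time_seconds(10_000, 5e-6, sparse_bytes, 1 * GB))  # 5 µs latency
    print(access_time_seconds(10_000, 5e-3, sparse_bytes, 1 * GB))  # 5 ms latency

    # Bulk workload: one sequential scan of 1 TB (hypothetical).
    print(access_time_seconds(1, 5e-6, 1 * TB, 1 * GB))     # over the network
    print(access_time_seconds(1, 100e-9, 1 * TB, 10 * GB))  # from local DRAM
```

In the sparse case the result is roughly 0.05 s at 5 µs versus roughly 50 s at 5 ms, so latency sets the total. In the bulk case the latency term is negligible and the scan takes roughly 1000 s at 1 GB/s versus roughly 100 s at 10 GB/s, so bandwidth sets the total.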

12 Conclusion
● RAMCloud: general-purpose DRAM-based storage
  ○ Scale
  ○ Latency
● Goals:
  ○ Harness full performance potential of DRAM-based storage
  ○ Enable new applications: intensive manipulation of large-scale data
● What could you do with:
  ○ 1M cores
  ○ 1 petabyte data
  ○ 5-10µs access time

