Published by Stephany McDaniel. Modified over 9 years ago.
1
RAMCloud: a Low-Latency Datacenter Storage System
John Ousterhout
Stanford University
ouster@cs.stanford.edu
2
Introduction
● RAMCloud: new class of datacenter storage
    All data always in DRAM
● Large scale:
    100 - 10,000 storage servers
    100 TB - 1 PB total capacity
● Low latency: 5 µs remote access time from anywhere in datacenter
● Durable/available
● Overall goal: enable a new class of applications
Does this make sense for Radio Astronomy apps?
April 3, 2014, Exascale Radio Astronomy Conference
3
Traditional Storage Choices

                             Local DRAM    Local Disk            Local Flash            Network Disk
Latency                      50-100 ns     5-10 ms               50-200 µs              5-10 ms
Bandwidth                    10-50 GB/s    100 MB/s (per disk)   250 MB/s (per drive)   1 GB/s/node (network)
Maximum capacity             1 TB          5-20 TB               1-5 TB                 10-100 PB
Maximum cores                24            24                    24                     scalable
Durable and fault tolerant?  no            partially             partially              yes
4
RAMCloud Architecture
[Diagram: application servers (each an application plus a library) connect through the datacenter network to storage servers (each a master plus a backup) and to a coordinator.]
● 1000 – 100,000 application servers
● 1000 – 10,000 storage servers
● Commodity servers, 64-256 GB per server
● High-speed networking:
    5 µs round-trip
    Full bisection bandwidth
5
Data Model: Key-Value Store
● Basic operations:
    read(tableId, key) => blob, version
    write(tableId, key, blob) => version
    delete(tableId, key)
● Other operations:
    cwrite(tableId, key, blob, version) => version (only overwrite if version matches)
    Enumerate objects in table
    Efficient multi-read, multi-write
    Atomic increment
● Under development:
    Secondary indexes
    Atomic updates of multiple objects
Tables hold objects; each object has:
    Key (≤ 64KB)
    Version (64 bits)
    Blob (≤ 1MB)
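The read/write/delete/cwrite semantics listed on this slide can be sketched as a toy, single-process model. This is not the RAMCloud client API: the class name `ToyTableStore`, the `VersionMismatch` exception, the helper `atomic_append`, and the choice of a store-wide version counter are all assumptions made for illustration. Only the operation names, the return shapes, and the size limits come from the slide.

```python
from typing import Dict, Tuple

MAX_KEY_BYTES = 64 * 1024       # key limit from the slide (<= 64KB)
MAX_BLOB_BYTES = 1024 * 1024    # blob limit from the slide (<= 1MB)


class VersionMismatch(Exception):
    """Hypothetical error: cwrite was given a stale version."""


class ToyTableStore:
    """Single-process model of the slide's operations; not the RAMCloud client API."""

    def __init__(self) -> None:
        # (tableId, key) -> (blob, version)
        self._objects: Dict[Tuple[int, bytes], Tuple[bytes, int]] = {}
        # Assumption: one store-wide counter, so versions never repeat.
        self._next_version = 1

    def _store(self, table_id: int, key: bytes, blob: bytes) -> int:
        if len(key) > MAX_KEY_BYTES:
            raise ValueError("key exceeds 64KB")
        if len(blob) > MAX_BLOB_BYTES:
            raise ValueError("blob exceeds 1MB")
        version = self._next_version
        self._next_version += 1
        self._objects[(table_id, key)] = (blob, version)
        return version

    def read(self, table_id: int, key: bytes) -> Tuple[bytes, int]:
        # Raises KeyError if the object does not exist (an assumption).
        return self._objects[(table_id, key)]

    def write(self, table_id: int, key: bytes, blob: bytes) -> int:
        return self._store(table_id, key, blob)

    def delete(self, table_id: int, key: bytes) -> None:
        self._objects.pop((table_id, key), None)

    def cwrite(self, table_id: int, key: bytes, blob: bytes, version: int) -> int:
        # Only overwrite if the stored version matches the caller's version.
        current = self._objects.get((table_id, key))
        if current is None or current[1] != version:
            raise VersionMismatch(f"expected version {version}")
        return self._store(table_id, key, blob)


def atomic_append(store: ToyTableStore, table_id: int, key: bytes, suffix: bytes) -> int:
    """Optimistic read-modify-write: retry until no other writer intervened."""
    while True:
        blob, version = store.read(table_id, key)
        try:
            return store.cwrite(table_id, key, blob + suffix, version)
        except VersionMismatch:
            continue  # someone else wrote first; re-read and try again
```

The `atomic_append` helper shows why a conditional write is useful: a client can read an object, compute a new value, and commit it only if nobody else changed the object in between, without holding a lock.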
6
Data Durability
● One copy of data in DRAM
● Multiple copies on disk/flash
    Each master’s backup data scattered across cluster
● Fast crash recovery
    Remaining servers work together to recover lost data
    Typical recovery time: 1-2 seconds
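A back-of-the-envelope calculation suggests why scattering backup data matters for recovery time. The sketch below uses figures that appear elsewhere in this deck (64 GB as the low end of per-server DRAM, 100 MB/s per disk, and 1000 storage servers) and assumes one disk per server with data spread perfectly evenly. It models only the disk read and ignores network transfer, log replay, and failure detection, so it is an illustration rather than a model of the real recovery protocol. The function name is hypothetical.

```python
def recovery_read_seconds(data_gb: float, disks: int, disk_mb_per_s: float = 100.0) -> float:
    """Seconds to read data_gb from `disks` disks in parallel, evenly spread."""
    total_mb = data_gb * 1000.0
    return total_mb / (disks * disk_mb_per_s)


# All of a crashed master's 64 GB held on a single backup disk:
one_disk = recovery_read_seconds(64, disks=1)

# The same 64 GB scattered across 1000 backup disks:
scattered = recovery_read_seconds(64, disks=1000)

print(f"one disk: {one_disk:.0f} s, 1000 disks: {scattered:.2f} s")
```

Under these assumptions a single disk needs about 640 seconds, while 1000 disks reading in parallel need well under a second, which is in line with the slide's 1-2 second recovery figure once other costs are added.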
7
RAMCloud Performance
1 client, 1 server
● Using Infiniband networking (24 Gb/s, kernel bypass)
    Other networking also supported, but slower
● Reads:
    100B objects: 5µs
    10KB objects: 10µs
    Single-server throughput (100B objects): 700 Kops/sec.
    Small-object multi-reads: 1-2M objects/sec.
● Durable writes:
    100B objects: 16µs
    10KB objects: 40µs
    Small-object multi-writes: 400-500K objects/sec.
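The latency and throughput figures on this slide can be related with simple arithmetic. The sketch below assumes a client that issues one request at a time and waits for each reply, and it applies Little's law (requests in flight = throughput × latency) to estimate how much concurrency the quoted server throughput implies. The function names are hypothetical, and the calculation assumes latency stays at the unloaded value, which may not hold on a busy server.

```python
def serial_ops_per_sec(latency_us: float) -> float:
    """Operations per second for one client issuing back-to-back synchronous requests."""
    return 1_000_000 / latency_us


def requests_in_flight(target_ops_per_sec: float, latency_us: float) -> float:
    """Little's law: average number of outstanding requests needed to hit a target rate."""
    return target_ops_per_sec * latency_us / 1_000_000


reads = serial_ops_per_sec(5)        # 100B reads at 5 µs each
writes = serial_ops_per_sec(16)      # 100B durable writes at 16 µs each
needed = requests_in_flight(700_000, 5)

print(f"serial reads/s: {reads:.0f}, serial writes/s: {writes:.0f}")
print(f"requests in flight for 700 Kops/sec: {needed}")
```

One synchronous client tops out near 200,000 small reads per second, so reaching the 700 Kops/sec server figure takes several requests in flight at once, whether from multiple clients or from batched operations such as multi-read.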
8
Comparisons

                             Local DRAM    Network Disk            RAMCloud
Latency                      50-100 ns     5-10 ms                 5 µs
Bandwidth                    10-50 GB/s    1 GB/s/node (network)   1 GB/s/node (network)
Maximum capacity             1 TB          10-100 PB               1-5 PB
Maximum cores                24            scalable                scalable
Durable and fault tolerant?  no            yes                     yes
9
RAMCloud Status
● Ongoing research project at Stanford
● Goal: production-quality system
    Source code freely available
    Version 1.0 tagged in January 2014 (first version suitable for real applications)
    Starting to work with early adopters
● System requirements:
    x86 servers (minimum cluster size: 10-20 servers)
    Linux operating system
    Need networking with kernel-bypass NICs
        ● Built-in support for Mellanox Infiniband
        ● Driver for SolarFlare 10 Gb/s Ethernet NICs under development
10
Is RAMCloud Right for You?
Issues to consider:
● Remote access data model
    Sparse vs. bulk
● Key-value store
● Durability
11
Large-Scale Applications

[Diagram: nodes that each hold computation (C) and data (D), joined by a network]
● Computation, data colocated
● Works best for:
    Bulk processing (touch all data)
    High locality of access
● Performance dominated by bandwidth
● Examples: analytics

[Diagram: computation nodes (C) reach separate storage nodes (D) across a network]
● Remote data access
● Works best for:
    Sparse and unpredictable data accesses
    No locality
● Performance dominated by latency
● Example: transactional Web applications (Facebook)
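The contrast between latency-dominated and bandwidth-dominated workloads can be made concrete with a rough cost model. The sketch below plugs in figures from the comparison tables in this deck (5 µs for RAMCloud, 5 ms for network disk, 10 GB/s for local DRAM, 1 GB/s per node over the network). The workload sizes (one million dependent reads, a 1 TB scan on one node) and the function names are assumptions chosen for illustration.

```python
def sparse_access_seconds(num_accesses: int, latency_us: float) -> float:
    """Total time for dependent accesses issued one after another."""
    return num_accesses * latency_us / 1_000_000


def bulk_scan_seconds(total_bytes: float, bytes_per_sec: float) -> float:
    """Total time to stream total_bytes at a fixed bandwidth."""
    return total_bytes / bytes_per_sec


GB = 1e9
TB = 1e12

# Sparse workload: one million dependent small reads.
sparse_ramcloud = sparse_access_seconds(1_000_000, latency_us=5)
sparse_net_disk = sparse_access_seconds(1_000_000, latency_us=5_000)

# Bulk workload: one node touches 1 TB of data.
bulk_local_dram = bulk_scan_seconds(1 * TB, 10 * GB)
bulk_over_network = bulk_scan_seconds(1 * TB, 1 * GB)

print(f"sparse: RAMCloud {sparse_ramcloud:.0f} s, network disk {sparse_net_disk:.0f} s")
print(f"bulk:   local DRAM {bulk_local_dram:.0f} s, over network {bulk_over_network:.0f} s")
```

With these inputs the sparse workload improves by a factor of 1000 when latency drops from 5 ms to 5 µs, while the bulk scan is about ten times slower over the network than from local DRAM. That matches the slide's point that remote access suits sparse, unpredictable accesses and colocation suits bulk processing.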
12
Conclusion
● RAMCloud: general-purpose DRAM-based storage
    Scale
    Latency
● Goals:
    Harness full performance potential of DRAM-based storage
    Enable new applications: intensive manipulation of large-scale data
● What could you do with:
    1M cores
    1 petabyte data
    5-10µs access time