Low-Latency Datacenters John Ousterhout Platform Lab Retreat May 29, 2015.

Slides:

Advertisements

Similar presentations

Fast Crash Recovery in RAMCloud

Advertisements

What happens when you try to build a low latency NIC? Mario Flajslik.

John Ousterhout Stanford University RAMCloud Overview and Update SEDCL Retreat June, 2014.

RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University (with Nandu Jayakumar, Diego Ongaro, Mendel Rosenblum,

RAMCloud Scalable High-Performance Storage Entirely in DRAM John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières,

Multiple Processor Systems

RAMCloud 1.0 John Ousterhout Stanford University (with Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin.

Log-Structured Memory for DRAM-Based Storage Stephen Rumble, Ankita Kejriwal, and John Ousterhout Stanford University.

1. Overview  Introduction  Motivations  Multikernel Model  Implementation – The Barrelfish  Performance Testing  Conclusion 2.

VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.

Contiki A Lightweight and Flexible Operating System for Tiny Networked Sensors Presented by: Jeremy Schiff.

CS533 Concepts of Operating Systems Class 8 Shared Memory Implementations of Remote Procedure Call.

SEDCL: Stanford Experimental Data Center Laboratory.

CS 550 Amoeba-A Distributed Operation System by Saie M Mulay.

Introducing the Platform Lab John Ousterhout Stanford University.

A Scalable, Commodity Data Center Network Architecture Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat Presented by Gregory Peaker and Tyler Maclean.

CS 142 Lecture Notes: Large-Scale Web ApplicationsSlide 1 RAMCloud Overview ● Storage for datacenters ● commodity servers ● GB DRAM/server.

Jennifer Rexford Princeton University MW 11:00am-12:20pm Data-Center Traffic Management COS 597E: Software Defined Networking.

Mohammad Alizadeh, Abdul Kabbani, Tom Edsall,

Router Architectures An overview of router architectures.

IWARP Ethernet Key to Driving Ethernet into the Future Brian Hausauer Chief Architect NetEffect, Inc.

Network Management Concepts and Practice Author: J. Richard Burke Presentation by Shu-Ping Lin.

Practical TDMA for Datacenter Ethernet

Metrics for RAMCloud Recovery John Ousterhout Stanford University.

Revisiting Network Interface Cards as First-Class Citizens Wu-chun Feng (Virginia Tech) Pavan Balaji (Argonne National Lab) Ajeet Singh (Virginia Tech)

Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.

RAMCloud Design Review Recovery Ryan Stutsman April 1,

What We Have Learned From RAMCloud John Ousterhout Stanford University (with Asaf Cidon, Ankita Kejriwal, Diego Ongaro, Mendel Rosenblum, Stephen Rumble,

Introduction to Interconnection Networks. Introduction to Interconnection network Digital systems(DS) are pervasive in modern society. Digital computers.

SEDCL/Platform Lab Retreat John Ousterhout Stanford University.

RAMCloud Overview John Ousterhout Stanford University.

RAMCloud: a Low-Latency Datacenter Storage System John Ousterhout Stanford University

RAMCloud: Concept and Challenges John Ousterhout Stanford University.

RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel.

High Performance User-Level Sockets over Gigabit Ethernet Pavan Balaji Ohio State University Piyush Shivam Ohio State University.

HPCS Lab. High Throughput, Low latency and Reliable Remote File Access Hiroki Ohtsuji and Osamu Tatebe University of Tsukuba, Japan / JST CREST.

Cool ideas from RAMCloud Diego Ongaro Stanford University Joint work with Asaf Cidon, Ankita Kejriwal, John Ousterhout, Mendel Rosenblum, Stephen Rumble,

John Ousterhout Stanford University RAMCloud Overview and Update SEDCL Forum January, 2015.

Connectivity Devices Hakim S. ADICHE, MSc

MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.

The NE010 iWARP Adapter Gary Montry Senior Scientist

Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.

Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.

RAMCloud: Low-latency DRAM-based storage Jonathan Ellithorpe, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro,

RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University (with Christos Kozyrakis, David Mazières, Aravind Narayanan,

MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.

Log-structured Memory for DRAM-based Storage Stephen Rumble, John Ousterhout Center for Future Architectures Research Storage3.2: Architectures.

CS 140 Lecture Notes: Technology and Operating Systems Slide 1 Technology Changes Mid-1980’s2012Change CPU speed15 MHz2.5 GHz167x Memory size8 MB4 GB500x.

Software Defined Networks for Dynamic Datacenter and Cloud Environments.

CS 4396 Computer Networks Lab Router Architectures.

RAMCloud Overview and Status John Ousterhout Stanford University.

Internet Protocol Storage Area Networks (IP SAN)

CSci8211: Distributed Systems: RAMCloud 1 Distributed Shared Memory/Storage Case Study: RAMCloud Developed by Stanford Platform Lab  Key Idea: Scalable.

John Ousterhout Stanford University RAMCloud Overview and Update SEDCL Retreat June, 2013.

Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.

© 2007 EMC Corporation. All rights reserved. Internet Protocol Storage Area Networks (IP SAN) Module 3.4.

RAMCloud and the Low-Latency Datacenter John Ousterhout Stanford Platform Laboratory.

The Stanford Platform Laboratory John Ousterhout and Guru Parulkar Stanford University

Advisor: Hung Shi-Hao Presenter: Chen Yu-Jen

Log-Structured Memory for DRAM-Based Storage Stephen Rumble and John Ousterhout Stanford University.

Network Requirements for Resource Disaggregation

5/3/2018 3:51 AM Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter Yuanwei Lu1,2, Guo Chen2, Zhenyuan Ruan1,2, Wencong Xiao2,3,

CS 142 Lecture Notes: Datacenters

Data Center Network Architectures

Flash Storage 101 Revolutionizing Databases

Chapter 6: Network Layer

11/13/ :11 PM Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter Yuanwei Lu1,2, Guo Chen2, Zhenyuan Ruan1,2, Wencong Xiao2,3,

Dingming Wu+, Yiting Xia+*, Xiaoye Steven Sun+,

RAMCloud Architecture

RAMCloud Architecture

Presentation transcript:

Low-Latency Datacenters John Ousterhout Platform Lab Retreat May 29, 2015

● Scale:  1M+ cores  1-10 PB memory  200 PB disk storage ● Latency:  < 0.5 µs speed-of-light delay ● Most work so far has focused on scale:  One app, many resources  Map-Reduce, etc. ● Latency potential unrealized:  High-latency hardware/software  Most apps designed to tolerate latency (communication via large blocks) May 29, 2015Low-Latency DatacenterSlide 2 Datacenters: Scale and Latency

● Round-trip times (100K servers):  Today: µs best case  Often much worse because of congestion  Hardware limit: ~2 µs ● Storage latency dropping:  Disk → Flash → DRAM ● Can we create a new platform that makes the hardware limit accessible to applications? ● If so, will it enable important new applications? May 29, 2015Low-Latency DatacenterSlide 3 Latency

● New switching architecture (30 ns per switch) ● NIC fused with CPU cores; on-chip routing ● User-level networking, polling instead of interrupts ● New transport protocol ● Storage systems based primarily in DRAM ● New software stack May 29, 2015Low-Latency DatacenterSlide 4 Clean-Slate Low-Latency Datacenter

● New class of datacenter storage:  All data in DRAM at all times (disk/flash for backup only)  Large scale: aggregate 1000’s of servers  Low latency: 5-10µs remote access ● 1000x improvements over disk in  Performance  Energy/op Goal: enable a new class of data-intensive applications May 29, 2015Low-Latency DatacenterSlide 5 Low-Latency Storage: RAMCloud Master Backup Master Backup Master Backup Master Backup … Appl. Library Appl. Library Appl. Library Appl. Library … Datacenter Network Coordinator 1000 – 100,000 Application Servers 1000 – 10,000 Storage Servers GB per server

● TCP protocol optimized for:  Throughput, not latency  Long-haul networks (high latency)  Congestion throughout  Modest # connections/server ● Future datacenters:  High performance networking fabric: ● Low latency ● Multi-path  Congestion primarily at edges ● Little congestion in core  Many connections/server (1M?) Need new transport protocol May 29, 2015Low-Latency DatacenterSlide 6 New Transport Protocol Top-of-rack switches Congested link “Perfect” core fabric

● Greatest obstacle to low latency:  Congestion at receiver’s link  Large messages delay small ones ● Solution: drive congestion control from receiver  Schedule incoming traffic  Prioritize small messages ● Behnam Montazeri will present work in progress May 29, 2015Low-Latency DatacenterSlide 7 New Transport Protocol, cont’d

● Today’s stacks: highly layered ● Good for structuring software  Each layer solves one problem ● Bad for performance  Each layer adds latency ● Example: Thrift RPC system  Handles several problems: marshalling, threading, etc.  General-purpose: re-pluggable components  Adds 7 µs latency For low latency, must replace the entire software stack May 29, 2015Low-Latency DatacenterSlide 8 Low-Latency Software Stacks?

May 29, 2015Low-Latency DatacenterSlide 9 Reducing Software Stack Latency High Latency 1.Optimize layers (specialize?) 2.Eliminate layers 3.Bypass layers

May 29, 2015Low-Latency DatacenterSlide 10 Integrate NIC Into CPU Chip? Core Switch Network Per-Core Mini-NIC OS controls routing tables for incoming packets Core

● Does 2 µs latency matter? ● Use low latency for collecting data?  Small chunks of data  Random access  Dependencies serialize accesses  Need a lot of chunks in a small amount of time: ● 20K chunks in 50 ms? ● Use low latency for new computational models?  Independent compute-storage elements  Low latency allows high coherency May 29, 2015Low-Latency DatacenterSlide 11 Low Latency => New Applications?

● What are the key elements of a low-latency platform for datacenters? ● What will a new software stack look like? ● What applications could make use of a low-latency datacenter? May 29, 2015Low-Latency DatacenterSlide 12 Discussion Topics

May 29, 2015Low-Latency DatacenterSlide 13 Palette