The Stanford Platform Laboratory John Ousterhout and Guru Parulkar Stanford University

Platform Lab Faculty: John Ousterhout (Faculty Director), Guru Parulkar (Executive Director), Bill Dally, Sachin Katti, Christos Kozyrakis, Phil Levis, Nick McKeown, Mendel Rosenblum, Keith Winstein

New Platforms Enable New Applications
1980's:
– Platform: relational database
– Applications: enterprise applications (e.g. ERP systems)
1990's:
– Platform: HTTP + HTML + JavaScript
– Applications: online commerce
2000's:
– Platform: GFS + MapReduce
– Applications: large-scale analytics
2010's:
– Platform: smart phones + GPS
– Applications: Uber and many others

What is a Platform?
General-purpose substrate
– Makes it easier to build applications or higher-level platforms
– Solves significant problems
– Usually introduces some restrictions
Software and/or hardware
Example: the Map/Reduce computational model
– Simplifies construction of applications that use hundreds of servers to compute on large datasets
– Hides communication latency: data is transferred in large blocks
– Automatically handles failures & slow servers
– Restrictions: 2 levels of computation, sequential data access (sketched in code below)
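To make the model's restrictions concrete, here is a minimal single-process sketch of the two-phase Map/Reduce computation (word count). The function names and structure are illustrative only; a real framework distributes the map and reduce phases across hundreds of servers and handles failures and stragglers automatically.

```cpp
// Minimal single-process sketch of the two-level Map/Reduce model (word count).
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map phase: each input chunk is scanned sequentially, emitting (word, 1).
static std::vector<std::pair<std::string, int>> mapPhase(const std::string& chunk) {
    std::vector<std::pair<std::string, int>> emitted;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) emitted.push_back({word, 1});
    return emitted;
}

// Reduce phase: all values for a given key are combined.
static std::map<std::string, int> reducePhase(
        const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& kv : pairs) counts[kv.first] += kv.second;
    return counts;
}

int main() {
    std::vector<std::string> chunks = {"the quick brown fox", "the lazy dog"};
    std::vector<std::pair<std::string, int>> intermediate;
    for (const auto& c : chunks) {                       // map over each chunk
        auto part = mapPhase(c);
        intermediate.insert(intermediate.end(), part.begin(), part.end());
    }
    for (const auto& kv : reducePhase(intermediate))     // reduce by key
        std::cout << kv.first << ": " << kv.second << "\n";
    return 0;
}
```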

Platform Lab Vision
Platforms | Large Systems | Collaboration
Create the next generation of platforms to stimulate new classes of applications

Drivers for Next-Generation Platforms
Achieve physical limits
– Can we have layers of abstraction without giving up performance?
Heterogeneity and specialization
– General-purpose systems are fundamentally inefficient
– Can we find a small set of specialized components that are highly efficient and, taken together, provide a general-purpose set of abstractions?
Raise the floor of developer productivity
– How do we create abstractions that are extremely easy to use, while still providing enough performance to meet applications' needs?
Scalability and elasticity
– How do we achieve high throughput and low latency with horizontal scaling?
– How do we achieve elasticity, for example growing from 1K to 1M users without reimplementation?

Initial Focus Platform: Swarm Pod

Swarm Pod: a next-generation datacenter pod, connected through wired/wireless networks and switches/routers to swarms of devices (self-driving cars, drones, cell phones, the Internet of Things).

Changing Technologies and Requirements: more devices coming online; need for better visibility and control; collaboration between devices; large nonvolatile memories; low-latency interconnects; increasing core density; specialized components.

Swarm Pod Research Topics: scalable control planes; low-latency software stack; new memory/storage systems; the IX operating system; the RAMCloud storage system; programmable network fabrics; self-incentivizing networks.

The Low-Latency Datacenter
Phase 1 of the datacenter revolution: scale
– How can one application harness thousands of servers?
– New platforms such as MapReduce and Spark
– But based on high-latency technologies: 1990's-era networking with round-trip times of hundreds of µs, and disks with 10ms access times
Phase 2 of the datacenter revolution: low latency
– New networking hardware: 5-10µs round-trips today, 2-3µs in the future
– New nonvolatile memory technologies: storage access times < 10µs
– Low latency will enable new applications
How does low latency affect system architecture?

Eliminating Layers
Existing software stacks are highly layered
– Great for software structuring
– But layer crossings add latency
– Software latency used to be hidden by slow networks and disks
Can't achieve low latency with today's stacks
– Death by a thousand cuts: no single place to optimize
– Networks: complex OS protocol stacks, marshaling/serialization costs
– Storage systems: OS file system overheads
Low-latency systems will require a new software stack
– Can layers be reduced without making systems unmanageable?
– Must eliminate layer crossings
– What are the new APIs? (see the sketch below)
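As a rough illustration of what removing layer crossings looks like in code, the sketch below polls a hypothetical user-space NIC receive ring and dispatches packets straight to an RPC handler, with no system calls in the receive path. The RxRing and Packet types are invented for illustration (real kernel-bypass stacks such as DPDK or RAMCloud's transports differ in detail), and the ring is simulated with an in-memory queue so the example runs standalone.

```cpp
// Illustrative kernel-bypass receive path: poll a (simulated) NIC ring,
// hand packets directly to the RPC layer, no sockets, no kernel crossings.
#include <cstdio>
#include <deque>
#include <string>

struct Packet { std::string payload; };

// Hypothetical stand-in for a memory-mapped NIC receive ring.
struct RxRing {
    std::deque<Packet> descriptors;          // simulated ready descriptors
    bool poll(Packet* p) {                   // non-blocking check, no syscall
        if (descriptors.empty()) return false;
        *p = descriptors.front();
        descriptors.pop_front();
        return true;
    }
};

static void dispatchRpc(const Packet& p) {   // application-level handler
    std::printf("rpc: %s\n", p.payload.c_str());
}

int main() {
    RxRing ring;
    ring.descriptors.push_back({"read(table=1, key=\"foo\")"});
    // The dispatch thread spins on the ring and hands complete packets
    // straight to the RPC layer, eliminating intermediate layers.
    Packet p;
    for (int spins = 0; spins < 1000; spins++) {
        if (ring.poll(&p)) dispatchRpc(p);
    }
    return 0;
}
```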

The RAMCloud Storage System
A new class of storage for low-latency datacenters:
– All data in DRAM at all times
– Large scale: 1,000-10,000 servers
– Low latency: 5-10µs remote access
– Durability/availability equivalent to replicated disk
1000x improvements in latency and throughput (relative to disk-based storage)
Goal: enable new data-intensive applications (see the usage sketch below)
[Architecture diagram: 1,000-100,000 application servers, each linking the client library, connect over the datacenter network to 1,000-10,000 storage servers (each running a master and a backup) and a coordinator.]
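The sketch below shows the kind of client-side usage the system targets: tables of small objects addressed by keys, with reads and durable writes issued as single remote operations. This is not the actual RAMCloud client API; the class and method names are hypothetical, and an in-memory map stands in for the remote cluster so the example is self-contained.

```cpp
// Hypothetical client-side sketch of a RAMCloud-style key-value store:
// tables of small objects addressed by keys, read/written in one operation.
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

class KeyValueCluster {                        // stand-in for a remote cluster
public:
    uint64_t createTable(const std::string& name) {
        tables_[name];                         // ensure the table exists
        names_[nextId_] = name;
        return nextId_++;
    }
    void write(uint64_t tableId, const std::string& key, const std::string& value) {
        // A real system would replicate to backups before returning (durable write).
        tables_[names_[tableId]][key] = value;
    }
    std::string read(uint64_t tableId, const std::string& key) {
        // A real system would issue a single RPC to the master owning this key.
        return tables_[names_[tableId]][key];
    }
private:
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> tables_;
    std::unordered_map<uint64_t, std::string> names_;
    uint64_t nextId_ = 1;
};

int main() {
    KeyValueCluster cluster;
    uint64_t users = cluster.createTable("users");
    cluster.write(users, "alice", "{\"follows\": 412}");
    std::cout << cluster.read(users, "alice") << std::endl;
    return 0;
}
```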

New RAMCloud Projects
New software stack layers for the low-latency datacenter:
New remote procedure call (RPC) system
– Homa: a new transport protocol with receiver-managed flow and congestion control and minimal buffering (see the toy sketch below)
– Microsecond-scale latency
– 1M connections/server
New thread scheduling mechanism
– Threads scheduled by the application, not the OS
– The OS allocates cores to applications and manages competing apps
– The same mechanism extends to VMMs: the hypervisor allocates cores to guest OSes
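The toy program below illustrates the receiver-managed flow control idea: the receiver, not the sender, decides how many bytes may be outstanding by issuing grants as it consumes data, which bounds buffering at the receiver. All types and numbers here are invented for illustration; the real Homa protocol additionally schedules grants across competing senders and favors short messages.

```cpp
// Toy illustration of receiver-managed flow control in the spirit of Homa.
#include <algorithm>
#include <cstdio>

struct Receiver {
    int windowBytes;        // buffering the receiver is willing to commit
    int granted = 0;        // cumulative bytes the sender may transmit
    int received = 0;       // cumulative bytes actually received

    int issueGrant() {      // keeps (granted - received) <= windowBytes
        granted = received + windowBytes;
        return granted;
    }
    void deliver(int bytes) { received += bytes; }
};

struct Sender {
    int messageBytes;
    int sent = 0;
    // Transmit only what the receiver has granted; returns bytes sent.
    int transmit(int grantLimit, int packetBytes) {
        int allowed = std::min(grantLimit - sent,
                               std::min(packetBytes, messageBytes - sent));
        allowed = std::max(allowed, 0);
        sent += allowed;
        return allowed;
    }
};

int main() {
    Receiver rx{ /*windowBytes=*/3000 };
    Sender tx{ /*messageBytes=*/10000 };
    while (tx.sent < tx.messageBytes) {
        int grant = rx.issueGrant();             // receiver paces the sender
        int bytes = tx.transmit(grant, 1500);    // one ~MTU-sized packet per step
        rx.deliver(bytes);
        std::printf("sent=%d granted=%d\n", tx.sent, grant);
    }
    return 0;
}
```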

Reimagining Memory and Storage
New nonvolatile memories are coming soon
Example: Intel/Micron 3D XPoint devices:
– 1-10µs access time?
– 10 TB capacity?
– DIMM form factor: directly addressable
What are the right abstractions for shared storage?
– Files have high overheads for OS lookups and protection checks
– Does paging make sense again?
– A single-level store?
Relationship between data and computation:
– Move data to computation, or vice versa?

Hollowing Out of the OS
[Diagram: the operating system, sandwiched between the hypervisor and applications, is being hollowed out as device drivers, networking (kernel bypass), direct storage access, thread scheduling, and physical memory management migrate out of the kernel.]
Does a radical OS redesign/simplification make sense?

Next logical step in SDN: Take programmability all the way down to the wire

Status quo: a fixed-function ASIC dictates to the switch OS (through a driver and run-time API) "this is roughly how I process packets." Fixed-function hardware is prone to bugs and has very long, unpredictable lead times.

Turning the tables: with a PISA device (Protocol-Independent Switch Architecture), the switch OS tells the hardware, in P4, "this is precisely how you must process packets."

P4 and PISA
[Diagram: a P4 program is compiled onto the target, which consists of a programmable parser followed by a pipeline of match-action stages (e.g. L2, IPv4, IPv6, and ACL tables) and queues.]

Current Research Projects
1. P4 as a front end to configure OVS (with Ben Pfaff and Princeton)
– Approach 1: statically compile the P4 program to replace parsing and matching in OVS
– Approach 2: compile P4 to eBPF and dynamically load it into the kernel
– Early results suggest no performance penalty for programmability; in some cases it is faster
2. Domino: a higher-level language (with MIT)
– C-like, process-to-completion; includes stateful processing; the compiler generates P4 code
3. PIFO: a hardware abstraction for programmable packet scheduling algorithms
4. xFabric: calculating flow rates based on the programmer's utility function
5. PERC: fast congestion control by proactive, direct calculation of flow rates in the forwarding plane

xFabric: Programmable Datacenter Fabric
Applications declare their resource preferences (e.g. lowest latency, bandwidth allocation)
Network operators declare their resource usage policies
The challenge is to automate optimal resource allocation for diverse applications at datacenter scale
As a platform, xFabric will provide optimal resource allocations while meeting both application requirements and operator policies

Scalable Control Plane: Generalized and Customizable
Separating out the control plane is a common trend across networks and systems
– SDN, storage systems, the MapReduce scheduler, …
Control plane design is a challenge
– Scale, throughput and latency requirements, abstractions that are easy to use
We have been building control planes for specific systems
– ONOS, SoftRAN, and the RAMCloud coordinator
Can we design a generalized, scalable control plane with a common foundation that can be customized for different contexts?
Goal: a new platform that makes it significantly easier to develop diverse control planes with the necessary functionality and performance

Key Performance Requirements
Control apps operate on a global network view/state:
– High volume of state: ~500GB-2TB
– High throughput: ~500K-20M ops/second, ~100M state ops/second
– Low latency to events: 1-10s of ms
A distributed platform is required to meet these metrics
Difficult challenge! High throughput | Low latency | Consistency | High availability

Generalized and Customizable Scalable Control Plane
Northbound abstraction:
– Interface to apps; provides different APIs (C/C++, declarative programming, REST), customized for the context
– Strongly consistent, transaction semantics?
Distributed core:
– Context-independent; a cluster of servers with distributed state management primitives, connected by low-latency, multi-Gbps RPC
Southbound abstraction:
– Interface to the data plane (switches, eNBs, servers, storage); plug-ins for different contexts (OpenFlow/NetConf, RAN protocol, RPC) — see the sketch below
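A hypothetical sketch of that structure: a context-independent core exposes a northbound update API to control apps and fans state out through interchangeable southbound plug-ins. Every interface below is invented for illustration; a real core would also be distributed, strongly consistent, and highly available.

```cpp
// Sketch of a generalized control-plane core with pluggable southbound drivers.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Southbound abstraction: how the core talks to a particular data plane.
class SouthboundPlugin {
public:
    virtual ~SouthboundPlugin() = default;
    virtual void applyRule(const std::string& rule) = 0;
};

class OpenFlowPlugin : public SouthboundPlugin {
public:
    void applyRule(const std::string& rule) override {
        std::cout << "OpenFlow flow-mod: " << rule << std::endl;   // would program a switch
    }
};

class RpcPlugin : public SouthboundPlugin {
public:
    void applyRule(const std::string& rule) override {
        std::cout << "RPC to server: " << rule << std::endl;       // e.g. coordinator -> storage master
    }
};

// Context-independent core: stores control state and distributes it through
// whichever plug-ins the deployment uses.
class ControlCore {
public:
    void addPlugin(std::unique_ptr<SouthboundPlugin> p) { plugins_.push_back(std::move(p)); }
    void update(const std::string& rule) {               // northbound API used by control apps
        for (auto& p : plugins_) p->applyRule(rule);
    }
private:
    std::vector<std::unique_ptr<SouthboundPlugin>> plugins_;
};

int main() {
    ControlCore core;
    core.addPlugin(std::make_unique<OpenFlowPlugin>());
    core.addPlugin(std::make_unique<RpcPlugin>());
    core.update("route 10.0.0.0/24 via port 3");
    return 0;
}
```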

Platform Lab Vision
Platforms | Large Systems | Collaboration
New platforms enable new applications

Large Systems
Why universities should do large systems projects:
– Companies don't have time to evaluate alternatives and find the best approach
– Universities can lead the market
– They produce better graduates
Goal for the Platform Lab:
– Create an environment where large systems projects flourish

Collaboration
The convergence of computing, storage, and networking is very important for future infrastructure
The Swarm Pod and its target applications require expertise in many systems areas
The Platform Lab brings together professors and their students with expertise in different systems areas to collaborate on the challenges at this convergence

Expected Results
It is difficult to know the specifics at this point, but our expectations and history suggest the Platform Lab will produce:
– Influential ideas and architectural directions
– Real systems or platforms with communities of users
– Graduates with strong systems skills
– Impact on the practice of computing and networking
– Commercial impact through ideas, open-source systems, and startups
Across several areas of systems: hardware & software; computing, networking, and storage; different layers of the system; application domains; …

Engagement Model
Regular members – event-based interactions
– Regular reviews and retreats
– Early access to results
– Access to faculty and students
Premium members – active collaboration
– Work together with committed engineers/researchers
– Be part of the architecture, design, implementation, and evaluation of platforms
– Company staff participate in regular meetings, including weekly meetings

Questions? Reactions?

Thank You!

Example Target Platforms
– Low-latency datacenter (Dally, Katti, Kozyrakis, Levis, Ousterhout)
– RAMCloud (Ousterhout, Rosenblum)
– Scalable control planes (Katti, Ousterhout, Parulkar)
– Programmable network fabrics (Katti, Levis, McKeown, Ousterhout, Parulkar)
– New memory/storage systems for the 21st century (Dally, Kozyrakis, Levis)
– Cloud query planner (Winstein, Levis)
– Self-incentivizing networks (Winstein, ??)

21st Century Abstractions for Memory and Storage
Memory abstractions and storage hierarchies are obsolete for today's workloads and technologies
– E.g., the assumptions that memory is scarce, that there is temporal locality, and that moving data to computation is efficient no longer hold for many of today's applications and environments
Goals: revisit memory/storage abstractions and implementations
– Heterogeneous: a combination of DRAM and SCM
– Aggressive memory sharing among apps across a server, a cluster, a datacenter
– Support for QoS, near-data processing, and security

21st Century Abstractions for Memory and Storage
Proposed design ideas:
– A single-level store based on objects or segments that span apps, memory technologies, and servers
– Objects have logical attributes: persistence, indexing, …
– Objects have physical attributes: encryption, replication requirements, …
– Apps/users specify logical attributes; compilers and runtime systems manage the mapping and perform background optimizations (see the sketch below)
Develop hardware & software platforms for a single-level store:
– Efficient hardware structures for fast access
– Compiler & system software for efficient mapping within and across servers
– APIs and storage representation schemes
– Security and privacy support
– Cluster-wide management and optimization
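A minimal sketch of the logical/physical attribute split, with hypothetical names: the application declares logical attributes, and a runtime policy derives physical attributes such as placement and replication.

```cpp
// Sketch of the object-level attribute split in a single-level store.
#include <cstddef>
#include <string>

struct LogicalAttributes {          // specified by the application/user
    bool persistent = false;        // must survive power loss
    bool indexed = false;           // needs secondary-index lookups
};

enum class Placement { Dram, StorageClassMemory, RemoteServer };

struct PhysicalAttributes {         // chosen by compiler/runtime, may change over time
    Placement placement = Placement::Dram;
    int replicas = 0;
    bool encrypted = false;
};

struct StoreObject {                // one object in the single-level store
    std::string name;
    size_t size;
    LogicalAttributes logical;
    PhysicalAttributes physical;
};

// Hypothetical runtime policy: derive physical attributes from logical ones.
PhysicalAttributes place(const LogicalAttributes& l, size_t size) {
    PhysicalAttributes p;
    p.placement = l.persistent ? Placement::StorageClassMemory : Placement::Dram;
    p.replicas  = l.persistent ? 3 : 0;
    (void)size;                     // a real policy would also consider size and access pattern
    return p;
}

int main() {
    StoreObject o{"session-cache", 4096, {/*persistent=*/false, /*indexed=*/false}, {}};
    o.physical = place(o.logical, o.size);
    return 0;
}
```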

Logically Centralized Control Plane
– Provides a global network view
– Makes it easy to program control, management, and configuration apps
– Enables new apps

Scalable Control Plane: a Perfect Platform for the Laboratory
It requires overcoming all of the fundamental challenges identified:
– Physical limits: to deliver on performance
– Heterogeneity and specialization: the target environments are diverse, from hardware to apps
– Scalability and elasticity: most control plane scenarios need both
– Raising the floor of developer productivity: typically devops/netops people write apps for the controllers, so the programming abstractions have to suit them

Target Platforms: Low Latency Datacenter

Evolution of Datacenters
Phase 1: manage scale
– 10,000-100,000 servers within a 50m radius
– 1 PB DRAM
– 100 PB disk storage
– Challenge: how can one application harness thousands of servers? Answer: MapReduce, etc.
But communication latency is high:
– Round-trip times of hundreds of µs
– Must process data sequentially to hide latency (e.g. MapReduce)
– Interactive applications limited in functionality

Why Does Latency Matter?
Large-scale apps struggle with high latency
– Random-access data rate has not scaled!
– Facebook: can only make a limited number of internal requests per page
[Diagram: a traditional application on a single machine (UI, app logic, data structures) sees << 1µs latency; a web application in a datacenter (UI and app logic on application servers, data on storage servers) sees 0.5-10ms latency.]
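To see why, consider some illustrative round numbers (our own, not from the slide): if rendering one page requires 100 sequential internal requests, then at 1ms per request the storage time alone is 100ms, while at 5-10µs per request it is under 1ms. Lowering latency, not just adding servers, is what changes how many dependent requests an interactive application can afford.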

Goal: Scale and Latency
Enable a new class of applications:
– Large-scale graph algorithms (machine learning?)
– Collaboration at scale?
[Diagram: the same single-machine vs. datacenter picture as before, but with datacenter storage latency reduced from 0.5-10ms to 5-10µs.]

Large-Scale Collaboration
Data for one user vs. the "region of consciousness" an application must track:
– Gmail: data for one user
– Facebook: a user's friends
– Morning commute: 10,000-100,000 cars

Low-Latency Datacenter
Goal: build new hardware and software infrastructure that operates at microsecond-scale latencies
Build on the RAMCloud RPC implementation:
– Reduce software overhead down from 2µs
– Support throughput as well as latency
– Reduce per-connection state to support 1M connections/server in the future

Target Platforms: RAMCloud

RAMCloud
A storage system for low-latency datacenters:
– General-purpose
– All data always in DRAM (not a cache)
– Durable and available
– Scale: 1,000-10,000 servers, 100+ TB
– Low latency: 5-10µs remote access

RAMCloud: Distributed Storage with Low Latency
[Architecture diagram: 1,000-100,000 application servers (each with the client library) connect over the datacenter network to 1,000-10,000 storage servers (each running a master and a backup), a coordinator with a standby, and external storage (ZooKeeper) for coordinator state.]
– High-speed networking: 5µs round-trip, full bisection bandwidth
– Commodity servers with tens to hundreds of GB of DRAM per server
Build higher-level abstractions for ease of use while preserving or improving performance

RAMCloud Performance
Using Infiniband networking (24 Gb/s, kernel bypass)
– Other networking also supported, but slower
Reads:
– 100B objects: 4.7µs
– 10KB objects: 10µs
– Single-server throughput (100B objects): 900 Kops/sec
– Small-object multi-reads: 2M objects/sec
Durable writes:
– 100B objects: 13.5µs
– 10KB objects: 35µs
– Small-object multi-writes: K objects/sec

RAMCloud Next Steps
Support higher-level features/abstractions:
– Secondary indexes
– Multi-object transactions
– Graph operations
Without compromising scale and latency (as much as possible)