Data Center Storage and Networking. Hakim Weatherspoon, Assistant Professor, Dept of Computer Science. CS 5413: High Performance Systems and Networking, December 1, 2014.

Presentation transcript:

Data Center Storage and Networking
Hakim Weatherspoon, Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking
December 1, 2014
Slides from ACM SOSP 2013 presentation on “IOFlow: A Software-Defined Storage Architecture.” Eno Thereska, Hitesh Ballani, Greg O'Shea, Thomas Karagiannis, Antony Rowstron, Tom Talpey, Richard Black, and Timothy Zhu. In SOSP'13, Farmington, PA, USA. November 3-6, 2013.

Overview and Basics
Data Center Networks
– Basic switching technologies
– Data Center Network Topologies (today and Monday)
– Software Routers (e.g., Click, RouteBricks, NetMap, NetSlice)
– Alternative Switching Technologies
– Data Center Transport
Data Center Software Networking
– Software Defined Networking (overview, control plane, data plane, NetFPGA)
– Data Center Traffic and Measurements
– Virtualizing Networks
– Middleboxes
Advanced Topics
Where are we in the semester?

Goals for Today
IOFlow: a software-defined storage architecture
– E. Thereska, H. Ballani, G. O'Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, T. Zhu. ACM Symposium on Operating Systems Principles (SOSP), November 2013, pages

Background: Enterprise data centers
– General purpose applications
– Application runs on several VMs
– Separate network for VM-to-VM traffic and VM-to-Storage traffic
– Storage is virtualized
– Resources are shared
[Figure labels: VM (Virtual Machine), vDisk, NIC, S-NIC, Switch]

Motivation
Want: predictable application behaviour and performance
Need system to provide end-to-end SLAs, e.g.,
– Guaranteed storage bandwidth B
– Guaranteed high IOPS and priority
– Per-application control over decisions along IOs’ path
It is hard to provide such SLAs today

Example: guarantee aggregate bandwidth B for Red tenant
Deep IO path with 18+ different layers that are configured and operate independently and do not understand SLAs
[Figure labels: App, OS, VM (Virtual Machine), vDisk, NIC, S-NIC, Switch]

Challenges in enforcing end-to-end SLAs
– No storage control plane
– No enforcing mechanism along storage data plane
– Aggregate performance SLAs: across VMs, files and storage operations
– Want non-performance SLAs: control over IOs’ path
– Want to support unmodified applications and VMs

IOFlow architecture
Decouples the data plane (enforcement) from the control plane (policy logic)
[Figure labels: High-level SLA, Controller, IOFlow API, App, OS]

Contributions
– Defined and built storage control plane
– Controllable queues in data plane
– Interface between control and data plane (IOFlow API)
– Built centralized control applications that demonstrate power of architecture

SDS: Storage-specific challenges
Low-level primitives, by column: Old networks, SDN, Storage today, SDS
– End-to-end identifier
– Data plane queues
– Control plane

Storage flows
Storage “Flow” refers to all IO requests to which an SLA applies
---> SLA
Aggregate, per-operation and per-file SLAs, e.g.,
---> high priority
---> min 100,000 IOPS
Non-performance SLAs, e.g., path routing
---> bypass malware scanner
[Figure labels: source set, destination sets]

IOFlow API: programming data plane queues
1. Classification [IO Header -> Queue]
2. Queue servicing [Queue -> ]
3. Routing [Queue -> Next-hop]
[Figure label: Malware scanner]
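The three queue operations on this slide can be sketched as a small Python model. This is only an illustration under stated assumptions: the class and method names (`Stage`, `QueueService`, `create_queue_rule`, and so on) are hypothetical, the header is assumed to be the four-field tuple used in the SLA examples (VM, operation, file, share), and `*` is treated as a wildcard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed header layout: (vm, operation, file, share). Hypothetical, for illustration.
Header = Tuple[str, str, str, str]


@dataclass
class QueueService:
    """Per-queue servicing parameters (hypothetical field names)."""
    token_rate: float = float("inf")  # tokens per second
    priority: int = 0


@dataclass
class Stage:
    """One layer on the IO path that holds controller-programmed queues."""
    rules: List[Tuple[Header, str]] = field(default_factory=list)
    services: Dict[str, QueueService] = field(default_factory=dict)
    next_hops: Dict[str, str] = field(default_factory=dict)

    # 1. Classification [IO Header -> Queue]
    def create_queue_rule(self, pattern: Header, queue: str) -> None:
        self.rules.append((pattern, queue))

    # 2. Queue servicing [Queue -> servicing parameters]
    def configure_queue_service(self, queue: str, service: QueueService) -> None:
        self.services[queue] = service

    # 3. Routing [Queue -> Next-hop]
    def configure_queue_routing(self, queue: str, next_hop: str) -> None:
        self.next_hops[queue] = next_hop

    def classify(self, header: Header) -> Optional[str]:
        """Return the queue of the first rule whose pattern matches the header."""
        for pattern, queue in self.rules:
            if all(p == "*" or p == h for p, h in zip(pattern, header)):
                return queue
        return None

    def dispatch(self, header: Header) -> Tuple[Optional[str], Optional[str]]:
        """Classify an IO and look up where its queue forwards it."""
        queue = self.classify(header)
        return queue, self.next_hops.get(queue)
```

In this sketch, rules are matched in insertion order, so a specific rule that bypasses the malware scanner should be installed before a catch-all rule that routes through it.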

Lack of common IO Header for storage traffic
SLA: --> Bandwidth B
The same storage object is named differently at each layer:
– Block device Z: (/device/scsi1)
– Server and VHD \\serverX\AB79.vhd
– Volume and file H:\AB79.vhd
– Block device /device/ssd5

Flow name resolution through controller
SLA: {VM 4, *, *, //share/dataset} --> Bandwidth B
SMBc exposes IO Header it understands:
Queuing rule (per-file handle): --> Q1
Q1.token rate --> B
[Figure label: Controller]
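One way to picture flow name resolution is as a lookup the controller performs before installing a rule. The sketch below is an assumption-heavy illustration, not the IOFlow mechanism: the `Sla` class, the `(vm_id, path)` header shape, the name tables, and the function names are all hypothetical. The only values taken from the slides are the SLA tuple and the `\\serverX\AB79.vhd` path.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Sla:
    """High-level SLA: {vm, operation, file, share} --> bandwidth."""
    vm: str
    operation: str
    file: str
    share: str
    bandwidth: float  # in whatever unit the token rate uses


def resolve_header(sla: Sla,
                   vm_ids: Dict[str, str],
                   share_paths: Dict[str, str]) -> Tuple[str, str]:
    """Translate SLA-level names into the names one layer understands.

    Assumption: this layer matches IOs on (vm_id, path), and the controller
    keeps tables from SLA names to those layer-local names.
    """
    return vm_ids[sla.vm], share_paths[sla.share]


def build_queue_config(sla: Sla, queue: str,
                       vm_ids: Dict[str, str],
                       share_paths: Dict[str, str]) -> dict:
    """Produce the queuing rule and token rate the controller would install."""
    header = resolve_header(sla, vm_ids, share_paths)
    return {
        "rule": (header, queue),              # layer header --> Q1
        "token_rate": (queue, sla.bandwidth)  # Q1.token rate --> B
    }
```

The design point this illustrates is that the SLA author never needs to know layer-local names; only the controller holds the mapping.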

Rate limiting for congestion control
Queue servicing [Queue -> ]
– Important for performance SLAs
– Today: no storage congestion control
– Challenging for storage: e.g., how to rate limit two VMs, one reading, one writing to get equal storage bandwidth?
[Figure labels: IOs, tokens]

Rate limiting on payload bytes does not work
[Figure labels: VM, 8KB Writes, 8KB Reads]

Rate limiting on bytes does not work
[Figure labels: VM, 8KB Writes, 8KB Reads]

Rate limiting on IOPS does not work
[Figure labels: VM, 8KB Writes, 64KB Reads]
Need to rate limit based on cost

Rate limiting based on cost
– Controller constructs empirical cost models based on device type and workload characteristics
– RAM, SSDs, disks: read/write ratio, request size
– Cost models assigned to each queue
– ConfigureTokenBucket [Queue -> cost model]
– Large request sizes split for pre-emption
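A token bucket that charges each IO its modeled cost, rather than one token per IO or per byte, can be sketched as follows. The linear cost model (a fixed part plus a per-byte part, separate for reads and writes) is an assumption made for illustration; the slides only say the models are empirical. All class and function names are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CostModel:
    """Assumed linear model: tokens = fixed + per_byte * size, per operation type."""
    read_fixed: float
    read_per_byte: float
    write_fixed: float
    write_per_byte: float

    def cost(self, op: str, size: int) -> float:
        if op == "read":
            return self.read_fixed + self.read_per_byte * size
        if op == "write":
            return self.write_fixed + self.write_per_byte * size
        raise ValueError(f"unknown operation: {op}")


class CostTokenBucket:
    """Token bucket whose tokens are consumed according to a cost model."""

    def __init__(self, rate: float, capacity: float, model: CostModel):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens
        self.model = model
        self.tokens = capacity
        self.last = 0.0           # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_admit(self, op: str, size: int, now: float) -> bool:
        """Admit the IO if enough tokens are available for its modeled cost."""
        self._refill(now)
        needed = self.model.cost(op, size)
        if needed <= self.tokens:
            self.tokens -= needed
            return True
        return False


def split_request(size: int, max_chunk: int) -> List[int]:
    """Split a large request into chunks so other IOs can be interleaved."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [max_chunk] * (size // max_chunk)
    if size % max_chunk:
        chunks.append(size % max_chunk)
    return chunks
```

With a model where an 8KB write costs more tokens than an 8KB read, two VMs with equal token rates end up with comparable load on the device even though their IOPS and byte counts differ.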

Recap: Programmable queues on data plane
– Classification [IO Header -> Queue]
  – Per-layer metadata exposed to controller
  – Controller out of critical path
– Queue servicing [Queue -> ]
  – Congestion control based on operation cost
– Routing [Queue -> Next-hop]
How does controller enforce SLA?

Distributed, dynamic enforcement
SLA: --> Bandwidth 40 Gbps
– SLA needs per-VM enforcement
– Need to control the aggregate rate of VMs 1-4 that reside on different physical machines
– Static partitioning of bandwidth is sub-optimal
[Figure labels: VM, 40Gbps]

Work-conserving solution
VMs with traffic demand should be able to send it as long as the aggregate rate does not exceed 40 Gbps
Solution: Max-min fair sharing

Max-min fair sharing
Well studied problem in networks
– Existing solutions are distributed
– Each VM varies its rate based on congestion
– Converge to max-min sharing
– Drawbacks: complex and requires congestion signal
But we have a centralized controller
– Converts to simple algorithm at controller

Controller-based max-min fair sharing
What does controller do?
– Infers VM demands
– Uses centralized max-min within a tenant and across tenants
– Sets VM token rates
– Chooses best place to enforce
INPUT: per-VM demands
OUTPUT: per-VM allocated token rate
t = control interval, s = stats sampling interval
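Centralized max-min fair sharing can be sketched with the standard water-filling procedure. This is a minimal single-level illustration, assuming the per-VM demands are already known; it does not model demand inference or the two-level sharing (within a tenant and across tenants) the slide mentions. The function name and the example demands are hypothetical.

```python
from typing import Dict


def max_min_allocate(demands: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Water-filling max-min allocation.

    INPUT:  per-VM demands and the aggregate rate to share.
    OUTPUT: per-VM allocated rate (e.g., a token rate).
    """
    alloc = {vm: 0.0 for vm in demands}
    remaining = capacity
    # Visit VMs from smallest to largest demand.
    pending = sorted(demands, key=demands.get)
    while pending and remaining > 0:
        fair_share = remaining / len(pending)
        vm = pending[0]
        if demands[vm] <= fair_share:
            # Small demand: satisfy it fully and release the leftover to others.
            alloc[vm] = demands[vm]
            remaining -= demands[vm]
            pending.pop(0)
        else:
            # Every remaining VM wants more than the fair share: split evenly.
            for v in pending:
                alloc[v] = fair_share
            remaining = 0.0
            pending = []
    return alloc


if __name__ == "__main__":
    # Four VMs sharing an aggregate of 40 (e.g., Gbps); demands are made up.
    print(max_min_allocate({"VM1": 5, "VM2": 10, "VM3": 30, "VM4": 30}, 40))
```

The allocation is work-conserving in the sense of the earlier slide: rate unused by VMs with small demands is redistributed to VMs that can use it, and the total never exceeds the aggregate limit.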

Controller decides where to enforce
SLA constraints
– Queues where resources shared
– Bandwidth enforced close to source
– Priority enforced end-to-end
Efficiency considerations
– Overhead in data plane ~ # queues
– Important at 40+ Gbps
Minimize # times IO is queued and distribute rate limiting load

Centralized vs. decentralized control
Centralized controller in SDS allows for simple algorithms that focus on SLA enforcement and not on distributed system challenges
Analogous to benefits of centralized control in software-defined networking (SDN)

IOFlow implementation
2 key layers for VM-to-Storage performance SLAs
4 other layers:
– Scanner driver (routing)
– User-level (routing)
– Network driver
– Guest OS file system
Implemented as filter drivers on top of layers
[Figure label: Controller]

Evaluation map
IOFlow’s ability to enforce end-to-end SLAs
– Aggregate bandwidth SLAs
– Priority SLAs and routing application in paper
Performance of data and control planes

Evaluation setup
Clients: 10 hypervisor servers, 12 VMs each
– 4 tenants (Red, Green, Yellow, Blue)
– 30 VMs/tenant, 3 VMs/tenant/server
Storage network: Mellanox 40Gbps RDMA RoCE full-duplex
1 storage server:
– 16 CPUs, 2.4GHz (Dell R720)
– SMB 3.0 file server protocol
– 3 types of backend: RAM, SSDs, Disks
Controller: 1 separate server
– 1 sec control interval (configurable)
[Figure labels: VM, Switch]

Workloads
4 Hotmail tenants {Index, Data, Message, Log}
– Used for trace replay on SSDs (see paper)
IoMeter is parametrized with Hotmail tenant characteristics (read/write ratio, request size)

Enforcing bandwidth SLAs
4 tenants with different storage bandwidth SLAs
Tenants have different workloads
– Red tenant is aggressive: generates more requests/second
Tenant   SLA
Red      {VM1 – 30} -> Min 800 MB/s
Green    {VM31 – 60} -> Min 800 MB/s
Yellow   {VM61 – 90} -> Min 2500 MB/s
Blue     {VM91 – 120} -> Min 1500 MB/s

Things to look for
Distributed enforcement across 4 competing tenants
– Aggressive tenant(s) under control
Dynamic inter-tenant work conservation
– Bandwidth released by idle tenant given to active tenants
Dynamic intra-tenant work conservation
– Bandwidth of tenant’s idle VMs given to its active VMs

Results
[Figure annotations:]
– Tenants’ SLAs enforced. 120 queues configured.
– Controller notices red tenant’s performance
– Inter-tenant work conservation
– Intra-tenant work conservation

Data plane overheads at 40Gbps RDMA
Negligible in previous experiment. To bring out the worst case, IO sizes were varied from 512 bytes to 64KB
Reasonable overheads for enforcing SLAs

Control plane overheads: network and CPU
Controller configures queue rules, receives statistics and updates token rates every interval
<0.3% CPU overhead at controller
[Figure axis: Overheads (MB)]

Before Next time
Final Project Presentation/Demo
– Due Friday, December 12.
– Presentation and Demo
– Written submission required: Report
– Website: index.html that points to report, presentation, and project (e.g. code)
Required review and reading for Wednesday, December 3
– Plug into the Supercloud, D. Williams, H. Jamjoom, H. Weatherspoon. IEEE Internet Computing, Vol. 17, No 2, March/April 2013, pp
– Check piazza:
Check website for updated schedule