Lecture 8, Computer Networks (198:552)

Data Centers Part II
Lecture 8, Computer Networks (198:552), Fall 2019
Slide material heavily adapted courtesy of Albert Greenberg, Changhoon Kim, Mohammad Alizadeh.

What are data centers?
- Large facilities with tens of thousands of networked servers
- Compute, storage, and networking working in concert
- “Warehouse-Scale Computers”

Cloud Computing: The View from Apps
- On-demand: use resources when you need them; pay-as-you-go
- Elastic: scale up and down based on demand
- Multi-tenancy: multiple independent users share infrastructure
  - Security and resource isolation
  - SLAs on performance & reliability
- Dynamic management
  - Resiliency: isolate failure of servers and storage
  - Workload movement: move work to other locations

What’s different about DCNs?
- Single administrative domain
  - Change all endpoints and switches if you want
  - No need to be compatible with the outside world
- Unique network properties
  - Tiny round trip times (microseconds)
  - Massive multipath topologies
  - Shallow-buffered switches
  - Latency and tail latency are critical
  - Network is a backplane for large-scale parallel computation
- Together, serious implications for the transport, network, and link layer designs you can (and should) use

Challenges in DCNs

Data center costs

Amortized Cost*   Component              Sub-Components
~45%              Servers                CPU, memory, disk
~25%              Power infrastructure   UPS, cooling, power distribution
~15%              Power draw             Electrical utility costs
                  Network                Switches, links, transit

*3 yr amortization for servers, 15 yr for infrastructure; 5% cost of money
Source: The Cost of a Cloud: Research Problems in Data Center Networks. Greenberg, Hamilton, Maltz, Patel. Sigcomm CCR 2009. http://web.cse.ohio-state.edu/icdcs2009/Keynote_files/greenberg-keynote.pdf
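The footnote gives amortization periods and a 5% cost of money, but not the formula behind the percentages. As a rough sketch, assuming a standard level-payment (annuity) calculation with monthly compounding, the effect of the two periods can be illustrated as follows. The function name and the capital amounts are hypothetical and are not taken from the source.

```python
def monthly_amortized_cost(capital, years, annual_rate=0.05):
    """Level monthly payment that repays `capital` over `years`.

    Assumes a standard annuity with monthly compounding; the slide only
    states the periods (3 yr, 15 yr) and the 5% cost of money.
    """
    months = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return capital / months
    return capital * monthly_rate / (1 - (1 + monthly_rate) ** -months)


# Hypothetical capital outlays, chosen equal to isolate the effect of the period.
servers = monthly_amortized_cost(1_000_000, years=3)
infrastructure = monthly_amortized_cost(1_000_000, years=15)
print(f"servers: {servers:,.0f}/month, infrastructure: {infrastructure:,.0f}/month")
```

For the same capital outlay, the shorter server amortization period yields a much larger monthly charge, which is one reason servers can dominate the amortized cost.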

Server costs
- 30% server utilization considered “good” in data centers
- Application demands uneven across the resources
  - Each server has CPU, memory, disk: most applications exhaust one resource, stranding the others
- Long provisioning timescales
  - New servers purchased quarterly at best
- Uncertainty in demand
  - Demand for a new service can spike quickly
- Risk management
  - Not having spare servers to meet demand brings failure just when success is at hand
- Session state and storage constraints
  - If the world were stateless servers, life would be good
Source: Albert Greenberg keynote, ICDCS 2009

Goal: Agility -- any service, any server
- Turn the servers into a single large fungible pool
  - Dynamically expand and contract service footprint as needed
- Place workloads where server resources are available
- Easier to maintain availability
  - If one rack goes down, machines from another are still available
- Want to view the DCN as a pool of compute connected by one big high-speed fabric

Steps to achieving Agility
- Workload (compute) management
  - Means for rapidly installing a service’s code on a server
  - Virtual machines, disk images, containers
- Storage management
  - Means for a server to access persistent data
  - Distributed filesystems (e.g., HDFS, blob stores)
- Network and routing management
  - Means for communicating with other servers, regardless of where they are in the data center
  - ??

Agility means that the DCN needs…
- Massive bisection bandwidth
  - Topologies
  - Routing (multiple paths -> load balancing)
- Ultra-low latency (<10 microseconds)
  - The right transport?
  - Switch scheduling/buffer management?
  - Schedule packets or control transmission rates?
  - Centralized or distributed control?
- Effective resource management (across servers & switches)
  - Multi-tenant performance isolation
  - App-aware network scheduling (e.g., for big data)
- Support for next-generation hardware & apps
  - ML, RDMA, rack-scale computing, memory disaggregation

Conventional DC network
[Figure: tree topology, top to bottom: Internet, core routers (CR), access routers (AR), Ethernet switches (S), racks of servers (A). CR and AR form DC-Layer 3; the switches form DC-Layer 2.]
Key: CR = Core Router (L3); AR = Access Router (L3); S = Ethernet Switch (L2); A = Rack of app. servers
~1,000 servers/pod == IP subnet
Source: “Data Center: Load balancing Data Center Services”, Cisco 2004

Layer 2 vs. Layer 3
- Ethernet switching (layer 2)
  - Fixed IP addresses and auto-configuration (plug & play)
  - Seamless mobility, migration, and failover
  - Broadcast limits scale (ARP)
  - Spanning Tree Protocol: no multipath routing
- IP routing (layer 3)
  - Scalability through hierarchical addressing
  - Multipath routing through equal-cost multipath (ECMP)
  - More complex configuration
  - Can’t migrate without changing IP address
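Equal-cost multipath is commonly realized by hashing a flow's header fields and using the result to pick one of the equal-cost next hops, so that packets of one flow stay on one path. The sketch below illustrates that idea only. The function name, the next-hop labels, and the use of SHA-256 are assumptions for illustration; real switches use their own hardware hash functions.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops for a flow.

    `flow` is a 5-tuple (src IP, dst IP, protocol, src port, dst port).
    SHA-256 is an illustrative stand-in for a switch's hardware hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical flows and next-hop names.
hops = ["agg-1", "agg-2", "agg-3", "agg-4"]
flow_a = ("10.0.1.5", "10.0.2.9", 6, 49152, 80)
flow_b = ("10.0.1.5", "10.0.2.9", 6, 49153, 80)
print(ecmp_next_hop(flow_a, hops), ecmp_next_hop(flow_b, hops))
```

Because the choice depends only on the flow's header fields, every packet of a flow follows the same path (avoiding reordering), while different flows spread across the available paths.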

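Conventional DC Network Problems
[Figure: the conventional CR / AR / S / A tree annotated with oversubscription ratios that grow toward the core: ~5:1 at the lowest switch layer, ~40:1 at the switch layer above it, ~200:1 at the core routers.]
- Dependence on high-cost proprietary routers
- Extremely limited server-to-server capacity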
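To make the ratios concrete, the following sketch computes the worst-case bandwidth each server gets when every server sends through a given layer at once. The 1 Gbps NIC speed is an assumption for illustration and is not stated on the slide; the layer labels are inferred from where the ratios sit in the figure.

```python
def per_server_bandwidth_mbps(nic_mbps, oversubscription):
    """Worst-case per-server share when all servers send across this layer."""
    return nic_mbps / oversubscription


NIC_MBPS = 1000  # assumed 1 Gbps server NIC (not from the slide)

# Ratios taken from the slide; layer names inferred from the figure layout.
ratios = {"lowest switch layer": 5, "upper switch layer": 40, "core routers": 200}
for layer, ratio in ratios.items():
    print(f"{layer}: {per_server_bandwidth_mbps(NIC_MBPS, ratio)} Mbps")
# lowest switch layer: 200.0 Mbps
# upper switch layer: 25.0 Mbps
# core routers: 5.0 Mbps
```

Under these assumptions, traffic that must cross the core is limited to about 5 Mbps per server in the worst case, which is what "extremely limited server-to-server capacity" means in practice.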

Conventional DC Network Problems
[Figure: the conventional CR / AR / S / A tree (~200:1 at the core), with racks divided between IP subnet (VLAN) #1 and IP subnet (VLAN) #2.]
- Green host from the green IP subnet / VLAN #2 cannot respond to the request from the machine in IP subnet / VLAN #1!
- This means that special rules must be installed on the switches to route VLAN 2 data within the sub-tree under the second pair of ARs in this network.
- Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)
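The fragmentation problem can be sketched with Python's standard `ipaddress` module: a service tied to one subnet can only use free servers inside that subnet, even when other subnets have spare machines. The subnet prefixes and server addresses below are hypothetical stand-ins for VLAN #1 and VLAN #2.

```python
import ipaddress

# Hypothetical prefixes standing in for the two VLANs on the slide.
VLAN1 = ipaddress.ip_network("10.0.1.0/24")
VLAN2 = ipaddress.ip_network("10.0.2.0/24")


def usable_free_servers(service_subnet, free_servers):
    """Free servers a service can use without leaving its IP subnet."""
    return [s for s in free_servers if ipaddress.ip_address(s) in service_subnet]


# Two of the three idle servers sit in VLAN #2.
free = ["10.0.2.17", "10.0.2.18", "10.0.1.40"]
print(usable_free_servers(VLAN1, free))  # ['10.0.1.40']
```

A service in VLAN #1 sees one usable server out of three idle ones; the other two are stranded unless the network is reconfigured.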

Conventional DC Network Problems
[Figure: the same CR / AR / S / A tree (~200:1 at the core), with racks divided between IP subnet (VLAN) #1 and IP subnet (VLAN) #2.]
- Complicated manual L2/L3 re-configuration
- Resource fragmentation, significantly lowering cloud utilization (and cost-efficiency)

Google’s DCN Challenges & Approaches
Source: Jupiter rising: a decade of Clos topologies and centralized control in Google’s data center network. SIGCOMM’15

Building a high-speed switching fabric

A single (n X m)-port switching fabric
- Different designs of switching fabric are possible
- Assume n ingress ports and m egress ports, half-duplex links

A single (n X m)-port switching fabric
- Electrical/mechanical/electronic crossover
- We are OK with any design such that:
  - Any port can connect to any other directly if all other ports are free
  - Nonblocking: if input port x and output port y are both free, they should be able to connect, regardless of which other ports are connected
  - If this is not satisfied, the switch is blocking
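The nonblocking condition can be illustrated with a toy crossbar model in which a connection request succeeds exactly when the requested input and output are both free. The class and method names are hypothetical; this is a sketch of the definition, not a model of any particular hardware.

```python
class Crossbar:
    """Toy n x m crossbar with a dedicated crosspoint per (input, output) pair."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.in_to_out = {}  # input port -> connected output port
        self.out_to_in = {}  # output port -> connected input port

    def connect(self, x, y):
        """Connect input x to output y; succeeds iff both are currently free."""
        if not (0 <= x < self.n and 0 <= y < self.m):
            raise ValueError("port out of range")
        if x in self.in_to_out or y in self.out_to_in:
            return False  # a requested port is busy; this is not blocking
        self.in_to_out[x] = y
        self.out_to_in[y] = x
        return True

    def disconnect(self, x):
        """Release the connection on input x, if any."""
        y = self.in_to_out.pop(x, None)
        if y is not None:
            del self.out_to_in[y]


xbar = Crossbar(4, 4)
print(xbar.connect(0, 2))  # True: both ports free
print(xbar.connect(1, 3))  # True: unaffected by the existing 0 -> 2 connection
print(xbar.connect(3, 2))  # False: output 2 is already in use
```

The second request succeeds regardless of the first connection, which is the nonblocking property. The third fails only because output 2 itself is busy.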

High port density + nonblocking == hard!
- Low-cost nonblocking crossbars are feasible for a small number of ports
- However, it is costly to be nonblocking with a large number of ports
- If each crossover is as fast as each input port:
  - Number of crossover points == n * m
  - Cost grows quadratically in the number of ports
- Else, the crossover must transition faster than the port
  - … so that you can keep the number of crossovers small
- Q: How is this relevant to the data center network fabric?
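The quadratic growth is easy to see numerically. The short sketch below counts crosspoints for a square crossbar (n == m), using the n * m figure from the slide. The function name is hypothetical.

```python
def crosspoints(n_inputs, m_outputs):
    """Crosspoints in a crossbar where each one runs at port speed."""
    return n_inputs * m_outputs


# Doubling the port count quadruples the number of crosspoints.
for ports in (8, 16, 32, 64):
    print(ports, crosspoints(ports, ports))
# 8 64
# 16 256
# 32 1024
# 64 4096
```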