Data Center Networks Mohammad Alizadeh Fall 2018

6.829: Computer Networks. Data Center Networks. Mohammad Alizadeh, Fall 2018. Slides adapted from presentations by Albert Greenberg and Changhoon Kim (Microsoft)

What are Data Centers?
- Large facilities with 10s of thousands of networked servers
- Compute, storage, and networking working in concert
- “Warehouse-Scale Computers”
- Huge investment: ~$0.5 billion for a large datacenter

Data Center Costs
Amortized Cost*   Component              Sub-Components
~45%              Servers                CPU, memory, disk
~25%              Power infrastructure   UPS, cooling, power distribution
~15%              Power draw             Electrical utility costs
~15%              Network                Switches, links, transit
*3 yr amortization for servers, 15 yr for infrastructure; 5% cost of money
The Cost of a Cloud: Research Problems in Data Center Networks. SIGCOMM CCR 2009. Greenberg, Hamilton, Maltz, Patel.
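
As a rough illustration of the footnote's assumptions (3-year server amortization, 15-year infrastructure, 5% cost of money), the sketch below converts a capital cost into an equivalent monthly payment using the standard annuity formula; the dollar amounts are hypothetical and not from the slide.

```python
# Back-of-the-envelope amortization sketch. The prices are hypothetical;
# only the amortization periods and cost of money come from the slide's footnote.

def monthly_cost(capital, years, annual_rate=0.05):
    """Equivalent monthly payment for `capital` amortized over `years`
    at the given annual cost of money (standard annuity formula)."""
    r = annual_rate / 12           # monthly rate
    n = years * 12                 # number of monthly payments
    return capital * r / (1 - (1 + r) ** -n)

# Hypothetical: $2,000 of server vs. $2,000 of power/cooling infrastructure.
print(f"server, 3 yr:          ${monthly_cost(2000, 3):6.2f}/month")
print(f"infrastructure, 15 yr: ${monthly_cost(2000, 15):6.2f}/month")
```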

Server Costs
30% utilization considered “good” in most data centers!
- Uneven application fit: each server has CPU, memory, disk; most applications exhaust one resource, stranding the others
- Uncertainty in demand: demand for a new service can spike quickly
- Risk management: not having spare servers to meet demand brings failure just when success is at hand

Goal: Agility – Any Service, Any Server
- Turn the servers into a single large fungible pool
- Dynamically expand and contract service footprint as needed
Benefits
- Lower cost (higher utilization)
- Increase developer productivity: services can be deployed much faster
- Achieve high performance and reliability

Datacenter Networks
- 10,000s of ports connecting compute and storage (disk, flash, …)
- Goal: provide the illusion of “One Big Switch”

Datacenter Traffic Growth
- Today: Petabits/s of traffic inside a single DC
- More than the traffic crossing the core of the Internet!
Source: “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network”, SIGCOMM 2015.

Latency is King
[Figure: a traditional application keeps app logic and data structures on a single machine (<< 1µs latency); a large-scale web application spreads an app tier and a data tier across the data center fabric (10µs-1ms latency)]
- 1 user request → 1000s of messages over the DC network
- Microseconds of latency matter, even at the tail (e.g., 99.9th percentile)
Based on a slide by John Ousterhout (Stanford)

Conventional DC Network Problems

Conventional DC Network
[Figure: Internet at top, connected through Core Routers (CR) and Access Routers (AR) in a DC-Layer 3 tier, then Ethernet Switches (S) in a DC-Layer 2 tier, down to racks of application servers (A); ~1,000 servers per pod, one pod == one IP subnet]
- L2 pros, cons? L3 pros, cons?
Reference: “Data Center: Load Balancing Data Center Services”, Cisco 2004

Reminder: Layer 2 vs. Layer 3
Ethernet switching (layer 2)
- Fixed IP addresses and auto-configuration (plug & play)
- Seamless mobility, migration, and failover
- Broadcast limits scale (ARP)
- No multipath (Spanning Tree Protocol)
IP routing (layer 3)
- Scalability through hierarchical addressing
- Multipath routing through equal-cost multipath
- Can’t migrate without changing IP address
- Complex configuration

Conventional DC Network Problems
[Figure: oversubscription grows up the tree: roughly 5:1 at the ToR switches, 40:1 at the aggregation layer, and 200:1 at the core routers]
- Extremely limited server-to-server capacity
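
To make the limited-capacity point concrete, here is a minimal sketch that treats the figure's ratios as cumulative oversubscription at each tier (an assumption) and computes the worst-case bandwidth share per server; the 1 Gbps NIC speed is hypothetical.

```python
# Sketch: worst-case per-server bandwidth when all servers send across a tier
# with the given cumulative oversubscription. Ratios follow the figure
# (~5:1 ToR, ~40:1 aggregation, ~200:1 core); the 1 Gbps NIC is hypothetical.

def per_server_share(nic_gbps, oversub):
    """Fair share per server through a tier oversubscribed `oversub`:1."""
    return nic_gbps / oversub

nic = 1.0  # Gbps
for tier, ratio in [("within rack", 1), ("ToR uplinks", 5),
                    ("aggregation", 40), ("core", 200)]:
    print(f"{tier:12s}: {per_server_share(nic, ratio) * 1000:6.0f} Mbps per server")
```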

Conventional DC Network Problems
[Figure: same topology, with the racks partitioned into IP subnet (VLAN) #1 and IP subnet (VLAN) #2]
- Extremely limited server-to-server capacity
- Resource fragmentation

And More Problems …
[Figure: same topology with the two IP subnets (VLANs)]
- Complicated manual L2/L3 re-configuration

VL2 Paper
- Measurements
- VL2 Design: Clos topology, Valiant load balancing, name/location separation (precursor to network virtualization)
http://research.microsoft.com/en-US/news/features/datacenternetworking-081909.aspx

VL2 Goals: The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation

Clos Topology
- Offer huge capacity via multiple paths (scale out, not up)
[Figure: VL2 Clos topology with Intermediate (Int) switches at the top, Aggregation (Aggr) switches below, ToR switches at the edge, and 20 servers per ToR]
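
A small sketch of the "scale out, not up" arithmetic: in a folded Clos / leaf-spine fabric, capacity grows by adding spine (intermediate) switches rather than by buying a bigger box. The port counts and link speeds below are hypothetical, not the VL2 or slide topology.

```python
# Sketch of leaf-spine (folded Clos) capacity. All parameters are hypothetical.

def leaf_spine(n_spines, n_leaves, uplink_gbps, servers_per_leaf, nic_gbps):
    uplink_bw = n_spines * uplink_gbps        # each leaf has one uplink per spine
    server_bw = servers_per_leaf * nic_gbps
    oversub = server_bw / uplink_bw           # leaf oversubscription ratio
    bisection = n_leaves * uplink_bw / 2      # rough bisection bandwidth
    return oversub, bisection

# Doubling the spines doubles per-leaf uplink bandwidth and the bisection bandwidth:
for spines in (2, 4, 8):
    o, b = leaf_spine(spines, n_leaves=16, uplink_gbps=40,
                      servers_per_leaf=20, nic_gbps=10)
    print(f"{spines} spines: oversubscription {o:4.2f}:1, bisection {b:5.0f} Gbps")
```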

https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/
https://youtu.be/mLEawo6OzFM?t=56

Building Block: Merchant Silicon Switching Chips
[Images: a switch ASIC, Facebook's “6-pack” fabric switch, and the Facebook Wedge top-of-rack switch. Images courtesy of Facebook]

ECMP Load Balancing
- Pick among equal-cost paths by a hash of the 5-tuple (in the figure, H(f) % 3 = 0)
- Randomized load balancing
- Preserves packet order
- Problems? Hash collisions
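
A minimal sketch of the hashing step, using CRC32 as a stand-in for the switch's hash function: every packet of a flow maps to the same uplink (so order is preserved), but two flows can land on the same uplink and collide.

```python
# ECMP next-hop selection sketch. zlib.crc32 stands in for the switch ASIC's
# hash; real switches hash the 5-tuple in hardware.
import zlib

def ecmp_pick(src_ip, dst_ip, proto, sport, dport, n_links):
    """Hash the 5-tuple and pick one of `n_links` equal-cost uplinks."""
    five_tuple = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(five_tuple) % n_links

# Same flow -> same uplink every time; a different flow may or may not collide.
print(ecmp_pick("10.0.0.4", "10.0.0.6", "tcp", 5001, 80, n_links=3))
print(ecmp_pick("10.0.0.4", "10.0.0.6", "tcp", 5002, 80, n_links=3))
```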

Do Hash Collisions Matter?
VL2 Paper: probabilistic flow distribution would work well enough, because …
- Flows are numerous and not huge – no elephants
- Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow

Impact of Fabric vs. NIC Speed Differential
- Three non-oversubscribed topologies: 5 racks, 100 x 10G servers
- Each rack: 20×10Gbps downlinks to servers; uplinks of 20×10Gbps, 5×40Gbps, or 2×100Gbps
- Link speed is an important decision for network scale & cost. Here, we ask: what is the impact on performance?

Impact of Fabric vs. NIC Speed Differential
- “Web search” workload
- 40/100Gbps fabric: ~same flow completion time (FCT) as an ideal output-queued (OQ) switch
- 10Gbps fabric: FCT up to 40% worse than OQ

Intuition
Higher speed links improve probabilistic routing
- 11×10Gbps flows (55% load)
- 20×10Gbps uplinks: probability of 100% throughput = 3.27%
- 2×100Gbps uplinks: probability of 100% throughput = 99.95%
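
These numbers can be reproduced approximately with a simple model: each of the eleven 10 Gbps flows hashes uniformly and independently onto one uplink, and the fabric delivers 100% throughput only if no uplink is loaded beyond its capacity. The sketch below computes that probability exactly under this model; it gives 3.27% for the 10G case and roughly 99.9% for the 100G case, close to the slide's 99.95%.

```python
# Probability of 100% throughput when n_flows flows of flow_gbps each are hashed
# uniformly at random onto n_links uplinks of link_gbps each (simplified model).
from math import comb

def p_full_throughput(n_links, link_gbps, n_flows, flow_gbps=10):
    cap = int(link_gbps // flow_gbps)        # flows a single link can absorb
    # Count assignments in which every link receives at most `cap` flows,
    # processing links one at a time and choosing which flows land on each.
    ways = [0] * (n_flows + 1)
    ways[0] = 1
    for _ in range(n_links):
        nxt = [0] * (n_flows + 1)
        for placed, w in enumerate(ways):
            if w:
                for k in range(min(cap, n_flows - placed) + 1):
                    nxt[placed + k] += w * comb(n_flows - placed, k)
        ways = nxt
    return ways[n_flows] / n_links ** n_flows

print(f"20 x 10G uplinks:  {p_full_throughput(20, 10, 11):.2%}")   # ~3.27%
print(f" 2 x 100G uplinks: {p_full_throughput(2, 100, 11):.2%}")   # ~99.9%
```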

DC Load Balancing Research
- Centralized [Hedera, Planck, Fastpass, …]
- Distributed
  - Host-Based
    - Cong. Oblivious [Presto, Juggler]
    - Cong. Aware [MPTCP, MP-RDMA, Clove, FlowBender, …]
  - In-Network
    - Cong. Oblivious [ECMP, WCMP, packet-spray, …]
    - Cong. Aware [Flare, TeXCP, CONGA, DeTail, HULA, LetFlow, …]

VL2 Performance Isolation Why does VL2 need TCP for performance isolation?

Addressing and Routing: Name-Location Separation
[Figure: the Directory Service maps server names to locations, e.g., x → ToR2, y → ToR3, z → ToR3; after z moves, its entry is updated to z → ToR4. A sender performs a Lookup & Response to learn the destination's ToR, and packets carry the payload to that ToR]
- Switches run link-state routing and maintain only switch-level topology
- Allows use of low-cost switches
- Protects network from host-state churn
- Servers use flat names

VL2 Agent in Action
[Figure: the VL2 agent on the source encapsulates the packet with two headers: an outer header destined to the Intermediate switches' anycast LA (10.1.1.1) carrying a flow hash H(ft) as the source, and an inner header destined to the destination ToR's LA (10.0.0.6); the original packet carries src AA 20.0.0.55 and dst AA 20.0.0.56. VLB picks the Intermediate switch; ECMP picks among equal-cost paths]
- Why use a hash for the src IP?
- Why anycast & double encapsulation?
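
Here is a minimal, illustrative sketch of the encapsulation step (not VL2's actual implementation): the agent looks up the destination AA in its cached directory, wraps the packet with an inner header for the destination ToR's LA and an outer header for the Intermediate switches' anycast LA, and carries a flow hash where ECMP will see it, so different flows spread across Intermediate switches while each flow stays on one path. Addresses, field names, and the cache structure are hypothetical.

```python
# Illustrative VL2-agent encapsulation sketch; addresses, field names, and the
# DIRECTORY_CACHE dictionary are hypothetical stand-ins for the real mechanisms.
import zlib

DIRECTORY_CACHE = {"20.0.0.56": "10.0.0.6"}   # AA -> destination ToR LA
INT_ANYCAST_LA = "10.1.1.1"                   # anycast LA shared by Intermediate switches

def vl2_encapsulate(src_aa, dst_aa, payload):
    dst_tor_la = DIRECTORY_CACHE[dst_aa]                     # directory lookup (cached)
    flow_hash = zlib.crc32(f"{src_aa}->{dst_aa}".encode())   # per-flow entropy for ECMP
    return {
        "outer": {"src": flow_hash, "dst": INT_ANYCAST_LA},  # bounce off an Int switch
        "inner": {"dst": dst_tor_la},                        # then to the destination ToR
        "orig":  {"src": src_aa, "dst": dst_aa, "data": payload},
    }

print(vl2_encapsulate("20.0.0.55", "20.0.0.56", b"payload"))
```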

VL2 Topology
[Figure: Clos topology with Intermediate (Int), Aggregation (Aggr), and ToR switches, with 20 servers per ToR]

VL2 Directory System
- Read-optimized Directory Servers for lookups: low latency, high throughput, high availability for a high lookup rate
- Write-optimized Replicated State Machines (RSMs) for updates: strongly consistent, reliable store of AA-to-LA mappings
- Stale mappings? Reactive cache updates: a stale host mapping needs to be corrected only when that mapping is used to deliver traffic. Non-deliverable packets are forwarded to a directory server, which corrects the stale mapping in the source's cache via unicast
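
A tiny sketch of the reactive cache-correction flow described above, using plain dictionaries to stand in for the RSM and a host's cached mappings; the names and return values are illustrative, not VL2's interfaces.

```python
# Reactive correction of a stale AA -> location mapping (illustrative only).
RSM = {"20.0.0.56": "ToR4"}            # strongly consistent, authoritative store
sender_cache = {"20.0.0.56": "ToR3"}   # sender's cached mapping has gone stale

def send(dst_aa):
    location = sender_cache[dst_aa]    # sender uses its (possibly stale) cache
    if RSM[dst_aa] != location:
        # Packet is non-deliverable at the old ToR; it is forwarded to a directory
        # server, which corrects the sender's cache entry via unicast.
        sender_cache[dst_aa] = RSM[dst_aa]
        return "stale mapping corrected"
    return "delivered"

print(send("20.0.0.56"))   # stale mapping corrected
print(send("20.0.0.56"))   # delivered
```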

Other details
How does L2 broadcast work?
- No ARP: the directory system handles address resolution
- DHCP: intercepted at the ToR using conventional DHCP relay agents and unicast-forwarded to DHCP servers
- IP multicast is provided per service, rate-limited
How does Internet communication work?
- The network is an L3 routing fabric, so external traffic does not need its headers rewritten
- Servers that must be externally reachable have both an LA and an AA; the LA is externally reachable and announced via BGP
- Traffic from the Internet flows through the Intermediate switches to the server's LA