
15-744: Computer Networking L-12 Data Center Networking I

Overview
- Data Center Overview
- Routing in the DC
- Transport in the DC

Datacenter Arms Race
Amazon, Google, Microsoft, Yahoo!, and others are racing to build next-generation mega-datacenters: industrial-scale information technology.
- 100,000+ servers
- Located where land, water, fiber-optic connectivity, and cheap power are available

Google Oregon Datacenter [photo]

Computers + Net + Storage + Power + Cooling [photo]

Overview
- Data Center Overview
- Routing in the DC
- Transport in the DC

Layer 2 vs. Layer 3 for Data Centers

Flat vs. Location-Based Addresses
- Commodity switches today have ~640 KB of low-latency, power-hungry, expensive on-chip memory, which stores 32K-64K flow entries
- Assume 10 million virtual endpoints on 500,000 servers in a datacenter
- Flat addresses → 10 million address mappings → ~100 MB of on-chip memory → ~150 times the memory that can be put on chip today
- Location-based addresses → 100-1,000 address mappings → ~10 KB of memory → easily accommodated in today's switches
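As a quick sanity check, here is the arithmetic behind the slide's estimates, assuming roughly 10 bytes per address mapping (the per-entry size is our assumption, not stated on the slide):

```python
# Back-of-the-envelope check of the slide's switch-memory numbers. The
# ~10-byte cost per mapping entry is our assumption, not stated on the slide.
BYTES_PER_MAPPING = 10          # assumed: address key + output port + overhead
ON_CHIP_BYTES = 640 * 1024      # ~640 KB of on-chip memory per switch

def required_memory(n_mappings: int) -> int:
    """Bytes of forwarding state needed for n address mappings."""
    return n_mappings * BYTES_PER_MAPPING

flat = required_memory(10_000_000)   # flat: one entry per virtual endpoint
loc = required_memory(1_000)         # location-based: one entry per location

print(f"flat:     {flat / 1e6:.0f} MB  ({flat / ON_CHIP_BYTES:.0f}x on-chip memory)")
print(f"location: {loc / 1e3:.0f} KB  (fits in {ON_CHIP_BYTES // 1024} KB easily)")
```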

PortLand: Main Assumption
Hierarchical structure of data center networks: they are multi-level, multi-rooted trees.

Background: Fat-Tree
- Interconnect racks (of servers) using a fat-tree topology
- Fat-tree: a special type of Clos network (after C. Clos)
- K-ary fat tree: three-layer topology (edge, aggregation, and core)
  - Each pod consists of (k/2)^2 servers and 2 layers of k/2 k-port switches
  - Each edge switch connects to k/2 servers and k/2 aggregation switches
  - Each aggregation switch connects to k/2 edge and k/2 core switches
  - (k/2)^2 core switches, each connecting to k pods
[Figure: fat-tree with k = 4]
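The pod, switch, and server counts follow directly from these definitions; a small sketch (function name ours) computes them for any even k:

```python
def fat_tree_params(k: int) -> dict:
    """Counts for a k-ary fat tree, following the slide's definitions."""
    assert k % 2 == 0, "k must be even"
    half = k // 2
    return {
        "pods": k,
        "servers_per_pod": half * half,          # (k/2)^2 servers per pod
        "edge_switches": k * half,               # k pods x k/2 edge switches
        "aggregation_switches": k * half,        # k pods x k/2 aggregation switches
        "core_switches": half * half,            # (k/2)^2 core switches
        "servers": k * half * half,              # = k^3 / 4 total servers
    }

for k in (4, 48):
    p = fat_tree_params(k)
    switches = p["edge_switches"] + p["aggregation_switches"] + p["core_switches"]
    print(f"k={k}: {p['servers']} servers, {switches} switches")
```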

Why Fat-Tree?
- A fat tree has identical bandwidth at any bisection; each layer has the same aggregate bandwidth
- Can be built using cheap devices with uniform capacity
  - Each port supports the same speed as an end host
  - All devices can transmit at line speed if packets are distributed uniformly along available paths
- Great scalability: a k-port switch supports k^3/4 servers
[Figure: fat-tree network with k = 3 supporting 54 hosts]

Data Center Network [figure]

Hierarchical Addresses [figure-only animation assigning hierarchical addresses step by step]
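The addresses in these slides are PortLand's Pseudo MACs (PMACs), which encode a host's location as pod.position.port.vmid, packed into 48 bits as 16.8.8.16 in the paper. A minimal sketch of that encoding (helper names are ours):

```python
# Minimal sketch of a PortLand-style PMAC: pod.position.port.vmid packed into
# 48 bits as 16.8.8.16 (field widths as described in the PortLand paper).

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> int:
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def decode_pmac(pmac: int) -> tuple:
    return pmac >> 32, (pmac >> 24) & 0xFF, (pmac >> 16) & 0xFF, pmac & 0xFFFF

pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
print(f"{pmac:012x}", decode_pmac(pmac))   # switches forward on prefixes of this
```

Because the address is hierarchical, a core switch needs only a pod-prefix entry per pod rather than per-host state, which is what makes the ~10 KB figure above possible.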

PortLand: Location Discovery Protocol
- Location Discovery Messages (LDMs) exchanged between neighboring switches
- Switches self-discover their location on boot-up (a sketch of the level-inference step follows below)

Location Discovery Protocol [figure-only animation stepping through the protocol]
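The core of the animation is how a switch infers its own level with no configuration: hosts do not speak LDP, so an edge switch hears LDMs on only some ports; aggregation switches hear from edge switches; core switches hear only from aggregation switches. A simplified sketch of just that inference step (the full protocol also assigns positions and pod numbers, omitted here):

```python
# Simplified sketch of one step of PortLand's location discovery: inferring a
# switch's level from which neighbors send LDMs. (Position and pod-number
# assignment, and the fabric manager's role, are omitted.)

def infer_level(num_ports: int, neighbor_levels: dict) -> str:
    """neighbor_levels: port -> level claimed in that neighbor's LDM."""
    if len(neighbor_levels) < num_ports:
        return "edge"           # silent ports must face hosts, which never send LDMs
    if "edge" in neighbor_levels.values():
        return "aggregation"    # hears at least one edge switch
    return "core"               # hears only aggregation switches

print(infer_level(4, {0: "aggregation", 1: "aggregation"}))   # -> edge
```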

Name Resolution [figure-only animation]

Fabric Manager [figure-only slide]

Name Resolution [figure-only animation, continued]
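These slides step through PortLand's broadcast-free ARP: the edge switch intercepts the ARP request and asks the fabric manager for the IP-to-PMAC mapping, falling back to broadcast only on a miss. An illustrative sketch (all names are ours):

```python
# Illustrative flow of PortLand's broadcast-free ARP: the edge switch
# intercepts the ARP request, queries the fabric manager for the IP -> PMAC
# mapping, and unicasts the reply back to the host.

class FabricManager:
    def __init__(self):
        self.ip_to_pmac = {}                  # populated as hosts are learned

    def resolve(self, ip: str):
        return self.ip_to_pmac.get(ip)        # None on a miss

def handle_arp_request(fm: FabricManager, ip: str, unicast_reply, broadcast):
    pmac = fm.resolve(ip)
    if pmac is not None:
        unicast_reply(pmac)                   # no network-wide broadcast needed
    else:
        broadcast()                           # rare fallback for unknown hosts

fm = FabricManager()
fm.ip_to_pmac["10.2.1.1"] = 0x000201000001
handle_arp_request(fm, "10.2.1.1", unicast_reply=print, broadcast=lambda: None)
```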

Other Schemes
SEATTLE [SIGCOMM '08]: a layer 2 network fabric that works at enterprise scale
- Eliminates ARP broadcast; proposes a one-hop DHT (sketched below)
- Eliminates flooding; uses broadcast-based link-state routing
- Scalability limited by the broadcast-based routing protocol and large switch state
VL2 [SIGCOMM '09]: a network architecture that scales to support huge data centers
- Layer 3 routing fabric used to implement a virtual layer 2
- Scales layer 2 via end-host modifications
- Unmodified switch hardware and software
- End hosts modified to perform enhanced resolution to assist routing and forwarding
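A sketch of SEATTLE's one-hop DHT idea: link-state routing gives every switch the full switch list, so a host's MAC can be hashed to a "resolver" switch that stores its location, replacing ARP flooding with a single lookup. SEATTLE actually uses consistent hashing so that switch churn moves few entries; the modulo below is a simplification:

```python
# Sketch of SEATTLE's one-hop DHT: every switch computes the same resolver
# switch for a given host MAC, so resolution is one lookup, not a flood.
import hashlib

def resolver_for(mac: str, switches: list) -> str:
    """Deterministically map a host MAC to the switch storing its location."""
    h = int(hashlib.sha256(mac.encode()).hexdigest(), 16)
    return switches[h % len(switches)]

switches = ["s1", "s2", "s3", "s4"]
print(resolver_for("aa:bb:cc:dd:ee:01", switches))   # ask this switch, not everyone
```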

VL2: Name-Location Separation
- Servers use flat names; switches run link-state routing and maintain only switch-level topology
- A directory service maintains the name-to-location mappings (e.g., x → ToR 2, y → ToR 3, z → ToR 4) and answers lookups from servers
- Copes with host churn with very little overhead: when z moves between ToRs, only its directory entry changes
- Allows the use of low-cost switches, protects the network and hosts from host-state churn, and obviates host and switch reconfiguration
[Figure: lookup and response against the directory service while a server sends to y and z]
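A minimal sketch of the name-location split (function names are ours; the mapping values are from the slide's figure): applications address servers by flat name, and a shim looks up the destination's current ToR in the directory service and tunnels the packet to it, so host moves touch only the directory:

```python
# Minimal sketch of VL2's name-location separation: flat names resolve to the
# destination's current ToR (its locator), and packets are tunneled to it.

directory = {"x": "ToR2", "y": "ToR3", "z": "ToR4"}   # name -> locator mappings

def send(dst_name: str, payload: bytes) -> dict:
    locator = directory[dst_name]              # cached lookup in practice
    return {"outer_dst": locator, "inner_dst": dst_name, "payload": payload}

directory["z"] = "ToR3"                        # host z moves: one directory update,
print(send("z", b"hello"))                     # no switch reconfiguration at all
```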

VL2: Random Indirection [ECMP + IP anycast]
- Each flow is bounced off a randomly chosen intermediate switch on its way to the destination ToR (valiant load balancing), using ECMP and an IP anycast address shared by the intermediates
- Copes with arbitrary traffic matrices with very little overhead
- Harnesses huge bisection bandwidth; obviates esoteric traffic engineering or optimization
- Ensures robustness to failures
- Works with switch mechanisms available today
[Figure: links used for up-paths and down-paths through intermediate switches T1-T6; anycast address I]
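A sketch of the indirection step: encapsulate each flow first to a random intermediate switch, then to the destination ToR. In VL2 the random choice is delegated to the network via ECMP on the anycast address; choosing explicitly here just makes the mechanism visible:

```python
# Sketch of VL2-style valiant load balancing: bounce each flow off a random
# intermediate so that any traffic matrix is spread evenly over the core.
import random

INTERMEDIATES = ["I1", "I2", "I3"]     # core switches sharing the anycast address

def vlb_route(dst_tor: str, payload: bytes) -> dict:
    bounce = random.choice(INTERMEDIATES)   # done by ECMP hashing in real VL2
    return {"first_hop": bounce, "then": dst_tor, "payload": payload}

print(vlb_route("ToR4", b"data"))
```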

Next Lecture
- Data center topology
- Data center scheduling
- Required reading:
  - Efficient Coflow Scheduling with Varys
  - c-Through: Part-time Optics in Data Centers

Energy Proportional Computing
Figure 1. Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum utilization.
It is surprisingly hard to achieve high levels of utilization on typical servers (and your home PC or laptop is even worse).
"The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle, IEEE Computer, December 2007

Energy Proportional Computing
Figure 2. Server power usage and energy efficiency at varying utilization levels, from idle to peak performance. Even an energy-efficient server still consumes about half its full power when doing virtually no work.
Doing nothing well … NOT!
"The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle, IEEE Computer, December 2007

Energy Proportional Computing
Figure 4. Power usage and energy efficiency in a more energy-proportional server. This server has a power efficiency of more than 80 percent of its peak value for utilizations of 30 percent and above, with efficiency remaining above 50 percent for utilization levels as low as 10 percent.
Design for a wide dynamic power range and active low-power modes: doing nothing VERY well.
"The Case for Energy-Proportional Computing," Luiz André Barroso and Urs Hölzle, IEEE Computer, December 2007
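The three figures boil down to one relationship: efficiency is work done per watt, and a high idle-power floor ruins it exactly in the 10-50 percent utilization band where servers live. A toy linear power model (our simplification, not from the paper) makes this concrete:

```python
# Toy linear power model: P(u) = P_idle + (P_peak - P_idle) * u. Efficiency is
# work per watt; a ~50% idle floor matches Figure 2, a ~10% floor Figure 4.

def relative_efficiency(u: float, p_idle: float, p_peak: float = 1.0) -> float:
    """Efficiency at utilization u, normalized so that efficiency(1.0) == 1.0."""
    return (u / (p_idle + (p_peak - p_idle) * u)) * p_peak

for u in (0.1, 0.3, 0.5, 1.0):
    print(f"u={u:>4.0%}:  conventional {relative_efficiency(u, p_idle=0.5):.2f}"
          f"   proportional {relative_efficiency(u, p_idle=0.1):.2f}")
```

At 10 percent utilization the conventional server runs at roughly 18 percent of peak efficiency, while the near-proportional one stays above 50 percent, which is the slide's point.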

Thermal Image of Typical Cluster
[Thermal image; the labeled hot spot is the rack switch]
M. K. Patterson, A. Pratt, P. Kumar, "From UPS to Silicon: An End-to-End Evaluation of Datacenter Efficiency," Intel Corporation

DC Networking and Power
- Within DC racks, network equipment is often the "hottest" component in the hot spot
- Network opportunities for power reduction:
  - Transition to higher-speed interconnects (10 Gb/s) at DC scales and densities
  - High-function/high-power assists embedded in network elements (e.g., TCAMs)

DC Networking and Power
- A 96 x 1 Gbit-port Cisco datacenter switch consumes around 15 kW, approximately 100x a typical 150 W server
- High port density drives network element design, but such high power density makes it difficult to tightly pack switches with servers
- Alternative distributed processing/communication topologies are under investigation by various research groups

Summary
- Energy consumption in IT equipment
  - Energy-proportional computing
  - Inherent inefficiencies in electrical energy distribution
- Energy consumption in Internet datacenters
  - Backend to billions of network-capable devices
  - Enormous processing, storage, and bandwidth supporting applications for huge user communities
- Resource management: processor, memory, I/O, and network, to maximize performance subject to power constraints: "Do Nothing Well"
- New packaging opportunities for better optimization of computing + communicating + power + mechanical


Containerized Datacenters
Sun Modular Data Center:
- Power/cooling for 200 kW of racked hardware
- External taps for electricity, network, and water
- 7.5 racks: ~250 servers, 7 TB DRAM, 1.5 PB disk

Containerized Datacenters [photos]

Aside: Disk Power
IBM Microdrive (1-inch):
- writing: 300 mA (3.3 V), 1 W
- standby: 65 mA (3.3 V), 0.2 W
IBM TravelStar (2.5-inch):
- read/write: 2 W
- spinning: 1.8 W
- low-power idle: 0.65 W
- standby: 0.25 W
- sleep: 0.1 W
- startup: 4.7 W
- seek: 2.3 W

Spin-down Disk Model
[State diagram] Not Spinning (0.2 W) → Spinning up (4.7 W) → Spinning & Ready (1.8 W) ↔ Spinning & Access (2 W) / Spinning & Seek (2.3 W), with Spinning down returning to Not Spinning
- Spin-up trigger: a request arrives, or a predictive policy anticipates one
- Spin-down trigger: an inactivity timeout threshold expires, or a predictive policy fires
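A minimal simulation of this state machine under a fixed inactivity timeout; the per-state power values follow the slide, while the spin-up time and the simplified accounting are our assumptions:

```python
# Minimal simulation of the spin-down model with a fixed inactivity timeout.
# Per-state powers follow the slide; the 2 s spin-up time and treating
# requests as instants are our assumptions.
POWER = {"ready": 1.8, "not_spinning": 0.2, "spin_up": 4.7}   # watts
SPIN_UP_TIME = 2.0                                            # seconds, assumed

def energy(request_times: list, timeout: float) -> float:
    """Joules spent between requests under a fixed spin-down timeout."""
    total, last = 0.0, 0.0
    for r in sorted(request_times):
        idle = r - last
        if idle > timeout:   # ready until timeout, asleep after, spin up at the end
            total += timeout * POWER["ready"]
            total += (idle - timeout) * POWER["not_spinning"]
            total += SPIN_UP_TIME * POWER["spin_up"]
        else:                # gap too short: stayed spinning the whole time
            total += idle * POWER["ready"]
        last = r
    return total

print(energy([5, 10, 100], timeout=11.6))   # one long gap pays sleep + spin-up
```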

Disk Spindown
- Disk power management, oracle (offline): knowing the future, spin down immediately whenever IdleTime > BreakEvenTime
- Disk power management, practical scheme (online): wait idle for BreakEvenTime, then spin down
[Figure: timeline between access1 and access2 comparing the two schemes. Source: the authors' presentation slides]

Spin-Down Policies
- Fixed thresholds: choose T_out to balance the spin-down cost, i.e., such that 2 * E_transition = P_spin * T_out
- Adaptive thresholds: T_out = f(recent accesses); exploit burstiness in T_idle
- Minimizing bumps (user annoyance/latency): predictive spin-ups
- Changing access patterns (creating burstiness): caching, prefetching
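Plugging numbers into the fixed-threshold rule (the transition energy here is an illustrative assumption; only the 1.8 W spinning power comes from the earlier disk slide):

```python
# Break-even threshold from the rule 2 * E_transition = P_spin * T_out: beyond
# T_out seconds of idleness, sleeping saves more than the transition costs.
E_TRANSITION = 4.7 * 2 + 1.0   # J, assumed: spin-up energy plus a spin-down cost
P_SPIN = 1.8                   # W, "spinning & ready" power from the slide

t_out = 2 * E_TRANSITION / P_SPIN
print(f"spin down after {t_out:.1f} s of inactivity")   # ~11.6 s
```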

Google
- Since 2005, its data centers have been composed of standard shipping containers, each with 1,160 servers and a power consumption that can reach 250 kilowatts
- A Google server was 3.5 inches thick: 2U, or 2 rack units, in data center parlance. It had two processors, two hard drives, and eight memory slots mounted on a motherboard built by Gigabyte

Google's PUE
- In the third quarter of 2008, Google's PUE was 1.21, but it dropped to 1.20 for the fourth quarter and to 1.19 for the first quarter of 2009 (through March 15)
- Newest facilities have …
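PUE is total facility power divided by IT equipment power, so these numbers bound Google's overhead directly; a quick check with an illustrative 100 kW IT load (the load figure is ours):

```python
# PUE = total facility power / IT equipment power, so PUE - 1 is the fraction
# of IT load spent on overhead (cooling, power distribution, ...). The 100 kW
# IT load below is ours, purely for illustration.
IT_LOAD_KW = 100.0

for pue in (1.21, 1.20, 1.19):                  # Google's reported quarterly PUEs
    overhead_kw = IT_LOAD_KW * (pue - 1)
    print(f"PUE {pue}: {overhead_kw:.0f} kW overhead per {IT_LOAD_KW:.0f} kW of IT load")
```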