VL2: A Scalable and Flexible Data Center Network

VL2: A Scalable and Flexible Data Center Network CS538 10/23/2014 Presentation by: Soteris Demetriou Scribe: Cansu Erdogan

Credits Some of the slides were used in their original form or adapted from Assistant Professor Hakim Weatherspoon's (Cornell) slides. Those slides are annotated with a * in the top right corner.

Paper Details
Title: VL2: A Scalable and Flexible Data Center Network
Authors: Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta (Microsoft Research)
Venue: Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication
Citations: 918

Overview Problem: Conventional data center networks do not provide agility, i.e., assigning any service to any server efficiently is challenging. Approach: Merge layer 2 and layer 3 into a virtual layer 2 (VL2). How? Flat addressing to provide layer-2 semantics, Valiant Load Balancing for uniform high capacity between servers, and TCP to ensure performance isolation. Findings: VL2 provides uniform high capacity between any two servers, performance isolation between services, and agility through layer-2 semantics.

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

Clos Topology A multistage circuit-switching network. Usage: when the required switching capacity exceeds that of the largest feasible crossbar switch (M×N).

Clos Topology [Figure: the three stages of a Clos network: ingress stage, middle stage, egress stage]

Clos Topology [Figure: Clos network parameters: n inputs per ingress switch, m middle-stage switches, r ingress/egress switches]

Traffic Matrix The traffic rate between each node and every other node. E.g., in a network with N nodes where each node connects to every other node, there are N×N flows, giving an N×N traffic matrix. Valid: a traffic matrix is valid if it leaves no node oversubscribed.
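As a concrete illustration of the validity condition (my sketch, not from the paper), the check below flags a matrix that oversubscribes any node, assuming a uniform per-node ingress/egress capacity r:

```python
import numpy as np

def is_valid_traffic_matrix(tm: np.ndarray, r: float) -> bool:
    """Return True if no node sends or receives more than its capacity r.

    tm[i][j] is the traffic rate from node i to node j.
    """
    sent = tm.sum(axis=1)      # total rate leaving each node
    received = tm.sum(axis=0)  # total rate arriving at each node
    return bool((sent <= r).all() and (received <= r).all())

# Example: 3 nodes with capacity r = 10; an even all-to-all pattern is valid.
tm = np.full((3, 3), 3.0)
np.fill_diagonal(tm, 0.0)
print(is_valid_traffic_matrix(tm, r=10.0))  # True
```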

Valiant Load Balancing Keslassy et al. proved that uniform Valiant load balancing is the unique architecture requiring the minimum node capacity when interconnecting a set of identical nodes. Zhang-Shen et al. used it to design a predictable network backbone. Intuition: it is much easier to estimate the aggregate traffic entering and leaving a node than to estimate a complete traffic matrix (the traffic rate from every node to every other node). Supporting every valid traffic matrix over direct ingress-to-egress paths requires link capacity equal to the node capacity (= r). VLB instead load-balances traffic over all two-hop paths, so the capacity needed on the link between any two nodes is r/N + r/N = 2r/N. http://yuba.stanford.edu/~nickm/papers/HotNetsIII.pdf Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.

Valiant Load Balancing Backbone network, Point of Presence (PoP), access network. Consider a backbone network consisting of multiple PoPs interconnected by long-haul links. The whole network is arranged as a hierarchy, and each PoP connects an access network to the backbone (see Figure 1). Although traffic matrices are hard to obtain, it is straightforward to measure or estimate the total amount of traffic entering (leaving) a PoP from (to) its access network. When a new customer joins the network, we add its aggregate traffic rate to the node; when new locations are planned, the aggregate traffic demand for a new node can be estimated from the population it serves. This is much easier than trying to estimate the traffic rates from one node to every other node in the backbone. Imagine a full mesh between the nodes of the backbone network: traffic entering the backbone is spread equally across all the nodes, and a flow is load-balanced across every two-hop path from its ingress to its egress node. Thus each packet traverses the network twice. Link capacity analysis: Stage 1: each node uniformly distributes its traffic to every other node, so each node receives 1/Nth of every other node's traffic; the incoming traffic rate to each node is at most r, and it is spread evenly among N nodes, so each link requires r/N capacity. Stage 2: all packets are delivered to their final destination; each node can receive traffic at a maximum rate of r, and it receives 1/N of that traffic from every other node, so the traffic on each link is at most r/N. Total: 2r/N. Zhang-Shen, Rui, and Nick McKeown. "Designing a predictable Internet backbone network." HotNets, 2004.
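Restating the slide's two-stage capacity argument as a worked equation (N identical nodes, each with ingress/egress rate at most r):

```latex
\begin{align*}
C_{\text{stage 1}} &= \frac{r}{N} && \text{each node spreads at most } r \text{ evenly over } N \text{ nodes}\\
C_{\text{stage 2}} &= \frac{r}{N} && \text{each node receives at most } r\text{, of which } 1/N \text{ crosses this link}\\
C_{\text{link}} &= C_{\text{stage 1}} + C_{\text{stage 2}} = \frac{2r}{N}
\end{align*}
```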

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

* Conventional Data Center Network Architecture As shown in Figure 1, the network is a hierarchy reaching from a layer of servers in racks at the bottom to a layer of core routers at the top. There are typically 20 to 40 servers per rack, each singly connected to a Top of Rack (ToR) switch with a 1 Gbps link. ToRs connect to two aggregation switches for redundancy, and these switches aggregate further, connecting to access routers. At the top of the hierarchy, core routers carry traffic between access routers and manage traffic into and out of the data center. All links use Ethernet as a physical-layer protocol, with a mix of copper and fiber cabling. All switches below each pair of access routers form a single layer-2 domain, typically connecting several thousand servers. To limit overheads (e.g., packet flooding and ARP broadcasts) and to isolate different services or logical server groups (e.g., email, search, web front ends, web back ends), servers are partitioned into virtual LANs (VLANs). Unfortunately, this conventional design suffers from some fundamental limitations.

* DCN Problems [Figure: conventional CR/AR/S/A hierarchy; oversubscription ratios range from 1:5 near the servers to 1:80 and 1:240 higher up, so one part of the network "wants more" bandwidth while spare capacity elsewhere cannot be used] Static network assignment; fragmentation of resources; poor server-to-server connectivity; traffic of one service affects others; poor reliability and utilization.

* End Result The Illusion of a Huge L2 Switch [Figure: the entire CR/AR/S hierarchy appears to the servers as one giant layer-2 switch] 1. L2 semantics 2. Uniform high capacity 3. Performance isolation

* Objectives Uniform high capacity: the maximum rate of a server-to-server traffic flow should be limited only by the capacity of the servers' network interface cards; assigning servers to a service should be independent of network topology. Performance isolation: traffic of one service should not be affected by traffic of other services. Layer-2 semantics: easily assign any server to any service; configure a server with whatever IP address the service expects; a VM keeps the same IP address even after migration.

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

Methodology Setting design objectives: interview stakeholders to derive a set of objectives. Deriving typical workloads: a measurement study of traffic patterns covering data-center traffic analysis, flow distribution analysis, traffic matrix analysis, and failure characteristics.

Measurement Study Two main questions: (1) Who sends how much data, to whom, and when? (2) How often does the state of the network change due to changes in demand or switch/link failures and recoveries? Studied the production data centers of a large cloud service provider.

Data-Center Traffic Analysis Setting: instrumentation of a highly utilized cluster in a data center. The cluster has 1,500 nodes and supports data mining on petabytes of data. Servers are distributed roughly evenly among 75 ToR (Top of Rack) switches, which are connected hierarchically.

Data-Center Traffic Analysis The ratio of traffic volume between servers inside the data center to traffic entering/leaving the data center is 4:1. Bandwidth demand between servers inside a data center grows faster than bandwidth demand to external hosts. The network is the bottleneck of computation.

Flow Distribution Analysis The majority of flows are small (a few KB), on par with Internet flows. Why? Mostly hellos and metadata requests to the distributed file system. Almost all bytes (>90%) are transported in flows of 100 MB to 1 GB; the mode is around 100 MB because the distributed file system breaks long files into 100 MB chunks. Flows over a few GB are rare.

Flow Distribution Analysis The distribution of internal flows is simpler and more uniform than that of Internet flows.

Flow Distribution Analysis Two modes: more than 50% of the time an average machine has about 10 concurrent flows, and at least 5% of the time it has more than 80 concurrent flows. This implies that randomizing path selection at flow granularity will not cause perpetual congestion in case of unlucky placement of flows.

Traffic Matrix Analysis Poor summarizability of traffic patterns: even when approximating with 50-60 clusters, the fitting error remains high (60%), so engineering for just a few traffic matrices is unlikely to work well for "real" data center traffic. Instability of traffic patterns: the traffic pattern shows no periodicity that can be exploited for prediction.

Failure Characteristics 1/2 Failure definition: the event that occurs when a system or component is unable to perform its required function for more than 30 s. Most failures are small in size: 50% of network failures involve < 4 devices and 95% involve < 20 devices. Downtimes can be significant: 95% are resolved within 10 min, 98% within 1 hour, 99.6% within 1 day, and 0.09% last more than 10 days.

Failure Characteristics 2/2 In 0.3% of failures, all redundant components in a network device group became unavailable. Main causes of downtime: network misconfigurations, firmware bugs, and faulty components. There is no obvious way to eliminate all failures from the top of the hierarchy.

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

* Objectives Methodology: interviews with architects, developers, and operators. Uniform high capacity: the maximum rate of a server-to-server traffic flow should be limited only by the capacity of the servers' network interface cards; assigning servers to a service should be independent of network topology. Performance isolation: traffic of one service should not be affected by traffic of other services. Layer-2 semantics: easily assign any server to any service; configure a server with whatever IP address the service expects; a VM keeps the same IP address even after migration.

Design Overview (Objective / Approach / Solution)
Layer-2 semantics / Name-location separation / Flat addressing with a directory system (resolution service)
Uniform high capacity / Guarantee bandwidth for hose-model traffic / Valiant Load Balancing over a scale-out Clos topology
Performance isolation / Enforce the hose model using existing mechanisms / TCP

Design Overview Randomizing to cope with unpredictability and volatility. Valiant Load Balancing: destination-independent traffic spreading across multiple intermediate nodes. A Clos topology is used to support randomization, together with a proposed flow-spreading mechanism.

Design Overview Building on proven technologies VL2 is based on IP routing and forwarding technologies that are available in commodity switches Link state routing Maintains switch-level topology Does not disseminate end hosts’ info Equal-Cost Multi-Path forwarding with anycast addresses Enables VLB with minimal control plane messaging

Design Overview Separating names from locators enables agility, e.g., rapid VM migration. Uses application addresses (AAs) and location addresses (LAs), with a directory system for name resolution.

VL2 Components Scale-out Clos topology; Addressing and Routing (VLB); Directory System

* Scale-out topology Bipartite graph; graceful degradation of bandwidth if an intermediate switch (IS) fails. [Figure: VL2 Clos topology with intermediate, aggregation, and ToR layers; 20 servers per ToR] The links between the intermediate switches and the aggregation switches form a complete bipartite graph. As in the conventional topology, each ToR connects to two aggregation switches. However, the large number of paths between the aggregation and intermediate switches means that if there are n intermediate switches, the failure of one of them reduces the bisection bandwidth by only 1/n, giving graceful degradation of bandwidth. Furthermore, it is easy and less expensive to build a Clos network with no oversubscription: the design scales out with many smaller devices instead of scaling up a few large ones.
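The graceful-degradation claim above, written out (a restatement of the slide's 1/n figure, not an additional result):

```latex
\[
\frac{B_{\text{after one IS failure}}}{B_{\text{before}}}
  = \frac{n-1}{n}
  = 1 - \frac{1}{n}
\qquad \text{e.g. } n = 10 \Rightarrow \text{a single failure costs only } 10\% \text{ of the bisection bandwidth.}
\]
```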

Scale-out topology Clos is very suitable for VLB: by indirectly forwarding traffic through an intermediate switch at the top, the network can provide bandwidth guarantees for any traffic matrix subject to the hose model. Routing is simple and resilient: take a random path up to an intermediate switch and a random path down to the destination ToR.

VL2 Addressing and Routing: name-location separation * Allows usage of low-cost switches; protects the network and hosts from host-state churn. Directory Service. Switches run link-state routing and maintain only switch-level topology. [Figure: directory mappings (x -> ToR2, y -> ToR3, z -> ToR3/ToR4) updated as z migrates; a packet for y is encapsulated toward ToR3 after a lookup-and-response exchange with the directory service; servers use flat names] The network infrastructure operates with location-specific addresses (LAs): all switches and interfaces are assigned LAs, and switches run a layer-3 link-state protocol that disseminates only those LAs, eventually capturing the whole switch topology, so they can forward packets encapsulated with LAs. Applications use application-specific addresses (AAs), which remain unaltered even when servers' locations change (due to VM migration or re-provisioning). Packet forwarding: the VL2 agent traps packets from the host and encapsulates each packet with the LA of the destination's ToR; once the packet arrives at that ToR, it is decapsulated and delivered to the AA in the inner header. Address resolution: servers believe that all other servers of the same service are part of the same IP subnet, so the first packet to a new destination triggers an ARP broadcast; the VL2 agent on the host intercepts it and instead sends a unicast request to the directory system, which responds with the LA of the destination's ToR. When a ToR fails, the service is re-assigned; once a service is assigned to a server, the directory system stores the mapping between its AA and LA. Churn: how often state changes due to switch/link failures, recoveries, etc.
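A minimal sketch (mine, not the paper's implementation) of the two agent-side steps described above: resolve the destination AA to the LA of its ToR via the directory, then encapsulate. The in-memory directory and the names are illustrative assumptions.

```python
# Hypothetical AA -> LA mappings held by the directory system.
DIRECTORY = {"AA-x": "LA-ToR2", "AA-y": "LA-ToR3", "AA-z": "LA-ToR3"}
ARP_CACHE: dict = {}  # per-host cache kept by the VL2 agent

def resolve(dst_aa: str) -> str:
    """Intercept the ARP-style lookup and query the directory system instead."""
    if dst_aa not in ARP_CACHE:
        ARP_CACHE[dst_aa] = DIRECTORY[dst_aa]  # in reality: unicast RPC to a directory server
    return ARP_CACHE[dst_aa]

def encapsulate(src_aa: str, dst_aa: str, payload: bytes) -> dict:
    """Wrap the packet so switches route only on LAs; AAs stay constant across migration."""
    return {
        "outer_dst_la": resolve(dst_aa),  # LA of the destination's ToR
        "inner_src_aa": src_aa,
        "inner_dst_aa": dst_aa,
        "payload": payload,
    }

print(encapsulate("AA-x", "AA-y", b"hello"))  # routed toward LA-ToR3, delivered to AA-y
```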

VL2 Addressing and Routing: VLB indirection * [ECMP + IP Anycast] Harness the huge bisection bandwidth, avoid esoteric traffic engineering or optimization, ensure robustness to failures, and work with switch mechanisms available today. [Figure: up-paths and down-paths through intermediate switches sharing an anycast address IANY; ToRs T1-T6; a packet for y bounces off an intermediate switch, then goes to its destination ToR] Overview: VLB causes the traffic between any pair of servers to bounce off a randomly selected intermediate switch (IS); layer-3 ECMP then spreads the traffic along the multiple equal-cost sub-paths of the two path segments (first segment: up-link path; second segment: down-link path). VLB distributes traffic across a set of intermediate nodes, using flows as the basic unit of spreading to avoid out-of-order delivery, while ECMP distributes across equal-cost paths. To implement VLB, the VL2 agent encapsulates packets to a specific but randomly chosen IS; the IS decapsulates the packet and sends it to the destination ToR, which decapsulates again to deliver it to the destination server. However, this would require a large number of updates whenever an IS fails, so the same anycast LA is assigned to all ISs, and the directory system returns this anycast address to VL2 agents on lookup. Thus if an IS fails, the affected VL2 agents do not need to be updated to remove stale values; and since all ISs are exactly three hops away from the source, ECMP delivers packets encapsulated with the anycast address to any of the active ISs, masking failures. Requirements: must spread traffic, and must ensure destination independence. Potential problem: elephant flows, where random flow placement could lead to persistent congestion on some links while others are underutilized; however, elephant flows are rare in data centers.
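To make the flow-level spreading concrete, here is a small sketch (an emulation under assumed names, not switch code): the agent encapsulates toward one anycast LA shared by all intermediate switches, and an ECMP-style hash of the flow's 5-tuple picks which equal-cost path, and hence which intermediate switch, the flow takes. Hashing per flow keeps all packets of a flow on one path, avoiding reordering.

```python
import hashlib

INTERMEDIATE_SWITCHES = ["I1", "I2", "I3", "I4"]  # assumed intermediate switches
ANYCAST_LA = "LA-IANY"                            # single anycast LA shared by all of them

def ecmp_pick(flow_5tuple: tuple, paths: list) -> str:
    """Deterministically map a flow to one of the equal-cost paths, as ECMP hashing would."""
    digest = hashlib.sha256(repr(flow_5tuple).encode()).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]

def vlb_encapsulate(flow_5tuple: tuple, dst_tor_la: str) -> dict:
    # Outer header 1: bounce off an intermediate switch (reached via the anycast LA).
    # Outer header 2: then forward to the destination ToR's LA.
    return {"via": (ANYCAST_LA, ecmp_pick(flow_5tuple, INTERMEDIATE_SWITCHES)),
            "to": dst_tor_la}

flow = ("10.0.0.1", "10.0.0.2", 4321, 80, "tcp")
print(vlb_encapsulate(flow, "LA-ToR3"))  # the same flow always maps to the same path
```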

VL2 Directory System Three key functions: lookups, updates of AA-to-LA mappings, and reactive cache updates (for latency-sensitive cases, e.g., a VM during migration). Goals: scalability, reliability for updates, high lookup performance, and eventual consistency (like ARP). Reactive cache update: mappings are cached at directory servers and in VL2 agents' caches, so an update can lead to inconsistency.

VL2 Directory System [Figure: VL2 agents send lookups to directory servers (1. Lookup, 2. Reply); updates flow 1. Update -> 2. Set -> 3. Replicate across RSM servers -> 4. Ack -> 5. Ack, with an optional 6. Disseminate to other directory servers] Replicated directory servers (DS): a moderate number (50-100 servers for 100K servers); they cache AA-to-LA mappings, lazily sync their mappings with the RSM every 30 seconds (strong consistency is not needed here), and handle queries from VL2 agents. For lookups they ensure high throughput, high availability, and low latency: an agent sends a lookup to k randomly chosen DSs and uses the fastest reply. A small number of Replicated State Machine (RSM) servers provide a strongly consistent, reliable store of mappings, ensuring strong consistency and durability for a modest number of updates. Update: an update is sent to a randomly chosen DS, which forwards it to an RSM server; the RSM reliably replicates the update to every RSM server and replies with an ACK to the DS, which forwards the ACK to the client. To enhance consistency, the DS can disseminate the ACK to a few other DSs. Reactive cache update: mappings are cached at directory servers and in VL2 agents' caches, so an update can lead to inconsistency. Observation: a stale host mapping needs to be corrected only when that mapping is used to send traffic. When such packets arrive at a stale LA (a ToR that no longer hosts the destination server), the ToR forwards a sample of the non-deliverable packets to a directory server, triggering it to gratuitously correct the stale mapping in the source's cache via unicast.
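A sketch of just the lookup path described above (simulated latencies and an in-memory mapping table; the real system issues network RPCs): query k randomly chosen directory servers in parallel and take whichever replies first.

```python
import concurrent.futures
import random
import time

# Hypothetical directory servers, each holding the same cached AA -> LA mappings.
DIRECTORY_SERVERS = {f"DS{i}": {"AA-y": "LA-ToR3"} for i in range(8)}

def query(ds_name: str, aa: str) -> str:
    time.sleep(random.uniform(0.001, 0.010))  # simulated network + server latency
    return DIRECTORY_SERVERS[ds_name][aa]

def lookup(aa: str, k: int = 3) -> str:
    """Send the lookup to k random directory servers and use the fastest reply."""
    chosen = random.sample(list(DIRECTORY_SERVERS), k)
    with concurrent.futures.ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(query, ds, aa) for ds in chosen]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

print(lookup("AA-y"))  # 'LA-ToR3'
```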

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

Evaluation Uniform high capacity: all-to-all data shuffle traffic matrix with 75 servers, each delivering 500 MB to all others (a 2.7 TB shuffle from memory to memory). VL2 completes the shuffle in 395 s with an aggregate goodput of 58.8 Gbps, about 10x better than their current data center network. The maximal achievable goodput over all flows is 62.3 Gbps, giving a VL2 network efficiency of 58.8/62.3 = 94%.

Evaluation VLB Fairness: 75-node testbed with traffic characteristics as per the measurement study. Goal: evaluate whether VLB with ECMP splits traffic evenly across the network. All flows pass through the aggregation switches, so it is sufficient to check the split ratio there among the links to the intermediate switches. Plot Jain's fairness index for traffic toward the intermediate switches: the VLB split-ratio fairness index averages > 0.98 for all aggregation switches. [Plot: Jain's fairness index (0.94-1.00) vs. time (0-500 s) for Aggr1, Aggr2, Aggr3]
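For reference, Jain's fairness index used in this plot, computed on illustrative split ratios (the values below are assumptions, not measurements); an index of 1.0 means a perfectly even split:

```python
def jain_fairness(xs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

split_ratios = [0.34, 0.33, 0.33]  # assumed traffic shares toward three intermediate switches
print(round(jain_fairness(split_ratios), 4))  # ~0.9999, i.e. essentially even
```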

Evaluation Performance isolation: two types of services are added to the network. Service one: 18 servers each do a single TCP transfer to another server, starting at time 0 and lasting throughout the experiment. Service two: starts one server at 60 s and adds a new server every 2 s, for a total of 19 servers; each starts an 8 GB transfer over TCP as soon as it starts up. To achieve isolation, VL2 relies on TCP to ensure that each flow offered to the network is rate-limited to its fair share of the bottleneck (i.e., it obeys the hose model). Q: Does TCP react sufficiently quickly to control the offered rate of flows within services? (Enforcement of the hose model for traffic in each service means VL2 can provide performance isolation between services.) TCP works with packets and adjusts the sending rate at the timescale of RTTs, whereas strict conformance to the hose model would require instantaneous feedback to avoid oversubscription of traffic ingress/egress bounds. Result: no perceptible change in service one as servers start up in service two.

Evaluation Performance isolation (cont'd): to evaluate how mice flows (large numbers of short TCP connections), common in data centers, affect the performance of other services, service two's servers create successively more bursts of short TCP connections (1 to 20 KB). There is no perceptible change in service one: TCP's natural enforcement of the hose model is sufficient to provide performance isolation when combined with VLB and no oversubscription.

Evaluation Convergence after link failures: 75 servers, all-to-all data shuffle, disconnecting links between intermediate and aggregation switches. The figure shows a time series of the aggregate goodput achieved by the flows in the data shuffle; vertical lines mark the times at which a disconnection or reconnection occurred. The maximum capacity of the network degrades gracefully. Restoration, however, is delayed: VL2 fully uses a link only roughly 50 s after it is restored. Restoration does not interfere with traffic, and the aggregate throughput eventually returns to its initial level.

Outline Background Motivation Measurement Study VL2 Evaluation Summary and Discussion

That's a lot to take in! Takeaways, please! Problem: over-subscription in data centers and lack of agility.

Overall Approach Measurement study (stakeholder interviews, datacenter workload measurements) -> design objectives -> architecture (applying known techniques where possible) -> evaluation (testbed including all design components, evaluated with respect to the objectives).

Design Overview (Objective / Approach / Solution)
Layer-2 semantics / Name-location separation / Flat addressing with a directory system (resolution service)
Uniform high capacity / Guarantee bandwidth for hose-model traffic / Valiant Load Balancing over a scale-out Clos topology
Performance isolation / Enforce the hose model using existing mechanisms / TCP

* End Result The Illusion of a Huge L2 Switch [Figure: the entire CR/AR/S hierarchy appears to the servers as one giant layer-2 switch] 1. L2 semantics 2. Uniform high capacity 3. Performance isolation

Discussion What is the impact of VL2 on data-center power consumption? Security: the directory system can perform access control; what are the challenges with that? Other issues? Flat vs. hierarchical addressing. VL2 has issues with large flows; how can we address that challenge? On power consumption: all links and switches are active all the time, which is not power efficient; given the enormous power consumption of data centers and the major efforts toward reducing it, VL2 goes in the opposite direction in this respect. Solutions? Better load balancing vs. selective shut-down.