1
Data Centers: Network Architecture
Joshua Eckhardt, Yi Zhang. March 17th, 2016
2
VL2: A Scalable and Flexible Data Center Network
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta
3
Outline
Problem
Objectives
Measurements & Implications
VL2
Evaluation
Conclusion
4
Architecture of Conventional DCNs
5
Conventional DCNs Problems
[Figure: conventional DCN hierarchy of core routers (CR), aggregation routers (AR), ToR switches (S), and servers (A), with oversubscription of roughly 1:5 at the ToR uplinks, 1:80 at aggregation, and 1:240 at the core.]
Problems: limited server-to-server capacity, fragmentation of resources ("I want more" while another rack has "spare ones"), and poor reliability and utilization.
6
Objectives
Uniform high capacity: the maximum rate of any server-to-server traffic flow should be limited only by the capacity of the servers' network interface cards, and assigning servers to a service should be independent of network topology.
Performance isolation: the traffic of one service should not be affected by the traffic of other services.
Layer-2 semantics: easily assign any server to any service, configure a server with whatever IP address the service expects, and let a VM keep the same IP address even after migration.
7
Measurements and Implications
Data center traffic analysis: the ratio of traffic volume between servers inside the data center to traffic entering/leaving the data center is 4:1. Bandwidth demand between servers inside a data center grows faster than bandwidth demand to external hosts. The network is the bottleneck of computation.
8
Flow distribution analysis
The majority of flows are small; flows over a few GB are rare. The distribution of internal flows is simpler and more uniform.
Number of concurrent flows (2 modes): more than 50% of the time an average machine has about 10 concurrent flows, but at least 5% of the time it has more than 80 concurrent flows.
Implication: randomizing at flow granularity (rather than per packet) will not cause perpetual congestion even if a few flows are placed unluckily.
9
Traffic Matrix Analysis
Poor summarization of traffic patterns: even when approximating with clusters, the fitting error remains high (60%). This implies that engineering for just a few traffic matrices is unlikely to work well for "real" traffic in data centers.
Instability of traffic patterns: the traffic pattern shows no periodicity that can be exploited for prediction.
10
Failure Characteristics
A failure is the event that occurs when a system or component is unable to perform its required function for more than 30 s.
Most failures are small in size: 50% of network failures involve fewer than 4 devices, and 95% involve fewer than 20 devices.
Downtimes can be significant: 95% are resolved within 10 minutes, 98% within 1 hour, and 99.6% within 1 day, but 0.09% last more than 10 days.
Main causes of downtime: network misconfigurations, firmware bugs, and faulty components.
There is no obvious way to eliminate all failures from the top of the hierarchy.
11
Data centers have tremendous volatility in their workload, their traffic and their failure patterns.
The hierarchical topology is intrinsically unreliable
12
Virtual Layer 2 Switch (VL2) : Design Principle
Randomizing to cope with volatility: use Valiant Load Balancing (VLB) for destination-independent traffic spreading across multiple intermediate nodes.
Building on proven networking technology: use IP routing and forwarding technologies available in commodity switches (link-state routing, ECMP, IP anycast, and IP multicast).
Separating names from locators: use a directory system to maintain the mapping between names and locations.
Embracing end systems: a VL2 agent runs at each server.
13
Virtual Layer 2 Switch (VL2)
The illusion of a huge L2 switch: 1. L2 semantics, 2. uniform high capacity, 3. performance isolation. VL2 creates the illusion that hosts are connected to a single big, non-interfering, data-center-wide layer-2 switch.
[Figure: all servers (A) attached to one virtual layer-2 switch in place of the CR/AR/S hierarchy.]
14
VL2 Overview: Goals and Solutions
Objective / Approach / Solution:
1. Uniform high capacity between servers / guarantee bandwidth for hose-model traffic / VLB and a scale-out Clos topology.
2. Performance isolation / enforce the hose model using existing mechanisms only / TCP.
3. Layer-2 semantics / employ flat addressing / name-location separation and a resolution service.
15
Hose Model
16
Scale-Out Topologies (Clos Network)
Ensure huge aggregate capacity and robustness at modest cost.
[Figure: intermediate switches (Int) at the top, K aggregation switches with D ports each, ToRs below, 20 servers per ToR, giving 20*(D*K/4) servers in total.]
D (# of 10G ports) vs. max DC size: 48 ports -> 11,520 servers; 96 ports -> 46,080 servers; 144 ports -> 103,680 servers.
Poor bisection bandwidth is a big problem in conventional DCNs. The large number of paths between every two aggregation switches means that if there are n intermediate switches, the failure of any one of them reduces the bisection bandwidth by only 1/n, a desirable graceful degradation of bandwidth.
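As a rough illustration of the sizing arithmetic behind the table (a sketch only; it assumes, as the table's numbers do, that the number of aggregation switches K equals the port count D, and the function name is made up):

```python
# Sketch of the slide's scale-out sizing formula 20 * (D * K / 4):
# each ToR uses 2 uplinks and hosts 20 servers, so a Clos with K
# aggregation switches of D ports each supports D*K/4 ToRs.
def max_servers(d_ports: int, k_aggr: int) -> int:
    num_tors = (d_ports * k_aggr) // 4   # ToRs reachable below the aggregation layer
    return 20 * num_tors                 # 20 servers per ToR

# Reproduces the table's values when K = D:
for d in (48, 96, 144):
    print(d, max_servers(d, d))          # 11520, 46080, 103680
```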
17
Scale-Out Topologies Clos is very suitable for VLB
By indirectly forwarding traffic through an intermediate switch at the top, the network can provide bandwidth guarantees for any traffic matrix subject to the hose model.
Routing is simple and resilient: take a random path up to an intermediate switch and a random path down to the destination ToR.
18
VL2 Addressing and Routing
Name/location separation copes with host churn with very little overhead. Servers have virtual names; switches have physical names. A directory service maintains the mapping (e.g., x -> ToR2, y -> ToR3, z -> ToR3; after z moves, z -> ToR4). Switches run link-state routing and maintain only the switch-level topology, which allows the use of low-cost switches and protects the network and hosts from host-state churn.
[Figure: a sender looks up the destination's ToR via the directory service and encapsulates the payload toward that ToR.]
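A toy sketch of the lookup-and-encapsulate step in the slide's x/y/z example (illustrative names only, not VL2's implementation; the paper calls the virtual and physical names application-specific and location-specific addresses):

```python
# Directory mapping from a server's virtual name to the locator of its ToR.
directory = {
    "x": "ToR2",
    "y": "ToR3",
    "z": "ToR4",   # updated from ToR3 after z migrates; x and y are unaffected
}

def encapsulate(dst_name: str, payload: bytes):
    """Resolve the destination's ToR and wrap the packet in an outer header."""
    tor_locator = directory[dst_name]          # lookup served by the directory system
    return (tor_locator, dst_name, payload)    # (outer locator, inner name, data)

print(encapsulate("z", b"query"))              # ('ToR4', 'z', b'query')
```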
19
VL2 Addressing and Routing
VLB indirection copes with arbitrary traffic matrices with little overhead (ECMP + IP anycast): harness the huge bisection bandwidth, obviate esoteric traffic engineering or optimization, ensure robustness to failures, and work with switch mechanisms available today.
Requirements: 1. must spread traffic; 2. must ensure destination independence. Both are met by ECMP + VLB.
[Figure: up-path links carry traffic to an anycast intermediate address (IANY); down-path links carry it from the intermediate switch to the destination ToR.]
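A hedged sketch of the flow-level spreading idea: hashing a flow's 5-tuple picks one intermediate switch, so a flow's packets stay on a single up-path while different flows spread evenly. In real VL2 this selection is done by the switches' ECMP on an anycast intermediate address, not by host code; the names below are illustrative.

```python
import hashlib

INTERMEDIATE_SWITCHES = ["Int1", "Int2", "Int3"]

def pick_intermediate(src, dst, sport, dport, proto="tcp"):
    """Hash the 5-tuple: per-flow stickiness, destination-independent spreading."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return INTERMEDIATE_SWITCHES[digest % len(INTERMEDIATE_SWITCHES)]

print(pick_intermediate("x", "z", 51512, 80))   # same flow always maps to the same switch
```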
20
VL2 Directory System
[Figure: agents on the servers query directory servers (DS), which are backed by replicated state machine (RSM) servers.]
Lookup: 1. the agent sends a lookup; 2. a directory server replies.
Update: 1. the agent sends an update; 2. the directory server sets the mapping on the RSM; 3. the RSM replicates it; 4. the RSM acks; 5. the directory server acks the agent; (6. the update is disseminated to other directory servers).
21
Evaluation Uniform high capacity All-to-all data shuffle stress test
75 servers, each delivering 500 MB to every other server. The maximal achievable goodput is 62.3 Gbps; VL2 reaches 58.8 Gbps, a network efficiency of 58.8/62.3 = 94%. The result is more than 10x what the network in current data centers can achieve with the same investment.
22
Evaluation VLB Fairness 75-node testbed
All flows pass through the aggregation switches, so it is sufficient to check there for the split ratio among the links to the intermediate switches. Plotting Jain's fairness index for the traffic toward the intermediate switches, the VLB split-ratio fairness index averages above 0.98 for all aggregation switches.
[Figure: fairness index over time for each aggregation switch, staying between roughly 0.94 and 1.00.]
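The metric plotted here is Jain's fairness index, J(x) = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1.0 for a perfectly even split. A small helper (not from the paper's tooling, values below are made up) to compute it:

```python
def jain_fairness(split_volumes):
    """Jain's fairness index over the traffic volumes sent to each link."""
    n = len(split_volumes)
    total = sum(split_volumes)
    return total * total / (n * sum(v * v for v in split_volumes))

print(jain_fairness([100, 100, 100]))   # 1.0: perfectly even VLB split
print(jain_fairness([110, 100, 90]))    # ~0.993: slight imbalance, still above 0.98
```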
23
Evaluation Performance Isolation Two types of services:
Service one: 18 servers each run a single long-lived TCP transfer the whole time. Service two: 19 servers start 8 GB TCP transfers every 2 seconds. There is no perceptible change in Service 1 as servers start up in Service 2.
24
Evaluation Performance isolation (continued)
To evaluate how mice flows affect the performance of other services, Service 2's servers create successively more bursts of short TCP connections (1 to 20 KB). TCP's natural enforcement of the hose model is sufficient to provide performance isolation when combined with VLB and no over-subscription.
25
Evaluation Convergence after link failures 75 servers
All-to-all data shuffle while links between intermediate and aggregation switches are disconnected. The maximum capacity of the network degrades gracefully. Restoration after the links are reconnected is delayed, but it does not interfere with traffic, and the aggregate throughput eventually returns to its initial level.
26
Other Data Center Architectures
VL2: A Scalable and Flexible Data Center Network: consolidates layer 2/layer 3 into a "virtual layer 2", separates "naming" from "addressing", and also deals with dynamic load-balancing issues.
A Scalable, Commodity Data Center Network Architecture: a new fat-tree interconnection structure (topology) to increase bisection bandwidth; needs "new" addressing, forwarding, and routing.
Other approaches: PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric; BCube: A High-Performance, Server-centric Network Architecture for Modular Data Centers.
27
Discussion What is the impact of VL2 on Data-Center power consumption?
How secure is VL2? Flat vs. hierarchical addressing? VL2 has issues with large flows; how can we address that challenge?
Background on flat vs. hierarchical addressing: in a flat routing infrastructure, each network ID is represented individually in the routing table; the network IDs have no network/subnet structure and cannot be summarized. In a hierarchical routing infrastructure, groups of network IDs can be represented as a single routing table entry through route summarization; the network IDs have a network/subnet/sub-subnet structure, and a routing table entry for the highest level (the network) is also the route used for its subnets and sub-subnets. Hierarchical routing infrastructures simplify routing tables and lower the amount of routing information exchanged, but they require more planning. IP implements hierarchical network addressing, and IP internetworks can have a hierarchical routing structure.
28
Conclusion VL2 provides
Uniform high capacity, performance isolation, and agility through layer-2 semantics. VL2 is a simple design that can be realized today with available networking technologies, and without changes to switch control and data plane capabilities.
29
Application-Driven Bandwidth Guarantees in Datacenters
Jeongkeun Lee, Yoshio Turner, Myungjin Lee, Lucian Popa, Sujata Banerjee, Joon-Myung Kang, Puneet Sharma
30
Motivation Emerging Cloud applications are diverse
They are complex combinations of multiple services and require predictable performance, high availability, and high intra-datacenter bandwidth. We need to accurately represent and efficiently map application requirements onto the shared physical network. Existing models and systems fall short: they focus on batch applications like Hadoop and Pregel with all-to-all traffic patterns.
31
Interactive Online applications
E.g., 3-tier web, enterprise ERP, real-time analytics. These are composed of communicating service components (tiers), have high bandwidth requirements, and are delay-sensitive. Amazon: "Every 100 ms of latency costs 1% in (e-commerce) sales."
32
CloudMirror: Goals and System Components
33
Prior work: the pipe model specifies the bandwidth of every VM-to-VM pair, which is not scalable: O(n²) pipes for n VMs. It is also too slow to compute valid VM placements: O(n³) or higher complexity for resource-efficient placement.
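For concreteness, the quadratic growth referred to here (a trivial helper, not from the paper):

```python
def num_pipes(n_vms: int) -> int:
    """One directed pipe per ordered VM pair, hence O(n^2) growth."""
    return n_vms * (n_vms - 1)

print(num_pipes(100))    # 9,900 pipes for just 100 VMs
print(num_pipes(1000))   # 999,000 pipes for 1,000 VMs
```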
34
Prior work: Hose model [N. Duffield, Sigcomm’99]
Specifies per-VM aggregate bandwidth. Simple and scalable. Describes batch applications with all-to-all communication patterns well.
35
Hose model is too coarse-grained
The hose model aggregates bandwidth toward different components, which is inefficient in terms of resource utilization.
36
Hose model is too coarse-grained
The hose model aggregates bandwidth toward different components and fails to guarantee the required bandwidth in case of congestion. VOC is a 2-level hierarchical hose model.
37
Lessons
Aggregate pipes (like the hose model): model simplicity, multiplexing gain.
Preserve inter-component structure (like the pipe model): accurately capture application demands, efficiently utilize network resources.
Solution: aggregate pipes only between each pair of communicating tiers.
38
Tenant Application Graph (TAG)
Vertex = application component (or tier): a group of VMs performing the same function; vertex size N = # of VMs.
Two types of edges:
Directional edge between two vertices (inter-component): Bsnd = per-VM sending bandwidth (VM-to-component aggregation); Brcv = per-VM receiving bandwidth (component-to-VM aggregation).
Self-edge (intra-component): Bin = per-VM sending/receiving bandwidth for traffic between VMs of the same tier.
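A minimal data-structure sketch of the TAG abstraction just described; the class and field names (and the example bandwidth numbers) are illustrative, not CloudMirror's API:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    n_vms: int          # vertex size N
    b_in: float = 0.0   # self-edge: per-VM bandwidth for intra-tier traffic

@dataclass
class TagEdge:
    src: Tier
    dst: Tier
    b_snd: float        # per-VM sending bandwidth (VM-to-component aggregation)
    b_rcv: float        # per-VM receiving bandwidth (component-to-VM aggregation)

# Example: a 3-tier web application expressed as a TAG (made-up numbers, in Mbps).
web = Tier("web", n_vms=10)
app = Tier("app", n_vms=20, b_in=50.0)
db  = Tier("db",  n_vms=5)
edges = [TagEdge(web, app, b_snd=100.0, b_rcv=80.0),
         TagEdge(app, db,  b_snd=200.0, b_rcv=150.0)]
```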
39
TAG Example
40
Benefits
TAG is flexible: per-VM aggregation from/to each other component.
TAG is accurate and resource-efficient.
TAG is intuitive and easy to use because it directly mirrors the application structure.
41
Deploying TAG
42
VM placement goal: deploy as many tenants as possible onto a tree-shaped topology while guaranteeing SLAs; this is an NP-hard problem. Prior heuristics rely on colocation to localize traffic and save core bandwidth.
43
Disadvantages of colocation
Colocation hurts high availability
44
Disadvantages of colocation
Colocation can hurt efficient resource utilization
45
CloudMirror Approach Identify tiers that would benefit from colocation
To achieve the hose bandwidth saving, more than half the VMs of a tier t must be colocated in the subtree; to achieve the trunk bandwidth saving, more than half of the VMs of t or of t' need to be colocated in the subtree.
For tiers without a bandwidth-saving benefit (too large to colocate): balance resource utilization over subtrees and resource types, a multi-dimensional knapsack problem handled with a greedy algorithm.
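A simplified sketch of the two bandwidth-saving conditions above (illustrative only; CloudMirror's actual algorithm also handles VM sizing, multiple resource dimensions, and the greedy knapsack-style balancing):

```python
def colocation_saves_bandwidth(vms_t_in_subtree, total_t,
                               vms_t2_in_subtree, total_t2):
    """Return (hose_saving, trunk_saving) for colocating tiers t and t' in a subtree.

    hose_saving:  more than half of tier t's VMs land in the subtree.
    trunk_saving: more than half of t's VMs, or of its peer t''s VMs, land there.
    """
    hose_saving = vms_t_in_subtree > total_t / 2
    trunk_saving = (vms_t_in_subtree > total_t / 2 or
                    vms_t2_in_subtree > total_t2 / 2)
    return hose_saving, trunk_saving

print(colocation_saves_bandwidth(6, 10, 2, 10))   # (True, True): worth colocating
print(colocation_saves_bandwidth(3, 10, 4, 10))   # (False, False): no saving
```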
46
Achieving High Availability
Guarantee survivability for requesting tiers: limit the number of VMs colocated in the same subtree.
Opportunistically improve HA for others: even for tiers with a bandwidth-saving benefit, distribute VMs when bandwidth is not a bottleneck.
47
Evaluation Methodology
Simulate a stream of (Poisson) tenant arrivals and departures.
Workload: bing.com data with various cluster sizes and communication patterns; a tenant is a set of connected components.
3-level tree topology: 2048 hosts, 25 VM slots per host.
Baseline model: VOC (2-level hose); algorithm: Oktopus [Sigcomm'11].
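A minimal sketch of the kind of Poisson tenant arrival/departure stream described above; the rates, lifetime distribution, and seed are arbitrary placeholders, not the paper's parameters:

```python
import random

def tenant_events(arrival_rate=1.0, mean_lifetime=100.0, horizon=1000.0, seed=0):
    """Generate ('arrive', t) / ('depart', t) events for a simulated tenant stream."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(arrival_rate)        # Poisson arrivals
        if t >= horizon:
            break
        events.append(("arrive", t))
        events.append(("depart", t + rng.expovariate(1.0 / mean_lifetime)))
    return sorted(events, key=lambda e: e[1])     # replay in time order

print(tenant_events()[:4])
```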
48
Resource Efficiency
49
Opportunistic High-Availability
CM+oppHA achieves high average survivability with only a marginal increase in rejected bandwidth requests.
50
Conclusion TAG models application structure
TAG is intuitive, flexible, and efficient. The placement algorithm efficiently places TAGs on a tree-shaped topology with high-availability support. A feasibility test with ElasticSwitch [Sigcomm'13] for TAG enforcement required only a 30-line patch to support TAG.