Data Center Routing – Traffic Engineering Yao Lu Rui Zhang ECE 260C VLSI Advanced Topics.


Outline
– What is routing / traditional routing algorithms
– What is a data center
– Differences between data centers and the Internet
– Some recent work in data center traffic engineering (TE)
– Open questions/proposals

What is routing

Traditional routing algorithms
– RIP (Routing Information Protocol)
– IGRP (Interior Gateway Routing Protocol)
– EIGRP (Enhanced Interior Gateway Routing Protocol)
– OSPF (Open Shortest Path First)
– IS-IS (Intermediate System-to-Intermediate System)
– BGP (Border Gateway Protocol)

What is a data center?
– Nowadays, 40% of total Internet traffic goes to Google [1]

Differences between data centers and the Internet
– Design goals: latency, reliability, throughput, energy, etc.
– Properties:
  – Well-structured topology
  – Movability of the locations of sources and destinations
  – Global knowledge of the whole data center network

Recent work
– Equal-Cost Multi-Path (ECMP) [7]
– Valiant Load Balancing (VLB) [6]
– CamCube [5]
– Hedera [8]
– Joint VM Placement and Routing (JVMPR) [4]

ECMP
– Many equal-cost paths go up to the core switches
– Only one path goes down from each core switch
– Paths are assigned to flows (effectively at random) by hashing the flow's header fields
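To make the hashing step concrete, here is a minimal Python sketch of modulo-N ECMP path selection, one of the schemes discussed in [7]. The function name `ecmp_select`, the use of SHA-256, and the example 5-tuples are assumptions for illustration; real switches use their own hardware hash functions.

```python
import hashlib

def ecmp_select(flow, num_paths):
    """Return the index (0..num_paths-1) of the uplink chosen for this flow."""
    # Hash the flow's 5-tuple, so every packet of the flow maps to the same path.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Modulo-N: take 4 bytes of the digest and reduce modulo the number of paths.
    return int.from_bytes(digest[:4], "big") % num_paths


# Hypothetical 5-tuples: (src IP, dst IP, protocol, src port, dst port).
flow_a = ("10.0.1.2", "10.2.0.3", 6, 40000, 80)
flow_b = ("10.0.1.3", "10.2.0.4", 6, 40001, 80)
print(ecmp_select(flow_a, 4), ecmp_select(flow_b, 4))
```

Because the hash ignores flow size, two elephant flows can land on the same uplink while other uplinks sit idle, which is the congestion risk noted for ECMP in the summary.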

VLB
– Goal: guarantee evenly spread load balancing in a mesh network
– Method: bounce individual packets from a source switch in the mesh off randomly chosen intermediate “core” switches, which then forward those packets to their destination switch
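The per-packet bounce can be sketched in a few lines of Python. This is a hypothetical illustration (`vlb_path`, the switch names, and the packet count are made up); it only shows that picking the intermediate core uniformly at random spreads packets roughly evenly across the cores.

```python
import random
from collections import Counter

def vlb_path(src_switch, dst_switch, core_switches, rng):
    """Two-phase VLB path for one packet: bounce off a randomly chosen core switch."""
    intermediate = rng.choice(core_switches)
    return [src_switch, intermediate, dst_switch]


rng = random.Random(42)                      # seeded for reproducibility
cores = ["core0", "core1", "core2", "core3"]
paths = [vlb_path("tor1", "tor7", cores, rng) for _ in range(1000)]
usage = Counter(path[1] for path in paths)   # packets per intermediate core
print(usage)
```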

CamCube
– 3D torus topology
– Offers the CamCube API, which lets each service/application design its own routing protocol
– Core services: a basic routing algorithm based on a link-state protocol
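As a rough picture of the 3D torus, the sketch below computes a node's six neighbours and a shortest path that corrects one coordinate at a time, always going the shorter way around each ring. It assumes a fault-free k × k × k torus with (x, y, z) coordinates; the helper names are hypothetical, and CamCube's base service uses a link-state protocol rather than this greedy rule, so treat it only as an illustration of the topology.

```python
def neighbors(node, k):
    """The 6 one-hop neighbours of (x, y, z) in a k x k x k torus with wraparound links."""
    result = []
    for dim in range(3):
        for step in (-1, 1):
            nxt = list(node)
            nxt[dim] = (nxt[dim] + step) % k
            result.append(tuple(nxt))
    return result

def hop_distance(a, b, k):
    """Shortest hop count on a fault-free torus: sum of per-dimension ring distances."""
    return sum(min((bi - ai) % k, (ai - bi) % k) for ai, bi in zip(a, b))

def greedy_route(src, dst, k):
    """Fix one coordinate at a time, moving the shorter way around each ring."""
    path = [tuple(src)]
    cur = list(src)
    for dim in range(3):
        while cur[dim] != dst[dim]:
            forward = (dst[dim] - cur[dim]) % k
            step = 1 if forward <= k - forward else -1
            cur[dim] = (cur[dim] + step) % k
            path.append(tuple(cur))
    return path


route = greedy_route((0, 0, 0), (3, 5, 0), 8)
print(route, hop_distance((0, 0, 0), (3, 5, 0), 8))
```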

Hedera
– Detect large flows: flows that need bandwidth but are network-limited
– Estimate flow demands: use max-min fairness to allocate bandwidth among flows between src-dst pairs
– Place flows: use the estimated demands to heuristically find a better placement of large flows on the ECMP paths
– The three steps repeat in a loop: Detect Large Flows → Estimate Flow Demands → Place Flows

Hedera: Large Flow Detection
– The scheduler continually polls the edge switches for per-flow byte counts
– Flows exceeding a rate (B/s) threshold are considered “large”: more than 10% of the host's link capacity (i.e., > 100 Mbps)
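The threshold test is simple arithmetic on two successive byte-count polls, sketched below in Python. The function and parameter names, the one-second polling interval, and the counter values are hypothetical; only the 10%-of-link-capacity rule comes from the slide.

```python
def detect_large_flows(prev_bytes, curr_bytes, poll_interval_s,
                       link_capacity_bps, threshold_fraction=0.1):
    """Return flows whose rate since the last poll exceeds threshold_fraction * capacity."""
    large = []
    for flow, count in curr_bytes.items():
        sent = count - prev_bytes.get(flow, 0)        # bytes since the previous poll
        rate_bps = sent * 8 / poll_interval_s         # bytes -> bits per second
        if rate_bps > threshold_fraction * link_capacity_bps:
            large.append(flow)
    return large


# Hypothetical byte counters polled one second apart on 1 Gbps host links.
prev = {"flow_a": 0, "flow_b": 0}
curr = {"flow_a": 20_000_000, "flow_b": 5_000_000}    # 160 Mbps and 40 Mbps
print(detect_large_flows(prev, curr, 1.0, 1e9))       # ['flow_a']
```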

Hedera: Demand Estimation
– Goal: estimate the available bandwidth to allocate to each flow
– Method: using max-min fairness, given the traffic matrix of large flows, iteratively adjust each flow's size at its source and its destination:
  – Each sender distributes its bandwidth equally among its outgoing flows that are not receiver-limited
  – Each receiver whose incoming demand exceeds its capacity reduces the excess by splitting its capacity equally among its incoming flows
  – Repeat until all flows converge

Hedera: demand estimation example (senders A, B, C; receivers X, Y), sender step
Flow  Estimate  Conv.?
AX
AY
BY
CY
Sender  Available BW  Unconv. flows  Share
A       1             2              1/2
B       1             1              1
C       1             1              1

Hedera: demand estimation example, receiver step
Recv  RL?  Non-SL flows  Share
X     No   --            --
Y     Yes  3             1/3
Flow  Estimate  Conv.?
AX    1/2
AY    1/2
BY    1
CY    1

Hedera FlowEstimateConv. ? AXAX1/2 AYAY1/3Yes BYBY1/3Yes CYCY1/3Yes Sender Available Unconv. BW FlowsShare A2/31 B000 C000 Senders A B C X Y

Hedera FlowEstimateConv. ? AXAX2/3Yes AYAY1/3Yes BYBY1/3Yes CYCY1/3Yes RecvRL? Non-SL Flows Share XNo-- Y -- Receivers A B C X Y

Hedera: Flow Placement
– Goal: find a good allocation of paths for the set of large flows, such that the average bisection bandwidth of the flows is maximized
– Methods:
  – Global First Fit: greedily choose a path that has sufficient unreserved bandwidth
  – Simulated Annealing: iteratively find a globally better mapping of paths to flows

Hedera: Global First Fit
– When a new large flow is detected, linearly search all possible paths from S to D
– Place the flow on the first path whose component links can all fit that flow
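A minimal Python sketch of the first-fit search follows. The link names, capacities, and the two-path topology are hypothetical, and the sketch returns `None` when no path fits (in which case the flow could simply keep its default ECMP path).

```python
def global_first_fit(demand, paths, capacity, reserved):
    """Place a flow on the first path where every link has room; reserve bandwidth there."""
    for path in paths:
        if all(reserved.get(link, 0.0) + demand <= capacity[link] for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + demand
            return path
    return None  # no path can fit this flow


# Hypothetical two paths from S to D, every link normalized to capacity 1.0.
paths = [["S-agg1", "agg1-core1", "core1-D"], ["S-agg2", "agg2-core2", "core2-D"]]
capacity = {link: 1.0 for path in paths for link in path}
reserved = {}
placements = [global_first_fit(0.6, paths, capacity, reserved) for _ in range(3)]
print(placements)  # first flow on path 0, second on path 1, third does not fit
```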

Hedera: Simulated Annealing
– 4 specifications: state space, neighboring states, energy, temperature
– Simple example: minimizing a function f(x)
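A generic version of the four specifications, applied to the simple example of minimizing f(x), might look like the Python sketch below. The quadratic f(x) = (x - 3)^2, the +/-1 neighbour move, the starting point, and the linear cooling schedule are illustrative assumptions.

```python
import math
import random

def simulated_annealing(energy, start, neighbor, iterations=5000, t0=10.0, seed=0):
    """Minimize energy(state); return the best state seen and its energy."""
    rng = random.Random(seed)
    state, e = start, energy(start)
    best, best_e = state, e
    for k in range(iterations):
        temp = t0 * (iterations - k) / iterations      # cools linearly toward 0
        cand = neighbor(state, rng)
        e_cand = energy(cand)
        # Always accept improvements; accept a worse state with prob exp(-delta / temp).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temp):
            state, e = cand, e_cand
            if e < best_e:
                best, best_e = state, e
    return best, best_e


# Simple example: minimize f(x) = (x - 3)^2 over the integers, moving by +/-1.
best_x, best_f = simulated_annealing(lambda x: (x - 3) ** 2, 40,
                                     lambda x, rng: x + rng.choice((-1, 1)))
print(best_x, best_f)
```

Accepting some uphill moves while the temperature is high is what lets annealing escape local minima; on this single-minimum example it simply walks down to x = 3.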

Hedera: simulated annealing for flow placement
– State: all possible mappings of flows to paths
  – Constrained to reduce the state-space size: flows to the same destination must use the same core switch
– Neighbor state: swap the paths of 2 hosts within the same pod
– Energy function: total exceeded bandwidth capacity, computed from the estimated flow demands; the goal is to minimize it
– Temperature: iterations left (a fixed number of iterations, in the 1000s)
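Mapping these four specifications onto Hedera gives something like the sketch below. It is heavily simplified and hypothetical: each destination host is assigned one core switch, each core switch is modeled as a single link of capacity 1, and the energy is the demand above capacity summed over cores. The names (`anneal_cores`, `demand_to`), the temperature scaling, and all numbers are assumptions.

```python
import math
import random

def exceeded_capacity(assign, demand_to, capacity):
    """Energy: demand above capacity, summed over core switches (simplified model)."""
    load = {}
    for host, core in assign.items():
        load[core] = load.get(core, 0.0) + demand_to[host]
    return sum(max(0.0, total - capacity) for total in load.values())

def anneal_cores(assign, pods, demand_to, capacity=1.0,
                 iterations=1000, t0=0.1, seed=0):
    """State: destination host -> core. Neighbor: swap the cores of 2 hosts in one pod."""
    rng = random.Random(seed)
    cur = dict(assign)
    e = exceeded_capacity(cur, demand_to, capacity)
    best, best_e = dict(cur), e
    for k in range(iterations):
        temp = t0 * (iterations - k) / iterations      # scaled "iterations left"
        pod = rng.choice(pods)
        if len(pod) < 2:
            continue
        h1, h2 = rng.sample(pod, 2)
        cand = dict(cur)
        cand[h1], cand[h2] = cur[h2], cur[h1]
        e_cand = exceeded_capacity(cand, demand_to, capacity)
        if e_cand <= e or rng.random() < math.exp((e - e_cand) / temp):
            cur, e = cand, e_cand
            if e < best_e:
                best, best_e = dict(cur), e
    return best, best_e


# Hypothetical pod with four destination hosts and two core switches.
pods = [["h0", "h1", "h2", "h3"]]
demand_to = {"h0": 0.75, "h1": 0.75, "h2": 0.25, "h3": 0.25}  # estimated demands
assign = {"h0": "c0", "h1": "c0", "h2": "c1", "h3": "c1"}     # c0 is overloaded
best, best_e = anneal_cores(assign, pods, demand_to)
print(best, best_e)
```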

Hedera

JVMPR: Joint VM Placement and Routing
– Goal: efficient traffic engineering under dynamic arrivals and departures of jobs
– One method: localizing traffic through flexible VM placement (node utilization)
– Another method: avoiding congestion through intelligent routing (link utilization)
– The two are coupled with each other

JVMPR
Figure 1: Left: the existing VMs and traffic. Middle: a good VM placement with high congestion. Right: a worse placement with lower congestion. (Legend: existing VM; VM we need to add.)

JVMPR
– JVMPR considers placement and routing at the same time
– It develops an approximation algorithm that leverages the specific structure of the joint design problem

JVMPR: Placement and Route Selection
– Placement: the feasible decision space for VM placement
– Routing: the feasible decision space for routing

JVMPR: Optimize Resource Utilization
– cost_net: network cost, which measures congestion
– cost_node: node cost, the operating cost induced by a switch or a machine
– Goal: minimize the total cost
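The slides do not give the cost functions, so the sketch below uses hypothetical ones purely to show how the two terms trade off: a squared-utilization penalty for cost_net and a fixed charge per active machine for cost_node, added together. With these made-up numbers, consolidating VMs wins when the per-node charge is high and spreading them wins when it is low, echoing the tradeoff in Figure 1.

```python
def total_cost(link_util, active_nodes, node_cost):
    """Hypothetical total cost = cost_net + cost_node (not the functions used in [4])."""
    cost_net = sum(u * u for u in link_util.values())   # penalizes congested links
    cost_node = node_cost * len(active_nodes)           # fixed charge per active node
    return cost_net + cost_node


# Two hypothetical placements of the same traffic.
consolidated = ({"l1": 1.0, "l2": 0.0}, {"m1"})         # one machine, one hot link
spread = ({"l1": 0.5, "l2": 0.5}, {"m1", "m2"})         # two machines, balanced links
for node_cost in (1.0, 0.25):
    print(node_cost, total_cost(*consolidated, node_cost), total_cost(*spread, node_cost))
```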

JVMPR: Any problem? Yes!
– The number of jobs is not fixed: jobs enter or depart the system dynamically
– Better way: an online solution that carries the static problem setting over to a dynamic environment
– Key idea: perform local re-optimization

JVMPR: Online solution algorithm
– Upon a new job arrival, assign the new job to one configuration according to the transition probability
– Upon a job departure, pick one job and migrate it to new machines according to the transition probability
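The transition probability is not spelled out on the slides. The Python sketch below assumes a Markov-approximation style rule in which a candidate configuration with total cost C is chosen with probability proportional to exp(-β·C); the exact transition rates in [4] may differ, and the function names, β values, and configuration labels are hypothetical.

```python
import math
import random

def selection_probabilities(costs, beta):
    """Softmin over candidate configurations: P(i) proportional to exp(-beta * cost_i)."""
    lowest = min(costs)
    weights = [math.exp(-beta * (c - lowest)) for c in costs]  # shift avoids overflow
    total = sum(weights)
    return [w / total for w in weights]

def place_new_job(configurations, cost_of, beta, rng):
    """Randomly pick a configuration for an arriving job, favouring low total cost."""
    probs = selection_probabilities([cost_of(c) for c in configurations], beta)
    return rng.choices(configurations, weights=probs, k=1)[0]


costs = {"cfg1": 1.0, "cfg2": 2.0, "cfg3": 3.0}            # hypothetical total costs
print(selection_probabilities(list(costs.values()), beta=1.0))
print(place_new_job(list(costs), costs.get, beta=3.0, rng=random.Random(0)))
```

Larger β concentrates the choice on the cheapest configuration, while smaller β explores more; because each decision only compares the candidates for one job, it needs only local information.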

JVMPR: Why is the dynamic JVMPR solution appealing?
– It requires no VM migrations when new jobs arrive, and at most one job migration when a job departs
– Computing the migration probability only requires local information

JVMPR
Figure: Performance comparison (axes: maximum core switch utilization; percentage of elephant flows)

JVMPR: What is the price we pay for it?
– The approximated Markov chain no longer converges to the exact stationary distribution, but to a neighborhood around it
– It requires a lot of computation

Summary AlgorithmTopology MovabilityGlobal knowledgeOther idea ECMPY VLBY CamCubeY Y HederaY Y JVMPRYY

Summary: pros and cons
ECMP
– Pros: 1. Simple 2. Works well with mice flows
– Cons: 1. Might cause congestion with elephant flows
VLB
– Pros: 1. Simple 2. Works well with mice flows
– Cons: 1. Might cause congestion with elephant flows
CamCube
– Pros: 1. Flexible
– Cons: 1. Optimization is per service/application; no global optimization is considered
Hedera
– Pros: 1. Can deal with both mice flows and elephant flows
– Cons: 1. The algorithm cannot guarantee a global optimum 2. The assumptions made in demand estimation may not hold
JVMPR
– Pros: 1. Low cost 2. Computation needs only local information
– Cons: 1. Requires a lot of computation 2. It is only an approximation

Open questions/proposals
– Imperfections of current algorithms
  – Hedera: large flow detection is too simple; demand estimation only considers TCP flows
  – JVMPR: demands a lot of computation; it is an approximation
– Current algorithms do not take full advantage of the nice features of data centers
  – Combine topology, movability, and VM placement
  – Add VM placement considerations into Hedera

References
[1] http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/
[2] Moy, John T. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley Professional, 1998.
[3] Chen, Kai, Chengchen Hu, Xin Zhang, Kai Zheng, Yan Chen, and Athanasios V. Vasilakos. "Survey on routing in data centers: insights and future directions." IEEE Network 25, no. 4 (2011): 6-10.
[4] Jiang, Joe Wenjie, Tian Lan, Sangtae Ha, Minghua Chen, and Mung Chiang. "Joint VM placement and routing for data center traffic engineering." In Proceedings of IEEE INFOCOM 2012, pp. 2876-2880. IEEE, 2012.
[5] Abu-Libdeh, Hussam, Paolo Costa, Antony Rowstron, Greg O'Shea, and Austin Donnelly. "Symbiotic routing in future data centers." ACM SIGCOMM Computer Communication Review 41, no. 4 (2011): 51-62.
[6] Farrington, Nathan, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat. "Helios: a hybrid electrical/optical switch architecture for modular data centers." ACM SIGCOMM Computer Communication Review 41, no. 4 (2011): 339-350.
[7] Hopps, Christian E. "Analysis of an equal-cost multi-path algorithm." (2000).
[8] Al-Fares, Mohammad, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. "Hedera: Dynamic Flow Scheduling for Data Center Networks." In NSDI, vol. 10, pp. 19-19. 2010.

Thank you!
