1 A Scalable, Commodity Data Center Network Architecture Jingyang Zhu

2 Outline
Motivation
Background
Fat-Tree Architecture
– Topology
– Routing
– Fault Tolerance
Results

3 Motivation
Large data shuffles in MapReduce-style workloads

4 Intuitive Approach
High-end hardware (e.g., InfiniBand)
– FDR: Fourteen Data Rate
– EDR: Enhanced Data Rate
– HDR: High Data Rate

5 Alternative Approach
A dedicated interconnection network built from commodity switches
– Scalability
– Cost
– Compatibility (i.e., with applications, OS, and hardware)

6 Typical Topology

7 Clos Network
(m, n, r) = (5, 3, 4)
1. Strictly non-blocking: m >= 2n - 1
2. Rearrangeably non-blocking: m >= n

8 Benes Network
A Clos network composed of 2x2 switches

9 Fat Tree
Multi-path routing: uplink (right) + downlink (left)
Oversubscription: the ratio of ideal bandwidth to the bandwidth actually available at the host end, e.g., 1:1 is good (full bisection bandwidth); 5:1 is bad (a host gets only 20% of its ideal bandwidth in the worst case)
Example: Node 1 (0001) -> Node 6 (0110) has 2 possible paths

10 Topology of Data Center - Hierarchy
Conventional hierarchical topology in data centers
Multi-path count = 2
(Figure legend: GigE links at the edge, 10 GigE links toward the core)

11 Topology of Data Center - Fat Tree
Fat-tree topology (k = 4)
– (k/2)^2 k-port core switches
– k pods, each with k/2 k-port aggregation switches and k/2 k-port edge switches
– Number of hosts: k^3 / 4, e.g., k = 48 => 27,648 hosts (scalability!)
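As a sanity check on these counts, here is a minimal Python sketch (the function name is mine, not from the paper) that computes the switch and host counts for a given port count k:

def fat_tree_sizes(k):
    # k-port switches; k must be even for the fat tree to be well formed.
    core = (k // 2) ** 2        # (k/2)^2 core switches
    aggregation = k * (k // 2)  # k pods x k/2 aggregation switches each
    edge = k * (k // 2)         # k pods x k/2 edge switches each
    hosts = k ** 3 // 4         # each edge switch serves k/2 hosts
    return core, aggregation, edge, hosts

print(fat_tree_sizes(48))  # (576, 1152, 1152, 27648)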

12 Addressing - Compatibility!!!
Pod switches: 10.pod.switch.1
Core switches: 10.k.j.i, where k is the switch radix and (j, i) are the switch's coordinates in the (k/2) x (k/2) core grid, with j, i = 1, 2, ..., k/2
(Figure: pod switch addresses 10.0.0.1 through 10.3.3.1 for k = 4)

13 Addressing (cont'd)
Hosts: 10.pod.switch.ID, where ID is the host's position within its edge switch's subnet, starting from 2 (e.g., 10.0.0.2 and 10.0.0.3 sit under switch 0 of pod 0)
The addressing format is designed to support the routing scheme that follows
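To make the scheme concrete, a small sketch (all helper names are my own) that enumerates every address assigned for a given k:

def fat_tree_addresses(k):
    # Pod switches: 10.pod.switch.1; hosts: 10.pod.switch.ID (ID from 2);
    # core switches: 10.k.j.i with j, i in 1..k/2.
    addresses = []
    for pod in range(k):
        for switch in range(k):        # edge: 0..k/2-1, aggregation: k/2..k-1
            addresses.append(f"10.{pod}.{switch}.1")
        for switch in range(k // 2):   # hosts hang off edge switches only
            for host_id in range(2, k // 2 + 2):
                addresses.append(f"10.{pod}.{switch}.{host_id}")
    for j in range(1, k // 2 + 1):
        for i in range(1, k // 2 + 1):
            addresses.append(f"10.{k}.{j}.{i}")
    return addresses

print(len(fat_tree_addresses(4)))  # 16 switches + 16 hosts + 4 core = 36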

14 Two-level table routing - pod switch
Downlinks to hosts: terminating /24 prefixes match the 24 MSBs of the destination (e.g., hosts 10.2.1.2, 10.2.1.3)
Uplinks to core: suffixes match the 8 LSBs (the host ID), spreading traffic across the core
Traffic diffusion occurs only in the first half of a packet's journey

15 Generation of routing tables
addPrefix(pod switch, prefix, port)
addSuffix(pod switch, suffix, port)
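The slide names only the two primitives. Below is a hedged Python sketch of how an aggregation (upper pod) switch's two-level table could be generated and consulted, following the paper's scheme; the list-of-tuples table representation and the route() helper are my assumptions:

def aggregation_table(k, x, z):
    # Table for aggregation switch z (in k/2 .. k-1) of pod x.
    prefixes, suffixes = [], []
    for i in range(k // 2):                        # downlinks to edge subnets
        prefixes.append((f"10.{x}.{i}.0/24", i))   # terminating /24 prefix
    for i in range(2, k // 2 + 2):                 # host IDs are 2 .. k/2+1
        uplink = (i - 2 + z) % (k // 2) + k // 2   # diffuse over uplink ports
        suffixes.append((f"0.0.0.{i}/8", uplink))
    return prefixes, suffixes

def route(table, dst):
    # First level: terminating prefixes; second level: host-ID suffixes.
    prefixes, suffixes = table
    pod, switch, host = dst.split(".")[1:]
    for prefix, port in prefixes:
        if prefix.startswith(f"10.{pod}.{switch}."):
            return port
    for suffix, port in suffixes:
        if suffix.split("/")[0].endswith(f".{host}"):
            return port

table = aggregation_table(k=4, x=0, z=2)
print(route(table, "10.0.1.2"))  # intra-pod: matched /24 prefix -> port 1
print(route(table, "10.2.0.3"))  # inter-pod: matched suffix -> uplink port 3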

16 One-level table routing - core switch

Prefix        Output Port
10.0.0.0/16   0
10.1.0.0/16   1
10.2.0.0/16   2
10.3.0.0/16   3

17 Routing Table Implementation
Content Addressable Memory (CAM)
Input: a search key (the destination address); output: whether it matches an entry, and which one

18 Routing Table Implementation (cont'd)
The address of the matching CAM entry indexes a RAM that stores the next hop and output port (figure: host address 10.2.0.3 matching an entry whose RAM slot selects the port)
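A toy emulation of the CAM-indexes-RAM idea (entirely illustrative: a real CAM matches all entries in parallel, and these entries are made up). Prefix entries mask out the low bits of the 32-bit address; suffix entries mask out the high bits:

CAM = [
    (0x0A020000, 0xFFFFFF00),  # prefix 10.2.0.0/24 -> CAM address 0
    (0x00000003, 0x000000FF),  # suffix .3          -> CAM address 1
]
RAM = [1, 3]  # output port stored at each CAM address

def lookup(dst):
    for address, (value, mask) in enumerate(CAM):
        if dst & mask == value:    # CAM: first matching entry wins
            return RAM[address]    # RAM: matched address -> output port

print(lookup(0x0A020003))  # 10.2.0.3 hits the /24 prefix first -> port 1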

19 Routing Example: Hierarchical Tree
10.0.1.2 -> 10.2.0.3
10.0.1.3 -> 10.2.0.2
In the hierarchical tree, both flows are forced onto the same links toward the root and contend with each other

20 Routing Example: Fat Tree
10.0.1.2 -> 10.2.0.3
10.0.1.3 -> 10.2.0.2
The two-level tables diffuse the flows onto different core paths: no contention!

21 Dynamic Routing
So far, routing has used static tables. Can we do better?
– Yes: dynamic routing
Two mechanisms:
– Flow classification
– Flow scheduling

22 Dynamic Routing 1 - Flow Classification
Flow: a set of packets whose order must be preserved
Dynamic routing goals:
– Avoid reordering packets of the same flow
– Reassign a minimal number of flows to minimize the load disparity between ports
Flow classifier: identifies flows

23 Flow Classification
Identify flows by checking source & destination addresses
Balance port load dynamically: every t seconds, rearrange at most 3 flows per pass
Rearranging a flow risks reordering its packets: this is done for performance, not required for correctness
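A hedged sketch of the periodic rebalancing pass: the "every t seconds" trigger and the 3-flow cap come from the slide, while the data structures and the pick-the-largest-helpful-flow heuristic are my assumptions:

def rebalance(port_load, flow_port, flow_size, max_moves=3):
    # port_load: port -> measured load; flow_port: flow -> assigned port;
    # flow_size: flow -> measured rate. All names are mine.
    for _ in range(max_moves):
        hot = max(port_load, key=port_load.get)   # most loaded uplink
        cold = min(port_load, key=port_load.get)  # least loaded uplink
        gap = port_load[hot] - port_load[cold]
        movable = [f for f, p in flow_port.items()
                   if p == hot and flow_size[f] < gap]
        if not movable:
            break                            # no move would shrink the gap
        f = max(movable, key=flow_size.get)  # biggest flow that still helps
        flow_port[f] = cold                  # reassign: may reorder packets!
        port_load[hot] -= flow_size[f]
        port_load[cold] += flow_size[f]

loads = {0: 900, 1: 100}
flows = {"a": 0, "b": 0, "c": 1}
sizes = {"a": 500, "b": 400, "c": 100}
rebalance(loads, flows, sizes)
print(loads)  # {0: 500, 1: 500}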

24 Dynamic Routing 2 - Flow Scheduling
Large flows are critical - schedule large flows independently

// edge switch
if (length(flow_in) > threshold)
    notify central scheduler
else
    route as normal

// central scheduler
if (notification received)
    foreach possible path
        if (path not reserved)
            reserve the path & notify switches along the path

Data Structure          Function
bool Link[LINKSIZE]     link status
hash Flow               record reservation & clear reservation (retire the flow)
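A runnable Python rendering of the scheduler pseudocode above, using the two data structures from the slide (the boolean link table and the flow hash); how the candidate core paths are enumerated is left as an assumed input:

link_free = {}     # "bool Link[LINKSIZE]": link -> True if unreserved
reservations = {}  # "hash Flow": flow -> list of links reserved for it

def schedule(flow, candidate_paths):
    # Called when an edge switch reports a flow above the size threshold.
    for path in candidate_paths:
        if all(link_free.get(link, True) for link in path):
            for link in path:
                link_free[link] = False  # reserve every link on the path
            reservations[flow] = path    # remember it so we can retire it
            return path                  # notify switches along the path
    return None                          # all candidate paths are contended

def retire(flow):
    # Clear the reservation when the flow ends.
    for link in reservations.pop(flow, []):
        link_free[link] = True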

25 Discussion
Which one is better?
– Flow classification: works locally, within a pod switch
– Flow scheduling: works globally, across all paths and switches

26 Fault Tolerance
How do we detect failed links or switches?
Bidirectional Forwarding Detection (BFD)

27 Fault Tolerance (cont'd)
Basic ideas:
– Mark a failed link unavailable when routing, e.g., set its load to infinity in flow classification
– Broadcast the fault to the other switches so they avoid routing through it

28 Cost
(Figure: cost comparison of the network designs at a 1:1 oversubscription ratio)

29 Power & Heat
(Figure: power consumption and heat dissipation for different switches, including 10 GigE)

30 Performance
(Figure: percentage of ideal bisection bandwidth achieved across different benchmarks)

31 Conclusion
Fat tree for data center interconnection:
– Scalable
– Cost-efficient
– Compatible
Routing handled in detail, both locally and globally
Fault tolerant

