1
A Scalable, Commodity Data Center Network Architecture Presented by Jingyang Zhu
2
Outline Motivation Background Fat Tree Architecture –Topology –Routing –Fault Tolerance Results
3
Motivation Large data shuffles in MapReduce workloads
4
Intuitive Approach High-end hardware (e.g., InfiniBand): FDR (Fourteen Data Rate), EDR (Enhanced Data Rate), HDR (High Data Rate)
5
Alternative Approach A dedicated interconnection network –Scalability –Cost –Compatibility (i.e., app, OS, hardware)
6
Typical Topology
7
Clos Network (m, n, r) = (5, 3, 4): m middle-stage switches, n inputs per ingress switch, r ingress switches 1. strictly non-blocking (m >= 2n - 1) 2. rearrangeably non-blocking (m >= n)
8
Benes Network A Clos Network with 2x2 switches
9
Fat Tree Multi-path routing: uplink (right) + downlink (left) Oversubscription: ratio of ideal BW to the actual BW available at the host end, e.g., 1:1 is ideal; 5:1 means a host sees at most 1/5 of its link bandwidth in the worst case Node 1 (0001) -> Node 6 (0110): 2 possible paths
10
Topo of Data Center - Hierarchy Conventional topo in data centers (multi-path = 2) [Figure legend: GigE link, 10 GigE link]
11
Topo of Data Center - Fat Tree Fat tree topo (k = 4): k pods, each with k/2 k-port edge switches and k/2 k-port aggregation switches; (k/2)^2 k-port core switches # of hosts: k^3 / 4, e.g., k = 48 => 27648 hosts (Scalability!!!)
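To make the scaling concrete, here is a minimal Python sketch (not from the slides; the function name is illustrative) that computes these counts for a given radix k:

    # Fat-tree sizing for k-port switches (k must be even); illustrative helper.
    def fat_tree_sizes(k):
        assert k % 2 == 0, "switch radix k must be even"
        pods = k                       # one core-switch port per pod
        edge_per_pod = k // 2          # each edge switch: k/2 hosts down, k/2 agg up
        agg_per_pod = k // 2
        core = (k // 2) ** 2           # (k/2)^2 core switches
        hosts = k ** 3 // 4            # (k pods) * (k/2 edge) * (k/2 hosts each)
        return pods, edge_per_pod, agg_per_pod, core, hosts

    print(fat_tree_sizes(48))          # (48, 24, 24, 576, 27648)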
12
Addressing - Compatibility!!! Pod switches: 10.pod #.switch #.1 Core switches: 10.k.j.i (k = radix; j, i = the core switch's grid coordinates, j, i = 1, 2, ..., k/2) [Figure: k = 4 fat tree labeled with switch addresses from 10.0.0.1 to 10.3.3.1]
13
Addressing (con't) Hosts: 10.pod #.switch #.ID, with ID = 2, ..., k/2 + 1 (e.g., hosts 10.0.0.2 and 10.0.0.3 hang off switch 0 of pod 0; 10.1.1.2 and 10.1.1.3 off switch 1 of pod 1) The addressing format exists for routing purposes, as the next slides show
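A small Python sketch of the addressing scheme above (helper names are illustrative; the ranges follow the slides and the underlying paper):

    # Enumerate fat-tree addresses for radix k.
    def pod_switch_addrs(k):
        # Pod switches: 10.pod.switch.1, with k switches (edge + agg) per pod
        return [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]

    def core_switch_addrs(k):
        # Core switches: 10.k.j.i with j, i = 1, ..., k/2
        return [f"10.{k}.{j}.{i}"
                for j in range(1, k // 2 + 1) for i in range(1, k // 2 + 1)]

    def host_addrs(k):
        # Hosts: 10.pod.switch.ID with ID = 2, ..., k/2 + 1 (.1 is the edge switch)
        return [f"10.{pod}.{sw}.{hid}"
                for pod in range(k) for sw in range(k // 2)
                for hid in range(2, k // 2 + 2)]

    assert len(host_addrs(4)) == 4 ** 3 // 4   # 16 hosts for k = 4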
14
2-level table routing - pod switch Downlink to hosts: match on the 24 MSBs (/24 prefixes, e.g., 10.2.1.2 and 10.2.1.3 share 10.2.1) Uplink to core: match on the 8 LSBs (/8 suffixes, i.e., the host ID) Traffic diffusion occurs only in the first (upward) half of a packet's journey
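A Python sketch of the two-level lookup, assuming prefix entries keyed by the first three octets and suffix entries keyed by the host-ID octet (all names and the example tables are illustrative):

    # Terminating /24 prefixes first (downlink), then /8 suffixes (uplink).
    def lookup(dst, prefixes, suffixes):
        o1, o2, o3, o4 = dst.split(".")
        subnet = f"{o1}.{o2}.{o3}"      # the 24 MSBs
        if subnet in prefixes:          # intra-pod: route down toward the subnet
            return prefixes[subnet]
        return suffixes[int(o4)]        # inter-pod: diffuse by host ID (8 LSBs)

    # e.g., an aggregation switch in pod 2 (k = 4), ports 0-1 down, 2-3 up:
    prefixes = {"10.2.0": 0, "10.2.1": 1}
    suffixes = {2: 2, 3: 3}
    assert lookup("10.2.1.3", prefixes, suffixes) == 1   # stays inside pod 2
    assert lookup("10.0.1.2", prefixes, suffixes) == 2   # goes up toward the core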
15
Generation of routing tables addPrefix(pod switch, prefix, port) addSuffix(pod switch, suffix, port)
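A sketch of what addPrefix/addSuffix could produce for an aggregation switch z in pod x, following the generation scheme in the underlying paper (function and variable names are illustrative):

    # Build the two-level table for aggregation switch z in pod x (radix k).
    def build_agg_table(k, x, z):
        prefixes, suffixes = {}, {}
        for sw in range(k // 2):
            # addPrefix(10.x.z.1, 10.x.sw.0/24, sw): downlink to each edge subnet
            prefixes[f"10.{x}.{sw}"] = sw
        for hid in range(2, k // 2 + 2):
            # addSuffix(10.x.z.1, 0.0.0.hid/8, port): stagger uplinks by host ID
            suffixes[hid] = (hid - 2 + z) % (k // 2) + k // 2
        return prefixes, suffixes

    print(build_agg_table(4, 2, 0))   # ({'10.2.0': 0, '10.2.1': 1}, {2: 2, 3: 3})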
16
1-level table routing - core switch
Prefix      | Output Port
10.0.0.0/16 | 0
10.1.0.0/16 | 1
10.2.0.0/16 | 2
10.3.0.0/16 | 3
17
Routing Table Implementation Content Addressable Memory (CAM) Input: data; output: the address of the matching entry (or a mismatch signal)
18
Routing Table Implementation (con't) The CAM is searched by host address; a match returns a RAM address where the output port is stored [Figure: CAM lookup of host address 10.2.0.3 returning a RAM address]
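A toy software model of the CAM + RAM pair; a real TCAM matches all entries in parallel and a priority encoder picks the first hit. All concrete entries below are illustrative:

    # Ordered (pattern, ram_addr) entries; 'X' is a wildcard byte.
    CAM = [("10.2.0.X", 0b00), ("10.2.1.X", 0b01),   # left-handed /24 prefixes
           ("X.X.X.2",  0b10), ("X.X.X.3",  0b11)]   # right-handed /8 suffixes
    RAM = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 3}       # RAM address -> output port

    def cam_lookup(addr):
        for pattern, ram_addr in CAM:                # first match wins
            if all(p in ("X", a)
                   for p, a in zip(pattern.split("."), addr.split("."))):
                return ram_addr
        return None

    port = RAM[cam_lookup("10.2.0.3")]   # prefix 10.2.0.X wins over X.X.X.3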
19
Routing Example: Hierarchical Tree 10.0.1.2 -> 10.2.0.3 and 10.0.1.3 -> 10.2.0.2: both flows are forced onto the same links up the tree (contention)
20
Routing Example: Fat Tree 10.0.1.2 -> 10.2.0.3 and 10.0.1.3 -> 10.2.0.2: the two flows diffuse onto different core switches - No Contention!!!
21
Dynamic Routing Up to now, the routing algorithm is based on static tables... any improvement??? –Yes, dynamic routing Dynamic routing –Flow Classification –Flow Scheduling
22
Dynamic Routing 1 - Flow Classification Flow: a set of packets whose order must be preserved Goals –Avoid reordering within the same flow –Reassign a minimum number of flows to minimize the load disparity between ports Flow classifier: identifies flows
23
Flow Classification Check src & dst addresses to identify flows Balance the port load dynamically while avoiding reordering: every t seconds rearrange flows, at most 3 flows per round Rearranging still risks reordering a flow!!! - acceptable, since classification is for performance, not correctness (see the sketch below)
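A hedged Python sketch of that rebalancing loop (every name here is illustrative, and flows are treated as unit-cost for simplicity):

    import time

    # flow_port maps flow -> uplink port; port_load maps port -> load estimate.
    def rebalance_forever(flow_port, port_load, t=1.0, max_moves=3):
        while True:
            time.sleep(t)                  # rearrange every t seconds
            for _ in range(max_moves):     # at most max_moves flows per round
                hi = max(port_load, key=port_load.get)
                lo = min(port_load, key=port_load.get)
                movable = [f for f, p in flow_port.items() if p == hi]
                if not movable or port_load[hi] <= port_load[lo]:
                    break                  # ports already balanced
                flow = movable[0]          # moving it may reorder its packets!
                flow_port[flow] = lo
                port_load[hi] -= 1         # unit-cost flows, for the sketch
                port_load[lo] += 1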
24
Dynamic Routing 2 - Flow Scheduling Large flows are critical - schedule large flows independently
// edge switches
if (length(flow_in) > threshold) notify central scheduler
else route as normal
// central scheduler
if (receive notification)
  foreach possible path
    if (path not reserved) reserve the path & notify switches along the path
Data Structure       | Function
bool Link[LINK_SIZE] | link status
hash<Flow>           | record reservation & clear reservation (retire the flow)
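A Python sketch of the central scheduler's reserve/retire logic under the data structures above (names are illustrative; reserved_links stands in for bool Link[LINK_SIZE], flow_paths for the flow hash):

    reserved_links = set()     # links currently reserved for a large flow
    flow_paths = {}            # flow -> path (tuple of links) reserved for it

    def schedule(flow, candidate_paths):
        for path in candidate_paths:
            if not any(link in reserved_links for link in path):
                reserved_links.update(path)  # reserve every link on the path
                flow_paths[flow] = path
                return path      # would notify the switches along the path
        return None              # no free path: keep the default route

    def retire(flow):
        # Clear the reservation when the flow ends.
        reserved_links.difference_update(flow_paths.pop(flow, ()))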
25
Discussion Which one is better? –Flow classification: acts locally, within the pod switches –Flow scheduling: acts globally, across all paths and switches
26
Fault Tolerance How do we detect failed links or switches? Bidirectional Forwarding Detection (BFD)
27
Fault Tolerance (con't) Basic ideas –Mark the failed link unavailable when routing, e.g., set its load to infinity in flow classification –Broadcast the fault to the other switches so they avoid routing through it
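A minimal Python sketch of how a switch might react to a BFD link-down event, assuming a hypothetical peer-notification call:

    import math

    def on_link_down(link, port_load, peer_switches):
        port_load[link] = math.inf        # flow classification now avoids it
        for peer in peer_switches:        # broadcast so others route around it
            peer.mark_unreachable(link)   # hypothetical notification method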
28
Cost [Figure: cost comparison at 1:1 oversubscription]
29
Power & Heat [Figure: power and heat dissipation for different switches, including 10 GigE]
30
Performance [Figure: percentage of ideal bisection bandwidth achieved across different benchmarks]
31
Conclusion Fat tree for data center interconnection –Scalable –Cost efficient –Compatible Routing details, locally & globally Fault tolerant