Published by Vera Tanudjaja. Modified over 5 years ago.
1
Dragonfly+: Low Cost Topology for scaling Datacenters
Authors: Alexander Shpiner, Zachy Haramaty, Saar Eliad, Vladimir Zdornov, Barak Gafni and Eitan Zahavi
2
Outline:
Topology
Fully Progressive Adaptive Routing
Analytical and Simulative Analysis
Conclusion
3
Topology: Dragonfly and Fat Trees
Figures: Dragonfly Topology; Fat Tree Topology
4
Topology Dragonfly+
5
Topology Dragonfly+
6
Topology Dragonfly+
7
Topology
To keep full bisection bandwidth inside the group:
(1) p = l = s = h
(2) p = h = k/2
(3) N_group = p * l = k^2 / 4
where
l: number of leaf routers
p: number of hosts per leaf router
s: number of spine routers
h: number of global links per spine router
k: router radix
N_group: number of hosts in the group
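The balance conditions above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the presentation; the function name `dragonfly_plus_group` and the dictionary layout are assumptions made for illustration.

```python
def dragonfly_plus_group(k: int) -> dict:
    """Balanced Dragonfly+ group parameters for router radix k.

    Hypothetical helper: it applies conditions (1)-(3) from the slide.
    """
    if k <= 0 or k % 2:
        raise ValueError("router radix k must be a positive even integer")
    half = k // 2
    # Conditions (1) and (2): p = l = s = h = k/2
    # Condition (3): N_group = p * l = k^2 / 4
    return {"p": half, "l": half, "s": half, "h": half, "n_group": half * half}


params = dragonfly_plus_group(36)
print(params["n_group"])  # 324
```

For k = 36 this gives 324 hosts per group, so four such groups hold 1296 hosts, which matches the network size used in the uniform random traffic simulation later in the deck.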
8
Routing
9
Deadlock Avoidance
(1) A packet that traverses the minimal route does not change its VL.
(2) A packet that traverses the intermediate spine route changes its VL in the intermediate spine router.
(3) A packet that traverses the intermediate leaf route changes its VL in the intermediate leaf router.
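The three rules can be illustrated with a small sketch that labels each link of a route with the VL the packet uses on it. This is an illustration under assumptions, not the authors' implementation: the slide only says the VL "changes", which is modelled here as an increment from 0 to 1, and the router names and the function `assign_vls` are hypothetical.

```python
def assign_vls(path, intermediate=None, start_vl=0):
    """Return the VL used on each link of `path`.

    path: list of router names from source leaf to destination leaf.
    intermediate: the intermediate spine or leaf router of a non-minimal
        route, or None for a minimal route.
    Assumption: "changes its VL" is modelled as incrementing the VL by one.
    """
    vl = start_vl
    vls = []
    for router in path[:-1]:          # one entry per link leaving `router`
        if router == intermediate:
            vl += 1                   # rules (2) and (3): change VL here
        vls.append(vl)
    return vls


# Rule (1): minimal route, VL never changes.
print(assign_vls(["leaf_a", "spine_a", "spine_b", "leaf_b"]))
# Rule (2): route through an intermediate spine router in a third group.
print(assign_vls(["leaf_a", "spine_a", "spine_c", "spine_b", "leaf_b"],
                 intermediate="spine_c"))
```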
10
Routing
Is min-routing optimal?
A non-minimal route is chosen if all egress queues on the minimal routes are longer than T and there is an egress queue on a non-minimal route that is shorter than T, where T is the queue length threshold. Routing decisions are evaluated in every router on the packet's path.
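The threshold rule above can be sketched as a per-router port selection function. This is a hedged illustration: the function name `choose_port`, the dictionary representation of queues, and the tie-break (pick the shortest queue among eligible ports) are assumptions that the slide does not specify.

```python
def choose_port(min_queues, nonmin_queues, threshold):
    """Pick an egress port following the threshold rule.

    min_queues / nonmin_queues: dict mapping port name -> egress queue length
        for ports on minimal and non-minimal routes respectively.
    threshold: the queue length threshold T.
    Assumption: among eligible ports, the one with the shortest queue wins.
    """
    if not min_queues:
        raise ValueError("at least one minimal-route port is required")
    all_min_congested = all(q > threshold for q in min_queues.values())
    free_nonmin = {p: q for p, q in nonmin_queues.items() if q < threshold}
    if all_min_congested and free_nonmin:
        # Every minimal queue exceeds T and some non-minimal queue is below T.
        return min(free_nonmin, key=free_nonmin.get)
    # Otherwise stay on a minimal route.
    return min(min_queues, key=min_queues.get)


print(choose_port({"m1": 10, "m2": 12}, {"n1": 3}, threshold=8))  # n1
print(choose_port({"m1": 10, "m2": 5}, {"n1": 0}, threshold=8))   # m2
```

Because the decision is evaluated in every router on the path, a function like this would be called again at each hop with that router's own queue lengths.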
11
Routing Fully Progressive Adaptive Routing (FPAR-Rules)
12
Routing Fully Progressive Adaptive Routing (FPAR-Rules)
13
Routing
How does it handle remote congestion?
ARN (Adaptive Routing Notification)
ARN messages are sent between routers to report distant congestion that can be resolved by a previous router on the route.
ARN message: destination address A and incoming port P.
The router excludes port P from the list of possible ports for packets destined to A for a predefined time.
If P is the only port, packets are queued and an ARN message is sent to the previous router.
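The ARN handling described above can be sketched as a small table of temporarily excluded ports. This is an illustrative model only: the class `ArnRouter`, its method names, and the use of a plain numeric clock are assumptions, and the sketch forwards the ARN whenever no candidate port remains, which covers the "only port" case on the slide.

```python
class ArnRouter:
    """Toy model of per-destination port exclusion driven by ARN messages."""

    def __init__(self, ports_for, exclude_time):
        self.ports_for = ports_for        # destination -> list of candidate ports
        self.exclude_time = exclude_time  # the "predefined time" (assumed units)
        self.excluded_until = {}          # (destination, port) -> expiry time

    def candidate_ports(self, dest, now):
        """Ports currently allowed for packets destined to `dest`."""
        return [p for p in self.ports_for[dest]
                if self.excluded_until.get((dest, p), float("-inf")) <= now]

    def on_arn(self, dest, port, now):
        """Handle an ARN for `dest` received on `port`.

        Returns True when no port is left, meaning packets must be queued
        and an ARN should be sent to the previous router.
        """
        self.excluded_until[(dest, port)] = now + self.exclude_time
        return not self.candidate_ports(dest, now)


router = ArnRouter({"A": [1, 2]}, exclude_time=10)
print(router.on_arn("A", port=1, now=0))   # False: port 2 is still usable
print(router.candidate_ports("A", now=5))  # [2]
```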
14
Analytical and Simulative Analysis
Analytical Analysis
Topologies compared, assuming a router radix of 36: Dragonfly+, Dragonfly, 3-level Fat Tree with 2:1 blocking ratio, 3-level non-blocking Fat Tree, Slimfly.
Metrics:
Scalability: maximal number of hosts
Cost: number of hosts per router
Locality: full-bisection group size (number of hosts inside the group)
Network throughput
Number of VLs
Diameter and maximal assured route length
15
Analytical Analysis: Scalability
Fig A. Maximal network size in number of hosts vs. router radix (k).
Fig B. Group size in number of hosts vs. router radix (k).
The Dragonfly+ and non-blocking Fat Tree curves coincide.
16
Analytical Analysis
17
Simulative Analysis (over an OMNeT++-based infrastructure)
Uniform Random Traffic (hosts inject packets to random destinations): DF+ network of 1296 hosts with k=36 and 4 groups. Each spine router of a group is connected by six parallel links to a spine router in each other group.
Permutation Traffic (100 randomized permutations were simulated and the single permutation with the worst performance was selected): maximal Dragonfly+ network of radix k=8 routers, 272 hosts.
Speedup analysis: Static, Random, and Adaptive routing schemes with permutation traffic of 8KB, 256KB, and 1MB messages.
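The two traffic patterns can be sketched with the Python standard library. This is not the simulator used in the presentation; the function names are hypothetical, excluding a host from its own destination set is an assumption, and the permutation generator does not filter out hosts that map to themselves.

```python
import random


def uniform_random_destination(src, n_hosts, rng):
    """Pick a destination uniformly among all hosts except `src` (assumed)."""
    dst = rng.randrange(n_hosts - 1)
    return dst if dst < src else dst + 1


def random_permutation(n_hosts, rng):
    """Return a list where entry i is the single destination of host i."""
    perm = list(range(n_hosts))
    rng.shuffle(perm)
    return perm


rng = random.Random(0)
# Uniform random traffic on the 1296-host network from the slide.
dst = uniform_random_destination(src=5, n_hosts=1296, rng=rng)
# 100 randomized permutations on the 272-host network; the worst one would
# then be selected by simulating each (the simulation itself is not shown).
permutations = [random_permutation(272, rng) for _ in range(100)]
```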
18
Simulative Analysis: Uniform Random Traffic
Fig: End-to-end network latency vs. load with uniform random traffic.
19
Simulative Analysis: Permutation Traffic
Fig A: Speedup of the permutation pattern with various routing schemes and message sizes.
Fig B: Mean end-to-end network latency vs. load with permutation traffic and a message size of 1MB.
20
Conclusions
Presented a novel Fully Progressive Adaptive Routing technique.
Dragonfly+ is 4 times more scalable than Dragonfly at the same cost.
Dragonfly+ provides the same or better throughput than equivalent Dragonfly and Fat Tree networks under various traffic patterns.