How to Train your Dragonfly


1 How to Train your Dragonfly
EE382C Final Presentation, May 24, 2011
Hyungmin Cho, Andrew Danowitz, Mario Flajslik, Amimul Ihsan

2 Outline
- Topology
- Routing
- Flow Control
- Hotspot Management
- Status

3 Topology
- Dragonfly
  - Minimizes expensive global communication
  - No more than 4 hops
- Modification: each node is connected to two routers
Source: Lecture 7 notes

4 Topology
- Assumptions
  - Per Node Traffic:
  - All intergroup connections in optical cables
  - Routers can have up to 107 ports
- Resulting design
  - a=26, p=27, h=4
Table columns: a, p, h_r, h_g, Nodes per Group, Groups, Total Cost ($), Global Bandwidth (GB), Required Average (GB), Min Router Ports, Max Router Ports, Endpoint Ports, Router Ports, Router Ports for global connection
Table values: 64 32 1 48 2048 49 127,478 240 1004 127 128 26 40 4 97 1024 98 478,485 485 507 105 106 80 30 1025 508 85 86 60 27 1026 79 54 13 1027 509 51 52

5 Routing
[Figure: groups connected by a global network, each group with its routers on a local network; labels mark the routing decision and the potential congestion.]
Figure modified from: Jiang, Dally, Kim: Indirect Adaptive Routing on Large Scale Interconnection Networks

6 Routing
- UGAL-L: globally adaptive routing that chooses between:
  - MIN: minimal path
  - VAL: non-minimal path, routing to a random group first (Valiant load balancing)
- Choice made based on local queue information: q_min × H_min compared to q_val × H_val
- Problems with limited throughput and higher intermediate latency
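
The comparison on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `ugal_l_choose` and `pick_intermediate_group` are hypothetical, ties are broken in favor of the minimal path (the slide does not say how ties are handled), and the intermediate group is drawn uniformly from all groups other than the source and destination.

```python
import random


def ugal_l_choose(q_min, h_min, q_val, h_val):
    """Pick 'MIN' or 'VAL' from local queue depth times hop count.

    q_min, q_val: local queue occupancy on the minimal / Valiant output.
    h_min, h_val: hop counts of the minimal / Valiant path.
    Assumption: a tie goes to the minimal path.
    """
    return "MIN" if q_min * h_min <= q_val * h_val else "VAL"


def pick_intermediate_group(num_groups, src_group, dst_group, rng=random):
    """Choose a random intermediate group for Valiant routing.

    Assumption: the source and destination groups are excluded.
    """
    candidates = [g for g in range(num_groups)
                  if g not in (src_group, dst_group)]
    return rng.choice(candidates)
```

For example, `ugal_l_choose(10, 3, 1, 5)` compares 30 against 5 and selects the Valiant path, while `ugal_l_choose(2, 3, 4, 5)` compares 6 against 20 and stays minimal.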

7 Routing
- Problem: limited throughput due to imperfect load balancing in UGAL-L
  - UGAL-L will never route non-minimally through the same router that is used for minimal routing
- Solution: UGAL-L using selective virtual-channel discrimination
Figure modified from: John Kim, William J. Dally, Steve Scott, and Dennis Abts, Technology-Driven, Highly-Scalable Dragonfly Topology.

8 Routing
- Problem: high intermediate latency, because buffers have to fill up before congestion is sensed
  - Buffers still need to be sized correctly to achieve maximum throughput
- Solution: use credit round-trip latency to sense and signal congestion
Figure from: John Kim, William J. Dally, Steve Scott, and Dennis Abts, Technology-Driven, Highly-Scalable Dragonfly Topology.
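
One way to picture sensing congestion from credit round-trip latency is to timestamp each flit when it is sent and compare the time until its credit comes back against the zero-load round trip. The sketch below is a simplified illustration, not the mechanism from the cited paper: the class `CreditRttSensor` and its method names are hypothetical, and it assumes credits return in the same order the flits were sent.

```python
from collections import deque


class CreditRttSensor:
    """Estimate congestion on one output from credit round-trip times."""

    def __init__(self, zero_load_rtt):
        # Round-trip time (in cycles) observed on an idle channel.
        self.zero_load_rtt = zero_load_rtt
        # Send times of flits whose credits have not come back yet.
        self.sent = deque()
        self.congestion = 0

    def on_flit_sent(self, now):
        self.sent.append(now)

    def on_credit_returned(self, now):
        # Assumption: credits come back in FIFO order.
        rtt = now - self.sent.popleft()
        # Extra delay beyond the zero-load round trip is treated
        # as the congestion estimate.
        self.congestion = max(0, rtt - self.zero_load_rtt)
        return self.congestion
```

With a zero-load round trip of 4 cycles, a credit that returns 11 cycles after its flit was sent yields a congestion estimate of 7, without waiting for any buffer to fill.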

9 Flow Control
- Virtual Channel Flow Control
  - Basic virtual-channel flow control with credit-based backpressure
  - 6 VCs
    - 3 for standard traffic
    - 3 for hotspot traffic
- Exploring Packet Sizes
  - Running simulations with different packet sizes
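
A minimal sketch of credit-based backpressure with the VCs split between the two traffic classes is shown here. It makes several assumptions: the class `CreditedLink` and its methods are hypothetical, VCs 0-2 are taken to be the standard set and VCs 3-5 the hotspot set, and credits are tracked per flit with no per-packet VC allocation.

```python
class CreditedLink:
    """Upstream view of one link with per-VC credit counters."""

    def __init__(self, num_vcs=6, buffer_flits=256):
        half = num_vcs // 2
        # One credit per free flit slot in the downstream VC buffer.
        self.credits = [buffer_flits] * num_vcs
        # Assumption: lower half for standard, upper half for hotspot.
        self.vcs = {
            "standard": range(0, half),
            "hotspot": range(half, num_vcs),
        }

    def try_send_flit(self, traffic_class):
        """Send on the first VC of the class that has a credit.

        Returns the VC index used, or None if the class is blocked.
        """
        for vc in self.vcs[traffic_class]:
            if self.credits[vc] > 0:
                self.credits[vc] -= 1
                return vc
        return None

    def return_credit(self, vc):
        """Downstream freed one flit slot on this VC."""
        self.credits[vc] += 1
```

Because each class draws only from its own VC set, exhausting the hotspot credits blocks hotspot flits while standard flits can still be sent.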

10 Hotspot Management
- Tree saturation problem
  - Worse with more path diversity
- Non-interfering networks
  - Separate VCs for hotspot and non-hotspot traffic
- Class separation: in the project, hotspot traffic is easily distinguished and hotspot nodes are assigned statically
Figure taken from: EE382C Lecture 15 slides

11 Hotspot Management
- Dynamic hotspot detection
  - Still use class separation to manage hotspots
- Statically assigned (or slow changing) hotspot nodes
  - Detect hotspots at last-hop routers (by counting packets) and propagate the information through the network
  - Inspect queues for multiple packets going to the same destination, which is then likely to be a hotspot
- Fast changing hotspot nodes
  - Assumption: traffic to a node is going to spike after the node becomes a hotspot
  - Detect spikes by counting packets and looking for per-destination peaks
- Use more virtual channels
  - The impractical case of one VC per destination would solve the problem
- Use higher-level QoS to do class separation
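
The two counting ideas above (per-destination counts and per-destination peaks) could look roughly like this. The function names, the `threshold` and `factor` parameters, and the notion of comparing two counting windows are illustrative assumptions; the slides do not give concrete values or a window scheme.

```python
from collections import Counter


def detect_hotspots(destinations, threshold):
    """Flag destinations seen at least `threshold` times.

    destinations: destination IDs of packets seen at a router
    (for example, the packets currently in its queues).
    """
    counts = Counter(destinations)
    return {dest for dest, n in counts.items() if n >= threshold}


def detect_spikes(previous, current, factor=2):
    """Flag destinations whose count jumped between two windows.

    previous, current: Counter objects for consecutive windows.
    Assumption: a spike is a count at least `factor` times the
    previous window's count (treating a previous count of 0 as 1).
    """
    return {dest for dest, n in current.items()
            if n >= factor * max(previous.get(dest, 0), 1)}
```

For the packet stream `[1, 2, 2, 2, 3, 2]` and a threshold of 3, only destination 2 is flagged.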

12 Status
- Bugs squashed to date: 2
- Topology
- Routing
- Flow Control

13 Status: Topology
- In progress
- Changing:
  - Routers per group no longer 2a
  - Number of groups no longer a*p+1
  - Each node connected to two routers
- Downsized network of 1,024 nodes
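
For reference, the sizing of the unmodified dragonfly from the Kim, Dally, Scott, and Abts paper cited earlier can be computed as below. This is a sketch of that baseline only, not of the modified topology in this project: it assumes one router per node connection and a maximum-size network with a*h + 1 groups (equal to a*p + 1 in the balanced case p = h). The function name `dragonfly_size` is hypothetical.

```python
def dragonfly_size(a, p, h):
    """Size a maximum-size baseline dragonfly.

    a: routers per group
    p: terminals (nodes) per router
    h: global channels per router
    """
    groups = a * h + 1              # every group links to every other
    nodes_per_group = a * p
    return {
        "groups": groups,
        "nodes_per_group": nodes_per_group,
        "total_nodes": nodes_per_group * groups,
        # p terminal ports + (a - 1) local ports + h global ports
        "router_radix": p + (a - 1) + h,
    }
```

With a = 4 and p = h = 2, this gives 9 groups, 72 nodes, and radix-7 routers.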

14 Status: Traffic Pattern
- 4 kinds of traffic patterns to implement
  - 3 patterns complete
  - Bit-reversal traffic pending
    - Requires the number of nodes to be a power of 2
- Iteration of 30 request-reply pairs
- TrafficManager class has been modified extensively
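
The power-of-2 requirement follows from how bit-reversal picks a destination: the source ID is written in log2(N) bits and the bit order is reversed. A minimal sketch, with the hypothetical function name `bit_reversal_dest`:

```python
def bit_reversal_dest(src, num_nodes):
    """Destination of node `src` under bit-reversal traffic.

    num_nodes must be a power of 2 so that every node ID fits in
    exactly log2(num_nodes) bits and the mapping is a permutation.
    """
    if num_nodes < 2 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of 2")
    bits = num_nodes.bit_length() - 1
    dest = 0
    for i in range(bits):
        if src & (1 << i):
            # Bit i of the source becomes bit (bits - 1 - i).
            dest |= 1 << (bits - 1 - i)
    return dest
```

In an 8-node network, node 3 (binary 011) sends to node 6 (binary 110). A 1,024-node network satisfies the requirement, with 10-bit node IDs.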

15 Status: Routing
- UGAL-L algorithm
  - Default function in Dragonfly.cpp
  - Minimal routing okay on uniform traffic
- Working on UGAL-L_CR
  - Credit mechanism needs to be changed

16 Status: Flow Control
- VC size: 256 flits
- Non-interfering networks
  - Separate VC set for the hotspot traffic class
  - 3 VCs are dedicated exclusively to hotspot traffic
- Divide the messages into packets
  - Started requests and replies at {10,10,10}
  - Iterating size to {20,20,20}, etc.

17 Questions

