How to Train your Dragonfly

EE382C Final Presentation, May 24, 2011
Hyungmin Cho, Andrew Danowitz, Mario Flajslik, Amimul Ihsan

Outline: Topology, Routing, Flow Control, Hotspot Management, Status

Topology
- Dragonfly minimizes expensive global communication
- No more than 4 hops
- Modification: each node is connected to two routers
Source: Lecture 7 notes

Topology
Assumptions:
- Per-node traffic: 5 GB/s/node @ 10%
- All intergroup connections use optical cables
- Routers can have up to 107 ports
Resulting design: a = 26, p = 27, h = 4
[Table: candidate configurations compared by a, p, h_r, h_g, nodes per group, groups, total cost ($), global bandwidth (GB), required average bandwidth (GB), minimum and maximum router ports, endpoint ports, router ports, and router ports for global connections]
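The design parameters a (routers per group), p (terminals per router) and h (global channels per router) determine router radix and system size. The sketch below uses the textbook single-attachment dragonfly relations from Kim et al. 2008 (radix k = p + a - 1 + h, at most a*h + 1 groups, N = a*p*(a*h + 1) terminals); it is an assumption-labeled reference point only and does not model the project's two-routers-per-node modification. `DragonflySize` and `dragonfly_size` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DragonflySize:
    radix: int      # router ports: p terminal + (a - 1) local + h global
    groups: int     # maximum number of groups: a*h + 1
    terminals: int  # total endpoints: a * p * groups


def dragonfly_size(a: int, p: int, h: int) -> DragonflySize:
    """Size a canonical dragonfly (assumption: one router per terminal,
    one global channel between every pair of groups)."""
    if min(a, p, h) < 1:
        raise ValueError("a, p and h must be positive")
    radix = p + (a - 1) + h
    groups = a * h + 1
    return DragonflySize(radix=radix, groups=groups, terminals=a * p * groups)


# Balanced example (a = 2p = 2h), not the project's design
print(dragonfly_size(a=8, p=4, h=4))
# DragonflySize(radix=15, groups=33, terminals=1056)
```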

Routing
[Figure: a routing decision in the source group chooses between paths through the global network from Group 1 to Group 2, with local networks connecting the routers inside each group; the minimal global channel is marked as potential congestion. Figure modified from: Jiang, Dally, Kim, Indirect Adaptive Routing on Large Scale Interconnection Networks.]

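Routing
- UGAL-L: globally adaptive routing that chooses between:
  - MIN: the minimal path
  - VAL: a non-minimal path that first routes to a random intermediate group (Valiant load balancing)
- The choice uses local queue information: route minimally if $q_{\min} H_{\min} \le q_{\mathrm{val}} H_{\mathrm{val}}$, otherwise non-minimally, where q is the occupancy of the local output queue toward each path and H is the path's hop count
- Problems: limited throughput and higher latency at intermediate loads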
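A minimal sketch of the UGAL-L decision rule stated above, plus the Valiant step of picking a random intermediate group. It assumes queue occupancies are already available as flit counts and that ties go to the minimal path; the function names are hypothetical and are not the project's Dragonfly.cpp code.

```python
import random


def ugal_l_route(q_min, h_min, q_val, h_val):
    """UGAL-L: compare local output-queue occupancy weighted by hop count.

    q_min / q_val: flits queued toward the minimal / Valiant path.
    h_min / h_val: hop counts of those paths. Ties favor MIN.
    """
    return "MIN" if q_min * h_min <= q_val * h_val else "VAL"


def pick_intermediate_group(num_groups, src_group, dst_group, rng=random):
    """Valiant step: pick a random group other than source and destination."""
    candidates = [g for g in range(num_groups) if g not in (src_group, dst_group)]
    if not candidates:
        raise ValueError("need at least one group besides source and destination")
    return rng.choice(candidates)


print(ugal_l_route(q_min=20, h_min=3, q_val=5, h_val=5))  # 60 > 25  -> VAL
print(ugal_l_route(q_min=4, h_min=3, q_val=5, h_val=5))   # 12 <= 25 -> MIN
```

Weighting by hop count approximates the total delay along each path, which is why a lightly loaded but longer Valiant path only wins once the minimal queue is substantially fuller.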

Routing
- Problem: limited throughput due to imperfect load balancing in UGAL-L
  - UGAL-L never routes non-minimally through the same router that is used for minimal routing: when both paths share the same output queue, q_min = q_val, and since H_min < H_val the minimal path always wins
- Solution: UGAL-L with selective virtual-channel discrimination
Figure modified from: John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly Topology.
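A sketch of selective VC discrimination, assuming it works as a hybrid: when the minimal and non-minimal paths leave through the same output port, compare the occupancy of the specific VCs each path would use; otherwise fall back to whole-port occupancy as in plain UGAL-L. This appears to correspond to the UGAL-L_VC_H variant in Kim et al. 2008, but the data layout (`port_occupancy`, `vc_occupancy`) and the VC chosen per path are hypothetical.

```python
def ugal_l_vc_h(min_port, val_port, port_occupancy, vc_occupancy,
                min_vc, val_vc, h_min, h_val):
    """Hybrid UGAL-L with selective VC discrimination (sketch).

    port_occupancy[port]: total flits queued at an output port.
    vc_occupancy[(port, vc)]: flits queued in one VC of that port.
    min_vc / val_vc: hypothetical VCs used by minimal / Valiant traffic.
    """
    if min_port == val_port:
        # Same port: whole-port occupancy cannot tell the paths apart,
        # so compare the per-VC queues instead.
        q_min = vc_occupancy[(min_port, min_vc)]
        q_val = vc_occupancy[(val_port, val_vc)]
    else:
        q_min = port_occupancy[min_port]
        q_val = port_occupancy[val_port]
    return "MIN" if q_min * h_min <= q_val * h_val else "VAL"


port_occ = {7: 32}
vc_occ = {(7, 0): 30, (7, 1): 2}
# Plain UGAL-L would compare 32*3 <= 32*5 and always pick MIN here.
print(ugal_l_vc_h(7, 7, port_occ, vc_occ, min_vc=0, val_vc=1, h_min=3, h_val=5))
```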

Routing
- Problem: high latency at intermediate loads, because buffers must fill up before congestion is sensed
  - Buffers still need to be sized correctly to achieve maximum throughput
- Solution: use credit round-trip latency to sense and signal congestion
Figure from: John Kim, William J. Dally, Steve Scott, and Dennis Abts. 2008. Technology-Driven, Highly-Scalable Dragonfly Topology.
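The idea behind sensing congestion via credits: a credit returns only when the flit leaves the downstream buffer, so downstream queuing shows up as a longer credit round trip before the local buffer fills. The sketch below only measures that delay. It assumes credits for a VC return in FIFO order and that `base_rtt` (the uncongested round trip) is known; the class and method names are hypothetical, and how the estimate is fed back into routing is not specified here.

```python
from collections import deque


class CreditRoundTripMonitor:
    """Estimate downstream congestion from credit round-trip latency (sketch).

    Assumes the downstream buffer is FIFO, so each returning credit matches
    the oldest outstanding send timestamp.
    """

    def __init__(self, base_rtt):
        self.base_rtt = base_rtt    # uncongested round trip, in cycles
        self.outstanding = deque()  # send cycles of flits awaiting a credit
        self.last_rtt = base_rtt

    def on_flit_sent(self, cycle):
        self.outstanding.append(cycle)

    def on_credit_returned(self, cycle):
        sent = self.outstanding.popleft()
        self.last_rtt = cycle - sent

    def congestion(self):
        """Extra cycles beyond the uncongested round trip (0 when idle)."""
        return max(0, self.last_rtt - self.base_rtt)


m = CreditRoundTripMonitor(base_rtt=10)
m.on_flit_sent(0)
m.on_credit_returned(10)
print(m.congestion())  # 0
m.on_flit_sent(20)
m.on_credit_returned(45)
print(m.congestion())  # 15
```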

Flow Control
- Basic virtual-channel flow control with credit-based backpressure
- 6 VCs: 3 for standard traffic, 3 for hotspot traffic
- Exploring packet sizes: running simulations with different packet sizes
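Credit-based backpressure can be sketched from the upstream router's side: each VC holds one credit per free downstream buffer slot, sending a flit consumes a credit, and a flit may not be sent on a VC with zero credits. The defaults (6 VCs, 256-flit buffers) follow the numbers in these slides; the `CreditChannel` class itself is a hypothetical illustration, not the simulator's code.

```python
class CreditChannel:
    """Upstream view of one output link with credit-based VC flow control."""

    def __init__(self, num_vcs=6, depth=256):
        self.depth = depth              # downstream buffer slots per VC
        self.credits = [depth] * num_vcs

    def can_send(self, vc):
        return self.credits[vc] > 0

    def send_flit(self, vc):
        if not self.can_send(vc):
            raise RuntimeError(f"VC {vc} has no credits (backpressure)")
        self.credits[vc] -= 1           # one downstream slot now occupied

    def receive_credit(self, vc):
        if self.credits[vc] >= self.depth:
            raise RuntimeError(f"VC {vc} credit overflow")
        self.credits[vc] += 1           # downstream slot freed


ch = CreditChannel(num_vcs=6, depth=2)
ch.send_flit(0)
ch.send_flit(0)
print(ch.can_send(0), ch.can_send(1))  # False True
```

Because each VC has its own credit count, a VC stalled by backpressure does not block flits waiting on other VCs of the same link.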

Hotspot Management
- Tree saturation problem: worse with more path diversity
- Non-interfering networks: separate VCs for hotspot and non-hotspot traffic
- In the project, hotspot traffic is easily distinguished and hotspot nodes are assigned statically, so the project uses class separation
Figure taken from: EE382C Lecture 15 slides
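With statically assigned hotspot nodes, class separation reduces to choosing a VC from a disjoint set based on the packet's destination, so hotspot packets never occupy the buffers that normal traffic needs. The VC numbering and the `hop_class` index used to pick a VC within a class are hypothetical stand-ins for the routing function's own per-hop VC assignment.

```python
STANDARD_VCS = (0, 1, 2)  # hypothetical numbering: standard traffic
HOTSPOT_VCS = (3, 4, 5)   # hypothetical numbering: hotspot traffic only


def select_vc(dest, hop_class, hotspot_nodes):
    """Map a packet to a VC so the two traffic classes never share buffers.

    hop_class (0, 1 or 2) selects the VC within the class.
    """
    vcs = HOTSPOT_VCS if dest in hotspot_nodes else STANDARD_VCS
    return vcs[hop_class]


hotspots = {5}
print(select_vc(5, 0, hotspots), select_vc(6, 2, hotspots))  # 3 2
```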

Hotspot Management
Dynamic hotspot detection
- Still use class separation to manage hotspots
- Statically assigned (or slowly changing) hotspot nodes:
  - Detect hotspots at last-hop routers (by counting packets) and propagate the information through the network
  - Inspect queues for multiple packets going to the same destination, which is then likely to be a hotspot
- Fast-changing hotspot nodes:
  - Assumption: traffic to a node spikes after it becomes a hotspot
  - Detect spikes by counting packets and looking for per-destination peaks
- Use more virtual channels: the impractical case of one VC per destination would solve the problem
- Use higher-level QoS to do class separation
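Detecting spikes by counting packets per destination could look like the sketch below: a router counts packets per destination over a fixed window and flags destinations whose count is both large in absolute terms and well above the window's per-destination mean. The windowing scheme and both thresholds (`factor`, `min_packets`) are hypothetical knobs, not values from the project.

```python
from collections import Counter


class HotspotDetector:
    """Flag destinations whose per-window packet count spikes (sketch)."""

    def __init__(self, factor=4.0, min_packets=8):
        self.factor = factor            # multiple of the mean count
        self.min_packets = min_packets  # absolute floor to ignore noise
        self.window = Counter()

    def observe(self, dest):
        self.window[dest] += 1

    def end_window(self):
        """Return destinations that look like hotspots, then reset counts."""
        counts, self.window = self.window, Counter()
        if not counts:
            return set()
        mean = sum(counts.values()) / len(counts)
        return {d for d, c in counts.items()
                if c >= self.min_packets and c >= self.factor * mean}


det = HotspotDetector()
for d in range(10):
    det.observe(d)
for _ in range(40):
    det.observe(99)
print(det.end_window())  # {99}
```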

Status
Bugs squashed to date: 2
Topology, Routing, Flow Control

Status: Topology
- In progress
- Changes: routers per group no longer 2a; number of groups no longer a*p+1; each node connected to two routers
- Downsized network of 1,024 nodes

Status: Traffic Pattern
- 4 kinds of traffic patterns to implement
- 3 patterns complete
- Bit-reversal traffic pending: requires the number of nodes to be a power of 2
- Iterations of 30 requests and replies
- TrafficManager class has been modified extensively
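Bit-reversal traffic sends each source to the node whose index is the source's binary address reversed, which is why the node count must be a power of 2: only then is the reversal a permutation of the node indices. A minimal sketch (the function name is hypothetical, not part of TrafficManager):

```python
def bit_reversal_dest(src, num_nodes):
    """Destination for bit-reversal traffic: src with its address bits reversed."""
    if num_nodes < 2 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of 2")
    bits = num_nodes.bit_length() - 1  # e.g. 10 bits for 1,024 nodes
    dest = 0
    for _ in range(bits):
        dest = (dest << 1) | (src & 1)  # move the lowest bit of src to dest
        src >>= 1
    return dest


print(bit_reversal_dest(1, 1024))  # 512
print(bit_reversal_dest(3, 8))     # 6
```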

Status: Routing
- UGAL-L algorithm: default function in Dragonfly.cpp
- Minimal routing works on uniform traffic
- Working on UGAL-LCR: the credit mechanism needs to be changed

Status: Flow Control
- VC size: 256 flits
- Non-interfering networks: separate VC set for the hotspot traffic class; 3 VCs are dedicated exclusively to hotspot traffic
- Dividing messages into packets: requests and replies started at {10,10,10}; iterating sizes to {20,20,20}, etc.
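Dividing messages into packets can be sketched as splitting a message of a given flit count into fixed-size packets, with the remainder in a final shorter packet. Reading the {10,10,10} and {20,20,20} settings as packet sizes in flits is an assumption, and `packetize` is a hypothetical helper.

```python
def packetize(message_flits, packet_flits):
    """Split a message into packets of at most packet_flits flits each."""
    if message_flits <= 0 or packet_flits <= 0:
        raise ValueError("sizes must be positive")
    full, rem = divmod(message_flits, packet_flits)
    return [packet_flits] * full + ([rem] if rem else [])


print(packetize(25, 10))  # [10, 10, 5]
print(packetize(25, 20))  # [20, 5]
```

Larger packets amortize per-packet header and routing overhead, while smaller ones hold buffers for less time, which is the trade-off the size sweep explores.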

Questions