Dragonfly Topology and Routing


Outline
  Background
  Motivation
  Topology description
  Routing
    Minimal Routing
    Valiant Routing
    UGAL-L/G Adaptive Routing
    Indirect Adaptive Routing
      Credit Round Trip
      Reservation
      Piggyback
      Progressive
  Performance Comparison

Background
  As memory and processor performance increases, interconnection networks are becoming critical
  The topology of an interconnection network affects both its performance and its cost
  A good interconnection network exploits emerging technologies

Motivation
  Increasing router pin bandwidth
    High-radix routers
  Development of active optical cables
    Longer links with less cost per unit distance
  Using the above technology advancements, we can build networks with higher performance. How?

Motivation Reduced network diameter and latency

Motivation
  Problem 1: The number of ports in each router is limited (64, 128, …)
    Large systems (8K – 1M nodes) call for a much higher radix
  Problem 2: Long global links between groups are expensive and dominate network cost
    We should minimize the number of global channels traversed by an average packet

Motivation
  Solution: use a group of routers connected by a sub-network as a virtual high-radix router
  All minimal routes traverse at most one global link
  Length of global links is increased to reduce the cost

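Dragonfly Topology [Kim et al. ISCA08]
  p = terminals per router, a = routers per group, h = global channels per router
  K = radix of each router = p + a + h - 1
  K' = virtual router radix = a(p + h)
  N = number of terminals = ap(ah + 1)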
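The three formulas on this slide can be checked with a few lines of Python. This is a minimal sketch: the class and property names are made up for illustration, and p, a and h are read as terminals per router, routers per group and global channels per router, following Kim et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    """Hypothetical helper that evaluates the slide's formulas."""
    p: int  # terminals per router
    a: int  # routers per group
    h: int  # global channels per router

    @property
    def router_radix(self) -> int:
        # K = p + a + h - 1 (a - 1 local ports to the other routers in the group)
        return self.p + self.a + self.h - 1

    @property
    def virtual_radix(self) -> int:
        # K' = a(p + h): the whole group seen as one virtual router
        return self.a * (self.p + self.h)

    @property
    def max_groups(self) -> int:
        # ah + 1: each of the ah global channels of a group reaches a different group
        return self.a * self.h + 1

    @property
    def max_terminals(self) -> int:
        # N = ap(ah + 1)
        return self.a * self.p * self.max_groups


net = Dragonfly(p=4, a=8, h=4)
print(net.router_radix, net.virtual_radix, net.max_groups, net.max_terminals)
```

With p = 4, a = 8, h = 4, a radix-15 router yields a radix-64 virtual router and up to 1056 terminals.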

Topology Description
  Three-level architecture: Router, Group, System
  Arbitrary networks can be used for the inter-group and intra-group networks
  K' >> K
    Very high-radix virtual routers
    Enables a very low global diameter (= 1)
  To balance channel load under load-balanced traffic: a = 2p = 2h
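To see how quickly K' outgrows K under the balanced choice a = 2p = 2h, the formulas K = p + a + h - 1, K' = a(p + h) and N = ap(ah + 1) can be swept over h. This is only a sketch; the function name is hypothetical.

```python
def balanced_dragonfly(h):
    """Return (K, K', N) for the balanced configuration a = 2p = 2h."""
    p = h
    a = 2 * h
    k = p + a + h - 1          # physical router radix
    k_virtual = a * (p + h)    # radix of the group seen as a virtual router
    n = a * p * (a * h + 1)    # terminals in a maximum-size network
    return k, k_virtual, n


for h in (2, 4, 8, 16):
    k, k_virtual, n = balanced_dragonfly(h)
    print(f"h={h:2d}  K={k:2d}  K'={k_virtual:4d}  N={n}")
```

For h = 16 the router radix is 63, while the virtual router radix is 1024 and the network scales to 262656 terminals.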

Topology Variations [Kim et al. ISCA08]

Minimal Routing
  Step 1: If Gs ≠ Gd and Rs does not have a connection to Gd, route within Gs from Rs to Ra, a router that has a global channel to Gd.
  Step 2: If Gs ≠ Gd, traverse the global channel from Ra to reach router Rb in Gd.
  Step 3: If Rb ≠ Rd, route within Gd from Rb to Rd.
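The three steps can be sketched in Python. Several things here are assumptions and not part of the slides: routers are addressed as (group, index) pairs, every group is fully connected internally, the network has the maximum ah + 1 groups, and the hypothetical `gateway` function wires global channels in a simple consecutive order.

```python
def gateway(g_from, g_to, h, num_groups):
    """Index of the router in g_from that owns the global channel to g_to.

    Assumed wiring: the other groups are numbered consecutively after g_from,
    and each router takes h of them in order.
    """
    offset = (g_to - g_from - 1) % num_groups
    return offset // h


def minimal_route(src, dst, h, num_groups):
    """Return the list of routers visited from src to dst, both (group, index)."""
    (gs, rs), (gd, rd) = src, dst
    path = [src]
    cur = rs
    if gs != gd:
        ra = gateway(gs, gd, h, num_groups)
        if ra != cur:
            path.append((gs, ra))                # Step 1: local hop to Ra
        cur = gateway(gd, gs, h, num_groups)     # Rb, the far end of the channel
        path.append((gd, cur))                   # Step 2: global hop to Rb
    if cur != rd:
        path.append((gd, rd))                    # Step 3: local hop to Rd
    return path


a, h = 4, 2
num_groups = a * h + 1
print(minimal_route((0, 0), (5, 3), h, num_groups))
```

With this wiring the example path uses one local hop, one global hop and one more local hop, so it never crosses more than one global channel.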

Minimal Routing

Minimal Routing
  Good for uniform traffic
    All links are used evenly
  Link saturation happens on adversarial traffic
    Global ADV
    Local ADV
  A load-balancing mechanism is needed to distribute traffic

Valiant Randomized Routing
  Step 1: If Gs ≠ Gi and Rs does not have a connection to Gi, route within Gs from Rs to Ra, a router that has a global channel to Gi.
  Step 2: If Gs ≠ Gi, traverse the global channel from Ra to reach router Rx in Gi.
  Step 3: If Gi ≠ Gd and Rx does not have a connection to Gd, route within Gi from Rx to Ry, a router that has a global channel to Gd.
  Step 4: If Gi ≠ Gd, traverse the global channel from Ry to router Rb in Gd.
  Step 5: If Rb ≠ Rd, route within Gd from Rb to Rd.
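A sketch of the five steps in Python follows. As assumptions for illustration only, routers are (group, index) pairs, groups are fully connected internally, the network has ah + 1 groups, and the hypothetical `gateway` function defines a simple consecutive global wiring. The intermediate group Gi is drawn uniformly at random from all groups.

```python
import random


def gateway(g_from, g_to, h, num_groups):
    """Router index in g_from that owns the global channel to g_to (assumed wiring)."""
    offset = (g_to - g_from - 1) % num_groups
    return offset // h


def hop_to_group(path, g_to, h, num_groups):
    """Extend path from its last router into group g_to, if not already there."""
    g_cur, r_cur = path[-1]
    if g_cur == g_to:
        return
    r_out = gateway(g_cur, g_to, h, num_groups)
    if r_out != r_cur:
        path.append((g_cur, r_out))                              # local hop
    path.append((g_to, gateway(g_to, g_cur, h, num_groups)))     # global hop


def valiant_route(src, dst, h, num_groups, rng):
    """Route via a random intermediate group; returns (g_i, path)."""
    g_i = rng.randrange(num_groups)
    path = [src]
    hop_to_group(path, g_i, h, num_groups)      # Steps 1-2: reach Gi
    hop_to_group(path, dst[0], h, num_groups)   # Steps 3-4: reach Gd
    if path[-1] != dst:
        path.append(dst)                        # Step 5: local hop to Rd
    return g_i, path


a, h = 4, 2
num_groups = a * h + 1
g_i, path = valiant_route((0, 0), (5, 3), h, num_groups, random.Random(1))
print("intermediate group:", g_i)
print("path:", path)
```

When the random choice lands on Gs or Gd, the corresponding steps are skipped and the path degenerates to the minimal route.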

Valiant Routing

Valiant Routing
  Balances use of global links
  Increases path length by at least one global link
  Performs poorly on benign traffic
  Maximum throughput can be limited to 50%

UGAL-L/G Adaptive Routing
  Choose between MIN and VAL on a packet-by-packet basis to load-balance the network
  The path with minimum delay is selected, estimated from:
    Queue length
    Hop count
  UGAL-L uses local queue info at the current router node
  UGAL-G uses queue info for all global channels in Gs
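The slide names queue length and hop count as the two inputs but does not give the formula. One common formulation of UGAL estimates delay as the product of the two; the sketch below assumes that form, and the function and parameter names are hypothetical.

```python
def ugal_choose(q_min, hops_min, q_val, hops_val):
    """Pick "MIN" or "VAL" for one packet.

    Assumption: delay is estimated as queue length times hop count.
    UGAL-L would pass queue depths seen at the current router;
    UGAL-G would pass the depths of the global channels in Gs.
    """
    delay_min = q_min * hops_min
    delay_val = q_val * hops_val
    return "MIN" if delay_min <= delay_val else "VAL"


# Lightly loaded minimal path: stay minimal.
print(ugal_choose(q_min=1, hops_min=3, q_val=1, hops_val=5))
# Congested minimal path: the longer Valiant path is estimated to be faster.
print(ugal_choose(q_min=4, hops_min=3, q_val=1, hops_val=5))
```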

UGAL Adaptive Routing
  Measuring path queue length directly (UGAL-G) is unrealistic
  Use local queue length to approximate path queue length (UGAL-L)
  Local queues only sense congestion on a global channel via backpressure over the local channel
    Requires stiff backpressure

Adaptive Routing [Jiang et al. ISCA09]

Indirect Adaptive Routing
  Improve the routing decision through remote congestion information
  Four methods:
    Credit Round Trip
    Reservation
    Piggyback
    Progressive

Credit Round Trip [Jiang et al. ISCA09]

Credit Round Trip
  Delay the return of local credits to the congested router
  Creates the illusion of stiffer backpressure
  Drawbacks:
    Remote congestion is still sensed through the local queue
    Info is not up to date
  [Figure: source router with MIN and VAL global channels (GC); credits returning from the congested GC are delayed] [Jiang et al. ISCA09]

Reservation
  Reserve bandwidth on the minimal global channel
  If successful, send the packet minimally
  If not, route non-minimally
  Drawbacks:
    Needs a buffer at the source router to hold waiting packets
    Packet latency is increased by the round-trip time of the RES flit
    RES flits can create significant load on the source group
  [Figure: source router sends a RES flit toward the congested MIN global channel (GC); the reservation fails] [Jiang et al. ISCA09]

Piggyback
  Broadcast link state info of GCs to adjacent routers
  Each router maintains the most recent link state information for every GC in its group
  The routing decision is made using both global state information and the local queue depth
  The congestion level of each GC is compressed into a single bit (SGC)
  Drawbacks:
    Consumes extra bandwidth
    Congestion information is not up to date due to broadcast delay
  [Figure: source router sees the MIN global channel (GC) marked busy and the VAL GC marked free] [Jiang et al. ISCA09]
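A rough sketch of how the one-bit state could feed the routing decision is given below. The slides do not specify how the bit is computed or how it is combined with the local queue depth, so both the fixed threshold and the fallback rule (queue length times hop count) are assumptions made purely for illustration, and all names are hypothetical.

```python
def congestion_bits(gc_queue_depths, threshold):
    """Compress each global channel's queue depth into one busy/free bit.

    Assumption: a fixed threshold decides "busy"; the slides do not give the rule.
    """
    return [depth > threshold for depth in gc_queue_depths]


def piggyback_choose(min_gc, val_gc, gc_busy, q_min, hops_min, q_val, hops_val):
    """Pick "MIN" or "VAL" from broadcast GC bits plus local queue depth."""
    # Global state: avoid a minimal GC known to be busy when the
    # non-minimal GC is reported free.
    if gc_busy[min_gc] and not gc_busy[val_gc]:
        return "VAL"
    # Otherwise fall back to a local comparison (assumed form).
    return "MIN" if q_min * hops_min <= q_val * hops_val else "VAL"


bits = congestion_bits([1, 9, 2], threshold=4)
print(bits)
print(piggyback_choose(min_gc=1, val_gc=0, gc_busy=bits,
                       q_min=0, hops_min=3, q_val=0, hops_val=5))
print(piggyback_choose(min_gc=0, val_gc=2, gc_busy=bits,
                       q_min=1, hops_min=3, q_val=1, hops_val=5))
```

In the first decision the local queues are empty, yet the packet is routed non-minimally because the broadcast bit reports the minimal GC as busy; this is the remote information that local queues alone cannot provide.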

Progressive
  Re-evaluate the decision to route minimally at each hop in the source group
  Non-minimal routing decisions are final
  The packet is routed minimally until congestion is encountered; then it routes non-minimally
  Drawbacks:
    Adds extra hops
    Needs an additional virtual channel to avoid deadlocks
  [Figure: packet leaves the source router on the MIN path, meets congestion, and switches to the VAL global channel (GC)] [Jiang et al. ISCA09]

Steady State Traffic: Uniform Random
  [Figure: packet latency (simulation cycles) vs. throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Minimal routing] [Jiang et al. ISCA09]

Steady State Traffic: Worst Case
  [Figure: packet latency (simulation cycles) vs. throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Valiant's routing] [Jiang et al. ISCA09]