1 Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim.

Slides:



Advertisements
Similar presentations
Prof. Natalie Enright Jerger
Advertisements

Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally Stanford.
Ch. 12 Routing in Switched Networks Routing in Packet Switched Networks Routing Algorithm Requirements –Correctness –Simplicity –Robustness--the.
Misbah Mubarak, Christopher D. Carothers
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
A Novel 3D Layer-Multiplexed On-Chip Network
Packet Switching COM1337/3501 Textbook: Computer Networks: A Systems Approach, L. Peterson, B. Davie, Morgan Kaufmann Chapter 3.
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally This work was completed in Stanford University.
GCA: Global Congestion Awareness for Load Balance in Networks-on- Chip Mukund Ramakrishna, Paul V. Gratz & Alex Sprintson Department of Electrical and.
EFFICIENT ROUTING MECHANISMS FOR DRAGONFLY NETWORKS Marina García Enrique Vallejo Ramón Beivide Miguel Odriozola Mateo Valero International Conference.
Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University.
1 Message passing architectures and routing CEG 4131 Computer Architecture III Miodrag Bolic Material for these slides is taken from the book: W. Dally,
Improving TCP Performance over Mobile Ad Hoc Networks by Exploiting Cross- Layer Information Awareness Xin Yu Department Of Computer Science New York University,
Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks ______________________________ John Kim, William J. Dally &Dennis Abts Presented.
Destination-Based Adaptive Routing for 2D Mesh Networks ANCS 2010 Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering University of California,
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
High Performance Router Architectures for Network- based Computing By Dr. Timothy Mark Pinkston University of South California Computer Engineering Division.
1 Switching and Forwarding Bridges and Extended LANs.
Network based System on Chip Final Presentation Part B Performed by: Medvedev Alexey Supervisor: Walter Isaschar (Zigmond) Winter-Spring 2006.
CSE 291-a Interconnection Networks Lecture 12: Deadlock Avoidance (Cont’d) Router February 28, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter.
Network based System on Chip Part A Performed by: Medvedev Alexey Supervisor: Walter Isaschar (Zigmond) Winter-Spring 2006.
Rotary Router : An Efficient Architecture for CMP Interconnection Networks Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio University.
Cristóbal Camarero With support from: Enrique Vallejo Ramón Beivide
1 Switching and Forwarding Bridges and Extended LANs.
1 Near-Optimal Oblivious Routing for 3D-Mesh Networks ICCD 2008 Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering Department University.
Jennifer Rexford Princeton University MW 11:00am-12:20pm Wide-Area Traffic Management COS 597E: Software Defined Networking.
Layer-3 Routing Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Routing Algorithms ECE 284 On-Chip Interconnection Networks Spring
Dragonfly Topology and Routing
29-Aug-154/598N: Computer Networks Switching and Forwarding Outline –Store-and-Forward Switches.
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
1 Pertemuan 20 Teknik Routing Matakuliah: H0174/Jaringan Komputer Tahun: 2006 Versi: 1/0.
High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.
Elastic-Buffer Flow-Control for On-Chip Networks
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Deadlock CEG 4131 Computer Architecture III Miodrag Bolic.
Sami Al-wakeel 1 Data Transmission and Computer Networks The Switching Networks.
1 Message passing architectures and routing CEG 4131 Computer Architecture III Miodrag Bolic Material for these slides is taken from the book: W. Dally,
Network and Communications Ju Wang Chapter 5 Routing Algorithm Adopted from Choi’s notes Virginia Commonwealth University.
O1TURN : Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks DaeHo Seo, Akif Ali, WonTaek Lim Nauman Rafique, Mithuna Thottethodi School of.
Computer Networks with Internet Technology William Stallings
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Presentation of Wireless sensor network A New Energy Aware Routing Protocol for Wireless Multimedia Sensor Networks Supporting QoS 王 文 毅
NC2 (No.4) 1 Undeliverable packets & solutions Deadlock: packets are unable to progress –Prevention, avoidance, recovery Livelock: packets cannot reach.
GPSR: Greedy Perimeter Stateless Routing for Wireless Networks EECS 600 Advanced Network Research, Spring 2005 Shudong Jin February 14, 2005.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Design, Implementation and Tracing of Dynamic Backpressure Routing for ns-3 José Núñez-Martínez Research Engineer Centre Tecnològic de Telecomunicacions.
Efficient Microarchitecture for Network-on-Chip Routers
Advanced Processor Group The School of Computer Science A Dynamic Link Allocation Router Wei Song, Doug Edwards Advanced Processor Group The University.
Jiaxin Cao, Rui Xia, Pengkun Yang, Chuanxiong Guo,
Virtual-Channel Flow Control William J. Dally
Effective bandwidth with link pipelining Pipeline the flight and transmission of packets over the links Overlap the sending overhead with the transport.
Distance Vector Routing
PAC: Perceptive Admission Control for Mobile Wireless Networks Ian D. Chakeres Elizabeth M. Belding-Royer.
COMP8330/7330/7336 Advanced Parallel and Distributed Computing Communication Costs in Parallel Machines Dr. Xiao Qin Auburn University
How to Train your Dragonfly
Datacenter Interconnection Network Design
Switching and Forwarding Bridges and Extended LANs
Exploring Concentration and Channel Slicing in On-chip Network Router
Interconnection Networks: Routing
CEG 4131 Computer Architecture III Miodrag Bolic
Lecture: Interconnection Networks
EE382C Lecture 6 Adaptive Routing 4/14/11 What is tornado traffic?
CS 6290 Many-core & Interconnect
Dragonfly+: Low Cost Topology for scaling Datacenters
EE382C Final Project Crouching Tiger, Hidden Dragonfly
Presentation transcript:

1 Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim Korean Advanced Institute of Science and Technology

2 Overview Indirect adaptive routing (IAR) –Allow adaptive routing decision to be based on local and remote congestion information Main contributions –Three new IAR algorithms for large scale networks –Steady state and transient performance evaluations –Impact of network configurations –Cost of implementation

3 Presentation Outline Background –The dragonfly network –Adaptive routing Indirect adaptive routing algorithms Performance results Implementation considerations

4 Group1 0 2 … Global Network … …… Local Network Router1 2 The Dragonfly Network High Radix Network –High radix routers –Small network diameter Each router –Three types of channels –Directly connected to a few other groups Each group –Organized by a local network –Large number of global channels (GC) Large network with a global diameter of one Router0 p0 p1 …

5 Routing on the Dragonfly Minimal Routing (MIN) 1.Source local network 2.Global network 3.Destination local network Some Adversarial traffic congests the global channels –Each group i sends all packets to group i+1 Oblivious solution: Valiant’s Algorithm (VAL) –Poor performance on benign traffic Router0 Group1 0 … 2 … p0 p1 … …… Router1 2 Congestion

6 Adaptive Routing Choose between the MIN path and a VAL path at the packet source [Singh'05] –Decision metric: path delay –Delay: product of path distance and path queue depth Measuring path queue length is unrealistic Use local queues length to approximate path –Require stiff backpressure Source Router q0q1 q2q3 MIN GC VAL GC Congestion

7 Adaptive Routing: Worst Case Traffic Throughput (Flit Injection Rate) Packet Latency (Simulation cycles) Valiant’s Minimal Adaptive

8 Indirect Adaptive Routing Improve routing decision through remote congestion information Previous method: –Credit round trip [Kim et. al ISCA’08] Three new methods: –Reservation –Piggyback –Progressive

9 Credit Round Trip (CRT) Delay the return of local credits to the congested router Creates the illusion of stiffer backpressure Drawbacks –Remote congestion is still inferred through local queues –Information not up to date Source Router Congestion Delayed Credits MIN GC VAL GC [Kim et. al ISCA’08]

10 Reservation (RES) Each global channel track the number of incoming MIN packets Injected packets creates a reservation flit Routing decision based on the reservation outcome Drawbacks –Reservation flit flooding –Reservation delay Source Router Congestion RES Flit RES Failed MIN GC VAL GC

11 Piggyback (PB) Local congestion broadcast –Piggybacking on each packet –Send on idle channels Congestion data compression Drawbacks –Consumes extra bandwidth –Congestion information not up to date (broadcast delay) Source Router Congestion GC Busy GC Free MIN GC VAL GC

12 Progressive (PAR) MIN routing decisions at the source are not final VAL decisions are final Switch to VAL when encountering congestion Draw backs –Need an additional virtual channel to avoid deadlock –Add extra hops Source Router Congestion MIN GC VAL GC

13 Experimental Setup Fully connected local and global networks –33 groups –1,056 nodes 10 cycle local channel latency 100 cycle global channel latency 10-flit packets

14 Steady State Traffic: Uniform Random Throughput (Flit Injection Rate) Packet Latency (Simulation cycles) Piggyback Credit Round Trip Progressive Reservation Minimal

15 Steady State Traffic: Worst Case Throughput (Flit Injection Rate) Packet Latency (Simulation cycles) Piggyback Credit Round Trip Progressive Reservation Valiant’s

16 Transient Traffic: Uniform Random to Worst Case Cycles After Transition Packet Latency Average Packet Latency per Cycle - UR to WC Progressive Piggyback Cycles After Transition % of Packets Routing Nonminimally % Packets Routing Non-minimally per Cycle - UR to WC Progressive Piggyback

17 Network Configuration Considerations Packet size –RES requires long packets to amortize reservation flit cost –Routing decision is done on per packet basis Channel latency –Affects information delay (CRT, PB) –Affects packet delay (PAR, RES) Network size –Affects information bandwidth overhead (RES, PB) Global diameter greater than one –Need to exchange congestion information on the global network

18 Cost Considerations Credit round trip –Credit delay tracker for every local channel Reservation –Reservation counter for every global channel –Additional buffering at the injection port to store packets waiting for reservation Piggyback –Global channel lookup table for every router –Increase in packet size Progressive –Extra virtual channel for deadlock avoidance

19 Conclusion Three new indirect adaptive routing algorithms for large scale networks Performance and design evaluation of the algorithms Best Algorithm? –Piggyback performed the best under steady state traffic –Progressive responded fastest to transient changes –Network configurations will affect some algorithm performance –Cost of implementation

20 Thank You! Questions?

21 Adaptive Routing: Uniform Traffic Throughput - Flit Injection Rate Packet Latency - Simulation cycles VAL MIN Adaptive

22 Transient Traffic: Worst Case to Uniform Random

23 Transient Traffic: Worst Case 1 to Worst Case 10

Random Permutation Traffic PB Packet Latency % of 1K Permutations CRT Packet Latency % of 1K Permutations PAR Packet Latency % of 1K Permutations RES Packet Latency % of 1K Permutations VAL Packet Latency % of 1K Permutations

25 Effect of Packet size on RES: Worst Case Traffic

26 Large local network: Uniform Random Throughput - Flit Injection Rate Packet Latency - Simulation cycles PB CRT MIN PAR RES

27 Large local network: Worst Case Throughput - Flit Injection Rate Packet Latency - Simulation cycles PB CRT PAR RES VAL