ADAPTIVE ROUTING
David Ouellet-Poulin - 4073219
CEG 4136 – Computer Architecture III
November 16th, 2010



QUESTIONS/TOPICS
- Describe several algorithms used for adaptive routing.
- Describe the problems and advantages of these routing algorithms. Which rely on a routing table in each router?
- What types of routing algorithms are used in multicore/multiprocessor systems-on-chip? Are adaptive algorithms used at all?

OVERVIEW
- Introduction
  - Livelock
  - Deadlock
- Algorithms
  - Turn Model
  - Odd-Even Turn Model
  - Planar
  - GOAL
- Example Systems
  - IBM Cell
  - Intel TeraFLOPS
  - Tilera TILE64
  - ST Microelectronics STNoC

INTRODUCTION
[Figure: 4x4 mesh of nodes]
(1,1) (1,2) (1,3) (1,4)
(2,1) (2,2) (2,3) (2,4)
(3,1) (3,2) (3,3) (3,4)
(4,1) (4,2) (4,3) (4,4)

INTRODUCTION – LIVELOCK [3]
[Figure: livelock illustrated on a 4x4 mesh with nodes (1,1) through (4,4)]

INTRODUCTION – DEADLOCK [3]
[Figure: deadlock illustrated on a 4x4 mesh with nodes (1,1) through (4,4)]

ALGORITHMS - TURN MODEL [1]
3 modes:
- West First
- North Last
- Negative First
[Figure: permitted turns for each mode]
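As an illustration of one of the three modes, here is a minimal sketch of minimal west-first routing on a 2D mesh. It assumes (x, y) coordinates with x growing to the east and y growing to the north, which is not the (row, column) labelling used in the figures; the function name `west_first_candidates` is hypothetical. The idea is that all westward hops are taken first, so a packet never has to turn into the west direction later.

```python
def west_first_candidates(cur, dst):
    """Return the minimal output directions west-first routing permits at `cur`.

    Assumption: nodes are (x, y) tuples, x grows east, y grows north.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    if dx < 0:
        # Destination lies to the west: all west hops must be taken first.
        return ["W"]
    # No westward travel left: choose adaptively among the productive directions.
    options = []
    if dx > 0:
        options.append("E")
    if dy > 0:
        options.append("N")
    if dy < 0:
        options.append("S")
    return options


print(west_first_candidates((3, 1), (0, 3)))  # only "W" is offered
print(west_first_candidates((0, 0), (2, 2)))  # "E" and "N" are both offered
```

When more than one direction is returned, the router is free to pick the less congested one; that freedom is what makes the scheme partially adaptive.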

ALGORITHMS - TURN MODEL [1]
1. Partition the channels in the network into sets according to the directions in which they route packets. If each node has v channels in a physical direction, treat these channels as being in v distinct virtual directions and divide them into v distinct sets accordingly. Put any wraparound channels (for tori) in a separate set to be incorporated during Step 5.
2. Identify the possible turns from one virtual direction to another, ignoring 180-degree and 0-degree turns. A 0-degree turn is only possible when there are multiple channels in one direction. It represents a transition from one set of channels to another when the two sets route packets in the same physical direction, but different virtual directions.
3. Identify the cycles that these abstract turns can form. Generally, identifying the simplest cycles in each plane of the topology is adequate.
4. Prohibit one turn in each abstract cycle so as to prevent deadlock. The turns must be chosen carefully in order to break every possible cycle, including complex cycles not identified in Step 3. A useful approach is first to break the cycles in each plane and then to check whether this allows more complex cycles.
5. Incorporate as many turns as possible from the set of wraparound channels, without reintroducing cycles. At least one turn for each wraparound channel can always be incorporated.
6. Incorporate as many 180-degree and 0-degree turns as possible, without reintroducing cycles.
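The check in Step 4 (does a set of prohibited turns break every cycle?) can be sketched for a small 2D mesh by building the channel dependency graph and searching it for cycles. This is only an illustration under stated assumptions: one channel per direction, no wraparound channels, 180-degree turns always excluded, and (x, y) coordinates with x growing east and y growing north. The names `channel_dependencies`, `has_cycle` and `TURN_MODELS` are hypothetical, and a turn is written as a pair (incoming direction, outgoing direction).

```python
from itertools import product

DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}

# Turns prohibited by each mode, as (travelling, turning-to) pairs.
TURN_MODELS = {
    "west-first": {("N", "W"), ("S", "W")},
    "north-last": {("N", "W"), ("N", "E")},
    "negative-first": {("N", "W"), ("E", "S")},
}


def channel_dependencies(n, prohibited):
    """Map each channel (node, dir) of an n x n mesh to the channels that may follow it."""
    def inside(p):
        return 0 <= p[0] < n and 0 <= p[1] < n

    graph = {}
    for x, y, d in product(range(n), range(n), DIRS):
        head = (x + DIRS[d][0], y + DIRS[d][1])
        if not inside(head):
            continue  # no channel leaves the mesh at its edge
        followers = []
        for d2 in DIRS:
            if d2 == OPPOSITE[d] or (d, d2) in prohibited:
                continue  # 180-degree turn, or a turn the model forbids
            if inside((head[0] + DIRS[d2][0], head[1] + DIRS[d2][1])):
                followers.append((head, d2))
        graph[((x, y), d)] = followers
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in the channel dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in graph}

    def visit(c):
        color[c] = GREY
        for m in graph[c]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in graph)


print("all turns allowed:", has_cycle(channel_dependencies(4, set())))
for name, banned in TURN_MODELS.items():
    print(name, has_cycle(channel_dependencies(4, banned)))
```

With no turn prohibited the search finds a cycle, while each of the three prohibited-turn sets leaves the graph acyclic on this mesh.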

ALGORITHMS - TURN MODEL [1]
Advantages:
- Deadlock free
- Livelock free
- Does not require a routing table
- Does not require extra (virtual) channels
Disadvantages:
- Router must determine correct type
- Can easily lead to non-optimal routes (prohibits certain turns)
- Complex router logic
- Low level of adaptiveness (only partially adaptive)

ALGORITHMS – ODD-EVEN TURN MODEL [4]
- Evolution of the Turn Model
- Restricts the locations where certain turns can occur:
  - Rule 1: No packet may take an EN turn at a node located in an even column, or an NW turn at a node located in an odd column.
  - Rule 2: No packet may take an ES turn at a node located in an even column, or an SW turn at a node located in an odd column.
- Deadlock free as long as 180-degree turns are prohibited
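The two rules can be expressed as a small predicate. This is a sketch, not the implementation from [4]: the function name `odd_even_turn_allowed` is hypothetical, a turn is written as two letters (current direction, then new direction, so "EN" is an east-to-north turn), and columns are assumed to be numbered from 0, which determines which columns count as even.

```python
def odd_even_turn_allowed(turn, column):
    """Return True if `turn` may be taken at a node in the given column.

    Assumptions: `turn` is a two-letter string such as "EN"; `column` is a
    0-based column index.
    """
    if turn in ("EW", "WE", "NS", "SN"):
        return False  # 180-degree turns are always prohibited
    even = column % 2 == 0
    if turn in ("EN", "ES"):
        return not even  # Rules 1 and 2: no EN or ES turn in an even column
    if turn in ("NW", "SW"):
        return even      # Rules 1 and 2: no NW or SW turn in an odd column
    return True          # WN, WS, NE, SE are unrestricted


print(odd_even_turn_allowed("EN", 2))
print(odd_even_turn_allowed("EN", 3))
```

Because the restriction depends on the column, no turn is forbidden everywhere in the network, only at half of the columns.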

ALGORITHMS – ODD-EVEN TURN MODEL [4]
Advantages:
- Does not require a routing table
- More routing adaptiveness than the standard turn model
- Improved communication performance under non-uniform traffic
- Does not prohibit turns in general
Disadvantages:
- Cannot perform 180-degree turns
- Fault tolerance implementation not yet determined

ALGORITHMS - PLANAR-ADAPTIVE [2]
- Fully adaptive in higher dimensions
- Minimal (never moves away from destination)
- Constrains dimensions for routing
- Planar routing can use any path (not adaptive)
[Figure: two panels labeled a and b]
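One way to picture how the dimensions are constrained is the sketch below. It follows a common description of planar-adaptive routing in which only two adjacent dimensions are candidates at any time, and the routing plane advances once the lower dimension's offset reaches zero. That reading, the restriction to a mesh without wraparound channels, the omission of virtual channels, and the name `planar_adaptive_candidates` are all assumptions of this sketch rather than details taken from the slide.

```python
def planar_adaptive_candidates(cur, dst):
    """Return the allowed minimal moves at `cur` as (dimension, step) pairs.

    Assumptions: n-dimensional mesh, nodes are coordinate tuples, and the
    active plane is formed by dimensions i and i + 1, where i is the lowest
    dimension whose offset to the destination is still non-zero.
    """
    offsets = [d - c for c, d in zip(cur, dst)]
    n = len(offsets)
    active = next((k for k, off in enumerate(offsets) if off != 0), None)
    if active is None:
        return []  # already at the destination
    dims = [active] if active == n - 1 else [active, active + 1]
    # Only productive moves are offered, so the route stays minimal.
    return [(k, 1 if offsets[k] > 0 else -1) for k in dims if offsets[k] != 0]


print(planar_adaptive_candidates((0, 0, 0), (2, 1, 3)))
print(planar_adaptive_candidates((2, 0, 0), (2, 1, 3)))
```

In a 3D example the packet first chooses between dimensions 0 and 1, and dimension 2 only becomes a candidate after the offset in dimension 0 has been eliminated.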

ALGORITHMS - PLANAR-ADAPTIVE [2]
Advantages:
- Deadlock free
- Simplifies routing within high-dimensional networks
- Simple logic
- Minimal
- No routing table
Disadvantages:
- Requires extra (virtual) channels
- Only an improvement in non-planar networks
- Planar routing not adaptive

ALGORITHMS – GOAL [6]
- Globally Oblivious Adaptive Locally
- Oblivious choice of direction (quadrants)
- Focus on balancing load in each quadrant (on a torus)
- Minimal routing in quadrant
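The oblivious choice of quadrant can be sketched as follows. The weighting used here, where the shorter way around a k-node ring of distance D is chosen with probability (k - D)/k and the longer way with probability D/k, is how the GOAL paper [5] is usually summarised; treat it as an assumption of this sketch, along with the hypothetical name `goal_choose_directions`. The adaptive, minimal routing inside the chosen quadrant is not shown.

```python
import random


def goal_choose_directions(src, dst, k, rng=random):
    """Pick a travel direction (+1, -1, or 0) for every dimension of a k-ary torus.

    Assumption: the probability of a direction is proportional to the hop
    count of the opposite direction, so the shorter way is favoured.
    """
    directions = []
    for s, d in zip(src, dst):
        plus_hops = (d - s) % k       # hops needed when travelling in the + direction
        if plus_hops == 0:
            directions.append(0)      # nothing to do in this dimension
            continue
        minus_hops = k - plus_hops    # hops needed when travelling in the - direction
        go_plus = rng.random() < minus_hops / k
        directions.append(1 if go_plus else -1)
    return directions


rng = random.Random(0)  # seeded only to make the demonstration repeatable
print(goal_choose_directions((0, 0), (2, 5), 8, rng))
```

The list of per-dimension directions identifies the quadrant. Occasionally picking the long way around is what makes the algorithm non-minimal overall while spreading load across the torus.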

ALGORITHMS – GOAL [6]
Advantages:
- Deadlock free
- Livelock free
- Very little routing logic
Disadvantages:
- Requires routing tables
- Requires extra (virtual) channels
- Deadlock freedom depends on the number of virtual channels
- Only for torus topologies
- Non-minimal

EXAMPLE SYSTEMS – IBM CELL [6]
- 8 SIMD “Synergistic Processing Elements”
- Interconnected in a ring topology
- Oblivious routing: simple choice of left or right
- Maximum bisection bandwidth of > 300 GBytes/s

EXAMPLE SYSTEMS – INTEL TERAFLOPS [6]
- Research prototype with 80 PEs
- 5 possible connections at each node
- Source routing (supports up to 10 hops)
- Flexible: can be oblivious, deterministic or even adaptive
- Achieves 20 GigaFLOPS and max bisection bandwidth of 320 GBytes/s
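Source routing means the sender works out the whole path and places it in the packet header, so each router only reads off its own hop. The sketch below illustrates the idea only; it is not the chip's header format. The X-then-Y path choice, the names `build_source_route` and `forward`, and rejecting routes longer than the 10-hop figure are assumptions of the sketch.

```python
MAX_HOPS = 10  # hop limit quoted on the slide


def build_source_route(src, dst):
    """Precompute, at the source, the list of output ports for the whole path.

    Assumptions: nodes are (x, y) tuples on a 2D mesh and the path goes
    X first, then Y. Any other path could be encoded the same way.
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    route = ["E" if dx > 0 else "W"] * abs(dx) + ["N" if dy > 0 else "S"] * abs(dy)
    if len(route) > MAX_HOPS:
        # This sketch simply rejects longer routes.
        raise ValueError("route needs more hops than the header can hold")
    return route


def forward(route):
    """What each router does: take the first entry, pass the rest along."""
    return route[0], route[1:]


route = build_source_route((0, 0), (2, 1))
while route:
    port, route = forward(route)
    print("forward on port", port)
```

Since the path is fixed at the source, the same mechanism can carry a deterministic, oblivious or adaptively selected route; only the function that builds the list changes.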

EXAMPLE SYSTEMS – TILERA TILE64 [6]
- 64 PEs
- Real-time processing oriented
- 4 meshes of 16 nodes each (UDN, IDN, MDN, TDN)
- Static networking with pre-set routing (circuit switching)

EXAMPLE SYSTEMS – STNOC [6]
- Prototype architecture
- “Ring-like” topology
- Source routing
- Not deadlock-free

CONCLUSION
- Adaptive routing algorithms are well suited to large numbers of PEs
- Can easily avoid deadlocks and livelocks
- Complex logic is prohibitive
- Not used extensively

REFERENCES
[1] Glass, C.J.; Ni, L.M., "The turn model for adaptive routing", Proceedings of the International Symposium on Computer Architecture, pp. 278–287, May 1992.
[2] Chien, A.A.; Kim, J.H., "Planar-adaptive routing: low-cost adaptive networks for multiprocessors", Proceedings of the International Symposium on Computer Architecture, 1992.
[3] Dally, W.J.; Aoki, H., "Deadlock-free adaptive routing in multicomputer networks using virtual channels", IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 4, Apr. 1993.
[4] Chiu, G.-M., "The odd-even turn model for adaptive routing", IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 7, Jul. 2000.
[5] Singh, A.; Dally, W.J.; Gupta, A.K.; Towles, B., "GOAL: a load-balanced adaptive routing algorithm for torus networks", Proceedings of the 30th Annual International Symposium on Computer Architecture, 9-11 June 2003.
[6] Jerger, N.E.; Peh, L.S., "On-Chip Networks", Synthesis Lectures on Computer Architecture, vol. 4, no. 1, pp. 1-141, 2009.