O1TURN: Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks. DaeHo Seo, Akif Ali, WonTaek Lim, Nauman Rafique, Mithuna Thottethodi. School of.

O1TURN: Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks
DaeHo Seo, Akif Ali, WonTaek Lim, Nauman Rafique, Mithuna Thottethodi
School of Electrical and Computer Engineering, Purdue University

June Purdue University

Motivation
New routing algorithm for 2D mesh networks: O1TURN
Why 2D mesh networks?
– Important class of interconnection network
– Natural topology for on-chip networks
– Many applications
“Yet another routing algorithm”?

Routing Algorithms: Objectives
Maximize throughput and minimize latency
O1TURN satisfies all design goals

Columns: IDEAL | DOR | ROMM | VALIANT | MIN-ADAPTIVE
Average case throughput: X X X
Worst case throughput: X X ?
Minimal # of network hops: X X X X
Low complexity router: X X X

Challenges
Intuition: path flexibility, load balancing, and throughput are correlated
Prior results
– Throughput: increasing path flexibility [SPAA 2002]
  may not improve worst-case throughput, and may even decrease it
  is likely to improve average-case throughput
– Latency: increasing path flexibility may increase router complexity

Columns: IDEAL | DOR | ROMM | VALIANT | MIN-ADAPTIVE
Average case throughput: X X X
Worst case throughput: X X ?
Minimal # of network hops: X X X X
Low complexity router: X X X
# of paths: ? | 1 | Θ(K’^2) | Θ(K^2) | Θ(2^K’)

Contributions
Develop a new routing algorithm: O1TURN
Throughput
– Better than DOR / ROMM for worst-case throughput
  Near-optimal worst-case throughput for 2D mesh
– Captures most of the “opportunity” with limited path flexibility for average-case throughput
  O1TURN (with 2 paths) is as good as ROMM (with Θ(K’^2) paths)
Latency
– Router implementation for O1TURN
  Complexity comparable to a simple DOR router
  Key point: partition the delay-critical circuitry
O1TURN is minimal: one goal trivially satisfied

Outline
Background of interconnection network
O1TURN routing algorithm
O1TURN router implementation
Simulation results
Conclusion and Q&A

Background
Packet-switched 2D mesh network
– Each packet is routed independently
Terminology
– Network radix = K in a K x K network (NOT degree)
Simplifying assumptions for this talk
– One packet crosses a link in one cycle
– Square mesh networks (K x K)
– K is even (K = 2p)
Analytical method for throughput analysis
– TD method [Towles and Dally, SPAA 2002]
– Worst-case throughput = (maximum channel load)^-1
– Given a permutation and an (oblivious) routing algorithm: find the maximum channel load
– Given only an (oblivious) routing algorithm: find the permutation that causes the maximum channel load
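The first use of the TD method above (permutation and routing algorithm given, find the maximum channel load) can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the function names `xy_path` and `max_channel_load`, the choice of XY dimension-order routing, and the matrix-transpose permutation are all assumptions made for the example.

```python
from collections import defaultdict

def xy_path(src, dst):
    """Directed links used by XY dimension-order routing (X first, then Y)."""
    x, y = src
    dx, dy = dst
    links = []
    while x != dx:                      # travel along the X dimension
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along the Y dimension
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def max_channel_load(perm):
    """Maximum number of source-destination pairs sharing one directed link."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for link in xy_path(src, dst):
            load[link] += 1.0
    return max(load.values(), default=0.0)

K = 4  # illustrative radix
# Matrix-transpose permutation: node (x, y) sends to node (y, x).
transpose = {(x, y): (y, x) for x in range(K) for y in range(K)}
gamma = max_channel_load(transpose)
throughput = 1.0 / gamma                # packets / node / cycle
print(gamma, throughput)
```

The second use of the method (find the worst permutation for a given routing algorithm) is a search over permutations and is not attempted in this sketch.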

TD-Method Example
Traffic (Src -> Dst): A -> D, D -> A
Paths shown: A -> B -> D, A -> C -> D, D -> C -> A, D -> B -> A, D -> C -> A
Max channel load = 1: worst-case throughput = (1 / 1) = 1
Max channel load = 0.5: worst-case throughput = (1 / 0.5) = 2
Unit of worst-case throughput = packets / node / cycle

O1TURN routing algorithm
Orthogonal 1-TURN routing
– No U-turns => Orthogonal
– At most 1 turn => 1TURN
Use 2 routes
– At most 2 minimal, 1-turn routes exist in a 2D mesh (XY, YX)
– Two routing algorithms (XY routing, YX routing)
– Each is chosen with the same probability
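The route choice described above can be sketched as follows. This is an illustrative model only: the helper names `dimension_order_path` and `o1turn_path` are hypothetical, nodes are assumed to be (x, y) coordinate pairs, and the random choice is assumed to be made once per packet at the source.

```python
import random

def dimension_order_path(src, dst, order):
    """Nodes visited when routing in the given dimension order ("XY" or "YX")."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    for dim in order:
        if dim == "X":
            while x != dx:
                x += 1 if dx > x else -1
                path.append((x, y))
        else:
            while y != dy:
                y += 1 if dy > y else -1
                path.append((x, y))
    return path

def o1turn_path(src, dst, rng=random):
    """Pick XY or YX with equal probability, then follow that minimal route."""
    order = rng.choice(["XY", "YX"])
    return order, dimension_order_path(src, dst, order)

order, path = o1turn_path((0, 0), (2, 3))
print(order, path)
```

Either order yields a minimal path with at most one turn; when the source and destination share a row or column, the two routes coincide.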

O1TURN routing algorithm
Claim: the maximum channel load of O1TURN is K / 2
Proof: two sources of load contribution
– # of nodes on the left side of the channel, by XY routing: N * 0.5
– # of nodes on the right side of the channel, by YX routing: (K - N) * 0.5
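The two contributions in the proof can be summed directly. The following is a sketch under stated assumptions: N is read as the number of nodes on the left side of the channel within its row (so K - N lie on the right), each such node contributes at most one packet per permutation, each routing order carries half of a packet's load, and the symbol \gamma for channel load is introduced here for illustration.

```latex
\gamma_{\text{channel}}
  \;\le\; \underbrace{N \cdot \tfrac{1}{2}}_{\text{XY routing}}
  \;+\; \underbrace{(K - N) \cdot \tfrac{1}{2}}_{\text{YX routing}}
  \;=\; \frac{K}{2}
```

The sum is independent of N, so under these assumptions the bound holds for every channel position in the row.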

Optimal Worst Case Throughput
Maximum channel load = K / 2
– Worst-case throughput = 2 / K by the TD method
Consider a permutation where 100% of packets cross the bisection (K x K mesh)
– Throughput (X) is bounded when the bisection links saturate
– X * (K^2 / 2) = K
– X = 2 / K packets / node / cycle
When K is odd, O1TURN is within a fraction (1 / K^2) of the optimal worst-case throughput
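The bisection argument can be worked for a concrete size. In this sketch the value K = 8 is chosen purely for illustration, and the equation is read as: the K^2 / 2 nodes on one side each inject X packets per cycle across the bisection, which has K links in that direction, each carrying one packet per cycle.

```latex
X \cdot \frac{K^{2}}{2} = K
\;\Longrightarrow\;
X = \frac{2}{K},
\qquad
K = 8:\quad X \cdot 32 = 8
\;\Longrightarrow\;
X = 0.25 \ \text{packets/node/cycle}
```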

Worst-case Throughput Trends
Worst-case channel load as network size changes
– Normalized to optimal worst-case throughput
– Worst-case throughput of DOR and ROMM degrades with K
Recall
– Even radix: Opt * 1
– Odd radix: Opt * (1 - 1 / K^2)

Average Case Analysis
Extension of the TD method [B. Towles et al., SPAA 2003]
– Examine randomly chosen permutations
– Harmonic mean of the worst-case throughput over the sampled permutations
– 1 M random permutations
O1TURN shows better or equal average-case throughput
Table: average case throughput of DOR, ROMM, and O1TURN for 4 x 4 and 8 x 8 2D meshes (values not preserved in this transcript)
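The sampling procedure above can be sketched as a small Monte Carlo estimate. This is an illustration under assumptions, not the authors' analysis code: all function names are hypothetical, the sample count is far below the 1 M used on the slide, ROMM is omitted, and O1TURN is modeled as splitting each packet's load equally between its XY and YX routes.

```python
import random
from collections import defaultdict

def links(src, dst, order):
    """Directed links on the dimension-order route for order "XY" or "YX"."""
    x, y = src
    out = []
    for dim in order:
        if dim == "X":
            while x != dst[0]:
                nx = x + (1 if dst[0] > x else -1)
                out.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dst[1]:
                ny = y + (1 if dst[1] > y else -1)
                out.append(((x, y), (x, ny)))
                y = ny
    return out

def max_load(perm, routes):
    """Maximum expected channel load; routes is a list of (order, probability)."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for order, prob in routes:
            for link in links(src, dst, order):
                load[link] += prob
    return max(load.values(), default=0.0)

def harmonic_mean_throughput(K, routes, samples, seed=0):
    """Harmonic mean of 1/max_load over random permutations.

    The harmonic mean of the throughputs 1/g_i equals samples / sum(g_i).
    """
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(K) for y in range(K)]
    total_load = 0.0
    for _ in range(samples):
        dsts = nodes[:]
        rng.shuffle(dsts)
        total_load += max_load(dict(zip(nodes, dsts)), routes)
    return samples / total_load

DOR = [("XY", 1.0)]
O1TURN = [("XY", 0.5), ("YX", 0.5)]
K = 4
hm_dor = harmonic_mean_throughput(K, DOR, samples=200)
hm_o1turn = harmonic_mean_throughput(K, O1TURN, samples=200)
print(hm_dor, hm_o1turn)
```

Because every sampled permutation has a maximum O1TURN channel load of at most K / 2, the O1TURN estimate cannot fall below 2 / K.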

O1TURN Summary
Near-optimal worst-case throughput
– By the TD method
– Optimal for even K
– Approaches optimal for large, odd K
Average-case throughput
– Better than DOR and comparable to ROMM
Minimal # of network hops
– O1TURN is a minimal routing algorithm

Base Router Implementation
Base router: pipelined virtual channel router
– 4 stages: routing, virtual channel allocation, switch allocation, crossbar & physical channel transfer
– One control block controls all virtual channels
– Critical stage: virtual channel allocation

O1TURN Router Implementation
O1TURN router
– Separate the virtual channels into two virtual networks (VNs)
– One VN for XY routing, the other for YX routing
– Each independent VN is deadlock-free because it uses DOR
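The partitioning of virtual channels into two virtual networks can be sketched as a simple index mapping. The even split and the convention that the lower half of the VC indices serves XY routing are assumptions for illustration; the slides state only that the VCs are separated into two VNs, and the function names are hypothetical.

```python
def allowed_vcs(order, num_vcs):
    """VC indices a packet may use, given its routing order ("XY" or "YX").

    Assumes an even split: the lower half belongs to the XY virtual network,
    the upper half to the YX virtual network.
    """
    half = num_vcs // 2
    if order == "XY":
        return list(range(0, half))
    if order == "YX":
        return list(range(half, num_vcs))
    raise ValueError("order must be 'XY' or 'YX'")

def virtual_network_of(vc, num_vcs):
    """Inverse mapping: which virtual network a VC index belongs to."""
    return "XY" if vc < num_vcs // 2 else "YX"

# With 8 VCs per physical channel, each virtual network gets 4 VCs.
print(allowed_vcs("XY", 8), allowed_vcs("YX", 8))
```

A packet stays in the virtual network chosen at injection, so each allocator only has to consider the VCs of one network, and each network on its own follows a single dimension order.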

Delay Analysis
Existing router delay models for pipelined routers
– Peh and Dally [HPCA 2001]
Based on the logical effort method
– [I. Sutherland, B. Sproull, 1999]
– FO4 unit
– Complexity comparable to the DOR router
Table: delay by VCs / PC for DOR (VC allocation, SW allocation) and O1TURN (VC allocation, SW allocation)

O1TURN Summary
Near-optimal worst-case throughput
Good average-case throughput
Minimal network hops
Low-complexity router implementation
– Complexity comparable to the DOR router

Columns: IDEAL | O1TURN
Average case throughput: X | X
Worst case throughput: X | X
Minimal # of network hops: X | X
Low complexity router: X | X

Evaluation Method
Modified Popnet network simulator [L. Shang, 2003]
4 x 4 2D mesh (8 x 8 in paper)
Full-duplex, bidirectional links
8 VCs per PC
5 flits per packet
500 K cycles
Synthetic traffic: Uniform Random, BC, MT, HOT SPOT
Compared with existing routing algorithms
– Oblivious routing algorithms (DOR, ROMM)
– Adaptive routing algorithm (DUATO)

Simulation Results
4 x 4 2D MESH – Uniform Random Traffic Pattern

Simulation Results
4 x 4 2D MESH – Matrix Transpose Traffic Pattern
– One of the worst-case traffic patterns for DOR

Simulation Results
4 x 4 2D MESH – Bit Complement Traffic Pattern
– Already balanced traffic pattern

Simulation Results
4 x 4 2D MESH – HOT SPOT Traffic Pattern
– 2 nodes have 20% of traffic

Simulation Results
Delay penalty of adaptive routing
– How the complexity of the router implementation affects latency
– Hot Spot Traffic Pattern

Related Work
Routing algorithms
– Valiant [L. G. Valiant et al., ACM 1981]
– ROMM [T. Nesson et al., ACM 1995]
– DUATO [J. Duato et al., 1993]
Partitioned router implementation
– Mad Postman [Jesshope et al., ISCA 1989]
– PFNF [Upadhyay et al., 1997]
Analysis methods
– Worst-case [B. Towles et al., 2002]
– Throughput centric [B. Towles et al., 2003]
– Delay model [L. S. Peh et al., HPCA 2001]

Conclusion
Goals
– Good average-case throughput
– Good or optimal worst-case throughput
– Minimal # of network hops
– Low-complexity router implementation
O1TURN
– Provides near-optimal worst-case throughput
– Provides better or equal average-case throughput compared with existing routing algorithms
– Minimal # of network hops
– Simple router implementation: comparable to the DOR router
– Satisfies all performance goals

Q & A