Published by Brittney Fitzgerald. Modified over 9 years ago.
1
O1TURN: Near-Optimal Worst-Case Throughput Routing for 2D-Mesh Networks
DaeHo Seo, Akif Ali, WonTaek Lim, Nauman Rafique, Mithuna Thottethodi
School of Electrical and Computer Engineering, Purdue University
2
June 08 2005, Purdue University
Motivation
New routing algorithm for 2D mesh networks: O1TURN
Why 2D mesh networks?
–Important class of interconnection network
–Natural topology for on-chip networks
–Many applications
“Yet another routing algorithm”?
3
Routing Algorithms: Objectives
Maximize throughput and minimize latency
O1TURN satisfies all design goals
Design goals compared across IDEAL, DOR, ROMM, VALIANT, MIN-ADAPTIVE (marks listed in extracted order):
–Average case throughput: X X X
–Worst case throughput: X X ?
–Minimal # of network hops: X X X X
–Low complexity router: X X X
4
Challenges
Intuition: path flexibility, load balancing, and throughput are correlated
Prior results
–Throughput: increasing path flexibility [SPAA 2002]
  May not improve worst-case throughput, and may even decrease it
  Likely to improve average-case throughput
–Latency: increasing path flexibility may increase router complexity
Design goals compared across IDEAL, DOR, ROMM, VALIANT, MIN-ADAPTIVE (marks listed in extracted order):
–Average case throughput: X X X
–Worst case throughput: X X ?
–Minimal # of network hops: X X X X
–Low complexity router: X X X
–# of paths: IDEAL ?, DOR 1, ROMM Θ(K'^2), VALIANT Θ(K^2), MIN-ADAPTIVE Θ(2^K')
5
Contributions
Develop a new routing algorithm: O1TURN
Throughput
–Better than DOR / ROMM for worst-case throughput
  Near-optimal worst-case throughput for 2D mesh
–Captures most of the “opportunity” with limited path flexibility for average-case throughput
  O1TURN (with 2 paths) is as good as ROMM (with Θ(K'^2) paths)
Latency
–Router implementation for O1TURN
  Complexity comparable to a simple DOR router
  Key point: partition the delay-critical circuitry
O1TURN is minimal: one goal trivially satisfied
6
Outline
–Background of interconnection network
–O1TURN routing algorithm
–O1TURN router implementation
–Simulation results
–Conclusion and Q&A
8
Background
Packet-switched 2D mesh network
–Each packet is routed independently
Terminology
–Network radix = K in a K x K network (NOT degree)
Simplifying assumptions for this talk
–One packet crosses a link in one cycle
–Square mesh networks (K x K)
–K is even (K = 2p)
Analytical method for throughput analysis
–TD method [Towles and Dally, SPAA 2002]
–Worst-case throughput = (maximum channel load)^-1
–Given a permutation and an (oblivious) routing algorithm: find the maximum channel load
–Given only an (oblivious) routing algorithm: find the permutation that causes the maximum channel load
9
TD-Method Example
Traffic (Src -> Dst): A -> D, D -> A
Routes shown: A -> B -> D, A -> C -> D, D -> C -> A, D -> B -> A
–One route per source: max channel load = 1, worst-case throughput = (1 / 1) = 1
–Traffic split over two routes per source: max channel load = 0.5, worst-case throughput = (1 / 0.5) = 2
Unit of worst-case throughput = packets / node / cycle
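The channel-load calculation in this example can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' tool: the 2 x 2 placement of the nodes (A at (0,0), B at (1,0), C at (0,1), D at (1,1)), the `(x, y)` coordinate convention, and the function names `route` and `max_channel_load` are all assumptions made for the illustration. Each source is assumed to inject one packet per cycle, and a routing algorithm is described as a list of (dimension order, probability) pairs.

```python
from collections import defaultdict

def route(src, dst, order):
    """Directed channels used by a dimension-order route.
    order="xy" travels along X first; order="yx" travels along Y first."""
    x, y = src
    dx, dy = dst
    channels = []
    for dim in order:
        if dim == "x":
            while x != dx:
                nx = x + (1 if dx > x else -1)
                channels.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dy:
                ny = y + (1 if dy > y else -1)
                channels.append(((x, y), (x, ny)))
                y = ny
    return channels

def max_channel_load(traffic, weighted_orders):
    """Maximum expected load on any channel for a src -> dst mapping."""
    load = defaultdict(float)
    for src, dst in traffic.items():
        for order, prob in weighted_orders:
            for ch in route(src, dst, order):
                load[ch] += prob
    return max(load.values(), default=0.0)

# Hypothetical placement of the slide's nodes on a 2 x 2 mesh.
A, B, C, D = (0, 0), (1, 0), (0, 1), (1, 1)
traffic = {A: D, D: A}

SINGLE_PATH = [("xy", 1.0)]             # A -> B -> D and D -> C -> A
TWO_PATHS = [("xy", 0.5), ("yx", 0.5)]  # both minimal routes, equally likely

single_load = max_channel_load(traffic, SINGLE_PATH)
split_load = max_channel_load(traffic, TWO_PATHS)
print(single_load, 1 / single_load)
print(split_load, 1 / split_load)

# The same functions applied to matrix transpose on a 4 x 4 mesh.
K = 4
transpose = {(x, y): (y, x) for x in range(K) for y in range(K)}
transpose_single = max_channel_load(transpose, SINGLE_PATH)
transpose_split = max_channel_load(transpose, TWO_PATHS)
print(transpose_single, transpose_split)
```

Under these assumptions the two cases give maximum channel loads of 1 and 0.5, matching the numbers on the slide.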
11
O1TURN routing algorithm
Orthogonal 1-TURN routing
–No U-turn => Orthogonal
–At most 1 turn => 1TURN
Use 2 routes
–At most 2 minimal, 1-turn routes in a 2D mesh (XY, YX)
–Two routing algorithms (XY routing, YX routing)
–Each chosen with the same probability
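A minimal sketch of the route choice described above, assuming nodes are addressed as `(x, y)` tuples and that the choice is made once per packet. The function names `xy_route`, `yx_route`, and `o1turn_route` are hypothetical and chosen for illustration only.

```python
import random

def _walk(path, axis, target):
    """Extend path one hop at a time along one axis (0 = X, 1 = Y)."""
    pos = list(path[-1])
    while pos[axis] != target:
        pos[axis] += 1 if target > pos[axis] else -1
        path.append(tuple(pos))

def xy_route(src, dst):
    """Minimal route that travels along X first, then Y (at most one turn)."""
    path = [tuple(src)]
    _walk(path, 0, dst[0])
    _walk(path, 1, dst[1])
    return path

def yx_route(src, dst):
    """Minimal route that travels along Y first, then X (at most one turn)."""
    path = [tuple(src)]
    _walk(path, 1, dst[1])
    _walk(path, 0, dst[0])
    return path

def o1turn_route(src, dst, rng=random):
    """Pick the XY or the YX route with equal probability."""
    return xy_route(src, dst) if rng.random() < 0.5 else yx_route(src, dst)

print(xy_route((0, 0), (2, 1)))
print(yx_route((0, 0), (2, 1)))
```

Both candidate routes have the same hop count (the Manhattan distance), which is why the combined algorithm stays minimal.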
12
O1TURN routing algorithm
Claim: Maximum channel load of O1TURN is K / 2
Proof: Two sources of load contributions
–XY routing: N nodes on the left side of the channel, each contributing 0.5
–YX routing: (K - N) nodes on the right side of the channel, each contributing 0.5
Channel load = N * 0.5 (XY routing) + (K - N) * 0.5 (YX routing) = K / 2
13
Optimal Worst Case Throughput
Maximum channel load = K / 2
–Worst-case throughput = 2 / K by the TD method
Consider a permutation where 100% of packets cross the bisection (K x K mesh)
–Throughput (X) is bounded when the bisection links are saturated
–X * (K^2 / 2) = K
–X = 2 / K packets / node / cycle
When K is odd, O1TURN is within (1 / K^2) of the optimal worst-case throughput
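The bisection argument can be written out step by step. This sketch assumes even K, so that each half of the mesh holds K^2 / 2 nodes and K links cross the bisection in each direction, with one packet per link per cycle as in the simplifying assumptions of the talk.

```latex
\begin{align*}
\underbrace{X \cdot \frac{K^2}{2}}_{\text{packets per cycle offered by one half}}
  &\le \underbrace{K}_{\text{bisection links in one direction}} \\
X &\le \frac{2}{K} \quad \text{packets / node / cycle}
\end{align*}
```

For example, with K = 8 no routing algorithm can sustain more than 2 / 8 = 0.25 packets / node / cycle on such a permutation, and a maximum channel load of K / 2 = 4 gives exactly that throughput.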
14
Worst-case Throughput Trends
Worst-case channel load as network size changes
–Normalized to optimal worst-case throughput
–Worst-case throughput of DOR and ROMM degrades with K
Recall
–Even radix: Opt * 1
–Odd radix: Opt * (1 - 1 / K^2)
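To see how quickly the odd-radix gap closes, the two factors recalled above can be tabulated. The helper name `o1turn_fraction_of_optimal` is hypothetical; the sketch only evaluates the expressions on the slide and does not compute the optimal throughput itself.

```python
def o1turn_fraction_of_optimal(k):
    """Fraction of the optimal worst-case throughput, per the slide:
    1 for even radix, 1 - 1/K^2 for odd radix."""
    if k < 2:
        raise ValueError("radix must be at least 2")
    return 1.0 if k % 2 == 0 else 1.0 - 1.0 / (k * k)

for k in (3, 4, 5, 7, 8, 15):
    print(k, round(o1turn_fraction_of_optimal(k), 4))
```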
15
Average Case Analysis
Extension of the TD method [B. Towles et al., SPAA 2003]
–Examine randomly chosen permutations
–Harmonic mean of the worst-case throughput of the sampled permutations
–1 M random permutations
O1TURN shows better or equal average-case throughput

Average case throughput    DOR    ROMM     O1TURN
4 x 4 2D mesh              1      1.113    1.136
8 x 8 2D mesh              1      1.180    1.188
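The averaging procedure can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: nodes are `(x, y)` tuples, every source injects one packet per cycle, permutations are drawn uniformly with a seeded generator, and the sample count is kept far below the 1 M used in the talk. The names `channels`, `max_channel_load`, `harmonic_mean_throughput`, and `average_case` are hypothetical. The raw values it prints are not normalized, so they are not directly comparable to the table above.

```python
import random
from collections import defaultdict

def channels(src, dst, order):
    """Directed channels on a dimension-order route ("xy" or "yx")."""
    pos, out = list(src), []
    for dim in order:
        axis = 0 if dim == "x" else 1
        while pos[axis] != dst[axis]:
            prev = tuple(pos)
            pos[axis] += 1 if dst[axis] > pos[axis] else -1
            out.append((prev, tuple(pos)))
    return out

def max_channel_load(perm, weighted_orders):
    """Maximum expected channel load for one permutation."""
    load = defaultdict(float)
    for src, dst in perm.items():
        for order, prob in weighted_orders:
            for ch in channels(src, dst, order):
                load[ch] += prob
    return max(load.values(), default=0.0)

def harmonic_mean_throughput(loads):
    """Harmonic mean of throughputs T_i = 1 / load_i, i.e. n / sum(load_i)."""
    return len(loads) / sum(loads)

def average_case(k, weighted_orders, samples, seed=0):
    """Sample random permutations on a k x k mesh and average the throughput."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(k) for y in range(k)]
    loads = []
    for _ in range(samples):
        dsts = nodes[:]
        rng.shuffle(dsts)
        loads.append(max_channel_load(dict(zip(nodes, dsts)), weighted_orders))
    return harmonic_mean_throughput(loads)

DOR = [("xy", 1.0)]
O1TURN = [("xy", 0.5), ("yx", 0.5)]

dor_avg = average_case(4, DOR, samples=200)
o1turn_avg = average_case(4, O1TURN, samples=200)
print(dor_avg, o1turn_avg)
```

Because both calls use the same seed, DOR and O1TURN are evaluated on the same sampled permutations, which keeps the comparison paired.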
16
O1TURN Summary
Near-optimal worst-case throughput
–By the TD method
–Optimal for even K
–Approaches optimal for large, odd K
Average-case throughput
–Better than DOR and comparable to ROMM
Minimal # of network hops
–O1TURN is minimal routing
18
Base Router Implementation
Base router: pipelined virtual channel router
–4 stages: routing, virtual channel allocation, switch allocation, crossbar & physical channel transfer
–One control block controls all virtual channels
–Critical stage: virtual channel allocation
19
O1TURN Router Implementation
O1TURN router
–Separate the virtual channels into two virtual networks (VNs)
–One VN for XY routing, the other for YX routing
–Each VN is independent and uses DOR, so each is deadlock-free on its own
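One way to picture the partition is as a static mapping from virtual channels to virtual networks, so that the allocator for a packet only ever looks at the channels of that packet's VN. The sketch below is an assumption-laden illustration, not the router design from the talk: it assumes an even split of the VCs, integer VC indices, and the hypothetical helper names `partition_vcs` and `candidate_vcs`.

```python
def partition_vcs(num_vcs):
    """Split the VCs of one physical channel into an XY and a YX virtual network.
    An even split is assumed here for illustration."""
    if num_vcs < 2 or num_vcs % 2 != 0:
        raise ValueError("need an even number of VCs, at least 2")
    half = num_vcs // 2
    return {"xy": list(range(half)), "yx": list(range(half, num_vcs))}

def candidate_vcs(packet_vn, free_vcs, vn_map):
    """VCs a packet may request: free VCs that belong to its own VN only."""
    return [vc for vc in vn_map[packet_vn] if vc in free_vcs]

vn_map = partition_vcs(8)
print(vn_map)
print(candidate_vcs("yx", {1, 4, 6}, vn_map))
```

Since a packet never crosses from one VN to the other, each allocator sees only half of the VCs, which is consistent with the talk's point about partitioning the delay-critical circuitry.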
20
Delay Analysis
Existing router delay models for pipelined routers
–Peh and Dally [HPCA 2001]
Based on the logical effort method
–[I. Sutherland, B. Sproull, 1999]
–FO4 unit
–Complexity comparable to the DOR router
Stage delays by VCs / PC (columns: DOR VC allocation, DOR SW allocation, O1TURN VC allocation, O1TURN SW allocation)
–4 VCs / PC: 17, 14 (only two values present in the source)
–8 VCs / PC: 20, 16, 17, 16
21
O1TURN Summary
Near-optimal worst-case throughput
Good average-case throughput
Minimal network hops
Low-complexity router implementation
–Complexity comparable to the DOR router

Design goal                  IDEAL    O1TURN
Average case throughput      X        X
Worst case throughput        X        X
Minimal # of network hops    X        X
Low complexity router        X        X
23
Evaluation Method
Modified Popnet network simulator [L. Shang, 2003]
–4 x 4 2D mesh (8 x 8 in the paper)
–Full-duplex, bidirectional links
–8 VCs per PC
–5 flits per packet
–500 K cycles
Synthetic traffic: Uniform Random, Bit Complement (BC), Matrix Transpose (MT), Hot Spot
Compared with existing routing algorithms
–Oblivious routing algorithms (DOR, ROMM)
–Adaptive routing algorithm (DUATO)
24
Simulation Results
4 x 4 2D mesh – Uniform Random traffic pattern
25
Simulation Results
4 x 4 2D mesh – Matrix Transpose traffic pattern
–One of the worst-case traffic patterns for DOR
26
Simulation Results
4 x 4 2D mesh – Bit Complement traffic pattern
–Already balanced traffic pattern
27
Simulation Results
4 x 4 2D mesh – Hot Spot traffic pattern
–2 nodes have 20% of the traffic
28
Simulation Results
Delay penalty of adaptive routing
–How the complexity of the router implementation affects latency
–Hot Spot traffic pattern
30
Related Work
Routing algorithms
–Valiant [L. G. Valiant et al., ACM 1981]
–ROMM [T. Nesson et al., ACM 1995]
–DUATO [J. Duato et al., 1993]
Partitioned router implementation
–Mad Postman [Jesshope et al., ISCA 1989]
–PFNF [Upadhyay et al., 1997]
Analysis methods
–Worst-case [B. Towles et al., 2002]
–Throughput centric [B. Towles et al., 2003]
–Delay model [L. S. Peh et al., HPCA 2001]
31
Conclusion
Goals
–Good average-case throughput
–Good or optimal worst-case throughput
–Minimal # of network hops
–Low-complexity router implementation
O1TURN
–Provides near-optimal worst-case throughput
–Provides better or equal average-case throughput compared with existing routing algorithms
–Minimal # of network hops
–Simple router implementation: comparable to the DOR router
–Satisfies all four goals
32
Q & A