Published by Rodney Matthews. Modified over 9 years ago.
1
Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network
Hiroki Matsutani (Keio Univ, JAPAN)
Michihiro Koibuchi (NII, JAPAN)
Hideharu Amano (Keio Univ, JAPAN)
2
Introduction
Network-on-Chips
–Tile architecture
–On-chip routers
–Packet switching
Various NoC topologies
–Mesh, Torus
–H-Tree, Fat Trees
Fat H-Tree (FHT)
Evaluations of FHT
–Performance
–Area
–Energy
[Figure: a mesh-based on-chip network of nine tiles, numbered 0 to 8; each tile is a RISC, DSP, RAM, or I/O block]
We proposed FHT as an alternative to Fat Trees
3
NoCs’ topologies: Mesh & Torus
2-D Mesh
–RAW [Taylor, IEEE Micro’02]
2-D Torus
–2x bandwidth of mesh
[Figure legend: Router, Core]
Fat H-Tree is a tree-based topology, but it includes a torus structure
4
NoCs’ topologies: Fat Trees
Fat Tree (p, q, c)
p: # of upward links
q: # of downward links
c: # of core ports
[Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1), showing rank-1 and rank-2 routers; legend: Router, Core]
Trees are duplicated in Fat Trees and Fat H-Tree, but the connection patterns of trees are different!
5
Outline
NoCs’ topologies
–Mesh, Torus
–H-Trees, Fat Trees
Fat H-Tree (FHT)
–Structure
–2-D layout
–Routing algorithm (DTR)
Evaluations of FHT
–Network logic area
–Energy consumption
–Throughput
6
Fat H-Tree: Structure
Fat H-Tree [Yamada, EUC’04]
–Red Tree (H-Tree)
–Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core]
The black tree is shifted toward the lower right of the red tree
Because the black tree is shifted, the connection pattern of the trees is different from that of the original Fat Trees
7
Fat H-Tree: Structure
Fat H-Tree [Yamada, EUC’04]
–Red Tree (H-Tree)
–Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core]
Fat H-Tree is formed on red & black trees
10
Fat H-Tree: Structure
Fat H-Tree [Yamada, EUC’04]
–Red Tree (H-Tree)
–Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core; rank-2 or upper routers are omitted in this figure]
Each core is connected to both red & black trees
A ring is formed with cores & rank-1 routers
Torus-level performance by combining only two H-Trees
11
Fat H-Tree: 2-D layout on VLSI
Fat H-Tree
–Torus structure (long feedback links across chip)
–Folded in the same way as the folded layout of a 2-D Torus
[Figure: Fat H-Tree’s 2-D layout, topologically equivalent to the unfolded structure; legend: Router, Core]
12
Fat H-Tree: Routing algorithm
Paths on a single H-tree
–Only red tree, or
–Only black tree
[Figure: example paths; only red tree: 6-hop; only black tree: 6-hop]
13
Fat H-Tree: Routing algorithm
Paths on a single H-tree
–Only red tree, or
–Only black tree
Paths across trees
–Transit between trees
–Minimum paths
[Figure: example path; red is used first, then the packet transits to black, for a total of 4 hops (minimum)]
Exploiting such paths is key for improving the performance
14
Fat H-Tree: Dual tree routing (DTR)
Dual tree routing
–Transit trees for minimum paths
–Cycles across trees
Deadlock avoidance
–VC# is increased when a packet transits from red to black
[Figure: VC#0 is used before the transit; VC#1 is used after it]
Only two VCs are sufficient in a 64-node FHT
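The VC rule on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `assign_vcs` and the representation of a path as one tree color per hop are hypothetical.

```python
def assign_vcs(tree_per_hop):
    """Return the virtual-channel number used on each hop of a path.

    tree_per_hop is a list of "red"/"black" strings, one per hop
    (a hypothetical path representation, not taken from the slides).
    The VC number starts at 0 and is increased each time the packet
    transits from the red tree to the black tree.
    """
    vcs = []
    vc = 0
    prev = None
    for tree in tree_per_hop:
        if prev == "red" and tree == "black":
            vc += 1  # transit from red to black: move to the next VC
        vcs.append(vc)
        prev = tree
    return vcs
```

For a path that uses the red tree first and then the black tree, the hops before the transit use VC#0 and the hops after it use VC#1, which matches the figure.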
15
Outline
NoCs’ topologies
–Mesh, Torus
–H-Trees, Fat Trees
Fat H-Tree (FHT)
–Structure
–2-D layout
–Routing algorithm (DTR)
Evaluations of FHT
–Network logic area
–Energy consumption
–Throughput
16
Ideal throughput: Channel bisection
        N=16  N=64  N=256
HT         4     4      4
FT1        8    16     32
FT2       16    32     64
FHT       24    40     72
Mesh       8    16     32
Torus     16    32     64
FT1: Fat Tree(2,4,1)
FT2: Fat Tree(2,4,2)
The FHT value is partly due to the torus structure and partly due to the two H-Trees
Bandwidth of FHT is much improved by the torus structure
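The FHT row can be reproduced by adding the torus contribution to the contribution of the two H-Trees. The sketch below assumes closed forms inferred from the table values, which the slide does not state: 4 channels per H-Tree, 2√N for a mesh, and 4√N for a torus. The function name is hypothetical.

```python
import math

def bisection_channels(n_cores):
    """Channel bisection per topology for a k x k layout (k = sqrt(N)).

    The closed forms are assumptions fitted to the slide's table,
    not formulas given by the authors.
    """
    k = math.isqrt(n_cores)
    h_tree = 4          # assumed constant, as in the HT row
    mesh = 2 * k        # assumed: matches 8, 16, 32
    torus = 4 * k       # assumed: matches 16, 32, 64
    return {
        "HT": h_tree,
        "Mesh": mesh,
        "Torus": torus,
        # FHT = part due to torus + part due to two H-Trees
        "FHT": torus + 2 * h_tree,
    }
```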
17
Number of routers
        N=16  N=64  N=256
HT         5    21     85
FT1        6    28    120
FT2       12    56    240
FHT       10    42    170
Mesh      16    64    256
Torus     16    64    256
FT1: Fat Tree(2,4,1)
FT2: Fat Tree(2,4,2)
Router count of FHT is less than Fat Tree(2,4,2)
Note: the number of NIs is not considered. FHT requires 2-port NIs for red & black
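The router counts follow simple patterns, sketched below. The per-rank formulas are assumptions inferred from the table values and from the (p, q, c) definition of Fat Trees; the slides do not state them, and the function names are hypothetical.

```python
def h_tree_routers(n_cores):
    """Routers in one H-Tree, assuming each router has 4 children:
    N/4 + N/16 + ... + 1."""
    count = 0
    level = n_cores // 4
    while level >= 1:
        count += level
        level //= 4
    return count

def fat_tree_241_routers(n_cores):
    """Routers in Fat Tree(2,4,1), assuming rank r holds
    (N / 4**r) * 2**(r - 1) routers."""
    count = 0
    groups = n_cores // 4   # subtrees at the current rank
    copies = 1              # duplication factor at the current rank
    while groups >= 1:
        count += groups * copies
        groups //= 4
        copies *= 2
    return count

def router_counts(n_cores):
    ht = h_tree_routers(n_cores)
    ft1 = fat_tree_241_routers(n_cores)
    return {
        "HT": ht,
        "FT1": ft1,
        "FT2": 2 * ft1,     # table: FT2 is twice FT1
        "FHT": 2 * ht,      # two H-Trees (red & black)
        "Mesh": n_cores,    # one router per core
        "Torus": n_cores,
    }
```

Under these assumptions, `router_counts(64)` gives 42 routers for FHT against 56 for Fat Tree(2,4,2).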
18
Network logic area (routers & NIs)
Synthesis of NoC
–16-core, 64-core
–Design Compiler
–0.18um CMOS
Router architecture
–1-flit = 32-bit
–4-stage pipeline
–Wormhole, 2VCs
NI architecture
–In: 2-flit FIFO
–Out: 2-flit FIFO
[Figure: wormhole router with input ports, buffers (2 VCs), and a crossbar]
FHT’s NI is implemented as a “router” to forward packets between trees
19
Network logic area: 16/64-core
[Figures: synthesis result (16-core); synthesis result (64-core)]
Network logic area of FHT is smaller than Fat Tree(2,4,2)
FHT’s NI is larger than others
20
Total wire length of all links
Total unit-length of links
–Core to router
–Router to router
1-unit = distance between neighboring cores
How many unit-links would FHT require?
        N=16  N=64  N=256
HT        24   112    480
FT1       32   192  1,024
FT2       64   384  2,048
FHT       72   392  1,800
Mesh      24   112    480
Torus     48   224    960
FT1: Fat Tree(2,4,1)
FT2: Fat Tree(2,4,2)
Wire length of FHT is almost the same as Fat Tree(2,4,2)
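Two quick checks on these numbers, as a sketch. The mesh formula 2k(k-1) for a k x k mesh is standard; the FT2 and FHT values are copied from the table, and the function names are hypothetical.

```python
import math

def mesh_unit_links(n_cores):
    """Unit-length links in a k x k mesh: k rows and k columns,
    each with k - 1 links between neighboring routers."""
    k = math.isqrt(n_cores)
    return 2 * k * (k - 1)

# Values copied from the slide's table (unit-links).
FT2_UNITS = {16: 64, 64: 384, 256: 2048}
FHT_UNITS = {16: 72, 64: 392, 256: 1800}

def fht_to_ft2_ratio(n_cores):
    """Total wire length of FHT relative to Fat Tree(2,4,2)."""
    return FHT_UNITS[n_cores] / FT2_UNITS[n_cores]
```

The ratio is 1.125 at N=16, about 1.02 at N=64, and about 0.88 at N=256, which is the sense in which the two wire lengths are "almost the same".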
21
Energy: NoC’s energy model
Ave. flit energy
–Send 1-flit to dest.
–How much energy [J]?
Parameters
–12mm square chip
–16/64-core
–0.18um CMOS
Switching energy
–1-bit switching @ router
–Gate-level sim
–1.88 [pJ / hop] for routers
–1.27 [pJ / hop] for NI
–1.45 [pJ / hop] for NI (FHT)
Link energy
–1-bit transfer @ link
–0.67 [pJ / mm] [Wang, DATE’05]
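A minimal sketch of how these parameters could combine into a per-flit energy. It assumes that the energy is a linear sum of per-bit switching and link terms scaled by the 32-bit flit width from the synthesis slide; the slide does not give this formula, and the function and constant names are hypothetical.

```python
# Per-bit energies from the slide (pJ).
ROUTER_PJ_PER_HOP = 1.88   # switching, router
NI_PJ_PER_HOP = 1.27       # switching, NI
NI_FHT_PJ_PER_HOP = 1.45   # switching, FHT's NI
LINK_PJ_PER_MM = 0.67      # link transfer
FLIT_BITS = 32             # 1-flit = 32-bit

def flit_energy_pj(router_hops, ni_hops, link_mm,
                   ni_pj_per_hop=NI_PJ_PER_HOP, flit_bits=FLIT_BITS):
    """Energy (pJ) to move one flit along a path.

    Assumed model: every bit pays the switching energy at each router
    and NI it traverses, plus the link energy per mm of wire.
    """
    per_bit = (router_hops * ROUTER_PJ_PER_HOP
               + ni_hops * ni_pj_per_hop
               + link_mm * LINK_PJ_PER_MM)
    return flit_bits * per_bit
```

For example, under this model a path with 3 router hops, 2 NI traversals, and 9 mm of links costs 32 × (5.64 + 2.54 + 6.03) = 454.72 pJ per flit with the ordinary NI.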
22
Energy consumption: 16/64-core
[Figures: simulation result (16-core); simulation result (64-core)]
Energy consumption of FHT is less than Fat Tree(2,4,2)
23
Throughput: Simulation environment
Flit-level simulation
–Throughput / latency
–16/64-core
Topology (routing)
–Mesh, Torus (DOR)
–Fat Trees (up/down)
–Fat H-Tree (DTR)
Traffic patterns
–Uniform
–BT.W, SP.W, CG.W, MG.W, IS.W (NAS Parallel Benchmark)
Simulation parameters
Packet size   16-flit (1-flit header)
Buffer size   1-flit per channel
Switching     Wormhole
# of VCs      2
Latency       3-cycle per 1-hop
24
FHT vs. FTs: Uniform (16/64-core)
[Figures: Uniform (16-core); Uniform (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)]
FHT outperforms FT2 in 16-core, but it doesn’t in 64-core
FHT(DTR) causes congestion around the roots of the trees
25
FHT vs. FTs: BT (16/64-core)
BT has neighboring communications, which is an advantage for FHT(DTR)
[Figures: BT traffic (16-core); BT traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)]
FHT(DTR) doesn’t cause congestion around roots
26
FHT vs. FTs: MG (16/64-core)
Performance is … FHT(DTR) > FT2 > FT1
[Figures: MG traffic (16-core); MG traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)]
27
Summary: Evaluations of FHT
Performance
–FHT outperforms Fat Tree (FT2), except for uniform
Network logic area
–FHT requires 20.5%-28.1% smaller area than FT2
Energy consumption
–FHT requires 6.7%-7.0% less energy than FT2
Wire length
–Wire length of FHT is almost the same as FT2
Ongoing work
–Evaluation in 90nm CMOS
–3-D layout of FHT for 3-D NoCs
[Figure labels: wafer (stacked ICs)]
28
Thank you for your attention
29
Feasibility of Fat H-Tree
Total wire length
–Slightly longer than Fat Trees
–But a lot of wire resources are available on-chip
Wire delay
–Length of the longest wire is the same as in Fat Trees
[Figure: Fat Tree (2,4,1) and Fat H-Tree]
If Fat Trees are feasible, Fat H-Tree can be implemented with smaller area but higher performance
30
Routings for FHT: Torus routing (TOR)
Single tree (STR)
–Select a single tree per packet
–Can’t transit trees
Dual tree (DTR)
–Transit trees for minimal paths
–VCs are needed
Torus routing (TOR)
–Use torus formed with rank-1 routers & cores
–VCs are needed
[Figure: Fat H-Tree’s torus structure]
TOR can’t use rank-2 or upper routers; this avoids congestion around the roots, but the paths are non-minimal
31
FHT vs. Torus: Uniform (16/64-core)
FHT (DTR): Minimum routing using links around roots
FHT (TOR): Using torus structure (can’t use links around roots)
[Figures: Uniform (16-core); Uniform (64-core); curves: FHT (DTR), FHT (TOR), 2-D Torus, 2-D Mesh]
FHT achieves torus-level throughput using only torus structure
32
Number of VCs in Dual Tree Routing
# of VCs required is [formula not preserved in this transcript]
–H_max is the longest hop count in the network
E.g.,
–16-core FHT requires 2VCs
–64-core FHT requires 2VCs
–…
VC# is increased when a packet transits from red to black
Two VCs are not so costly…
33
NIs in Fat H-Tree
Implemented as a “simplified router”
–Connecting red & black trees
Routing @ NI is simple
–Forward packets to another tree if dst is not me
[Figure: Fat H-Tree NI with a crossbar connecting the processing core, the port for the red tree, and the port for the black tree]
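The NI routing rule is small enough to write out directly. This is a sketch only: the function name `ni_forward`, its arguments, and the string labels for output ports are hypothetical and not taken from the slides.

```python
def ni_forward(me, dst, arrived_on):
    """Decide where an NI sends an incoming packet.

    me / dst are core identifiers; arrived_on is "red" or "black",
    the tree the packet came from. If the packet is addressed to
    this core it is delivered; otherwise it is forwarded to the
    other tree.
    """
    if dst == me:
        return "core"
    return "black" if arrived_on == "red" else "red"
```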
36
Average hop count
Fat H-Tree
–Minimum routing (DTR)
        Routing   N=16  N=64  N=256
FT      up/down   3.60  5.43   7.36
FHT     DTR       3.20  4.84   6.78
Mesh    DOR       2.67  5.33  10.67
Torus   DOR       2.13  4.06   8.03
FT: Fat Trees
FHT offers shorter average hop count than Fat Trees
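The Mesh and Torus rows can be checked by brute force. The sketch below assumes the average is taken over all ordered source/destination pairs with source ≠ destination, counting router-to-router hops along a minimal dimension-order path; the function name is hypothetical. The tree topologies are not modeled here.

```python
from itertools import product

def average_hops(k, wraparound):
    """Average minimal hop count in a k x k mesh (wraparound=False)
    or torus (wraparound=True), over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    total = 0
    for (sx, sy), (dx, dy) in product(nodes, repeat=2):
        hx, hy = abs(sx - dx), abs(sy - dy)
        if wraparound:
            # a torus can go around the ring in the shorter direction
            hx, hy = min(hx, k - hx), min(hy, k - hy)
        total += hx + hy
    n = k * k
    return total / (n * (n - 1))
```

Rounded to two decimals, `average_hops(4, False)` gives 2.67 and `average_hops(4, True)` gives 2.13, in line with the N=16 column.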
37
Wire length of links
Case studies
–16-core (1-unit = 3.0mm)
–64-core (1-unit = 1.5mm)
Flit-width = 32-bit @ 12mm square chip
Utilization rate of wire resources in 2 metal layers (%)
        N=16   N=64
HT      1.6%   3.7%
FT1     2.1%   6.4%
FT2     4.3%  12.8%
FHT     4.8%  13.1%
Mesh    1.6%   3.7%
Torus   3.2%   7.5%
Wire length of FHT is almost the same as Fat Tree(2,4,2)
38
Routings for FHT: Single tree (STR)
Single tree (STR)
–Select a single tree per packet
–Can’t transit trees
Dual tree (DTR)
–Transit trees for minimal paths
–VCs are needed
Torus routing (TOR)
–Use torus formed with rank-1 routers & cores
–VCs are needed
[Figure: Case 1: red tree, 6-hop; Case 2: black tree, 4-hop]
39
Routings for FHT: Dual tree (DTR)
Single tree (STR)
–Select a single tree per packet
–Can’t transit trees
Dual tree (DTR)
–Transit trees for minimal paths
–VCs are needed
Torus routing (TOR)
–Use torus formed with rank-1 routers & cores
–VCs are needed
[Figure: red is used first, then black is used]
VC# is increased when a packet transits from red to black
40
Fat H-Tree: Structure
Fat H-Tree [Yamada, EUC’04]
–Red Tree (H-Tree)
–Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core; both edges are connected (folded)]
Because the black tree is shifted and folded, the connection pattern of the trees is different from that of the original Fat Trees