Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi.


1 Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi (NII, JAPAN) Hideharu Amano (Keio Univ, JAPAN)

2 Introduction Networks-on-Chip (NoCs) –Tile architecture –On-chip routers –Packet switching Various NoC topologies –Mesh, Torus –H-Tree, Fat Trees Fat H-Tree (FHT) Evaluations of FHT –Performance –Area –Energy [Figure: A mesh-based on-chip network of nine tiles numbered 0-8; each tile is a RISC, DSP, RAM, or I/O block] We proposed FHT as an alternative to Fat Trees

3 NoCs’ topologies: Mesh & Torus 2-D Mesh –RAW [Taylor, IEEE Micro’02] 2-D Torus –2x bandwidth of mesh [Figure legend: Router, Core] Fat H-Tree is a tree-based topology, but it includes a torus structure

4 NoCs’ topologies: Fat Trees Fat Tree (p, q, c) –p: # of upward links –q: # of downward links –c: # of core ports [Figures: Fat Tree (2,4,2) and Fat Tree (2,4,1) with Rank-1 and Rank-2 routers; legend: Router, Core] Trees are duplicated in Fat Trees and Fat H-Tree, but the connection patterns of trees are different!

5 Outline NoCs’ topologies –Mesh, Torus –H-Trees, Fat Trees Fat H-Tree (FHT) –Structure –2-D layout –Routing algorithm (DTR) Evaluations of FHT –Network logic area –Energy consumption –Throughput

6 Fat H-Tree: Structure Fat H-Tree –Red Tree (H-Tree) –Black Tree (H-Tree) [Yamada, EUC’04] Combining two H-Trees (red & black) [Figure legend: Router, Core] The black tree is shifted toward the lower right of the red tree Because of this shift, the connection pattern of the trees differs from that of the original Fat Trees

7 Fat H-Tree: Structure Fat H-Tree –Red Tree (H-Tree) –Black Tree (H-Tree) [Yamada, EUC’04] Combining two H-Trees (red & black) [Figure legend: Router, Core] Fat H-Tree is formed on red & black trees

10 Fat H-Tree: Structure Fat H-Tree –Red Tree (H-Tree) –Black Tree (H-Tree) [Yamada, EUC’04] Combining two H-Trees (red & black) [Figure legend: Router, Core] Rank-2 and upper routers are omitted in this figure Each core is connected to both red & black trees A ring is formed with cores & rank-1 routers Torus-level performance by combining only two H-Trees

11 Fat H-Tree: 2-D layout on VLSI Fat H-Tree –Torus structure → folded in the same way as the folded layout of a 2-D torus [Figure: Fat H-Tree’s 2-D layout; legend: Router, Core] Topologically equivalent (Long feedback links across chip)

12 Fat H-Tree: Routing algorithm Paths on a single H-tree –Only red tree, or –Only black tree Only red tree → 6 hops Only black tree → 6 hops

13 Fat H-Tree: Routing algorithm Paths on a single H-tree –Only red tree, or –Only black tree Paths across trees –Transit between trees –Minimum paths First the red tree is used, then the black tree (transit!): 4 hops in total (minimum) Exploiting such paths is key for improving the performance

14 Fat H-Tree: Dual tree routing (DTR) Dual tree routing –Transit trees for minimum paths –Cycles across trees Deadlock avoidance –VC# is increased when a packet transits from red to black [Figure: VC#0 is used before the transit, VC#1 is used after it] Only two VCs are sufficient in a 64-node FHT
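The VC rule on this slide can be illustrated with a minimal sketch. It assumes a path is given as the tree colour used at each hop, and that only a red-to-black transit raises the VC number, since that is the only case the slide states. The function names `assign_vcs` and `vcs_needed` are hypothetical and are not from the presentation.

```python
from typing import List


def assign_vcs(tree_per_hop: List[str]) -> List[int]:
    """Return the VC number used on each hop of a path.

    tree_per_hop names the tree ("red" or "black") whose link is used
    at each hop. Assumption: the VC number starts at 0 and is raised
    by one on every red-to-black transit, and on nothing else.
    """
    vcs = []
    vc = 0
    prev = None
    for tree in tree_per_hop:
        if tree not in ("red", "black"):
            raise ValueError(f"unknown tree: {tree!r}")
        if prev == "red" and tree == "black":
            vc += 1  # transit from red to black: move to the next VC
        vcs.append(vc)
        prev = tree
    return vcs


def vcs_needed(tree_per_hop: List[str]) -> int:
    """Number of distinct VCs the path touches under the rule above."""
    return max(assign_vcs(tree_per_hop), default=0) + 1


# A 4-hop path that starts on the red tree and finishes on the black tree.
print(assign_vcs(["red", "red", "black", "black"]))  # [0, 0, 1, 1]
```

A path with a single red-to-black transit touches two VCs, which matches the slide's remark that two VCs suffice when each packet transits at most once.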

15 Outline NoCs’ topologies –Mesh, Torus –H-Trees, Fat Trees Fat H-Tree (FHT) –Structure –2-D layout –Routing algorithm (DTR) Evaluations of FHT –Network logic area –Energy consumption –Throughput

16 Ideal throughput: Channel bisection
Bandwidth of FHT is much improved by the torus structure
        N=16  N=64  N=256
HT         4     4      4
FT1        8    16     32
FT2       16    32     64
FHT       24    40     72
Mesh       8    16     32
Torus     16    32     64
FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
FHT’s channel bisection is partly due to the torus and partly due to the two H-Trees
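The channel-bisection figures on this slide follow simple patterns in the number of cores N. The sketch below encodes those patterns as closed forms fitted to the slide's three columns (N = 16, 64, 256); they are an observation about the table, not formulas given in the presentation. The FHT value is written as the torus value plus two H-Trees at 4 channels each, following the slide's "due to torus" and "due to two H-Trees" annotations. The name `channel_bisection` is hypothetical.

```python
import math

# Values as read from the slide (columns N=16, N=64, N=256).
SLIDE_TABLE = {
    "HT": (4, 4, 4),
    "FT1": (8, 16, 32),
    "FT2": (16, 32, 64),
    "FHT": (24, 40, 72),
    "Mesh": (8, 16, 32),
    "Torus": (16, 32, 64),
}


def channel_bisection(topology: str, n: int) -> int:
    """Channel bisection for an n-core network (n a perfect square).

    Assumption: these closed forms are fitted to the slide's table.
    """
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    if topology == "HT":
        return 4                      # constant: only the root links are cut
    if topology in ("Mesh", "FT1"):
        return 2 * k
    if topology in ("Torus", "FT2"):
        return 4 * k
    if topology == "FHT":
        # torus part plus the two H-Trees
        return channel_bisection("Torus", n) + 2 * channel_bisection("HT", n)
    raise ValueError(f"unknown topology: {topology!r}")


for name, row in SLIDE_TABLE.items():
    computed = tuple(channel_bisection(name, n) for n in (16, 64, 256))
    print(f"{name:6s} slide={row} computed={computed}")
```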

17 Number of routers
Router count of FHT is less than Fat Tree(2,4,2)
        N=16  N=64  N=256
HT         5    21     85
FT1        6    28    120
FT2       12    56    240
FHT       10    42    170
Mesh      16    64    256
Torus     16    64    256
FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
Note: the number of NIs is not counted. FHT requires 2-port NIs for the red & black trees
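The router counts on this slide can be reproduced by counting routers rank by rank. The sketch below assumes an H-Tree is a quad tree (N/4 rank-1 routers, N/16 rank-2 routers, and so on up to one root), that FHT uses two such trees, and that Fat Tree (2,4,c) has c·(N/4) rank-1 routers with the count halving at each higher rank over log4(N) ranks. The Fat Tree rule is inferred from the table values rather than stated in the presentation, and the function names are hypothetical.

```python
def ht_routers(n: int) -> int:
    """Routers in one H-Tree (quad tree) over n cores."""
    count = 0
    rank_size = n // 4
    while rank_size >= 1:
        count += rank_size   # routers at this rank
        rank_size //= 4      # each upper router covers four lower ones
    return count


def fat_tree_routers(n: int, c: int) -> int:
    """Routers in Fat Tree (2,4,c) over n cores (assumed counting rule)."""
    ranks = 0
    m = n
    while m > 1:             # number of ranks = log4(n)
        m //= 4
        ranks += 1
    # rank i (0-based) holds c * (n/4) / 2**i routers
    return sum(c * (n // 4) // (2 ** i) for i in range(ranks))


def fht_routers(n: int) -> int:
    """Fat H-Tree: two H-Trees (red and black)."""
    return 2 * ht_routers(n)


for n in (16, 64, 256):
    print(n, ht_routers(n), fat_tree_routers(n, 1),
          fat_tree_routers(n, 2), fht_routers(n))
```

For mesh and torus the router count is simply N, one router per core.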

18 Network logic area (routers & NIs) Synthesis of NoC –16-core, 64-core –Design Compiler –0.18um CMOS Router architecture –1-flit = 32-bit –4-stage pipeline –Wormhole, 2VCs NI architecture –In: 2-flit FIFO –Out: 2-flit FIFO [Figure: Wormhole router with input ports, buffers (2 VCs), and a crossbar] FHT’s NI is implemented as a “router” to forward packets between trees

19 Network logic area: 16/64-core [Figures: Synthesis result (16-core); Synthesis result (64-core)] Network logic area of FHT is smaller than Fat Tree(2,4,2) FHT’s NI is larger than others

20 Total wire length of all links
Total unit-length of links –Core-router links –Router-router links
1-unit = distance between neighboring cores (a 1-unit link)
How many unit-links would FHT require?
        N=16  N=64  N=256
HT        24   112    480
FT1       32   192  1,024
FT2       64   384  2,048
FHT       72   392  1,800
Mesh      24   112    480
Torus     48   224    960
FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
Wire length of FHT is almost the same as Fat Tree(2,4,2)
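The mesh and torus rows of this table can be checked with a short count. The sketch assumes a k × k array of cores (N = k²), unit-length links between neighbouring mesh routers, and a folded torus in which each ring of k nodes has two links of length 1 and k − 2 links of length 2. The folded-ring layout is an assumption chosen because it reproduces the slide's torus figures; the presentation does not spell it out. The tree rows are not modelled here.

```python
def mesh_wire_units(k: int) -> int:
    """Total wire length of a k x k mesh in unit links."""
    # k rows and k columns, each with k-1 unit-length links
    return 2 * k * (k - 1)


def folded_ring_units(k: int) -> int:
    """Total wire length of one folded ring of k nodes (assumed layout)."""
    if k < 3:
        raise ValueError("a folded ring needs at least 3 nodes")
    # two end links of length 1, the other k-2 links skip one node (length 2)
    return 2 * 1 + (k - 2) * 2


def folded_torus_wire_units(k: int) -> int:
    """Total wire length of a folded k x k torus in unit links."""
    # k row rings plus k column rings
    return 2 * k * folded_ring_units(k)


for k in (4, 8, 16):
    print(k * k, mesh_wire_units(k), folded_torus_wire_units(k))
```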

21 Energy: NoC’s energy model Ave. flit energy –Send 1-flit to dest. –How much energy [J]? Parameters –12mm square chip –16/64-core –0.18um CMOS Switching energy –1-bit switching @ router –Gate-level sim –1.88 [pJ / hop] for routers –1.27 [pJ / hop] for NI –1.45 [pJ / hop] for NI (FHT) Link energy –1-bit transfer @ link –0.67 [pJ / mm] [Wang, DATE’05]
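A minimal sketch of how the parameters on this slide could combine into a per-flit energy estimate. It assumes the three switching figures apply to routers, ordinary NIs, and FHT NIs in that order (read from the order of the slide's labels), that all figures are per bit, and that a flit is 32 bits as stated on the synthesis slide. The function `flit_energy_pj` and the example path are hypothetical, and the additive form is an assumption rather than the authors' exact model.

```python
# Per-bit energy parameters from the slide (0.18um CMOS).
ROUTER_PJ_PER_HOP = 1.88   # switching energy at a router
NI_PJ_PER_HOP = 1.27       # switching energy at an ordinary NI
NI_FHT_PJ_PER_HOP = 1.45   # switching energy at an FHT NI (assumed mapping)
LINK_PJ_PER_MM = 0.67      # link energy per millimetre
FLIT_BITS = 32             # 1 flit = 32 bits


def flit_energy_pj(router_hops: int, ni_hops: int, wire_mm: float,
                   fht: bool = False) -> float:
    """Estimated energy in pJ to move one flit along a path.

    router_hops: routers traversed
    ni_hops:     network interfaces traversed
    wire_mm:     total link length travelled, in mm
    fht:         use the FHT NI figure instead of the ordinary NI figure
    """
    ni_energy = NI_FHT_PJ_PER_HOP if fht else NI_PJ_PER_HOP
    per_bit = (router_hops * ROUTER_PJ_PER_HOP
               + ni_hops * ni_energy
               + wire_mm * LINK_PJ_PER_MM)
    return FLIT_BITS * per_bit


# Hypothetical path: 3 routers, 2 NIs, 9 mm of wire.
print(f"{flit_energy_pj(3, 2, 9.0):.2f} pJ")
```

Averaging this quantity over all source-destination pairs of a topology would give an average flit energy of the kind the slide asks about.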

22 Energy consumption: 16/64-core [Figures: Simulation result (16-core); Simulation result (64-core)] Energy consumption of FHT is less than Fat Tree(2,4,2)

23 Throughput: Simulation environment Flit-level simulation –Throughput / latency –16/64-core Topology (routing) –Mesh, Torus (DOR) –Fat Trees (up/down) –Fat H-Tree (DTR) Traffic patterns –Uniform –BT.W –SP.W –CG.W –MG.W –IS.W (NAS Parallel Benchmark)
Packet size   16-flit (1-flit header)
Buffer size   1-flit per channel
Switching     Wormhole
# of VCs      2
Latency       3-cycle per 1-hop

24 FHT vs. FTs: Uniform (16/64-core) [Figures: Uniform (16-core); Uniform (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)] FHT outperforms FT2 in 16-core, but it doesn’t in 64-core FHT(DTR) causes congestion around root of trees

25 FHT vs. FTs: BT (16/64-core) [Figures: BT traffic (16-core); BT traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)] BT has neighboring communications. Advantage for FHT(DTR) FHT(DTR) doesn’t cause congestion around roots

26 FHT vs. FTs: MG (16/64-core) [Figures: MG traffic (16-core); MG traffic (64-core); curves: FHT (DTR), Fat Tree(2,4,2), Fat Tree(2,4,1)] Performance is … FHT(DTR) > FT2 > FT1

27 Summary: Evaluations of FHT Performance –FHT outperforms Fat Tree (FT2), except for uniform traffic Network logic area –FHT requires 20.5%-28.1% smaller area than FT2 Energy consumption –FHT requires 6.7%-7.0% less energy than FT2 Wire length –Wire length of FHT is almost the same as FT2 Ongoing work –Evaluation in 90nm CMOS –3-D layout of FHT for 3-D NoCs wafer (stacked ICs)

28 Thank you for your attention

29 Feasibility of Fat H-Tree Total wire length –Slightly longer than Fat Trees –But a lot of wire resources are available on-chip Wire delay –Length of the longest wire is the same as in Fat Trees [Figures: Fat Tree (2,4,1); Fat H-Tree] If Fat Trees are feasible, Fat H-Tree can be implemented with smaller area and higher performance

30 Routings for FHT: Torus routing (TOR) Single tree (STR) –Select a single tree per packet –Can’t transit trees Dual tree (DTR) –Transit trees for minimal paths –VCs are needed Torus routing (TOR) –Use torus formed with rank-1 routers & cores –VCs are needed –Can’t use rank-2 or upper routers [Figure: Fat H-Tree’s torus structure] TOR avoids congestion around roots, but uses non-minimal paths

31 FHT vs. Torus: Uniform (16/64-core) [Figures: Uniform (16-core); Uniform (64-core); curves: FHT (DTR), FHT (TOR), 2-D Torus, 2-D Mesh] FHT (DTR): minimum routing using links around roots FHT (TOR): using torus structure (can’t use links around roots) FHT achieves torus-level throughput using only torus structure

32 Number of VCs in Dual Tree Routing # of VCs required is –H_max is the longest hop count in the network E.g., –16-core FHT requires 2VCs –64-core FHT requires 2VCs –… VC# is increased when a packet transits from red to black Two VCs are not so costly…

33 NIs in Fat H-Tree Implemented as a “simplified router” –Connecting red & black trees Routing @ NI is simple –Forward packets to another tree if dst is not me [Figure: NI in Fat H-Tree, a crossbar connecting the processing core to ports for the red tree and for the black tree]
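The NI routing rule on this slide is small enough to sketch directly. The function below assumes a packet arrives at the NI from either the red or the black tree and carries a destination core ID; the names `ni_route`, `dst`, and `my_id` are hypothetical, and packets injected by the local core are not modelled.

```python
def ni_route(dst: int, my_id: int, arrived_from: str) -> str:
    """Decide the output port of an FHT network interface.

    Returns "core" when the packet has reached its destination,
    otherwise the name of the other tree, as the slide describes:
    forward to another tree if the destination is not this core.
    """
    if arrived_from not in ("red", "black"):
        raise ValueError(f"unknown tree: {arrived_from!r}")
    if dst == my_id:
        return "core"  # deliver to the local processing core
    return "black" if arrived_from == "red" else "red"


print(ni_route(dst=5, my_id=5, arrived_from="red"))    # core
print(ni_route(dst=7, my_id=5, arrived_from="red"))    # black
print(ni_route(dst=7, my_id=5, arrived_from="black"))  # red
```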

34 Network logic area: 16/64-core [Figures: Synthesis result (16-core); Synthesis result (64-core)] Network logic area of FHT is smaller than Fat Tree(2,4,2) FHT’s NI is larger than others

36 Average hop count
Fat H-Tree –Minimum routing (DTR)
        Routing  N=16  N=64  N=256
FT      up/down  3.60  5.43   7.36
FHT     DTR      3.20  4.84   6.78
Mesh    DOR      2.67  5.33  10.67
Torus   DOR      2.13  4.06   8.03
FT: Fat Trees
FHT offers shorter average hop count than Fat Trees
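The mesh and torus rows of this table can be reproduced by brute force. The sketch assumes a k × k array (N = k²), minimal dimension-order paths, and an average taken over all ordered pairs of distinct source and destination nodes. The slide does not state how its averages were computed, so this is an assumption that happens to match the mesh and torus figures; the Fat Tree and FHT rows are not modelled.

```python
from itertools import product


def average_hops(k: int, wrap: bool) -> float:
    """Average minimal hop count on a k x k mesh (wrap=False) or torus (wrap=True).

    Averages over all ordered pairs of distinct nodes.
    """
    nodes = list(product(range(k), repeat=2))
    total = 0
    pairs = 0
    for (sx, sy), (dx, dy) in product(nodes, repeat=2):
        if (sx, sy) == (dx, dy):
            continue  # skip self-traffic
        x_hops = abs(sx - dx)
        y_hops = abs(sy - dy)
        if wrap:
            # on a torus, the shorter way around each ring is taken
            x_hops = min(x_hops, k - x_hops)
            y_hops = min(y_hops, k - y_hops)
        total += x_hops + y_hops
        pairs += 1
    return total / pairs


for k in (4, 8, 16):
    print(f"N={k * k:3d}  mesh={average_hops(k, False):.2f}  "
          f"torus={average_hops(k, True):.2f}")
```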

37 Wire length of links
Case studies –16-core (1-unit = 3.0mm) –64-core (1-unit = 1.5mm)
Utilization rate of wire resources in 2 metal layers (%), flit-width = 32-bit @ 12mm square chip
        N=16   N=64
HT      1.6%   3.7%
FT1     2.1%   6.4%
FT2     4.3%  12.8%
FHT     4.8%  13.1%
Mesh    1.6%   3.7%
Torus   3.2%   7.5%
Wire length of FHT is almost the same as Fat Tree(2,4,2)

38 Routings for FHT: Single tree (STR) Single tree (STR) –Select a single tree per packet –Can’t transit trees Dual tree (DTR) –Transit trees for minimal paths –VCs are needed Torus routing (TOR) –Use torus formed with rank-1 routers & cores –VCs are needed Case 1: red tree → 6 hops Case 2: black tree → 4 hops

39 Routings for FHT: Dual tree (DTR) Single tree (STR) –Select a single tree per packet –Can’t transit trees Dual tree (DTR) –Transit trees for minimal paths –VCs are needed Torus routing (TOR) –Use torus formed with rank-1 routers & cores –VCs are needed First the red tree is used, then the black tree VC# is increased when a packet transits from red to black

40 Fat H-Tree: Structure Fat H-Tree –Red Tree (H-Tree) –Black Tree (H-Tree) [Yamada, EUC’04] Combining two H-Trees (red & black) [Figure legend: Router, Core] Both edges are connected (folded) By shifting and folding the black tree, the connection pattern of the trees differs from that of the original Fat Trees

