Published by Jared Dalton, modified over 9 years ago
1
Tightly-Coupled Multi-Layer Topologies for 3D NoCs
Hiroki Matsutani (Keio Univ., JAPAN), Michihiro Koibuchi (NII, JAPAN), Hideharu Amano (Keio Univ., JAPAN)
2
Outline
Network-on-Chip (NoC)
– Typical 2D topologies
– 2D vs. 3D
XNoTs
– New class of 3D topologies
– Definition, Examples
– Deadlock-free routing
Evaluations
– Throughput
– Area, Energy consumption
3
Network-on-Chip (NoC)
Tile architectures
– MIT RAW [Taylor, Micro’02]
– U. Texas TRIPS [Burger, Computer’04]
– Intel 80-tile NoC [Vangal, ISSCC’07]
Various topologies
– Mesh, Torus, Tree
– Large impact on energy, cost, and performance
Figure: an example of a tile architecture (ASPLA 90nm CMOS process). Tile = processing core + on-chip router; tiles communicate over a packet-switched network.
4
2D Topologies: Mesh & Torus
Figure: 2-D Mesh (e.g., RAW [Taylor, IEEE Micro’02]) and 2-D Torus (legend: Router, Core)
– Torus: 2x the bisection bandwidth of a mesh
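The "2x bandwidth" of the torus refers to bisection bandwidth: its wraparound links double the number of channels that cross the middle of the network. A minimal Python sketch that counts directed channels across the standard column cut of a k x k mesh and torus; the function names are hypothetical, and it assumes one channel per direction per physical link and k >= 3 (so wraparound links are not duplicated).

```python
from itertools import product

def grid_channels(k: int, torus: bool) -> list:
    """Directed router-to-router channels of a k x k mesh (or torus)."""
    channels = []
    for x, y in product(range(k), repeat=2):
        for dx, dy in ((1, 0), (0, 1)):
            nx, ny = x + dx, y + dy
            if torus:
                nx, ny = nx % k, ny % k  # wraparound link
            elif nx >= k or ny >= k:
                continue  # mesh: no link past the edge
            # one channel in each direction per physical link
            channels += [((x, y), (nx, ny)), ((nx, ny), (x, y))]
    return channels

def channel_bisection(k: int, torus: bool) -> int:
    """Channels crossing the cut between columns x < k/2 and x >= k/2."""
    half = k // 2
    return sum(1 for a, b in grid_channels(k, torus)
               if (a[0] < half) != (b[0] < half))

for k in (4, 8):
    print(f"{k}x{k}: mesh={channel_bisection(k, False)}, "
          f"torus={channel_bisection(k, True)}")
# 4x4: mesh=8, torus=16
# 8x8: mesh=16, torus=32
```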
5
2D Topologies: Fat Tree
Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1) (legend: Router, Core)
The network topology should be selected carefully to meet the requirements of the application.
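Under one reading of the (p, q, c) notation, a Fat Tree(p, 4, 1) for 16 cores has four leaf routers, each serving four cores and holding one uplink to each of p root routers. The Python sketch below builds that assumed two-level structure and brute-forces the minimum channel bisection over cuts that split the leaves evenly; the construction and names are illustrative assumptions, not the paper's definition. Under these assumptions it gives 4, 8, and 16 channels for p = 1, 2, 4, which agrees with the 1-tier X-FT values on the channel-bisection backup slide.

```python
from itertools import combinations

def two_level_fat_tree(n_cores: int, p: int, q: int):
    """Hypothetical 2-level Fat Tree(p, q, 1): each leaf router serves q cores
    and has one uplink to each of p root routers."""
    leaves = [f"L{i}" for i in range(n_cores // q)]
    roots = [f"R{j}" for j in range(p)]
    links = [(leaf, root) for leaf in leaves for root in roots]
    return leaves, roots, links

def min_channel_bisection(n_cores: int, p: int, q: int) -> int:
    """Minimum number of unidirectional router-to-router channels cut when
    half of the leaves (with their cores) go to each side."""
    leaves, roots, links = two_level_fat_tree(n_cores, p, q)
    best = None
    for left_leaves in combinations(leaves, len(leaves) // 2):
        for mask in range(2 ** len(roots)):  # try every placement of the roots
            left = set(left_leaves) | {r for j, r in enumerate(roots) if (mask >> j) & 1}
            cut = sum(1 for a, b in links if (a in left) != (b in left))
            best = cut if best is None else min(best, cut)
    # Each core keeps its single link (c = 1) on its leaf's side, so core links
    # never cross; each physical link counts as two unidirectional channels.
    return 2 * best

for p in (1, 2, 4):
    print(f"Fat Tree({p},4,1), 16 cores: {min_channel_bisection(16, p, 4)}")
```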
6
2D NoC vs. 3D NoC
2D NoCs
– Long wires and distances
– Wire delay
– Packets consume power on links in proportion to wire length
3D NoCs
– Several small wafers or dies are stacked
Vertical links
– Micro bumps
– Through-wafer vias
– Very short (10-50um)
[Ezaki, ISSCC’04] [Burns, ISSCC’01]
Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs.
7
3D NoCs that have heterogeneous tiers
– Different circuits on each tier
– Different topologies on each tier
Figure: stacked tiers (Tier-1, Tier-2, Tier-3) holding a processor array, cache memory, and custom logic, connected by Fat Tree(2,4,1), Ring, and 2D-Mesh topologies
(*) A tier refers to a wafer or a die in 3D ICs.
How do we connect different planar topologies? How do we route packets in heterogeneous 3D NoCs?
We propose a class of topologies for heterogeneous 3D NoCs.
8
Outline
Network-on-Chip (NoC)
– Typical 2D topologies
– 2D vs. 3D
XNoTs
– New class of 3D topologies
– Definition, Examples
– Deadlock-free routing
Evaluations
– Throughput
– Area, Energy consumption
XNoTs: multiple network layers are tightly connected by vertical crossbar switches.
9
Existing vertical link designs
Vertical bus
– Merit: small # of vertical links
– Demerit: low peak performance
– Single bus (only a single transfer at a time)
– Segmented buses (multiple transfers at the same time)
Vertical crossbar
– Merit: similar performance to a true crossbar
– Merit: reasonable # of vertical links
[Li, ISCA’06] [Kim, ISCA’07]
We assume crossbar-based vertical links for 3D NoCs.
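The vertical link designs differ in how many inter-tier transfers can proceed in the same cycle. A toy Python model under stated assumptions: one pending request per tier, tiers numbered along the pillar, and a segmented-bus transfer occupying every bus segment between its source and destination tiers. The function names are hypothetical.

```python
def single_bus_grants(requests):
    """A single shared bus carries at most one transfer per cycle."""
    return min(1, len(requests))

def segmented_bus_grants(requests):
    """A transfer between tiers a and b occupies segments min(a,b)..max(a,b)-1;
    transfers on disjoint segments proceed together (greedy interval scheduling)."""
    spans = sorted((max(a, b) - 1, min(a, b)) for a, b in requests)  # by last segment
    granted, last_used = 0, -1
    for end, start in spans:
        if start > last_used:
            granted += 1
            last_used = end
    return granted

def crossbar_grants(requests):
    """With one request per source tier, a crossbar grants one winner per destination."""
    return len({dst for _, dst in requests})

reqs = [(0, 1), (1, 2), (2, 3), (3, 0)]  # (source tier, destination tier)
print(single_bus_grants(reqs), segmented_bus_grants(reqs), crossbar_grants(reqs))  # 1 3 4
```

Sorting by the last occupied segment makes the greedy pass an instance of earliest-finish interval scheduling, which maximizes the number of segment-disjoint transfers.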
10
XNoTs: Xbar-connected Network-on-Tiers
XNoTs
– Multiple planar topologies
– Connected by crossbars
Network-on-Tier (NoT)
– A planar topology
– Implemented on a tier
– The bottom NoT provides connectivity to all cores
Figure: a mesh-based NoT and an XNoTs stack (legend: Router, Core). Each core and router has a port for a vertical connection.
11
XNoTs: Xbar-connected Network-on-Tiers
XNoTs
– Multiple planar topologies
– Connected by crossbars
Network-on-Tier (NoT)
– A planar topology
– Implemented on a tier
– The bottom NoT provides connectivity to all cores
Figure: a mesh-based NoT and a mesh-based XNoTs with vertical crossbar pillars (legend: Router, Core). All routers and cores in the same pillar are connected by a crossbar.
12
Examples: all tiers have the same topology
Figure: Mesh-based XNoTs, Ring-based XNoTs, Tree-based XNoTs
13
Examples: all tiers have the same topology (side view)
Figure: side views of Mesh-based, Ring-based, and Tree-based XNoTs. All routers and cores in the same pillar are connected by a crossbar.
14
Examples: Heterogeneous XNoTs (1)
Different topologies are used in each tier.
Figure: Fat Tree(2,4,1), Ring, and 2D-Mesh tiers
15
Examples: Heterogeneous XNoTs (1), side view
Different topologies are used in each tier.
Figure: side view of the Fat Tree(2,4,1), Ring, and 2D-Mesh tiers
16
Examples: Heterogeneous XNoTs (2)
Not every tier has to provide connectivity to all cores
– Except for the bottom tier (i.e., the “escape” tier)
Figure: bottom tier (full connectivity to all cores) and top tier (some links are disconnected). Where the top tier has no connectivity, packets are transferred via the bottom tier (tier-1).
(*) Only the bottom tier must provide full connectivity to all cores.
17
Examples: Heterogeneous XNoTs (2)
Not every tier has to provide connectivity to all cores
– Except for the bottom tier (i.e., the “escape” tier)
Figure: packets are transferred via the bottom tier (tier-1). Bottom tier (full connectivity to all cores); top tier (some links are disconnected).
(*) Only the bottom tier must provide full connectivity to all cores.
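The escape-tier idea can be illustrated by checking, per tier, whether the dimension-order path between two pillars uses only links that exist on that tier; the fully connected bottom tier always qualifies. A Python sketch assuming 4x4 mesh tiers, XY routing within a tier, and index 0 for the bottom tier (the helper names are hypothetical).

```python
def mesh_links(k):
    """Undirected links of a full k x k mesh tier."""
    links = set()
    for x in range(k):
        for y in range(k):
            if x + 1 < k:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < k:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links

def xy_path(src, dst):
    """Dimension-order (X then Y) path between two pillars of a tier."""
    (x, y), path = src, [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def usable_tiers(tiers, src, dst):
    """Tiers whose links cover the whole XY path; tier 0 (bottom) is full."""
    path = xy_path(src, dst)
    hops = [frozenset({a, b}) for a, b in zip(path, path[1:])]
    return [t for t, links in enumerate(tiers) if all(h in links for h in hops)]

full = mesh_links(4)
top = full - {frozenset({(1, 0), (2, 0)})}  # top tier with one link disconnected
print(usable_tiers([full, top], (0, 0), (3, 0)))  # [0]: must escape via the bottom tier
print(usable_tiers([full, top], (0, 1), (3, 1)))  # [0, 1]
```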
18
XNoTs: Deadlock-free routing
Intra-tier communication (X and Y directions)
– An existing deadlock-free routing algorithm is used within a tier, e.g., dimension-order routing (DOR)
– Only tier-0 must guarantee connectivity to all cores
Inter-tier communication (Z direction)
– Turns from a lower tier to a higher tier are prohibited
– Unless the next hop is the final destination
Figure: top view and side view of a mesh-based XNoTs, with an allowed turn (OK!) and a prohibited turn (NG!)
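The two routing rules can be written as a route builder plus a checker. This Python sketch assumes one core per tier in every pillar, XY dimension-order routing inside the chosen tier, index 0 for the bottom tier, and one hop per crossbar traversal; the node encoding `(kind, x, y, tier)` and the function names are hypothetical.

```python
def route(src, dst, tier):
    """src/dst are (x, y, z) core coordinates. The packet enters the router of
    `tier` through the source pillar's crossbar, follows XY routing on that
    tier, then leaves through the destination pillar's crossbar."""
    (x, y, _), (dx, dy, _) = src, dst
    path = [("core", *src), ("router", x, y, tier)]
    while x != dx:  # X first
        x += 1 if dx > x else -1
        path.append(("router", x, y, tier))
    while y != dy:  # then Y
        y += 1 if dy > y else -1
        path.append(("router", x, y, tier))
    path.append(("core", *dst))
    return path

def obeys_tier_rule(path):
    """Reject any move from a router to a higher tier, except the final hop
    into the destination core. Nodes are (kind, x, y, tier)."""
    last = len(path) - 2
    for i, (a, b) in enumerate(zip(path, path[1:])):
        if a[0] == "router" and b[3] > a[3] and i != last:
            return False
    return True

ok = route((0, 0, 2), (3, 2, 3), tier=1)  # ride tier 1, climb only at the end
bad = [("core", 0, 0, 0), ("router", 0, 0, 0), ("router", 1, 0, 0),
       ("router", 1, 0, 1), ("router", 2, 0, 1), ("core", 2, 0, 1)]
print(obeys_tier_rule(ok), obeys_tier_rule(bad))  # True False
```

Dropping to a lower tier mid-route is allowed by this checker, which matches the rule that only lower-to-higher turns are prohibited.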
19
XNoTs: Path selection (random)
XNoTs routing
– Multiple tiers are available, so alternative paths exist
Path selection policy
– How is a single path selected?
– Random selection: good load balancing
Figure: top view and side view of a mesh-based XNoTs with a 5-hop path
We also proposed several other path selection policies; please refer to the paper for details.
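Random path selection draws one tier uniformly from those that offer a legal path. A short Python sketch, seeded so runs are repeatable (the function name is hypothetical), showing that traffic spreads roughly evenly over four tiers.

```python
import random
from collections import Counter

def select_tier_random(candidates, rng):
    """Every tier that offers a legal path is equally likely."""
    return rng.choice(candidates)

rng = random.Random(1)
counts = Counter(select_tier_random([0, 1, 2, 3], rng) for _ in range(10_000))
print(sorted(counts.items()))  # each tier receives roughly 2,500 packets
```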
20
Outline
Network-on-Chip (NoC)
– Typical 2D topologies
– 2D vs. 3D
XNoTs
– New class of 3D topologies
– Definition, Examples
– Deadlock-free routing
Evaluations
– Throughput
– Area, Energy consumption
21
Evaluation: Target topologies (64-core)
– X-Mesh: (4x4 Mesh) x 4 layers
– X-Torus: (4x4 Torus) x 4 layers
– X-FT141: Fat Tree(1,4,1) x 4 layers
– X-FT241: Fat Tree(2,4,1) x 4 layers
– X-FT441: Fat Tree(4,4,1) x 4 layers
Fat Tree (p, q, c): p = # of upward links, q = # of downward links, c = # of core ports
Figure: X-Mesh
These five topologies are compared with 3D Mesh/Torus.
22
Throughput: Simulation environment
Grid-based topologies
– 3D-Mesh, X-Mesh
– 3D-Torus, X-Torus
– Dimension-order routing
Tree-based topologies
– X-FT141, X-FT241, X-FT441
– Up*/down* routing
Path selection policy
– Random
Simulation parameters
– Packet size: 16 flits (1-flit header)
– Buffer size: 1 flit per channel
– Switching: wormhole switching
– Latency: 3 cycles per hop
– Traffic: uniform random
(Two virtual channels are used for the tori.)
Figure: X-Mesh (4x4x4)
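The simulation parameters can be captured in a small configuration object. The helper below adds a first-order zero-load latency estimate for wormhole switching, assuming the header pays the 3-cycle hop latency at every hop and the remaining 15 flits follow one per cycle; it ignores contention and the 1-flit buffers, so it is a rough baseline rather than a simulator result. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    packet_flits: int = 16            # 16-flit packets (1-flit header)
    buffer_flits: int = 1             # 1-flit buffer per channel (not used below)
    hop_latency: int = 3              # 3 cycles per hop
    traffic: str = "uniform_random"

def zero_load_latency(cfg: SimConfig, hops: int) -> int:
    """Header pipeline latency plus serialization of the body flits."""
    return hops * cfg.hop_latency + (cfg.packet_flits - 1)

print(zero_load_latency(SimConfig(), 5))  # 30 cycles for a 5-hop path
```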
23
Throughput: Simulation results
Figure: throughput of grid-based XNoTs (X-Mesh, X-Torus) vs. 3D-Mesh and 3D-Torus, and of tree-based XNoTs (X-FT141, X-FT241, X-FT441)
No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
24
Network logic area
Network area
– Routers & NIs
– Inter-tier vias
Synthesis of NoC
– 64-core (16-core x 4)
– 0.18um CMOS
Router architecture
– 1 flit = 32 bits
– Wormhole switching
– 4-stage pipeline
Figure: typical wormhole router (input ports with buffers, arbiter, crossbar) [Matsutani, IPDPS’07]
Inter-tier vias [Li, ISCA’06] [Burns, ISSCC’01]
– 1-10um square
– 25um² per layer per 1-bit signal
Inter-tier via area is calculated according to the # of vertical links.
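Via area scales with the number of vertical channels times the flit width. A Python sketch that reads the slide's figure as 25 um² per layer per 1-bit signal and counts only the 32 data bits of each unidirectional channel (control wires are ignored, an assumption); the constant and function names are hypothetical, and the 3D-mesh channel count is only an illustration.

```python
VIA_UM2_PER_BIT_PER_LAYER = 25.0  # assumed reading of the slide's 25um figure
FLIT_BITS = 32                    # 1 flit = 32 bits

def via_area_mm2(vertical_channels, layers_crossed=1, width_bits=FLIT_BITS):
    """Via area for unidirectional vertical channels (data bits only)."""
    um2 = vertical_channels * width_bits * layers_crossed * VIA_UM2_PER_BIT_PER_LAYER
    return um2 / 1e6  # 1 mm^2 = 1e6 um^2

# Example: 4x4x4 3D mesh, 16 pillars x 3 tier boundaries x 2 directions,
# each channel crossing one layer.
channels = 16 * 3 * 2
print(channels, round(via_area_mm2(channels), 4))  # 96 0.0768
```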
25
Network logic area: Results
Figure: network logic area [mm²] (same synthesis setup and via model as the previous slide)
– 3D Mesh/Torus require 2 ports for vertical links (i.e., up & down)
– XNoTs require only 1 port for vertical links (but the # of crossbars increases)
26
Energy: NoC’s energy model
Average flit energy
– Energy [J] needed to send 1 flit to its destination
Parameters
– 6mm square chip
– 64-core (16-core x 4)
– 0.18um CMOS
Switching energy
– 1-bit switching @ router
– Gate-level simulation
– 1.13 [pJ / hop]
Link energy
– 1-bit transfer @ link
– 0.67 [pJ / mm]
Via energy
– 4.34 [fF / via] [Davis, DToC’05]
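One way to combine these parameters into a per-flit estimate is to sum the router, horizontal-wire, and via contributions per bit and multiply by the flit width. A Python sketch under stated assumptions: every bit is charged once per hop, via energy is taken as C·V² with an assumed 1.8 V supply for 0.18um CMOS, and the names are hypothetical.

```python
E_SWITCH_PJ_PER_HOP = 1.13  # per bit per router hop (slide value)
E_LINK_PJ_PER_MM = 0.67     # per bit per mm of horizontal wire (slide value)
C_VIA_F = 4.34e-15          # capacitance per via (slide value)
VDD = 1.8                   # assumed supply voltage for 0.18um CMOS

def via_energy_pj(c_farads=C_VIA_F, vdd=VDD):
    """Assumed model: one bit through a via costs C * V^2."""
    return c_farads * vdd ** 2 * 1e12  # J -> pJ

def flit_energy_pj(router_hops, wire_mm, vias, width_bits=32):
    """Energy to move one flit, assuming every bit is charged on every hop."""
    per_bit = (router_hops * E_SWITCH_PJ_PER_HOP
               + wire_mm * E_LINK_PJ_PER_MM
               + vias * via_energy_pj())
    return width_bits * per_bit

print(round(via_energy_pj(), 4))                          # 0.0141
print(round(flit_energy_pj(3, 3.0, 0, width_bits=1), 2))  # 5.4
```

If a 4x4 tile array fills each tier of the 6mm chip (an assumption), adjacent tiles sit about 1.5 mm apart, so one horizontal hop costs about 1.0 pJ per bit in wire energy versus roughly 0.014 pJ per bit for a via.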
27
Energy: Simulation results
Figure: average flit energy [pJ] (same energy model and parameters as the previous slide)
Hop counts are shorter in XNoTs, which leads to lower power.
28
Summary: 3D topologies - XNoTs
Requirements
– Different circuits on each layer
– Different topologies on each layer
– How do we connect and route them?
XNoTs
– Tiers are connected by crossbars
– Arbitrary tiers can be stacked
Current problem / future work
– We assumed a full crossbar as a baseline
– A more efficient implementation has been proposed [Kim, ISCA’07]
– We must revise the router architecture
Figure: Fat Tree, Ring, and 2D-Mesh tiers
29
Thank you for your attention
31
XNoTs: Path selection (QoS)
Control packets
– In-order delivery is required
Data packets
– In-order delivery is not required
– Large data streams
Figure: side view of XNoTs with deterministic routing (dimension-order) and adaptive routing (Duato’s protocol). Control packets use tier-1 with deterministic routing.
32
XNoTs: Path selection (QoS)
Control packets
– In-order delivery is required
Data packets
– In-order delivery is not required
– Large data streams
Figure: side view of XNoTs with deterministic routing (dimension-order) and adaptive routing (Duato’s protocol). Data packets use tier-2 or tier-3.
Various QoS controls are possible through the path selection algorithm.
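The QoS policy maps a packet class to a set of tiers and a routing algorithm. A Python sketch of that mapping; the class names and string labels are hypothetical, while the tier assignment follows the slides. Assuming FIFO channels, keeping control packets on one tier with deterministic routing means every control packet between a given pair follows the same path, which preserves order.

```python
from enum import Enum

class PacketClass(Enum):
    CONTROL = "control"  # in-order delivery required
    DATA = "data"        # large streams, in-order delivery not required

def qos_policy(pkt_class):
    """Return (candidate tiers, routing algorithm) for a packet class."""
    if pkt_class is PacketClass.CONTROL:
        # one tier + deterministic routing: same path for every control
        # packet between a given source and destination
        return [1], "dimension-order"
    # data packets may spread over tiers 2-3 and route adaptively
    return [2, 3], "duato-adaptive"

print(qos_policy(PacketClass.CONTROL), qos_policy(PacketClass.DATA))
```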
33
XNoTs: Path selection (bottom first)
Heat dissipation is crucial in 3D ICs
Bottom tier
– Close to the board (good heat dissipation)
Bottom tier first
– Tier-0 is used first if there are alternative paths
Figure: side view of XNoTs in a 3D IC, with the board acting as a heat sink below the bottom tier
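Bottom-first selection can be read as a priority order over the candidate tiers. A Python sketch, assuming the router knows which candidate tiers currently have a free next channel; the `busy` set and the fall-back-and-wait behavior are assumptions, not details from the slides.

```python
def select_tier_bottom_first(candidates, busy):
    """Try tiers from the bottom (tier 0) up and take the first whose next
    channel is free; if all are busy, wait on the lowest candidate."""
    for tier in sorted(candidates):
        if tier not in busy:
            return tier
    return min(candidates)

print(select_tier_bottom_first([0, 1, 2, 3], busy=set()))  # 0
print(select_tier_bottom_first([0, 1, 2, 3], busy={0}))    # 1
```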
34
Ideal throughput: Channel bisection
Number of unidirectional links that cross the bisection (N cores × n tiers)

Topology   1-tier  2-tier  4-tier
X-Mesh        8      16      32
X-Torus      16      32      64
X-FT141       4       8      16
X-FT241       8      16      32
X-FT441      16      32      64
3D-Mesh       8      16      32
3D-Torus     16      32      64

No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
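The table scales linearly with the number of tiers: vertical connections stay inside a pillar, so a cut through the middle of the chip perpendicular to X crosses only planar channels, and each tier contributes its own share. A Python sketch that reproduces the table from its 1-tier column under that reasoning (the dictionary and function names are hypothetical).

```python
# 1-tier column of the table: unidirectional channels across the bisection
# of a single 16-core tier.
ONE_TIER_BISECTION = {
    "X-Mesh": 8, "X-Torus": 16,
    "X-FT141": 4, "X-FT241": 8, "X-FT441": 16,
    "3D-Mesh": 8, "3D-Torus": 16,
}

def channel_bisection(topology, tiers):
    """Vertical links stay inside a pillar, so a cut perpendicular to X only
    crosses planar channels; every tier contributes the same count."""
    return ONE_TIER_BISECTION[topology] * tiers

for name in ONE_TIER_BISECTION:
    print(name, [channel_bisection(name, n) for n in (1, 2, 4)])
```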
35
3D Topologies: 3D-Mesh
2D-Mesh (8x8=64)
– Average hop count: 5.33
– Channel bisection: 16
– Number of routers: 64
– Node degree: 5
3D-Mesh (4x4x4=64)
– Average hop count: 4.00
– Channel bisection: 32
– Number of routers: 64
– Node degree: 7
Figure: the 3D-Mesh drawn as four stacked tiers (Tier-0 to Tier-3)