Tightly-Coupled Multi-Layer Topologies for 3D NoCs
Hiroki Matsutani (Keio Univ, JAPAN), Michihiro Koibuchi (NII, JAPAN), Hideharu Amano (Keio Univ, JAPAN)



Outline
Network-on-Chip (NoC)
  – Typical 2D topologies
  – 2D vs. 3D
XNoTs
  – New class of 3D topologies
  – Definition, Examples
  – Deadlock-free routing
Evaluations
  – Throughput
  – Area, Energy consumption

Network-on-Chip (NoC)
Tile architectures
  – MIT RAW [Taylor, Micro’02]
  – Texas U. TRIPS [Burger, Computer’04]
  – Intel 80-tile NoC [Vangal, ISSCC’07]
Various topologies
  – Mesh, Torus, Tree
  – Large impact on energy, cost, and performance
Figure: An example of tile architecture (ASPLA 90nm CMOS process). Tile = Processing core + On-chip router. Packet switched network.

2D Topologies: Mesh & Torus
2-D Mesh
  – RAW [Taylor, IEEE Micro’02]
2-D Torus
  – 2x bandwidth of mesh
Figure legend: Router, Core
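The "2x bandwidth of mesh" figure can be checked by counting the channels that cross the bisection. The following is a minimal Python sketch, assuming a k x k grid in which neighboring routers share one bidirectional link (two unidirectional channels). The function names are illustrative and do not come from the slides.

```python
from itertools import product

def grid_links(k, wraparound):
    """Return the set of unidirectional links of a k x k mesh (or torus)."""
    links = set()
    for x, y in product(range(k), repeat=2):
        for dx, dy in ((1, 0), (0, 1)):
            nx, ny = x + dx, y + dy
            if wraparound:
                nx, ny = nx % k, ny % k      # torus: edges wrap around
            elif nx >= k or ny >= k:
                continue                     # mesh: no link past the edge
            links.add(((x, y), (nx, ny)))
            links.add(((nx, ny), (x, y)))
    return links

def bisection_channels(k, wraparound):
    """Count unidirectional links crossing the cut between x < k/2 and x >= k/2."""
    half = k // 2
    return sum((a[0] < half) != (b[0] < half)
               for a, b in grid_links(k, wraparound))

print(bisection_channels(4, wraparound=False))  # 4x4 mesh
print(bisection_channels(4, wraparound=True))   # 4x4 torus
```

For a 4x4 grid the torus count is twice the mesh count, because each row contributes one extra wraparound link across the cut.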

2D Topologies: Fat Tree
Fat Tree (p, q, c)
  – p: # of upward links
  – q: # of downward links
  – c: # of core ports
Figures: Fat Tree (2,4,2) and Fat Tree (2,4,1) (legend: Router, Core)
Network topology should be carefully selected so as to meet the requirements of the application

2D NoC vs. 3D NoC
2D NoCs
  – Long wires, distance
  – Wire delay
  – Packets consume power at links according to their wire length
3D NoCs
  – Several small wafers or dies are stacked
Vertical link
  – Micro bump [Ezaki, ISSCC’04]
  – Through-wafer via [Burns, ISSCC’01]
  – Very short (10-50um)
Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs

3D NoCs that have heterogeneous tiers
Different circuits on each tier
Different topologies on each tier
Figure labels: Processor array, Cache memory, Custom logic; Tier-1, Tier-2, Tier-3; Fat Tree(2,4,1), Ring, 2D-Mesh
(*) A tier refers to a wafer or a die in 3D ICs
How to connect different planar topologies?
How to route packets in heterogeneous 3D NoCs?
We propose a class of topology for heterogeneous 3D NoCs

XNoTs: Multiple network layers are tightly connected by vertical crossbar switches

Existing vertical link designs
Vertical bus [Li, ISCA’06]
  – Merit: Small # of vertical links
  – Demerit: Low peak performance
Vertical crossbar [Kim, ISCA’07]
  – Merit: Similar performance to a true crossbar
  – Merit: Reasonable # of vertical links
Figure labels: Single bus (only a single transfer at the same time); Segmented buses (multiple transfers at the same time)
We assume crossbar-based vertical links for 3D NoCs

XNoTs: Xbar-connected Network-on-Tiers
XNoTs
  – Multiple planar topologies
  – Connected by crossbars
Network-on-Tier (NoT)
  – A planar topology
  – Implemented on a tier
  – Bottom NoT provides connectivity to all cores
Figures: Network-on-Tier; XNoTs; A mesh-based NoT (legend: Router, Core)
Each core and router has a port for a vertical connection

XNoTs: Xbar-connected Network-on-Tiers
Figures: A mesh-based NoT; A mesh-based XNoTs with a vertical crossbar pillar (legend: Router, Core)
All routers and cores in the same pillar are connected by a crossbar
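The pillar structure can be sketched as a small data structure. This is an illustrative Python sketch only, assuming a mesh-based XNoTs in which every tier holds one router and one core per (x, y) position and each of them has exactly one port on that position's vertical crossbar. The names `build_pillars`, `"router"`, and `"core"` are hypothetical.

```python
from itertools import product

def build_pillars(k, tiers):
    """Map each (x, y) position to the ports of its vertical crossbar.

    Assumption: one router and one core per position on every tier,
    each with a single port on the pillar's crossbar.
    """
    pillars = {}
    for x, y in product(range(k), repeat=2):
        ports = []
        for t in range(tiers):
            ports.append(("router", x, y, t))
            ports.append(("core", x, y, t))
        pillars[(x, y)] = ports
    return pillars

# (4x4 mesh) x 4 tiers, as in the 64-core configuration evaluated later
pillars = build_pillars(k=4, tiers=4)
num_cores = sum(kind == "core"
                for ports in pillars.values()
                for kind, *_ in ports)
print(len(pillars), num_cores, len(pillars[(0, 0)]))
```

Under these assumptions the 64-core configuration has 16 pillars, and each crossbar connects four routers and four cores.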

Examples: all tiers have the same topology
Figures: Mesh-based XNoTs, Ring-based XNoTs, Tree-based XNoTs

Examples: all tiers have the same topology (side view)
Figures: Mesh-based XNoTs, Ring-based XNoTs, Tree-based XNoTs
All routers and cores in the same pillar are connected by a crossbar

Examples: Heterogeneous XNoTs (1)
Different topologies are used in each tier
Figure labels: Fat Tree(2,4,1), Ring, 2D-Mesh

Examples: Heterogeneous XNoTs (1) (side view)
Different topologies are used in each tier
Figure labels: Fat Tree(2,4,1), Ring, 2D-Mesh

Examples: Heterogeneous XNoTs (2)
Not every tier has to provide connectivity to all cores
  – Except for the bottom tier (i.e., “escape” tier)
Figure labels: Bottom tier (Full connectivity to all cores); Top tier (Some links are disconnected; no connectivity between some cores)
Packets are transferred via the bottom tier (tier-1)
(*) Only the bottom tier must provide full connectivity to all cores


XNoTs: Deadlock-free routing
Intra-tier comm. (X and Y directions)
  – Existing deadlock-free routing is used within a tier, e.g., dimension-order routing (DOR)
  – Only tier-0 must guarantee connectivity to all cores
Inter-tier comm. (Z direction)
  – Turns from a lower tier to a higher tier are prohibited
  – Unless the next hop is the final destination
Figures: Top view and side view of mesh-based XNoTs, with allowed (OK) and prohibited (NG) turns
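The inter-tier rule can be expressed as a small path check. This Python sketch is one reading of the slide, not the authors' implementation. It assumes a path is a list of (x, y, tier) nodes from the source core to the destination core, that a larger tier index means a higher tier, and that the first hop (injection from the source core into a router) is not a "turn" and is therefore exempt. The function name `is_legal` is hypothetical.

```python
def is_legal(path):
    """Check the inter-tier turn rule on a path of (x, y, tier) nodes.

    An upward move inside a pillar is allowed only when it is the
    injection hop (assumption) or when it lands on the final destination.
    """
    last = len(path) - 1
    for i in range(1, len(path)):
        (px, py, pt), (x, y, t) = path[i - 1], path[i]
        going_up = (px, py) == (x, y) and t > pt
        if going_up and i != 1 and i != last:
            return False
    return True

# OK: inject into tier 2, travel along X, descend to the destination on tier 1
ok_path = [(0, 0, 0), (0, 0, 2), (1, 0, 2), (2, 0, 2), (2, 0, 1)]
# NG: travel on tier 0, then turn up to tier 2 in the middle of the path
ng_path = [(0, 0, 1), (0, 0, 0), (1, 0, 0), (1, 0, 2), (2, 0, 2), (2, 0, 0)]
# OK: the upward move is the last hop, so it reaches the final destination
up_to_dst = [(0, 0, 1), (0, 0, 0), (1, 0, 0), (1, 0, 3)]

print(is_legal(ok_path), is_legal(ng_path), is_legal(up_to_dst))
```

Because packets in transit can only stay on a tier or move down, channel dependencies between tiers cannot form a cycle, which is the intuition behind the rule.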

XNoTs: Path selection (random)
XNoTs routing
  – Multiple tiers are available → Alternative paths are available
Path selection policy
  – How to select a single path?
  – Random selection → Good load balancing
Figures: Top view and side view of mesh-based XNoTs (5-hop example)
We also proposed some policy-based path selection schemes. For more detail, please refer to the paper.
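Random path selection can be sketched as picking one tier uniformly at random and routing within it. The Python sketch below assumes a mesh-based XNoTs with dimension-order (X then Y) routing inside the chosen tier, and cores addressed as (x, y, tier). The helper names `xy_route` and `random_tier_path` are hypothetical, and the same-pillar shortcut through the crossbar is an assumption.

```python
import random

def xy_route(src_xy, dst_xy):
    """Dimension-order (X then Y) router positions, endpoints included."""
    (x, y), (dx, dy) = src_xy, dst_xy
    hops = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

def random_tier_path(src, dst, tiers, rng=random):
    """Return a list of (x, y, tier) nodes from core src to core dst."""
    if src[:2] == dst[:2]:
        # Same pillar: assume the crossbar connects the two cores directly.
        return [src, dst]
    tier = rng.randrange(tiers)                      # random tier selection
    routers = [(x, y, tier) for x, y in xy_route(src[:2], dst[:2])]
    return [src] + routers + [dst]

path = random_tier_path((0, 0, 0), (3, 2, 1), tiers=4, rng=random.Random(0))
print(path)
```

All routers on the returned path share one tier, so the only vertical moves are the injection at the source pillar and the final hop to the destination core.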


Evaluation: Target topologies (64-core)
X-Mesh
  – (4x4 Mesh) x 4 layers
X-Torus
  – (4x4 Torus) x 4 layers
X-FT141
  – Fat Tree(1,4,1) x 4 layers
X-FT241
  – Fat Tree(2,4,1) x 4 layers
X-FT441
  – Fat Tree(4,4,1) x 4 layers
Fat Tree (p, q, c): p = # of upward links, q = # of downward links, c = # of core ports
Figure: X-Mesh
These five topologies are compared with 3D Mesh/Torus

Throughput: Simulation environment
Grid-based topologies
  – 3D-Mesh, X-Mesh
  – 3D-Torus, X-Torus
  – Dimension-order routing
Tree-based topologies
  – X-FT141, X-FT241
  – X-FT441
  – Up*/down* routing
Path selection policy
  – Random
Simulation parameters
  Packet size   16-flit (1-flit header)
  Buffer size   1-flit per channel
  Switching     Wormhole switching
  Latency       3-cycle per 1-hop
  Traffic       Uniform random
(Two virtual channels for tori)
Figure: X-Mesh (4x4x4)

Throughput: Simulation results
Figure panels: Grid-based XNoTs (X-Torus, X-Mesh, 3D-Torus, 3D-Mesh); Tree-based XNoTs (X-FT441, X-FT241, X-FT141, 3D-Torus, 3D-Mesh)
No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)

Network logic area
Network area
  – Routers & NIs
  – Inter-tier vias
Synthesis of NoC
  – 64-core (16-core x 4)
  – 0.18um CMOS
Router architecture
  – 1-flit = 32-bit
  – Wormhole switching
  – 4-stage pipeline
Inter-tier vias [Li, ISCA’06] [Burns, ISSCC’01]
  – 1-10um square
  – 25um² per layer per 1-bit signal
Inter-tier via area is calculated according to # of vertical links
Figure: Typical wormhole router (Input Ports, Buf, Arbiter, Crossbar) [Matsutani, IPDPS’07]
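The via-area calculation described above is simple arithmetic. The Python sketch below assumes that each vertical link is as wide as one 32-bit flit and that every signal bit costs 25 um² on a layer. The link count in the example is hypothetical and is not a figure from the slides.

```python
VIA_AREA_UM2_PER_BIT = 25.0   # from the slide: 25 um^2 per layer per 1-bit signal

def via_area_mm2(vertical_links, bits_per_link=32):
    """Inter-tier via area on one layer, in mm^2.

    Assumption: each vertical link carries bits_per_link signal bits,
    and every bit needs one via of VIA_AREA_UM2_PER_BIT on that layer.
    """
    area_um2 = vertical_links * bits_per_link * VIA_AREA_UM2_PER_BIT
    return area_um2 * 1e-6    # 1 mm^2 = 1e6 um^2

# Hypothetical example: 32 unidirectional vertical links on a layer
print(via_area_mm2(32))
```

The area scales linearly with the number of vertical links, which is why the count of vertical ports per router matters in the comparison with 3D mesh and torus.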

Network logic area: Results
Figure: Network logic area [mm²]
3D Mesh/Torus require 2 ports for vertical links (i.e., up & down)
XNoTs require only 1 port for vertical links (but the # of xbars increases)

Energy: NoC’s energy model
Ave. flit energy
  – Send 1-flit to dest.
  – How much energy [J]?
Parameters
  – 6mm square chip
  – 64-core (16-core x 4)
  – 0.18um CMOS
Switching energy
  – 1-bit Router
  – Gate-level sim
  – 1.13 [pJ / hop]
Link energy
  – 1-bit Link
  – 0.67 [pJ / mm]
Via energy [Davis, DToC’05]
  – 4.34 [fF / via]
Figure: 6mm square chip
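The parameters above can be combined into a per-flit estimate. This Python sketch is an assumption-laden illustration, not the authors' model: it assumes a 32-bit flit, that router and link energies add linearly per bit, and that the energy of one via transition is 0.5 * C * V². The supply voltage of 1.8 V is a hypothetical value chosen for a 0.18um process and does not appear in the slides.

```python
ROUTER_PJ_PER_BIT_HOP = 1.13   # switching energy, from the slide
LINK_PJ_PER_BIT_MM = 0.67      # link energy, from the slide
VIA_CAP_FF = 4.34              # via capacitance, from the slide
FLIT_BITS = 32                 # 1-flit = 32-bit

def via_energy_pj(vdd_volts):
    """Energy of one via transition in pJ, assuming E = 0.5 * C * V^2."""
    return 0.5 * VIA_CAP_FF * 1e-15 * vdd_volts ** 2 * 1e12

def flit_energy_pj(hops, link_mm, vias=0, vdd_volts=1.8):
    """Estimated energy to move one flit, in pJ.

    hops: routers traversed; link_mm: total horizontal wire length;
    vias: vertical vias traversed; vdd_volts: assumed supply voltage.
    """
    per_bit = (hops * ROUTER_PJ_PER_BIT_HOP
               + link_mm * LINK_PJ_PER_BIT_MM
               + vias * via_energy_pj(vdd_volts))
    return FLIT_BITS * per_bit

# Hypothetical route: 4 routers, 4.5 mm of wire, 2 vertical crossings
print(round(flit_energy_pj(hops=4, link_mm=4.5, vias=2), 2))
```

With these numbers the via term is orders of magnitude smaller than the router and link terms, so the estimate is dominated by hop count and wire length.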

Energy: Simulation results
Figure: Ave. flit energy [pJ]
Hop count is short in XNoTs → low power

Summary: 3D topologies - XNoTs
Requirements
  – Different circuits on each layer
  – Different topologies on each layer
  – How to connect/route them?
XNoTs
  – Tiers are connected by crossbars
  – Arbitrary tiers can be stacked
Current problem / future work
  – We assumed a full crossbar as a baseline
  – A more efficient implementation has been proposed by [Kim, ISCA’07]
  – We must revise the router architecture
Figure labels: Fat Tree, Ring, 2D-Mesh

Thank you for your attention

XNoTs: Path selection (QoS)
Control packets
  – In-order delivery is required
  – Deterministic routing: Dimension-order (deterministic)
Data packets
  – In-order delivery is not required
  – Large data streams
  – Adaptive routing: Duato’s Protocol (adaptive)
Figure: XNoTs (Side view); Control packets use tier-1

XNoTs: Path selection (QoS)
Figure: XNoTs (Side view); Data packets use tier-2 or tier-3
Various QoS controls are possible through the path selection algorithm
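The QoS policy amounts to a mapping from packet class to tier and routing scheme. The Python sketch below follows the three-tier example on the slides (control packets on tier-1, data packets on tier-2 or tier-3). The function name, the string labels, and the random choice between the two data tiers are assumptions made for illustration.

```python
import random

def select_tier(packet_kind, rng=random):
    """Return (tier, routing) for a packet class.

    Control packets need in-order delivery, so they stay on one tier
    with deterministic routing. Data packets may arrive out of order,
    so they can use the remaining tiers with adaptive routing.
    """
    if packet_kind == "control":
        return 1, "dimension-order"
    if packet_kind == "data":
        # Assumption: pick either data tier with equal probability.
        return rng.choice((2, 3)), "duato-adaptive"
    raise ValueError(f"unknown packet kind: {packet_kind!r}")

print(select_tier("control"))
print(select_tier("data", rng=random.Random(1)))
```

Keeping control traffic on a single deterministic tier preserves ordering, while bulk data is spread over the other tiers.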

XNoTs: Path selection (bottom first)
Heat dissipation is crucial in 3D ICs
Bottom tier
  – Close to the board (good heat dissipation property)
Bottom tier first
  – Tier-0 is used first if there are alternative paths
Figure: XNoTs (Side view); 3D IC mounted on a board that acts as a heat sink; Bottom tier
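The bottom-first policy reduces to choosing the lowest usable tier. The Python sketch below assumes that index 0 is the tier closest to the board and that the caller already knows which tiers connect the source and destination pillars. The function name and the input format are hypothetical.

```python
def bottom_first(tier_reaches_dst):
    """Return the lowest tier index that can carry the packet.

    tier_reaches_dst[t] is True when tier t connects the source and
    destination pillars. Assumption: index 0 is closest to the board.
    """
    for tier, usable in enumerate(tier_reaches_dst):
        if usable:
            return tier
    raise ValueError("no tier connects source and destination")

# Tier 0 always provides full connectivity, so it wins when paths compete
print(bottom_first([True, False, True]))
```

Since the bottom tier must connect all cores, this policy always returns tier 0 in a valid XNoTs; the loop only matters if the policy is combined with other constraints that exclude a tier.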

Ideal throughput: Channel bisection
Number of unidirectional links that cross the bisection
  N-core × n-tier   1-tier   2-tier   4-tier
  X-Mesh            8        16       32
  X-Torus
  X-FT141
  X-FT241
  X-FT441
  3D-Mesh
  3D-Torus
No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)

3D Topologies: 3D-Mesh
2D-Mesh (8x8=64)
  – Average hop count: 5.33
  – Channel bisection: 16
  – Number of routers: 64
  – Node degree: 5
3D-Mesh (4x4x4=64)
  – Average hop count: 4.00
  – Channel bisection: 32
  – Number of routers: 64
  – Node degree: 7
Figure labels: Tier-0, Tier-1, Tier-2, Tier-3
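The channel bisection and node degree figures follow from the mesh dimensions. The Python sketch below assumes bidirectional links counted as two unidirectional channels, a cut across the longest dimension, and a node degree that includes one port for the local core. The function name `mesh_stats` is hypothetical.

```python
from math import prod

def mesh_stats(dims):
    """Router count, channel bisection, and node degree of an n-D mesh.

    Cutting the longest dimension in half severs one bidirectional link
    per line of routers along that dimension, i.e. 2 * N / k channels.
    """
    n = prod(dims)
    return {
        "routers": n,
        "channel_bisection": 2 * n // max(dims),
        "node_degree": 2 * len(dims) + 1,   # two per dimension plus the core port
    }

print(mesh_stats((8, 8)))      # 2D-Mesh (8x8=64)
print(mesh_stats((4, 4, 4)))   # 3D-Mesh (4x4x4=64)
```

For the same 64 routers, folding the mesh into three dimensions doubles the channel bisection at the cost of two more ports per router.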