
Three-Dimensional Layout of On-Chip Tree-Based Networks
Hiroki Matsutani (Keio Univ, Japan), Michihiro Koibuchi (NII, Japan), D. Frank Hsu (Fordham Univ, USA), Hideharu Amano (Keio Univ, Japan)

Outline
Introduction
– Network-on-Chip (NoC)
– 2-D vs. 3-D
Fat Tree
– 2-D layout
– 3-D layout
Fat H-Tree [Matsutani, IPDPS’07]
– 2-D layout
– 3-D layout
Evaluations
– Area, wire length, energy

Network-on-Chip (NoC)
A packet-switched network on a chip. Example: a 16-core tile architecture, where each tile holds a core and a router.
Tile architectures
– MIT RAW [Taylor, Micro’02]
– Texas U. TRIPS [Burger, Computer’04]
– Intel 80-tile NoC [Vangal, ISSCC’07]
Various topologies
– Mesh, Torus
– Fat Trees
– Fat H-Tree (FHT) [Matsutani, IPDPS’07]
We proposed FHT as an alternative to Fat Trees.

2D Topologies: Mesh & Torus
– 2-D Mesh: used in RAW [Taylor, IEEE Micro’02]
– 2-D Torus: 2x the bandwidth of the mesh
[Figure: 2-D mesh and 2-D torus, showing routers and cores]

2D Topologies: Fat Tree
Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
[Figure: Fat Tree (2,4,1) and Fat Tree (2,4,2), showing routers and cores, with Rank-1 and Rank-2 labeled]
In this talk, we focus on the 3-D layout scheme of tree-based topologies.

2D NoC vs. 3D NoC
2D NoCs
– Long wires (esp. in trees)
– Wire delay
– Packets consume power at links according to their wire length
3D NoCs
– Several small wafers or dies are stacked
Vertical link
– Micro bump [Ezaki, ISSCC’04]
– Through-wafer via [Burns, ISSCC’01]
– Very short (10-50um)
Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs.
The next slides show the 3D layout schemes of Fat Tree and FHT.

Fat Tree: 2-D layout
Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
[Figure: 2-D layouts of Fat Tree (2,4,1) and Fat Tree (2,4,2), showing routers and cores]
As a preliminary, we first show the 3D layout scheme of Fat Trees.

Fat Tree: 3-D layout (4-split)
– The original 2-D layout is divided into 4 layers (Layer-0 to Layer-3), transforming 2-D coordinates into 3-D coordinates
– Top-rank routers are distributed to each layer

Fat Tree: 3-D layout (4-split)
– The original 2-D layout is transformed from 2-D coordinates to 3-D coordinates, giving a 4-stacked 3-D layout
– Top-rank links are replaced with vertical interconnects (10-50um)
This 3-D layout is evaluated in terms of area, wire length, and energy.
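
The 4-split transformation from 2-D to 3-D coordinates can be illustrated with a short sketch. It assumes the simplest reading of the slide: each quadrant of the 2-D layout becomes one layer. The function name, the quadrant-to-layer numbering, and the grid size are hypothetical and are not taken from the slides.

```python
def split_into_quadrants(x, y, width, height):
    """Map a 2-D position to (x', y', layer) by cutting the die into 4 quadrants.

    Assumption: each quadrant becomes one layer; the layer numbering
    (0 = low-x/low-y, 1 = high-x/low-y, 2 = low-x/high-y, 3 = high-x/high-y)
    is illustrative only.
    """
    half_w, half_h = width // 2, height // 2
    layer = (1 if x >= half_w else 0) + (2 if y >= half_h else 0)
    return x % half_w, y % half_h, layer


# An 8x8 (64-core) layout becomes four stacked 4x4 (16-core) layers.
cells = [split_into_quadrants(x, y, 8, 8) for x in range(8) for y in range(8)]
per_layer = [sum(1 for c in cells if c[2] == layer) for layer in range(4)]
print(per_layer)
```

Under this mapping, positions that sat in different quadrants of the 2-D die end up at the same (x', y') in different layers, which is what lets a long top-rank link become a short vertical one.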

Fat H-Tree: Structure [Matsutani, IPDPS’07]
Fat H-Tree combines two H-Trees (red & black)
– Red Tree (H-Tree)
– Black Tree (H-Tree)
The black tree is shifted toward the lower right of the red tree. Because of this shift, the connection pattern of the trees differs from that of the original Fat Trees.
[Figure: red tree and black tree, each showing routers and cores]

Fat H-Tree: Structure [Matsutani, IPDPS’07]
Fat H-Tree combines two H-Trees (red & black)
– Red Tree (H-Tree)
– Black Tree (H-Tree)
Fat H-Tree is formed on the red & black trees.

Fat H-Tree: Structure [Matsutani, IPDPS’07]
Fat H-Tree combines two H-Trees (red & black)
– Each core is connected to both the red & black trees
– A ring is formed with cores & rank-1 routers
– Torus-level performance by combining only two H-Trees
(Rank-2 and upper routers are omitted in the figure.)

Fat H-Tree: 2-D layout on VLSI [Matsutani, IPDPS’07]
– Fat H-Tree has a torus structure, which implies long feedback links across the chip
– Its 2-D layout is therefore folded in the same way as the folded layout of a 2-D torus; the folded layout is topologically equivalent
[Figure: Fat H-Tree’s 2-D layout, showing routers and cores]
The next slides propose the 3D layout scheme of Fat H-Tree.

Fat H-Tree: 3-D layout (overview)
– (Problem) Fat H-Tree has a torus structure
– Fold the layout so as to keep the torus structure consisting of red & black trees
  (step 1) fold it horizontally
  (step 2) fold it vertically
– Repeat until the # of folded pieces meets the # of layers the 3-D IC has (e.g., four layers -> fold twice)
Here we show the 3D layouts of the red & black trees separately.
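
The two folding steps can be sketched as a coordinate mapping. This is a minimal illustration under stated assumptions, not the authors' exact transformation: the function name, the grid size, and the layer numbering (paper-fold order, where the second fold reverses the stack) are hypothetical.

```python
def fold_twice(x, y, width, height):
    """Map a 2-D position to (x', y', layer) by folding the die twice.

    Assumption: folding behaves like folding a sheet of paper, so the half
    that is folded over is mirrored and lands on a different layer.
    """
    half_w, half_h = width // 2, height // 2
    # Step 1: fold horizontally; the right half is mirrored onto the left half.
    if x < half_w:
        fx, layer = x, 0
    else:
        fx, layer = width - 1 - x, 1
    # Step 2: fold vertically; the folded-over half reverses the layer order.
    if y < half_h:
        fy = y
    else:
        fy, layer = height - 1 - y, 3 - layer
    return fx, fy, layer


# Opposite edges of an 8x8 die, joined by torus wraparound links in 2-D,
# land on the same (x', y') in different layers after folding.
print(fold_twice(0, 0, 8, 8), fold_twice(7, 0, 8, 8))
```

With this mapping, the periphery positions at x = 0 and x = width - 1 share an (x', y') location, so a wraparound connection between them needs only a vertical link.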

Fat H-Tree: 3-D (Red tree; 4-split)
[Figure: the original 2-D layout of the red tree is transformed from 2-D coordinates to 3-D coordinates, giving a 4-stacked 3-D layout (Layer-0 to Layer-3)]

Fat H-Tree: 3-D (Red tree; 4-split)
– Top-rank links are replaced with vertical interconnects (10-50um)
[Figure: original 2-D layout of the red tree and its 4-stacked 3-D layout, with Layer-0 shown]

Fat H-Tree: 3-D (Black tree; 4-split)
[Figure: the original 2-D layout of the black tree is transformed from 2-D coordinates to 3-D coordinates, giving a 4-stacked 3-D layout (Layer-0 to Layer-3)]
They can be connected via only a vertical link.

Fat H-Tree: 3-D (Black tree; 4-split)
– The periphery cores are connected to different layers
[Figure: original 2-D layout of the black tree and its 4-stacked 3-D layout]

Fat H-Tree: 3-D (Black tree; 4-split)
– Top-rank links are replaced with vertical interconnects (10-50um)
– The periphery cores are connected to different layers
[Figure: original 2-D layout of the black tree and its 4-stacked 3-D layout, with Layer-0 shown]

Fat H-Tree: 3-D layout (4-split)
The 3-D layout of Fat H-Tree can be formed by superimposing the 3-D layouts of the red & black trees.
[Figure: Layer-0 of the red tree (3-D), the black tree (3-D), and the resulting Fat H-Tree (3-D)]

Evaluations: 2-D vs. 3-D
2-D layout
– 64-core
– Chip edge: L mm
3-D layout
– 16-core x 4-layer
– Vertical interconnects
– Chip edge: L/2 mm

Network logic area: # of routers
[Table: # of routers for N = 16, 64, and 256 cores, with rows FT1, FT2, FHT, mesh, and torus; the values are not preserved in this transcript]
– # of routers & their ports in trees is less than in mesh/torus
– 3-D mesh/torus: node degree 7
– Fat Tree (2,4,2): node degree 6
– Fat H-Tree: node degree 5
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree

Network logic area: 2-D vs. 3-D
Network logic area
– Routers, NIs
– Inter-wafer vias
Wormhole router
– 1-flit = 64-bit
– 3-stage pipeline
– Typical wormhole router: arbiter, 5x5 XBAR, FIFO
Network interface
– FIFO buffer
– Packet forwarding (Fat H-Tree only)
Inter-wafer via [Davis, DToC’05]
– 1-10um square
– 100um^2 per layer per 1-bit signal
– Inter-wafer via area is calculated according to # of vertical links
Synthesized with a 90nm CMOS [Matsutani, ASPDAC’08]
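
The via-area rule on this slide (100um^2 per layer per 1-bit signal, scaled by the number of vertical links) amounts to a simple product. The sketch below assumes each vertical link carries exactly one 64-bit flit and no control signals; the function name and the example link count are hypothetical.

```python
def via_area_um2(vertical_links, flit_bits=64, area_per_bit_um2=100.0):
    """Inter-wafer via area on one layer, in um^2.

    Assumption: every vertical link is flit_bits wide and each 1-bit signal
    occupies area_per_bit_um2 on the layer; control wires are ignored.
    """
    return vertical_links * flit_bits * area_per_bit_um2


# Hypothetical example: 16 vertical links crossing one layer.
area = via_area_um2(16)
print(f"{area:.0f} um^2 = {area / 1e6:.4f} mm^2")
```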

Network logic area: Overhead of 3D
Synthesis result of 64-core (16-core x 4)
[Figure: network logic area of FT1, FT2, and FHT compared with the 2D torus and 3D torus; the inter-wafer via area adds up to +7.8%]
3D layout of trees -> area overhead is modest (at most 7.8%)
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree

Total wire length of all links
Total unit-length of links
– Core-router links
– Router-router links
1-unit = distance between neighboring cores (a 1-unit link)
How many unit-links are required?
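
As a worked example of counting unit-links, the sketch below totals the wire length of a k x k 2-D mesh and a folded 2-D torus. It counts router-router links only and assumes one router per unit position, so the results need not match the values in the slide's table, which also counts core-router links. All function names are hypothetical.

```python
def mesh_wire_units(k):
    """Router-router wire length of a k x k 2-D mesh, in unit-links."""
    # k rows and k columns, each with k - 1 links of length 1.
    return 2 * k * (k - 1)


def folded_ring_positions(k):
    """Physical slot of each logical node in a folded ring of k nodes."""
    # Nodes go outward on even slots and come back on odd slots.
    return [2 * i if 2 * i < k else 2 * (k - 1 - i) + 1 for i in range(k)]


def folded_torus_wire_units(k):
    """Router-router wire length of a k x k folded 2-D torus, in unit-links."""
    pos = folded_ring_positions(k)
    ring = sum(abs(pos[i] - pos[(i + 1) % k]) for i in range(k))
    # k row rings plus k column rings.
    return 2 * k * ring


for k in (4, 8, 16):  # N = 16, 64, 256 cores
    print(k * k, mesh_wire_units(k), folded_torus_wire_units(k))
```

Folding avoids any single chip-wide wraparound wire, but the torus still needs about twice the total wire of the mesh because most of its links span two units.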

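Total wire length of all links
Total wire length in units (… marks digits not preserved in this transcript):
            N=16    N=64    N=256
2D FT1      …       …       …,024
2D FT2      …       …       …,048
2D FHT      72      392     1,800
2D mesh     …       …       …
2D torus    …       …       …
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree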

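Total wire length of all links
Total wire length in units, 2-D layout vs. 4-stacked 3-D layout (… marks digits not preserved in this transcript):
            N=16    N=64    N=256
2D FT1      …       …       …,024
2D FT2      …       …       …,048
2D FHT      72      392     1,800
2D mesh     …       …       …
2D torus    …       …       …
3D FT1      …       …       …
3D FT2      …       …       …,536
3D FHT      …       …       …
3D mesh     …       …       …
3D torus    …       …       …
Wire length of trees is reduced by 25%-50% (close to torus).
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree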

Energy: NoC’s energy model
Ave. flit energy
– Send 1-flit to dest.
– How much energy [J]?
Parameters
– 8mm square chip
– 64-core (16-core x 4)
– 90nm CMOS
Switching energy
– 1-bit router
– Gate-level sim
– 0.183 [pJ / hop]
Link energy
– 1-bit link
– 0.150 [pJ / mm]
Via energy [Davis, DToC’05]
– 4.34 [fF / via]
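
The per-bit parameters above combine into a flit energy roughly as sketched below. This is a simplified model under stated assumptions: a 64-bit flit, energy linear in hop count and wire length, and no via term, because the slide gives the via figure only in fF per via and converting it to energy would need a supply voltage that is not stated. The function name and the example hop counts and distances are hypothetical.

```python
FLIT_BITS = 64                   # 1-flit = 64-bit
E_SWITCH_PJ_PER_BIT_HOP = 0.183  # switching energy of a 1-bit router
E_LINK_PJ_PER_BIT_MM = 0.150     # link energy of a 1-bit link


def flit_energy_pj(hops, wire_mm, flit_bits=FLIT_BITS):
    """Energy in pJ to move one flit through `hops` routers over `wire_mm` of wire.

    Assumption: via energy is omitted (see the note above).
    """
    per_bit = hops * E_SWITCH_PJ_PER_BIT_HOP + wire_mm * E_LINK_PJ_PER_BIT_MM
    return flit_bits * per_bit


# Hypothetical route: same hop count, but half the wire length after stacking.
e_2d = flit_energy_pj(hops=4, wire_mm=8.0)
e_3d = flit_energy_pj(hops=4, wire_mm=4.0)
print(f"2-D: {e_2d:.1f} pJ, 3-D: {e_3d:.1f} pJ, saving: {100 * (1 - e_3d / e_2d):.1f}%")
```

In this model the switching term is unchanged by stacking, so any saving comes from the shorter distance packets travel over horizontal wires.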

Energy: Reduction by going 3D
[Figure: flit energy of FT1, FT2, and FHT in the 2-D layout]
– Frequent use of longest links
– Short hop count -> less energy
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree

Energy: Reduction by going 3D
[Figure: flit energy of FT1, FT2, and FHT in the 2-D layout vs. the 3-D layout]
– Moving distance of packets is reduced
– The 3D layout of trees reduces the energy by 30.8%-42.9%
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree

Summary: 3-D layout of trees
Drawbacks of on-chip tree-based topologies
– Long links around the root of tree
– Wire delay problem
– Repeater insertion -> additional energy consumption
3-D layout schemes of Fat Trees & Fat H-Tree
– Wire length is reduced by 25%-50%
– Area overhead is at most 7.8%
– Flit transmission energy is reduced by 30.8%-42.9%
– In addition, energy-hungry repeater buffers can be removed
Need to consider negative impacts of 3-D (cost, heat, yield…)

Thank you for your attention

Backup slides

Energy: Reduction by going 3D
[Figure: flit energy of FT1, FT2, and FHT in the 2-D layout without repeaters vs. the 2-D layout with repeaters (*)]
– Energy is increased when repeaters are inserted
(*) Repeater insertion model: N. Weste et al., “CMOS VLSI Design (3rd ed)”
FT1: Fat Tree (2,4,1); FT2: Fat Tree (2,4,2); FHT: Fat H-Tree