Approaching Ideal NoC Latency with Pre-Configured Routes George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis Institute of Computer Science.

Presentation transcript:

Approaching Ideal NoC Latency with Pre-Configured Routes
George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis
Institute of Computer Science (ICS), Foundation for Research & Technology - Hellas (FORTH)
P.O. Box 1385, Heraklion, Crete, Greece

Slide 2 (May 2007, ICS-FORTH, Greece): Introduction
Problem: the latency that NoCs impose. Motivation: this latency is added to every communicating pair. Past work achieves 1 cycle/hop at 500 MHz. We extend speculation to routing decisions. Goal: approach buffered-wire latency, that is, a fraction of a cycle per hop.

Slide 3: Our Approach
400 ps in the good scenario; 1 cycle otherwise. 130 nm library.

Slide 4: Preferred Paths
Each output has one preferred input. This preferred input/output pair is connected by a single pre-enabled tri-state driver. Pre-enabling is crucial: 200 ps for a pre-enabled mux versus 500 ps otherwise. The switch checks later whether flits were forwarded correctly. Preferred paths are thus formed. They are reconfigurable at run time, and custom routes (shapes) are allowed.
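The preferred-path configuration described on this slide can be pictured as a small per-switch table. The sketch below is only an illustration: the class name `PreferredPathConfig` and its methods are hypothetical, and it assumes that one input may be the preferred input of several outputs, which the slides do not state explicitly.

```python
class PreferredPathConfig:
    """Hypothetical model of one switch's preferred-path table.

    Each output port has at most one preferred input, whose tri-state
    driver is pre-enabled. Allowing one input to feed several outputs
    is an assumption of this sketch.
    """

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.preferred_input = {}  # output port -> preferred input port

    def set_preferred(self, output, input_port):
        """Configure (or reconfigure at run time) the preferred input of an output."""
        for port in (output, input_port):
            if not 0 <= port < self.num_ports:
                raise ValueError("port out of range")
        # Overwriting the entry keeps exactly one preferred input per output.
        self.preferred_input[output] = input_port

    def eager_outputs(self, input_port):
        """Outputs to which a flit arriving on input_port is eagerly forwarded."""
        return sorted(o for o, i in self.preferred_input.items() if i == input_port)


cfg = PreferredPathConfig(num_ports=6)  # the slides describe 6x6 switches
cfg.set_preferred(output=2, input_port=0)
cfg.set_preferred(output=3, input_port=0)
cfg.set_preferred(output=3, input_port=1)  # run-time reconfiguration
print(cfg.eager_outputs(0))
```

Keying the table by output port makes the "one preferred input per output" rule hold by construction.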

Slide 5: Switch Architecture - Output
[Figure labels: 400 ps; 1 cycle.]
Input FIFOs: selectable when non-empty, or when a flit is about to be enqueued. Preferred path: pre-enabled tri-states. Routing logic: tri-state. Configuration and arbitration logic: stores the preferred-path configuration and arbitrates.

Slide 6: Switch Architecture - Input
Dead flits are flits that were eagerly forwarded incorrectly. They are terminated at the end of the preferred path. The switch resembles a buffered crossbar. It decides whether a flit needs to be enqueued.

Slide 7: Routing Algorithm
Deterministic routing is employed. Non-preferred paths follow XY routing. XY routing is slightly modified to handle preferred paths: an eagerly forwarded flit counts as correctly forwarded if it approaches its destination in any axis; otherwise it is considered dead.
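The modified routing rule can be sketched as two small functions. This is an illustrative reading, not the authors' implementation: it assumes switches sit at integer (x, y) mesh coordinates and that "approaches the destination in any axis" means the distance along at least one axis shrinks. The function names are hypothetical.

```python
def xy_next_hop(here, dest):
    """Plain XY routing: correct the X coordinate first, then Y."""
    x, y = here
    dx, dy = dest
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return here  # already at the destination


def correctly_forwarded(prev, here, dest):
    """True if the eager hop prev -> here moved the flit closer to dest
    along at least one axis; otherwise the forwarded copy is dead."""
    closer_x = abs(dest[0] - here[0]) < abs(dest[0] - prev[0])
    closer_y = abs(dest[1] - here[1]) < abs(dest[1] - prev[1])
    return closer_x or closer_y


# A preferred path may move in Y before X, which plain XY routing would not do,
# yet the hop still counts as correct because it approaches the destination.
print(xy_next_hop((0, 0), (2, 2)))
print(correctly_forwarded((0, 0), (0, 1), (2, 2)))
print(correctly_forwarded((0, 0), (-1, 0), (2, 2)))
```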

Slide 8: Routing Characteristics
Flits in preferred paths may not follow XY routing. Duplicate copies of a flit may be delivered.
[Figure: XY routing and preferred paths between source S and destination D.]

Slide 9: NoC Topology - Bar Floorplan
Application: tiled CPU and RAM blocks. Each switch is 6x6 and serves 4 PEs.

Slide 10: Bar Floorplan
The switch would be 8x12 without compromises: vertical links drive address inputs, and 2 PE data ports are served by 1 switch port.

Slide 11: Cross Floorplan

Slide 12: Layout Results
130 nm implementation library, typical case. Preferred path latency: … ps … ps (incl. 1 mm). 1 cycle/node otherwise. Past work: 1 cycle/node at 500 MHz.
Clock frequency: 667 MHz
Flit width: 39
FIFO lines: 2
Number of FIFOs: 30
Bar area overhead: 13%
Cross area overhead: 18%
Number of cells: 15 K
Number of gates: 45 K
Total dynamic power: 80 mW
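To put these figures in perspective, the following back-of-the-envelope sketch compares preferred-path hops with ordinary one-cycle hops. It assumes the 400 ps "good scenario" figure from the Our Approach slide applies per hop, and it takes the 667 MHz clock from the results above; the function names are hypothetical and the numbers are illustrative, not measurements.

```python
def cycle_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def path_latency_ps(pref_hops, other_hops, pref_hop_ps=400.0, freq_mhz=667.0):
    """Latency of a route with pref_hops preferred-path hops and
    other_hops ordinary hops that each cost one full clock cycle.
    pref_hop_ps=400 is an assumed per-hop figure taken from the slides."""
    return pref_hops * pref_hop_ps + other_hops * cycle_ps(freq_mhz)


period = cycle_ps(667.0)                      # about 1499 ps
print(round(400.0 / period, 3))               # preferred hop as a fraction of a cycle
print(path_latency_ps(3, 0))                  # three preferred hops
print(path_latency_ps(0, 3, freq_mhz=500.0))  # three hops at 1 cycle/hop, 500 MHz
```

Under these assumptions a preferred hop costs roughly a quarter of a 667 MHz cycle, which matches the stated goal of a fraction of a cycle per hop.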

Slide 13: Advanced Issues
Deadlock and livelock freedom. Constraints to prevent cycles. Keep the NoC functional in any case. Out-of-order delivery of flits in the same packet. Apply reconfiguration at a "safe" time. Adaptive routing.

Slide 14: Future Work
Synchronization issues: a flit may arrive at any time. Impose preferred-path constraints. Implement the switch asynchronously. Evaluate in a complete system. Implement fault tolerance.

Slide 15: Conclusion
We approach ideal latency by using pre-enabled tri-state paths. Our NoC is a generalized "mad postman" [C. R. Jesshope et al., 1989]. Our NoC is easily generalized, although the topology may need to be changed. Past NoC research can be applied for further optimizations.

Slide 16: Related Work
Most work assumed 2D mesh-like topologies. Reconfigurable topologies have been studied. Various performance enhancement techniques have been studied; they achieve 1 clock cycle/node at approx. 500 MHz. Various routing algorithms have been studied. Recent field: fault-tolerant techniques.

Slide 17: Backpressure
When a FIFO is almost full, the previous hop's feeding output is alerted. If the FIFO is fed by a preferred path in the previous hop, flits are also enqueued. The preferred path may or may not be broken.

Slide 18: Mad Postman
XY routing. Incoming flits are eagerly forwarded along the same axis. The switch later examines whether they were correctly forwarded, and "dead" flits are terminated in later hops.
[Figure labels: MSG, Penalty, Source, Destination.]
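The mad-postman behaviour summarised on this slide can be sketched as a single per-hop step. This is a simplified model under stated assumptions, not the original scheme's implementation: nodes have integer (x, y) coordinates, `direction` is a unit step along one axis, and a flit travelling along Y is assumed to have already reached its destination column, as XY routing implies. The function name is hypothetical.

```python
def mad_postman_step(here, direction, dest):
    """One hop of eager forwarding along the current axis.

    Returns (eager_next, dead): the node that receives the eagerly
    forwarded copy, and whether that copy is dead because the flit
    should have turned or stopped at `here`.
    """
    eager_next = (here[0] + direction[0], here[1] + direction[1])
    if direction[0] != 0:
        # Travelling along X: remaining distance in the direction of travel.
        remaining = (dest[0] - here[0]) * direction[0]
    else:
        # Travelling along Y.
        remaining = (dest[1] - here[1]) * direction[1]
    dead = remaining <= 0  # nothing left to cover on this axis
    return eager_next, dead


# Flit heading east towards (3, 2): still needs to travel in X at (1, 0) ...
print(mad_postman_step((1, 0), (1, 0), (3, 2)))
# ... but at (3, 0) it must turn, so the copy sent on to (4, 0) is dead.
print(mad_postman_step((3, 0), (1, 0), (3, 2)))
```

The dead copy is the penalty of speculation: it occupies a link for one hop and is dropped when the next switch examines it.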

Slide 19: NoC Topology
Application: tiled CPU and RAM blocks. Each switch is 6x6 and serves 4 PEs. It would be 8x12 without compromises: vertical links also drive RAM address inputs, and two PE data ports are served by a single switch port. Data from one PE: