SARC Proprietary and Confidential - 2006-05
Processor-to-Memory-Blocks NoC with Pre-Configured (but run-time reconfigurable) Low-Latency Routes

Presentation transcript:

Processor-to-Memory-Blocks NoC with Pre-Configured (but run-time reconfigurable) Low-Latency Routes
G. Mihelogiannakis, M. Katevenis, D. Pnevmatikatos
FORTH-ICS, Crete, Greece
SARC - Preliminary Draft of May 2006

Traditional Multiprocessor View
Local (cache) memory(ies) seen as monolithic blocks, each

Proposed View for Chip Multiprocessors
Simple processors
Lots of memory
- to compensate for limited chip I/O throughput
Large memories need to be built out of multiple smaller blocks
- in order to bound word line & bit line capacitance within each block

Opportunities for (Re-)Configurability
Uniform allocation of memory blocks to processors
Non-uniform allocation of memory blocks to processors
Challenge: make reconfigurable allocation almost as fast as fixed allocation
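
The two allocation styles named on this slide can be sketched as a small mapping from processors to memory-block ids. This is only an illustration: the function names, the processor labels, and the block counts are hypothetical and do not come from the slides.

```python
def uniform_allocation(num_blocks, processors):
    """Give every processor the same number of contiguous memory blocks."""
    per_proc = num_blocks // len(processors)
    return {
        proc: list(range(i * per_proc, (i + 1) * per_proc))
        for i, proc in enumerate(processors)
    }


def nonuniform_allocation(num_blocks, demand):
    """Give each processor as many blocks as it asks for (hypothetical policy).

    demand maps a processor name to the number of blocks it wants.
    """
    if sum(demand.values()) > num_blocks:
        raise ValueError("demand exceeds the number of memory blocks")
    allocation, next_block = {}, 0
    for proc, count in demand.items():
        allocation[proc] = list(range(next_block, next_block + count))
        next_block += count
    return allocation


# Illustrative numbers only: 8 blocks shared by two processors.
even = uniform_allocation(8, ["P0", "P1"])
skewed = nonuniform_allocation(8, {"P0": 6, "P1": 2})
```

In the non-uniform case P0 reaches more blocks than P1, so some of its routes are longer; that is where the challenge of keeping reconfigurable allocation nearly as fast as a fixed one comes in.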

Long on-chip Wires already contain Active Elements
Periodic buffers, due to quadratic nature of RC wire delay
Approximate worst-case numbers for a 130-nm technology
- as currently available to European Universities
- as synthesized, placed-&-routed, optimized
- Synopsys DC V SP2, SOC-Encounter 3.3, Cadence NC Verilog
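
The reason long wires carry periodic buffers can be illustrated with a first-order delay model. This is a minimal sketch, assuming a distributed RC wire whose 50% delay is roughly 0.38·r·c·L² and a fixed delay t_buf per buffer. The function names and all numeric values are illustrative and are not taken from the slides, and the model ignores buffer drive resistance and input capacitance.

```python
import math


def unbuffered_delay(r, c, length):
    """Delay of a distributed RC wire; grows with the square of its length.

    r and c are resistance and capacitance per unit length.
    """
    return 0.38 * r * c * length ** 2


def buffered_delay(r, c, length, n_segments, t_buf):
    """Delay when the wire is cut into equal segments, each driven by a buffer."""
    segment = length / n_segments
    return n_segments * (unbuffered_delay(r, c, segment) + t_buf)


def best_segment_count(r, c, length, t_buf):
    """Segment count minimizing buffered_delay under this simplified model."""
    n = length * math.sqrt(0.38 * r * c / t_buf)
    return max(1, round(n))


# Illustrative units: r = c = 1, t_buf = 0.38, wire length 10.
n = best_segment_count(1.0, 1.0, 10.0, 0.38)
plain = unbuffered_delay(1.0, 1.0, 10.0)
with_buffers = buffered_delay(1.0, 1.0, 10.0, n, 0.38)
```

With the segment count scaled in proportion to the length, the buffered delay grows linearly with length instead of quadratically, which is why buffers are already present along any long route.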

Turn these into Low-Latency Configurability Elements
2-to-1 multiplexor made of (semi-custom) and-or-buffer gates
- can we do better with (custom) transmission gates?

Pre-Configuration is critical for Low Latency
Control logic plus fan-out to 32 mux bits adds considerable delay

Configure “Preferred” Paths before Data Arrival
Preconfigure (speculatively set) control for “preferred” path
Alternate paths still work, at increased latency
Configuration can change at run-time, quite fast

Prior Art: Low Latency NoC Routers
Optimize routing decision, crossbar arbitration, VC allocation for one-clock-cycle operation
- Mullins, West, Moore: “Low-Latency Virtual-Channel Routers for On-Chip Networks”, ISCA 2004
- Kim, Park, Theocharides, Vijaykrishnan, Das: “A Low Latency Router Supporting Adaptivity for On-Chip Interconnects”, DAC 2005

Contribution: Decouple Data Rate from Configuration
Configure “preferred” paths at whatever rate is convenient
When header/address/data arrive, forward along preferred path and, in parallel, check header
- if destination was not along preferred path, recover at longer latency
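
The forward-then-check behaviour described on this slide can be sketched as a toy latency model. Everything here is an assumption made for illustration: the class and function names, the port labels, and the two latency constants are hypothetical, and the slides give no cycle counts for either case.

```python
from dataclasses import dataclass

# Assumed latencies in arbitrary units (not from the slides).
FAST_HOP = 1  # data passes a mux whose select was set before it arrived
SLOW_HOP = 3  # header check fails and the data is recovered on another port


@dataclass
class Node:
    """One configurable point on a route, with a pre-set preferred output."""

    preferred: str

    def configure(self, port):
        """Change the preferred output; can happen at any time, at any rate."""
        self.preferred = port

    def forward(self, wanted):
        """Return the latency for data that needs to leave on port `wanted`.

        Data is sent on the preferred port immediately while the header is
        checked in parallel; a mismatch costs the longer recovery latency.
        """
        return FAST_HOP if wanted == self.preferred else SLOW_HOP


def route_latency(nodes, wanted_ports):
    """Total latency of a route, one wanted output port per node."""
    return sum(node.forward(port) for node, port in zip(nodes, wanted_ports))


path = [Node("east"), Node("east"), Node("east")]
on_preferred = route_latency(path, ["east", "east", "east"])
off_preferred = route_latency(path, ["east", "north", "east"])

# Reconfiguring the middle node makes the second route the preferred one.
path[1].configure("north")
after_reconfig = route_latency(path, ["east", "north", "east"])
```

The point of the sketch is that configuration and data movement are separate events: the preferred port is set ahead of time, so on the common path no routing decision sits between data arrival and data departure.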

Conclusion
Coarse-grain reconfigurability
- at the level of memory block, compute processor, compute engine, or (simple) control processor (FSM)
Configure “preferred routes” in the chip, along which information flows at very low latency
Other routes still available, but at longer latency
Preferred routes easily reconfigurable, at run-time