Abstract As transistor sizes shrink and we approach the ``end of Moore's law'', interconnects, both on-chip and off-chip, will represent the biggest bottleneck.

Slides:



Advertisements
Similar presentations
A Novel 3D Layer-Multiplexed On-Chip Network
Advertisements

Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis.
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
1 Wide-Sense Nonblocking Multicast in a Class of Regular Optical Networks From: C. Zhou and Y. Yang, IEEE Transactions on communications, vol. 50, No.
Mohammed Yousef Abd El ghany, Faculty of Eng., Comm. Dep., 3rd year. Digital Signal Processor The Heart of Modern Real-Time Control Systems.
FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department.
Networks on Chip : a very quick introduction! Jeremy Chan 11 May 2005.
Interconnection Networks 1 Interconnection Networks (Chapter 6) References: [1,Wilkenson and Allyn, Ch. 1] [2, Akl, Chapter 2] [3, Quinn, Chapter 2-3]
Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters Brett A. Becker and Alexey Lastovetsky Heterogeneous Computing Laboratory.
Spring 08, Jan 15 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
Towards a Realistic Scheduling Model Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes IEEE TPDS, Vol. 17, No. 3, pp , 2006.
The Maryland Optics Group Multi-Hop View: Interfaces not available between (s, d): Try to create multi-hop path. Link Selection: Local Optimization: Select.
Improving the Efficiency of Memory Partitioning by Address Clustering Alberto MaciiEnrico MaciiMassimo Poncino Proceedings of the Design,Automation and.
CAD and Design Tools for On- Chip Networks Luca Benini, Mark Hummel, Olav Lysne, Li-Shiuan Peh, Li Shang, Mithuna Thottethodi,
Mahapatra-Texas A&M-Fall'001 Partitioning - I Introduction to Partitioning.
Issues in System-Level Direct Networks Jason D. Bakos.
A Platform-based Design Flow for Kahn Process Networks Abhijit Davare Qi Zhu December 10, 2004.
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Computation Energy Randy Huang Sep 29, Outline n Why do we care about energy/power n Components of power consumption n Measurements of power consumption.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Torino (Italy) – June 25th, 2013 Ant Colony Optimization for Mapping, Scheduling and Placing in Reconfigurable Systems Christian Pilato Fabrizio Ferrandi,
Lecture 2: Parallel Reduction Algorithms & Their Analysis on Different Interconnection Topologies Shantanu Dutt Univ. of Illinois at Chicago.
1 Extending Summation Precision for Network Reduction Operations George Michelogiannakis, Xiaoye S. Li, David H. Bailey, John Shalf Computer Architecture.
1 A survey on Reconfigurable Computing for Signal Processing Applications Anne Pratoomtong Spring2002.
An Energy-Efficient Reconfigurable Multiprocessor IC for DSP Applications Multiple programmable VLIW processors arranged in a ring topology –Balances its.
SBSE Course 4. Overview: Design Translate requirements into a representation of software Focuses on –Data structures –Architecture –Interfaces –Algorithmic.
Task Alloc. In Dist. Embed. Systems Murat Semerci A.Yasin Çitkaya CMPE 511 COMPUTER ARCHITECTURE.
Interconnect Networks
High-Performance Networks for Dataflow Architectures Pravin Bhat Andrew Putnam.
A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,
Computer Science Informed Content Delivery Across Adaptive Overlay Networks Overlay networks have emerged as a powerful and highly flexible method for.
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Scalable Reconfigurable Interconnects Ali Pinar Lawrence Berkeley National Laboratory joint work with Shoaib Kamil, Lenny Oliker, and John Shalf CSCAPES.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Page 1 Reconfigurable Communications Processor Principal Investigator: Chris Papachristou Task Number: NAG Electrical Engineering & Computer Science.
Hardware/Software Co-design Design of Hardware/Software Systems A Class Presentation for VLSI Course by : Akbar Sharifi Based on the work presented in.
TEMPLATE DESIGN © Hardware Design, Synthesis, and Verification of a Multicore Communication API Ben Meakin, Ganesh Gopalakrishnan.
Embedded Runtime Reconfigurable Nodes for wireless sensor networks applications Chris Morales Kaz Onishi 1.
Embedding Constraint Satisfaction using Parallel Soft-Core Processors on FPGAs Prasad Subramanian, Brandon Eames, Department of Electrical Engineering,
RF network in SoC1 SoC Test Architecture with RF/Wireless Connectivity 1. D. Zhao, S. Upadhyaya, M. Margala, “A new SoC test architecture with RF/wireless.
1 Multicasting in a Class of Multicast-Capable WDM Networks From: Y. Wang and Y. Yang, Journal of Lightwave Technology, vol. 20, No. 3, Mar From:
CSE 291A Interconnection Networks Instructor: Prof. Chung-Kuan, Cheng CSE Dept. UCSD Winter-2007.
An Integrated Design Environment to Evaluate Power/Performance Tradeoffs for Sensor Network Applications Amol Bakshi, Jingzhao Ou, and Viktor K. Prasanna.
Quadrisection-Based Task Mapping on Many-Core Processors for Energy-Efficient On-Chip Communication Nithin Michael, Yao Wang, G. Edward Suh and Ao Tang.
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
1 CMP-MSI.07 CARES/SNU A Reusability-Aware Cache Memory Sharing Technique for High Performance CMPs with Private Caches Sungjune Youn, Hyunhee Kim and.
Oct 31 st 2007University of Utah1 Multi-Cores: Architecture/VLSI Perspective The Hardware-Software Relationship: Date or Dump?
Hybrid Optoelectric On-chip Interconnect Networks Yong-jin Kwon 1.
Using Custom Accelerators in Wireless Systems Alex Papakonstantinou, Deming Chen Illinois Center for Wireless Systems Wireless SoC Design Trends and Challenges.
Multiprocessor SoC integration Method: A Case Study on Nexperia, Li Bin, Mengtian Rong Presented by Pei-Wei Li.
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
PERFORMANCE EVALUATION OF LARGE RECONFIGURABLE INTERCONNECTS FOR MULTIPROCESSOR SYSTEMS Wim Heirman, Iñigo Artundo, Joni Dambre, Christof Debaes, Pham.
University of Maryland at College Park Smart Dust Digital Processing, 1 Digital Processing Platform Low power design and implementation of computation.
Effective bandwidth with link pipelining Pipeline the flight and transmission of packets over the links Overlap the sending overhead with the transport.
Building manycore processor-to-DRAM networks using monolithic silicon photonics Ajay Joshi †, Christopher Batten †, Vladimir Stojanović †, Krste Asanović.
Extending wireless Ad-Hoc
System On Chip.
Nithin Michael, Yao Wang, G. Edward Suh and Ao Tang Cornell University
Anne Pratoomtong ECE734, Spring2002
Digital Processing Platform
Digital Processing Platform
Topological Ordering Algorithm: Example
Reducing Total Network Power Consumption
Topological Ordering Algorithm: Example
Topological Ordering Algorithm: Example
Md. Tanveer Anwar University of Arkansas
Topological Ordering Algorithm: Example
Presentation transcript:

Abstract As transistor sizes shrink and we approach the ``end of Moore's law'', interconnects, both on-chip and off-chip, will represent the biggest bottleneck for embedded systems designers. Several groups are researching optical interconnects to cope with this trend. Optical interconnects enable new system architectures. These new architectures in turn require new methods for high-level application mapping and hardware/software co- design. In this presentation, we discuss high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessors. We focus on designs that are streamlined for one or more digital signal processing (DSP) applications. That is, we seek to synthesize an application-specific interconnect topology for a multiprocessor DSP design. We show that flexible interconnect topologies that allow single-hop communication between processors offer advantages for reduced power and latency. We have previously shown that multiprocessor scheduling algorithms can deadlock in the general case of a topology graph that is not strongly connected, or if communication is limited to be single hop. We have also demonstrated an efficient algorithm that can be used in conjunction with existing scheduling algorithms for avoiding this deadlock. In this presentation we discuss the advantages of performing application scheduling and interconnect synthesis jointly, and present a probabilistic scheduling/interconnect algorithm utilizing graph isomorphism to pare the design space. We demonstrate the performance advantages that an application-specific interconnect topology can produce for several DSP benchmarks.

Deadlock and Flexibility Existing scheduling algorithms assume every pair of processors can communicate Scheduling not well studied for irregular interconnection networks Can deadlock for arbitrary topologies Developed algorithms to adapt existing list scheduling algorithms to avoid deadlock Some scheduling moves have greater flexibility Partial Schedule A on proc. 2 B on proc. 1 Constraint Sets F[A] = {2} F[B] = {1} F[F] = {1} F[D] = {1,2} F[E] = {1,2,3} F[C] = {1,2,3} Flexibility = 11/ Topology graph C A B D E F Application graph Partial Schedule A on proc. 2 B on proc. 3 Constraint Sets F[A] = {2} F[B] = {3} F[F] = {0,3} F[D] = {1,2,3} F[E] = {0,1,2,3} F[C] = {0,1,2,3} Flexibility = 15/24

Effect of Topology Lower latency and communication energy for topology 1

Low Hop Communication Saves Energy

Link Synthesis Algorithm Developed both deterministic and evolutionary (GA) algorithms GA objective utilizes DLS scheduling modified with flexibility metric Crossover operators allow fan-out constraints to be preserved Use graph isomorphism to pare the design space Paring the design space Link synthesis results Consider only isomorphically unique graphs Reduction by orders of magnitude GA (red) outperforms deterministic over a range of topologies