Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis of Irregular Application Specific NoC's by Ben Meakin

Introduction: Irregular vs Regular Networks
[Figure: two networks of nine nodes each (N1 to N9), one irregular and one regular]
Irregular networks
- Advantages:
  - High performance/low power for a specific application
  - Lower hardware cost of network
- Disadvantages:
  - Difficult routing
  - Difficult floor planning/P&R
Regular networks
- Advantages:
  - Efficient for general purpose applications
  - Easy routing
  - Easy floor planning/P&R
- Disadvantages:
  - Expensive
  - Underutilized channels
  - Average latency not as scalable

Background and Motivation
- NoC power is a function of network hops:
  - P_per_flit = (buff + xbar + arb) * Hops + wire * D
- NoC latency is a function of network hops:
  - L = R_pipeline * Hops (clock cycles)
- NoC hardware cost is a function of router radix:
  - C = sum over n = 1..nodes of (k1 * radix_n + k2)
- Minimize hops and router complexity!
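The three cost relations above can be evaluated directly. The following is a minimal sketch; the function and parameter names are illustrative choices, and the numbers in the usage line are made up rather than taken from the slides.

```python
def power_per_flit(buff, xbar, arb, wire, hops, distance):
    """P_per_flit = (buff + xbar + arb) * Hops + wire * D."""
    return (buff + xbar + arb) * hops + wire * distance


def latency_cycles(router_pipeline_depth, hops):
    """L = R_pipeline * Hops, in clock cycles."""
    return router_pipeline_depth * hops


def hardware_cost(radices, k1, k2):
    """C = sum over all nodes n of (k1 * radix_n + k2)."""
    return sum(k1 * radix + k2 for radix in radices)


# Illustrative numbers only (assumptions, not measurements).
print(power_per_flit(buff=2.0, xbar=1.0, arb=0.5, wire=0.25, hops=3, distance=4))
print(latency_cycles(router_pipeline_depth=3, hops=4))
print(hardware_cost(radices=[3, 3, 2], k1=2, k2=1))
```

Both power and latency scale linearly with hop count, and cost scales with the sum of router radices, which is why the slide singles out hops and router complexity as the quantities to minimize.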

Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip
Christian Neeb and Norbert Wehn
University of Kaiserslautern
9th EUROMICRO Conference on Digital System Design, 2006

Application Communication Model
- Resource Communication Graph (RCG)
  - Vertex: hardware component
  - Edge: abstract communication channel
  - Edge weight: average communication rate (bandwidth)
- Normalized to physical bandwidth of one channel link
- Efficient mapping of task communication to RCG is assumed and is not discussed in this paper
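One way to hold an RCG in memory is a dictionary keyed by (source, destination) pairs. This is a minimal sketch, not the paper's data structure; the component names, the rates, and the link bandwidth are hypothetical.

```python
def build_rcg(rates_bits_per_sec, link_bandwidth):
    """Return {(src, dst): weight}, with each weight normalized to the
    physical bandwidth of one channel link."""
    if link_bandwidth <= 0:
        raise ValueError("link bandwidth must be positive")
    return {edge: rate / link_bandwidth
            for edge, rate in rates_bits_per_sec.items()}


# Hypothetical components and rates, for illustration only.
rcg = build_rcg({("cpu", "mem"): 4e9, ("dsp", "mem"): 1e9}, link_bandwidth=8e9)
print(rcg)
```

With this normalization, a weight of 1.0 means the channel would saturate one physical link on average.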

INoC Architecture
- Nodes are tiles consisting of a chip resource and a router
- Routers:
  - Wormhole with virtual channels
  - Round-robin switch arbitration
- Routing:
  - Deterministic: one output channel per destination address
  - Adaptive: more than one output channel per destination address

Routing in Irregular Topologies
- Channel Dependency Graph (CDG)
  - Vertex: channel
  - Edge: pair of channels with a dependency due to the routing function
  - Deadlock free if there are no cycles
- Node numbering
  - Arbitrary in this paper
- Channels increasing / channels decreasing
  - Acyclic CDGs
  - Determined by source/destination node numbers
- Deadlock free routing
  - Route packets on increasing channels, then decreasing
- General form of dimension order routing
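The deadlock condition above can be checked mechanically. The sketch below builds a CDG from a set of routes and tests it for cycles. It assumes nodes are numbered with integers, that a channel is a directed (from, to) pair, and that a route is a list of nodes; these representations are illustrative and not taken from the paper.

```python
def channel_dependency_graph(routes):
    """Vertices are channels (u, v); an edge c1 -> c2 exists when some route
    uses channel c2 immediately after channel c1."""
    cdg = {}
    for path in routes:
        channels = list(zip(path, path[1:]))
        for channel in channels:
            cdg.setdefault(channel, set())
        for first, second in zip(channels, channels[1:]):
            cdg[first].add(second)
    return cdg


def has_cycle(graph):
    """Depth-first search with three colors; a grey successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY:
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


def is_up_then_down(path):
    """True if the path uses only increasing channels, then only decreasing."""
    going_down = False
    for a, b in zip(path, path[1:]):
        if b > a:
            if going_down:
                return False
        else:
            going_down = True
    return True


# Clockwise routes around a 4-node ring create a cyclic CDG.
ring_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
# Replacing the one route that goes down and then up removes the cycle.
up_down_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 2, 1]]

print(has_cycle(channel_dependency_graph(ring_routes)))
print(has_cycle(channel_dependency_graph(up_down_routes)))
```

In the ring example only the route 3 → 0 → 1 breaks the increasing-then-decreasing rule, and that single route is what closes the dependency cycle.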

Routing in Irregular Topologies
- Unsafe paths are allowed provided there is a safe (acyclic) path that a packet can escape to
  - Routing table filled via Dijkstra's algorithm
- Virtual channels are used to expand the routing function using rules similar to those of dimension order schemes
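As a rough illustration of filling a routing table with Dijkstra's algorithm, the sketch below computes a next-hop entry for every destination reachable from one source. It ignores the safe-path and virtual-channel rules from the slide, and the adjacency format and link costs are assumptions made for the example.

```python
import heapq


def routing_table(adjacency, source):
    """Return {destination: next_hop} along shortest paths from source.

    adjacency maps each node to {neighbor: link_cost}.
    """
    dist = {source: 0}
    next_hop = {}
    done = set()
    heap = [(0, source, source)]  # (distance, node, first hop on the path)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node != source:
            next_hop[node] = hop
        for neighbor, cost in adjacency[node].items():
            nd = d + cost
            if neighbor not in done and nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                first = neighbor if node == source else hop
                heapq.heappush(heap, (nd, neighbor, first))
    return next_hop


# Hypothetical three-node network with one expensive direct link.
adjacency = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
print(routing_table(adjacency, "A"))
```

A deterministic router needs exactly one such entry per destination; an adaptive router would keep several candidate output channels instead.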

Network Generation
- Start with a bidirectional connection of all nodes in ascending order
  - Ensures a correct routing path exists
- Add shortcuts based on network traffic estimation and RCG
  - Provides an optimal routing path that will be used in the common case
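The two generation steps can be sketched as follows. The slides do not say how many shortcuts are added or how candidates are ranked, so the fixed shortcut budget and the ranking by RCG weight below are assumptions made for illustration.

```python
def generate_network(num_nodes, rcg, num_shortcuts):
    """Chain nodes 0..num_nodes-1 with bidirectional links, then add
    shortcuts for the heaviest RCG edges (assumed ranking rule)."""
    # Step 1: ascending chain, which guarantees every node is reachable.
    links = {frozenset((i, i + 1)) for i in range(num_nodes - 1)}
    # Step 2: shortcuts for the most heavily loaded communication channels.
    ranked = sorted(rcg.items(), key=lambda item: item[1], reverse=True)
    added = 0
    for (src, dst), _weight in ranked:
        if added == num_shortcuts:
            break
        link = frozenset((src, dst))
        if src != dst and link not in links:
            links.add(link)
            added += 1
    return links


# Hypothetical normalized RCG weights.
rcg = {(0, 3): 0.6, (1, 2): 0.9, (4, 0): 0.3}
network = generate_network(num_nodes=5, rcg=rcg, num_shortcuts=1)
print(sorted(tuple(sorted(link)) for link in network))
```

Here the heaviest edge (1, 2) is already part of the chain, so the single shortcut goes to the next heaviest edge, (0, 3).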

Results
- Network traffic patterns randomly generated with constraints to model heterogeneous SoC traffic
  - Simulated with 100 different patterns
- Variable flit injection rates

Limitations
- Naive implementation of "safe path" increases cost/complexity of network
  - RCG is the most efficient topology
  - More efficient deadlock free solutions may exist
- Efficient mapping of the task communication graph (TCG) to the RCG is assumed
  - Can be a difficult problem
  - Network generation could be linked to a specific programming model or API for simple TCG to RCG mapping

Workload Driven Synthesis of Irregular Application-Specific NoC's Ben Meakin

Project Objectives
- Synthesis tool for INoCs
  - Generation of optimal network topology
  - MCAPI workload specification
  - Synthesis of deadlock free routing
- Comparison of irregular to regular NoC performance, power, and cost

Motivation
- CAD for ASICs always needs improvement
  - Reduce design cost, time-to-market, etc.
- Synthesis of INoCs could be useful in implementing parallel algorithms in programmable logic
- Average router complexity needs to be minimized
  - Asynchronous NoCs can realize improvement in average case performance
- GALS INoCs could be a future direction in heterogeneous SoCs
- Minimize hops to reduce power and latency

Specification of Workload
- Communication channels:
    Comm Type   Sender  Receiver  BW (bits/sec)  Priority
    ScalarChan  N1      N2        256e9          10
- Optimization effort (BW: bandwidth, PR: priority):
    E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)
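A channel line and its optimization effort could be handled as in the sketch below. The slide writes a single "Norm" for both terms; the sketch assumes a separate normalizer for bandwidth and for priority (for example the largest value of each in the workload), which is an interpretation and not something the slides state. The `Channel` class and the normalizer values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    comm_type: str
    sender: str
    receiver: str
    bandwidth: float  # bits/sec
    priority: int


def parse_channel(line):
    """Parse 'CommType Sender Receiver BW Priority' into a Channel."""
    comm_type, sender, receiver, bw, pr = line.split()
    return Channel(comm_type, sender, receiver, float(bw), int(pr))


def effort(channel, bw_norm, pr_norm):
    """E = 100 * ((BW / Norm)/2 + (PR / Norm)/2), with assumed normalizers."""
    return 100 * ((channel.bandwidth / bw_norm) / 2
                  + (channel.priority / pr_norm) / 2)


ch = parse_channel("ScalarChan N1 N2 256e9 10")
# Assumed normalizers: 512e9 bits/sec and a top priority of 10.
print(effort(ch, bw_norm=512e9, pr_norm=10))  # 75.0
```

Bandwidth and priority each contribute half of the score, so a channel at the maximum of both would reach an effort of 100.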

Network Synthesis Algorithm
    Sort channels by priority/effort
    Add all nodes to network
    For each channel:
        if sender has links:
            while link has not been added:
                if receiver has links, add link; else increase search depth
        else recursive call from next sender neighbor node
    while not done:
        if network is not correct, add link; else done
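The link-adding loop can be approximated in Python as shown below. This is a simplified greedy variant, not the author's exact algorithm: it reads "has links" as "has a free port under the maximum radix", and when an endpoint is full it searches breadth-first for the nearest node with a free port to act as a via. The channel list and effort values are hypothetical, chosen to follow the order of the worked example in the slides.

```python
from collections import deque


def nearest_with_free_port(start, links, max_radix):
    """Breadth-first search from start for the closest node below max_radix."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(links[node]) < max_radix:
            return node
        for neighbor in sorted(links[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return None


def synthesize(nodes, channels, max_radix):
    """channels: list of (sender, receiver, effort).
    Returns {node: set of linked neighbors}."""
    links = {node: set() for node in nodes}
    ordered = sorted(channels, key=lambda c: c[2], reverse=True)
    for sender, receiver, _effort in ordered:
        a = nearest_with_free_port(sender, links, max_radix)
        b = nearest_with_free_port(receiver, links, max_radix)
        if a is None or b is None or a == b or b in links[a]:
            continue  # nothing useful to add for this channel
        links[a].add(b)
        links[b].add(a)
    return links


nodes = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"]
channels = [  # hypothetical efforts, highest first
    ("N1", "N2", 70), ("N1", "N4", 60), ("N1", "N3", 50), ("N3", "N4", 40),
    ("N1", "N5", 30), ("N6", "N7", 20), ("N5", "N8", 10),
]
links = synthesize(nodes, channels, max_radix=3)
print(sorted(links["N2"]))
```

With a maximum radix of 3, N1 is full by the time the N1 to N5 channel is processed, so the sketch attaches N5 to N2 instead. The result is not yet fully connected; the final "while not done" correctness pass is not part of this sketch.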

Network Synthesis Example
[Figure: nodes N1 and N2]
- Max radix is 3
- N1 – N2 added (highest priority link)

Network Synthesis Example
[Figure: nodes N1 to N4]
- N1 – N4 added
- N1 – N3 added
- N3 – N4 added

Network Synthesis Example
[Figure: nodes N1 to N8]
- N1 – N5 added
  - Max radix of N1 reached, so N2 used as via
- N6 – N7 added
- N5 – N8 added
- Are we done?

Network Synthesis Example
[Figure: nodes N1 to N8]
- N2 – N6 added for correctness
  - Every node must have a path to every other node
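The correctness requirement, that every node has a path to every other node, amounts to the link graph being connected. The sketch below finds connected components and joins them. The link state is a reading of the example slides, with N5 attached to N2 as the via. The rule for picking which link to add (the first node in name order with a free port in each component) is an assumption; on this input it happens to pick the same N2 – N6 link as the slide.

```python
def components(links):
    """Connected components of {node: set(neighbors)}, each a sorted list."""
    seen, result = set(), []
    for start in sorted(links):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in links[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


def make_connected(links, max_radix):
    """Add links between components until one remains; return added links."""
    added = []
    comps = components(links)
    while len(comps) > 1:
        a = next((n for n in comps[0] if len(links[n]) < max_radix), None)
        b = next((n for n in comps[1] if len(links[n]) < max_radix), None)
        if a is None or b is None:
            raise ValueError("no free port available to join components")
        links[a].add(b)
        links[b].add(a)
        added.append((a, b))
        comps = components(links)
    return added


# Link state before the correctness step, as read from the example slides.
links = {
    "N1": {"N2", "N3", "N4"}, "N2": {"N1", "N5"}, "N3": {"N1", "N4"},
    "N4": {"N1", "N3"}, "N5": {"N2", "N8"}, "N6": {"N7"},
    "N7": {"N6"}, "N8": {"N5"},
}
print(make_connected(links, max_radix=3))  # [('N2', 'N6')]
```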

Synthesis of Routing Function
- Rename nodes with naming algorithm
  - Depth first, breadth first, root node selection
- Most efficient synthesis for application
  - For each node pair in workload, check if shortest path is correct
  - If not, add a virtual channel to make it correct
- Most correct synthesis for general purpose
  - For ALL node pairs in network, check if shortest path is correct
  - If not, add a virtual channel to make it correct
- Synthesize ROM for each node
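The per-pair check can be sketched as below. The slides do not define "correct"; this sketch assumes it means that the shortest path obeys the increasing-then-decreasing channel rule from the first paper, with nodes already renamed to integers. It only reports which pairs would need a virtual channel and does not model the virtual channels themselves.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first shortest path as a list of nodes, or None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(links[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def is_up_then_down(path):
    """True if the path uses only increasing channels, then only decreasing."""
    going_down = False
    for a, b in zip(path, path[1:]):
        if b > a:
            if going_down:
                return False
        else:
            going_down = True
    return True


def pairs_needing_virtual_channel(links, pairs):
    """Pairs whose shortest path breaks the assumed correctness rule."""
    flagged = []
    for src, dst in pairs:
        path = shortest_path(links, src, dst)
        if path is None:
            raise ValueError(f"no path from {src} to {dst}")
        if not is_up_then_down(path):
            flagged.append((src, dst))
    return flagged


# Hypothetical renamed network: 1 - 3 - 2 - 4.
links = {1: {3}, 3: {1, 2}, 2: {3, 4}, 4: {2}}
print(pairs_needing_virtual_channel(links, [(1, 4), (1, 2)]))  # [(1, 4)]
```

Passing only the workload's node pairs corresponds to the application-specific option on the slide, and passing all pairs corresponds to the general purpose option.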

Uses of this Tool
- Incorporate with previous MCAPI hardware implementation
  - Allow user to synthesize a working HDL model based on MCAPI workloads and a set of IP blocks
  - Modern FPGAs could make this a feasible and cost-effective alternative to fabricating ASICs
- Aid for studying optimal topology for known network traffic

Questions?