Network-on-Chip Programmable Platform in Versal™ ACAP Architecture


Network-on-Chip Programmable Platform in Versal™ ACAP Architecture Ian Swarbrick, Dinesh Gaitonde, Sagheer Ahmad, Brian Gaide, Ygal Arbel Xilinx Silicon Architecture Team

Agenda: Versal devices and motivation for hardened NoC; overview of Versal NoC; timing closure benefits; routing; conclusions.

Versal Device: Versal is the Xilinx 7nm generation of devices. These are Adaptive Compute Acceleration Platform (ACAP) devices with a device-wide hardened NoC. Each physical link has full-duplex paths carrying 128-bit data at 1 GHz at mid-speed grades. All resources are address mapped, and the address map is configurable. (Note: the figure does not show an actual device.)

Motivation for Hardened NoC: There is a need for efficient data movement. Data movement is common across all applications. Wires are not scaling with logic. Data movement needs to keep up with memory: ~100 GB/sec with DDR, and hundreds of GB/sec with HBM. Increasing wire delays make timing closure harder. The move is towards a platform with fabric: all fabric ports and hard IP are globally addressable from anywhere on the device. The NoC is independent from the fabric and available without loading any fabric bitstream.

Overview of Versal NoC

Hardened NoC – Packetized Interconnect: Packetize, transport, de-packetize. Virtual channels provide independent non-blocking flow control.
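The packetize / transport / de-packetize sequence can be illustrated with a minimal Python sketch. The `Flit` fields, `dest_id`, and the function names are hypothetical and chosen only for illustration; the slides do not describe the actual Versal packet format. Only the 16-byte (128-bit) data width comes from the deck.

```python
from dataclasses import dataclass
from typing import List

FLIT_BYTES = 16  # 128-bit data path, as stated in the slides


@dataclass
class Flit:
    dest_id: int    # hypothetical destination identifier
    vc: int         # virtual channel the packet travels on
    seq: int        # position of this flit within the packet
    last: bool      # marks the final flit of the packet
    payload: bytes  # up to FLIT_BYTES of data


def packetize(data: bytes, dest_id: int, vc: int) -> List[Flit]:
    """Split a transaction payload into fixed-width flits."""
    chunks = [data[i:i + FLIT_BYTES] for i in range(0, len(data), FLIT_BYTES)] or [b""]
    return [
        Flit(dest_id, vc, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


def depacketize(flits: List[Flit]) -> bytes:
    """Reassemble the payload at the destination, ordering by sequence number."""
    ordered = sorted(flits, key=lambda f: f.seq)
    if not ordered or not ordered[-1].last:
        raise ValueError("packet is missing its final flit")
    return b"".join(f.payload for f in ordered)
```

In this sketch the virtual channel is carried as a tag on each flit, so a switch could keep separate queues per channel and stall one without blocking the others.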

NoC Topology: HNoC: 4 or 2 physical lanes, up to 64 GBytes/sec in each direction at 1 GHz. VNoC: 2 physical lanes, 32 GBytes/sec in each direction at 1 GHz. Each NoC lane (physical channel) is 2 full-duplex links, with 16 GBytes/sec raw bandwidth at 1 GHz. HNoCs connect up/down to hard IPs. VNoCs connect horizontally to fabric. (Note: the figure does not show an actual device.)
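The bandwidth figures on this slide follow from the link width and clock rate. The short Python sketch below reproduces the arithmetic, assuming the 16 GBytes/sec figure is per lane and per direction (the helper names are illustrative, not part of any Xilinx tool).

```python
LINK_WIDTH_BITS = 128  # data width per link, from the slides
CLOCK_HZ = 1e9         # 1 GHz at mid-speed grades, from the slides


def lane_bandwidth_gbytes(width_bits: int = LINK_WIDTH_BITS,
                          clock_hz: float = CLOCK_HZ) -> float:
    """Raw bandwidth of one lane in one direction, in GBytes/sec."""
    return width_bits / 8 * clock_hz / 1e9


def noc_bandwidth_gbytes(lanes: int) -> float:
    """Raw bandwidth per direction for a NoC segment with the given lane count."""
    return lanes * lane_bandwidth_gbytes()


for name, lanes in [("VNoC", 2), ("HNoC (2 lanes)", 2), ("HNoC (4 lanes)", 4)]:
    print(f"{name}: {noc_bandwidth_gbytes(lanes):.0f} GBytes/sec per direction")
```

Under that assumption, 2 lanes give the 32 GBytes/sec quoted for the VNoC and 4 lanes give the 64 GBytes/sec quoted for the HNoC.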

NoC Scaling: Bigger devices need more NoC resources. Horizontal scaling: Wider devices have more IO on the top and bottom of the device. There is one VNoC column per DDR channel. There are more fabric-to-NoC connections and more vertical bandwidth as device width grows. Vertical scaling: Taller devices have more fabric. Each half-FSR (fabric sub-region) in a VNoC has one ingress and one egress NoC path, so there are more fabric/NoC connections as device height grows.

NoC for Multi-Die Devices: The NoC extends across SSIT (Stacked Silicon Interconnect Technology) devices. VNoC physical channels continue across devices. Links across the interposer are source synchronous and single data rate.

NoC Quality-of-Service: Routes through the network are programmable. Each route is assigned to a traffic class, and classes are separated using virtual channels. Traffic classes can be: Low latency (reads): traffic is prioritized to give low latency without compromising other QoS constraints. ISOC (isochronous): bounded latency. Best effort: given whatever service is available. Other mechanisms provide traffic shaping: ingress rate control, and weighted arbitration at every NoC switch.
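Weighted arbitration between traffic classes can be sketched as a simple weighted round-robin arbiter with one queue per virtual channel. This is an illustrative model only: the slides do not specify the arbitration algorithm used in the Versal NoC switches, and the class names and weights below are assumptions. Weights must be positive integers.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class WeightedArbiter:
    """Weighted round-robin over per-class queues (one queue per virtual channel)."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = weights  # hypothetical per-class weights, all > 0
        self.queues: Dict[str, Deque[str]] = {name: deque() for name in weights}
        self.credits = dict(weights)

    def enqueue(self, traffic_class: str, flit: str) -> None:
        self.queues[traffic_class].append(flit)

    def grant(self) -> Optional[Tuple[str, str]]:
        """Return the next (class, flit) to forward, or None if all queues are empty."""
        if not any(self.queues.values()):
            return None
        while True:
            for name in self.weights:
                if self.queues[name] and self.credits[name] > 0:
                    self.credits[name] -= 1
                    return name, self.queues[name].popleft()
            # No backlogged class has credit left, so start a new round.
            self.credits = dict(self.weights)


arbiter = WeightedArbiter({"low_latency": 4, "isoc": 2, "best_effort": 1})
```

With these weights and all three classes backlogged, each round of seven grants is split 4:2:1, so best-effort traffic still makes progress while the other classes receive a larger share. Ingress rate control is not modeled here.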

QoS Example (GB/sec): [Figure: bandwidth comparison of ask, compiler flow, and simulation.]

NoC Timing Closure Benefits

Timing Closure: Fabric timing closure requires iteration. Hardened NoC interfaces and modular design simplify the process.

NoC Routing

NoC Routing: The Versal NoC has distributed routing tables. QoS is managed per connection. Routing is dynamically re-programmable for partial reconfiguration. Tools take user QoS constraints and configure paths. Routes are created in a deadlock-free manner.
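Distributed routing tables can be pictured as one small lookup table per switch, each mapping a destination to the next hop. The Python sketch below walks a packet hop by hop through such tables. The switch names, destinations, and table layout are hypothetical; the slides do not describe the real table format.

```python
from typing import Dict, List

# Hypothetical topology: per-switch tables mapping destination -> next hop.
# A next hop equal to the destination name means "deliver locally".
ROUTING_TABLES: Dict[str, Dict[str, str]] = {
    "sw0": {"ddr0": "sw1", "pcie": "pcie"},
    "sw1": {"ddr0": "sw2", "pcie": "sw0"},
    "sw2": {"ddr0": "ddr0", "pcie": "sw1"},
}


def route(src_switch: str, dest: str,
          tables: Dict[str, Dict[str, str]] = ROUTING_TABLES,
          max_hops: int = 16) -> List[str]:
    """Follow per-switch table lookups from src_switch until dest is reached."""
    path = [src_switch]
    current = src_switch
    for _ in range(max_hops):
        next_hop = tables[current][dest]
        path.append(next_hop)
        if next_hop == dest:
            return path
        current = next_hop
    raise RuntimeError("route exceeded max_hops; tables may contain a loop")
```

In this model, re-programming a route amounts to rewriting a few table entries, which is consistent with the slide's point that routes can be changed dynamically for partial reconfiguration.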

NoC Compiler: The compiler takes in user traffic and design constraints (including NoC topology). It produces routing assignments that meet the QoS specification and are deadlock free.
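The slides do not say how the NoC compiler establishes deadlock freedom. One standard check for a set of fixed routes, shown here as an assumption rather than as the compiler's actual method, is to build the channel dependency graph and verify that it has no cycles. The sketch ignores virtual channels and treats each directed link as a single channel.

```python
from typing import Dict, List, Set, Tuple

Channel = Tuple[str, str]  # directed link (from_node, to_node)


def channel_dependencies(routes: List[List[str]]) -> Dict[Channel, Set[Channel]]:
    """Map each channel to the channels a packet may request while holding it."""
    deps: Dict[Channel, Set[Channel]] = {}
    for path in routes:
        channels = list(zip(path, path[1:]))
        for ch in channels:
            deps.setdefault(ch, set())
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def is_deadlock_free(routes: List[List[str]]) -> bool:
    """True if the channel dependency graph of the routes is acyclic."""
    deps = channel_dependencies(routes)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {ch: WHITE for ch in deps}

    def has_cycle(ch: Channel) -> bool:
        color[ch] = GREY
        for nxt in deps[ch]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[ch] = BLACK
        return False

    return not any(color[ch] == WHITE and has_cycle(ch) for ch in deps)
```

For example, four two-hop routes that all travel clockwise around a four-node ring create a cyclic dependency and fail the check, while a route and its reverse use distinct directed channels and pass.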

Hardened NoC Example: 4 DDR memories, interleaved. 16 fabric masters. External bandwidth from PCIe. Local processor control traffic also present. ~150 GBytes/sec aggregate bandwidth. The NoC compiler runs in minutes. No fabric resources are consumed for routing, address, switching, or DDR.
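Interleaving across the four DDR memories means consecutive address blocks are spread over the controllers. The Python sketch below shows one common way to do this. The interleave granule of 4096 bytes is a hypothetical value chosen for illustration; the slides do not give the granule or the exact mapping.

```python
from typing import Tuple

NUM_CONTROLLERS = 4      # four interleaved DDR memories, as in the example
INTERLEAVE_BYTES = 4096  # hypothetical interleave granule; not given in the slides


def interleave(addr: int) -> Tuple[int, int]:
    """Map a flat address to (controller index, address within that controller)."""
    granule = addr // INTERLEAVE_BYTES
    controller = granule % NUM_CONTROLLERS
    local = (granule // NUM_CONTROLLERS) * INTERLEAVE_BYTES + addr % INTERLEAVE_BYTES
    return controller, local
```

With this mapping, a master streaming through a large buffer touches all four controllers in turn, which spreads its traffic across the memories instead of concentrating it on one.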

Conclusions

Versal NoC - Conclusions: A hardened NoC that scales across the Versal product family. It eases timing closure and increases design productivity. It offers flexible QoS with programmable route and bandwidth assignments. It is a core infrastructure piece of the Versal ACAP platform.