Rapid Overlay Builder for Xilinx FPGAs

Michael Yue (1), Dirk Koch (2), Guy Lemieux (1)
(1) University of British Columbia, (2) University of Manchester

Motivation
- Design productivity barrier: the place-and-route (PAR) process
- Traditional techniques to accelerate the PAR process:
  - Parallel compilation
  - Netlist preservation
  - Trading off circuit performance
- The speedup obtainable is still limited

Motivation
- Design flow using overlay architectures
[Figure: traditional design flow versus new design flow. Labels: Programmer-friendly Languages, HDL Code, Fast High-level Synthesis, Overlay Architecture, PAR Process ("LONG PAR PROCESS!"), FPGA Substrate. One part of the figure is marked "focus of this paper".]

Contributions
This paper developed a component-based design methodology that:
- Obtains scalable speedups in building overlay designs
- Achieves a high logic utilization level with scalable speedups
- Maintains higher and more consistent clock rates compared to ISE

CGRA Architecture – PE
[Figure: array of PEs]
- 32-bit input/output bus
- 5-bit personalization bus
- Nearest-neighbor communication
- Integer operations:
  - Shifting
  - Addition/subtraction
  - Comparison
  - Multiplication
  - Bit manipulation
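To make the PE description concrete, here is a minimal behavioral sketch in Python. It assumes the 5-bit personalization value simply selects one integer operation on two 32-bit operands. The opcode names and numbering (`Op`, `pe_execute`) are hypothetical, as is the unsigned comparison; the slides give only the bus widths and the operation classes.

```python
from enum import IntEnum

MASK32 = 0xFFFFFFFF  # the slide specifies a 32-bit input/output bus


class Op(IntEnum):
    """Hypothetical opcode assignment; the slides do not give an encoding."""
    ADD = 0
    SUB = 1
    SHL = 2
    SHR = 3
    CMP_LT = 4
    MUL = 5
    AND = 6


def pe_execute(personalization: int, a: int, b: int) -> int:
    """Apply the operation selected by the 5-bit personalization value."""
    if not 0 <= personalization < 32:
        raise ValueError("personalization must fit in 5 bits")
    op = Op(personalization)  # raises ValueError for codes not assigned above
    a &= MASK32
    b &= MASK32
    if op is Op.ADD:
        result = a + b
    elif op is Op.SUB:
        result = a - b
    elif op is Op.SHL:
        result = a << (b & 31)  # shift amount limited to the word width
    elif op is Op.SHR:
        result = a >> (b & 31)
    elif op is Op.CMP_LT:
        result = int(a < b)     # unsigned comparison, an assumption
    elif op is Op.MUL:
        result = a * b
    else:                       # Op.AND, standing in for bit manipulation
        result = a & b
    return result & MASK32      # wrap to the 32-bit output bus
```

Changing the personalization value changes the PE's function without touching the placed-and-routed logic, which is the property that lets the same tile be replicated across the device.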

CGRA Architecture – FPGA Driver
[Figure: host platform connected to the FPGA driver]
- Communication:
  - DDR3
  - Ethernet
  - PCIe
- Purposes:
  - Streaming application data
  - Personalization
  - Partial reconfiguration

ROB Methodology
- Step 1 - Resource budgeting
- Step 2 - Floorplanning
- Step 3 - Building initial PE variants
- Step 4 - Extracting PE tiles
- Step 5 - Relocating PE tiles
- Step 6 - Establishing interconnects

Step 1 - Resource Budgeting
- Gives a preliminary understanding of the size of one PE tile
- Implementation without applying any physical constraints
- Different synthesis options:
  - Logic-only
  - Logic and DSP block
  - Logic and BRAM block
  - Logic, DSP and BRAM block

Step 2 - Floorplanning
- Physically constraining each PE tile in the CGRA
[Figure: Floorplan Alternatives #1, #2, and #3, labeled 6 PEs, 8 PEs, and 7 PEs]

Step 3 - Building Initial PE Variants
- Same functionality
- Different underlying resource footprints
- Determined by the floorplan
- Built to be replicated across the device

Step 3 - Building Initial PE Variants
[Figure: placed and routed PE variant]

Step 4 - Extracting PE Tiles
- Discarding connection anchors
Script:
    ClearSelection;
    AddBlockToSelection UpperLeftTile=INT_X1Y1 LowerRightTile=INT_X2Y2;
    ExtractModule XDL_Input=pe_variant.xdl XDL_Output=pe_tile.xdl;

Step 5 - Relocating PE Tiles
Script:
    # Instantiating the left PE column
    Set Variable=PE_top Value="220";
    SetLabel LabelName=LoopHead_1;
    AddBlockToSelection UpperLeftTile=INT_X9Y[%PE_top%-1] LowerRightTile=INT_X9Y[%PE_top%-1];
    Set Variable=PE_top Value=[%PE_top%-20];
    GotoLabel LabelName=LoopHead_1 Condition=%PE_top%>170;
    AddInstantiationInSelectedTiles = PE_Tile_1;
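The relocation loop can be traced by hand. The Python sketch below mirrors its control flow under two assumptions that the slide does not state: that `GotoLabel` jumps back to the label whenever its condition is true, and that bracketed expressions such as `[%PE_top%-1]` are evaluated arithmetically. The function name `relocation_tiles` and its parameters are hypothetical.

```python
def relocation_tiles(pe_top=220, step=20, stop_above=170, column=9):
    """List the tiles selected by the loop, in the order they are added."""
    tiles = []
    while True:
        # AddBlockToSelection uses the same tile for both corners
        tiles.append(f"INT_X{column}Y{pe_top - 1}")
        pe_top -= step                # Set Variable=PE_top Value=[%PE_top%-20]
        if not pe_top > stop_above:   # GotoLabel ... Condition=%PE_top%>170
            break
    return tiles


print(relocation_tiles())
```

Under those assumptions the loop selects three anchor tiles in column X9, spaced 20 rows apart, before `AddInstantiationInSelectedTiles` places one copy of `PE_Tile_1` at each of them.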

Step 6 - Establishing Interconnects
Script:
    FuseNets NetlistName=CGRA PrintProgress=True;
    FuseNets NetlistName=FPGA_Driver PrintProgress=True;

Results
Two use cases were evaluated:

            ROB Methodology Time (seconds)                          Speedup
CGRA Size   Initial PE Building   Stitching   XDL Conversion*   Total Time   Use Case 1   Use Case 2
18 PEs      1080                  40          69                1189         2.0x         22.0x
41 PEs      1080                  56          210               1346         2.7x         13.7x
49 PEs      1080                  78          277               1435         3.1x         12.4x
57 PEs      1080                  81          377               1538         3.7x         12.5x
65 PEs      1080                  88          449               1617         3.5x         10.4x
77 PEs      1080                  106         695               1881         5.2x         12.2x
89 PEs      1080                  113         844               2037         4.8x         10.1x
101 PEs     1080                  125         1054              2259         4.9x         9.3x

* XDL netlist conversion is done entirely using Xilinx tools.
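The timing figures above can be cross-checked with a few lines of Python. The sketch assumes the single 1080 s initial PE building figure applies to every CGRA size, which the reported totals bear out. The names `ROWS`, `totals_consistent`, and `conversion_share` are introduced here for illustration only.

```python
INITIAL_PE_BUILD_S = 1080  # assumed constant across all CGRA sizes

# CGRA size (PEs) -> (stitching s, XDL conversion s, reported total s)
ROWS = {
    18: (40, 69, 1189),
    41: (56, 210, 1346),
    49: (78, 277, 1435),
    57: (81, 377, 1538),
    65: (88, 449, 1617),
    77: (106, 695, 1881),
    89: (113, 844, 2037),
    101: (125, 1054, 2259),
}


def totals_consistent(rows=ROWS):
    """Check that each reported total is the sum of its three phases."""
    return all(
        INITIAL_PE_BUILD_S + stitch + convert == total
        for stitch, convert, total in rows.values()
    )


def conversion_share(num_pes, rows=ROWS):
    """Fraction of the total time spent in XDL netlist conversion."""
    _, convert, total = rows[num_pes]
    return convert / total


print(totals_consistent())
for size in ROWS:
    print(size, f"{conversion_share(size):.0%}")
```

The breakdown shows that the initial PE build is a fixed cost, stitching grows slowly with array size, and XDL conversion accounts for most of the growth in total time.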

Results
[Figure: utilization and Fmax results comparison]

Conclusion
This paper developed the ROB methodology, which utilizes (1) module relocation, (2) module variants, and (3) zipping to:
- Obtain scalable speedups in building overlay designs
- Achieve a high logic utilization level with scalable speedups
- Maintain higher and more consistent clock rates compared to ISE

Thank you