HMFlow: Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping Christopher Lavin, Marc Padilla, Jaren Lamprecht, Philip Lundrigan Brent Nelson.

Slides:

Advertisements

Similar presentations

© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.

Advertisements

Spartan-3 FPGA HDL Coding Techniques

ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options.

Architectural Improvement for Field Programmable Counter Array: Enabling Efficient Synthesis of Fast Compressor Trees on FPGA Alessandro Cevrero 1,2 Panagiotis.

EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.

Fast FPGA Resource Estimation Paul Schumacher & Pradip Jha Xilinx, Inc.

Power Visualization, Analysis, and Optimization Tools for FPGAs Matthew French, Li Wang and Michael Wirthlin USC, Information Sciences Institute, BYU.

Integrated Circuits Laboratory Faculty of Engineering Digital Design Flow Using Mentor Graphics Tools Presented by: Sameh Assem Ibrahim 16-October-2003.

RapidSmith: Do-It-Yourself CAD Tools for Xilinx FPGAs Christopher Lavin, Marc Padilla, Jaren Lamprecht, Philip Lundrigan Brent Nelson and Brad Hutchings.

Graduate Computer Architecture I Lecture 15: Intro to Reconfigurable Devices.

ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.

The Spartan 3e FPGA. CS/EE 3710 The Spartan 3e FPGA  What’s inside the chip? How does it implement random logic? What other features can you use?  What.

Configurable System-on-Chip: Xilinx EDK

Evolution of implementation technologies

Logic Design Outline –Logic Design –Schematic Capture –Logic Simulation –Logic Synthesis –Technology Mapping –Logic Verification Goal –Understand logic.

Porting EDIF to Viva Sreesa Akella, Heather Wake, Duncan Buell, James P. Davis Department of Computer Science and Engineering, University of South Carolina.

ECE 699: Lecture 2 ZYNQ Design Flow.

1 Chapter 7 Design Implementation. 2 Overview 3 Main Steps of an FPGA Design ’ s Implementation Design architecture Defining the structure, interface.

CS 151 Digital Systems Design Lecture 38 Programmable Logic.

From Concept to Silicon How an idea becomes a part of a new chip at ATI Richard Huddy ATI Research.

1 A survey on Reconfigurable Computing for Signal Processing Applications Anne Pratoomtong Spring2002.

GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.

© 2011 Xilinx, Inc. All Rights Reserved This material exempt per Department of Commerce license exception TSU Xilinx Tool Flow.

Delevopment Tools Beyond HDL

ISE. Tatjana Petrovic 249/982/22 ISE software tools ISE is Xilinx software design tools that concentrate on delivering you the most productivity available.

Trigger design engineering tools. Data flow analysis Data flow analysis through the entire Trigger Processor allow us to refine the optimal architecture.

Highest Performance Programmable DSP Solution September 17, 2015.

An automatic tool flow for the combined implementation of multi-mode circuits Brahim Al Farisi, Karel Bruneel, João Cardoso, Dirk Stroobandt.

Power Reduction for FPGA using Multiple Vdd/Vth

Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs Marvin Tom* Xilinx Inc.

Open Discussion of Design Flow Today’s task: Design an ASIC that will drive a TV cell phone Exercise objective: Importance of codesign.

© 2003 Xilinx, Inc. All Rights Reserved For Academic Use Only Xilinx Design Flow FPGA Design Flow Workshop.

High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.

SHA-3 Candidate Evaluation 1. FPGA Benchmarking - Phase Round-2 SHA-3 Candidates implemented by 33 graduate students following the same design.

Heng Tan Ronald Demara A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management.

Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout Stephen Coe MSc Engineering Candidate Advisors: Dr. Shawki Areibi Dr. Medhat Moussa.

Lecture 2 1 ECE 412: Microcomputer Laboratory Lecture 2: Design Methodologies.

J. Christiansen, CERN - EP/MIC

VHDL Project Specification Naser Mohammadzadeh. Schedule  due date: Tir 18 th 2.

Tools - Implementation Options - Chapter15 slide 1 FPGA Tools Course Implementation Options.

Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.

Developing software and hardware in parallel Vladimir Rubanov ISP RAS.

This material exempt per Department of Commerce license exception TSU Xilinx Tool Flow.

EE 466/586 VLSI Design Partha Pande School of EECS Washington State University

ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE Presenter: Shu-yen Lin Advisor: Prof. An-Yeu Wu 2005/6/6.

High Speed Digital Systems Lab. Agenda  High Level Architecture.  Part A.  DSP Overview. Matrix Inverse. SCD  Verification Methods. Verification Methods.

Introductory project. Development systems Design Entry –Foundation ISE –Third party tools Mentor Graphics: FPGA Advantage Celoxica: DK Design Suite Design.

1 - CPRE 583 (Reconfigurable Computing): VHDL to FPGA: A Tool Flow Overview Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture 5: 9/7/2011.

Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.

Introduction to FPGA Tools

Tools - Design Manager - Chapter 6 slide 1 Version 1.5 FPGA Tools Training Class Design Manager.

1 Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization Andy Ye, David Lewis, Jonathan Rose Department of Electrical and Computer.

© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.

FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.

FPGA CAD 10-MAR-2003.

WARP PROCESSORS ROMAN LYSECKY GREG STITT FRANK VAHID Presented by: Xin Guan Mar. 17, 2010.

FPGA Logic Cluster Design Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.

DEPARTMENT OF RapidSmith 2: A Framework for BEL-level CAD Exploration on Xilinx FPGAs FPGA’2015 Feb Travis Haroldsen Brent Nelson Brad Hutchings.

Time-borrowing platform in the Xilinx UltraScale+ family of FPGAs and MPSoCs Ilya Ganusov, Benjamin Devlin.

Rapid Overlay Builder for Xilinx FPGAs

FPGA Tools Course Answers

ChipScope Pro Software

ECE 699: Lecture 3 ZYNQ Design Flow.

ChipScope Pro Software

THE ECE 554 XILINX DESIGN PROCESS

Measuring the Gap between FPGAs and ASICs

THE ECE 554 XILINX DESIGN PROCESS

HardWireTM FpgASIC The Superior ASIC Solution

Xilinx Alliance Series

Presentation transcript:

HMFlow: Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping Christopher Lavin, Marc Padilla, Jaren Lamprecht, Philip Lundrigan Brent Nelson and Brad Hutchings FCCM May 2, 2011

Design Cycle Debug Edit Compile 2

The Problem Reduced Productivity Higher Development Costs FPGA Compilation Times Severely Limit Turns Per Day FPGA Compilation Times Severely Limit Turns Per Day 3

Question Without regard for circuit quality… how fast can we make FPGA compilation? 4

Let’s emulate software compilation –Pre-compiled libraries In hardware, these are called “hard macros” –Rapidly link, or assemble hard macros to create designs Goals –Design to implementation in seconds –Find out: How fast can designs using hard macros be compiled for an FPGA? Approach 5

What is a Hard Macro? A pre-placed and pre- routed module Can be placed multiple places on the FPGA fabric Hard-macro based design –Skips Synthesis (XST) technology mapping (NGDBuild) Packing (MAP) –Design assembled from hard macros into final implementation bit Multiplier Hard Macros

How To Create a Hard Macro? Use Xilinx tools to create the macro’s circuitry Custom area constraints generated on-the-fly Provide sufficient area for cells (LUTs, DSPs, BRAMs) Placed and routed NCD extracted for hard macro creation NCD translated into XDL, converted to Hard Macro Hard macro ports are added Illegal primitives removed (TIEOFFs, IOBs, etc) GND and VCC nets removed 7 Regular Design Hard Macro Design

RapidSmith: Create Your Own CAD Tools Makes writing CAD tools easier –Uses XDL as input/output format –Targets real Xilinx FPGAs Over 200 APIs for manipulating XDL netlists Java-based and open source –Available at: 8

Approach: HMFlow 9.mdl HM Cache Generic HMG Design Parser & Mapper Design Stitcher XDL Hard Macro Placer XDL Router.xdl I NPUT D ESIGNS H ARD M ACRO S OURCES C OMPLETELY P LACED & R OUTED XDL X ILINX PAR E QUIVALENT Built on RapidSmith

What is System Generator? A Xilinx hardware library blockset for the Simulink environment in MATLAB 10 HMFlow supports over 75% of the System Generator block set

How Do You Run HMFlow? 11

One Design, Two Flows 12 Placed & Routed Design (XDL) Placed & Routed Design (NCD) XDL 2 NCD XDL 2 NCD Placed & Routed Design (NCD)

How does HMFlow work? 13 XDL Design Add Mux Addressable Shift Register Hard Macro Cache Addressable Shift Register Add Mux Hard Macro Cache Hard Macro Generator

Stitching the Design 14 XDL Design Add Mux Addressable Shift Register Design Stitcher Inserts IOBs and Clk circuitry Examines original design and creates network connections Design is then ready for placement and routing

Placement Xilinx PAR does not handle hard macro- based designs well –Developed fast heuristic for placement Everything is only placed once, rapid results Hard macros with BRAMs/DSPs are placed first Next, largest to smallest hard macros are placed Placement attempts to place each block next to its most highly connected neighbor –Example: places 757 blocks in 219ms –Developed interactive hand placer / visualizer tool 15

Hard Macro Debug Placer blocks placed in 219 ms

Routing PathFinder is just too slow for our goals (Remember: compilation speed at all costs) Maze Router –First come first served routing resource Very fast router –Uses ‘congestion avoidance’ instead of negotiation Analyzes design previous to routing Critical resources reserved to avoid conflicts Depends on chip not being fully utilized 17

V4 Slice Counts in Benchmark Designs SX35 LX200 SX55 18

Xilinx vs. HMFlow Runtimes X Speedup

Runtime Distribution of HMFlow 20

Runtime Distribution of HMFlow + XDL2NCD 21

Maximum Clock Rate of Designs X Slowdown

Conclusion RapidSmith –Java, open source XDL CAD tool framework –Required foundation for HMFlow HMFlow provides hard macro-based design –10-12X speedup over fastest Xilinx flow –Scalable to very large designs –Clock rate 2-4X decrease Still 10,000’s times faster than simulation XDL2NCD conversion time: outstanding issue Come see Demo Night 23

Related and Future Work Related Work: Hard Macros / XDL –Next talk: Automatic HDL-based generation of homogeneous hard macros for FPGAs –USC-ISI’s Torc: Tools for Open Reconfigurable Computing Open source project with similar goals to RapidSmith Our Future Work –Continue support of RapidSmith –HMFlow: Larger hard macros Maintain rapid compilation AND high clock rates LabVIEW FPGA designs 24

Backup Slides 25

Placement Object Reduction by Using Hard Macros X Object Reduction

Supported Blocks in HMFlow 27 Over 75% of blocks supported in most commonly used System Generator packages

Benchmark Runtimes 28 Design Name Simulink Parser BlockGenStitcherPlacerRouter XDL Export HMFlowXDL2NCD HMFlow Total Xilinx Runtime pd_control polyphaseFilter aliasingDDC dualDivider computeMetric fft filtersAndFFT frequencyEstimator dualFilter trellisDecoder filterFFTCM multibandCorrelator signalEstimator All times are recorded in seconds

Benchmark Attributes 29 Design NameSlicesBRAMsDSP48s Primitive Instances Hard Macro Instances Hard Macro Defs Nets Xilinx Clk Speed HMFlow Clk Speed HMFlow Speedup Time2NCD Speedup pd_control x15.85x polyphaseFilter x9.19x aliasingDDC x6.17x dualDivider x9.34x computeMetric x6.20x fft x6.94x filtersAndFFT x7.13x frequencyEstimator x2.89x dualFilter x2.60x trellisDecoder x4.69x filterFFTCM x1.65x multibandCorrelator x1.33x signalEstimator x1.51x

Benchmark Resource Usage as a Percentage of a Virtex4 LX200 30