Download presentation
Presentation is loading. Please wait.
Published byThomasine Hall Modified over 9 years ago
1
HMFlow: Accelerating FPGA Compilation with Hard Macros for Rapid Prototyping Christopher Lavin, Marc Padilla, Jaren Lamprecht, Philip Lundrigan Brent Nelson and Brad Hutchings FCCM May 2, 2011
2
Design Cycle Debug Edit Compile 2
3
The Problem Reduced Productivity Higher Development Costs FPGA Compilation Times Severely Limit Turns Per Day FPGA Compilation Times Severely Limit Turns Per Day 3
4
Question Without regard for circuit quality… how fast can we make FPGA compilation? 4
5
Let’s emulate software compilation –Pre-compiled libraries In hardware, these are called “hard macros” –Rapidly link, or assemble hard macros to create designs Goals –Design to implementation in seconds –Find out: How fast can designs using hard macros be compiled for an FPGA? Approach 5
6
What is a Hard Macro? A pre-placed and pre- routed module Can be placed multiple places on the FPGA fabric Hard-macro based design –Skips Synthesis (XST) technology mapping (NGDBuild) Packing (MAP) –Design assembled from hard macros into final implementation 6 15 20-bit Multiplier Hard Macros
7
How To Create a Hard Macro? Use Xilinx tools to create the macro’s circuitry Custom area constraints generated on-the-fly Provide sufficient area for cells (LUTs, DSPs, BRAMs) Placed and routed NCD extracted for hard macro creation NCD translated into XDL, converted to Hard Macro Hard macro ports are added Illegal primitives removed (TIEOFFs, IOBs, etc) GND and VCC nets removed 7 Regular Design Hard Macro Design
8
RapidSmith: Create Your Own CAD Tools Makes writing CAD tools easier –Uses XDL as input/output format –Targets real Xilinx FPGAs Over 200 APIs for manipulating XDL netlists Java-based and open source –Available at: http://rapidsmith.sourceforge.nethttp://rapidsmith.sourceforge.net 8
9
Approach: HMFlow 9.mdl HM Cache Generic HMG Design Parser & Mapper Design Stitcher XDL Hard Macro Placer XDL Router.xdl I NPUT D ESIGNS H ARD M ACRO S OURCES C OMPLETELY P LACED & R OUTED XDL X ILINX PAR E QUIVALENT Built on RapidSmith
10
What is System Generator? A Xilinx hardware library blockset for the Simulink environment in MATLAB 10 HMFlow supports over 75% of the System Generator block set
11
How Do You Run HMFlow? 11
12
One Design, Two Flows 12 Placed & Routed Design (XDL) Placed & Routed Design (NCD) XDL 2 NCD XDL 2 NCD Placed & Routed Design (NCD)
13
How does HMFlow work? 13 XDL Design Add Mux Addressable Shift Register Hard Macro Cache Addressable Shift Register Add Mux Hard Macro Cache Hard Macro Generator
14
Stitching the Design 14 XDL Design Add Mux Addressable Shift Register Design Stitcher Inserts IOBs and Clk circuitry Examines original design and creates network connections Design is then ready for placement and routing
15
Placement Xilinx PAR does not handle hard macro- based designs well –Developed fast heuristic for placement Everything is only placed once, rapid results Hard macros with BRAMs/DSPs are placed first Next, largest to smallest hard macros are placed Placement attempts to place each block next to its most highly connected neighbor –Example: places 757 blocks in 219ms –Developed interactive hand placer / visualizer tool 15
16
Hard Macro Debug Placer 16 757 blocks placed in 219 ms
17
Routing PathFinder is just too slow for our goals (Remember: compilation speed at all costs) Maze Router –First come first served routing resource Very fast router –Uses ‘congestion avoidance’ instead of negotiation Analyzes design previous to routing Critical resources reserved to avoid conflicts Depends on chip not being fully utilized 17
18
V4 Slice Counts in Benchmark Designs SX35 LX200 SX55 18
19
Xilinx vs. HMFlow Runtimes 19 10-12X Speedup
20
Runtime Distribution of HMFlow 20
21
Runtime Distribution of HMFlow + XDL2NCD 21
22
Maximum Clock Rate of Designs 22 2-4X Slowdown
23
Conclusion RapidSmith –Java, open source XDL CAD tool framework –Required foundation for HMFlow HMFlow provides hard macro-based design –10-12X speedup over fastest Xilinx flow –Scalable to very large designs –Clock rate 2-4X decrease Still 10,000’s times faster than simulation XDL2NCD conversion time: outstanding issue Come see RapidSmith/HMFlow @ Demo Night 23
24
Related and Future Work Related Work: Hard Macros / XDL –Next talk: Automatic HDL-based generation of homogeneous hard macros for FPGAs –USC-ISI’s Torc: Tools for Open Reconfigurable Computing Open source project with similar goals to RapidSmith Our Future Work –Continue support of RapidSmith –HMFlow: Larger hard macros Maintain rapid compilation AND high clock rates LabVIEW FPGA designs 24
25
Backup Slides 25
26
Placement Object Reduction by Using Hard Macros 26 10-20X Object Reduction
27
Supported Blocks in HMFlow 27 Over 75% of blocks supported in most commonly used System Generator packages
28
Benchmark Runtimes 28 Design Name Simulink Parser BlockGenStitcherPlacerRouter XDL Export HMFlowXDL2NCD HMFlow Total Xilinx Runtime pd_control0.090.740.190.020.220.061.312.84.1 65.6 polyphaseFilter0.090.750.220.021.410.112.594.06.6 60.3 aliasingDDC0.110.770.220.021.450.132.697.410.1 62.2 dualDivider0.310.890.200.052.410.224.086.310.3 96.6 computeMetric0.280.890.640.056.360.618.8317.125.9 160.8 fft10240.240.940.300.054.950.386.8410.317.2 119.3 filtersAndFFT0.330.980.800.1912.30.7515.420.335.6 254.0 frequencyEstimator0.441.500.580.2218.11.1722.0107.3129.3 373.5 dualFilter0.471.311.200.4434.71.6639.8140.4180.1 469.0 trellisDecoder0.661.721.420.5554.02.5060.9115.1176.0 824.6 filterFFTCM0.521.941.640.9869.93.0578.1541.2619.3 1021 multibandCorrelator0.831.801.841.8673.35.7885.4506.7592.1 786.2 signalEstimator0.842.332.161.53107.515.4129.8869.2999.0 1509 All times are recorded in seconds
29
Benchmark Attributes 29 Design NameSlicesBRAMsDSP48s Primitive Instances Hard Macro Instances Hard Macro Defs Nets Xilinx Clk Speed HMFlow Clk Speed HMFlow Speedup Time2NCD Speedup pd_control15010200211236814712950.00x15.85x polyphaseFilter680847777930163827510823.24x9.19x aliasingDDC806138767825162819110723.14x6.17x dualDivider18320619515423940041417923.69x9.34x computeMetric2551564027993326474471435718.21x6.20x fft1024255381226563134858892157417.43x6.94x filtersAndFFT52032531532558892115901917416.54x7.13x frequencyEstimator698831727152757249169191676016.97x2.89x dualFilter1117333261128390193259611834611.80x2.60x trellisDecoder16973615317269132819642195823513.55x4.69x filterFFTCM18883811219126920149490371483713.08x1.65x multibandCorrelator1973252231990114729047993140349.21x1.33x signalEstimator2384112647240911448390607271043411.62x1.51x
30
Benchmark Resource Usage as a Percentage of a Virtex4 LX200 30
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.