Floating-Point FPGA (FPFPGA)


1 Floating-Point FPGA (FPFPGA): Architecture and Modeling (a paper review)
Jason Luu, ECE, University of Toronto, Oct 27, 2009

2 Motivation
Goal: build faster, cheaper, lower-power FPGAs.
How? Fixed-functionality (hard) blocks!
FPGA reconfigurability comes at the price of area, delay, and power.
Some reconfigurability is unnecessary; remove it for savings.

3 What to Make Hard?
What hard blocks to use? If a block is not used, it is wasted.
Industry suggests including memories and multipliers.
The paper suggests adding floating-point units (FPUs).
Given a hard block, how fractured should it be?
E.g., Stratix III FPGA multipliers can be configured as a set of four 18x18 multipliers or as one 36x36 multiplier.
How fractured should the FPU be?

4 Introducing FPFPGA
Contains soft and hard blocks.
Soft blocks are composed of standard LUTs and FFs.
Hard blocks are FPUs called coarse-grained units (CGUs).
CGU characteristics:
Floating-point (FP) adds and multiplies only.
Bus-based LUT operations using “wordblocks”.
Dedicated output registers.
Accessible to soft blocks and vice versa.

5 Architecture of FPFPGA

6 FGU

7 CGU

8 CGU parameters
Number of each type of FP block; bus width; number of input buses; number of output buses; number of feedback paths.
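The parameter list above can be captured as a small record. This is only a sketch: the class and field names are hypothetical, the example values are the ones given on the Experimental Settings slide, the 32-bit bus width is an assumption based on the single-precision benchmarks, and treating the "3 feedback registers" as three feedback paths is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CGUParams:
    """Hypothetical container for the CGU parameters listed on the slide."""
    fp_multipliers: int   # number of FP multiplier blocks
    fp_adders: int        # number of FP adder blocks
    wordblocks: int       # number of bus-based LUT blocks
    bus_width: int        # bits per bus (assumed, not stated on the slide)
    input_buses: int
    output_buses: int
    feedback_paths: int

    def block_count(self) -> int:
        """Total number of blocks inside one CGU."""
        return self.fp_multipliers + self.fp_adders + self.wordblocks


# Values from the Experimental Settings slide; bus_width=32 is an assumption.
example = CGUParams(
    fp_multipliers=2,
    fp_adders=2,
    wordblocks=5,
    bus_width=32,
    input_buses=4,
    output_buses=3,
    feedback_paths=3,
)
print(example.block_count())
```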

9 Modeling Methodology
Need to measure how “good” FPFPGA is, using an empirical measurement method.
Flow: FPFPGA + benchmark circuit -> commercial CAD flow -> measure quality of results. Very nice!
Commercial tools are unaware of FPFPGA, so the authors introduce the “VEB” as a solution.

10 Virtual Embedded Block (VEB) Flow
Manually map the benchmark circuit onto CGUs and soft logic.
Put a VEB representing each CGU into the commercial CAD tool.
Compile.
Gather area and timing measurements.

11 VEB
Create a standard-cell ASIC implementation of the CGU and get its area/timing numbers.
Reproduce the area and timing of the ASIC CGU using the soft logic of a commercial FPGA (different functionality; similar silicon timing, area, and pin demand).
Assumes all internal paths == critical path, to simplify the timing of the soft-logic implementation.

12 VEB

13 VEB Details
Model delay with carry chains.
Model area with shift registers.
Use LUT inputs and outputs for pin demand.
Note: the area and delay models use independent resources.
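A rough sketch of how such a VEB might be sized is shown below. It is not the authors' procedure: the function name, the per-cell area, and the per-stage carry delay are all hypothetical inputs, and the numbers in the example are made up for illustration. It only reflects the idea on the slide that area and delay are modeled separately, each with its own resource.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VEBSize:
    area_cells: int    # logic cells used as shift registers (area model)
    carry_stages: int  # carry-chain stages (delay model)


def size_veb(asic_area: float, asic_delay: float,
             cell_area: float, carry_stage_delay: float) -> VEBSize:
    """Size a VEB so it occupies about the same area and has about the same
    delay as the ASIC CGU. Units are arbitrary but must be consistent."""
    if min(asic_area, asic_delay, cell_area, carry_stage_delay) <= 0:
        raise ValueError("all inputs must be positive")
    # Area and delay are modeled with independent resources, so each count
    # is computed on its own and rounded up.
    return VEBSize(
        area_cells=math.ceil(asic_area / cell_area),
        carry_stages=math.ceil(asic_delay / carry_stage_delay),
    )


# Illustrative numbers only; none of these values come from the paper.
veb = size_veb(asic_area=1000.0, asic_delay=5.0,
               cell_area=10.0, carry_stage_delay=0.25)
print(veb)
```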

14 VEB Placement Challenge
Hard block locations are fixed on an FPGA.
Commercial tools can’t enforce that for a VEB, since it is just a group of clustered soft logic constrained to be placed at particular relative distances from each other.
Solution: let the commercial tools place the VEBs anywhere, then manually move the VEBs to fixed locations.

15 VEB Quality
11% delay error when modeling an embedded multiplier (non-FP, so it can be compared with an existing multiplier).
Area is accurate (no number given).
Important repeatability hint: timing must be determined post-bitstream because of significant false paths (most CGUs do not use the longest path, and this is detected post-bitstream).

16 Benchmarks
32-bit single-precision floating point.
8 benchmarks: 5 core computation blocks, 1 application, 2 synthetic.

17 Experimental Settings
Xilinx Virtex 2: XC2V FF1152.
16 CGUs, each implemented as a VEB.
Each CGU takes up 122 logic cells.
Each CGU has 2 FP multipliers, 2 FP adders, and 5 wordblocks, in the order W M A W W M A W W.
4 input buses, 3 output buses, 3 feedback registers.
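As a quick consistency check on these settings, the block order string can be tallied and the total VEB footprint computed. This is a minimal sketch; it assumes W, M, and A stand for wordblock, FP multiplier, and FP adder respectively, and the helper name is hypothetical.

```python
from collections import Counter

ORDER = "W M A W W M A W W"  # block order within one CGU, from the slide
NAMES = {"W": "wordblock", "M": "fp_multiplier", "A": "fp_adder"}  # assumed meaning


def count_blocks(order: str) -> Counter:
    """Tally how many blocks of each type appear in the order string."""
    return Counter(NAMES[symbol] for symbol in order.split())


counts = count_blocks(ORDER)
total_veb_cells = 16 * 122  # 16 CGUs, 122 logic cells each

print(dict(counts))
print(total_veb_cells)
```

The tally gives 5 wordblocks, 2 FP multipliers, and 2 FP adders, matching the counts on the slide, and the 16 VEBs together occupy 1952 logic cells.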

18 Results
Average area reduced by 25x.
Average delay reduced by 3.6x for single precision and 4.3x for double precision.
Results are comparable to Kuon's FPGA vs. ASIC measurements.
The critical path of all circuits is in the FPU.

19 Reason for Good Results
Removed reconfiguration bits (area reduction).
Efficient directional routing.
Embedded FP operators.

20 Contributions
Exploration of FPGA architectures with embedded floating-point cores.
VEB methodology that leverages commercial tools to explore new embedded hard blocks, even when the commercial tools are unaware of those blocks.

21 Weaknesses
Significant amounts of speculation.
The authors try to claim scope for material that should be in future work.
Especially weak was the paper’s analysis of an FPFPGA compiler, which is outside the paper's scope and should be listed as such.

22 My 2 Cents
The primary advantage of FPFPGA vs. GPU in the floating-point, high-computation domain is low latency.
Several applications demand very low latency and very high computational power: plant monitoring of high-speed reactions; financial automatic buy-sell algorithms.
A secondary advantage is the energy consumed to perform the same computations.

23 My 2 Cents
The comparison is unfair.
Most FPGA designers would convert floating point to fixed point rather than leave it as floating point.
A double-precision FP add requires 701 slices; a fixed-point add takes 64 LUTs == 16 slices.
That the critical path is in the FPU suggests the benchmark circuits are unusually geared toward using FPU cores, and the authors admit this.
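The size of the gap behind this objection is easy to work out from the two figures on the slide. The sketch below only does that arithmetic; the constant names are hypothetical, and the 4 LUTs per slice is simply what the slide's "64 LUTs == 16 slices" implies.

```python
FP_DOUBLE_ADD_SLICES = 701  # double-precision FP add, from the slide
FIXED_ADD_LUTS = 64         # fixed-point add, from the slide
FIXED_ADD_SLICES = 16       # same fixed-point add, expressed in slices

# LUTs per slice implied by the slide's "64 LUTs == 16 slices".
luts_per_slice = FIXED_ADD_LUTS // FIXED_ADD_SLICES

# How many times more slices the soft FP add needs than the fixed-point add.
slice_ratio = FP_DOUBLE_ADD_SLICES / FIXED_ADD_SLICES

print(luts_per_slice)
print(round(slice_ratio, 1))
```

By these figures, a soft double-precision FP add costs roughly 44 times the slices of the fixed-point add, which is the overhead a designer avoids by converting to fixed point.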

