Floating-Point FPGA (FPFPGA)

Presentation transcript:

Floating-Point FPGA (FPFPGA) Architecture and Modeling (A paper review)
Jason Luu, ECE, University of Toronto
Oct 27, 2009

Motivation
- Goal: build faster, cheaper, lower-power FPGAs
- How? Fixed-functionality (hard) blocks!
- FPGA reconfigurability comes at the price of area, delay, and power
- Some reconfigurability is unnecessary; remove it for savings

What to Make Hard?
- Which hard blocks to use? If a block is not used, it is wasted
  - Industry suggests including memories and multipliers
  - Paper suggests adding floating-point units (FPUs)
- Given a hard block, how fractured should it be?
  - E.g., Stratix III FPGA multipliers can be configured as a set of four 18x18 multipliers or one 36x36 multiplier
  - How fractured should the FPU be?
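To make the idea of a "fractured" multiplier concrete, here is a minimal Python sketch of how one 36x36 product can be assembled from four 18x18 products using the schoolbook decomposition. It is only an arithmetic illustration: the names `mul18` and `mul36` are hypothetical, operands are assumed unsigned, and nothing here describes how the Stratix III DSP block is actually wired.

```python
MASK18 = (1 << 18) - 1  # low 18 bits


def mul18(a: int, b: int) -> int:
    """Model one unsigned 18x18 hard multiplier (hypothetical helper)."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b


def mul36(a: int, b: int) -> int:
    """Build an unsigned 36x36 product from four 18x18 products."""
    assert 0 <= a < (1 << 36) and 0 <= b < (1 << 36)
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    # a*b = (a_hi*b_hi << 36) + ((a_hi*b_lo + a_lo*b_hi) << 18) + a_lo*b_lo
    return (
        (mul18(a_hi, b_hi) << 36)
        + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
        + mul18(a_lo, b_lo)
    )
```

The same four multipliers can either serve four independent 18x18 products or be combined as above, which is the flexibility trade-off the slide asks about for the FPU.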

Introducing FPFPGA
- Contains soft and hard blocks
  - Soft blocks are composed of standard LUTs and FFs
  - Hard blocks are FPUs called coarse-grained units (CGUs)
- CGU characteristics:
  - Floating-point (FP) adds and multiplies only
  - Bus-based LUT operations using “wordblocks”
  - Dedicated output registers
  - Accessible to soft blocks and vice versa

Architecture of FPFPGA

FGU

CGU

CGU parameters
- Number of each type of FP block
- Bus width
- Number of input buses
- Number of output buses
- Number of feedback paths

Modeling Methodology
- Need to measure how “good” FPFPGA is
- Use an empirical measurement method
- [Flow diagram: FPFPGA + Benchmark Circuit -> Commercial CAD Flow -> Measure Quality of Results]
- Very nice!
- Commercial tools are unaware of FPFPGA, so the authors introduce the “VEB” as a solution

Virtual Embedded Block (VEB) Flow
- Manually map the benchmark circuit into CGUs and soft logic
- Put a VEB representing each CGU into the commercial CAD tool
- Compile
- Gather area and timing measurements

VEB
- Create a standard-cell ASIC CGU and get area/timing numbers
- Implement the area and timing of the ASIC CGU using the soft logic of a commercial FPGA (different functionality; similar silicon timing, area, and pin demand)
- Assumes all internal paths == critical path to simplify the timing of the soft-logic implementation

VEB

VEB Details
- Model delay with carry chains
- Model area with shift registers
- Use LUT inputs and outputs for pin demand
- Note: area and delay models use independent resources
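The sizing idea behind these bullets can be sketched in a few lines of Python: pick enough carry-chain stages to match the ASIC delay and enough shift-register cells to match the ASIC area, and compute the two independently. This is an illustrative assumption about how such a sizing could be done, not the paper's procedure; `size_veb`, `VebSize`, and all parameter names are hypothetical, and rounding up is a conservative choice made here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VebSize:
    carry_chain_stages: int    # soft-logic resources that stand in for delay
    shift_register_cells: int  # soft-logic resources that stand in for area


def size_veb(asic_delay_ns: float, asic_area_um2: float,
             stage_delay_ns: float, cell_area_um2: float) -> VebSize:
    """Size a VEB so its soft logic mimics an ASIC block's delay and area.

    Delay and area are modeled with independent resources, so each count
    is derived on its own. All inputs are assumed to be measured elsewhere.
    """
    if min(asic_delay_ns, asic_area_um2, stage_delay_ns, cell_area_um2) <= 0:
        raise ValueError("all inputs must be positive")
    stages = math.ceil(asic_delay_ns / stage_delay_ns)
    cells = math.ceil(asic_area_um2 / cell_area_um2)
    return VebSize(stages, cells)
```

For example, with made-up placeholder numbers, `size_veb(10.0, 1000.0, 0.5, 100.0)` gives 20 carry-chain stages and 10 shift-register cells.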

VEB Placement Challenge
- Hard block locations are fixed on an FPGA
- Commercial tools can’t do that for a VEB, since it is just a group of clustered soft logic constrained to be placed at particular relative distances from each other
- Solution:
  - Let the commercial tools place the VEBs anywhere
  - Then manually move the VEBs to fixed locations

VEB Quality
- 11% delay error when modeling an embedded multiplier (non-FP, to compare with an existing multiplier)
- Area is accurate (no number given)
- Important repeatability hint: timing must be determined post-bitstream because of significant false paths (most CGUs do not use the longest path, and this is detected post-bitstream)

Benchmarks
- 32-bit single-precision floating point
- 8 benchmarks
  - 5 core computation blocks
  - 1 application
  - 2 synthetic

Experimental Settings
- Xilinx Virtex 2: XC2V3000-6-FF1152
- 16 CGUs, each implemented as a VEB
- Each CGU takes up 122 logic cells
- 2 FP multipliers, 2 FP adders, 5 wordblocks
  - In the order: W M A W W M A W W
- 4 input buses
- 3 output buses
- 3 feedback registers
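As a compact way to read these settings against the CGU parameter list, the configuration can be written down as a small Python record. This is a hypothetical sketch: `CguConfig` and its field names are invented for illustration, and the bus width is left unset because the settings slide does not state it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CguConfig:
    block_order: Tuple[str, ...]  # 'W' = wordblock, 'M' = FP multiplier, 'A' = FP adder
    input_buses: int
    output_buses: int
    feedback_registers: int
    bus_width: Optional[int] = None  # not given on the settings slide

    def block_counts(self) -> Counter:
        """Count how many blocks of each type the CGU contains."""
        return Counter(self.block_order)


# Values taken from the Experimental Settings slide.
EXPERIMENT_CGU = CguConfig(
    block_order=tuple("W M A W W M A W W".split()),
    input_buses=4,
    output_buses=3,
    feedback_registers=3,
)
```

Calling `EXPERIMENT_CGU.block_counts()` recovers the slide's block mix: 5 wordblocks, 2 multipliers, and 2 adders.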

Results
- Average area reduced by 25x
- Average delay reduced by
  - 3.6x for single precision
  - 4.3x for double precision
- Results are comparable to Kuon's FPGA vs. ASIC measurements
- Critical path of all circuits is in the FPU

Reason for Good Results
- Removed reconfiguration bits (area reduction)
- Efficient directional routing
- Embedded FP operators

Contributions
- Exploration of FPGA architectures with embedded floating-point cores
- VEB methodology that leverages commercial tools to explore new embedded hard blocks, even when the commercial tools are unaware of those blocks

Weaknesses
- Significant amounts of speculation
- Tries to claim scope for material that should be in future work
- Especially weak is the paper’s analysis of an FPFPGA compiler, which is outside its scope and should be listed as such

My 2 Cents
- Primary advantage of FPFPGA vs. GPU in the floating-point, high-computation domain is low latency
- Several applications demand very low latency and very high computational power
  - Plant monitoring of high-speed reactions
  - Financial automatic buy-sell algorithms
- Secondary advantage is the energy consumed to perform the same computations

My 2 Cents
- Comparison unfair
  - Most FPGA designers would convert floating point to fixed point rather than leave it as floating point
  - Double-precision FP add requires 701 slices
  - Fixed-point add: 64 LUTs == 16 slices
- That the critical path is in the FPU suggests the benchmark circuits are unusually geared toward FPU cores; the authors admit this
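To illustrate why the fixed-point alternative is so much cheaper, here is a minimal Python sketch of a floating-to-fixed conversion. Once values are scaled integers, an add is a plain integer add with no exponent alignment or normalization, which is why it maps to a short chain of LUTs. By the slide's own numbers (701 slices vs. 16 slices), the floating-point adder is roughly 44 times larger. The Q-format with 16 fractional bits and the function names are assumptions chosen for the example; overflow and saturation are not modeled.

```python
FRAC_BITS = 16  # assumed number of fractional bits for this illustration


def to_fixed(x: float) -> int:
    """Quantize a real value to a scaled integer."""
    return int(round(x * (1 << FRAC_BITS)))


def from_fixed(q: int) -> float:
    """Convert a scaled integer back to a real value."""
    return q / (1 << FRAC_BITS)


def fixed_add(a: int, b: int) -> int:
    """Fixed-point add: just an integer add on the scaled values."""
    return a + b


# Slice counts quoted on the slide.
FP_ADD_SLICES = 701
FIXED_ADD_SLICES = 16
AREA_RATIO = FP_ADD_SLICES / FIXED_ADD_SLICES
```

The cost of this approach is that the designer must choose the word length and scaling by hand, and values outside the chosen range are not representable.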