Literature Review: Measuring the Gap Between FPGAs and ASICs. Ian Kuon, Jonathan Rose, University of Toronto. IEEE TCAD/ICAS.


Literature Review
Measuring the Gap Between FPGAs and ASICs
Ian Kuon, Jonathan Rose, University of Toronto
IEEE TCAD/ICAS, February 2007
Henry Chen, February 26, 2010

Introduction
• Trade-offs between FPGAs and standard-cell ASICs
  – Decreased NRE, design time
  – Increased silicon area, power; decreased performance
• FPGA inefficiencies known and accepted, but largely unquantified

Previous Comparisons
• Jones et al. (1986): MPGAs to standard cells
  – 1.5–2.6x area, ~1.1x delay
  – Estimates based on only 5 circuits
• Brown et al. (1992): FPGAs to MPGAs
  – 8–12x area, ~3x delay
  – Optimistic FPGA gate counting?
  – Anecdotal evidence
  – Doesn't consider "hard" macros (multipliers, memories)
• Combine for FPGAs to standard cells
  – 12–38x area, ~3.4x delay
  – Dated; based on (questionable?) extractions

Previous Comparisons (2000s)
• Zuchowski et al. (2002): LUT to ASIC gate (0.25 μm to 90 nm)
  – ~1/45 gate density, 12–14x delay, ~500x dynamic power
  – Unexplained process-dependent density/power variation
  – Dependent on gates implemented per LUT
• Wilton et al. (2005): Partial programmable replacement
  – 88x area, 2x delay
  – Single logic module
• Compton & Hauck (2007): FPGA apps. to standard-cell
  – Avg. 7.2x area
  – Scaled FPGA 0.15 μm to 0.18 μm standard-cell

Methodology
• Implement in both FPGA and standard-cell
  – Altera Stratix II FPGA: TSMC 90nm, multi-Vt, 1.2V
  – Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V
• Empirical results from 23 benchmarks
  – Rejected if different synthesis tools resulted in >5% register count deviation
  – Mix of logic, memory, DSP
• Analyze gains from FPGA's DSP and memory blocks
• Exclude I/Os
• Have device data from Altera
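The benchmark rejection rule above can be sketched as a small filter. This is only an illustration: the slide does not say how the deviation is normalized, so dividing by the ASIC register count is an assumption, and the function name and sample counts are hypothetical.

```python
def keep_benchmark(fpga_regs, asic_regs, max_deviation=0.05):
    """Return True if the two synthesis flows agree on register count.

    Assumption: deviation is measured relative to the ASIC flow's count;
    the slides only state the 5% threshold, not the normalization.
    """
    if asic_regs <= 0:
        raise ValueError("asic_regs must be positive")
    deviation = abs(fpga_regs - asic_regs) / asic_regs
    return deviation <= max_deviation


# Hypothetical register counts, for illustration only.
print(keep_benchmark(1000, 1040))  # within 5%, benchmark is kept
print(keep_benchmark(1000, 900))   # more than 5% apart, benchmark is rejected
```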

Implementations
• FPGA
  – Altera-provided CAD flow
  – Speed/area balanced optimization; optimize critical-path performance, otherwise optimize area
  – Automatic DSP, memory block inference
  – Set to mimic effects of high resource utilization
• ASIC
  – Synopsys/Cadence synthesis/PAR flow
  – Free to choose from high/standard-Vt cells
  – Timing-driven placement; target 75–85% utilization
  – Emphasized performance in compiled memories

Area Comparison
• ASIC
  – Post-PAR core area
  – Include memory macros
• FPGA
  – Count only silicon area for used resources
  – Include surrounding routing resources
  – Count full block area even if only partially used
  – Area data from Altera
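The FPGA accounting rule above (count only used resources, but count a touched block in full) can be sketched as follows. The block areas and utilization values are hypothetical placeholders, not Altera data, and the per-block area is assumed to already include the surrounding routing.

```python
def fpga_used_area(blocks):
    """Sum the area of every block that is used at all.

    `blocks` is a list of (block_area, utilization) pairs, where
    utilization is the fraction of the block the design occupies.
    A partially used block still contributes its full area; an
    untouched block contributes nothing.
    """
    return sum(area for area, utilization in blocks if utilization > 0)


def area_gap(fpga_area, asic_area):
    """Area ratio of the FPGA implementation to the ASIC core."""
    return fpga_area / asic_area


# Hypothetical blocks: a quarter-used block, an unused block, a full block.
blocks = [(1.0, 0.25), (1.0, 0.0), (4.0, 1.0)]
print(fpga_used_area(blocks))
```

Counting partially used blocks in full is what makes the FPGA estimate pessimistic.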

Area Comparison Results
• Logic only: 35x avg. (17–54x)
• Logic + DSP: 25x avg. (12–58x)
• Logic + Memory: 33x avg. (19–70x)
• Logic + Memory + DSP: 18x avg. (9.5–26x)

Impact of Hard Macros on Area
• Smaller area penalty for designs using hard macros
  – Hard macro close to ASIC implementation (plus programmable interface & routing)

Area Comparison Caveats
• Pessimistic FPGA area estimation; count full resource area even if only partially used (~5–10% reduction)
• ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs

Delay Comparison
• Altera Quartus II / Synopsys PrimeTime SI
• Static timing analysis to extract max. clock frequency
• Compare for different FPGA speed grades
  – FPGAs are binned for performance
  – ASICs tend to be designed for worst-case
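Since both flows report a maximum clock frequency, the delay gap follows from the ratio of those frequencies (critical-path delay is the reciprocal of the maximum frequency). A minimal sketch, with hypothetical frequencies chosen only to illustrate the calculation:

```python
def delay_gap(fmax_asic_mhz, fmax_fpga_mhz):
    """Ratio of FPGA critical-path delay to ASIC critical-path delay.

    Delay is 1 / fmax, so the FPGA/ASIC delay ratio equals the
    ASIC/FPGA frequency ratio.
    """
    if fmax_asic_mhz <= 0 or fmax_fpga_mhz <= 0:
        raise ValueError("frequencies must be positive")
    return fmax_asic_mhz / fmax_fpga_mhz


# Hypothetical timing results, not taken from the paper.
print(f"{delay_gap(340.0, 100.0):.1f}x")
```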

Delay Comparison Results (Fastest Speed Grade)
• Logic only: 3.4x avg. (1.9–5.0x)
• Logic + DSP: 3.5x avg. (2.4–4.7x)
• Logic + Memory: 3.5x avg. (2.8–4.3x)
• Logic + Memory + DSP: 3.0x avg. (2.6–3.5x)

Delay Comparison Results (Slowest Speed Grade)
• Logic only: 4.6x avg. (2.5–6.7x)
• Logic + DSP: 4.6x avg. (3.0–6.3x)
• Logic + Memory: 4.8x avg. (3.8–5.7x)
• Logic + Memory + DSP: 4.1x avg. (3.8–4.7x)

Impact of Hard Macros on Delay
• Almost no benefit; sometimes a penalty!
  – Fixed positions in FPGA; extra routing to use
  – Fixed architecture; some apps. may not use efficiently

Power Comparison
• Altera Quartus II Power Analyzer / Synopsys PrimePower
• Compare power, not energy consumption
  – FPGAs slower; need more time or parallelism
  – Implement for highest speed possible
  – Simulate at same operating frequency, voltage
• Measure only core power
• Assume constant toggle rates for all nets in design
  – Meaningful test vectors not available for all designs
• FPGA static power consumption scaled by used fraction
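The last bullet, scaling FPGA static power by the used fraction, might look like the sketch below. Treating the "used fraction" as used core area over total core area is an assumption here, and all values are hypothetical.

```python
def scaled_static_power(device_static_w, used_area, total_area):
    """Attribute a share of whole-device static power to the design.

    Assumption: the share is the fraction of core area the design uses.
    """
    if total_area <= 0 or not 0 <= used_area <= total_area:
        raise ValueError("need 0 <= used_area <= total_area and total_area > 0")
    return device_static_w * used_area / total_area


# Hypothetical device: 2.0 W static power, design occupies 25% of the core.
print(scaled_static_power(2.0, 25.0, 100.0))
```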

Power Comparison Results
• Logic only: 14x avg. (5.7–52x)
• Logic + DSP: 12x avg. (7.5–16x)
• Logic + Memory: 14x avg. (12–16x)
• Logic + Memory + DSP: 7.1x avg. (5.3–8.3x)

Impact of Hard Macros on Power
• Slight benefit, primarily from area savings?
  – Less area and interconnect

Power Consumption Caveats
• May be disproportionate power in FPGA clock network
  – "Overdesigned" for tested circuits
  – Could have small incremental power increase
• ASIC clock network would have to grow with designs

Static Power Comparison
• Unable to draw useful conclusions about static power
  – 87x for typical silicon, typical temp. (25°C)
  – 5.4x for worst-case silicon, worst-case temp. (85°C)
• Had to scale worst-case silicon temp. characterization
• Subthreshold leakage is process-dependent
  – Little information on leakage estimate factors
  – Different processes from different foundries
• Some correlation between static power and area gap (correlation coefficient ~0.8)
  – Hard macros likely reduced static power penalty

Conclusions
• Disparity hard to quantify; very application dependent
  – Avg. gap gap 3x; gap gap range 1.3–9.1x
• All-LUT designs avg. 35x area, 3.4–4.6x delay, 14x power
  – 119x area, 47.6x power gap for equal performance (assuming ideal parallelization)
• Hard macros reduce area and power, but have little performance benefit
  – Avg. 18x area, 3–4.1x delay, 7.1x power
  – 54x area, 21.3x power for equal performance
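The equal-performance figures above are consistent with multiplying the area and power gaps by the fastest-speed-grade delay gap, on the stated assumption of ideal parallelization (the FPGA design is replicated enough times to make up the speed deficit). A short check of that arithmetic, using only the averages quoted on the slides; the function name is illustrative:

```python
def equal_performance_gap(gap, delay_gap):
    """Scale an area or power gap by the delay gap.

    Assumes ideal parallelization: matching ASIC throughput takes
    `delay_gap` copies of the FPGA logic, so area and power both
    grow by that factor.
    """
    return gap * delay_gap


# Average gaps from the slides (fastest speed grade for delay).
all_lut = {"area": 35.0, "delay": 3.4, "power": 14.0}
with_hard_macros = {"area": 18.0, "delay": 3.0, "power": 7.1}

for name, g in (("All-LUT", all_lut), ("Hard macros", with_hard_macros)):
    area = equal_performance_gap(g["area"], g["delay"])
    power = equal_performance_gap(g["power"], g["delay"])
    print(f"{name}: {area:.0f}x area, {power:.1f}x power")
```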

References
• Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., "A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System," Proc. IEEE CICC, 1986, pp. 228–232
• Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992
• Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., "A Hybrid ASIC and FPGA Architecture," Proc. ICCAD, Nov. 2002, pp. 187–194
• Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken'Ova, V., Saleh, R., "Design Considerations for Soft Embedded Programmable Logic Cores," IEEE JSSC, vol. 40, no. 2, pp. 485–497, Feb. 2005
• Compton, K., Hauck, S., "Automatic Design of Area-Efficient Configurable ASIC Cores," IEEE Trans. Comp., vol. 56, no. 5, pp. 662–672, May 2007