1 Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization Andy Ye, David Lewis, Jonathan Rose Department of Electrical and Computer.

Presentation transcript:

1 Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization Andy Ye, David Lewis, Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto {yeandy, lewis,

2 Motivation: Datapath Regularity
Larger FPGAs
–Larger applications on FPGAs
–More datapath logic in larger applications
–Datapath logic is highly regular
Utilize regularity to improve logic density

3 Utilizing Datapath Regularity
A new datapath-oriented FPGA
New CAD tools supporting the new FPGA
–Synthesis
–Packing
–Placement
–Routing
This talk focuses on synthesis

4 Background: Datapath-oriented FPGA
Architected to utilize datapath regularity
Architectural features
–Capture regularity using special logic blocks
–Increase logic density by coarse grain routing

5 Background: FPGA Overview
[Figure: array of logic clusters (L) and switch boxes (S) joined by routing channels that carry coarse grain and fine grain routing tracks]

6 Background: Logic Cluster
[Figure: a logic cluster with Subcluster 1 through Subcluster 4; a subcluster, showing BLEs and a local routing network; a Basic Logic Element (BLE), with labels MUX, LUT, DFF, and M]

7 Background: FPGA Overview
[Figure: array of logic clusters (L) and switch boxes (S) joined by routing channels that carry coarse grain and fine grain routing tracks]

8 Background: Coarse Grain Routing Tracks
[Figure: a logic cluster with four subclusters next to a switch box; labels: Coarse Grain Routing, Fine Grain Routing, M]

9 Datapath Synthesis
Synthesis
–The first step in a fully automated CAD flow
–Transforms high level descriptions into logic
Conventional synthesis (flat synthesis)
–Minimizes area and delay metrics
–Destroys datapath regularity
Datapath synthesis
–Preserves datapath regularity
–Supports downstream CAD tools

10 Datapath Representation
Datapath circuits are represented by netlists of datapath components (VHDL or Verilog)
Datapath component library
–Multiplexers
–Adders/subtracters
–Shifters
–Comparators
–Registers
Each component consists of identical bit-slices
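A minimal sketch of the representation described on this slide, assuming a component can be modelled as a named module whose bit-slices all instantiate the same cell. The class names, fields, and the cell name `full_adder` are hypothetical and are not taken from the authors' tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitSlice:
    """One bit-slice, identified here only by the cell it instantiates."""
    cell: str  # hypothetical cell name, e.g. "full_adder"


@dataclass
class DatapathComponent:
    name: str        # instance name in the netlist
    kind: str        # library entry: "mux", "adder", "shifter", ...
    width: int       # number of bit-slices
    slice_cell: str  # cell shared by every bit-slice

    def bit_slices(self):
        """Expand the component into its per-bit slices."""
        return [BitSlice(self.slice_cell) for _ in range(self.width)]

    def is_regular(self):
        """True when all bit-slices are identical."""
        return len(set(self.bit_slices())) == 1


adder = DatapathComponent("add0", "adder", 8, "full_adder")
print(len(adder.bit_slices()), adder.is_regular())
```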

11 Hard Boundary Hierarchical Synthesis
Optimize within the boundaries of bit-slices
Keep identical bit-slices identical
Optimized 15 datapath circuits from the Pico-java processor using Synopsys [sun]
–Good regularity
–Bad area: 38% area inflation
FPGA architecture – increase logic density
–Need a better synthesis tool

12 Causes of Area Inflation
Examined circuits to determine the causes
Constraint of preserving bit-slice boundaries
–Common sub-expressions exist across bit-slices
–Harder to discover in datapath synthesis
Constraint of preserving datapath regularity
–Identical bit-slices have different external connections
–Some bit-slices have more optimization opportunities
–Optimization opportunities are missed if all bit-slices have to be kept identical

13 Enhanced Module Compaction
[Flow diagram; box labels in extraction order: Netlist of Datapath Components; Word-level Optimization; Module Compaction; Bit-slice Netlist I/O Optimization; Flat Synthesis & Optimization Within Bit-slice Boundaries; Netlist of Synthesized Bit-slices. Legend: Manual Operation]

14 Word-level Optimization
Currently done manually; will be automated
Optimizes across bit-slice boundaries
Uses the functionality of each datapath component to create optimization opportunities
Two optimizations are performed
–Multiplexer tree collapsing
–Operation reordering
More in the future

15 Multiplexer Tree Collapsing
Datapath circuits contain multiplexers in a tree topology
Collapses several multiplexers in a multiplexer tree into a single multiplexer
The collapsing operation creates common sub-expressions
Extracts common expressions out of multiple bit-slices to save area
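A small illustration of why collapsing can expose shared logic, assuming a tree of two 2:1 multiplexers with selects `s1` and `s2` (the names echo the example slide, but the exact circuit there is not recoverable from the transcript). In the collapsed form the select decoding depends only on `s1` and `s2`, so it can be computed once per word instead of once per bit-slice. All function names are hypothetical.

```python
from itertools import product


def mux2(sel, d0, d1):
    """2:1 multiplexer on single bits (0 or 1)."""
    return d1 if sel else d0


def mux_tree_bit(s1, s2, a, b, c):
    """Tree form: the first mux picks a or b, the second picks that result or c."""
    return mux2(s2, mux2(s1, a, b), c)


def decode_selects(s1, s2):
    """Select decoding for the collapsed mux.

    It depends only on s1 and s2, so it is the same expression in every
    bit-slice and can be extracted as a common sub-expression.
    """
    return ((1 - s1) & (1 - s2), s1 & (1 - s2), s2)


def collapsed_mux_word(s1, s2, a_bits, b_bits, c_bits):
    """Collapsed form: decode once per word, then one AND-OR stage per bit."""
    sel_a, sel_b, sel_c = decode_selects(s1, s2)
    return [(sel_a & a) | (sel_b & b) | (sel_c & c)
            for a, b, c in zip(a_bits, b_bits, c_bits)]


# Exhaustive equivalence check on single bits.
for s1, s2, a, b, c in product((0, 1), repeat=5):
    assert collapsed_mux_word(s1, s2, [a], [b], [c]) == [mux_tree_bit(s1, s2, a, b, c)]
```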

16 An Example
[Figure: multiplexer tree collapsing example; labels: mux1, mux2, S1, S2, FF, A, R, rl (rl – random logic)]

17 Operation Reordering
Transforms result selection into operand selection
Accepts the transformation if it results in smaller area
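A sketch of the transformation, assuming the case shown in the example slide: a multiplexer that selects between two sums is replaced by multiplexers that select the operands of a single adder. The 3-bit width, the function names, and the per-component area numbers in `accept_reordering` are illustrative assumptions, not figures from the talk.

```python
from itertools import product

WIDTH = 3                  # assumed word width for the exhaustive check
MASK = (1 << WIDTH) - 1


def result_selection(s, a, b, c, d):
    """Before: two adders compute a+b and c+d, one mux selects the result."""
    return ((c + d) if s else (a + b)) & MASK


def operand_selection(s, a, b, c, d):
    """After: two muxes select the operands, one adder computes the sum."""
    x = c if s else a
    y = d if s else b
    return (x + y) & MASK


def accept_reordering(adder_area, mux_area):
    """Accept only if the reordered circuit is smaller (areas are hypothetical)."""
    before = 2 * adder_area + mux_area
    after = adder_area + 2 * mux_area
    return after < before


# Both forms compute the same function for every input combination.
values = range(1 << WIDTH)
assert all(result_selection(s, a, b, c, d) == operand_selection(s, a, b, c, d)
           for s in (0, 1) for a, b, c, d in product(values, repeat=4))
```

Under this simple cost model the reordering pays off exactly when a multiplexer is cheaper than an adder of the same width.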

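18 An Example
[Figure: operation reordering example. Before: two adders with inputs a, b and c, d feed a multiplexer controlled by s that produces e. After: the operands are selected by s and a single adder produces e. Bit-slice 0 is drawn with signals a0, b0, c0, d0, s0, e0, carry signals cin0a, cin0b, cout0a, cout0b before reordering, and cin0, cout0 after]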

19 Module Compaction
Merges bit-slices into larger bit-slices
Based on connectivity between datapath components
Larger bit-slices have more optimization opportunities for flat synthesis
Avoids merging based on carry chains
Similar to the algorithm proposed by Koch
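A rough sketch of the merging idea only, not Koch's algorithm or the authors' implementation: bit-slices joined by ordinary data connections are grouped with a union-find structure, while carry-chain connections are ignored. The slice names follow the example slide (mux0 to mux3, FA0 to FA4), but the connectivity used below is an assumption.

```python
def compact(slices, connections):
    """Group bit-slices linked by non-carry connections.

    slices:      iterable of bit-slice names
    connections: iterable of (source, sink, kind), kind is "data" or "carry"
    Returns a sorted list of sorted groups; each group is one merged bit-slice.
    """
    parent = {s: s for s in slices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst, kind in connections:
        if kind == "carry":
            continue  # never merge along a carry chain
        parent[find(src)] = find(dst)

    groups = {}
    for s in slices:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Assumed connectivity: mux i drives full adder i, carries ripple FA i -> FA i+1.
slices = [f"mux{i}" for i in range(4)] + [f"FA{i}" for i in range(5)]
connections = [(f"mux{i}", f"FA{i}", "data") for i in range(4)]
connections += [(f"FA{i}", f"FA{i + 1}", "carry") for i in range(4)]
print(compact(slices, connections))
```

Skipping the carry edges keeps the merged slices one bit wide, so the four mux/adder pairs stay identical to each other.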

20 An Example
[Figure: module compaction example with bit-slices mux0 to mux3 and FA0 to FA4]

21 Bit-slice I/O Optimization
Granularity of bit-slice I/O optimization, m
Breaks datapath components into m-bit wide chunks
The m bit-slices in a chunk are kept identical to each other
Allows some bit-slices in a datapath component to be optimized more than others

22 Bit-slice I/O Optimization
Converts bit-slice I/O signals into internal signals if all m bit-slices meet an optimization criterion
More optimization opportunities for flat synthesis
Four types of I/O optimizations
–Constant absorption
–Feedback absorption
–Duplicated input absorption
–Unused output absorption
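A sketch of how the granularity m might gate one of the four optimizations, constant absorption. It assumes that an input can be absorbed in a chunk only when all m bit-slices of that chunk tie the input to the same constant; that criterion, the helper names, and the example port values are assumptions made for illustration.

```python
def chunks(width, m):
    """Split bit positions 0..width-1 into m-bit wide chunks."""
    return [range(i, min(i + m, width)) for i in range(0, width, m)]


def absorbable_constants(port_values, m):
    """Decide, per chunk, whether an input port can be absorbed as a constant.

    port_values[i] is 0 or 1 when bit-slice i has the input tied to a
    constant, or None when it is driven by a real signal.
    Returns one entry per chunk: the constant to absorb, or None.
    """
    result = []
    for chunk in chunks(len(port_values), m):
        values = {port_values[i] for i in chunk}
        if len(values) == 1 and None not in values:
            result.append(values.pop())  # all m slices agree on one constant
        else:
            result.append(None)          # keep the port so the slices stay identical
    return result


# Hypothetical 16-bit input port.
port = [0, 0, 0, 0, 0, 1, 0, 0, None, None, None, None, 1, 1, 1, 1]
print(absorbable_constants(port, 4))
print(absorbable_constants(port, 2))
```

With m = 4 only the first and last chunks qualify; with m = 2 more chunks qualify, which matches the trade-off between regularity and area discussed later in the talk.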

23 Experimental Results
Fifteen benchmark circuits
–From the Pico-java processor
–Synthesized into 4-LUTs and DFFs
Experiments
–Area
–Regularity
–Area against m (the granularity of bit-slice I/O optimization)

24 Area
m (granularity of bit-slice I/O optimization) = 4
Compare datapath synthesis with flat synthesis
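The comparison can be expressed as a relative area inflation. The sketch below assumes inflation is the extra LUT count of datapath synthesis relative to flat synthesis, and that the total is computed from summed areas; the LUT counts are made-up values, not results from the talk.

```python
def area_inflation(flat_luts, datapath_luts):
    """Relative area increase of datapath synthesis over flat synthesis."""
    return (datapath_luts - flat_luts) / flat_luts


def total_inflation(circuits):
    """Inflation over a benchmark set, computed from summed LUT counts.

    circuits: list of (flat_luts, datapath_luts) pairs
    """
    flat_total = sum(flat for flat, _ in circuits)
    datapath_total = sum(dp for _, dp in circuits)
    return area_inflation(flat_total, datapath_total)


# Hypothetical LUT counts for two circuits.
example = [(1000, 1050), (500, 540)]
print(f"{area_inflation(1000, 1050):.1%}")  # 5.0%
print(f"{total_inflation(example):.1%}")    # 6.0%
```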

25 Post-synthesis Area (LUT Count)
Columns: Circuit, Flat Synthesis Area, Datapath Synthesis Area, Inflation (%)
Rows: icu_dpath, ex_dpath, multmod_dp, ucode_dat, imdr_dpath, dcu_dpath, mantissa_dp, incmod_dp, smu_dpath, exponent_dp, pipe_dpath, prils_dp, rsadd_dp, code_seq_dp, ucode_reg, Total Area
[Table values were not captured in this transcript]

26 Regularity
m (granularity of bit-slice I/O optimization) = 4
Two terminal connections captured by
–4-bit wide buses
–4-bit wide control groups
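One way such a metric could be computed, under assumed definitions that the slides do not spell out: a 4-bit wide bus is taken to be m two-terminal connections between the same pair of components that link m distinct source bits to m distinct sink bits within one chunk, and a 4-bit wide control group is one source bit fanning out to all m bits of one chunk. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def regularity(connections, m=4):
    """Fraction of two-terminal connections in m-bit buses and control groups.

    connections: non-empty list of unique
                 (src_component, src_bit, dst_component, dst_bit) tuples
    Returns (bus_fraction, control_group_fraction).
    """
    bus_groups = defaultdict(list)
    ctrl_groups = defaultdict(list)
    for src, sbit, dst, dbit in connections:
        bus_groups[(src, sbit // m, dst, dbit // m)].append((sbit, dbit))
        ctrl_groups[(src, sbit, dst, dbit // m)].append(dbit)

    # A bus: m connections with m distinct sources and m distinct sinks.
    in_bus = sum(len(g) for g in bus_groups.values()
                 if len(g) == m
                 and len({s for s, _ in g}) == m
                 and len({d for _, d in g}) == m)
    # A control group: one source bit reaching m distinct sinks in a chunk.
    in_ctrl = sum(len(g) for g in ctrl_groups.values() if len(set(g)) == m)

    total = len(connections)
    return in_bus / total, in_ctrl / total


conns = [("mux", i, "adder", i) for i in range(4)]   # one 4-bit bus
conns += [("ctl", 0, "mux", i) for i in range(4)]    # one 4-bit control group
conns += [("a", 0, "b", 1), ("a", 5, "c", 0)]        # irregular connections
print(regularity(conns))
```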

27 Regularity
[Figure: a 4-bit wide bus and a 4-bit wide control group, each drawn across bit-slices S1 to S4]

28 Regularity Results
Circuit        Two Terminal Connections   4-bit Wide Buses   4-bit Wide Control Groups
dcu_dpath      2232                       49%                43%
ex_dpath       6547                       52%                39%
icu_dpath      8047                       47%                36%
imdr_dpath     3100                       50%                36%
pipe_dpath     1049                       48%                42%
smu_dpath      1167                       48%                25%
ucode_data     3143                       52%                41%
ucode_reg      194                        72%                21%
code_seq_dp    799                        58%                18%
exponent_dp    1362                       32%                23%
incmod_dp      2013                       42%                33%
mantissa_dp    2533                       47%                36%
multmod_dp     3380                       39%                25%
prils_dp       864                        41%                32%
rsadd_dp       722                        52%                27%
Total                                     %                  35%
94% of LUTs remain in regular datapath components

29 Granularity (m) Vs. Area
Higher m (the granularity of bit-slice I/O optimization)
–Keeps more bit-slices identical
–Preserves more regularity
–Higher area cost

30 Granularity Vs. Area Inflation

31 Conclusion
Presented a datapath-oriented FPGA architecture
Presented an enhanced module compaction algorithm
Empirically demonstrated the area efficiency of the algorithm
–3%-8% area inflation
Good regularity
–48% of two terminal connections are in 4-bit wide buses
–35% of two terminal connections are in 4-bit wide control groups