Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs Andy Ye, Jonathan Rose, David Lewis Department of Electrical and Computer.

Presentation transcript:

1 Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs Andy Ye, Jonathan Rose, David Lewis Department of Electrical and Computer Engineering University of Toronto {yeandy, jayar,

2 Outline Motivation –Datapath regularity A datapath-oriented FPGA –Architecture –CAD flow Experimental results –Area efficiency Conclusion

3 Modern FPGAs Very large logic capacities –Over 10 million equivalent logic gates Increasingly used to implement large and complex applications –Central processing units –Graphics accelerators –Digital signal processors –Packet switching networks

4 Datapath Circuits Large applications –Contain a greater amount of datapath circuits Datapath circuits –Consist of multiple identical logic structures called bit-slices Regularity Predictability

5 An Example [Figure: four identical full-adder bit-slices chained by their carries; signals A0-A3, B0-B3, C0-C3, Carry In, Carry Out]
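The regularity in this example can be made concrete with a small Python sketch: one full-adder function is the bit-slice, and the datapath is that same slice repeated four times with the carry chained through. It assumes that the C0-C3 signals on the slide are the sum outputs; the function names are illustrative and do not come from the presentation.

```python
def full_adder(a, b, carry_in):
    """One bit-slice: add three single bits, return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(a_bits, b_bits, carry_in=0):
    """Chain identical bit-slices; index 0 is the least significant bit."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        # Every iteration uses the same logic structure: that is the regularity.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


# 6 + 7 with bits listed least significant first.
c_bits, carry_out = ripple_add([0, 1, 1, 0], [1, 1, 1, 0])
```

Because every slice is identical, a tool that knows the structure of one slice can predict the structure of the others.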

6 An Example

7 Research Goal Design a new FPGA architecture –Utilize datapath regularity Reduce the implementation area of datapath circuits on FPGAs Implement a full set of CAD tools for the new architecture –Synthesis –Packing –Placement –Routing

8 Key Architectural Features A bus-oriented logic block architecture A mixture of coarse-grain routing tracks and fine-grain routing tracks

9 Datapath FPGA Overview [Figure: array of logic blocks (L) and switch blocks (S) joined by routing channels that contain coarse-grain routing tracks and fine-grain routing tracks]

10 Logic Block: Super-cluster [Figure: a super-cluster made of Cluster 1 to Cluster 4; each cluster contains BLEs and a local routing network; a Basic Logic Element (BLE) contains a LUT, a DFF, and a MUX, with a block labeled M]
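A minimal Python sketch of a basic logic element may help when reading the figure. It assumes a 4-input LUT (the LUT size shown on the later synthesis slides) and assumes that the MUX selects between the LUT output and the DFF output; the class and attribute names are hypothetical.

```python
class BLE:
    """Sketch of a basic logic element: a 4-LUT, a DFF, and an output MUX."""

    def __init__(self, truth_table, registered=False):
        # truth_table[i] is the LUT output when the four inputs, read as a
        # binary number with input 0 as the least significant bit, equal i.
        if len(truth_table) != 16:
            raise ValueError("a 4-LUT needs 16 truth-table entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # MUX setting: True selects the DFF
        self.dff = 0

    def lut(self, inputs):
        if len(inputs) != 4:
            raise ValueError("expected four input bits")
        index = sum(bit << k for k, bit in enumerate(inputs))
        return self.truth_table[index]

    def output(self, inputs):
        # The MUX picks the registered or the combinational value.
        return self.dff if self.registered else self.lut(inputs)

    def clock(self, inputs):
        # On a clock edge the DFF captures the current LUT output.
        self.dff = self.lut(inputs)


# A LUT programmed as XOR of inputs 0 and 1 (inputs 2 and 3 are ignored).
xor_table = [(i & 1) ^ ((i >> 1) & 1) for i in range(16)]
comb = BLE(xor_table)
reg = BLE(xor_table, registered=True)
```

A cluster would then be a small list of such elements plus a local routing network, and a super-cluster a list of clusters.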

11 Datapath FPGA Overview [Figure: the same array as on slide 9, with each logic block (L) identified as a super-cluster; switch blocks (S) and routing channels carry coarse-grain routing tracks and fine-grain routing tracks]

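12 Coarse-grain Routing Tracks [Figure: a super-cluster and its clusters connected through a switch block; the coarse-grain routing and the fine-grain routing are each drawn with blocks labeled M, with more M blocks on the fine-grain routing]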

13 CAD Flow CAD flow for the datapath-oriented FPGA consists of –Synthesis –Packing –Placement –Routing Conventional CAD flow –Minimizes area and delay metrics –Destroys datapath regularity

14 Datapath-oriented CAD Flow Preserve datapath regularity (bit-sliced structures) Map the preserved regularity onto the datapath-oriented FPGA architecture Maximize the utilization of coarse-grain routing tracks –Minimize the implementation area of datapath structures

15 Datapath Representation Datapath circuits are represented by netlists of datapath components (VHDL or Verilog) Datapath component library –Multiplexers –Adders/subtracters –Shifters –Comparators –Registers Each component consists of identical bit-slices

16 Synthesis Enhanced module compaction algorithm Based on the Synopsys FPGA compiler Augmented with several datapath-oriented features –Preserve datapath regularity by preserving bit-slice boundaries –Achieve area results as good as those of conventional synthesis tools

17 An Example Datapath Circuit [Figure: 4-bit datapath of four bit-slices; each slice i contains a multiplexer (mux), controlled by the shared signal sel, and an adder (+), with signals a_i, b_i, c_i, d_i, s_i; the carry runs from c_in to c_out]

18 Synthesis [Figure: bit-slice 0 (mux and adder; signals a0, b0, c0, d0, s0, sel, c_in) mapped onto two 4-LUTs, one labeled with a0, b0, c0, sel and one labeled with d0, s0, c_in]

19 Synthesis [Figure: all four bit-slices mapped the same way, two 4-LUTs per bit-slice, with sel shared and the carry running from c_in to c_out]

20 Packing Based on the T-VPack packing algorithm Pack adjacent bit-slices into super-clusters Utilize carry connections in super-clusters to minimize the delay of carry chains

21 An Example Four clusters per super-cluster Two BLEs per cluster Six inputs per cluster
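The packing idea can be sketched in Python using the example parameters above (four clusters per super-cluster, two BLEs per cluster). This is a simplified greedy illustration, not the T-VPack-based packer from the presentation: it puts each bit-slice in its own cluster, keeps adjacent bit-slices together, and ignores the cluster input limit and the carry connections. All names are hypothetical.

```python
from dataclasses import dataclass, field

CLUSTERS_PER_SUPER_CLUSTER = 4
BLES_PER_CLUSTER = 2


@dataclass
class Cluster:
    bit_index: int
    bles: list


@dataclass
class SuperCluster:
    clusters: list = field(default_factory=list)


def pack_bit_slices(bit_slices):
    """Pack bit-slices in order; bit_slices[i] lists the BLEs of bit i."""
    super_clusters = []
    for bit_index, bles in enumerate(bit_slices):
        if len(bles) > BLES_PER_CLUSTER:
            raise ValueError("bit-slice does not fit in one cluster")
        # Open a new super-cluster when there is none or the last one is full.
        if (not super_clusters
                or len(super_clusters[-1].clusters) == CLUSTERS_PER_SUPER_CLUSTER):
            super_clusters.append(SuperCluster())
        super_clusters[-1].clusters.append(Cluster(bit_index, list(bles)))
    return super_clusters


# Four bit-slices of two LUTs each fill exactly one super-cluster.
slices = [[f"lut_a{i}", f"lut_b{i}"] for i in range(4)]
packed = pack_bit_slices(slices)
```

Keeping bit i in cluster position i of each super-cluster is what lets a bus leave the super-cluster on one group of coarse-grain tracks.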

22 Packing Into Clusters [Figure: the two 4-LUTs of bit-slice 0 (signals a0, b0, c0, sel, d0, s0, c_in) packed as two BLEs]

23 Packing Into Super-clusters [Figure: the BLEs of bit-slices 0 to 3 packed into one super-cluster; signals a0-a3, b0-b3, c0-c3, d0-d3, s0-s3, sel, c_in, c_out]

24 Placement Based on the VPR placer Uses the simulated annealing algorithm For super-clusters containing datapath circuits –Move super-clusters only For super-clusters containing non-datapath circuits –Move individual clusters
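The two move rules can be sketched in Python as follows. The sketch lists which units an annealer may move and pairs that with the standard Metropolis acceptance test used in simulated annealing. The data layout (a dict with a "datapath" flag) and the function names are assumptions for illustration; the actual placer is based on VPR and is not shown in the slides.

```python
import math
import random


def movable_units(super_clusters):
    """List the units the annealer may move.

    A datapath super-cluster moves only as a whole, so its bit-slices stay
    aligned; a non-datapath super-cluster exposes each cluster separately.
    """
    units = []
    for sc_index, sc in enumerate(super_clusters):
        if sc["datapath"]:
            units.append(("super_cluster", sc_index))
        else:
            for cluster_index in range(len(sc["clusters"])):
                units.append(("cluster", sc_index, cluster_index))
    return units


def accept(delta_cost, temperature, rng):
    """Metropolis rule: take improvements, sometimes take worse moves.

    Assumes temperature > 0.
    """
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


design = [
    {"datapath": True, "clusters": ["add0", "add1", "add2", "add3"]},
    {"datapath": False, "clusters": ["ctl0", "ctl1"]},
]
units = movable_units(design)
rng = random.Random(0)
```

An annealing loop would pick a random entry from `units`, propose a swap, evaluate the change in cost, and call `accept` with a gradually decreasing temperature.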

25 Routing Based on the VPR router Uses the PathFinder algorithm As much as possible –Route buses through coarse-grain routing tracks –Route individual signals through fine-grain routing tracks When necessary –Use coarse-grain routing tracks for individual signals –Use fine-grain routing tracks for buses
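The preference order on this slide can be sketched in Python as a simple resource chooser. It is not the PathFinder negotiated-congestion algorithm; it only shows the "as much as possible" and "when necessary" fallback. Two things are assumed here and not stated in the slides: a bus is four bits wide and fits on one group of coarse-grain tracks, and an individual signal placed on coarse-grain tracks occupies a whole group.

```python
def choose_tracks(net_is_bus, free_coarse_groups, free_fine_tracks, bus_width=4):
    """Return (track_type, units_used) for one net, or None if it cannot fit.

    free_coarse_groups counts groups of coarse-grain tracks (one bus each);
    free_fine_tracks counts individual fine-grain tracks.
    """
    if net_is_bus:
        # Preferred: the whole bus on one coarse-grain group.
        if free_coarse_groups >= 1:
            return ("coarse", 1)
        # Fallback: one fine-grain track per bit of the bus.
        if free_fine_tracks >= bus_width:
            return ("fine", bus_width)
        return None
    # Preferred for an individual signal: one fine-grain track.
    if free_fine_tracks >= 1:
        return ("fine", 1)
    # Fallback: occupy a coarse-grain group with a single signal.
    if free_coarse_groups >= 1:
        return ("coarse", 1)
    return None
```

For example, `choose_tracks(True, 0, 10)` falls back to four fine-grain tracks, while `choose_tracks(False, 2, 0)` places a single signal on a coarse-grain group.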

26 Area Efficiency Benchmarks –15 datapath circuits from the picoJava processor Architectural assumptions –Four BLEs per cluster –Four clusters per super-cluster –Four coarse-grain tracks sharing configuration memory –Logic track length of two –Disjoint switch block topology Architectural variables –Number of coarse-grain tracks
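A rough Python sketch shows why sharing configuration memory among four coarse-grain tracks can save area. The cost model is an assumption made for illustration only: every track has the same number of programmable switches, each switch on a fine-grain track needs one configuration bit, and a group of four coarse-grain tracks needs one bit per switch position for the whole group. The numbers are invented and are not the presentation's measurements.

```python
def routing_config_bits(num_tracks, coarse_fraction, switches_per_track, share=4):
    """Count routing configuration bits under the simplified model above."""
    coarse_tracks = round(num_tracks * coarse_fraction)
    if coarse_tracks % share:
        raise ValueError("coarse-grain tracks must form whole groups")
    fine_tracks = num_tracks - coarse_tracks
    fine_bits = fine_tracks * switches_per_track
    # One set of bits controls all the tracks in a coarse-grain group.
    coarse_bits = (coarse_tracks // share) * switches_per_track
    return fine_bits + coarse_bits


all_fine = routing_config_bits(40, 0.0, 10)   # 40 * 10 bits
half_coarse = routing_config_bits(40, 0.5, 10)  # 20 * 10 + 5 * 10 bits
```

This model only captures the memory saving. It does not capture the cost of routing individual signals on coarse-grain tracks, which is why the total circuit area has to be measured experimentally over the number of coarse-grain tracks.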

27 Area Efficiency [Plot: circuit area in minimum transistor area (x10^6) and normalized circuit area (axis ticks include 90.0% and 95.0%) versus % of coarse-grain tracks, binned in steps of 10% from 0% to 60%-70%]

28 Logic Track Length Vs. Area Architectural assumptions –Four clusters per super-cluster –Four coarse-grain tracks share configuration memory –50% of tracks are coarse-grain tracks –Disjoint switch block topology Architectural variables –Number of BLEs per cluster –Logic track length

29 Logic Track Length Vs. Area [Plot: circuit area in minimum transistor area (x10^6) versus logic track length, with one curve each for N = 2, N = 4, N = 8, and N = 10]

30 Conclusion Proposed a datapath-oriented FPGA architecture and its CAD tools Best area is achieved with –40% to 50% of tracks being coarse-grain routing tracks –Four BLEs per cluster –Logic track length of two Best area is 9.6% smaller than that of conventional FPGAs