CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.

Slides:



Advertisements
Similar presentations
FPGA (Field Programmable Gate Array)
Advertisements

ECE 506 Reconfigurable Computing Lecture 6 Clustering Ali Akoglu.
Architecture Design Methodology. 2 The effects of architecture design on metrics:  Area (cost)  Performance  Power Target market:  A set of application.
Chapter 10. Memory, CPLDs, and FPGAs
Lecture 2: Field Programmable Gate Arrays I September 5, 2013 ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I.
Lecture 26: Reconfigurable Computing May 11, 2004 ECE 669 Parallel Computer Architecture Reconfigurable Computing.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Lecture 3: Field Programmable Gate Arrays II September 10, 2013 ECE 636 Reconfigurable Computing Lecture 3 Field Programmable Gate Arrays II.
Build-In Self-Test of FPGA Interconnect Delay Faults Laboratory for Reliable Computing (LaRC) Electrical Engineering Department National Tsing Hua University.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
CS294-6 Reconfigurable Computing Day 2 August 27, 1998 FPGA Introduction.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 33: Array Subsystems (PLAs/FPGAs) Prof. Sherief Reda Division of Engineering,
CS294-6 Reconfigurable Computing Day 14 October 7/8, 1998 Computing with Lookup Tables.
1. 2 FPGAs Historically, FPGA architectures and companies began around the same time as CPLDs FPGAs are closer to “programmable ASICs” -- large emphasis.
CS 151 Digital Systems Design Lecture 38 Programmable Logic.
GPGPU platforms GP - General Purpose computation using GPU
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #3 – FPGA.
Figure to-1 Multiplexer and Switch Analog
EGRE 427 Advanced Digital Design Figures from Application-Specific Integrated Circuits, Michael John Sebastian Smith, Addison Wesley, 1997 Chapter 7 Programmable.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
FPGA and CADs Presented by Peng Du & Xiaojun Bao.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n Circuit design for FPGAs: –Logic elements. –Interconnect.
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
ECE 465 Introduction to CPLDs and FPGAs Shantanu Dutt ECE Dept. University of Illinois at Chicago Acknowledgement: Extracted from lecture notes of Dr.
CBSSS 2002: DeHon Costs André DeHon Wednesday, June 19, 2002.
J. Christiansen, CERN - EP/MIC
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Programmable Logic Devices
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
Introduction to FPGAs Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
EE3A1 Computer Hardware and Digital Design
1 Leakage Power Analysis of a 90nm FPGA Authors: Tim Tuan (Xilinx), Bocheng Lai (UCLA) Presenter: Sang-Kyo Han (ECE, University of Maryland) Published.
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
ESS | FPGA for Dummies | | Maurizio Donna FPGA for Dummies Basic FPGA architecture.
Henry Selvaraj 1 Henry Selvaraj; Henry Selvaraj; Henry Selvaraj; Logic Synthesis EEG 707 Dr Henry Selvaraj Department of Electrical and Computer Engineering.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #12 – Systolic.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #13 – Other.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 31, 2003 Compute 2:
FPGA Logic Cluster Design Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #23 – Function.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
CBSSS 2002: DeHon Universal Programming Organization André DeHon Tuesday, June 18, 2002.
ECE 506 Reconfigurable Computing Lecture 5 Logic Block Architecture Ali Akoglu.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
A Survey of Fault Tolerant Methodologies for FPGA’s Gökhan Kabukcu
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #22 – Multi-Context.
Field Programmable Gate Arrays
Sequential Programmable Devices
ESE534: Computer Organization
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
Give qualifications of instructors: DAP
XILINX FPGAs Xilinx lunched first commercial FPGA XC2000 in 1985
ECE 4110– 5110 Digital System Design
CS184a: Computer Architecture (Structure and Organization)
From Silicon to Microelectronics Yahya Lakys EE & CE 200 Fall 2014
CprE / ComS 583 Reconfigurable Computing
We will be studying the architecture of XC3000.
CprE / ComS 583 Reconfigurable Computing
XC4000E Series Xilinx XC4000 Series Architecture 8/98
Topics Circuit design for FPGAs: Logic elements. Interconnect.
ESE534: Computer Organization
CprE / ComS 583 Reconfigurable Computing
Give qualifications of instructors: DAP
CprE / ComS 583 Reconfigurable Computing
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA Technology Mapping

Lect-04.2CprE 583 – Reconfigurable ComputingAugust 30, 2007 Quick Points Lectures are viewable for students via WebCT Quality is higher Use discussion forums Class list created: Less focus on interconnect theory More on interconnects in actual devices Read [AggLew94], [ChaWon96A], [Deh96A] for more details

Lect-04.3CprE 583 – Reconfigurable ComputingAugust 30, 2007 Recap Various FPGA programming technologies (Anti-fuse, (E)EPROM, Flash, SRAM): SRAM most popular

Lect-04.4CprE 583 – Reconfigurable ComputingAugust 30, 2007 LUTs and Digital Logic k inputs  2 k possible input values k-LUT corresponds to 2 k x 1 bit memory Truth table is stored 2 2 k possible functions – O(2 2 k / k!) unique F = A0A1A2 + Ā0A1Ā2 + Ā0 Ā1 Ā A 0 A 1 A

Lect-04.5CprE 583 – Reconfigurable ComputingAugust 30, 2007 Outline Recap General Routing Architectures FPGA Architectural Issues Early Commercial FPGAs Xilinx XC3000 Xilinx XC4000 Technology Mapping using LUTs

Lect-04.6CprE 583 – Reconfigurable ComputingAugust 30, 2007 General Routing Architecture A wire segment is a wire unbroken by programmable switches A track is a sequence of one or more wire segments in a line A routing channel is a group of parallel tracks A connection block provides connectivity from the inputs and outputs of a logic block to the wire segments in the channels A switch block is a block which provides connectivity between the horizontal and vertical wire segments on all four of its sides

Lect-04.7CprE 583 – Reconfigurable ComputingAugust 30, 2007 Switch Boxes F s – connections offered per incoming wire Universal switchbox can connect any set of inputs to their target output channels simultaneously Build-able with F s = 3 Xilinx XC4000 switchbox is F s = 3 but not universal Read [ChaWon96A] for more details

Lect-04.8CprE 583 – Reconfigurable ComputingAugust 30, 2007 Architectural Issues [AhmRos04A] What values of N, I, and K minimize the following parameters? Area Delay Area-delay product Assumptions All routing wires length 4 Fully populated IMUX Wiring is half pass transistor, half tri-state

Lect-04.9CprE 583 – Reconfigurable ComputingAugust 30, 2007 Number of Inputs per Cluster Lots of opportunities for input sharing in large clusters [BetRos97A] Reducing inputs reduces the size of the device and makes it faster Most FPGA devices (Xilinx) have 4 BLE per cluster with more inputs than actually needed

Lect-04.10CprE 583 – Reconfigurable ComputingAugust 30, 2007 Logic Cluster Size Small block cluster more efficient Includes area needed for routing Smallest clusters (e.g. one BLE per cluster) not “CAD friendly” Most commercial devices have 4-8 BLEs per cluster

Lect-04.11CprE 583 – Reconfigurable ComputingAugust 30, 2007 Effect of N and K on Area Cluster size of N = [6-8] is good, K = [4-5]

Lect-04.12CprE 583 – Reconfigurable ComputingAugust 30, 2007 Effect of N and K on Performance Inconclusive: Big K and N > 3 value looks good

Lect-04.13CprE 583 – Reconfigurable ComputingAugust 30, 2007 Effect of N and K on Area-Delay K = 4-6, N= 4-10 looks OK

Lect-04.14CprE 583 – Reconfigurable ComputingAugust 30, 2007 Putting it All Together Area: LUT count decreases with k (slower than exponential) LUT size increases with k (exponential logic area, ~linear interconnect area) Delay: LUT depth decreases with k (logarithmic) LUT delay increases with k (linear) Examples: Xilinx XC3000 family F s = 3 I = 5 N = 2 Xilinx XC4000 family F s = 3 I = 9 N ~ 2.5

Lect-04.15CprE 583 – Reconfigurable ComputingAugust 30, 2007 XC3000 Logic Block 5-LUT, or two 4-LUTs

Lect-04.16CprE 583 – Reconfigurable ComputingAugust 30, 2007 XC4000 Logic Block

Lect-04.17CprE 583 – Reconfigurable ComputingAugust 30, 2007 XC4000 Routing Structure

Lect-04.18CprE 583 – Reconfigurable ComputingAugust 30, 2007 XC4000 Routing Structure (cont.)

Lect-04.19CprE 583 – Reconfigurable ComputingAugust 30, 2007 LUT Computational Limits k-LUT can implement 2 2 k functions Given n such k-LUTs, can implement (2 2 k ) n Since 4-LUTs are efficient, want to find n such that (2 2 4 ) n >= 2 2 M Example – implementing a 7-LUT with 4-LUTs: 4-LUT A 0 –A 3 A4A4 A5A5 A6A6 A6A6

Lect-04.20CprE 583 – Reconfigurable ComputingAugust 30, 2007 LUT Computational Limits (cont.) How much computation can be performed in a table lookup? Upper bound (from previous) – n <= 2 M-3 Need n 4-LUTs to cover a M-LUT: (2 2 4 ) n >= 2 2 M nlog(2 2 4 ) >= log(2 2 M ) n2 4 log(2) >= 2 M log(2) n2 4 >= 2 M n >= 2 M-4 Adding upper bound – 2 M-4 <= n <= 2 M-3

Lect-04.21CprE 583 – Reconfigurable ComputingAugust 30, 2007 LUTs Versus Memories Can also implement (2 2 k ) w as a single large memory with k inputs and w outputs Large memory advantage – no need for interconnect and only one input decoder required Consider a 32K x 8bit memory (170M λ 2, 21ns latency) w = 8 k = 16 (or 2 8-bit inputs to address 2 16 locations) Can implement an 8-bit addition or subtraction Xilinx XC3042 – LUTs (180M λ 2, 13ns CLB delay) 15-bit parity calculation: 5 4-LUTs (<2% of XC4032) – 3.125M λ 2 ) Entire SRAM – 170M λ 2 7-bit addition: 14 4-LUTs (<5% of XC4032) – 8.75M λ 2 ) Entire SRAM – 170M λ 2

Lect-04.22CprE 583 – Reconfigurable ComputingAugust 30, 2007 LUT Technology Mapping Task: map netlist to LUTs, minimizing area and/or delay Similar to technology mapping for traditional designs Library approach not feasible – O(2 2 k / k!) elements in library In general it is NP-hard

Lect-04.23CprE 583 – Reconfigurable ComputingAugust 30, 2007 Area vs. Delay Mapping

Lect-04.24CprE 583 – Reconfigurable ComputingAugust 30, 2007 Decomposition

Lect-04.25CprE 583 – Reconfigurable ComputingAugust 30, 2007 Why Replicate?

Lect-04.26CprE 583 – Reconfigurable ComputingAugust 30, 2007 Reconvergence

Lect-04.27CprE 583 – Reconfigurable ComputingAugust 30, 2007 Dynamic Programming

Lect-04.28CprE 583 – Reconfigurable ComputingAugust 30, 2007 Summary FPGA design issues involve number of logic blocks per cluster, number of inputs per logic block, routing architecture, and k-LUT size Can build M-LUT with n k-LUTs where 2 M-3 <= n <= 2 M-4 Large LUTs generally inefficient Technology mapping is simplified because of 4- LUT properties Techniques – decomposition, replication, reconvergence, dynamic programming Area- or delay-optimal mapping still NP hard