Reconfigurable Computing. Lect-02.2 Course Schedule Introduction to Reconfigurable Computing FPGA Technology, Architectures, and Applications FPGA Design.

Slides:



Advertisements
Similar presentations
FPGA (Field Programmable Gate Array)
Advertisements

TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O Borgatti, M. Lertora, F. Foret, B. Cali, L.
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
CPT 310 Logic and Computer Design Instructor: David LublinerPhone Engineering Technology Dept.Cell
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR SRAM-based FPGA n SRAM-based LE –Registers in logic elements –LUT-based logic element.
EECE579: Digital Design Flows
Introduction to Reconfigurable Computing CS61c sp06 Lecture (5/5/06) Hayden So.
Lecture 26: Reconfigurable Computing May 11, 2004 ECE 669 Parallel Computer Architecture Reconfigurable Computing.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.
Configurable System-on-Chip: Xilinx EDK
Evolution of implementation technologies
Programmable logic and FPGA
1/31/20081 Logic devices can be classified into two broad categories Fixed Programmable Programmable Logic Device Introduction Lecture Notes – Lab 2.
February 4, 2002 John Wawrzynek
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
CS 151 Digital Systems Design Lecture 38 Programmable Logic.
Introduction to FPGA’s FPGA (Field Programmable Gate Array) –ASIC chips provide the highest performance, but can only perform the function they were designed.
Field Programmable Gate Array (FPGA) Layout An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip.
Presenter MaxAcademy Lecture Series – V1.0, September 2011 Introduction and Motivation.
General FPGA Architecture Field Programmable Gate Array.
Dr. Konstantinos Tatas ACOE201 – Computer Architecture I – Laboratory Exercises Background and Introduction.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
1 Miodrag Bolic ARCHITECTURES FOR EFFICIENT IMPLEMENTATION OF PARTICLE FILTERS Department of Electrical and Computer Engineering Stony Brook University.
Highest Performance Programmable DSP Solution September 17, 2015.
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
Open Discussion of Design Flow Today’s task: Design an ASIC that will drive a TV cell phone Exercise objective: Importance of codesign.
Automated Design of Custom Architecture Tulika Mitra
CPLD (Complex Programmable Logic Device)
Research on Reconfigurable Computing Using Impulse C Carmen Li Shen Mentor: Dr. Russell Duren February 1, 2008.
1 Moore’s Law in Microprocessors Pentium® proc P Year Transistors.
J. Christiansen, CERN - EP/MIC
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Programmable Logic Devices
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs Mark Hammerquist, Roman Lysecky Department of Electrical and Computer.
Lecture #3 Page 1 ECE 4110–5110 Digital System Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.HW#2 assigned Due.
EE3A1 Computer Hardware and Digital Design
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
COARSE GRAINED RECONFIGURABLE ARCHITECTURES 04/18/2014 Aditi Sharma Dhiraj Chaudhary Pruthvi Gowda Rachana Raj Sunku DAY
Development of Programmable Architecture for Base-Band Processing S. Leung, A. Postula, Univ. of Queensland, Australia A. Hemani, Royal Institute of Tech.,
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
ESS | FPGA for Dummies | | Maurizio Donna FPGA for Dummies Basic FPGA architecture.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
Survey of multicore architectures Marko Bertogna Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Introduction to Field Programmable Gate Arrays (FPGAs) EDL Spring 2016 Johns Hopkins University Electrical and Computer Engineering March 2, 2016.
Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 28, 2005 Empirical Comparisons.
Introduction to the FPGA and Labs
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
Instructor: Dr. Phillip Jones
CprE / ComS 583 Reconfigurable Computing
Electronics for Physicists
CprE / ComS 583 Reconfigurable Computing
Lecture 41: Introduction to Reconfigurable Computing
COMS 361 Computer Organization
Electronics for Physicists
(Lecture by Hasan Hassan)
Programmable logic and FPGA
Presentation transcript:

Reconfigurable Computing

Lect-02.2 Course Schedule Introduction to Reconfigurable Computing FPGA Technology, Architectures, and Applications FPGA Design (theory / practice) Hardware computing models Design tools and methodologies HW/SW codesign Other Reconfigurable Architectures and Platforms Emerging Technologies Dynamic / run-time reconfiguration High-level FPGA synthesis Novel architectures

Lect-02.3 Recap Reconfigurable Computing: (1) systems incorporating some form of hardware programmability – customizing how the hardware is used using a number of physical control points [Compton, 2002] (2) computing via a post-fabrication and spatially programmed connection of processing elements [Wawrzynek, 2004] (3) general-purpose custom hardware [Goldstein, 1998]

Lect-02.4 Spatial Mapping A hardware resource (multiplier or adder) is allocated for each operator in the compute graph The compute graph is transformed directly into the implementation template x + hw grade = 0.25*homework *midterm *project 0.25 x mt0.25 x pr0.50 grade + hwmt x pr0.50 x grade

Lect-02.5 Temporal Mapping A hardware resource (ALU) is time-multiplexed to implement the actions of the operators in the compute graph Sequential / general purpose / software solution + hwmt x pr0.50 x grade ALU controller reg1 = [hw] + [mt]; reg1 = 0.25 x reg1; reg2 = 0.50 x [pr]; grade = reg1 + reg2; hwmt pr reg1 reg2

Lect-02.6 Coupling in a Reconfigurable System Standalone Processing Unit I/O Interface Attached Processing Unit Workstation Memory Caches Coprocessor CPU FU Some advantages of each? Some disadvantages?

Lect-02.7 Generic FPGA Architecture CLB IOB IOBc IOB Input/Output Buffers (IOBs) Configurable Logic Blocks (CLBs) Programmable interconnect mesh Island-style FPGA architecture FPGA = Field-Programmable Gate Array

Lect-02.8 LUT-based Logic Element carry logic 4-LUT DFF I1I2I3I4 Cout OUT Each LUT operates on four one-bit inputs Output is one data bit Can perform any Boolean function of four inputs = functions (4096 patterns) The basic logic element can be more complex Coarse v. Fine grained Contains some sort of programmable interconnect

Lect-02.9 Sample FPGA Design Flow (Xilinx) Design Entry Synthesis Implementation Device Programming Functional Simulation Timing Simulation HDL files, schematics EDIF/XNF netlist NGD Xilinx primitives file FPGA bitstream

Lect FPGAs are REconfigurable… …but good luck getting it to work! Commercial tool support is nonexistent Not ready for prime-time Uses for reconfiguration Product life extension Tolerance for manufacturing faults Runtime reconfiguration – time multiplexing specialized circuits to make more efficient use of existing resources Dynamic reconfiguration – hardware sharing (multiplexing) on the fly DPGA v. FPGA Active area of research

Lect Outline Recap Design Exercise: The Quadratic Equation Terms and Definitions Measuring Computing Density Quantitatively Comparing Computing Machines

Lect Design Exercise Consider the function: y = Ax 2 + Bx + C In groups of 2, design an architecture for this function Building blocks – adders, multipliers, muxes Don’t worry too much about control or timing Best circuit design wins a prize

Lect B Various Possible Designs + x x x x + B AC y + x y x B Ax C + A x x BC + y xCA x|+x|+ y Design C Design D Which one is the best? Design BDesign A

Lect Comparing Different Designs Design A Requires 3 multiply and 2 add area units Requires 2 multiply and 1 add time units Design B Requires 2 multiply and 2 add area units Requires 2 multiply and 2 add time units Design C Requires 1 multiply, 1 add, and 2 2:1 mux area units Requires 2 multiply and 2 add time units Design D Requires 1 compound add/multiply unit, 1 3:1 mux, and 1 2:1 mux area units Requires 2 multiply and 2 add time units

Lect Terms and Definitions 1. Computation – calculating predictable data outputs from data inputs Fine space and finite time Variables in computation: time, area, power, security, etc. 2. Process technology – a particular method used to make silicon chips Related to the size of transistors used 3. Feature size – the dimension of the smallest feature actually constructed in the manufacturing process Smallest line or gap that appears in the design Often refers to the length of the silicon channel between the source and drain terminals in Field Effect Transistors (FET) © Computer Desktop Encyclopedia

Lect Computational Density (Qualitative) FPGAs can complete more work per unit time than a processor or DSP: Less instruction overhead More active computation onto the same silicon area (allows for more parallelism) Can control operations at the bit level (as opposed to word level) Intel Pentium 4 Actel ProASIC

Lect Measuring Feature Size Current FPGAs follow a similar technology curve as microprocessors Difficult to compare device sizes across generations so we use a fixed metric, lambda (λ) to represent feature size 8λ 5λ Spacing metal 3 8λ 3λ overlap metal 2+3

Lect Towards Computational Comparison Can look at the peak computations that can be delivered per cycle and normalize to the implementation area and cycle time Feature size λ, minimum unit area λ 2

Lect Computational Density (Quantitative) Alpha processor (1994): Built using 0.35-micron process Two 64-bit ALUs 433 MHz Theoretical computational throughput – 2 x 64 / 2.3ns = 55.7 bit operations / ns Xilinx XC4085XL-09 FPGA (1992): Same 0.35-micron process 3,136 CLBs | 6,272 4-LUTs 217 MHz (peak clock rate) Theoretical computational throughput – 3136 / 4.6ns = 682 bit operations / ns Comments: clunky comparison, very out of date

Lect Computational Density (Quantitative) Intel Pentium 4 “Prescott” processor (2004): Built using 90-nm process 2 simple double-speed ALUs, 1 complex single-speed ALU = approx bit ALUs 3.2 GHz Theoretical computational throughput – 5 x 32 /.3125 ns = 512 bit operations / ns Xilinx XC4VLX200 FPGA (2004): Same 90-nm process 22,272 CLBs | 178,176 4-LUTs 500 MHz (peak clock rate) Theoretical computational throughput – 89,088 / 2.0ns = 44,544 bit operations / ns Too good to be true?

Lect Notes XC4V200 is 87 times faster than Pentium 4? Only simple integer arithmetic Division, sqrt, etc. Microprocessors have dedicated FP logic How efficiently are resources used? Ex: if only 8-bit operations being used, FPGA is an additional 4x more computationally dense than 32-bit CPU Challenges making FPGAs run consistently at their peak rate What about cost?

Lect Storing Instructions Each FPGA bit operator requires an area of 500K–1M λ 2 Static RAM cells: 1,200λ 2 per bit Storing a single 32-bit instruction: 40,000λ 2 25 instructions in space of 1M λ 2 FPGA bit op FPGA 32-bit op unit = 800 instructions CPU must also store data Conclusion: once more than 400 instructions/data words are stored on the CPU then the FPGA becomes more area efficient Prescott P4 has more than 1MB L2 cache alone

Lect Functional Unit Optimization A hardwired functional unit can be made several orders of magnitude faster than a programmable logic version Ex: 16x16 multiplier Balances the density equation Can be too generalized, or not used frequently enough Now included in high-end FPGAs

Lect Summary FPGAs – spatial computation CPU – temporal computation FPGAs are by their nature more computationally “dense” than CPU In terms of number of computations / time / area Can be quantitatively measured and compared Capacity, cost, ease of programming still important issues Numerous challenges to reconfiguration