Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed.

Slides:



Advertisements
Similar presentations
Field Programmable Gate Array
Advertisements

Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 10: IC Technology.
FPGA (Field Programmable Gate Array)
TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
Lecture 7 FPGA technology. 2 Implementation Platform Comparison.
1/1/ /e/e eindhoven university of technology Microprocessor Design Course 5Z008 Dr.ir. A.C. (Ad) Verschueren Eindhoven University of Technology Section.
A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O Borgatti, M. Lertora, F. Foret, B. Cali, L.
Survey of Reconfigurable Logic Technologies
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
Lecture 9: Coarse Grained FPGA Architecture October 6, 2004 ECE 697F Reconfigurable Computing Lecture 9 Coarse Grained FPGA Architecture.
Graduate Computer Architecture I Lecture 15: Intro to Reconfigurable Devices.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR SRAM-based FPGA n SRAM-based LE –Registers in logic elements –LUT-based logic element.
PLD Technology Basics. Basic PAL Architecture DQ Q CLK OE Fuse.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Reconfigurable Computing: What, Why, and Implications for Design Automation André DeHon and John Wawrzynek June 23, 1999 BRASS Project University of California.
CS294-6 Reconfigurable Computing Day 5 September 8, 1998 Comparing Computing Devices.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Lecture 2: Field Programmable Gate Arrays I September 5, 2013 ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I.
Lecture 26: Reconfigurable Computing May 11, 2004 ECE 669 Parallel Computer Architecture Reconfigurable Computing.
CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
Evolution of implementation technologies
Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
ECE 331 – Digital System Design Tristate Buffers, Read-Only Memories and Programmable Logic Devices (Lecture #16) The slides included herein were taken.
CS 151 Digital Systems Design Lecture 38 Programmable Logic.
Chapter 6 Memory and Programmable Logic Devices
General FPGA Architecture Field Programmable Gate Array.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Reconfigurable Computing. Lect-02.2 Course Schedule Introduction to Reconfigurable Computing FPGA Technology, Architectures, and Applications FPGA Design.
A comprehensive method for the evaluation of the sensitivity to SEUs of FPGA-based applications A comprehensive method for the evaluation of the sensitivity.
1 Chapter 1: Introduction.  Embedded systems overview  What are they?  Design challenge – optimizing design metrics  Technologies  Processor technologies.
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
ECE 465 Introduction to CPLDs and FPGAs Shantanu Dutt ECE Dept. University of Illinois at Chicago Acknowledgement: Extracted from lecture notes of Dr.
Lecture 15: Multi-FPGA System Software I November 1, 2004 ECE 697F Reconfigurable Computing Lecture 15 Mid-term Review.
CSE 494: Electronic Design Automation Lecture 2 VLSI Design, Physical Design Automation, Design Styles.
Lecture 2 1 ECE 412: Microcomputer Laboratory Lecture 2: Design Methodologies.
J. Christiansen, CERN - EP/MIC
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
EE3A1 Computer Hardware and Digital Design
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
Survey of multicore architectures Marko Bertogna Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy.
ECE 551: Digital System Design & Synthesis Motivation and Introduction Lectures Set 1 (3 Lectures)
1 Advanced Digital Design Reconfigurable Logic by A. Steininger and M. Delvai Vienna University of Technology.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #23 – Function.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
FPGA Technology Overview Carl Lebsack * Some slides are from the “Programmable Logic” lecture slides by Dr. Morris Chang.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 28, 2005 Empirical Comparisons.
9/4/2001 ECE 551 Fall ECE Digital System Design & Synthesis Lecture 1 - Introduction  Overview oCourse Introduction oOverview of Contemporary.
Programmable Logic Devices
Sequential Programmable Devices
Chapter 1: Introduction
HIGH LEVEL SYNTHESIS.
COMS 361 Computer Organization
Unit -4 Introduction to Embedded Systems Tuesday.
Presentation transcript:

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed and Configurable

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Overview Three types of FPGAs -EEPROM -SRAM -Antifuse SRAM FPGA architectural choices. FPGA logic blocks -> size versus performance. FPGA switch boxes State-of-the-art -Research issues in architecture.

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 What is Computation? Calculating predictable data outputs from data inputs. What should we expect from a computing device? Gives correct answer. Takes up finite space Computes in finite time Can solve all problems? - Compilation - Implementation Other issues

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Compilation How long does it take to “map” an idea to hardware? Why is the processor so “easy” to target for compilation? Processor FPGA Gate Array Full Custom Compilation time Performance low high lowhigh

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 What are variables in Computation? Time -> How long does it take to compute the answer? Area -> How much silicon space is required to determined the answer?  Processor generally fixes computing area. Problem evaluated over time through instructions.  FPGA can create flexible amount of computing area. Effectively, the configuration memory is the computing instruction.

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Measuring Feature Size Current FPGAs follow the same technology curve as microprocessors. Difficult to compare device sizes across generations so we use a fixed metric, lambda ( ). Lambda defines basic feature sizes in the VLSI device. λ 8λ 5λ spacing metal 3 8λ 3λ overlap metal 2+3

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Toward Computational Comparison Dehon metrics: Computational density of a device λ 2 x s 4 input gate-evaluations Processor: 2 x N ALU x W ALU A proc x t cycle FPGA: N 4lut A array x t cycle

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Degradation FPGA can’t really be clocked at 1/7 ns due to interconnect. Consider the Bubblesort block from the first class. If (A > B) { H = A; L = B; } else { H = B; L = A; } Ci Ci A A B B S S Co Co AB AB compare H requires 33 LUT delays

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 New Comparison Processor required three cycles at 500 MHz FPGA requires 33 LUTs delays per computation. Could consider other parts of design. Designorganization λ 2 cyclege/λ 2 x s 1994 MIPs 1x321.7G 2 ns Xilinx49 CLB (2 x4LUT)61M 7 ns 230

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Parallelization How this performance factor change over time? – through parallelization. For a given operation ge/(λ 2.s) seems the same -> 7 However, multiple comparisons could be performed in parallel. Now FPGA metric is 28 Of course, device may be only partially filled. A0A1 HL A2A3 HL A4A5 HL A6A7 HL

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Specialization Example: encryption constantvariable

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Instructions Many applications have little parallelism or have variable hardware requirements during execution. Here using more area doesn’t increase computational density. Better to reuse hardware through instructions A B operation +, -, |, x

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Single-Instruction Multiple Data Same instruction distributed to fine-grained cells. Typically organized as 2-D array Ideal for image processing Typically fixed hardware located in cell op multi-bit

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Computation Unit for SIMD Performs different operation on every cycle Easy to distribute instructions on device (use global lines) Some local storage for data in each tile From local state or other array elements To local state or other array elements Global Instruction common to all elements

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Computation Unit for FPGA Performs same operation on every cycle No global distribution of instructions at all (stored locally) Also has local storage for data. From local state or other array elements To local state or other array elements Static instruction distinct for each array element

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Hybrid Architecture Configuration selects operation of computation unit Context identifier changes over time to allow change in functionality DPGA – Dynamically Programmable Gate Array in Computation Unit (LUT) out Address Inputs (Inst. Store) Context Identifier Programming may differ for each element

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 DPGA How many applications require this flexibility Efficient techniques needed to schedule when functionality shifts. + A3A3 B3B3 O3O3 + A2A2 B2B2 O2O2 + A1A1 B1B1 O1O1 + A0A0 B0B0 O0O0 context identifier Added configuration allows for functionality to change quickly Doubles SRAM storage requirement

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Multicontext Organization/Area °A ctxt  80K 2 dense encoding °A base  800K 2 °Slides: courtesy DeHon °A ctxt :A base = 1:10

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Example: DPGA Prototype

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 FPGA vs. DPGA Compare

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Example: DPGA Area

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Configuration Caching What if I swap out some unused configurations while they are not used? Separate hardware to write given locations in hardware (config mem) and not interrupt circuit operation Just like cache prefetching LUT Config Store Context ID Out

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Hierarchical FPGA Predictable Delay Two dimensional layout Limited connectivity

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Buffering Pipelining interconnect comes at an area cost Also could consider buffering s DQ s DQ s Unpipelined Pipelined 18 transistors

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 What about this circuit? Retiming needed for hierarchical device. Number of registers proportional to longest path. Complicates design Software, debugging Need to schedule communication LUT

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 PLD (Programmable Logic Device) °All layers already exist Designers can purchase an IC Connections on the IC are either created or destroyed to implement desired functionality Field-Programmable Gate Array (FPGA) very popular °Benefits Low NRE costs, almost instant IC availability °Drawbacks Penalty on area, cost (perhaps $30 per unit), performance, and power °Acknowledgement: Mishra

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Design Technology °The manner in which we convert our concept of desired system functionality into an implementation Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level. System specification Behavioral specification RT specification Logic specification To final implementation Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level. Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels. Compilation/ Synthesis Libraries/ IP Test/ Verification System synthesis Behavior synthesis RT synthesis Logic synthesis Hw/Sw/ OS Cores RT components Gates/ Cells Model simulat./ checkers Hw-Sw cosimulators HDL simulators Gate simulators

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Design productivity gap °1981 leading edge chip required 100 man-months 10,000 transistors / 100 transistors/month °2002 leading edge chip requires 30K man-months 150,000,000 / 5000 transistors/month °Designer cost increase from $1M to $300M 10,000 1, Logic transistors per chip (in millions) 100,000 10, Productivity (K) Trans./Staff-Mo IC capacity productivity Gap

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 The mythical man-month °In theory, adding designers to team reduces project completion time °In reality, productivity per designer decreases due to complexities of team management and communication overhead °In the software community, known as “the mythical man-month” (Brooks 1975) °At some point, can actually lengthen project completion time! Team Individual Months until completion Number of designers °1M transistors, one designer=5000 trans/month °Each additional designer reduces for 100 trans/month °So 2 designers produce 4900 trans/month each

Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 Summary Interesting similarities between processor and reconfigurable device Processors are reconfigured on every clock cycle using an instruction FPGAs configured once at beginning of computation DPGAs blur the line – run-time reconfiguration Numerous challenges to reconfiguration -When -How -Performance benefit?