CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #23 – Function.

Slides:



Advertisements
Similar presentations
Architecture-Specific Packing for Virtex-5 FPGAs
Advertisements

Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
University of Michigan Electrical Engineering and Computer Science 1 A Distributed Control Path Architecture for VLIW Processors Hongtao Zhong, Kevin Fan,
Lecture 7 FPGA technology. 2 Implementation Platform Comparison.
The Microprocessor is no more General Purpose. Design Gap.
A Survey of Logic Block Architectures For Digital Signal Processing Applications.
Lecture 9: Coarse Grained FPGA Architecture October 6, 2004 ECE 697F Reconfigurable Computing Lecture 9 Coarse Grained FPGA Architecture.
TangP187_MAPLD High-Performance SEE- Hardened Programmable DSP Array Larry McMurchie, Carl Sechen Students: Victor Tang, James Lan, Duncan Lam Dept.
 Understanding the Sources of Inefficiency in General-Purpose Chips.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR SRAM-based FPGA n SRAM-based LE –Registers in logic elements –LUT-based logic element.
Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. Yu August 15, 2005.
Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. Yu August 15, 2005.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Lecture 2: Field Programmable Gate Arrays I September 5, 2013 ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I.
Lecture 26: Reconfigurable Computing May 11, 2004 ECE 669 Parallel Computer Architecture Reconfigurable Computing.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 26: April 18, 2007 Et Cetera…
Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #3 – FPGA.
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Paper Review I Coarse Grained Reconfigurable Arrays Presented By: Matthew Mayhew I.D.# ENG*6530 Tues, June, 10,
A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,
Coarse and Fine Grain Programmable Overlay Architectures for FPGAs
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
Amalgam: a Reconfigurable Processor for Future Fabrication Processes Nicholas P. Carter University of Illinois at Urbana-Champaign.
Lecture 15: Multi-FPGA System Software I November 1, 2004 ECE 697F Reconfigurable Computing Lecture 15 Mid-term Review.
Modern VLSI Design 4e: Chapter 6 Copyright  2008 Wayne Wolf Topics Memories: –ROM; –SRAM; –DRAM; –Flash. Image sensors. FPGAs. PLAs.
J. Christiansen, CERN - EP/MIC
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2010 Memory Overview.
Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,
A Reconfigurable Low-power High-Performance Matrix Multiplier Architecture With Borrow Parallel Counters Counters : Rong Lin SUNY at Geneseo
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.
Lecture 13: Logic Emulation October 25, 2004 ECE 697F Reconfigurable Computing Lecture 13 Logic Emulation.
EE3A1 Computer Hardware and Digital Design
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #24 – Reconfigurable.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
COARSE GRAINED RECONFIGURABLE ARCHITECTURES 04/18/2014 Aditi Sharma Dhiraj Chaudhary Pruthvi Gowda Rachana Raj Sunku DAY
“Politehnica” University of Timisoara Course No. 3: Project E MBRYONICS Evolvable Systems Winter Semester 2010.
M.Mohajjel. Why? TTM (Time-to-market) Prototyping Reconfigurable and Custom Computing 2Digital System Design.
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #12 – Systolic.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #13 – Other.
1 Advanced Digital Design Reconfigurable Logic by A. Steininger and M. Delvai Vienna University of Technology.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
A Survey of Fault Tolerant Methodologies for FPGA’s Gökhan Kabukcu
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed.
Buffering Techniques Greg Stitt ECE Department University of Florida.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #22 – Multi-Context.
1 Architecture of Datapath- oriented Coarse-grain Logic and Routing for FPGAs Andy Ye, Jonathan Rose, David Lewis Department of Electrical and Computer.
Field Programmable Gate Arrays
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
Application-Specific Customization of Soft Processor Microarchitecture
ESE534: Computer Organization
CprE / ComS 583 Reconfigurable Computing
ARM ORGANISATION.
Application-Specific Customization of Soft Processor Microarchitecture
CprE / ComS 583 Reconfigurable Computing
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #23 – Function Unit Architectures

Lect-23.2CprE 583 – Reconfigurable ComputingNovember 8, 2007 Quick Points Next week Thursday, project status updates 10 minute presentations per group + questions Upload to WebCT by the previous evening Expected that you’ve made some progress!

Lect-23.3CprE 583 – Reconfigurable ComputingNovember 8, 2007 Allowable Schedules Active LUTs (N A ) = 3

Lect-23.4CprE 583 – Reconfigurable ComputingNovember 8, 2007 Sequentialization Adding time slots More sequential (more latency) Adds slack Allows better balance L=4  N A =2 (4 or 3 contexts)

Lect-23.5CprE 583 – Reconfigurable ComputingNovember 8, 2007 Multicontext Scheduling “Retiming” for multicontext goal: minimize peak resource requirements NP-complete List schedule, anneal How do we accommodate intermediate data? Effects?

Lect-23.6CprE 583 – Reconfigurable ComputingNovember 8, 2007 Signal Retiming Non-pipelined hold value on LUT Output (wire) from production through consumption Wastes wire and switches by occupying For entire critical path delay L Not just for 1/L’th of cycle takes to cross wire segment How will it show up in multicontext?

Lect-23.7CprE 583 – Reconfigurable ComputingNovember 8, 2007 Signal Retiming Multicontext equivalent Need LUT to hold value for each intermediate context

Lect-23.8CprE 583 – Reconfigurable ComputingNovember 8, 2007 Logically three levels of dependence Single Context: K 2 =18.5M 2 Full ASCII  Hex Circuit

Lect-23.9CprE 583 – Reconfigurable ComputingNovember 8, 2007 Three contexts: K 2 =12.5M 2 Pipelining needed for dependent paths Multicontext Version

Lect-23.10CprE 583 – Reconfigurable ComputingNovember 8, 2007 ASCII  Hex Example All retiming on wires (active outputs) Saturation based on inputs to largest stage With enough contexts only one LUT needed Increased LUT area due to additional stored configuration information Eventually additional interconnect savings taken up by LUT configuration overhead Ideal  Perfect scheduling spread + no retime overhead

Lect-23.11CprE 583 – Reconfigurable ComputingNovember 8, depth=4, c=6: 5.5M 2 (compare 18.5M 2 ) ASCII  Hex Example (cont.)

Lect-23.12CprE 583 – Reconfigurable ComputingNovember 8, 2007 General Throughput Mapping If only want to achieve limited throughput Target produce new result every t cycles Spatially pipeline every t stages cycle = t Retime to minimize register requirements Multicontext evaluation w/in a spatial stage Retime (list schedule) to minimize resource usage Map for depth (i) and contexts (c)

Lect-23.13CprE 583 – Reconfigurable ComputingNovember 8, MCNC circuits Area mapped with SIS and Chortle Benchmark Set

Lect-23.14CprE 583 – Reconfigurable ComputingNovember 8, 2007 Area v. Throughput

Lect-23.15CprE 583 – Reconfigurable ComputingNovember 8, 2007 Area v. Throughput (cont.)

Lect-23.16CprE 583 – Reconfigurable ComputingNovember 8, 2007 Reconfiguration for Fault Tolerance Embedded systems require high reliability in the presence of transient or permanent faults FPGAs contain substantial redundancy Possible to dynamically “configure around” problem areas Numerous on-line and off-line solutions

Lect-23.17CprE 583 – Reconfigurable ComputingNovember 8, 2007 Huang and McCluskey Assume that each FPGA column is equivalent in terms of logic and routing Preserve empty columns for future use Somewhat wasteful Precompile and compress differences in bitstreams Column Based Reconfiguration

Lect-23.18CprE 583 – Reconfigurable ComputingNovember 8, 2007 Create multiple copies of the same design with different unused columns Only requires different inter-block connections Can lead to unreasonable configuration count Column Based Reconfiguration

Lect-23.19CprE 583 – Reconfigurable ComputingNovember 8, 2007 Determining differences and compressing the results leads to “reasonable” overhead Scalability and fault diagnosis are issues Column Based Reconfiguration

Lect-23.20CprE 583 – Reconfigurable ComputingNovember 8, 2007 Summary In many cases cannot profitably reuse logic at device cycle rate Cycles, no data parallelism Low throughput, unstructured Dissimilar data dependent computations These cases benefit from having more than one instructions/operations per active element Economical retiming becomes important here to achieve active LUT reduction For c=[4,8], I=[4,6] automatically mapped designs are 1/2 to 1/3 single context size

Lect-23.21CprE 583 – Reconfigurable ComputingNovember 8, 2007 Outline Continuation Function Unit Architectures Motivation Various architectures Device trends

Lect-23.22CprE 583 – Reconfigurable ComputingNovember 8, 2007 Coarse-grained Architectures DP-FPGA LUT-based LUTs share configuration bits Rapid Specialized ALUs, mutlipliers 1D pipeline Matrix 2-D array of ALUs Chess Augmented, pipelined matrix Raw Full RISC core as basic block Static scheduling used for communication

Lect-23.23CprE 583 – Reconfigurable ComputingNovember 8, 2007 DP-FPGA Break FPGA into datapath and control sections Save storage for LUTs and connection transistors Key issue is grain size Cherepacha/Lewis – U. Toronto

Lect-23.24CprE 583 – Reconfigurable ComputingNovember 8, 2007 MC = LUT SRAM bits CE = connection block pass transistors Set MC = 2-3CE Y0Y0 Y1Y1 A0A0 B0B0 C0C0 A1A1 B1B1 C1C1 Configuration Sharing

Lect-23.25CprE 583 – Reconfigurable ComputingNovember 8, 2007 Two-dimensional Layout Control network supports distributed signals Data routed as four-bit values

Lect-23.26CprE 583 – Reconfigurable ComputingNovember 8, 2007 Ideal case would be if all datapath divisible by 4, no “irregularities” Area improvement includes logic values only Shift logic included DP-FPGA Technology Mapping

Lect-23.27CprE 583 – Reconfigurable ComputingNovember 8, 2007 Cell RaPiD Reconfigurable Pipeline Datapath Ebeling –University of Washington Uses hard-coded functional units (ALU, Memory, multiply) Good for signal processing Linear array of processing elements

Lect-23.28CprE 583 – Reconfigurable ComputingNovember 8, 2007 Segmented linear architecture All RAMs and ALUs are pipelined Bus connectors also contain registers RaPiD Datapath

Lect-23.29CprE 583 – Reconfigurable ComputingNovember 8, 2007 RaPiD Control Path In addition to static control, control pipeline allows dynamic control LUTs provide simple programmability Cells can be chained together to form continuous pipe

Lect-23.30CprE 583 – Reconfigurable ComputingNovember 8, 2007 Measure system response to input impulse Coefficients used to scale input Running sum determined total FIR Filter Example

Lect-23.31CprE 583 – Reconfigurable ComputingNovember 8, 2007 FIR Filter Example (cont.) Chain multiple taps together (one multiplier per tap)

Lect-23.32CprE 583 – Reconfigurable ComputingNovember 8, 2007 MATRIX Dehon and Mirsky -> MIT 2-dimensional array of ALUs Each Basic Functional Unit contains “processor” (ALU + SRAM) Ideal for systolic and VLIW computation 8-bit computation Forerunner of SiliconSpice product

Lect-23.33CprE 583 – Reconfigurable ComputingNovember 8, 2007 Basic Functional Unit Two inputs from adjacent blocks Local memory for instructions, data

Lect-23.34CprE 583 – Reconfigurable ComputingNovember 8, 2007 MATRIX Interconnect Near-neighbor and quad connectivity Pipelined interconnect at ALU inputs Data transferred in 8-bit groups Interconnect not pipelined

Lect-23.35CprE 583 – Reconfigurable ComputingNovember 8, 2007 Functional Unit Inputs Each ALU inputs come from several sources Note that source is locally configurable based on data values

Lect-23.36CprE 583 – Reconfigurable ComputingNovember 8, 2007 K=8 FIR Filter Example For k-weight filter 4K cells needed One result every 2 cycles K/2 8x8 multiplies per cycle

Lect-23.37CprE 583 – Reconfigurable ComputingNovember 8, 2007 Chess HP Labs – Bristol, England 2-D array – similar to Matrix Contains more “FPGA-like” routing resources No reported software or application results Doesn’t support incremental compilation

Lect-23.38CprE 583 – Reconfigurable ComputingNovember 8, 2007 Chess Interconnect More like an FPGA Takes advantage of near-neighbor connectivity

Lect-23.39CprE 583 – Reconfigurable ComputingNovember 8, 2007 Switchbox memory can be used as storage ALU core for computation Chess Basic Block

Lect-23.40CprE 583 – Reconfigurable ComputingNovember 8, 2007 Reconfigurable Architecture Workstation MIT Computer Architecture Group Full RISC processor located as processing element Routing decoupled into switch mode Parallelizing compiler used to distribute work load Large amount of memory per tile

Lect-23.41CprE 583 – Reconfigurable ComputingNovember 8, 2007 RAW Tile Full functionality in each tile Static router located for near-neighbor communication

Lect-23.42CprE 583 – Reconfigurable ComputingNovember 8, 2007 RAW Datapath

Lect-23.43CprE 583 – Reconfigurable ComputingNovember 8, 2007 Raw Compiler Parallelizes compilation across multiple tiles Orchestrates communication between tiles Some dynamic (data dependent) routing possible

Lect-23.44CprE 583 – Reconfigurable ComputingNovember 8, 2007 Summary Architectures moving in the direction of coarse-grained blocks Latest trend is functional pipeline Communication determined at compile time Software support still a major issue