TangP187_MAPLD2004: High-Performance SEE-Hardened Programmable DSP Array

High-Performance SEE-Hardened Programmable DSP Array
Larry McMurchie, Carl Sechen
Students: Victor Tang, James Lan, Duncan Lam
Dept. of EE, Univ. of Washington

Outline
- The RADAR architecture
  - Why coarse-grained programmable architectures
  - Features of the RADAR architecture
  - Examples of FIR filter
  - Benchmarks
- Radiation Hardening of RADAR
  - SETs in combinational logic and pipeline registers
  - Register filtering technique

Current Commercial FPGAs: "One Size Fits All"
- Flexibility: they can implement any digital function
- Commodities: not cheap ones, but not nearly as expensive as ASICs to design and fabricate
- Fewer man-hours to design than ASICs
- Reprogrammable in situ, allowing updates and bug fixes to be made easily

Downside of "One Size Fits All"
- Power can be 10X that of an ASIC that performs the same function
- Area/weight can be many times that of an equivalent ASIC
- Performance may not meet requirements
- Varying degrees of susceptibility to radiation effects
  - Particularly as process feature sizes decrease

A Critical Observation!
- An FPGA in a given system will generally be used only for a limited set of related functions
- Example: an FPGA that performs high-throughput DSP applications, e.g. a FIR filter
  - May be reprogrammed to perform a variant of the FIR, e.g. a different number of taps, or an IIR filter
  - But not a totally different operation, e.g. the random logic required for a control block
  - The result for this example is that all the fine-grained, bit-level flexibility in an FPGA is wasted

Is There a Better Way?
If we can identify the domain of applications that will be used in a given environment, then we can create a customized programmable device (CPD) that will:
- Approach ASIC performance in terms of power, area and throughput
- Retain sufficient programmability to enable all applications within the domain

ASIC/CPD/FPGA Comparison
[Figure: comparison chart with axes Flexibility and Area/Power; labels: ASICs, FPGAs, Customized PD]

RADAR is a Programmable Device Customized for DSP
- Based upon Reconfigurable Pipelined Datapaths (RaPiD)
- Linear bus-based datapath (as opposed to crossbar)
  - Provides efficient local interconnect, which is dominant in DSP applications
- Many registers (in the right places) to allow intensive pipelining
- Combination of static and dynamic control
  - Static to determine the particular application
  - Dynamic to control multiple phases within the application
Reference: D. Cronquist, P. Franklin, C. Fisher, M. Figueroa and C. Ebeling, "Architecture Design of Reconfigurable Pipelined Datapaths," 20th Anniversary Conf. on Advanced Research in VLSI, 1999.

Example of RADAR Datapath
4 cells, each containing local memory, multiplier, ALU and register, plus input and output streams

Bus Multiplexor and Drivers

Bus Connectors

Example #1: 4-Tap FIR Filter
- Given a vector of coefficient weights
- Compute the dot product of the coefficient weights and a vector of inputs
- Easily maps to a linear pipeline
Following slides courtesy of Carl Ebeling, Dept. of CSE, UW
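
As a hedged illustration of the computation this slide describes, the Python sketch below produces each output as the dot product of the coefficient weights with the most recent inputs. The name `fir_filter` and the sample values are assumptions made for illustration; the sketch models only the arithmetic, not the RADAR datapath, its buses, or its control.

```python
from collections import deque


def fir_filter(weights, inputs):
    """Return FIR outputs: each is the dot product of the weights
    with the most recent len(weights) input samples."""
    # One delay slot per tap; conceptually, one slot per pipeline cell.
    taps = deque([0] * len(weights), maxlen=len(weights))
    outputs = []
    for x in inputs:
        # Newest sample enters at index 0; the oldest falls off the end.
        taps.appendleft(x)
        outputs.append(sum(w * t for w, t in zip(weights, taps)))
    return outputs


# Feeding a unit impulse through a 4-tap filter reads back the weights.
impulse_response = fir_filter([1, 2, 3, 4], [1, 0, 0, 0, 0])
```

Each tap pairs one weight with one delayed sample, which is why the computation lines up naturally with a linear chain of cells.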

RADAR Datapath Programmed for 4-Tap FIR Filter

RADAR Performance Benchmarks
- Assume 16 RaPiD cells, each containing a 16x16 multiplier, and 16-bit buses in the communication network
- Applications: 8x8 DCT, motion estimation, FIR filter, matrix multiply, 2D convolution
- Experiments so far in 0.18 micron CMOS show that 1 GHz is achievable, giving 16 GOPS
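
The 16 GOPS figure is consistent with simple peak-rate arithmetic, assuming (the slide does not state this) that each of the 16 cells completes one operation per clock cycle. The function name and the `ops_per_cell_per_cycle` parameter below are hypothetical, introduced only to make that assumption explicit.

```python
def peak_ops_per_second(num_cells, clock_hz, ops_per_cell_per_cycle=1):
    """Peak operation rate of an array of identical cells.

    ops_per_cell_per_cycle=1 is an assumption, not a figure from the slides.
    """
    return num_cells * clock_hz * ops_per_cell_per_cycle


# 16 cells clocked at 1 GHz, one operation per cell per cycle.
peak = peak_ops_per_second(16, 10**9)
peak_gops = peak / 10**9
```

This is a peak rate; sustained throughput depends on how fully an application keeps the cells busy.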

Common Techniques for SETs
- TMR-in-hardware for logic and memory
  - 3X in power/area
  - Voting circuitry must be hardened
- Using larger gate widths
  - Increased current flow suppresses transients
  - Also increases power/area
  - Equivalent to using feature sizes of previous-generation processes
- Adding resistors and capacitors
  - Low-pass filtering of SETs
  - Increases power/area
In general, circuit design techniques such as these increase area, delay and power, are difficult to design, and do not transfer well between processes!

TMR-in-Time and SETs
A single event transient in a pipelined computation may be filtered using TMR-in-time, a simple temporal voting scheme: the same data is applied on successive clock cycles, resulting in three threads of computation followed by a majority function.
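
The temporal voting scheme can be sketched in Python as follows. This is a behavioral illustration only: `tmr_in_time`, `upset_pass` and `upset_mask` are hypothetical names, and the transient is modeled simply as a set of flipped bits in the result of one of the three passes.

```python
def majority(a, b, c):
    """Bitwise majority: each output bit is the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_in_time(compute, x, upset_pass=None, upset_mask=0):
    """Apply the same data on three successive passes and vote on the results.

    upset_pass selects which pass (0, 1 or 2) is hit by a modeled transient;
    upset_mask gives the bits that the transient flips in that pass.
    """
    results = []
    for p in range(3):
        r = compute(x)
        if p == upset_pass:
            r ^= upset_mask  # model a single event transient corrupting this pass
        results.append(r)
    return majority(*results)


def scale_and_offset(x):
    # Stand-in for a pipelined computation, kept to 16 bits.
    return (3 * x + 1) & 0xFFFF


clean = tmr_in_time(scale_and_offset, 5)
hit = tmr_in_time(scale_and_offset, 5, upset_pass=1, upset_mask=0x00F0)
```

Because only one of the three passes is corrupted, the vote returns the same value in both cases.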

TMR-in-Time
- This simple scheme works, provided transients are no longer than the clock period
- It suffers from a ~3X latency relative to the singlet (unhardened) circuit, but requires one third the hardware of the TMR-in-hardware approach
- In the RADAR architecture (where throughput is determined by the number of clock cycles that critical functional units are busy), throughput is the same for TMR-in-time and TMR-in-hardware
- TMR-in-time approaches the singlet (unhardened) case in energy consumption per computation: data switching activity occurs only during the first of three cycles! Of course, clock power is 3X that of the singlet.

Filtering SETs at Registers
Sampling data at every register and applying the majority function yields an optimized form of TMR-in-time.
Reference: D. Mavis and P. Eaton, "Soft Error Rate Mitigation Techniques for Modern Microcircuits," Proc. of the 40th Annual Int. Reliability Physics Symposium, 2002, pp

Filtering SETs at Registers (cont.)
- If the duration of the transient is less than the clock separation time DT, only one of the three registers will latch incorrect data and the majority function will filter it out
- Note that SETs created in the majority function itself will be filtered out at the following register
- By increasing DT, the circuit can be made immune to transients caused by radiation of increasing LET values
- The means of generating clocks delayed by DT can be made a programmable feature of the architecture, i.e. the degree of radiation hardening is programmable!
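
The role of DT can be illustrated with a small timing model in Python. It is a sketch under stated assumptions, not the circuit from the slides: times are integers in arbitrary units, the three registers sample at `edge`, `edge + dt` and `edge + 2*dt`, and the transient replaces the correct value with `corrupt` over a half-open window. All names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority of three sampled values."""
    return (a & b) | (a & c) | (b & c)


def sampled_value(correct, corrupt, t, set_start, set_width):
    """Value seen on the data line at time t, given a transient window."""
    if set_start <= t < set_start + set_width:
        return corrupt
    return correct


def filtered_register(correct, corrupt, edge, dt, set_start, set_width):
    """Sample the line on three clocks separated by dt, then vote."""
    samples = [
        sampled_value(correct, corrupt, edge + k * dt, set_start, set_width)
        for k in range(3)
    ]
    return majority(*samples)


# Transient shorter than dt: at most one sample is wrong, so the vote recovers.
short = filtered_register(0b1011, 0b0100, edge=1000, dt=100,
                          set_start=980, set_width=50)
# Transient longer than dt: two samples are wrong, so the error gets through.
long_ = filtered_register(0b1011, 0b0100, edge=1000, dt=100,
                          set_start=980, set_width=150)
```

In this model, raising `dt` above the transient width restores correct operation, which mirrors the point that hardening can be tuned through DT.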

Power/Throughput Comparisons of Hardening Techniques
- As applied to a fixed-size RADAR array
- Implementation assumes static CMOS
- Throughput is measured in output data values per unit time

Application of Register Filtering to RADAR
- Register filtering is well suited to the RADAR architecture
- Better power/throughput characteristics than other methods
- The degree of radiation hardening can be programmable through adjustment of DT

Summary
- RADAR
  - Programmable architecture customized for DSP applications
  - Capable of 16 GOPS in 0.18 micron CMOS
- Radiation hardening of combinational logic
  - Using register filtering
  - Achieves near-ideal power/throughput characteristics
  - Degree of radiation hardness programmable