Fault Tolerance in Reconfigurable Computing / FPGAs Bayram Kurumahmut CMPE 516 MS Computer Engineering Bogazici University 27.04.2006.


Outline
Introduction
Modify Configurable Logic Block (CLB)
Dynamic Serial Testing
Built-In Self Healing (BISH)
Hardware Voter
Configurable Fault Tolerant Processor (CFTP)
Self-Checking Logic Design (SCLD)
CLB Functional Testing

Introduction
Configurable Logic Block (CLB)
Interconnect wires
Interconnect switches
Configured by SRAM contents
Configuration SRAM

Modify CLB [4]
Consider faults only in the CLB
Shift configuration data
  – Load only one configuration for test
      Loading a configuration is a very slow process
  – Shift this configuration for the next tests
Do not change the physical design of the running application
No intervention at hardware level
  – Faster
  – Better results in test diagnosis and defect/fault tolerance

Modify CLB [4] (Cont’d)
SRAM
  – Assumed to be fault-free
  – Holds the configuration data
  – Modified to enable shifting of the configuration
Adding a multiplexer
  – Decides the shifting direction
      Shifting to east/west/north/south

Modify CLB [4] (Cont’d)
Hardware overhead
  – Calculate the additional transistor count
  – Calculate the device transistor count
  – Compare them
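The overhead comparison above is a simple ratio. A minimal sketch follows, assuming the overhead is reported as added transistors relative to the device total; the function name and the example figures are placeholders for illustration and are not taken from [4].

```python
def overhead_percent(added_transistors: int, device_transistors: int) -> float:
    """Added shift logic as a percentage of the device transistor count."""
    if device_transistors <= 0:
        raise ValueError("device transistor count must be positive")
    return 100.0 * added_transistors / device_transistors


# Placeholder figures only (NOT from [4]): 8 extra transistors in each of
# 1024 cells, on a hypothetical device with 2,000,000 transistors.
example = overhead_percent(added_transistors=8 * 1024, device_transistors=2_000_000)
print(f"{example:.2f}% overhead")
```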

Dynamic Serial vs Parallel [5]
Reduces test configuration time
Requires fewer I/O pins
Faster and easier

Dynamic Serial vs Parallel [5] (Cont’d)
Consider unprogrammed FPGAs for test
  – No specific user-designed application configuration
  – Consider all configurations
Generate and download configurations
  – Time consuming
Decompose the number of configurations
Find test patterns

Dynamic Serial Test [5] (Cont’d)
Function unit
  – Multiplexers and one D-type flip-flop
  – Test pattern requirements for the multiplexers
      Detect their stuck-on/off faults
      Detect stuck-at faults on all their I/O nets
      Detect bridging faults between data inputs

Dynamic Serial Test [5] (Cont’d)
11 test configurations (TCs) per function unit
Provide an efficient way to test many function units in a short time
  – 11 TC × 4096 = 45,056 TC for the XC6216
  – Apply parallel testing after this step

Dynamic Serial Test [5] (Cont’d)
Direct parallel testing
  – Test row or column cells at the same time
  – TC count increases with FPGA size, 11 TC per test unit
  – Not very efficient
Two-phase parallel testing
  – Reed-Muller Propagation Chain (RMPC)
  – 22 TC per test unit, constant
  – Locating a single faulty function unit takes 4 TC

Dynamic Serial Test [5] (Cont’d)
Proposed method
  – Link all function units into a chain
  – Test chain integrity in bypass mode
  – Test a function unit with its 11 TCs and the corresponding test patterns (TPs)
  – Return to bypass mode
  – Repeat for the next function unit
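The loop described above can be sketched in software. This is a toy model under stated assumptions, not the procedure from [5]: the `FunctionUnit` class, the `reference` response function, and the stuck-output fault are all hypothetical, and the bypass path is assumed to stay fault-free so that responses can travel down the chain to the observation point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


def reference(tc: int, pattern: int) -> int:
    """Toy fault-free response: parity of the pattern bits, offset by the TC index."""
    return (bin(pattern).count("1") + tc) & 1


@dataclass
class FunctionUnit:
    stuck_output: Optional[int] = None  # hypothetical fault: logic output stuck at 0 or 1

    def bypass(self, bit: int) -> int:
        return bit  # assumption: the bypass path itself is fault-free

    def respond(self, tc: int, pattern: int) -> int:
        return reference(tc, pattern) if self.stuck_output is None else self.stuck_output


def through_chain(units: Sequence[FunctionUnit], start: int, bit: int) -> int:
    """Propagate a bit through units[start:] in bypass mode to the chain output."""
    for unit in units[start:]:
        bit = unit.bypass(bit)
    return bit


def serial_test(units: Sequence[FunctionUnit],
                test_configs: Dict[int, Sequence[int]]) -> List[int]:
    # Check chain integrity in bypass mode first.
    for bit in (0, 1):
        if through_chain(units, 0, bit) != bit:
            raise RuntimeError("chain integrity check failed in bypass mode")
    faulty = []
    # Test one unit at a time; every other unit stays in bypass mode.
    for index, unit in enumerate(units):
        for tc, patterns in test_configs.items():
            if any(through_chain(units, index + 1, unit.respond(tc, p)) != reference(tc, p)
                   for p in patterns):
                faulty.append(index)  # the position in the chain locates the fault
                break
    return faulty


configs = {tc: range(4) for tc in range(11)}  # 11 TCs, toy 2-bit patterns
chain = [FunctionUnit() for _ in range(8)]
chain[5].stuck_output = 0
located = serial_test(chain, configs)
```

Because only one unit leaves bypass mode at a time, a failing response identifies the faulty unit directly, which is the property the slides credit to the serial approach.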

Dynamic Serial Test [5] (Cont’d)
Comparison with parallel testing
  – Requires fewer TCs
      13 TCs instead of 22 TCs
  – Locates faults without additional TCs
  – Uses fewer I/O pins
      Simplifies test observation
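The test configuration counts quoted on these slides can be put side by side. The sketch below only restates the figures given in the slides (11 TCs per function unit, 4096 function units on the XC6216, 22 TCs for two-phase parallel testing, 13 TCs for the serial method); reading the 11 × 4096 product as "configuring one function unit at a time" is an assumption.

```python
TCS_PER_FUNCTION_UNIT = 11      # from the slides
XC6216_FUNCTION_UNITS = 4096    # from the slides


def one_unit_at_a_time(units: int, tcs_per_unit: int = TCS_PER_FUNCTION_UNIT) -> int:
    """TC count if every function unit is configured and tested separately."""
    return units * tcs_per_unit


tc_counts = {
    "one function unit at a time": one_unit_at_a_time(XC6216_FUNCTION_UNITS),
    "two-phase parallel (constant)": 22,
    "dynamic serial (constant)": 13,
}

for method, count in tc_counts.items():
    print(f"{method}: {count} TCs")
```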

Dynamic Serial Test [5] (Cont’d)
Disadvantage
  – Propagation path length
      Depends on array size
  – Integrate with the parallel approach for large arrays
      Additional I/O pins

Built-In Self Healing (BISH) [8]
Run-time self-configuration
Implement a soft processor
  – Manages and executes all procedures
      Fault detection/location/repair
Modular redundancy to ensure correct operation

BISH - Submicron technology problems [8]
Single event upsets (SEUs)
  – Radiation-induced transient errors caused by neutrons from cosmic rays
  – Alpha particles from packaging material
  – Do not physically damage the chip
  – Change memory cell values
      Incorrect data
      Improper instructions for the processor
Increased threat of electromigration
  – Physical damage to the chip

BISH - Tasks [8]
Detection
  – Scan chain
      Regularly capture net values
      Analyze them in the soft processor
Diagnosis, repair
  – Also controlled by the soft processor
Applied only to SEUs

BISH - Fault Causes [8]
SEU changing a circuit register value
  – Possibly a transient error
  – The error disappears in the next capture, after the register is updated
SEU changing a configuration memory cell
  – Wrong functionality assigned on the FPGA
  – Read back the configuration
  – CRC check
  – Partial reconfiguration if an incoherency exists
Permanent physical defect on the FPGA
  – Mark the defective area
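The readback, CRC check, and partial reconfiguration sequence can be illustrated in software. This is a minimal sketch under assumptions: configuration memory is modeled as a list of byte frames, a stored golden copy is available, and overwriting a single frame stands in for partial reconfiguration. The frame layout and function names are hypothetical and do not reflect any vendor interface or the implementation in [8].

```python
import zlib
from typing import List


def frame_checksums(frames: List[bytes]) -> List[int]:
    """CRC-32 of every configuration frame."""
    return [zlib.crc32(frame) for frame in frames]


def scrub(readback: List[bytes], golden: List[bytes]) -> List[int]:
    """Compare per-frame CRCs and rewrite only the frames that differ."""
    expected = frame_checksums(golden)
    repaired = []
    for index, frame in enumerate(readback):
        if zlib.crc32(frame) != expected[index]:
            readback[index] = golden[index]  # stands in for partial reconfiguration
            repaired.append(index)
    return repaired


# Toy example: four 8-byte frames, with one bit flipped in frame 2 to mimic an SEU.
golden = [bytes([i] * 8) for i in range(4)]
readback = list(golden)
upset = bytearray(readback[2])
upset[3] ^= 0x10
readback[2] = bytes(upset)

repaired_frames = scrub(readback, golden)
```

Only the frame whose checksum differs is rewritten, so the rest of the configuration is left untouched while the repair takes place.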

Hardware Voter [6]
Detect and correct single errors on inputs
Bypass double errors in X1, X2, X3 by substituting the erroneous data with the spare one, X4
Spare
Detect and correct single errors
Bypass double error by substituting erroneous data with the spare one
Congruency level of accepted SEs
Unrecoverable error signal

Configurable Fault Tolerant Processor (CFTP) [2]
Applied to spacecraft onboard processing
Triple Modular Redundancy (TMR) for a soft processor on an FPGA
  – Mitigates bit errors in computation by detecting and correcting them using voting logic
  – On-orbit updates, reconfigurations, modifications
Detects SEU-induced configuration faults
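The voting logic used in TMR is commonly a bitwise majority function. The sketch below shows that standard function on integer words, not the CFTP voter itself; the helper that flags the disagreeing replica is an illustrative addition.

```python
from typing import List


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a: int, b: int, c: int) -> List[bool]:
    """Flag every replica whose word differs from the voted result."""
    voted = tmr_vote(a, b, c)
    return [replica != voted for replica in (a, b, c)]


# One bit flipped in the third replica is outvoted by the other two.
voted_word = tmr_vote(0b1010, 0b1010, 0b1110)
flags = disagreeing_replicas(0b1010, 0b1010, 0b1110)
```

A single upset in one replica is masked bit by bit, and the flags indicate which replica would need repair.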

Self-Checking Logic Design (SCLD) [3]
Map Boolean functions onto the FPGA
Functional cell
  – Generates complementary outputs
Checker cell
  – Verifies correctness of the final outputs
      Fault: same value at both outputs
Increases the number of CLBs used but incorporates self-checking or testability features
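The complementary-output idea can be illustrated with the textbook two-rail checker. The slides do not give the checker cell equations from [3], so the equations below are the standard two-rail form and are swapped in as an assumption. The output pair is complementary only when both input pairs are complementary, which matches the "same value at outputs means fault" rule above.

```python
from itertools import product
from typing import Callable, Tuple

Pair = Tuple[int, int]


def function_cell(f: Callable[..., int], *inputs: int) -> Pair:
    """Toy function cell: returns the function value and its complement."""
    out = f(*inputs) & 1
    return out, out ^ 1


def two_rail_checker(x: Pair, y: Pair) -> Pair:
    """Standard two-rail checker for two complementary input pairs."""
    x0, x1 = x
    y0, y1 = y
    z0 = (x0 & y0) | (x1 & y1)
    z1 = (x0 & y1) | (x1 & y0)
    return z0, z1


def is_valid(pair: Pair) -> bool:
    """A pair is fault-free when its two rails differ."""
    return pair[0] != pair[1]


# Fault-free case: two function cells feeding one checker cell.
p = function_cell(lambda a, b: a & b, 1, 1)
q = function_cell(lambda a, b: a | b, 0, 0)
healthy = two_rail_checker(p, q)

# Faulty case: the second rail of p is stuck at the same value as the first.
faulty = two_rail_checker((p[0], p[0]), q)
```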

SCLD – Fault Types [3]
Single stuck-at faults in RAM cells
Single stuck-at faults on any line of a CLB
Functional faults in any multiplexer within a single CLB
Functional faults in any D-type flip-flop within a single CLB
Single stuck-at faults in any pass transistor connecting CLBs

SCLD [3]
k-feasible
  – Functional cells have 4 inputs
      4-feasible Boolean functions are required
      If a function is not 4-feasible, decompose it before mapping it onto the FPGA
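A function is k-feasible when it depends on at most k inputs, so it fits in one k-input cell. A minimal sketch of that check follows, assuming a sum-of-products is written as a list of product terms and a leading "~" marks a complemented literal. This representation is hypothetical, and counting the variables that appear in the expression gives only an upper bound on the true support.

```python
from typing import Iterable, Set


def support(sop: Iterable[Iterable[str]]) -> Set[str]:
    """Variables appearing in a sum-of-products, ignoring literal polarity."""
    return {literal.lstrip("~") for term in sop for literal in term}


def is_k_feasible(sop: Iterable[Iterable[str]], k: int = 4) -> bool:
    """True when the expression uses at most k distinct variables."""
    return len(support(sop)) <= k


f = [["a", "~b"], ["c", "d"]]        # a·b' + c·d uses 4 variables
g = [["a", "b", "c"], ["~d", "e"]]   # a·b·c + d'·e uses 5 variables

f_fits = is_k_feasible(f)
g_fits = is_k_feasible(g)
```

An expression like `g` would have to be decomposed into smaller nodes before mapping, for example by computing one product term in its own cell.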

SCLD – Algorithm [3]
Decompose a sum-of-products expression into 4-feasible expressions. Choose the expression with the minimum number of nodes.
Map each expression directly into a 4-input function cell.
Connect the outputs of a pair of intermediate function cells to the inputs of a checker cell, and generate the equations for each output of the checker cell.
Cascade the checker cells to form a checker tree. The outputs of the function cell at the last stage are the outputs of the circuit.

SCLD – Example [3]

SCLD – Implementation [3]

CLB Functional Testing [1]
Gate-level testing not required
Use the functional property of the CLB
  – AND gate, OR gate, or any Boolean expression
Additional hardware to apply the test
  – Multiplexer
  – Example for a 2-input CLB

CLB Functional Testing - Redundant Faults [1]
CLB function = AND gate
  – Sa0 on first data input of the multiplexer
  – Sa0 on second data input of the multiplexer
  – Sa0 on third data input of the multiplexer
  – Sa1 on fourth data input of the multiplexer
CLB function = OR gate
  – Sa0 on first data input of the multiplexer
  – Sa1 on second data input of the multiplexer
  – Sa1 on third data input of the multiplexer
  – Sa1 on fourth data input of the multiplexer
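These redundant faults can be reproduced with a small model. The sketch assumes a 2-input CLB behaves as a 4:1 multiplexer whose data inputs hold the truth table of the configured function and whose select lines are the two CLB inputs; the function names are illustrative. A stuck-at fault on a data input is redundant when no input pattern changes the output, which happens exactly when the input is stuck at the value it already holds.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def lut_output(data: Sequence[int], a: int, b: int,
               fault: Optional[Tuple[int, int]] = None) -> int:
    """4:1 multiplexer: inputs (a, b) select one of the four data bits.

    fault = (data input index, stuck value) forces that data input to 0 or 1.
    """
    bits = list(data)
    if fault is not None:
        index, stuck_value = fault
        bits[index] = stuck_value
    return bits[2 * a + b]


def redundant_faults(data: Sequence[int]) -> List[Tuple[int, str]]:
    """Stuck-at faults on data inputs that exhaustive testing cannot detect."""
    redundant = []
    for index, stuck in product(range(4), (0, 1)):
        detected = any(
            lut_output(data, a, b) != lut_output(data, a, b, (index, stuck))
            for a, b in product((0, 1), repeat=2)  # all four input patterns
        )
        if not detected:
            redundant.append((index + 1, f"sa{stuck}"))  # 1-based, as on the slide
    return redundant


AND_GATE = (0, 0, 0, 1)  # truth table for inputs 00, 01, 10, 11
OR_GATE = (0, 1, 1, 1)

and_redundant = redundant_faults(AND_GATE)
or_redundant = redundant_faults(OR_GATE)
```

Under this model the AND configuration yields sa0 on data inputs 1 to 3 and sa1 on input 4, and the OR configuration yields sa0 on input 1 and sa1 on inputs 2 to 4, in agreement with the lists above.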

CLB Functional Testing [1]
Exhaustive testing applied
Long test length but high fault coverage
  – 99.81%, compared with 87.90% for gate-level testing

Conclusion
Dynamically reconfigurable environments
  – Allow flexible testing of circuits
  – Repair errors by partial reconfiguration
      Normal operation is not disturbed when a defect affects only part of the hardware
  – Design your processor on them to provide self-test of the circuit

References
[1] Testing of FPGA Logic Cells, E. Bareisa, V. Jusas, K. Motiejunas, R. Seinauskas, 2004 ISSN Elektronika ir Elektrotechnika
[2] Configurable Fault-Tolerant Processor (CFTP) for Spacecraft Onboard Processing, Charles A. Hulme, Herschel H. Loomis, Alan A. Ross, Rong Yuan, 2004 IEEE Aerospace Conference Proceedings
[3] Self-Checking Logic Design for FPGA Implementation, Parag K. Lala, Alfred L. Burress, 2003 IEEE Transactions on Instrumentation and Measurement
[4] FPGAs and Fault Tolerance, Abderrahim Doumar, Hideo Ito, 2001 The 13th International Conference on Microelectronics
[5] Fault Detection and Location of Dynamic Reconfigurable FPGAs, Chi-Feng Wu, Cheng-Wen Wu
[6] FPGA Implementation of Hardware Voter, Milos D. Krstic, Mile K. Stojcev, TELSIKS 2001 IEEE
[7] Testing the Configurability of Dynamic FPGAs, N. Park, S. J. Ruiwale, F. Lombardi, 2000 IEEE
[8] A Self-Healing Real-Time System Based on Run-Time Self Reconfiguration, Manuel G. Gericota, Gustavo R. Alves, Jose M. Ferreira, 2005 IEEE
[9] Testing Approach within FPGA-based Fault Tolerant Systems, Abderrahim Doumar, Hideo Ito, 2000 IEEE