Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs

Presentation transcript:

Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs Dan Fisher, Addison Floyd

Outline: Introduction; Fault Detection - Motivation, Methods, etc.; Fault Diagnosis - Motivation, Methods, etc.; Fault Tolerance (Single FPGA, Multiple FPGAs, Single Faults, Multiple Faults); Conclusion

Introduction: FPGA Background; Importance; Applications; Motivation for Fault Tolerance. http://en.wikipedia.org/wiki/Field-programmable_gate_array

Fault Detection - Motivation: Main causes of faults: degradation; manufacturing defects; single event upsets (SEUs).

Fault Detection - Judgement Criteria: Detection methods are judged on: speed of detection; coverage; resource overhead; performance overhead; detection granularity.

Fault Detection - Criteria In-Depth: Detection granularity is how specific one is when detecting an error. An FPGA is made up of tiles containing: logic blocks; connection blocks (connect tiles); switch blocks (connect tiles, allow for direction change).

Fault Detection - Comparison

Fault Detection - SEDC Method: The method explained: (1) partition data and encode with SEDC codes; (2) calculate and store check bits; (3) generate check bits as the circuit operates; (4) compare calculated and generated values. Better than Berger codes and TMR.
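
The four steps on this slide can be sketched in a few lines. This is only an illustration: the slide does not define the SEDC code itself, so the sketch substitutes one even-parity bit per partition as a stand-in, and the names `partition`, `check_bits`, and `detect` are hypothetical.

```python
from typing import List

def partition(bits: List[int], width: int) -> List[List[int]]:
    """Split a bit vector into fixed-width groups (the last group may be shorter)."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]

def check_bits(bits: List[int], width: int = 4) -> List[int]:
    """One even-parity bit per partition. ASSUMPTION: stand-in for a real SEDC code."""
    return [sum(group) % 2 for group in partition(bits, width)]

def detect(observed: List[int], stored: List[int], width: int = 4) -> List[int]:
    """Regenerate check bits from observed data and return mismatching partition indices."""
    regenerated = check_bits(observed, width)
    return [i for i, (new, old) in enumerate(zip(regenerated, stored)) if new != old]

golden = [1, 0, 1, 1, 0, 0, 1, 0]
stored = check_bits(golden)      # calculated once and stored
faulty = golden.copy()
faulty[5] ^= 1                   # single bit error in the second partition

print(detect(golden, stored))    # []
print(detect(faulty, stored))    # [1]
```

A single parity bit per partition only catches an odd number of flipped bits in that partition, which is one reason a real scheme would use a stronger code.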

Fault Detection - Nazar Method: Concurrent error detection (CED) method providing single error detection; takes advantage of properties of LUTs; major drawback: LUT insertion; area improvement over duplication with comparison (DWC).

Nazar Method - LUT Properties Explained* 1st Advantage: A LUT can be viewed as a combinational circuit independent from the others. Area overhead is avoided since you don’t need to replicate the sub-expressions that form the circuit outputs. 2nd Advantage: A K-input LUT can compute any function of up to K inputs. So as long as the selected group depends on no more than K different inputs, the parity can be calculated using just one LUT. If the selected group also has no more than K-1 different outputs, then the checker can be made of just one LUT (with the last input being the parity bit). The picture shows upside-down triangles as LUTs, with one parity LUT for each K-1 outputs. Also shown is the checker, which would be composed of just one LUT. Separate LUTs in the same checker group can’t overlap (otherwise they wouldn’t be independent), but in order to provide coverage, LUTs in different checker groups can overlap. *Note: This slide wasn’t in the original presentation but was added to better explain the method, since some mentioned wanting to know more.
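
The second advantage can be made concrete with a small model. The sketch below is an assumption-laden illustration, not the published tool flow: LUTs are modeled as Python dictionaries (truth tables), K is fixed at 4, and the three example functions and the helper names `make_lut` and `check` are hypothetical.

```python
from itertools import product

K = 4  # LUT input count assumed for this sketch

def make_lut(fn, n_inputs):
    """Model a LUT as a truth table: input tuple -> output bit."""
    assert n_inputs <= K, "function does not fit in one K-input LUT"
    return {bits: fn(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)}

# A checker group: K-1 = 3 outputs that together depend on no more than K inputs.
f0 = make_lut(lambda a, b, c, d: a & b, K)
f1 = make_lut(lambda a, b, c, d: b ^ c, K)
f2 = make_lut(lambda a, b, c, d: c | d, K)
group = [f0, f1, f2]

# Parity predictor: a single LUT over the same K inputs.
parity_lut = {bits: f0[bits] ^ f1[bits] ^ f2[bits] for bits in f0}

# Checker: a single LUT whose inputs are the K-1 outputs plus the parity bit.
checker_lut = make_lut(lambda o0, o1, o2, p: o0 ^ o1 ^ o2 ^ p, K)

def check(inputs, luts):
    """Return 1 if the checker flags an error for this input vector."""
    outputs = tuple(lut[inputs] for lut in luts)
    return checker_lut[outputs + (parity_lut[inputs],)]

# Emulate a single upset in one configuration bit of f1.
f1_bad = dict(f1)
f1_bad[(0, 1, 0, 0)] ^= 1

print(check((0, 1, 0, 0), group))             # 0
print(check((0, 1, 0, 0), [f0, f1_bad, f2]))  # 1
```

The error is flagged only when the corrupted truth-table entry is actually addressed, which matches the single-error-detection claim on the previous slide.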

Fault Detection - Roving STARs: New method for online detection; detected faults do not affect working logic; STARs and BISTERs; better than other methods. *Picture added after the presentation to help clear up any confusion.

Fault Detection - Injection Topic 1: Which modules are most sensitive to SEUs; 1.4% sensitive (83% routing / 16% logic); density matrix.

Fault Detection - Injection Topic 2: HW module to test the efficiency of SEU mitigation schemes; how to emulate SEUs (2-step process); example; results; scrubbing rate.
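
As a rough software analogue of SEU emulation and scrubbing, the sketch below models configuration memory as a list of integer frames. The slide does not spell out the two steps of the emulation process, so this assumes the simplest reading (flip one configuration bit, then observe and repair); `inject_seu` and `scrub` are hypothetical names, not a vendor API.

```python
from typing import List

def inject_seu(config: List[int], frame: int, bit: int) -> None:
    """Emulate a single event upset by flipping one bit of one configuration frame."""
    config[frame] ^= 1 << bit

def scrub(config: List[int], golden: List[int]) -> List[int]:
    """Compare each frame with the golden copy, rewrite corrupted frames, return their indices."""
    repaired = []
    for index, (current, reference) in enumerate(zip(config, golden)):
        if current != reference:
            config[index] = reference
            repaired.append(index)
    return repaired

golden = [0b1010, 0b0110, 0b1111]   # golden configuration (illustrative values)
config = list(golden)               # live configuration memory

inject_seu(config, frame=1, bit=3)
print(bin(config[1]))               # 0b1110
print(scrub(config, golden))        # [1]
print(config == golden)             # True
```

The scrubbing rate mentioned on the slide corresponds to how often `scrub` would be invoked relative to the expected upset rate.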

Fault Diagnosis - Roving STARs: Diagnose both interconnect and PLB faults; partial reuse; future: do we allow for retest of a fault?

Fault Diagnosis - More: Abramovici BIST-based method in 2000; 2004 paper further extending Roving STARs.

Fault Diagnosis - Niamat - MATS++: Diagnose multiple stuck-at faults; use of the MATS++ algorithm; goal of speeding up diagnosis.
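
For reference, MATS++ is the march test {⇕(w0); ⇑(r0, w1); ⇓(r1, w0, r0)}. The sketch below runs it on a generic bit-addressable memory with injected stuck-at faults. How the method on this slide maps the test onto FPGA resources is not given in the transcript, so the `FaultyMemory` model and the function name are assumptions for illustration only.

```python
from typing import Dict, List, Optional

class FaultyMemory:
    """Bit-addressable memory in which selected cells are stuck at 0 or 1."""

    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None):
        self.size = size
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])

def mats_plus_plus(mem: FaultyMemory) -> List[int]:
    """Run MATS++ and return the sorted addresses where a read mismatched."""
    failing = set()
    for addr in range(mem.size):              # M0: any order, w0
        mem.write(addr, 0)
    for addr in range(mem.size):              # M1: ascending, r0 then w1
        if mem.read(addr) != 0:
            failing.add(addr)
        mem.write(addr, 1)
    for addr in reversed(range(mem.size)):    # M2: descending, r1, w0, r0
        if mem.read(addr) != 1:
            failing.add(addr)
        mem.write(addr, 0)
        if mem.read(addr) != 0:
            failing.add(addr)
    return sorted(failing)

print(mats_plus_plus(FaultyMemory(8)))                          # []
print(mats_plus_plus(FaultyMemory(8, stuck_at={2: 1, 5: 0})))   # [2, 5]
```

Because each cell is checked independently, several stuck-at cells are reported in a single pass, which fits the slide's goal of diagnosing multiple stuck-at faults.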

Fault Diagnosis - Tahoori’s Method: Diagnose a single fault in interconnect or logic; application dependent; basic idea.

Fault Tolerance: Single FPGA platform; Multi FPGA platform; Single Fault; Multiple Faults

Fault Tolerance - Single FPGA: "Dynamic Fault Tolerance via Partial Reconfiguration". Online: handles faulty PLBs without the system stopping; uses spare logic cells. Stroud et al.

Fault Tolerance - Single FPGA: "Online Fault Tolerance for FPGA Logic Blocks". Reuses defective blocks to increase the number of spares and extend mission life; uses commercial CAD tools to implement. Stroud et al.

Fault Tolerance - Single FPGA: "Using Relocatable Bitstreams for Fault Tolerance". Combines passive and active techniques; standardized relocatable modules, which are copied and stored. Montminy et al.

Fault Tolerance - Multi FPGA: "A Reliable Reconfiguration Controller for Fault-Tolerant Embedded Systems on Multi-FPGA Platforms". Multiple FPGAs in a mesh topology; hardening achieved by TMR; distributed solution. Bolchini et al.

Fault Tolerance - Single Fault: "Designing Fault Tolerant Systems into SRAM-based FPGAs". For use in space; Duplication with Comparison (DWC) and Concurrent Error Detection (CED). Lima et al.
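
Duplication with comparison can be sketched as two copies of a module whose outputs feed a comparator. The 4-bit incrementer, the stuck-at-1 output bit, and the function names below are hypothetical examples chosen for illustration, not details from the cited work.

```python
from typing import Callable, Tuple

def dwc(module_a: Callable[[int], int],
        module_b: Callable[[int], int],
        x: int) -> Tuple[int, bool]:
    """Run both copies on the same input; return (output of copy A, mismatch flag)."""
    out_a, out_b = module_a(x), module_b(x)
    return out_a, out_a != out_b

def incrementer(x: int) -> int:
    """Fault-free 4-bit incrementer."""
    return (x + 1) & 0xF

def faulty_incrementer(x: int) -> int:
    """Same module with output bit 0 stuck at 1 (hypothetical fault)."""
    return incrementer(x) | 0b0001

print(dwc(incrementer, incrementer, 3))         # (4, False)
print(dwc(incrementer, faulty_incrementer, 3))  # (4, True)
print(dwc(incrementer, faulty_incrementer, 2))  # (3, False)
```

The last call shows a limitation: DWC flags a fault only when the applied input makes the two copies disagree, and it cannot tell which copy is wrong without an additional mechanism such as CED.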

Fault Tolerance - Single Fault: "TMR and Partial Dynamic Reconfiguration to Mitigate SEU Faults in FPGAs". Passive Triple Modular Redundancy. Bolchini et al.
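
The core of TMR is a bitwise majority voter over three replica outputs. The sketch below also reports which replica disagrees with the vote, since that is the natural candidate for partial reconfiguration; the function names are hypothetical and the repair step itself is not modeled.

```python
from typing import List

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)

def disagreeing_replicas(a: int, b: int, c: int) -> List[int]:
    """Indices (0, 1, 2) of replicas whose output differs from the voted value."""
    voted = majority_vote(a, b, c)
    return [index for index, value in enumerate((a, b, c)) if value != voted]

good = 0b1011
upset = 0b1111   # replica 1 with one flipped bit (illustrative)

print(bin(majority_vote(good, upset, good)))     # 0b1011
print(disagreeing_replicas(good, upset, good))   # [1]
```

The voter masks any single faulty replica. Two replicas failing in the same bit position would outvote the correct one, which is why repairing the disagreeing replica promptly matters.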

Fault Tolerance - Single Fault: "IPR: In-Place Reconfiguration for FPGA Fault Tolerance". Preserves the function and topology of the LUT-based logic network; algorithm applied post-layout. Feng et al.

Fault Tolerance - Single Fault: "A Novel SRAM-Based FPGA Architecture for Efficient TMR Fault Tolerance Support". Architectural level; augments LUTs with TMR; minimizes the number of reconfigurations. Kyriakoulakos et al.

Fault Tolerance - Multiple Faults: "Placement of Repair Circuits for In-Field FPGA Repair". Utilizes unused FPGA resources; repair circuits identified before faults occur; alternate repair circuits cached locally or remotely. Wirthlin et al.

Fault Tolerance - Multiple Faults: "Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing". Dynamic self-adaptation; high reliability vs. high performance. Jacobs et al.

Fault Tolerance - Multiple Faults: "Exploiting Partially Defective LUTs: Why You Don’t Need Perfect Fabrication". Because of shrinking feature size, transistor variability and failure rates are going up; identifies partially defective LUTs for reuse. DeHon et al.

Conclusion: Importance of FPGAs; FPGA applications; future of FPGA fault tolerance.

Questions?