Self-Checking Fault Detection using Discrepancy Mirrors. Ronald F. DeMara, Carthik A. Sharma, University of Central Florida.

Presentation transcript:

Self-Checking Fault Detection using Discrepancy Mirrors
Ronald F. DeMara, Carthik A. Sharma
University of Central Florida
PDPTA 2005, Las Vegas

Fault Handling Overview
- Failure
  - Manifestation of a fault
  - Deviation from expected behavior
- Detection
  - Identify occurrence of fault
    - Fully articulating inputs
    - Intermittently articulating inputs
  - Methods
    - Coding-based schemes
    - Redundancy
- Isolation
  - Physical location of fault
Figure: PCI-based card used for the Xilinx Virtex-II Pro based autonomous repair testbed

Ideal Detection Characteristics
- Faults in the detector are covered by the detector itself
  - Fault-secure
  - Self-testing
  - No "golden elements"
- Multiple types of faults handled by the same detector
  - Transient and permanent faults
  - Logic and interconnect faults
- Minimum number of false positives
  - Accuracy and reliability
- Minimal power consumption
- Verifiable correctness
- Practical assessment
  - Fitness assessment should be tractable

Discrepancy Mirror
- Fault coverage mechanism for checking the checker (the "golden element" problem)
- Makes the checker part of the configuration that competes for correctness [DeMara PDPTA-05]

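Discrepancy Mirror Circuit Fault Coverage

Component         | Fault scenario 1 | Fault scenario 2 | Fault scenario 3    | Fault scenario 4    | Fault-free
Function Output A | Fault            | Correct          | Correct             | Correct             | Correct
Function Output B | Correct          | Fault            | Correct             | Correct             | Correct
XNOR A            | Disagree (0)     | Disagree (0)     | Fault: Disagree (0) | Agree (1)           | Agree (1)
XNOR B            | Disagree (0)     | Disagree (0)     | Agree (1)           | Fault: Disagree (0) | Agree (1)
Buffer A          | 0                | 0                | High-Z              | 0                   | 1
Buffer B          | 0                | 0                | 0                   | High-Z              | 1
Match Output      | 0                | 0                | 0                   | 0                   | 1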
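The fault coverage behavior can be explored with a small logic-level model. The sketch below is illustrative only and rests on an assumed wiring that the slide does not spell out: each tri-state buffer passes one XNOR output and is enabled by the other XNOR output, and the two buffer outputs are tied together on a wired-OR line with a pull-down. The function names and the `stuck_at` parameters are hypothetical. Under these assumptions the model reproduces the Match Output row: MATCH is 1 only when both function outputs agree and both XNOR gates are fault-free.

```python
from typing import Optional

HIGH_Z = None  # sentinel for an undriven (high-impedance) buffer output


def xnor(a: int, b: int, stuck_at: Optional[int] = None) -> int:
    """XNOR gate; `stuck_at` (hypothetical knob) forces a stuck-at-0/1 fault."""
    if stuck_at is not None:
        return stuck_at
    return 1 if a == b else 0


def tristate(data: int, enable: int) -> Optional[int]:
    """Tri-state buffer: drives `data` when enabled, otherwise floats."""
    return data if enable == 1 else HIGH_Z


def discrepancy_mirror(out_a: int, out_b: int,
                       xnor_a_stuck: Optional[int] = None,
                       xnor_b_stuck: Optional[int] = None) -> int:
    """Return MATCH for one-bit function outputs under the ASSUMED wiring."""
    xa = xnor(out_a, out_b, xnor_a_stuck)
    xb = xnor(out_a, out_b, xnor_b_stuck)
    # Assumption: the buffers are cross-coupled, so each XNOR gates the other.
    buf_a = tristate(data=xb, enable=xa)
    buf_b = tristate(data=xa, enable=xb)
    # Wired-OR with pull-down: the line is 1 only if some buffer drives a 1.
    driven = [v for v in (buf_a, buf_b) if v is not HIGH_Z]
    return 1 if any(v == 1 for v in driven) else 0


if __name__ == "__main__":
    print("fault-free        :", discrepancy_mirror(1, 1))
    print("function A faulty :", discrepancy_mirror(0, 1))
    print("XNOR A stuck-at-0 :", discrepancy_mirror(1, 1, xnor_a_stuck=0))
    print("XNOR B stuck-at-0 :", discrepancy_mirror(1, 1, xnor_b_stuck=0))
```

In this model a stuck-at-1 XNOR gate cannot mask a real disagreement either, because the other XNOR gate still drives a 0 onto the shared line.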

Discrepancy Mirror Truth Table
Columns: A, B, XNOR A, XNOR B, ENB A, ENB B, TRI A, TRI B, MATCH
- The truth table ensures complete coverage of the detector.
- The single point of failure is reduced to a stuck-at fault exposure on the MATCH output (wired-OR).

Discrepancy-Enabled Isolation

Discrepancy Mirror Approach
- Selection phase
  - Two candidates are chosen from the population
  - Candidates use mutually exclusive resources
  - Computation is carried out in tandem
- Detection phase
  - Discrepancy Mirror compares outputs
  - MATCH output signifies fault-free configurations
  - Faults in the detector are also covered
- Preference adjustment process
  - Detector output over time indicates relative fitness
  - Relative fitness can be used to choose candidates
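The preference adjustment idea can be illustrated with a toy bookkeeping routine. The slides do not give the update rule, so everything below is a hypothetical sketch: the names `adjust_fitness` and `most_preferred`, and the choice of a fixed reward and penalty, are assumptions made only to show how detector output accumulated over time could rank candidates.

```python
from typing import Dict, Iterable


def adjust_fitness(fitness: Dict[str, int], left: str, right: str,
                   match: int, reward: int = 1, penalty: int = 1) -> None:
    """Update both paired candidates from one MATCH observation.

    Assumption: a MATCH of 1 rewards both candidates and a MATCH of 0
    penalizes both, since a single pairing cannot tell which one is faulty.
    """
    delta = reward if match == 1 else -penalty
    fitness[left] += delta
    fitness[right] += delta


def most_preferred(fitness: Dict[str, int], pool: Iterable[str]) -> str:
    """Pick the candidate with the highest accumulated fitness in `pool`."""
    return max(pool, key=lambda name: fitness[name])


if __name__ == "__main__":
    fitness = {"L0": 0, "L1": 0, "R0": 0, "R1": 0}
    # Hypothetical history: L1 uses a faulty resource, so its pairings disagree.
    for left, right, match in [("L0", "R0", 1), ("L1", "R0", 0),
                               ("L1", "R1", 0), ("L0", "R1", 1)]:
        adjust_fitness(fitness, left, right, match)
    print(fitness)
    print(most_preferred(fitness, ["L0", "L1"]))
```

Over repeated pairings, a candidate that uses a faulty resource accumulates penalties in every pairing, while fault-free candidates are penalized only when paired with it.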

CRR Arrangement in SRAM FPGA
Configurations in population
- C = C_L ∪ C_R
- C_L = subset of left-half configurations
- C_R = subset of right-half configurations
- |C_L| = |C_R| = |C|/2
Discrepancy operator
- The baseline discrepancy operator is a dyadic operator with binary output
- Z(C_i) is the FPGA data throughput output of configuration C_i
- Each half-configuration evaluates the operator using an embedded checker (XNOR gate) within each individual
- Any fault in the checker lowers that individual's fitness, so that individual is no longer preferred and eventually undergoes repair
- RS: Hamming distance
- WTA: equivalence
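The two discrepancy measures named on this slide, Hamming distance and equivalence, can be compared on a pair of output words. The exact formulas were lost from the slide, so the sketch below is an assumption-laden illustration: the function names, the integer encoding of the outputs Z(C_i), and the explicit `width` parameter are all hypothetical.

```python
def hamming_discrepancy(z_i: int, z_j: int, width: int) -> int:
    """Number of bit positions in which two `width`-bit outputs differ."""
    mask = (1 << width) - 1
    return bin((z_i ^ z_j) & mask).count("1")


def equivalence_discrepancy(z_i: int, z_j: int, width: int) -> int:
    """Binary discrepancy: 0 if the outputs are identical, 1 otherwise."""
    return 0 if hamming_discrepancy(z_i, z_j, width) == 0 else 1


if __name__ == "__main__":
    a, b = 0b1010, 0b0110
    print(hamming_discrepancy(a, b, width=4))
    print(equivalence_discrepancy(a, b, width=4))
    print(equivalence_discrepancy(a, a, width=4))
```

For one-bit outputs the two measures coincide, and both are the complement of the XNOR checker's output.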

Overview of FPGA Operation
Competing configurations
- Configurations A and B are physically distinct
- C_A = subset consisting of 'A' configurations
- C_B = subset consisting of 'B' configurations
- |C_A| = |C_B| = |C|/2
Discrepancy operator
- The baseline discrepancy operator is a dyadic operator with binary output
- Z(C_i) is the FPGA data throughput output of configuration C_i
- Each half-configuration evaluates the operator using an embedded checker (XNOR gate) within each individual
- Any fault in the checker or functional logic lowers the fitness of the resources used by that individual, leading to isolation
Figure labels: Reconfiguration Algorithm; SRAM-based FPGA; Configuration A; Configuration B; Function Logic A; Function Logic B; Discrepancy Mirror A; Discrepancy Mirror B; Input Data; Data Output; Feedback; Control; Configuration Bit Stream; Off-chip EEPROM
Note: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.

Discrepancy Mirror Schematic: CMOS
PSpice schematic
- 44 p- and n-channel MOS transistors
- 1.5 micron minimum width
- 600 nm length
- Width of p-MOS transistors = 3 × width of n-MOS transistors

Discrepancy Mirror Schematic: Xilinx
Xilinx schematic
- Virtex-II Pro FPGA
- ModelSim-II simulator
- Emulated (digital) pull-down resistor

Discrepancy Mirror Simulation: CMOS
Circuit transient response
- Behavior conforms to specifications
- Correct identification of discrepancy

Discrepancy Mirror Simulation: Xilinx
ModelSim-II circuit response
- Output is 'High' (1) when input q1 == q2
- Output is 'Low' when input q1 != q2. In Xilinx FPGAs, 'Low' is not exactly equal to zero, but is a logic 'zero' nevertheless.

Fault Location Experiments
- Two experiments conducted
  - C-language program simulator
  - Locate fault by successive intersections
    - v-subsets or groups of resources
    - Fault identified after m comparisons: what is the value of m?
  - Identify the number of iterations required to identify a single fault
  - Random inputs, single stuck-at fault
  - Expected number of pairings over 100 simulations
  - One 'resource' is equivalent to one CLB (> 10 gates)
- Experiment 1
  - Perpetually articulating inputs
- Experiment 2
  - Intermittently articulating inputs

Fault Location Using Dueling
- Let U denote the set of all logic resources on the FPGA
- Let S denote the pool of resources suspected of being faulty
- Initially denotes the set of resources used by the i-th configuration
- To isolate the fault, m successive intersections are performed, at the end of which |S| = 1
- With pre-designed partitions to achieve maximal isolation, isolation can be completed in 2n iterations, where n = | |

Analysis with Perpetually Articulating Inputs
- No observed discrepancy implies fault-free resources
- Best case (50% utilized capacity):
  - 11.1 pairings for 1,000 resources
  - 17.6 pairings for 100,000 resources
- Most demanding case: 63.7 pairings for 100,000 resources with 5% capacity utilization

Analysis with Intermittently Articulating Inputs
- Inputs may be such that the fault is not articulated at the outputs
- No observed discrepancy does not imply fault-free resources
- Only discrepant outputs provide fault-location information
- Best case (45% utilized capacity):
  - 42 pairings for 1,000 resources
  - 64.1 pairings for 100,000 resources
- Most demanding case: 478 pairings for 100,000 resources with 95% capacity utilization
- 50% of the inputs articulate the fault
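Fault location by successive intersections, under both input models, can be sketched as a small Monte Carlo routine. This is not the authors' C-language simulator. It is a simplified illustration with several assumptions: each pairing uses a uniformly random set of resources whose size is `utilization` times the total, a discrepancy is observed only if the faulty resource is in use and the inputs articulate the fault, and the names `locate_fault`, `articulation_prob`, and `max_pairings` are hypothetical. It is not expected to reproduce the pairing counts reported on the slides.

```python
import random
from typing import Set, Tuple


def locate_fault(num_resources: int, utilization: float, faulty: int,
                 articulation_prob: float = 1.0, seed: int = 0,
                 max_pairings: int = 1_000_000) -> Tuple[Set[int], int]:
    """Shrink the suspect pool S until a single resource remains.

    Requires 0 < utilization < 1. Returns (suspect set, pairings used).
    """
    rng = random.Random(seed)
    used_count = max(1, round(utilization * num_resources))
    suspects = set(range(num_resources))  # assumption: S starts as all of U
    pairings = 0
    while len(suspects) > 1:
        if pairings >= max_pairings:
            raise RuntimeError("fault not isolated within max_pairings")
        pairings += 1
        # Resources used by the two competing configurations in this pairing.
        used = set(rng.sample(range(num_resources), used_count))
        discrepancy = faulty in used and rng.random() < articulation_prob
        if discrepancy:
            # A discrepant pairing implicates only the resources it used.
            suspects &= used
        elif articulation_prob >= 1.0:
            # Perpetually articulating inputs: no discrepancy exonerates them.
            suspects -= used
        # Intermittent inputs: a clean pairing gives no location information.
    return suspects, pairings


if __name__ == "__main__":
    print(locate_fault(1000, 0.5, faulty=123))
    print(locate_fault(1000, 0.5, faulty=123, articulation_prob=0.5))
```

The only difference between the two experiments in this sketch is the `elif` branch: with intermittently articulating inputs, a pairing without a discrepancy leaves the suspect pool unchanged, which is why more pairings are needed.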

Experimental Results Summary
- Number of iterations to detect faults depends on utilized capacity
  - Designs that utilize only very few of the resources on the FPGA, or more than 80% of them, pose difficult isolation problems
  - Each intersection exonerates (implicates) fewer individual resources
- Method scales well
  - 11.1, 14.9, and 17.6 pairings required for 1,000, 10,000, and 100,000 resources
  - Sub-linear increase in location time
- Current work
  - Competitive Runtime Reconfiguration (CRR) framework under development, which will utilize the methods outlined
  - Investigation of Competitive Group Testing methods to enable faster fault isolation
  - Analysis of characteristics of isolation, dependency on parameters, and optimal partitioning methods
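The sub-linear scaling claim can be checked with a few lines of arithmetic on the reported figures. The sketch below only tabulates the three reported data points next to log2 of the resource count. The comparison with log2 is an illustrative choice, not something the slides state, and `growth_table` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Tuple

# Pairings reported on the slide for 50% utilized capacity.
REPORTED: Dict[int, float] = {1_000: 11.1, 10_000: 14.9, 100_000: 17.6}


def growth_table(reported: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (resources, reported pairings, log2(resources)) rows."""
    return [(n, p, math.log2(n)) for n, p in sorted(reported.items())]


if __name__ == "__main__":
    for n, pairings, log_n in growth_table(REPORTED):
        print(f"{n:>7} resources: {pairings:5.1f} pairings, log2 = {log_n:5.2f}")
```

The resource count grows by a factor of 100 across the three points while the pairings grow by a factor of about 1.6, which is consistent with roughly logarithmic growth.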

Backup Slides Follow

Accommodating Multi-bit Word Widths
- Proof of concept
  - The present circuit works efficiently
  - Demonstrates the important Dueling-enabled isolation method
- Strategies
  - Use an array of detectors, attempting to minimize points of failure as word width increases
    - Number of logic resources used is acceptable for smaller circuits
  - Create a new circuit or scheme, combining fault-tolerant coding-based methods with a single-fault-secure circuit
  - Current research is focused on improving the detector by investigating codes and fault-secure circuits

Pull-down Resistor Considerations
- Proof of concept
  - The present circuit works in a verifiably correct manner
  - Can utilize synthesized (digital) pull-down resistors that simulate the behavior of analog resistors
  - Demonstrates the Dueling-enabled isolation method
  - Can be utilized without implementation problems for custom VLSI designs
- Alternative approach
  - Alternate detector circuits for FPGA implementation are under investigation
  - Avoid tri-state buffers and pull-down resistors, and use native digital components available on FPGAs