1
Self-Checking Fault Detection using Discrepancy Mirrors
Ronald F. DeMara, Carthik A. Sharma
University of Central Florida
PDPTA 2005, Las Vegas
2
Fault Handling Overview
Failure
  Manifestation of a fault
  Deviation from expected behavior
Detection
  Identify occurrence of a fault
  Fully articulating inputs
  Intermittently articulating inputs
  Methods: coding-based schemes, redundancy
Isolation
  Physical location of the fault
Figure: PCI-based card used for the Xilinx Virtex-II Pro based Autonomous Repair Testbed
3
Ideal Detection Characteristics
Faults in the detector are covered by the detector itself
  Fault-secure
  Self-testing
  No "Golden Elements"
Multiple types of faults handled by the same detector
  Transient and permanent faults
  Logic and interconnect faults
Minimum number of false positives
  Accuracy and reliability
Minimal power consumption
Verifiable correctness
Practical assessment
  Fitness assessment should be tractable
4
Discrepancy Mirror Fault Coverage
Mechanism for checking the checker (the "golden element" problem)
Makes the checker part of the configuration that competes for correctness [DeMara PDPTA-05]
5
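Discrepancy Mirror Circuit Fault Coverage
Component         | Fault scenarios (four columns)                                     | Fault-free
Function Output A | Fault        | Correct      | Correct             | Correct             | Correct
Function Output B | Correct      | Fault        | Correct             | Correct             | Correct
XNOR A            | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1)           | Agree (1)
XNOR B            | Disagree (0) | Disagree (0) | Agree (1)           | Fault: Disagree (0) | Agree (1)
Buffer A          | 0            | 0            | High-Z              | 0                   | 1
Buffer B          | 0            | 0            | 0                   | High-Z              | 1
Match Output      | 0            | 0            | 0                   | 0                   | 1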
6
Discrepancy Mirror Truth Table
A | B | XNOR A | XNOR B | ENB A | ENB B | TRI A | TRI B | MATCH
0 | 0 | 1      | 1      | 1     | 1     | 1     | 1     | 1
0 | 1 | 0      | 0      | 0     | 0     | 0     | 0     | 0
1 | 0 | 0      | 0      | 0     | 0     | 0     | 0     | 0
1 | 1 | 1      | 1      | 1     | 1     | 1     | 1     | 1
The Discrepancy Mirror truth table ensures complete coverage of the detector.
The single point of failure is reduced to a stuck-at fault exposure on the MATCH output (wired-OR).
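The truth table and the fault-coverage table can be checked with a small behavioral model. The sketch below is not the authors' netlist: it assumes each tri-state buffer passes one XNOR gate's output and is enabled by the other XNOR gate, and that the wired-OR MATCH line is pulled down to 0 when undriven. That wiring is inferred from the High-Z entries in the fault-coverage table; the function names and the `stuck_xnor_a` / `stuck_xnor_b` parameters are hypothetical.

```python
from itertools import product

HIGH_Z = None  # models an undriven tri-state output


def xnor(a, b):
    """Return 1 when the two bits agree, 0 otherwise."""
    return 1 if a == b else 0


def tri_state(data, enable):
    """Pass `data` when enabled, otherwise leave the output undriven."""
    return data if enable == 1 else HIGH_Z


def discrepancy_mirror(a, b, stuck_xnor_a=None, stuck_xnor_b=None):
    """Behavioral model of the MATCH output.

    stuck_xnor_a / stuck_xnor_b (hypothetical parameters) force an XNOR
    output to 0 or 1 to emulate a stuck-at fault inside the checker.
    """
    xa = xnor(a, b) if stuck_xnor_a is None else stuck_xnor_a
    xb = xnor(a, b) if stuck_xnor_b is None else stuck_xnor_b
    # Assumed cross-coupling: each buffer carries one XNOR output and is
    # enabled by the other XNOR output.
    tri_a = tri_state(data=xb, enable=xa)
    tri_b = tri_state(data=xa, enable=xb)
    driven = [v for v in (tri_a, tri_b) if v is not HIGH_Z]
    # Wired-OR with pull-down: the line reads 1 only if a buffer drives 1.
    return 1 if 1 in driven else 0


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, discrepancy_mirror(a, b))
```

Under these assumptions a stuck-at-0 fault in either XNOR gate forces MATCH to 0, and a stuck-at-1 fault in either gate still lets a real disagreement pull MATCH to 0, which is the checker-covers-itself behavior the slide describes.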
7
Discrepancy-Enabled Isolation
8
Discrepancy Mirror Approach
Selection phase
  Two candidates chosen from the population
  Candidates use mutually exclusive resources
  Computation is carried out in tandem
Detection phase
  Discrepancy Mirror compares outputs
  MATCH output signifies fault-free configurations
  Faults in the detector are also covered
Preference adjustment process
  Detector output over time indicates relative fitness
  Relative fitness can be used to choose candidates
9
CRR Arrangement in SRAM FPGA
Configurations in population: C = C_L ∪ C_R
  C_L = subset of left-half configurations
  C_R = subset of right-half configurations
  |C_L| = |C_R| = |C|/2
Discrepancy operator
  The baseline discrepancy operator is a dyadic operator with binary output
  Z(C_i) is the FPGA data throughput output of configuration C_i
  Each half-configuration is evaluated using an embedded checker (XNOR gate) within each individual
  Any fault in the checker lowers that individual's fitness, so the individual is no longer preferred and eventually undergoes repair
  Operator variants: RS (Hamming distance), WTA (equivalence)
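The operator formulas did not survive in this transcript, so the following is only a sketch of what a dyadic discrepancy operator on two output words might look like. It assumes Z(C_i) can be treated as a non-negative integer word, that the equivalence form returns 1 on disagreement (the complement of the XNOR checker's MATCH), and that the Hamming-distance form counts differing bits. The function names are hypothetical.

```python
def equivalence_discrepancy(z_left, z_right):
    """Binary output: 0 when the two output words agree, 1 when they differ.

    The polarity is an assumption; an XNOR-based checker would report the
    complement (1 on agreement).
    """
    return 0 if z_left == z_right else 1


def hamming_discrepancy(z_left, z_right):
    """Number of bit positions in which the two output words differ."""
    return bin(z_left ^ z_right).count("1")


if __name__ == "__main__":
    z_l, z_r = 0b1010, 0b0110  # example output words, not measured data
    print(equivalence_discrepancy(z_l, z_r), hamming_discrepancy(z_l, z_r))
```

The equivalence form only says whether a pairing disagreed, while the Hamming form also grades how badly; either could feed the fitness adjustment described on the slide.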
10
Overview of FPGA operation
Competing configurations
  Configurations A and B are physically distinct
  C_A = subset consisting of 'A' configurations
  C_B = subset consisting of 'B' configurations
  |C_A| = |C_B| = |C|/2
Discrepancy operator
  The baseline discrepancy operator is a dyadic operator with binary output
  Z(C_i) is the FPGA data throughput output of configuration C_i
  Each half-configuration is evaluated using an embedded checker (XNOR gate) within each individual
  Any fault in the checker or functional logic lowers the fitness of the resources used by that individual, leading to isolation
Figure: SRAM-based FPGA holding Configuration A (Function Logic A, Discrepancy Mirror A) and Configuration B (Function Logic B, Discrepancy Mirror B), driven by the Reconfiguration Algorithm and an off-chip EEPROM; labeled signals are configuration bit stream, input data, data output, feedback, and control
Note: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip
11
Discrepancy Mirror Schematic: CMOS
PSpice schematic
  44 p- and n-channel MOS transistors
  1.5 micron minimum width
  600 nm length
  Width of p-MOS transistors = 3 x width of n-MOS transistors
12
Discrepancy Mirror Schematic: Xilinx
Xilinx schematic
  Virtex-II Pro FPGA
  ModelSim-II simulator
  Emulated (digital) pull-down resistor
13
Discrepancy Mirror Simulation: CMOS
Circuit transient response
  Behavior conforms to specifications
  Correct identification of discrepancy
14
Discrepancy Mirror Simulation: Xilinx
ModelSim-II circuit response
  Output is 'High' (1) when input q1 == q2
  Output is 'Low' when input q1 != q2
  In Xilinx FPGAs, 'Low' is not exactly equal to zero, but is nevertheless a logic 'zero'
15
Fault Location Experiments
Two experiments conducted
  C-language program simulator
  Locate fault by successive intersections of v-subsets, or groups, of resources
  Fault identified after m comparisons; what is the value of m?
  Identify number of iterations required to identify a single fault
  Random inputs, single stuck-at fault
  Expected number of pairings over 100 simulations
  One 'resource' is equivalent to one CLB (> 10 gates)
Experiment 1
  Perpetually articulating inputs
Experiment 2
  Intermittently articulating inputs
16
Fault Location Using Dueling
Let U denote the set of all logic resources on the FPGA
Let S denote the pool of resources suspected of being faulty
Initially [expression missing in source]
[symbol missing in source] denotes the set of resources used by the i-th configuration
To isolate the fault, m successive intersections [expression missing in source] are performed, at the end of which |S| = 1
With pre-designed partitions to achieve maximal isolation, isolation can be completed in 2n iterations, where n = | |
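The successive-intersection idea can be sketched in a few lines. Since the slide's formulas are missing from this transcript, the sketch assumes that the suspect pool S starts out equal to U and that each discrepant pairing contributes the set of resources it used; the function and variable names are hypothetical.

```python
def isolate_by_intersection(all_resources, discrepant_resource_sets):
    """Narrow the suspect pool by intersecting it with each resource set
    that was in use when a discrepancy was observed.

    Returns the remaining suspects and the number of intersections (m)
    performed before the pool shrank to a single resource.
    """
    suspects = set(all_resources)  # assumption: S initially equals U
    comparisons = 0
    for used in discrepant_resource_sets:
        suspects &= set(used)
        comparisons += 1
        if len(suspects) == 1:
            break
    return suspects, comparisons


if __name__ == "__main__":
    universe = range(8)
    # Illustrative resource sets; each one contains the faulty resource 5.
    observed = [{4, 5, 6, 7}, {1, 3, 5, 7}, {0, 1, 4, 5}]
    print(isolate_by_intersection(universe, observed))
```

In this example each intersection halves the pool (8, 4, 2, 1), which is the kind of behavior pre-designed partitions aim for; randomly chosen resource sets generally need more pairings.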
17
Analysis with Perpetually Articulating Inputs
Perpetually articulating inputs
  No observed discrepancy implies fault-free resources
Best case (50% utilized capacity)
  11.1 pairings for 1,000 resources
  17.6 pairings for 100,000 resources
Most demanding case
  63.7 pairings for 100,000 resources with 5% capacity utilization
18
Analysis with Intermittently Articulating Inputs
Intermittently articulating inputs
  Inputs may be such that the fault is not articulated at the outputs
  No observed discrepancy does not imply fault-free resources
  Only discrepant outputs provide fault-location information
  50% of the inputs articulate the fault
Best case (45% utilized capacity)
  42 pairings for 1,000 resources
  64.1 pairings for 100,000 resources
Most demanding case
  478 pairings for 100,000 resources with 95% capacity utilization
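The difference between the two input models can be explored with a small Monte Carlo sketch. It is not the authors' C simulator and is not expected to reproduce the pairing counts above. It assumes a single stuck-at fault, that each pairing uses a uniformly random subset holding a `utilization` fraction of all resources, and that a pairing which uses the faulty resource shows a discrepancy with probability `articulation_prob`. All names and parameters are hypothetical.

```python
import random


def pairings_to_isolate(num_resources, utilization, articulation_prob, rng):
    """Count pairings until exactly one suspect resource remains."""
    resources = range(num_resources)
    faulty = rng.randrange(num_resources)
    suspects = set(resources)
    # Keep the subset size strictly between 0 and num_resources so that
    # every informative pairing can shrink the suspect pool.
    k = min(num_resources - 1, max(1, round(num_resources * utilization)))
    pairings = 0
    while len(suspects) > 1:
        used = set(rng.sample(resources, k))
        pairings += 1
        if faulty in used:
            if rng.random() < articulation_prob:
                suspects &= used  # discrepancy observed: implicate `used`
            # otherwise the fault was not articulated: no information
        elif articulation_prob >= 1.0:
            # Only with perpetually articulating inputs does the absence
            # of a discrepancy exonerate the resources that were used.
            suspects -= used
    return pairings


if __name__ == "__main__":
    rng = random.Random(2005)
    trials = 100
    for label, p in (("perpetual", 1.0), ("intermittent", 0.5)):
        mean = sum(pairings_to_isolate(1000, 0.5, p, rng)
                   for _ in range(trials)) / trials
        print(label, mean)
```

With perpetually articulating inputs every pairing either implicates or exonerates the resources in use, so the suspect pool shrinks on each step. With intermittent articulation only the discrepant pairings narrow the pool, which is why the expected number of pairings grows.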
19
Experimental Results Summary
Number of iterations to detect faults depends on utilized capacity
  Designs that utilize only a very few resources ([lower bound missing in source]) or most (> 80%) of the resources on the FPGA pose difficult isolation problems
  Each intersection exonerates (implicates) fewer individual resources
Method scales well
  11.1, 14.9, and 17.6 pairings required for 1,000, 10,000, and 100,000 resources, respectively
  Sub-linear increase in location time
Current work
  Competitive Runtime Reconfiguration (CRR) framework under development, which will utilize the methods outlined
  Investigation of Competitive Group Testing methods to enable faster fault isolation
  Analysis of isolation characteristics, dependency on parameters, and optimal partitioning methods
20
Backup Slides Follow
21
Accommodating Multi-bit Word Widths
Proof of concept
  The present circuit works efficiently
  Demonstrates the important Dueling-enabled isolation method
Strategies
  Use an array of detectors and attempt to minimize points of failure as word width increases
  Number of logic resources used is acceptable for smaller circuits
  Create a new circuit or scheme, combining fault-tolerant coding-based methods with a single-fault-secure circuit
  Current research is focused on improving the detector by investigating codes and fault-secure circuits
22
Pull-down Resistor Considerations
Proof of concept
  The present circuit works in a verifiably correct manner
  Can utilize synthesized (digital) pull-down resistors, which simulate the behavior of analog resistors
  Demonstrates the Dueling-enabled isolation method
  Can be utilized without implementation problems in custom VLSI designs
Alternative approach
  Alternate detector circuits for FPGA implementation are under investigation
  Avoid tri-state buffers and pull-down resistors; use native digital components available on FPGAs