Ronald F. DeMara, Carthik A. Sharma, University of Central Florida. Self-Checking Fault Detection using Discrepancy Mirrors.

1 Self-Checking Fault Detection using Discrepancy Mirrors
  Ronald F. DeMara, Carthik A. Sharma
  University of Central Florida
  PDPTA 2005, Las Vegas

2 Fault Handling Overview
  Failure
    - Manifestation of a fault
    - Deviation from expected behavior
  Detection
    - Identify occurrence of fault
        Fully articulating inputs
        Intermittently articulating inputs
    - Methods
        Coding-based schemes
        Redundancy
  Isolation
    - Physical location of fault
  [Figure: PCI-based card used for the Xilinx Virtex-II Pro based Autonomous Repair Testbed]

3 Ideal Detection Characteristics
  Faults in the detector are covered by itself
    - Fault-secure
    - Self-testing
    - No "golden elements"
  Multiple types of faults handled by same detector
    - Transient and permanent faults
    - Logic and interconnect faults
  Minimum number of false positives
    - Accuracy and reliability
  Minimal power consumption
  Verifiable correctness
  Practical assessment
    - Fitness assessment should be tractable

4 Discrepancy Mirror Fault Coverage Mechanism for Checking-the-Checker (“golden element” problem) Makes checker part of configuration that competes for correctness [DeMara PDPTA-05]

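5 Discrepancy Mirror Circuit Fault Coverage

  Component         | Fault Scenarios                                                               | Fault-Free
  ------------------+--------------+--------------+---------------------+---------------------+-----------
  Function Output A | Fault        | Correct      | Correct             | Correct             | Correct
  Function Output B | Correct      | Fault        | Correct             | Correct             | Correct
  XNOR A            | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1)           | Agree (1)
  XNOR B            | Disagree (0) | Disagree (0) | Agree (1)           | Fault: Disagree (0) | Agree (1)
  Buffer A          | 0            | 0            | High-Z              | 0                   | 1
  Buffer B          | 0            | 0            | 0                   | High-Z              | 1
  Match Output      | 0            | 0            | 0                   | 0                   | 1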

6 Discrepancy Mirror Truth Table

  A  B  XNOR A  XNOR B  ENB A  ENB B  TRI A  TRI B  MATCH
  0  0    1       1       1      1      1      1      1
  0  1    0       0       0      0      0      0      0
  1  0    0       0       0      0      0      0      0
  1  1    1       1       1      1      1      1      1

  - The Discrepancy Mirror truth table ensures complete coverage of the detector.
  - The single point of failure is reduced to a stuck-at fault exposure for the MATCH output (wired-OR).
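The truth table above can be reproduced with a small behavioral model. This is only a sketch under stated assumptions: the slides do not spell out the wiring, so the model assumes each tri-state buffer is enabled by its own XNOR gate and passes the other XNOR gate's output, and that the wired-OR line with pull-down reads 0 unless some buffer drives 1. The names `discrepancy_mirror`, `xnor_a_fault`, and `xnor_b_fault` are illustrative, not taken from the presentation.

```python
from itertools import product

HIGH_Z = None  # models the floating output of a disabled tri-state buffer


def xnor(a, b, stuck_at=None):
    """XNOR gate; stuck_at (0 or 1) forces the output to model a stuck-at fault."""
    return stuck_at if stuck_at is not None else int(a == b)


def tri_state(data, enable):
    """Tri-state buffer: passes data when enabled, otherwise floats."""
    return data if enable else HIGH_Z


def discrepancy_mirror(a, b, xnor_a_fault=None, xnor_b_fault=None):
    """Return MATCH for function outputs a and b (assumed cross-coupled wiring)."""
    xa = xnor(a, b, xnor_a_fault)
    xb = xnor(a, b, xnor_b_fault)
    tri_a = tri_state(data=xb, enable=xa)  # assumption: own enable, other gate's data
    tri_b = tri_state(data=xa, enable=xb)
    # Wired-OR with pull-down: the line is 1 only if some buffer drives 1.
    return int(tri_a == 1 or tri_b == 1)


# MATCH column of the truth table for all four input combinations.
truth_table = {(a, b): discrepancy_mirror(a, b) for a, b in product((0, 1), repeat=2)}
```

In this model, a stuck-at-0 fault on either XNOR gate forces MATCH to 0 even when the inputs agree, and a stuck-at-1 fault on one gate still leaves MATCH at 0 when the inputs disagree, which is the checker-covers-itself behavior the slide describes.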

7 Discrepancy-Enabled Isolation

8 Discrepancy Mirror Approach
  Selection Phase
    - Two candidates chosen from population
    - Use mutually exclusive resources
    - Carry out computation in tandem
  Detection Phase
    - Discrepancy Mirror compares outputs
    - MATCH output signifies fault-free configurations
    - Faults in the detector also covered
  Preference Adjustment Process
    - Detector output over time indicates relative fitness
    - Relative fitness can be used to choose candidates

9 CRR Arrangement in SRAM FPGA
  Configurations in Population
    - C = C_L ∪ C_R
    - C_L = subset of left-half configurations
    - C_R = subset of right-half configurations
    - |C_L| = |C_R| = |C|/2
  Discrepancy Operator
    - The baseline discrepancy operator is a dyadic operator with binary output
    - Z(C_i) is the FPGA data throughput output of configuration C_i
    - Each half-configuration evaluates the operator using an embedded checker (XNOR gate) within each individual
    - Any fault in the checker lowers that individual's fitness, so that individual is no longer preferred and eventually undergoes repair
    - RS: Hamming distance
    - WTA: equivalence
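The discrepancy measures named on this slide can be sketched as follows. The slide's own formulas did not survive in the transcript, so this is an assumption-laden illustration: outputs Z(C_i) are modeled as unsigned integer words, and the function names are hypothetical.

```python
def baseline_discrepancy(z_left: int, z_right: int) -> int:
    """Binary-output dyadic operator: 1 if the two outputs differ, else 0 (assumed polarity)."""
    return int(z_left != z_right)


def hamming_distance(z_left: int, z_right: int) -> int:
    """Number of bit positions in which the two output words differ."""
    return bin(z_left ^ z_right).count("1")


def equivalent(z_left: int, z_right: int) -> bool:
    """Equivalence check: True only when the two output words are identical."""
    return z_left == z_right
```

The Hamming distance gives a graded measure of disagreement across a multi-bit word, while equivalence collapses the comparison to a single agree/disagree result.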

10 Overview of FPGA operation
  Competing Configurations
    - Configurations A and B are physically distinct
    - C_A = subset consisting of 'A' configurations
    - C_B = subset consisting of 'B' configurations
    - |C_A| = |C_B| = |C|/2
  Discrepancy Operator
    - The baseline discrepancy operator is a dyadic operator with binary output
    - Z(C_i) is the FPGA data throughput output of configuration C_i
    - Each half-configuration evaluates the operator using an embedded checker (XNOR gate) within each individual
    - Any fault in the checker or functional logic lowers the fitness of resources used by that individual, leading to isolation
  [Figure: SRAM-based FPGA holding Configuration A (Function Logic A, Discrepancy Mirror A) and Configuration B (Function Logic B, Discrepancy Mirror B). Labels: Reconfiguration Algorithm, Configuration Bit Stream, Input Data, Data Output, Feedback, Control, Off-Chip EEPROM]
  NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.

11 Discrepancy Mirror Schematic: CMOS
  PSpice schematic
    - 44 p- and n-channel MOS transistors
    - 1.5 micron minimum width
    - 600 nm length
    - Width of pMOS transistors = 3 × width of nMOS transistors

12 Discrepancy Mirror Schematic: Xilinx
  Xilinx schematic
    - Virtex-II Pro FPGA
    - ModelSim-II simulator
    - Emulated (digital) pull-down resistor

13 Discrepancy Mirror Simulation: CMOS
  Circuit transient response
    - Behavior conforms to specifications
    - Correct identification of discrepancy

14 Discrepancy Mirror Simulation: Xilinx
  ModelSim-II circuit response
    - Output 'High' == 1 when input q1 == q2
    - Output 'Low' when input q1 != q2. In Xilinx FPGAs, 'Low' is not exactly equal to zero, but is a logic 'zero' nevertheless.

15 Fault Location Experiments
  Two experiments conducted
    - C-language program simulator
    - Locate fault by successive intersections of v-subsets or groups of resources
        Fault identified after m comparisons: what is the value of m?
    - Identify number of iterations required to identify a single fault
    - Random inputs, single stuck-at fault
    - Expected number of pairings over 100 simulations
    - One 'resource' equivalent to one CLB (> 10 gates)
  Experiment 1
    - Perpetually articulating inputs
  Experiment 2
    - Intermittently articulating inputs

16 Fault Location Using Dueling
  - Let U denote the set of all logic resources on the FPGA
  - Let S denote the pool of resources suspected of being faulty
  - Initially …; … denotes the set of resources used by the i-th configuration
  - To isolate the fault, m successive intersections are performed, at the end of which |S| = 1
  - With pre-designed partitions to achieve maximal isolation, isolation can be completed in 2n iterations, where n = | … |
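The successive-intersection idea on this slide can be sketched in a few lines. Several symbols were lost from the transcript, so the sketch makes its own assumptions: the suspect pool S starts as the full resource set U, and each comparison that shows a discrepancy contributes the set of resources it exercised. The function name `isolate` and the example sets are hypothetical.

```python
def isolate(all_resources, discrepant_sets):
    """Intersect the suspect pool with each discrepant resource set until one suspect remains.

    Returns the final suspect set and the number of intersections performed (m).
    """
    suspects = set(all_resources)  # assumption: every resource is suspect at the start
    intersections = 0
    for used in discrepant_sets:
        if len(suspects) == 1:
            break
        suspects &= set(used)
        intersections += 1
    return suspects, intersections


# Example: 8 resources, resource 5 is faulty, so it appears in every discrepant set.
suspects, m = isolate(range(8), [{4, 5, 6, 7}, {1, 3, 5, 7}, {0, 1, 4, 5}])
```

Each discrepant set here splits the remaining suspects in half, so three intersections are enough to narrow eight resources down to one.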

17 Analysis with Perpetually Articulating Inputs
  Perpetually articulating inputs
    - No observed discrepancy implies fault-free resources
  Best case (50% utilized capacity)
    - 11.1 pairings for 1,000 resources
    - 17.6 pairings for 100,000 resources
  Most demanding case
    - 63.7 pairings for 100,000 resources with 5% capacity utilization

18 Analysis with Intermittently Articulating Inputs
  Intermittently articulating inputs
    - Inputs may be such that the fault is not articulated at the outputs
    - No observed discrepancy does not imply fault-free resources
    - Only discrepant outputs provide fault-location information
  Best case (45% utilized capacity)
    - 42 pairings for 1,000 resources
    - 64.1 pairings for 100,000 resources
  Most demanding case
    - 478 pairings for 100,000 resources with 95% capacity utilization
  50% of the inputs articulate the fault
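The difference between the perpetually and intermittently articulating cases can be illustrated with a small Monte Carlo sketch. This is not the authors' C simulator; it is a simplified model under stated assumptions: each pairing exercises a uniformly random subset of resources whose size is set by `utilization`, a discrepancy appears only if the faulty resource is exercised and the input articulates the fault, and a clean result exonerates resources only when inputs always articulate. All names and parameters are hypothetical.

```python
import random


def pairings_to_isolate(n_resources, utilization, p_articulate, rng, max_pairings=100_000):
    """Count pairings until a single stuck-at fault is isolated (simplified model).

    Returns (pairings, isolated), where isolated is True if exactly the faulty
    resource remains suspect.
    """
    faulty = rng.randrange(n_resources)
    suspects = set(range(n_resources))
    k = max(1, int(n_resources * utilization))  # resources exercised per pairing
    pairings = 0
    while len(suspects) > 1 and pairings < max_pairings:
        pairings += 1
        used = set(rng.sample(range(n_resources), k))
        discrepant = faulty in used and rng.random() < p_articulate
        if discrepant:
            suspects &= used   # the fault lies among the exercised resources
        elif p_articulate >= 1.0:
            suspects -= used   # perpetual articulation: no discrepancy exonerates them
    return pairings, suspects == {faulty}


# Perpetually articulating inputs versus inputs that articulate half the time.
perpetual = pairings_to_isolate(200, 0.5, 1.0, random.Random(0))
intermittent = pairings_to_isolate(200, 0.5, 0.5, random.Random(1))
```

Averaging the first return value over many seeds would give an expected pairing count for this model; because only discrepant results carry information in the intermittent case, that average should be noticeably higher there, in line with the trend the slides report.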

19 Experimental Results Summary
  Number of iterations to detect faults depends on utilized capacity
    - Designs that utilize only a very few resources ( … 80%) the resources on the FPGA pose difficult isolation problems
    - Each intersection exonerates (implicates) fewer individual resources
  Method scales well
    - 11.1, 14.9, and 17.6 pairings required for 1,000, 10,000, and 100,000 resources, respectively: a sub-linear increase in location time
  Current work
    - Competitive Runtime Reconfiguration (CRR) framework under development, which will utilize the methods outlined
    - Investigation of Competitive Group Testing methods to enable faster fault isolation
    - Analysis of characteristics of isolation, dependency on parameters, and optimal partitioning methods

20 Backup Slides Follow

21 Accommodating Multi-bit Word Widths
  Proof of concept
    - The present circuit works efficiently
    - Demonstrates important Dueling-enabled isolation method
  Strategies
    - Use an array of detectors; attempt to minimize points of failure as word width increases
        Number of logic resources used is acceptable for smaller circuits
    - Create a new circuit or scheme, combining fault-tolerant coding-based methods with a single-fault-secure circuit
    - Current research is focused on improving the detector by investigating codes and fault-secure circuits

22 Pull-down Resistor Considerations
  Proof of concept
    - The present circuit works in a verifiably correct manner
    - Can utilize a synthesized (digital) pull-down resistor, which simulates the behavior of analog resistors
    - Demonstrates Dueling-enabled isolation method
    - Can be utilized without implementation problems for custom VLSI designs
  Alternative approach
    - Alternate detector circuits for FPGA implementation are under investigation
    - Avoid using tri-state buffers and pull-down resistors, and use native digital components available on FPGAs

