Published by Haven Seman. Modified over 9 years ago.
1
DESIGN AND EVALUATION OF HYBRID FAULT-DETECTION SYSTEMS
Qing Xu, Kevin Wang
2
OUTLINE
- Background
- Motivation
- Key Ideas
- Introduction to CRAFT
- Summary and Discussion Points
3
BACKGROUND
- Smaller and faster transistors: lower threshold voltage, tighter noise margins, less reliable
- Alpha particles cause transient faults (figure: a bit flipping between 0 and 1)
- Result: incorrect program execution
- Recovery through REDUNDANCY: software only or hardware only
- Figure: the same program shown twice to illustrate redundant execution:
  #include <iostream>
  int main() { std::cout << "Hello\n"; }
4
MOTIVATION AND GOAL
- Software only: inadequate coverage, slow
- Hardware only: large overhead/area, high cost
- Hybrid solution: better reliability and performance, lower hardware area and cost
5
KEY IDEA: COMPILER ASSISTED FAULT TOLERANCE (CRAFT)
Characteristics:
- Based on a software technique
- Minimal hardware adaptations
- Takes advantage of both software and hardware solutions
Benefits:
- Nearly perfect reliability
- Low performance degradation
- Low hardware cost
6
CRAFT: HYBRID OF EXISTING METHODS
Hardware methods:
- Redundant Multithreading Technique (RMT)
- Error Correcting Codes (ECC)
- Advantages: almost-perfect fault coverage, low performance cost
Software methods:
- Software Implemented Fault Tolerance (SWIFT)
- Error Detection by Duplicating Instructions (EDDI)
- Advantages: high fault coverage, modest performance cost, zero hardware cost
7
EXISTING METHOD: HARDWARE RMT
- RMT makes use of SMT resources through loosely synchronized redundant threads
- Components not covered by redundant execution must employ alternative techniques, such as Error Correcting Codes (ECC)
- Figure: an original thread and a checker thread in Redundant Multithreading (RMT)
8
EXISTING METHOD: SOFTWARE SWIFT
- A compiler-based transformation
- The store instruction is the synchronization point
- Assumes that Error Correcting Codes (ECC) guard the correctness of the memory subsystem
Original code:
  ld r3 = [r4]
  add r1 = r2, r3
  st m[r1] = r2
SWIFT code:
  ld r3 = [r4]
  mov r3' = r3
  add r1 = r2, r3
  add r1' = r2', r3'
  br Fault, r1 != r1'
  br Fault, r2 != r2'
  br Fault, r3 != r3'
  st m[r1] = r2
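The duplicate-and-compare pattern in the SWIFT code can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: `run_swift`, the `flip` fault-injection hook, and the dict-based register file and memory are hypothetical names introduced for illustration, and the shadow copies r2' and r4' are assumed to have been produced by earlier code.

```python
class FaultDetected(Exception):
    """Raised when a register and its shadow copy disagree."""


def run_swift(mem, r2, r4, flip=None):
    """Run the SWIFT-style sequence from the slide on a dict-based memory.

    flip is a hypothetical fault-injection hook: (register_name, bit) flips
    one bit of one register just before the checks.
    """
    # Assumption: shadow copies r2' and r4' were produced by earlier code.
    regs = {"r2": r2, "r4": r4, "r2'": r2, "r4'": r4}

    regs["r3"] = mem[regs["r4"]]               # ld  r3  = [r4]
    regs["r3'"] = regs["r3"]                   # mov r3' = r3
    regs["r1"] = regs["r2"] + regs["r3"]       # add r1  = r2, r3
    regs["r1'"] = regs["r2'"] + regs["r3'"]    # add r1' = r2', r3'

    if flip is not None:
        name, bit = flip
        regs[name] ^= 1 << bit                 # simulated transient fault

    for name in ("r1", "r2", "r3"):            # br Fault, rX != rX'
        if regs[name] != regs[name + "'"]:
            raise FaultDetected(name)

    mem[regs["r1"]] = regs["r2"]               # st m[r1] = r2
    return mem
```

Calling `run_swift({0x10: 5}, r2=3, r4=0x10)` performs the store, while adding `flip=("r1", 0)` makes the check before the store raise `FaultDetected` instead.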
9
CRAFT: SUITE OF THREE DETECTION SYSTEMS
Preliminaries:
- Assume the Single Event Upset fault model
- Architecturally Correct Execution (ACE)
- Detected Unrecoverable Error (DUE)
- Silent Data Corruption (SDC)
List of the suite:
1. Checking Store Buffer (CSB)
2. Load Value Queue (LVQ)
3. CSB + LVQ
10
SUITE 1: CHECKING STORE BUFFER (CSB)
Problem to improve: SWIFT is vulnerable to faults in the time interval between the validation and the use of a register value.
Solution: add a store buffer that performs the checks.
Figure: the interval between "validated values" and "use of validated values" is marked as vulnerable to faults.
11
CSB: IMPLEMENTATION
Basic idea: commit a store only when the two copies of the store data match.
Method: create a CSB to keep track of all original and duplicated store instructions.
Step 1: The compiler duplicates each store and tags the copies with a single-bit version name: st [r1] = r2 becomes st1 [r1] = r2 and st2 [r1'] = r2'.
Step 2: New store entries are put into the CSB. If a duplicate matches an existing entry, the duplicate is discarded and the entry is marked OK to execute.
Step 3: Unchecked stores clog the head of the CSB. A fault is detected when the CSB fills up.
12
CSB: IMPLEMENTATION
Compiler duplicates stores: st [r1] = r2 becomes st1 [r1] = r2 and st2 [r1'] = r2'.
CSB entry #   0    1    2     3
Address       -    -    0xFF  0xEE
Value         -    -    0x8   0x1
Validated     -    -    N     N
Incoming duplicates:
- Insn duplicate #1 (address 0xFF, value 0x8): matches (Y). Store value checks out, send to MEM.
- Insn duplicate #2 (address 0xEE, value 0x2): does not match (N). Not OK to go to MEM; the table will fill up and cause a structural hazard.
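The matching and clogging behavior described for the CSB can be modeled in software. The sketch below is a toy model, not the hardware design from the slides: the class name `CheckingStoreBuffer`, its methods, and the choice to signal a fault when a new store arrives at a full buffer are all assumptions made for illustration.

```python
from collections import deque


class FaultDetected(Exception):
    """Raised when a store arrives and the buffer is already full."""


class CheckingStoreBuffer:
    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory        # dict: address -> value
        self.entries = deque()      # oldest store at the left (head)

    def store(self, version, addr, value):
        """Issue st1 (version=1) or st2 (version=2) with an address and value."""
        for entry in self.entries:
            if (not entry["validated"] and entry["version"] != version
                    and entry["addr"] == addr and entry["value"] == value):
                entry["validated"] = True   # duplicate matches: discard it
                self._commit_validated_head()
                return
        if len(self.entries) == self.capacity:
            raise FaultDetected("CSB is full")
        self.entries.append({"version": version, "addr": addr,
                             "value": value, "validated": False})

    def _commit_validated_head(self):
        # Stores leave in order, so an unvalidated head blocks younger entries.
        while self.entries and self.entries[0]["validated"]:
            entry = self.entries.popleft()
            self.memory[entry["addr"]] = entry["value"]
```

With the values from the slide, `store(1, 0xFF, 0x8)` followed by `store(2, 0xFF, 0x8)` commits 0x8 to address 0xFF, whereas `store(1, 0xEE, 0x1)` followed by `store(2, 0xEE, 0x2)` leaves two unvalidated entries that eventually fill the buffer.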
13
CSB: ADVANTAGES / DISADVANTAGES
Advantages:
- Checking is implemented at the hardware level
- Validation code is no longer needed, which reduces code size
- Store instructions are no longer synchronization points (as they are in SWIFT)
- More dynamic scheduling can be exploited
Disadvantages:
- Additional compiler requirement: the distance between duplicated instructions should not exceed the size of the CSB
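One way a compiler pass might check the distance requirement is to count how many stores are waiting for their duplicate at any point in a schedule. This is a hedged sketch: `max_outstanding_stores`, `fits_csb`, and the `(version, store_id)` schedule format are hypothetical, and "distance" is interpreted here as the peak number of unmatched stores, which is an assumption rather than a definition from the slides.

```python
def max_outstanding_stores(schedule):
    """Return the peak number of stores still waiting for their duplicate.

    schedule is a list of (version, store_id) pairs in issue order, where
    the two versions of one store share the same store_id.
    """
    waiting = set()
    peak = 0
    for _version, store_id in schedule:
        if store_id in waiting:
            waiting.remove(store_id)      # duplicate arrived, entry can leave
        else:
            waiting.add(store_id)         # first copy occupies a CSB entry
            peak = max(peak, len(waiting))
    return peak


def fits_csb(schedule, csb_size):
    """True if the schedule never needs more entries than the CSB has."""
    return max_outstanding_stores(schedule) <= csb_size
```

For example, the schedule `[(1, "a"), (1, "b"), (2, "a"), (2, "b")]` needs two entries, so it fits a CSB of size 2 but not one of size 1.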
14
SUITE 2: LOAD VALUE QUEUE (LVQ)
Problem to improve: SWIFT duplicates a loaded value by generating a move instruction after the load, which keeps a copy of the value:
  br faultDet, r2 != r2'
  ld r1 = [r2]
  mov r1' = r1
Solution: add a load value queue.
15
SUITE 2: LOAD VALUE QUEUE (LVQ)
Problem to improve: SWIFT has a window of vulnerability between the load instruction and the value duplication.
Solution: add a load value queue.
Figure: the interval between "loading values" and "copying values" is marked as vulnerable to faults.
16
LVQ: IMPLEMENTATION PROCEDURE
Basic idea: duplicate each load to enable redundant computation.
Method: the LVQ provides redundant load instruction execution.
Step 1: The compiler duplicates each load and tags the copies with a single-bit version name: ld [r1] = r2 becomes ld1 [r1] = r2 and ld2 [r1'] = r2'.
Step 2: The duplicated load is bypassed from the LVQ. The entry is deallocated from the LVQ when the two copies match.
Step 3: A fault is detected when the two duplicated copies fail to match.
17
LVQ: IMPLEMENTATION PROCEDURE
Compiler duplicates loads: ld [r1] = r2 becomes ld1 [r1] = r2 and ld2 [r1'] = r2'.
LVQ entry #   0    1    2     3
Address       -    -    0xAA  0xBB
Value         -    -    0x2   0x1
Validated     -    -    N     N
Incoming duplicates (Insn #1, Insn #2): address 0xAA with value 0x2, and address 0xBB with value 0x1.
- Match (Y): load value checks out.
- No match (N): error detected.
18
LVQ: IMPLEMENTATION PROCEDURE
Compiler duplicates loads: ld [r1] = r2 becomes ld1 [r1] = r2 and ld2 [r1'] = r2'.
Figure: an LVQ table (entries 0 to 3) with the ld insn and the ld insn duplicate both referring to address 0xAA, value 0x2.
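The LVQ procedure can be approximated by a small queue model. This is a sketch under assumptions, not the hardware from the slides: the class `LoadValueQueue`, its method names, and the decision to treat "the two copies match" as an address comparison are all hypothetical choices made for illustration.

```python
from collections import deque


class FaultDetected(Exception):
    """Raised when a duplicated load does not match the queued original."""


class LoadValueQueue:
    def __init__(self, memory):
        self.memory = memory      # dict: address -> value
        self.queue = deque()      # (address, value) of loads awaiting a duplicate

    def load_original(self, addr):
        """ld1: read memory and remember the address and value."""
        value = self.memory[addr]
        self.queue.append((addr, value))
        return value

    def load_duplicate(self, addr):
        """ld2: take the value from the queue instead of reading memory again."""
        if not self.queue:
            raise FaultDetected("no original load is waiting")
        queued_addr, value = self.queue[0]
        if queued_addr != addr:   # assumption: copies are matched by address
            raise FaultDetected("duplicated load address mismatch")
        self.queue.popleft()      # copies match: deallocate the entry
        return value
```

Because the duplicate is served from the queue, it returns the same value as the original load even if memory changes in between, and it adds no memory traffic.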
19
LVQ: ADVANTAGES / DISADVANTAGES
Advantages:
- Reduces the window of vulnerability by issuing a duplicated load instruction
- Keeps memory traffic low by bypassing the load value
Disadvantages:
- Extra hardware is needed to ensure that loads and their duplicates access the same entry in the LVQ
20
SUITE 3: CSB + LVQ
Adds both the CSB and the LVQ simultaneously to a software-only solution like SWIFT
21
COMPARISON OF DIFFERENT APPROACHES
Columns: Technique; Category; Opcode/Control; Load/Store; Memory; Hardware Requirement
- RMT: HW; All; None; SMT Base Machine + CSB + LVQ
- SWIFT: SW; Some; None
- CRAFT: CSB + LVQ: Hybrid; Some; All; None; CSB + LVQ
22
EXPERIMENTAL EVALUATION
Evaluation method (performance vs. reliability):
- Inject randomly chosen faults into a detailed microarchitectural simulation
- Track each chosen bit flip until the program completes
- Analyze the final result to determine:
  - How much SDC is converted to DUE
  - How much work (# of applications) the program completed before encountering an SDC
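The idea of a fault-injection campaign that counts SDC-to-DUE conversion can be illustrated with a toy program. This is only a sketch: the real evaluation uses a detailed microarchitectural simulator, while here `workload`, `inject`, `classify`, and `campaign` are hypothetical helpers, the workload is an arbitrary masked sum, and the redundant copy is assumed to be fault-free under the single-fault model.

```python
import random


def workload(values):
    """Toy program: only the low 4 bits of each input affect the output."""
    return sum(v & 0x0F for v in values)


def inject(values, index, bit):
    """Return a copy of the inputs with one bit flipped."""
    faulty = list(values)
    faulty[index] ^= 1 << bit
    return faulty


def classify(values, index, bit, protected):
    """Classify one injected bit flip as 'benign', 'SDC', or 'DUE'."""
    golden = workload(values)
    primary = workload(inject(values, index, bit))
    if protected:
        shadow = workload(values)   # redundant copy, assumed untouched
        if primary != shadow:
            return "DUE"            # mismatch is detected before output
    return "benign" if primary == golden else "SDC"


def campaign(values, trials, protected, seed=0):
    """Inject `trials` random single-bit faults and count the outcomes."""
    rng = random.Random(seed)
    counts = {"benign": 0, "SDC": 0, "DUE": 0}
    for _ in range(trials):
        index = rng.randrange(len(values))
        bit = rng.randrange(8)
        counts[classify(values, index, bit, protected)] += 1
    return counts
```

Running `campaign` twice with the same seed, once with `protected=False` and once with `protected=True`, shows every fault that was an SDC in the unprotected run turning into a DUE in the protected run, while flips in the masked upper bits stay benign in both.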
23
EXPERIMENTAL EVALUATION
Results: measured as the # of applications the program completed before encountering an SDC.
Implementation and performance:
- CSB: enables better performance because it eliminates scheduling constraints
- LVQ: impact varies by benchmark
24
SUMMARY AND CONCLUSION
CRAFT, as compared to:
- Software-only technique: execution time reduced by 5%; SDC-to-DUE conversion rate increased by 75%
- Hardware-only technique: significantly reduced area overhead; comparable reliability maintained
A hybrid technique can provide better reliability at relatively low cost.
25
DISCUSSION POINTS
- CRAFT detects a fault when the CSB is clogged: is there a tradeoff between detection latency and more flexible scheduling?
- Recovery method?
- Evaluation in terms of coverage?
26
CRAFT
Advantages:
- Maintains all SWIFT advantages and increases reliability
- Low cost, relatively high reliability
- Better performance than SWIFT
Disadvantages:
- No recovery method
- CSB has much higher performance than LVQ
- No evaluation of coverage
- Compiler is ISA and microarchitecture dependent