Published by Cecil Cummings. Modified over 9 years ago.
1
Methodology to Compute Architectural Vulnerability Factors
Chris Weaver (1, 2), Shubhendu S. Mukherjee (1), Joel Emer (1), Steven K. Reinhardt (1, 2), Todd Austin (2)
(1) Fault Aware Computing Technology (FACT), VSSAD, Intel
(2) University of Michigan
2
Overview
Background
Previous reliability estimation methodology
Proposed methodology for early reliability estimates
Sample analysis
Conclusion
3
Strike Changes State
[Figure: a particle strike flips a stored bit between 0 and 1]
4
Failure Rate Definitions
Interval-based: MTBF = Mean Time Between Failures
Rate-based: FIT = Failure in Time; 1 FIT = 1 failure in a billion hours
1 year MTBF = 10^9 / (24 * 365) FIT = 114,155 FIT
FIT rates are additive: Cache (0 FIT) + IQ (114K FIT) + FU (114K FIT) = total of 228K FIT
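The MTBF-to-FIT conversion and the additive property can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and do not come from the presentation.

```python
HOURS_PER_YEAR = 24 * 365
FIT_HOURS = 10**9  # 1 FIT = 1 failure per billion hours


def mtbf_years_to_fit(mtbf_years):
    """Convert a mean time between failures (in years) to a FIT rate."""
    return FIT_HOURS / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit):
    """Convert a FIT rate back to a mean time between failures (in years)."""
    return FIT_HOURS / (fit * HOURS_PER_YEAR)


def total_fit(component_fits):
    """FIT rates are additive, so the chip total is a plain sum."""
    return sum(component_fits.values())


one_year = mtbf_years_to_fit(1)
print(round(one_year))  # 114155

# Same breakdown as the slide: an unaffected cache plus two 1-year-MTBF units.
chip = {"Cache": 0, "IQ": one_year, "FU": one_year}
print(round(total_fit(chip)))  # 228311, i.e. roughly 228K FIT
```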
5
Motivation
6
Results of precise & early analysis
If we meet the goal, we are done
If we don’t meet the goal, add error protection schemes
7
Objectives
Determine which bits matter
Compute FIT rate
8
Strike on state bit
Bit read?
  no: benign fault, no error
  yes: Bit has error protection?
    yes, error can be corrected (e.g., ECC): no error
    yes, error is only detected (e.g., parity + no recovery): Detected, but unrecoverable error (DUE)
    no: Does bit matter?
      yes: Silent Data Corruption (SDC)
      no: benign fault, no error
* We only focus on SDC FIT
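The decision flow for a strike can be written as a small classifier. This is a sketch of one reading of the flowchart, assuming that any detect-only error is reported as DUE whether or not the bit matters; the enum and function names are hypothetical.

```python
from enum import Enum


class Protection(Enum):
    NONE = "none"
    DETECT_ONLY = "detect only (e.g., parity + no recovery)"
    CORRECT = "detect and correct (e.g., ECC)"


class Outcome(Enum):
    BENIGN = "benign fault, no error"
    CORRECTED = "no error (corrected)"
    DUE = "detected, but unrecoverable error"
    SDC = "silent data corruption"


def classify_strike(bit_read, protection, bit_matters):
    """Classify the outcome of a particle strike on one state bit."""
    if not bit_read:
        return Outcome.BENIGN      # the flipped value is never consumed
    if protection is Protection.CORRECT:
        return Outcome.CORRECTED   # ECC repairs the bit on read
    if protection is Protection.DETECT_ONLY:
        return Outcome.DUE         # assumption: every detected error is a DUE
    # Unprotected bit: the error is visible only if the bit matters.
    return Outcome.SDC if bit_matters else Outcome.BENIGN


print(classify_strike(True, Protection.NONE, True).value)
```

Only the last branch contributes to the SDC FIT rate, which is the case the presentation focuses on.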
9
Architectural Vulnerability Factor (AVF)
$$\mathrm{AVF}_{bit} = \text{Probability bit matters} = \frac{\text{\# of visible errors}}{\text{\# of bit flips from particle strikes}}$$
$$\mathrm{FIT}_{bit} = \text{intrinsic FIT}_{bit} \times \mathrm{AVF}_{bit}$$
10
Previous AVF Methodology
Statistical Fault Injection with RTL
[Figure: logic feeding a latch and its output. Simulate a strike on the latch, then check: does the fault propagate to architectural state?]
11
Characteristics of SFI with RTL
Naturally characterizes all logical structures
RTL is not available until late in the design cycle
Numerous experiments are needed to flip all bits
Generally done at the chip level
Limited structural insight
12
Objectives
Determine which bits matter
  Earlier in the design cycle
  With fewer experiments
  At the structural level
Compute FIT rate
  Intrinsic FIT per bit
  Architectural Vulnerability Factor
13
Our Analysis: Which bits matter?
Branch Predictor: doesn’t matter at all (AVF = 0%)
Program Counter: almost always matters (AVF ~ 100%)
14
Architecturally Correct Execution (ACE)
An ACE path requires only a subset of values to flow correctly through the program’s data flow graph (and the machine)
Anything else (un-ACE path) can be derated away
[Figure labels: Program Input, Program Outputs]
15
Example of un-ACE instruction: Dynamically Dead Instruction
Most bits of an un-ACE instruction do not affect program output
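One way to find dynamically dead instructions in a register-level trace is a backward liveness scan. The sketch below is an illustration under simplifying assumptions (registers only, no memory, a hypothetical trace format), not the tool used in the presentation.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical trace entry: (destination register, source registers).
# A destination of None marks an instruction with a visible side effect
# (e.g., an output or store), which is always treated as live.
Instruction = Tuple[Optional[str], Sequence[str]]


def dynamically_dead(trace: List[Instruction], live_out: Set[str]) -> List[int]:
    """Return indices of instructions whose results never reach a live use."""
    live = set(live_out)  # registers still needed after the trace ends
    dead = []
    for index in range(len(trace) - 1, -1, -1):
        dest, sources = trace[index]
        if dest is not None and dest not in live:
            # Result is overwritten or never read: dead, so its sources
            # are not made live (this also catches transitively dead code).
            dead.append(index)
            continue
        if dest is not None:
            live.discard(dest)
        live.update(sources)
    dead.reverse()
    return dead


trace = [
    ("r6", ["r2"]),   # 0: feeds only instruction 1, which is dead
    ("r1", ["r6"]),   # 1: r1 is overwritten by 2 before being read
    ("r1", ["r4"]),   # 2: live, read by the output below
    (None, ["r1"]),   # 3: output r1
]
print(dynamically_dead(trace, live_out=set()))  # [0, 1]
```

Passing every register in `live_out` gives the conservative answer when nothing is known about what executes after the trace.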
16
Dynamic Instruction Breakdown
[Figure: average across all Spec2K slices]
17
Mapping ACE & un-ACE Instructions to the Instruction Queue
[Figure: instruction queue entries grouped under Architectural un-ACE and Micro-architectural un-ACE. Entry labels: Wrong-Path Inst, Idle, NOP, Prefetch, ACE Inst, Ex-ACE Inst]
18
Vulnerability of a structure
AVF = fraction of cycles a bit contains ACE state
Example (4 cycles, 4 bits in the structure):
T = 1: ACE% = 2/4
T = 2: ACE% = 1/4
T = 3: ACE% = 0/4
T = 4: ACE% = 3/4
$$\mathrm{AVF} = \frac{\text{Average number of ACE bits in a cycle}}{\text{Total number of bits in the structure}} = \frac{(2 + 1 + 0 + 3)/4}{4} = 0.375$$
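The per-cycle averaging in this example is easy to reproduce. A minimal sketch, assuming the structure holds 4 bits and the per-cycle ACE counts are 2, 1, 0 and 3 as in the example; the function name is illustrative.

```python
def structure_avf(ace_bits_per_cycle, total_bits):
    """AVF = average number of ACE bits per cycle / total bits in the structure."""
    average_ace_bits = sum(ace_bits_per_cycle) / len(ace_bits_per_cycle)
    return average_ace_bits / total_bits


# ACE bit counts observed at T = 1, 2, 3, 4 in a 4-bit structure.
print(structure_avf([2, 1, 0, 3], total_bits=4))  # 0.375
```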
19
Little’s Law for ACEs
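Assuming the standard form of Little's Law (average occupancy = arrival rate × average residence time), the average number of ACE bits in a structure can be estimated from the rate at which ACE bits enter and how long they stay. The sketch below uses made-up numbers and hypothetical parameter names.

```python
def avf_littles_law(ace_bits_per_cycle_in, ace_residence_cycles, total_bits):
    """Estimate AVF from ACE bandwidth and ACE latency via Little's Law.

    average ACE bits resident = bandwidth * latency
    AVF = average ACE bits resident / total bits in the structure
    """
    average_ace_bits = ace_bits_per_cycle_in * ace_residence_cycles
    return average_ace_bits / total_bits


# Illustrative values only: 0.5 ACE bits arrive per cycle, each stays 3 cycles,
# and the structure holds 4 bits, so 1.5 ACE bits are resident on average.
print(avf_littles_law(0.5, 3, 4))  # 0.375
```

This form is convenient in a performance model because both inputs can be collected as simple counters, without tracking every bit in every cycle.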
20
Computing AVF
Our approach is conservative
  We assume every bit is ACE unless proven otherwise
Data Analysis
  Try to prove that data held in a structure is un-ACE
Timing Analysis
  Tracks the time this data spent in the structure
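The conservative rule combines naturally with the timing data: count bit-cycles as ACE unless the resident data has been proven un-ACE. The following sketch assumes a hypothetical residency log and hypothetical category labels; it is not the presentation's implementation.

```python
from dataclasses import dataclass

# Hypothetical labels for contents that data analysis has proven un-ACE.
PROVEN_UN_ACE = {"idle", "wrong_path", "nop", "prefetch", "dynamically_dead"}


@dataclass
class Residency:
    kind: str    # what the entry held (label assigned by data analysis)
    bits: int    # number of bits occupied
    cycles: int  # time spent in the structure (from timing analysis)


def ace_bit_cycles(residencies):
    """Sum bit-cycles for everything not proven un-ACE (conservative default)."""
    return sum(r.bits * r.cycles for r in residencies if r.kind not in PROVEN_UN_ACE)


def avf_from_residencies(residencies, total_bits, total_cycles):
    """AVF = ACE bit-cycles / (bits in structure * cycles observed)."""
    return ace_bit_cycles(residencies) / (total_bits * total_cycles)


log = [
    Residency("ace", 32, 10),
    Residency("nop", 32, 10),
    Residency("unknown", 32, 5),      # unclassified, so counted as ACE
    Residency("wrong_path", 32, 15),
]
print(avf_from_residencies(log, total_bits=64, total_cycles=20))  # 0.375
```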
21
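Computing FIT rate of a Chip
$$\text{Total FIT} = \sum_i \left(\text{FIT per bit}_i \times \text{\# of bits}_i \times \mathrm{AVF}_i\right)$$

| Structure         | FIT per bit* | # of bits | AVF | Total FIT |
|-------------------|--------------|-----------|-----|-----------|
| Branch Predictor  | 0.001        | 1K        | 0   | 0         |
| Program Counter   | 0.001        | 64        | 1   | 0.064     |
| Instruction Queue | 0.001        | 6400      | ?   | ?         |
| Functional Units  | 0.001        | 4000      | ?   | ?         |
| …                 |              |           |     | …         |

Total FIT of whole chip = sum of the Total FIT column
* Intrinsic FIT per bit from externally published data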
22
Results: Experimental Setup
Used ASIM modeling infrastructure
Model of an Itanium® 2-like processor
Ran all Spec2K benchmarks
  Compiled with the highest level of optimization with the Intel electron compiler
  Simulated under a full OS
  Simulation points chosen using SimPoint (Sherwood et al.)
23
Instruction Queue ACE percentage = AVF = 29%
24
Functional Units ACE percentage = AVF = 9%
25
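Computing FIT rate of Chip

| Structure         | FIT per bit* | # of bits | AVF  | Total FIT |
|-------------------|--------------|-----------|------|-----------|
| Branch Predictor  | 0.001        | 1K        | 0    | 0         |
| Program Counter   | 0.001        | 64        | 1    | 0.064     |
| Instruction Queue | 0.001        | 6400      | 0.29 | 1.856     |
| Functional Units  | 0.001        | 4000      | 0.09 | 0.360     |
| …                 |              |           |      | …         |

Total FIT of whole chip = sum of the Total FIT column
* Intrinsic FIT per bit from externally published data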
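The per-structure products can be recomputed from the listed rows. A short sketch, assuming "1K" means 1000 bits (the choice is immaterial here because that row's AVF is 0); the sum covers only the four structures shown, not the elided rows.

```python
# (structure, intrinsic FIT per bit, number of bits, AVF)
structures = [
    ("Branch Predictor", 0.001, 1000, 0.0),   # assumption: 1K = 1000 bits
    ("Program Counter", 0.001, 64, 1.0),
    ("Instruction Queue", 0.001, 6400, 0.29),
    ("Functional Units", 0.001, 4000, 0.09),
]


def structure_fit(fit_per_bit, num_bits, avf):
    """FIT of one structure = intrinsic FIT per bit * # of bits * AVF."""
    return fit_per_bit * num_bits * avf


def listed_total_fit(rows):
    """Sum the per-structure FIT over the rows provided."""
    return sum(structure_fit(fit, bits, avf) for _, fit, bits, avf in rows)


for name, fit, bits, avf in structures:
    print(f"{name}: {structure_fit(fit, bits, avf):.3f}")

print(f"Listed structures only: {listed_total_fit(structures):.3f}")  # 2.280
```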
26
Summary
Determine which bits matter
  ACE (Architecturally Correct Execution)
Compute FIT rate
  Intrinsic FIT per bit
  AVF (Architectural Vulnerability Factor)
27
Questions?
28
Statistical Fault Injection (SFI) Algorithm
Find a statistically significant set of bits
Randomly select a bit
Flip the bit
Run two simulations: one with the bit flip and one without
Run for a pre-defined # of cycles
Compare the architectural state of the two simulations (e.g., register file)
If there is a mismatch, declare an error
Repeat the algorithm with a different bit flip
AVF = # mismatches observed / total # experiments
Used widely; has provided useful AVF numbers to date
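The loop above can be sketched as a Monte Carlo estimate. The simulator here is a toy stand-in (a hypothetical machine in which only some state bits ever reach architectural state), so the code illustrates the procedure rather than a real RTL flow; all names are assumptions.

```python
import random


def make_toy_simulator(num_bits, visible_bits):
    """Build a toy simulator where only the first `visible_bits` bits are read."""
    def simulate(flipped_bit, cycles):
        state = [0] * num_bits
        if flipped_bit is not None:
            state[flipped_bit] ^= 1          # inject the fault
        architectural_register = 0
        for _ in range(cycles):              # run for a pre-defined # of cycles
            architectural_register += sum(state[:visible_bits])
        return architectural_register
    return simulate


def estimate_avf_sfi(simulate, num_bits, num_experiments, cycles, seed=0):
    """AVF = # mismatches observed / total # experiments."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_experiments):
        bit = rng.randrange(num_bits)        # randomly select a bit
        golden = simulate(None, cycles)      # run without the bit flip
        faulty = simulate(bit, cycles)       # run with the bit flip
        if golden != faulty:                 # compare architectural state
            mismatches += 1
    return mismatches / num_experiments


simulator = make_toy_simulator(num_bits=16, visible_bits=4)
print(estimate_avf_sfi(simulator, num_bits=16, num_experiments=2000, cycles=10))
```

With 4 of 16 bits visible, the estimate should land near 0.25, and the spread shrinks as the number of experiments grows, which is why SFI needs many runs to be statistically significant.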
29
SFI vs. ACE analysis

|                                       | SFI                                             | ACE                                                           |
|---------------------------------------|-------------------------------------------------|---------------------------------------------------------------|
| Accuracy of microarchitectural un-ACE | Better than ACE analysis                        | Conservative                                                  |
| Accuracy of architectural un-ACE      | Conservative                                    | Better than SFI (e.g., covers dynamically dead instructions)  |
| Insight                               | Per-structure insights harder                   | Little’s Law & per-structure breakdown easier                 |
| # of experiments                      | Large # required to be statistically significant | Small # of experiments can give good accuracy                 |