Methodology to Compute Architectural Vulnerability Factors Chris Weaver 1, 2 Shubhendu S. Mukherjee 1 Joel Emer 1 Steven K. Reinhardt 1, 2 Todd Austin.


1 Methodology to Compute Architectural Vulnerability Factors
Chris Weaver (1, 2), Shubhendu S. Mukherjee (1), Joel Emer (1), Steven K. Reinhardt (1, 2), Todd Austin (2)
(1) Fault Aware Computing Technology (FACT), VSSAD, Intel
(2) University of Michigan

2 Overview
Background
Previous reliability estimation methodology
Proposed methodology for early reliability estimates
Sample analysis
Conclusion

3 Strike Changes State 0 1

4 Failure Rate Definitions
Interval-based: MTBF = Mean Time Between Failures
Rate-based: FIT = Failures in Time; 1 FIT = 1 failure in a billion hours
1 year MTBF = $10^9 / (24 \times 365)$ FIT = 114,155 FIT
FIT rates are additive: Cache: 0 FIT + IQ: 114K FIT + FU: 114K FIT = total of 228K FIT
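The MTBF-to-FIT conversion and the additive property on slide 4 can be checked with a few lines of Python. This is a minimal sketch; the function and constant names are illustrative and do not come from the presentation.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours, the figure used on the slide
FIT_HOURS = 10**9           # 1 FIT = 1 failure per billion hours


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Convert a mean time between failures (in years) to a FIT rate."""
    return FIT_HOURS / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a FIT rate back to a mean time between failures in years."""
    return FIT_HOURS / (fit * HOURS_PER_YEAR)


def total_fit(component_fits: dict) -> float:
    """FIT rates are additive, so the chip total is a plain sum."""
    return sum(component_fits.values())


one_year = mtbf_years_to_fit(1.0)
chip = total_fit({"cache": 0.0, "iq": one_year, "fu": one_year})
print(round(one_year))  # 114155
print(round(chip))      # 228311, i.e. roughly 228K FIT
```

Two components that each have a 1-year MTBF give a combined MTBF of about half a year, which is why working in FIT (where rates simply add) is more convenient than working in MTBF.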

5 Motivation

6 Results of precise & early analysis
If we meet the goal, we are done
If we don't meet the goal, add error protection schemes

7 Objectives
Determine which bits matter
Compute FIT rate

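8 Strike on state bit (flowchart; arrows lost in extraction)
Decision points: Bit read? Bit has error protection? Does bit matter?
Bit not read: benign fault, no error
Bit has error protection, error can be corrected (e.g., ECC): no error
Bit has error protection, error is only detected (e.g., parity + no recovery): detected, but unrecoverable error (DUE)
No error protection, bit matters: Silent Data Corruption (SDC)
No error protection, bit does not matter: benign fault, no error
* We only focus on SDC FIT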
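The outcome classification on slide 8 can be written as a small decision function. This is only a sketch: the slide's arrows did not survive extraction, so the branch order below (read check first, then protection, then whether the bit matters) is an assumption, and the `Protection` enum and `classify_strike` name are hypothetical.

```python
from enum import Enum


class Protection(Enum):
    NONE = "none"
    DETECT_ONLY = "detect"   # e.g., parity with no recovery
    CORRECT = "correct"      # e.g., ECC


def classify_strike(bit_read: bool, protection: Protection, bit_matters: bool) -> str:
    """Classify the outcome of a particle strike on one state bit."""
    if not bit_read:
        # A flipped bit that is never read cannot affect anything.
        return "benign fault, no error"
    if protection is Protection.CORRECT:
        return "no error (corrected)"
    if protection is Protection.DETECT_ONLY:
        return "DUE"
    # Unprotected bit that was read: only harmful if the bit matters.
    return "SDC" if bit_matters else "benign fault, no error"


print(classify_strike(True, Protection.NONE, True))         # SDC
print(classify_strike(True, Protection.DETECT_ONLY, True))  # DUE
```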

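9 Architectural Vulnerability Factor (AVF)
$\mathrm{AVF}_{bit} = \text{probability that the bit matters} = \dfrac{\text{\# of visible errors}}{\text{\# of bit flips from particle strikes}}$
$\mathrm{FIT}_{bit} = \text{intrinsic } \mathrm{FIT}_{bit} \times \mathrm{AVF}_{bit}$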

10 Previous AVF Methodology
Statistical Fault Injection with RTL
[Figure: a latch feeding a block of logic]
Simulate a strike on a latch
Does the fault propagate to architectural state?

11 Characteristics of SFI with RTL
Naturally characterizes all logical structures
RTL is not available until late in the design cycle
Numerous experiments are needed to flip all bits
Generally done at the chip level
Limited structural insight

12 Objectives
Determine which bits matter
  Earlier in the design cycle
  With fewer experiments
  At the structural level
Compute FIT rate
  Intrinsic FIT per bit
  Architectural Vulnerability Factor

13 Our Analysis: Which bits matter?
Branch Predictor: doesn't matter at all (AVF = 0%)
Program Counter: almost always matters (AVF ~ 100%)

14 Architecturally Correct Execution (ACE)
An ACE path requires only a subset of values to flow correctly through the program's data flow graph (and the machine), from program input to program outputs
Anything else (an un-ACE path) can be derated away

15 Example of un-ACE instruction: Dynamically Dead Instruction
Most bits of an un-ACE instruction do not affect program output
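One simple case of a dynamically dead instruction is a register write that is overwritten before anything reads it. The sketch below finds only that case in a register trace. The `Inst` trace format is hypothetical, memory and transitively dead instructions are ignored, and values still unread at the end of the trace are conservatively treated as live.

```python
from typing import List, NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    dest: Optional[str]       # destination register, or None (e.g., a store)
    srcs: Tuple[str, ...]     # source registers read by the instruction


def first_level_dead(trace: List[Inst]) -> List[int]:
    """Return indices of instructions whose result is overwritten unread."""
    dead = []
    unread_writer = {}  # register -> index of latest writer not yet read
    for i, inst in enumerate(trace):
        # Sources are read before the destination is written.
        for reg in inst.srcs:
            unread_writer.pop(reg, None)  # the value was consumed, so it is live
        if inst.dest is not None:
            if inst.dest in unread_writer:
                dead.append(unread_writer[inst.dest])  # overwritten before any read
            unread_writer[inst.dest] = i
    return dead


trace = [
    Inst("r1", ("r2", "r3")),   # 0: r1 = r2 + r3   (never read)
    Inst("r1", ("r4", "r5")),   # 1: r1 = r4 + r5   (overwrites instruction 0)
    Inst("r6", ("r1", "r1")),   # 2: r6 = r1 + r1
    Inst(None, ("r6",)),        # 3: store r6
]
print(first_level_dead(trace))  # [0]
```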

16 Dynamic Instruction Breakdown Average across all of Spec2K slices

17 Mapping ACE & un-ACE Instructions to the Instruction Queue
[Figure: instruction queue entries labeled by type]
Architectural un-ACE: NOP, Prefetch
Micro-architectural un-ACE: Wrong-Path Inst, Idle, Ex-ACE Inst
ACE: ACE Inst

18 Vulnerability of a structure
AVF = fraction of cycles a bit contains ACE state
Example: T = 1: ACE% = 2/4; T = 2: ACE% = 1/4; T = 3: ACE% = 0/4; T = 4: ACE% = 3/4
$\mathrm{AVF} = \dfrac{\text{average number of ACE bits in a cycle}}{\text{total number of bits in the structure}} = \dfrac{(2 + 1 + 0 + 3)/4}{4} = 0.375$
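The slide 18 example (a 4-bit structure observed for 4 cycles) can be reproduced directly from the definition. This is a minimal sketch; `structure_avf` is an illustrative name, not something from the presentation.

```python
def structure_avf(ace_bits_per_cycle, total_bits):
    """AVF = (average number of ACE bits per cycle) / (bits in the structure)."""
    average_ace_bits = sum(ace_bits_per_cycle) / len(ace_bits_per_cycle)
    return average_ace_bits / total_bits


# ACE bit counts at T = 1, 2, 3, 4 for a 4-bit structure, as on the slide.
avf = structure_avf([2, 1, 0, 3], total_bits=4)
print(avf)  # 0.375
```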

19 Little’s Law for ACEs
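Only the title of slide 19 survives, so its exact content is unknown. In general, Little's Law states that the average number of items in a system equals the arrival rate times the average time each item spends there. A hedged sketch of how that could apply to ACE bits follows; the function name, parameters, and example numbers are assumptions chosen for illustration.

```python
def avf_from_littles_law(ace_bits_in_per_cycle, mean_ace_residency_cycles, total_bits):
    """Estimate AVF with Little's Law (N = arrival rate x mean residency).

    ace_bits_in_per_cycle: average ACE bits entering the structure per cycle
    mean_ace_residency_cycles: average cycles an ACE bit stays in the structure
    total_bits: number of bits in the structure
    """
    average_resident_ace_bits = ace_bits_in_per_cycle * mean_ace_residency_cycles
    return average_resident_ace_bits / total_bits


# Hypothetical numbers: 0.5 ACE bits arrive per cycle and each stays 3 cycles,
# so 1.5 ACE bits are resident on average in a 4-bit structure.
print(avf_from_littles_law(0.5, 3.0, 4))  # 0.375
```

The appeal of this form is that arrival rate and residency time can be measured separately for each structure in a performance model.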

20 Computing AVF
Our approach is conservative: we assume every bit is ACE unless proven otherwise
Data analysis: try to prove that data held in a structure is un-ACE
Timing analysis: track the time this data spends in the structure

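21 Computing FIT rate of a Chip
$\text{Total FIT} = \sum_i \left(\text{FIT per bit}_i \times \text{\# of bits}_i \times \mathrm{AVF}_i\right)$

| Structure | FIT per bit | # of bits | AVF | Total FIT |
|---|---|---|---|---|
| Branch Predictor | .001* | 1K | 0 | 0 |
| Program Counter | .001* | 64 | 1 | 0.064 |
| Instruction Queue | .001* | 6400 | ? | ? |
| Functional Units | .001* | 4000 | ? | ? |
| … | | | | … |

Total FIT of whole chip = sum of the Total FIT column
* Intrinsic FIT per bit from externally published data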

22 Results: Experimental Setup
Used the ASIM modeling infrastructure
Model of an Itanium® 2-like processor
Ran all Spec2K benchmarks
Compiled with the highest level of optimization with the Intel electron compiler
Simulated under a full OS
Simulation points chosen using SimPoint (Sherwood et al.)

23 Instruction Queue ACE percentage = AVF = 29%

24 Functional Units ACE percentage = AVF = 9%

25 Computing FIT rate of Chip

| Structure | FIT per bit | # of bits | AVF | Total FIT |
|---|---|---|---|---|
| Branch Predictor | .001* | 1K | 0 | 0 |
| Program Counter | .001* | 64 | 1 | 0.064 |
| Instruction Queue | .001* | 6400 | 0.29 | 1.856 |
| Functional Units | .001* | 4000 | 0.09 | 0.360 |
| … | | | | … |

Total FIT of whole chip = sum of the Total FIT column
* Intrinsic FIT per bit from externally published data
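The per-structure arithmetic in the slide 25 table can be checked with a short script. This is a sketch under two assumptions: "1K" is taken as 1,000 bits (the choice does not affect the result because the branch predictor's AVF is 0), and the sum covers only the four rows listed, since the slide elides the remaining structures.

```python
# (structure, intrinsic FIT per bit, number of bits, AVF) as listed on the slide
structures = [
    ("Branch Predictor", 0.001, 1000, 0.0),   # "1K" assumed to mean 1,000 bits
    ("Program Counter", 0.001, 64, 1.0),
    ("Instruction Queue", 0.001, 6400, 0.29),
    ("Functional Units", 0.001, 4000, 0.09),
]


def structure_fit(fit_per_bit, num_bits, avf):
    """FIT contribution of one structure: FIT per bit x # of bits x AVF."""
    return fit_per_bit * num_bits * avf


fits = {name: structure_fit(f, n, a) for name, f, n, a in structures}
for name, fit in fits.items():
    print(f"{name}: {fit:.3f}")

# Partial total over the listed rows only; the full chip has more structures.
partial_total = sum(fits.values())
print(f"Listed structures: {partial_total:.3f}")  # Listed structures: 2.280
```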

26 Summary
Determine which bits matter
  ACE (Architecturally Correct Execution)
Compute FIT rate
  Intrinsic FIT per bit
  AVF (Architectural Vulnerability Factor)

27 Questions?

28 Statistical Fault Injection (SFI)
Algorithm
  Find a statistically significant set of bits
  Randomly select a bit
  Flip the bit
  Run two simulations: one with the bit flip and one without
  Run for a pre-defined # of cycles
  Compare the architectural state of the two simulations (e.g., register file)
  If there is a mismatch, declare an error
  Repeat the algorithm with a different bit flip
AVF = # of mismatches observed / total # of experiments
Used widely, and has provided useful AVF numbers to date
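The SFI loop on slide 28 can be sketched against a toy model. The `toy_simulate` function below is a hypothetical stand-in for a real RTL or performance simulation run over a pre-defined number of cycles: it assumes an 8-bit state in which only the first 3 bits ever reach architectural state, so the true AVF is 3/8.

```python
import random


def toy_simulate(state):
    """Hypothetical simulator: returns the architectural state after a run.

    Only the first 3 state bits propagate to architectural state here.
    """
    return tuple(state[:3])


def sfi_avf(simulate, initial_state, bits_to_flip):
    """AVF = # of mismatches observed / total # of experiments."""
    golden = simulate(initial_state)          # run without a bit flip
    mismatches = 0
    for bit in bits_to_flip:
        faulty_state = list(initial_state)
        faulty_state[bit] ^= 1                # flip the selected bit
        if simulate(faulty_state) != golden:  # compare architectural state
            mismatches += 1                   # mismatch: declare an error
    return mismatches / len(bits_to_flip)


state = [0, 1, 1, 0, 1, 0, 0, 1]
rng = random.Random(0)                        # fixed seed for repeatability
sample = [rng.randrange(len(state)) for _ in range(2000)]

estimate = sfi_avf(toy_simulate, state, sample)               # random sampling
exact = sfi_avf(toy_simulate, state, list(range(len(state)))) # flip every bit once
print(exact)  # 0.375
```

With random sampling the estimate only approaches the exact value as the number of experiments grows, which is the cost the slide refers to when it asks for a statistically significant set of bits.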

29 SFI vs. ACE analysis

| | SFI | ACE |
|---|---|---|
| Accuracy of microarchitectural un-ACE | Better than ACE analysis | Conservative |
| Accuracy of architectural un-ACE | Conservative | Better than SFI (e.g., covers dynamically dead instructions) |
| Insight | Per-structure insights harder | Little's Law & per-structure breakdown easier |
| # of experiments | Large # required to be statistically significant | Small # of experiments can give good accuracy |

