Published by Evan Golden. Modified over 8 years ago.
1
Assuring Application-level Correctness Against Soft Errors
Jason Cong and Karthik Gururaj
2
Motivation
• Soft errors are an issue for the correct operation of CMOS circuits
• The problem is becoming more severe (ITRS 2009):
  - Smaller device sizes
  - Lower supply voltages
• Effect of soft errors on circuits: Karnik 2004, Nguyen 2003
• Effect of soft errors on software and processors: Li et al. 2005, Wang et al. 2004
3
Motivation
• Traditional notion of correctness: every last bit of every variable in a program must be correct
  - Referred to as numerical correctness
• Application-level correctness: several applications can tolerate a degree of error
  - Image viewers, video decoding, etc.
• However, even such applications contain critical instructions
  - Example: the state machine in a video decoder
4
Motivation
• Goal: detect all “critical” instructions in the program
• Protect the “critical” instructions against soft errors using duplication
5
Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results
7
Defining critical instructions
• Elastic outputs: program outputs that can tolerate a certain amount of error
  - Media applications: image, video, etc.
  - Heuristics: support vector machines
• The quality of elastic outputs is characterized by a fidelity metric
  - Examples: PSNR (peak signal-to-noise ratio) for JPEG, bit error rate
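PSNR can serve as a concrete fidelity metric. The plain-Python sketch below assumes 8-bit pixels stored in flat sequences. Real image tooling would usually operate on decoded image arrays instead. The function name `psnr` is ours and is not taken from the authors' tool.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage: a 1000-pixel image in which a single pixel is off by 10.
reference = [100] * 1000
decoded = [100] * 999 + [110]
print(f"{psnr(reference, decoded):.2f} dB")
```

A higher PSNR means better quality, which matches the convention $F(I, O) \ge T$ used for acceptable output.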
8
Defining critical instructions
• Given an application A:
  - $I$: the input to the application
  - $O_c$: a set of outputs for which numerical correctness is required
  - $O$: a set of elastic outputs
  - $F(I, O)$: a fidelity metric for the elastic outputs
  - $T$: the threshold for acceptable output
• An execution of A satisfies application-level correctness if:
  - All outputs in $O_c$ are numerically correct
  - $F(I, O) \ge T$ for the elastic outputs
• $N_{\min}$: the minimum number of elements of $O$ that need to be erroneous for $F(I, O)$ to fall below $T$
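The correctness condition above can be sketched as a check function. The sketch assumes two things. First, the outputs in $O_c$ are available as a name-to-value map, recorded from a fault-free ("golden") run. Second, the fidelity metric is supplied as a callable. Every name here is a hypothetical illustration and not part of the authors' implementation.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical signature for a fidelity metric F(I, O): higher means better quality.
Fidelity = Callable[[Any, Sequence[float]], float]


def satisfies_app_correctness(
    golden_exact: Mapping[str, Any],    # fault-free values of the outputs in O_c
    observed_exact: Mapping[str, Any],  # values of O_c produced by this execution
    inputs: Any,                        # the application input I
    elastic: Sequence[float],           # elastic outputs O from this execution
    fidelity: Fidelity,                 # F(I, O)
    threshold: float,                   # T
) -> bool:
    # Every output in O_c must be numerically correct (exact match).
    for name, expected in golden_exact.items():
        if name not in observed_exact or observed_exact[name] != expected:
            return False
    # Elastic outputs only need to reach the fidelity threshold.
    return fidelity(inputs, elastic) >= threshold


# Usage with a toy fidelity: negative mean absolute error against the input signal.
def neg_mae(reference: Sequence[float], outputs: Sequence[float]) -> float:
    return -sum(abs(r - o) for r, o in zip(reference, outputs)) / len(outputs)


signal = [1.0, 2.0, 3.0]
print(satisfies_app_correctness({"frames": 3}, {"frames": 3}, signal, [1.0, 2.0, 3.3], neg_mae, -0.5))
```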
9
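Example: JPEG decoder
• A PSNR of 35 dB is assumed to indicate good quality. Since $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, this corresponds to MSE = 20.56
• With 8-bit pixel values (MAX = 255), the maximum error per pixel is 255
• For a 1024x768-pixel image, $N_{\min} \approx 1024 \times 768 \times 20.56 / 255^2 \approx 249$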
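The arithmetic in this example can be reproduced in code. The sketch assumes the worst case used on the slide: every corrupted pixel is off by the maximum error of 255, so each one adds $255^2$ to the summed squared error. Under that assumption, PSNR drops strictly below the threshold once the number of bad pixels $k$ exceeds $P \cdot \mathrm{MSE}_T / \mathrm{MAX}^2$ for an image of $P$ pixels. That gives $N_{\min} = \lfloor P \cdot \mathrm{MSE}_T / \mathrm{MAX}^2 \rfloor + 1$. The helper names are hypothetical.

```python
import math


def mse_threshold(psnr_db: float, max_value: int = 255) -> float:
    """MSE at which PSNR = 10 * log10(MAX^2 / MSE) equals psnr_db."""
    return max_value ** 2 / 10 ** (psnr_db / 10)


def n_min_worst_case(width: int, height: int, psnr_db: float, max_value: int = 255) -> int:
    """Fewest pixels that, each off by max_value, push PSNR strictly below psnr_db."""
    pixels = width * height
    # k corrupted pixels give MSE = k * max_value**2 / pixels; PSNR falls below
    # the threshold once k strictly exceeds k_exact.
    k_exact = mse_threshold(psnr_db, max_value) * pixels / max_value ** 2
    return math.floor(k_exact) + 1


print(f"{mse_threshold(35):.2f}")        # 20.56
print(n_min_worst_case(1024, 768, 35))   # 249
```

With 248 worst-case pixels the MSE stays just under 20.56, and a 249th pushes it over. This matches the estimate of about 249.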
10
Defining critical instructions
• An instruction X is critical if:
  - X affects any output in $O_c$ (numerical correctness required), OR
  - X affects at least $N_{\min}$ elements of the elastic outputs $O$
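Once its two inputs are known, this definition reduces to a simple predicate. The sketch below stores those inputs in a hypothetical `Impact` record. Computing `elastic_elements` is the job of the static analysis and profiling steps.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    """Hypothetical summary of what one faulty instance of an instruction can reach."""
    affects_exact_output: bool  # reaches some output in O_c
    elastic_elements: int       # number of elastic output elements of O it can corrupt


def is_critical(impact: Impact, n_min: int) -> bool:
    # Critical if it can break numerical correctness of O_c, or if it can
    # corrupt enough elastic elements to push F(I, O) below T.
    return impact.affects_exact_output or impact.elastic_elements >= n_min


n_min = 249  # worst-case N_min for a 1024x768 image at 35 dB PSNR (MAX = 255)
print(is_critical(Impact(False, 1), n_min))           # a single pixel
print(is_critical(Impact(False, 1024 * 768), n_min))  # every pixel
```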
12
Program representation
• LLVM compiler infrastructure
  - LLVM intermediate representation (IR)
• Weighted program dependence graph (PDG) G
13
Example: LLVM IR (three-address code)
14
Example: PDG based on the LLVM IR
15
Example: node for computing X
16
Example
• Node (out_i) to compute C[Z]+X
• Node (so) to store C[Z]+X into the array output
17
Example
• Node for computing X
• Node (so) to store C[Z]+X into the array output
• Edge representing the dependence between X and out_i
• Edge representing the dependence between out_i and so
18
Assigning edge weights
• Edge weight of u→v: how many instances of node v are affected by one instance of u?
• Example:
  - X is computed outside the loop and out_i inside the N-iteration loop: edge weight N
  - Nodes out_i and so are in the same basic block: edge weight 1
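One way to mechanize the two cases above is to look at the loops that enclose v but not u. The rule below is our assumed generalization for illustration, since the slides give only the two examples. It takes static trip counts as given, although a compiler often has to over-approximate them. It also ignores loop-carried dependences. The loop names are hypothetical.

```python
from math import prod


def edge_weight(u_loops: set[str], v_loops: set[str], trip_counts: dict[str, int]) -> int:
    """Hypothetical static rule for w(u -> v).

    One instance of u can feed every instance of v executed by loops that
    enclose v but not u, so the weight is the product of those trip counts.
    Nodes inside the same loops (e.g. the same basic block) get weight 1.
    """
    return prod(trip_counts[loop] for loop in v_loops - u_loops)


N = 1000               # assumed trip count of the loop in the example
trips = {"loop_i": N}  # hypothetical loop name
print(edge_weight(set(), {"loop_i"}, trips))       # X (outside) -> out_i (inside)
print(edge_weight({"loop_i"}, {"loop_i"}, trips))  # out_i -> so, same basic block
```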
20
Static analysis for detecting critical instructions
• Find how many instances of the output O are affected by node x
• propagate(x→v): the number of instances of v affected by one instance of x
21
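Example
• propagate(u→v) is initialized to the edge weight for all edges (u→v)
• $\mathrm{propagate}(X \to \mathit{out\_i}) = N$
• $w(\mathit{out\_i} \to \mathit{so}) = 1$
• $\mathrm{propagate}(X \to \mathit{so}) = \mathrm{propagate}(X \to \mathit{out\_i}) \times w(\mathit{out\_i} \to \mathit{so}) = N \times 1 = N$
• More formally: [general formula not recoverable from the slide text]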
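The chain-rule step above can be extended to a whole acyclic PDG by walking the graph in topological order. This sketch assumes that contributions arriving along different paths simply add up, which gives an upper bound. It is our illustration and not necessarily the authors' formal definition. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` if the graph is not acyclic.

```python
from graphlib import TopologicalSorter

Graph = dict[str, dict[str, int]]  # u -> {v: w(u -> v)}


def propagate_from(x: str, pdg: Graph) -> dict[str, int]:
    """Instances of each node affected by one instance of x (acyclic PDG assumed)."""
    # TopologicalSorter expects node -> set of predecessors.
    preds: dict[str, set[str]] = {}
    for u, succs in pdg.items():
        preds.setdefault(u, set())
        for v in succs:
            preds.setdefault(v, set()).add(u)
    reach = {x: 1}  # a single faulty instance of x itself
    for node in TopologicalSorter(preds).static_order():
        if node not in reach:
            continue  # not reachable from x
        for v, w in pdg.get(node, {}).items():
            # Assumption: counts arriving over different paths add up (upper bound).
            reach[v] = reach.get(v, 0) + reach[node] * w
    return reach


N = 1000
example = {"X": {"out_i": N}, "out_i": {"so": 1}}
print(propagate_from("X", example)["so"])  # propagate(X -> so) = N * 1
```

Comparing the count for each output node against $N_{\min}$ then flags the critical instructions.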
23
Profiling and runtime monitoring
• Static analysis is conservative by nature
  - It may produce overly pessimistic results
  - Main reason: edge weights are initialized too high
• Profiling with test inputs is used to estimate edge weights
24
Example
• Assume static analysis overestimates the edge weight between sc and c_z
• Profiling gives a value of 1
• Node sc is therefore likely non-critical (LNC)
• Contrast this with node X, which is classified as static critical
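Profiling can be sketched in two steps. First, a trace of dependence pairs shows how many v instances each u instance actually fed. Second, the static and profiled counts are compared against $N_{\min}$. The trace format and function names are assumptions for illustration. So is the single "critical" label, which covers both the static-critical and likely-critical cases; we assume both are duplicated.

```python
from collections import defaultdict
from typing import Iterable


def profiled_weight(dep_pairs: Iterable[tuple[int, int]]) -> int:
    """Estimate w(u -> v) from a profiling trace.

    Each pair (u_instance, v_instance) records that a dynamic instance of u
    fed a dynamic instance of v. The estimate is the largest number of
    distinct v instances that any single u instance affected.
    """
    fan_out: dict[int, set[int]] = defaultdict(set)
    for u_inst, v_inst in dep_pairs:
        fan_out[u_inst].add(v_inst)
    return max((len(vs) for vs in fan_out.values()), default=0)


def classify(static_affected: int, profiled_affected: int, n_min: int,
             affects_exact_output: bool = False) -> str:
    """Three-way split; inputs are counts of affected elastic output elements."""
    if affects_exact_output or profiled_affected >= n_min:
        return "critical"             # duplicate and check
    if static_affected >= n_min:
        return "likely non-critical"  # only the pessimistic static bound flagged it
    return "static non-critical"      # no checking needed


# Usage mirroring the sc -> c_z example: every sc instance fed one c_z instance.
trace = [(0, 0), (1, 1), (2, 2)]
print(profiled_weight(trace))
print(classify(static_affected=1024 * 768, profiled_affected=1, n_min=249))
```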
25
Profiling and runtime monitoring
• Likely critical instructions: duplicated and checked in software
  - Using the SWIFT method proposed by Reis et al. 2005
• Likely non-critical instructions: monitored with a lightweight runtime monitoring technique
• Static non-critical instructions: no error checking
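The toy dispatcher below suggests how the three classes could map to protection. Duplication is modeled by evaluating a computation twice and comparing the results before they are used. This mirrors the idea behind SWIFT only loosely, because SWIFT duplicates instructions at compile time and inserts comparisons before stores and branches. The `monitor` hook is a hypothetical stand-in, since the slides do not describe the lightweight monitoring technique.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SoftErrorDetected(RuntimeError):
    """Raised when the two copies of a duplicated computation disagree."""


def run_duplicated(compute: Callable[[], T]) -> T:
    # Evaluate twice and compare before the value is committed.
    first = compute()
    second = compute()
    if first != second:
        raise SoftErrorDetected(f"mismatch: {first!r} != {second!r}")
    return first


def execute(kind: str, compute: Callable[[], T],
            monitor: Callable[[T], None] = lambda _v: None) -> T:
    if kind == "critical":
        return run_duplicated(compute)
    value = compute()
    if kind == "likely non-critical":
        monitor(value)  # hypothetical hook for the lightweight runtime monitor
    return value        # static non-critical: no checking at all


# Usage: simulate a transient fault by making the two evaluations disagree.
flips = iter([1, 0])
try:
    execute("critical", lambda: next(flips))
except SoftErrorDetected as err:
    print("detected:", err)
```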
27
Results
• Benchmarks from MediaBench, SPEC, and MiBench
• Simics/GEMS simulation infrastructure
28
Static instruction classification
• A significant number of instructions are non-critical
• Profiling helps identify likely non-critical instructions
29
Comparison with previous work
• Significant savings over the approach proposed by Thaker et al.
  - That approach protects all instructions that compute memory addresses and control flow
30
Conclusion
• Static + dynamic technique for detecting critical instructions
• Detects several non-critical instructions
• Reduces overall energy by 25%