Assuring Application-level Correctness Against Soft Errors Jason Cong and Karthik Gururaj.


1 Assuring Application-level Correctness Against Soft Errors Jason Cong and Karthik Gururaj

2 Motivation
• Soft errors – issue for correct operation of CMOS circuits
• Problem becomes more severe – ITRS 2009
  ◦ Smaller device sizes
  ◦ Low supply voltages
• Effect of soft errors on circuits
  ◦ Karnik 2004, Nguyen 2003
• Effect of soft errors on software and processors
  ◦ Li et al. 2005, Wang et al. 2004

3 Motivation
• Traditional notion of correctness
  ◦ Every last bit of every variable in a program should be correct
  ◦ Referred to as numerical correctness
• Application-level correctness
  ◦ Several applications can tolerate a degree of error
  ◦ Image viewer, video decoding, etc.
• However, there exist critical instructions even in such applications
  ◦ Example: state machine in video decoder

4 Motivation
• Goal: Detect all “critical” instructions in the program
• Protect “critical” instructions in the program against soft errors
  ◦ Using duplication

5 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

6 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

7 Defining critical instructions
• Elastic outputs – program outputs which can tolerate a certain amount of error
  ◦ Media applications – image, video, etc.
  ◦ Heuristics – support vector machine
• Characterizing quality of elastic outputs – fidelity metric
  ◦ Example: PSNR (peak signal-to-noise ratio) for JPEG, bit error rate
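
As a rough illustration of a fidelity metric, the following sketch computes PSNR between a reference image and a decoded image, both given as flat sequences of pixel values. The function name `psnr` and the flat-list representation are assumptions made for this example and are not taken from the slides.

```python
import math

def psnr(reference, decoded, max_value=255):
    """PSNR in dB between two equal-length pixel sequences (hypothetical helper)."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the decoded output is closer to the reference, so a threshold on this value can serve as the acceptability test for an elastic output.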

8 Defining critical instructions
• Given application A:
  ◦ I is the input to the application
  ◦ A set of outputs O_c for which numerical correctness is required
  ◦ A set of elastic outputs O
  ◦ Fidelity metric F(I,O) for elastic outputs
  ◦ T – threshold for acceptable output
• An execution of A is said to satisfy application-level correctness if:
  ◦ All outputs ∈ O_c are numerically correct
  ◦ F(I,O) ≥ T for elastic outputs
• N_min – the minimum number of elements of O that need to be erroneous for F(I,O) to fall below T
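
The two conditions above can be written as a single predicate. This is only a sketch: the function name, the use of plain equality against a fault-free ("golden") run, and the precomputed `fidelity` argument are all assumptions made for illustration.

```python
def satisfies_application_correctness(critical_outputs, golden_critical_outputs,
                                      fidelity, threshold):
    """Hypothetical check of application-level correctness.

    critical_outputs / golden_critical_outputs: values of O_c from the run under
    test and from a fault-free run; fidelity: F(I, O) for the elastic outputs;
    threshold: T.
    """
    numerically_correct = critical_outputs == golden_critical_outputs
    return numerically_correct and fidelity >= threshold
```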

9 Example: JPEG decoder
• PSNR of 35 dB is assumed to be good quality
  ◦ MSE = 20.56
• Using 8-bit pixel values (MAX = 255)
  ◦ Max error = 255
• For a 1024x768 pixel image, N_min ~ 251
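
The arithmetic behind this example can be sketched as follows, assuming the standard definition PSNR = 10·log10(MAX² / MSE) and a worst-case model in which every corrupted pixel is off by the full 255. The function names are hypothetical. Under these assumptions the MSE matches the slide (20.56) and the pixel count comes out at 249, the same order as the slide's figure of roughly 251; the slide does not state the exact assumptions behind its number.

```python
import math

def mse_for_psnr(psnr_db, max_value=255):
    """MSE that corresponds to a given PSNR (in dB) for a given peak value."""
    return max_value ** 2 / 10 ** (psnr_db / 10)

def min_corrupted_elements(num_elements, psnr_db, max_value=255, max_error=255):
    """Smallest number of worst-case-corrupted elements that pushes PSNR below psnr_db.

    Assumes each corrupted element contributes max_error**2 to the squared-error sum.
    """
    squared_error_budget = mse_for_psnr(psnr_db, max_value) * num_elements
    # Strictly exceed the budget so that the fidelity falls below the threshold.
    return math.floor(squared_error_budget / max_error ** 2) + 1

threshold_mse = mse_for_psnr(35)                   # about 20.56
n_min = min_corrupted_elements(1024 * 768, 35)     # 249 under this simple model
```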

10 Defining critical instructions
• An instruction X is said to be critical if
  ◦ X affects one of the outputs of O_c (numerical correctness required), OR
  ◦ X affects N_min elements of the elastic output O

11 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

12 Program representation
• LLVM compiler infrastructure
  ◦ LLVM intermediate representation
• Weighted program dependence graph (PDG) – G

13 Example
• LLVM IR – 3-address code

14 Example
• PDG – based on LLVM IR

15 Example
• Node for computing X

16 Example
• Node (out_i) to compute C[Z]+X
• Node (so) to store C[Z]+X into array output

17 Example
• Node for computing X
• Node (so) to write to output array
• Edge to represent dependence between X and out_i
• Node (so) to store C[Z]+X into array output
• Edge to represent dependence between out_i and so

18 Assigning edge weights
• Edge weight u→v – how many instances of node v are affected by 1 instance of u?
• Example:
  ◦ X outside the loop, out_i inside the loop: edge weight N
  ◦ Nodes out_i and so are in the same basic block: edge weight 1
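
A weighted PDG of this kind can be held in a small adjacency map. The sketch below is illustrative only: the class name `WeightedPDG`, its methods, and the concrete trip count `N = 64` are assumptions, not part of the presented tool.

```python
from collections import defaultdict

class WeightedPDG:
    """Minimal weighted dependence graph: succ[u][v] is the weight of edge u -> v."""

    def __init__(self):
        self.succ = defaultdict(dict)

    def add_edge(self, u, v, weight):
        # weight = number of instances of v affected by one instance of u
        self.succ[u][v] = weight

N = 64  # hypothetical loop trip count

pdg = WeightedPDG()
pdg.add_edge("X", "out_i", N)   # X is outside the loop, out_i is inside it
pdg.add_edge("out_i", "so", 1)  # out_i and so share a basic block
```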

19 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

20 Static analysis for detecting critical instructions
• Find how many instances of output O are affected by node x
• propagate(x→v) is the number of instances of v that are affected by an instance of x

21 Example
• propagate(u→v) initialized to edge weight for all edges (u→v)
• propagate(X→out_i) = N
• w(out_i→so) = 1
• propagate(X→so) = propagate(X→out_i) * w(out_i→so)
• More formally
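
The multiplication along a dependence chain can be sketched as a memoized traversal. Two assumptions are made here that the transcript does not confirm: the graph is treated as acyclic, and when several paths lead from the source to the target their contributions are summed as a conservative upper bound. The dictionary layout and the trip count `N = 64` are also hypothetical.

```python
def propagate(edges, source, target):
    """Estimate how many instances of `target` one instance of `source` affects.

    edges: dict mapping node -> {successor: edge weight}. Assumes an acyclic graph.
    Weights are multiplied along a path; multiple paths are summed (an assumption).
    """
    memo = {}

    def reach(node):
        if node == target:
            return 1
        if node not in memo:
            memo[node] = sum(weight * reach(succ)
                             for succ, weight in edges.get(node, {}).items())
        return memo[node]

    return reach(source)

N = 64  # hypothetical loop trip count
edges = {"X": {"out_i": N}, "out_i": {"so": 1}}
affected = propagate(edges, "X", "so")  # N * 1
```

For the two-edge chain in the example this reduces to propagate(X→out_i) * w(out_i→so), as on the slide.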

22 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

23 Profiling and runtime monitoring
• Static analysis is conservative in nature
  ◦ May produce overly pessimistic results
  ◦ Main reason – edge weights are initialized too high
• Profiling with test inputs to estimate edge weights

24 Example
• Assume static analysis overestimates the edge weight between sc and c_z
• Profiling gives a value of 1
• Node sc is likely non-critical (LNC)
• Contrast this with node X, which is static critical

25 Profiling and runtime monitoring
• Likely critical instructions – duplicated and checked in software
  ◦ Using the SWIFT method proposed by Reis et al. 2005
• Likely non-critical instructions – monitored using a lightweight runtime monitoring technique
• Static non-critical instructions – no error checking
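
One way to read the three classes above is as a decision over the static and profiled estimates of how many elastic output elements an instruction affects. The sketch below is an interpretation, not the authors' implementation: the function name, the argument names, and the exact comparison against N_min are assumptions.

```python
PROTECTION = {
    "likely critical": "duplicate and check in software",
    "likely non-critical": "lightweight runtime monitoring",
    "static non-critical": "no error checking",
}

def classify(static_count, profiled_count, n_min, affects_critical_output=False):
    """Hypothetical classification of one instruction.

    static_count:   affected elastic elements according to static analysis
    profiled_count: affected elastic elements according to profiling
    n_min:          N_min for the elastic output
    """
    if affects_critical_output:
        return "likely critical"          # touches O_c, always protected
    if static_count < n_min:
        return "static non-critical"      # even the conservative bound is below N_min
    if profiled_count < n_min:
        return "likely non-critical"      # static bound was too pessimistic
    return "likely critical"
```

For instance, an instruction whose static estimate is 64 but whose profiled estimate is 1, with N_min = 10, falls into the likely non-critical class and would only be monitored at run time.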

26 Outline
• Motivation
• Definition of critical instructions
• Program representation
• Static analysis to detect critical instructions
• Profiling and runtime monitoring
• Results

27 Results
• Benchmarks from MediaBench, SPEC, MiBench
• Simics/GEMS simulation infrastructure

28 Static instruction classification
• A significant number of instructions are non-critical
• Profiling helps to determine likely non-critical instructions

29 Comparison with previous work
• Significant savings over the approach proposed by Thaker et al.
  ◦ That approach protects all instructions which compute memory addresses and control flow

30 Conclusion
• Static + dynamic technique for detecting critical instructions
• Detect several non-critical instructions
• Reduce overall energy by 25%

