Presentation is loading. Please wait.

Presentation is loading. Please wait.

NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 1 Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy.

Similar presentations


Presentation on theme: "NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 1 Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy."— Presentation transcript:

1 NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 1 Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy Eric Rotenberg Center for Efficient, Secure and Reliable Computing (CESR) Dept. of Electrical & Computer Engineering North Carolina State University Raleigh, NC

2 NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 2 Processor Fault Tolerance Main approaches: Time or space redundancy –Explicitly duplicate in time or space –Match values/signals –E.g., AR-SMT (time), IBM S/390 G5 (space) High overheads –Area/power/performance –Unsuitable for commodity high-performance processors

3 NC STATE UNIVERSITY DSN 2007 New Idea: Inherent Time Redundancy © Vimal Reddy 2007 3 == program program duplicateprogram Conventional time redundancyInherent time redundancy

4 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 4

5 NC STATE UNIVERSITY DSN 2007 Exploiting ITR for Fault Tolerance Key ideas: –Record input-independent signals for instructions –Confirm their correctness upon repetition This paper focuses on decode signals –Provides protection for fetch and decode stages Longer term goal –Apply ITR to other input-independent stages –My thesis combines ITR with other microarch.- level checks for a comprehensive fault regimen © Vimal Reddy 2007 5

6 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 6

7 NC STATE UNIVERSITY DSN 2007 ITR Cache for Checking Decode Signals Create decode signatures at trace granularity –Trace ends at branch or 16 instructions (arbitrary) –Combine decode signals of instructions in a trace Record decode signatures in an ITR cache –Indexed by start PC (program counter) of a trace For each new trace, compare with ITR cache –Fault if new signature mismatches ITR cache © Vimal Reddy 2007 7

8 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 8

9 NC STATE UNIVERSITY DSN 2007 pc(I) Scenario 1: ITR Cache Hit © Vimal Reddy 2007 9 Signature Generation IJK S2 (I, J, K) Tags Signatures S1 (I, J, K)pc(I) Hit I ≠ Fault Detected Notes: -- Faults on traces that hit (S2) are detectable -- These faults are recoverable by flushing the processor pipeline and restarting from the faulting trace ITR Cache

10 NC STATE UNIVERSITY DSN 2007 Scenario 2: ITR Cache Miss Followed by Hit © Vimal Reddy 2007 10 Signature Generation IJK S1 (I, J, K) Tags Signatures S1 (I, J, K)pc(I) Miss pc(I) I ITR Cache

11 NC STATE UNIVERSITY DSN 2007 pc(I) © Vimal Reddy 2007 11 Signature Generation IJK S2 (I, J, K) Tags Signatures S1 (I, J, K)pc(I) Hit ≠ Scenario 2: ITR Cache Miss Followed by Hit Fault Detected Notes: -- Faults on traces that miss but later hit (faulty S1) are detectable -- Faults on S1 need checkpoint for recovery or must abort ITR Cache

12 NC STATE UNIVERSITY DSN 2007 Scenario 3: ITR Cache Miss Followed by Eviction © Vimal Reddy 2007 12 Signature Generation IJK S1 (I, J, K) Tags Signatures S1 (I, J, K)pc(I) Miss pc(I) I ITR Cache

13 NC STATE UNIVERSITY DSN 2007 pc(M) © Vimal Reddy 2007 13 Signature Generation MNO S2 (M, N, O) Tags Signatures Evict Notes: -- Faults on missed & evicted traces (S1) are not detected Scenario 3: ITR Cache Miss Followed by Eviction S2 (M, N, O)S1 (I, J, K)pc(M)pc(I) Fault not detected ITR Cache

14 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 14

15 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 15 Decode signals redirected for signature generation and ITR cache access

16 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 16 Signature generation: Bitwise XOR decode signals of instr. from a trace

17 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 17 Signatures dispatched into ITR ROB (ITR cache more effective without wrong-path trace signatures)

18 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 18 Access ITR cache with trace start PC

19 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 19 Signature mismatch, set retry bit -- Flush pipeline and retry from faulting trace

20 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 20 Fault re-detected, old signature is faulty -- Abort or rollback to a safe checkpoint Fault disappears, successful recovery

21 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 21 ITR cache miss -- Write new signature to ITR cache

22 NC STATE UNIVERSITY DSN 2007 Microarchitecture Extensions to Superscalar © Vimal Reddy 2007 22 No fault detected (or possible loss of detection due to miss-followed-by- eviction case)

23 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 23

24 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 24

25 NC STATE UNIVERSITY DSN 2007 Results: Repetition in SPEC2K INT © Vimal Reddy 2007 25 % of total dynamic instructions

26 NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 26 Results: Repetition in SPEC2K FP % of total dynamic instructions

27 NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 27 Results: Proximity in SPEC2K INT % of total dynamic instructions # dynamic instructions separating repetitive traces

28 NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 28 Results: Proximity in SPEC2K FP # dynamic instructions separating repetitive traces % of total dynamic instructions

29 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 29

30 NC STATE UNIVERSITY DSN 2007 Fault Coverage Based on ITR Cache Misses Loss in fault detection coverage –Miss-followed-by-eviction causes loss in fault detection –(# of missed & evicted traces) x instructions/trace Loss in fault recovery coverage (w/o checkpoints) –Misses cause loss in fault recovery –(# of missed traces) x instructions/trace Tried various ITR cache configurations –Direct-mapped (DM), 2,4,8,16-way, fully-associative (FA) –256, 512 and 1024 signatures –Bzip, gzip, art, mgrid and wupwise had <1% loss in coverage (results not shown) © Vimal Reddy 2007 30

31 NC STATE UNIVERSITY DSN 2007 Results: Loss in Fault Detection Coverage © Vimal Reddy 2007 31 Coverage loss correlates with proximity of repetition e.g., perl and vortex Capacity is important for poor performers For 2-way, 1024 signatures: 8% max. (vortex) and 1.3% avg. loss in detection coverage % of total dynamic instructions

32 NC STATE UNIVERSITY DSN 2007 Results: Loss in Fault Recovery Coverage © Vimal Reddy 2007 32 Recovery coverage loss is bigger than loss in detection coverage For 2-way, 1024 signatures: 15% max. (vortex) and 2.5% avg. loss in recovery coverage % of total dynamic instructions

33 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 33

34 NC STATE UNIVERSITY DSN 2007 Fault Injection Experiments © Vimal Reddy 2007 34 Injected faults on decode signals of a cycle- accurate simulator Based on MIPS R10K processor 1000 random faults per benchmark

35 NC STATE UNIVERSITY DSN 2007 Decode Signals Modeled © Vimal Reddy 2007 35

36 NC STATE UNIVERSITY DSN 2007 Fault Injection Outcomes © Vimal Reddy 2007 36 Fault SDCMask ITR+SDC MayITR+SDC Undet+SDC ITR+Mask Undet+Mask MayITR+Mask 92% 97% 7% 1% 37% 3% 0% 63% Refer paper for other interesting fault injection results

37 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 37

38 NC STATE UNIVERSITY DSN 2007 Vimal Reddy © 2007 38 Area Overhead The IBM S/390 G5 replicates the fetch and decode units (I-unit) ITR cache configuration similar to BTB in the picture (2K entries, 2-way assoc., 35 bits per entry) ITR cache is 1/7 th the I-unit *From [slegel-IEEE-Micro-1999]

39 NC STATE UNIVERSITY DSN 2007 Vimal Reddy © 2007 39 Power Overhead Redundant fetching is a major power overhead Compare I-cache of IBM power4 (64KB, dm, 128B line, 1 rd./wr. port) with ITR cache (16KB, 2-way, 4B line, 1 rd. and 1wr. port) Excluded redundant decoding energy (more savings in reality)

40 NC STATE UNIVERSITY DSN 2007 Outline Exploiting ITR for Fault Tolerance ITR Cache for Checking Decode Signals ITR Cache Hit/Miss Scenarios Microarchitecture Extensions to Superscalar Results –Repetition –Fault Coverage Based on Cache Hits/Misses –Fault Coverage Based on Fault Injection Area and Power Overhead Conclusion © Vimal Reddy 2007 40

41 NC STATE UNIVERSITY DSN 2007 Conclusion Inherent Time Redundancy is a new opportunity for efficient fault tolerance 96% of faults on decode signals detected by small ITR cache (~16 KB) Future work: –Handling benchmarks with less repetition –Exploiting ITR for other input-independent parts of the processor –Evaluating a comprehensive fault regimen that includes ITR © Vimal Reddy 2007 41


Download ppt "NC STATE UNIVERSITY DSN 2007 © Vimal Reddy 2007 1 Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy."

Similar presentations


Ads by Google