Published by Avis Horn. Modified over 9 years ago.
1
Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors
Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Nam Sung Kim, Mikko Lipasti
University of Wisconsin-Madison, Dept. of Electrical and Computer Engineering
2
Motivation
o Technology scaling beyond 32nm degrades yield
o Circuit solutions: impose restrictive design rules or use regular fabrics
o Redundancy-based solutions:
  o Different granularities possible: processor core, ALU, bit slice
  o Inefficient! Needs extra logic.
  o Multiplexors may impact performance
3
Overview
[Figure: prefix adder over bit positions 0 to 7; a defective prefix node impacts carries C4 and C7. The minimized vectors (0000…11010…0110) feed the Checker, which drives the Recover signal.]
The checker detects fault-triggering inputs at runtime.
4
Checker Unit: Comparison with a Redundancy-based Alternative
[Figure: TCAM-based operand checker holding cubes such as 0xxx11x0 and driving the Recover signal]
Checker Unit
– Does not affect the critical path
– Flexible checker unit (can be updated)
– Can detect failures in multiple units
Redundancy-based
– Affects the critical path (large muxes)
– Fixed design approach (cannot be updated)
– Two out of three ALUs should always be fault-free
5
TCAM Overview
TCAM can store test cubes with don't-care bits
Conventional TCAM supports arbitrary updates
– Requires a log N-to-N decoder for N entries
– No decoder in our checker: entries are initialized sequentially at power-on
Can't store all test vectors for a fault (TCAM too large)
– Cube minimization helps, but is not sufficient
– Solution: add "false alarm" vectors to reduce the number of cubes and TCAM entries needed
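The ternary matching that a TCAM performs on stored cubes can be sketched in software. This is only an illustrative model, not the hardware design from the talk: the function names `cube_matches` and `checker_fires` and the second stored cube are hypothetical, and a real TCAM compares all entries in parallel rather than scanning them.

```python
from typing import List


def cube_matches(cube: str, operand: str) -> bool:
    """Return True if every specified (non-'x') bit of the cube equals the operand bit."""
    return len(cube) == len(operand) and all(
        c == "x" or c == b for c, b in zip(cube, operand)
    )


def checker_fires(tcam: List[str], operand: str) -> bool:
    """Model the recovery signal: raised when any stored cube matches the operand."""
    return any(cube_matches(cube, operand) for cube in tcam)


# "0xxx11x0" appears on the comparison slide; "1x00xxxx" is a made-up second entry.
tcam_entries = ["0xxx11x0", "1x00xxxx"]

hit = checker_fires(tcam_entries, "01011100")   # agrees with the first entry on all specified bits
miss = checker_fires(tcam_entries, "01011101")  # last bit differs from entry 1, first bit from entry 2
print(hit, miss)
```

Each `x` in a stored cube doubles the number of operands that one entry covers, which is why fewer, more general cubes translate into a smaller TCAM.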
6
False Alarm Insertion to Minimize the TCAM Size: Example
Identify cubes which excite the fault (inputs A, B, C, D, E):
     ABCDE
V1   x10xx
V2   0110x
V3   11100
V4   11101
V5   x00x1
V6   x0101
7
False Alarm Insertion to Minimize the TCAM Size: Example
Test cube minimization
Before (6 cubes):
     ABCDE
V1   x10xx
V2   0110x
V3   11100
V4   11101
V5   x00x1
V6   x0101
After (4 cubes):
     ABCDE
V1   xx0x1
V2   x10xx
V3   xxx01
V4   x1x0x
8
False Alarm Insertion to Minimize the TCAM Size: Example
Steps: identify cubes which excite the fault, test cube minimization, further minimization with false alarm insertion
After test cube minimization (4 cubes):
     ABCDE
V1   xx0x1
V2   x10xx
V3   xxx01
V4   x1x0x
After false alarm insertion (3 cubes):
     ABCDE
V1   xx0x1
V2   x10xx
V3   xxx0x
We reduce the number of test cubes from 6 to 3
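The three cube sets in this example can be checked by brute force over all 32 input vectors. The sketch below is not part of the original talk; the helper names `expand` and `cover` are illustrative. It confirms that the 4-cube set covers exactly the same vectors as the original 6 cubes, and lists the extra vectors (the false alarms) that the 3-cube set admits.

```python
from itertools import product


def expand(cube):
    """Enumerate every fully specified vector covered by a cube ('x' = don't-care)."""
    choices = ["01" if c == "x" else c for c in cube]
    return {"".join(bits) for bits in product(*choices)}


def cover(cubes):
    """Union of the vectors covered by a list of cubes."""
    return set().union(*(expand(c) for c in cubes))


original = ["x10xx", "0110x", "11100", "11101", "x00x1", "x0101"]
minimized = ["xx0x1", "x10xx", "xxx01", "x1x0x"]
with_false_alarms = ["xx0x1", "x10xx", "xxx0x"]

on_set = cover(original)

# Plain minimization must not change the set of fault-exciting vectors.
exact = cover(minimized) == on_set

# False alarm insertion may only add vectors, never lose a fault-exciting one.
nothing_lost = on_set <= cover(with_false_alarms)
false_alarms = sorted(cover(with_false_alarms) - on_set)

print(len(on_set), exact, nothing_lost, false_alarms)
# prints: 18 True True ['00000', '00100', '10000', '10100']
```

Widening `xxx01` and `x1x0x` into the single cube `xxx0x` therefore costs four false alarm vectors (B = 0, D = 0, E = 0) out of 32 possible inputs and saves one TCAM entry.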
9
False Alarm Insertion
Problem definition
– Too many "false alarms" make the checker useless
– Reduce the number of cubes by adding as few false alarm vectors as possible
Why do we need false alarms?
– The number of test cubes translates to the number of entries in the TCAM
– Due to the area budget, the number of entries in the TCAM is limited
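The trade-off in this problem statement can be illustrated with a simple greedy heuristic: repeatedly merge the pair of cubes whose supercube adds the fewest false alarms, until the cube count fits the TCAM budget. This is a stand-in chosen for brevity, not the EXPAND-FA procedure described in the talk, and it enumerates vectors exhaustively, so it only suits small input widths. The names `supercube` and `greedy_merge` are hypothetical.

```python
from itertools import combinations, product


def expand(cube):
    """Enumerate every fully specified vector covered by a cube ('x' = don't-care)."""
    choices = ["01" if c == "x" else c for c in cube]
    return {"".join(bits) for bits in product(*choices)}


def cover(cubes):
    """Union of the vectors covered by a list of cubes."""
    return set().union(*(expand(c) for c in cubes))


def supercube(a, b):
    """Smallest cube containing both a and b: keep agreeing bits, else 'x'."""
    return "".join(x if x == y else "x" for x, y in zip(a, b))


def greedy_merge(cubes, target):
    """Merge cube pairs until at most `target` cubes remain; return (cubes, false alarm count)."""
    if target < 1:
        raise ValueError("target must be at least 1")
    on_set = cover(cubes)  # vectors that truly excite the fault
    cubes = list(cubes)
    while len(cubes) > target:
        best = None
        for i, j in combinations(range(len(cubes)), 2):
            candidate = [c for k, c in enumerate(cubes) if k not in (i, j)]
            candidate.append(supercube(cubes[i], cubes[j]))
            cost = len(cover(candidate) - on_set)  # false alarms after this merge
            if best is None or cost < best[0]:
                best = (cost, candidate)
        cubes = best[1]
    return cubes, len(cover(cubes) - on_set)


result_cubes, result_fa = greedy_merge(["xx0x1", "x10xx", "xxx01", "x1x0x"], target=3)
print(len(result_cubes), result_fa)
```

On the 4-cube set from the running example, the cheapest merges each introduce four false alarm vectors, so the heuristic reaches 3 cubes at the same false alarm cost as the slide's result, although it may pick a different pair to merge.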
10
Using Two-Level Logic Minimization
Two-level logic minimization can be used to minimize the number of test cubes
We extend the ESPRESSO* tool with false alarm vector insertion to achieve higher minimization
*ESPRESSO. http://embedded.eecs.berkeley.edu/pubs/downloads/espresso/.
11
False Alarm Insertion by Extending ESPRESSO
Overview of the main loop of ESPRESSO (input: test cubes; output: minimized test cubes)
– F = IRREDUNDANT (F_ON, F_DC)
– F = REDUCE (F_ON, F_DC)
– F = EXPAND (F_ON, F_OFF)
– Decision: stop minimization?
Extension with false alarm insertion (input: minimized cubes; output: minimized cubes with false alarms)
– F = EXPAND-FA (F_ON, F_OFF)
– F = IRREDUNDANT (F_ON, F_DC)
– F = REDUCE (F_ON, F_DC)
– F = EXPAND (F_ON, F_OFF)
– Decision: # vectors < threshold
12
False Alarm Insertion Example
[Figure: worked example stepping through EXPAND-FA, IRREDUNDANT, REDUCE and EXPAND]
13
False Alarm Insertion for One Cube
Variables: A0 A1 A2 A3
On-set cube: ON1 = 000x
Off-set cubes: OFF1 = xx1x, OFF2 = x1xx, OFF3 = 1xx1
Offset Matrix / False Alarm Matrix (rows B1 to B3, one column per literal A0 to A3):
     A0 A1 A2 A3
B1   0  0  2  -
B2   0  2  0  -
B3   1  0  0  -
     1  2  2  -
[Figure: Karnaugh maps over A0A1 and A2A3 showing OFF1, OFF2, OFF3, ON1, ON2 and ON'2]
False Alarm Matrix (i, j)
– Entry (i, j) indicates the false alarms between off-set cube i and the (expanded) cube when literal j is dropped
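The matrix entries can be reproduced with a few lines of code, under the assumption that entry (i, j) counts the vectors shared by off-set cube i and the on-set cube after literal j has been turned into a don't-care. That reading is inferred from the slide's definition and numbers, not stated there explicitly; the function names and the use of `None` for an already-dropped literal (shown as `-` on the slide) are illustrative choices.

```python
def intersection_size(a, b):
    """Number of vectors covered by both cubes (0 if they conflict on a specified bit)."""
    size = 1
    for x, y in zip(a, b):
        if x != "x" and y != "x" and x != y:
            return 0          # opposite specified bits: the cubes are disjoint
        if x == "x" and y == "x":
            size *= 2         # position is free in both cubes
    return size


def false_alarm_matrix(on_cube, off_cubes):
    """Entry [i][j]: vectors of off-set cube i newly covered when literal j is dropped."""
    matrix = []
    for off in off_cubes:
        row = []
        for j, literal in enumerate(on_cube):
            if literal == "x":
                row.append(None)  # nothing to drop at this position
                continue
            expanded = on_cube[:j] + "x" + on_cube[j + 1:]
            row.append(intersection_size(expanded, off))
        matrix.append(row)
    return matrix


matrix = false_alarm_matrix("000x", ["xx1x", "x1xx", "1xx1"])
for row in matrix:
    print(row)
# prints:
# [0, 0, 2, None]
# [0, 2, 0, None]
# [1, 0, 0, None]
```

These rows agree with the strings 002-, 020- and 100- on the slide. Dropping A0 costs a single false alarm vector (1001), while dropping A1 or A2 costs two each, which matches the 122- row if that row is read as per-literal totals.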
14
Simulation Configuration
Single-failure scenarios in various nodes of a 32-bit prefix adder (Brent-Kung)
Generate all test vectors for two failing cases, modeled as stuck-at-0 and stuck-at-1 faults, using the ATALANTA* ATPG toolset
Use the SPEC2006 suite for the workload-dependent case
– Record the input arguments to the adder by running each benchmark on an x86 simulator
Analyze area overhead in 2-issue and 4-issue microprocessors
*H.K. Lee and D.S. Ha. Atalanta: an efficient ATPG for combinational circuits. Technical Report; Department of Electrical Engineering, Virginia Polytechnic Institute and State University, pages 93-12, 1993.
15
Comparison of Probability of Detection
Probability of detection (PoD): fraction of times that the checker unit activates the recovery signal (false alarm or true positive)
Average PoD degrades as the number of test cubes decreases
Average PoD after inserting false alarms does not degrade significantly in FA-128, FA-64 or FA-32 compared to W/O FA
This behavior holds for both the workload-dependent and random cases
16
Comparing False Alarm Algorithms
Default algorithm: only adds false alarms to cubes with the fewest don't-cares
Aggressive algorithm: can add false alarms to all cubes
Each table entry indicates the fraction of false alarms
– FA: number of false alarms
– TP: number of true positives (input excites the fault)
Higher false positives with the aggressive algorithm
17
Area Overhead
Implemented approaches
– Baseline (K+1): K-issue processor with 1 redundant component
– K+TCAM: K-issue processor with checker implemented as TCAM
– K+FPGA: K-issue processor with checker implemented as FPGA
2+TCAM has better area than 2+1 for 32 and 48 cubes
2+FPGA always has more area than the baseline
Similar behavior for 4+TCAM and 4+FPGA
18
Conclusion
A new framework for online detection of failures at the operand level
– Design a flexible TCAM-based checker unit
– Propose a false alarm insertion algorithm to reduce the number of TCAM entries needed
Benefits:
– No impact on the critical path
– ~10% area reduction
Future work:
– Use the checker unit for other modules
– Detect other failure modes such as delay path faults
19
Questions?
20
False Alarm Insertion Procedure
Each call to the EXPAND-FA function expands multiple test cubes
– How do we go through the on-set sequentially? See the paper.
– Which cube is selected to be expanded? Same section.
– Stopping criterion: when the target number of cubes is reached
21
Introduction
Technology scaling beyond 32nm degrades yield
o To address: impose restrictive design rules or use regular fabrics [T. Jhaveri, SPIE’06]
o Can also use configurable logic blocks for post-silicon corrections [Y. Ran, TVLSI’06]
o Redundancy-based techniques can also be used:
  o Exploiting existing redundancy in high performance processors [P. Shivakumar, ICCD’12][S. Shyam, ASPLOS’06][J. Srinivasan, ISCA’05]
  o Incorporate redundancy at the granularity of a bit slice [K. Namba, PRDC’05]
22
Contributions
Checker Unit Module
– Flexible option for online and operand-level fault detection
– Faulty vectors can be updated over time
– TCAM-based implementation which can store cubes with don't-care bits
– No extra logic on the critical paths
False Alarm Vectors
– Efficient use of false alarm vectors to reduce the number of vectors to be checked, thus reducing the TCAM area
– Integrate the false alarm insertion into the ESPRESSO 2-level logic minimization tool
– The recovery flag is not falsely activated frequently