
1 University of Michigan, Electrical Engineering and Computer Science
Overcoming Hard-Faults in High-Performance Microprocessors
I2PC Talk, Sept 15, 2011
Presented by: Amin Ansari

2 Significance of Reliability
Mission-critical systems (engine or brake control units with full-authority digital engine control; financial analysis and transactions on HP Tandem NonStop and IBM z Series) are equipped with:
● Triple modular redundancy
● Watchdog timers
● Error correction codes
● Fault-tolerant scheduling
Commodity systems (desktop or server processors) rely on ECC and RAID; hard-faults are handled by core disabling.

3 Hard-Faults
Main sources:
● Manufacturing defects
● Process-variation-induced failures
● In-field wearout
● Ultra-low-power operation
They have a direct impact on:
● Manufacturing yield
● Performance
● Lifetime throughput
● Dependability of semiconductor parts

4 Manufacturing Defects
They happen due to:
● Silicon crystal defects
● Random particles on the wafer
● Fabrication imprecision
ITRS projects one defect per five 100 mm² dies expected, a real threat to yield.
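As a back-of-the-envelope check on that ITRS figure, a simple Poisson yield model (our assumption; the slide does not name a model) relates defect density to the fraction of good dies:

```python
import math

def poisson_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson yield model."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

# ITRS figure from the slide: one defect per five 100 mm^2 dies
defect_density = 1.0 / (5 * 100.0)   # defects per mm^2
yield_100mm2 = poisson_yield(defect_density, 100.0)
print(f"Expected yield for a 100 mm^2 die: {yield_100mm2:.1%}")
```

Roughly one die in five or six is lost, consistent with the slide's figure, and the loss grows exponentially with die area, which is why large high-performance chips are hit hardest.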

5 Challenges with High-Performance µPs
Protecting high-performance µPs against hard-faults is especially challenging:
● They contain billions of transistors: many transistors per core, so core disabling is too wasteful
● Complex connectivity and many pipeline stages: fine-grained redundancy is not cost-effective
● Higher clock frequency, voltage, and temperature: increased operational stress accelerates aging
● Operating on the most aggressive V/F curve: requires high V/F guard-bands
● Large on-chip caches: the bit-cell with the worst timing characteristics dictates the voltage and frequency of the whole SRAM array
[AMD Phenom] [Intel Nehalem] [IBM POWER7]

6 Outline
Objective: overcome hard-faults in high-performance µPs with comprehensive, low-cost solutions for protecting both the on-chip caches and the non-cache parts of the core.
Archipelago [HPCA'11] protects on-chip caches against:
● Near-threshold failures
● Process variation
● Wearout and defects
Necromancer [ISCA'10, IEEE Micro'10] protects the general core area (non-cache parts) against:
● Manufacturing defects
● Wearout failures

7 NT Operation: SRAM Bit-Error-Rate
The failure rate grows extremely fast as Vdd decreases.
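The blow-up matters because a whole cache line fails if any one of its cells fails. A small sketch of that amplification (assuming independent bit failures and 64-byte lines, both our assumptions):

```python
def line_failure_prob(bit_error_rate: float, bits_per_line: int = 512) -> float:
    """Probability that a cache line contains at least one faulty cell,
    assuming independent per-bit failures."""
    return 1.0 - (1.0 - bit_error_rate) ** bits_per_line

# Even a modest per-bit failure rate wrecks whole lines:
for ber in (1e-6, 1e-4, 1e-3):
    print(f"BER {ber:g}: P(line faulty) = {line_failure_prob(ber):.3f}")
```

At a per-bit error rate of 1e-3, roughly 40% of 512-bit lines contain a fault, which is why near-threshold operation needs line-level repair rather than isolated spare cells.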

8 Our Goal
● Enable DVS to push the core's Vdd down into the ultra-low-voltage region (< 650 mV) while preserving correct functionality of the on-chip caches
● Propose a highly flexible, fault-tolerant cache architecture that can efficiently tolerate these SRAM failures
● Minimize our overheads in high-power mode

9 Archipelago (AP)
Example cache of 8 lines, each divided into data chunks. Conventionally, this particular cache has only a single functional line; by forming autonomous islands (each with a sacrificial line), AP saves 6 out of the 8 lines.

10 Baseline AP Architecture
Added modules:
● Memory map (10T cells)
● Fault map (10T cells)
● MUXing layer
Two types of lines:
● Data lines
● Sacrificial lines
Two lines collide if they have at least one faulty chunk in the same position (in the figure, the blue and orange lines are collision-free). There must be no collision between lines within a group; group G3 contains the green, blue, and orange lines.
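The collision rule is easy to state in code. A minimal sketch (chunk-granularity fault maps are from the slide; representing them as Boolean lists is our choice):

```python
def collides(fault_map_a: list, fault_map_b: list) -> bool:
    """Two lines collide if at least one chunk position is faulty in both."""
    return any(a and b for a, b in zip(fault_map_a, fault_map_b))

# 1 marks a faulty chunk. These two lines can share a group, since the
# faulty chunk of each can be served by a fault-free chunk of the other:
blue   = [1, 0, 0, 0]
orange = [0, 0, 1, 0]
print(collides(blue, orange))
```

Collision-freedom is exactly what lets one sacrificial line donate its good chunks to every data line in its group.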

11 AP with Relaxed Group Formation
Sacrificial lines do not contribute to the effective capacity, so we want to minimize the total number of groups.

12 Semi-Sacrificial Lines
A semi-sacrificial line guarantees parallel access by lending and reclaiming chunks through the MUXing layer. In contrast to a sacrificial line, it also contributes to the effective cache capacity.

13 AP with Semi-Sacrificial Lines

14 AP Configuration
We model the problem as a graph:
● Each node is a line of the cache
● An edge connects two nodes when there is no collision between them
A collision-free group then forms a clique, so group formation reduces to finding cliques. To maximize the number of functional lines, we need to minimize the number of groups: this is minimum clique cover (MCC).
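Minimum clique cover is NP-hard in general, so a practical configuration step would use a heuristic. A greedy first-fit sketch (our illustration, not necessarily the paper's algorithm): place each line in the first group whose members it does not collide with, opening a new group when none fits.

```python
def collides(a, b):
    """Two lines collide if some chunk position is faulty in both."""
    return any(x and y for x, y in zip(a, b))

def form_groups(fault_maps):
    """Greedy first-fit clique cover: each returned group is a list of
    mutually collision-free line indices (a clique in the compatibility
    graph built from the fault maps)."""
    groups = []
    for i, fmap in enumerate(fault_maps):
        for group in groups:
            if all(not collides(fmap, fault_maps[j]) for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

maps = [[1, 0], [0, 1], [1, 0]]
print(form_groups(maps))   # lines 0 and 1 are compatible; line 2 opens a new group
```

Fewer groups means fewer sacrificial lines, and therefore more effective cache capacity, which is exactly the MCC objective stated above.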

15 AP Configuration Example
The figure shows two banks of lines partitioned into two islands (groups G1 and G2), each with its own sacrificial line; one line that fits in no collision-free group is disabled.

16 Operation Modes
High-power mode (AP is turned off):
● There are no non-functional lines in this case
● Clock gating reduces the dynamic power of the added SRAM structures
Low-power mode:
● During boot, BIST scans the cache for potentially faulty cells, then groups are formed and the hardware is configured
● The processor can later switch back to high-power mode

17 Minimum Achievable Vdd

18 Performance Loss
One extra cycle of latency for L1 and two extra cycles for L2.

19 Comparison with Alternative Methods
Conventional and recently proposed schemes:
● 10T: [Verma, ISSCC'08]
● ZC: [Ansari, MICRO'09]
● BF: [Wilkerson, ISCA'08]

20 Archipelago: Summary
● DVS is widely used to deal with high power dissipation, but the minimum achievable voltage is bounded by SRAM structures
● We proposed a highly flexible cache architecture to tolerate failures when operating in the near-threshold region
● Using our approach, the Vdd of the processor can be reduced to 375 mV, saving 79% dynamic power and 51% leakage power with less than 10% area and performance overheads

21 Outline
Archipelago [HPCA'11] protects on-chip caches against:
● Near-threshold failures
● Process variation
● Wearout and defects
Necromancer [ISCA'10, IEEE Micro'10] protects the general core area (non-cache parts) against:
● Manufacturing defects
● Wearout failures

22 Necromancer (NM)
● There are suitable techniques for protecting caches, but to maintain an acceptable level of yield, the processing cores also need protection, which is more challenging due to their inherent irregularity
● Given a CMP system, Necromancer utilizes a dead core (i.e., a core with a hard-fault) to do useful work, enhancing system throughput

23 Impact of Hard-Faults on Program Execution
Distribution of injected hard-faults that manifest as architectural state mismatches across different latencies, based on the number of committed instructions before the mismatch occurs when starting from a valid architectural state. More than 40% of the injected faults cause an almost immediate (fewer than 10K instructions) architectural state mismatch; thus, a faulty core cannot be trusted to provide correct functionality even for short periods of program execution.

24 Relaxing the Absolute Correctness Constraint
Distribution of injected faults resulting in a similarity-index mismatch across different latencies. Similarity index (SI): the percentage of PCs matching between the faulty and golden executions, sampled at 1K-instruction intervals. For an SI threshold of 90%, in more than 85% of cases the dead core can successfully commit at least 100K instructions before its execution differs by more than 10%.
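The similarity index itself is straightforward to compute offline. A sketch (the 1K-instruction sampling is from the slide; the list representation and 100% default for empty traces are our choices):

```python
def similarity_index(golden_pcs, faulty_pcs):
    """Percentage of sampled PCs (one sample per 1K committed instructions)
    that match between the golden and faulty executions."""
    if not golden_pcs:
        return 100.0
    matches = sum(g == f for g, f in zip(golden_pcs, faulty_pcs))
    return 100.0 * matches / len(golden_pcs)

# 3 of 4 samples agree: SI of 75%, which would fall below a 90% threshold
print(similarity_index([0x400, 0x440, 0x480, 0x4C0],
                       [0x400, 0x440, 0x500, 0x4C0]))
```

Comparing sampled PCs rather than full architectural state is what makes "coarsely correct" execution cheap to measure.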

25 Using the Undead Core to Generate Hints
The execution behavior of a dead core coarsely matches the intact program execution over long time periods. How can we exploit the program execution on the dead core? By accelerating the execution of another core: we extract useful information from the execution of the program on the dead (undead) core and send this information, as hints, to another core (the animator core) running the same program.

26 Opportunities for Acceleration
Perfect hints mean perfect branch prediction and no L1 cache misses. Comparing the IPC of several Alpha cores of increasing complexity and resources, normalized to the EV4's IPC: in most cases, given perfect hints, the simpler cores (EV4, EV5, and an OoO EV4) can achieve performance comparable to that of a 6-issue OoO EV6.

27 Necromancer Architecture
A robust heterogeneous core-coupling execution technique. The undead core and the animator core share the L2 cache; a single queue carries hints and cache fingerprints from the undead core to the animator core, while resynchronization signals and hint-disabling information flow back.
The undead core:
● Executes the same program to provide hints for the animator core, working as an external run-ahead engine for it (a 6-issue OoO EV6 in our evaluation)
● I$ hints: PCs of committed instructions
● D$ hints: addresses of committed loads/stores
● Branch prediction hints: BP updates
● D$ dirty lines are dropped when they need to be replaced
● It can proceed past data L2 misses
The animator core:
● An older-generation core with the same ISA and fewer resources (a 2-issue OoO EV4 in our evaluation)
● Handles exceptions in the NM-coupled cores
● Treats cache hints as prefetching information
● Uses a fuzzy hint-disabling approach based on continuous monitoring of hint effectiveness
● Uses the PC and architectural registers for resynchronization
No communication is needed for L2 warm-up, since the L2 is shared; most communication runs from the undead core to the animator core.

28 Example: Branch Prediction Hints
Each hint carries (PC, NPC, type, age) and is held in a buffer on the animator core. A hint is released to the predictor when its age tag ≤ the number of committed instructions plus the BP release-window size. The animator core runs a tournament predictor that chooses, per PC, between its original BP and the NM (hint-based) predictor. For hint disabling, the outcomes of the two predictors are monitored: when the hints repeatedly underperform the original predictor and a counter exceeds a threshold, the hints are disabled.
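One plausible reading of the disabling rule (the counter mechanics below are our reconstruction of the slide's garbled outcome table): count the predictions where the animator core's own predictor was right but the hint-based NM predictor was wrong, and disable hints once the count crosses the threshold.

```python
class HintDisabler:
    """Fuzzy hint disabling: stop using NM branch hints once they
    underperform the animator core's own predictor too often."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counter = 0
        self.hints_enabled = True

    def observe(self, original_correct: bool, nm_correct: bool) -> None:
        # Penalize only the case where the hint made things worse:
        # the original predictor was right and the NM predictor was wrong.
        if original_correct and not nm_correct:
            self.counter += 1
        if self.counter > self.threshold:
            self.hints_enabled = False

hd = HintDisabler(threshold=2)
for _ in range(3):
    hd.observe(original_correct=True, nm_correct=False)
print(hd.hints_enabled)   # hints are off once the counter exceeds the threshold
```

The "fuzzy" aspect on the slide is this tolerance for occasional bad hints: a single mispredicting hint does not disable the stream, only a sustained pattern does.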

29 NM Design for CMP Systems

30 Impact of Hard-Fault Location
Fault sites evaluated include the program counter, the instruction fetch queue, and an integer ALU.

31 Overheads

32 Performance Gain
Key values highlighted in the chart: 88% and 71%.

33 Necromancer: Summary
● Enhances system throughput by exploiting dead cores
● Necromancer leverages a set of microarchitectural techniques to provide intrinsically robust hints, fine- and coarse-grained hint disabling, online monitoring of hint effectiveness, and dynamic state resynchronization between cores
● Applied to a 4-core CMP, on average 88% of the original performance of the undead core can be retrieved, with modest area and power overheads of 5.3% and 8.5%

34 Takeaways
● Mission-critical and conventional reliability solutions are too expensive for modern high-performance processors
● Efficient, reliable solutions require runtime adaptability, a high degree of re-configurability, and fine-grained spare substitution
● AP: low-cost cache protection against the major reliability threats in nanometer technologies
● For the processing core, redundancy is too costly; NM offers an alternative that utilizes dead cores

35 Thank You

