UW-Madison Computer Sciences Vertical Research Group© 2010 Relax: An Architectural Framework for Software Recovery of Hardware Faults Marc de Kruijf Shuou.


2 Relax: An Architectural Framework for Software Recovery of Hardware Faults
Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaralingam
UW-Madison Computer Sciences, Vertical Research Group © 2010

3 Executive Summary (ISCA 2010)
- Problem
  - Technology is driving simple hardware
  - Fault recovery requires complex hardware
- Software Recovery
  - Enables simple hardware
  - High energy efficiency
- Relax: An Architectural Framework for Software Recovery
  - ISA: a well-defined interface for software recovery
  - Software: support to use the ISA
  - Hardware: support to implement the ISA

4 Architecture Trend
- Energy efficiency
- Hardware simplification

5 Applications Trend
- Data-intensive, error-tolerant applications
- Examples: search, computer vision, data mining, media processing, scientific computing, …
Architecture Trend
- Energy efficiency
- Hardware simplification

6 CMOS Trend
- Device variability, wear-out, soft errors
[Figure: circuit diagram labeled Vdd, In, Out]
Applications Trend
- Data-intensive, error-tolerant applications
- Examples: search, computer vision, data mining, media processing, scientific computing, …
Architecture Trend
- Energy efficiency
- Hardware simplification

7 CMOS Trend
- Device variability, wear-out, soft errors
- Recovery support is needed
Applications Trend
- Data-intensive, error-tolerant applications
- Examples: search, computer vision, data mining, media processing, scientific computing, …
Architecture Trend
- Energy efficiency
- Hardware simplification
Hardware Recovery
- Inefficient
- No flexibility
- Checkpoints conservative
- Complex hardware, speculative state
Software Recovery
- Efficient
- Error tolerance
- Natural recovery points
- Simple hardware, no speculative state

8 Relax: Software Recovery, Hardware Detection, ISA

9 Relax: ISA, Software, Hardware

10 ISA
- Software defines the recovery handler
- Hardware detects faults, jumps to the handler on a fault, and is allowed to commit corrupted state*
    rlx RECOVER
    ...
  RECOVER:
    ...
Keywords: simple hardware, application error tolerance, software-defined recovery, simplicity, energy efficiency, flexibility
* Details in paper
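
The handler mechanism on this slide can be emulated in plain C as a rough sketch. This is not the Relax ISA or the authors' implementation: `setjmp`/`longjmp` stands in for the recovery PC, and `detect`, `faults_pending`, and `relaxed_add` are hypothetical names introduced only for illustration. The handler here applies a retry policy, which is safe because the region never overwrites its inputs.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical software stand-in for the recovery PC register. */
static jmp_buf recover_pc;

/* Hypothetical fault injector: number of faults still to be "detected". */
static int faults_pending;

/* Stand-in for hardware detection: on a fault, jump to the handler. */
static void detect(void) {
    if (faults_pending > 0) {
        faults_pending--;
        longjmp(recover_pc, 1);
    }
}

/* Adds a and b inside an emulated relax region with a retry handler.
   Stores the number of attempts in *attempts_out. */
int relaxed_add(int a, int b, int faults_to_inject, int *attempts_out) {
    volatile int attempts = 0; /* volatile: modified between setjmp and longjmp */
    assert(faults_to_inject >= 0);
    faults_pending = faults_to_inject;

    if (setjmp(recover_pc) != 0) {
        /* RECOVER: entered after a detected fault.
           Retry policy: fall through and re-execute the region. */
    }

    attempts++;
    int result = a + b; /* region body: reads its inputs, never overwrites them */
    detect();           /* may transfer control to RECOVER */

    *attempts_out = attempts;
    return result;
}
```

Calling `relaxed_add(2, 3, 2, &n)` injects two emulated faults, so the region body runs three times before its result is returned.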

11 ISA, Software, Hardware

12 Software
SAD (Sum of Absolute Differences) Example (adapted from an H.264 video encoder)
    int sad(int *left, int *right, int len) {
      int sum = 0;
      for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
      }
      return sum;
    }

13 Software
SAD (Sum of Absolute Differences) Example (adapted from an H.264 video encoder)
    int sad(int *left, int *right, int len) {
      int sum = 0;
      for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
      }
      return sum;
    }
[Figure labels: raw, encoded]
1. No writes to memory
2. Idempotent
3. Recoverable by re-execution
SIMPLE + INTUITIVE + FLEXIBLE
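
Because the SAD kernel writes no memory and is idempotent, recovery by re-execution can be sketched as a plain retry loop. The code below is an illustration under assumptions, not the Relax language support: `fault_check_fn`, `sad_retry`, and the `demo_fault_check` stub are hypothetical names, and a real system would rely on hardware detection rather than a callback.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical detection hook: returns nonzero if a fault hit the region. */
typedef int (*fault_check_fn)(void);

/* Plain SAD kernel: reads both arrays, writes only local state. */
int sad(const int *left, const int *right, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
    }
    return sum;
}

/* Retry recovery: re-run until an execution completes with no fault. */
int sad_retry(const int *left, const int *right, int len,
              fault_check_fn fault_detected) {
    assert(len >= 0 && fault_detected != NULL);
    for (;;) {
        int sum = sad(left, right, len);
        if (!fault_detected()) {
            return sum;
        }
        /* Fault: sum may be corrupt, so drop it and re-execute. */
    }
}

/* Demo stub (assumption): reports a fault on the first two checks only. */
int demo_faults_left = 2;
int demo_fault_check(void) {
    if (demo_faults_left > 0) {
        demo_faults_left--;
        return 1;
    }
    return 0;
}
```

With `left = {1, 5, 9}` and `right = {4, 5, 2}`, `sad_retry(left, right, 3, demo_fault_check)` discards two attempts and returns the same value as a fault-free call to `sad`. Re-execution is only safe because the inputs are never modified.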

14 ISA, Hardware, Software

15 Hardware
- Microarchitecture
  1. Fine-grained hardware detection (e.g. Argus)
  2. Recovery PC register + control logic
SIMPLE MICROARCHITECTURE

16 Hardware Organization
- Homogeneous Relax: all cores with no hardware recovery support
- Statically Heterogeneous Relax: some cores with hardware recovery, some without
- Dynamically Heterogeneous Relax: hardware recovery adaptively disabled
Legend: “Relaxed” cores (no hardware recovery); normal cores (hardware recovery)
FLEXIBLE DESIGN

17 ISA, Software, Hardware, Evaluation

18 Evaluation
- Is it useful?
- How useful is it?

19 Is it Useful?
Application Name        Percent Execution Time Contribution of Function
BarnesHut (Lonestar)    >99.9%
bodytrack (PARSEC)      21.9%
canneal (PARSEC)        89.4%
ferret (PARSEC)         15.7%
kmeans (MineBench)      83.3%
raytrace (PARSEC)       49.4%
x264 (PARSEC)           49.2%
- 7 applications
- Language support using LLVM
- One relax region per application (most dominant function)
- Retry and discard behavior
IT WORKS!

20 How Useful Is It?
- Software recovery for timing speculation

21 Methodology
- Instruction-level fault injection
- Execution time model
- Statically heterogeneous architecture
- Energy model
- Energy-delay product (EDP)
- Analytical model for hardware efficiency

22 Results: Execution Time
- Execution time overhead is less than 10%; 1% is typical
- Discard performance is comparable to retry
* Error rates range from 10^-3 to 10^-6 errors/cycle

23 Results: Energy-delay
- Relax achieves energy improvements for timing speculation
* Error rates range from 10^-3 to 10^-6 errors/cycle

24 Future Work
- Better software support
  - Compiler automation?
  - Binary instrumentation?
  - Nesting relax blocks?
- Hardware support
  - What are the chip-level area and power savings?
  - Is Relax hardware truly simpler?
- Other domains
  - Software rollback for hardware transactional memory?
- Tools to assist analysis of “discard”
  - Discard is hard to reason about; non-deterministic

25 Summary
- Emerging Architectures
  - Many-core architectures are simple
  - Hardware fault recovery is complex
- Emerging Applications
  - Error tolerant
  - Large idempotent regions
- Software Recovery is a natural fit
- Relax: an architectural framework for software recovery
  - ISA: an interface to define it
  - Software: support for applications to use it
  - Hardware: hardware that enables it

26 ?

27 ISA Semantics
- Errors must be “spatially contained” to the target resources of a relax block
  - Misdirected stores and register writes are not recoverable by Relax!
- Errors must be “temporally contained” to the scope of a relax block
  - ECC (or other technique) necessary for memory
  - Cache coherence, cache writeback, etc. require other mechanisms
- Control flow must be “legal” (follow static control flow edges)
  - Includes hardware exceptions (must wait on detection before trap)
- Atomic operations (e.g. atomic increment) are problematic
  - Not supported (sorry)

28 Fault Detection
- Short latencies are important for
  - Detecting misdirected stores
  - Detecting misdirected register writes
- Otherwise, latencies depend on region sizes
  - 50-cycle regions + 5-cycle latency = 10% overhead
  - Average region size in paper = 1000 cycles
  - Then, 10-cycle latency = 1% overhead
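
The overhead figures on this slide are consistent with a simple ratio of detection latency to region length. The helper below is a sketch of that arithmetic only, assuming each region pays the detection latency once before it can finish; `detection_overhead_pct` is a hypothetical name, not something from the paper.

```c
#include <assert.h>

/* Overhead (in percent) of waiting out the detection latency once per
   relax region, relative to the region's own length in cycles. */
double detection_overhead_pct(double region_cycles, double latency_cycles) {
    assert(region_cycles > 0.0 && latency_cycles >= 0.0);
    return 100.0 * latency_cycles / region_cycles;
}
```

Plugging in the slide's numbers, a 5-cycle latency on 50-cycle regions gives 10%, and a 10-cycle latency on 1000-cycle regions gives 1%.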

29 “Optimal” Error Rate
[Figure: axes labeled Error rate, EDP, and Time; curves labeled Hardware Efficiency, Execution Time, and Overall Efficiency; optimum marked]

