
University of Michigan Electrical Engineering and Computer Science

Slide 1: Encore: Low-Cost, Fine-Grained Transient Fault Recovery
Authors: Shuguang Feng*, Shantanu Gupta, Amin Ansari, Scott Mahlke, David August
University of Michigan
*Currently with Northrop Grumman, Information Systems Sector

Slide 2: …Many Ways to Fail
- Failure sources: negative bias temperature instability (NBTI), oxide breakdown, electromigration, packaging impurities, cosmic radiation, and PVT (process, voltage, temperature) variation [Gupta'09]; near-threshold (NTC) computing [Dreslinski'10]
- "Failure to prepare is preparing to fail…" (Benjamin Franklin)
- The distinction between transient and permanent faults is becoming blurred. Figure: a continuum from transient ("soft") faults to permanent ("hard") faults, with occurrence labeled Rare, Continuous, and Periodic.
- Many permanent faults, particularly wearout-induced faults, initially manifest as timing errors.

Slide 3: The Future of Soft Errors
- Past: one failure per month per 100 chips
- Present: one failure per day per 100 chips
- Future, with aggressive voltage scaling (near-threshold computing): one failure per day per chip
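
To give a sense of scale, the slide's three rates can be restated in FIT (failures per 10^9 device-hours), a common unit for soft-error rates. The sketch below is only arithmetic on the slide's order-of-magnitude figures; it assumes a 30-day month, and the helper name `fit` is ours.

```python
HOURS_PER_DAY = 24
HOURS_PER_MONTH = 30 * HOURS_PER_DAY  # assumption: a 30-day month


def fit(failures: float, chips: int, hours: float) -> float:
    """Failure rate in FIT (failures per 10^9 device-hours) for one chip."""
    return failures / (chips * hours) * 1e9


past = fit(1, 100, HOURS_PER_MONTH)   # one failure per month per 100 chips
present = fit(1, 100, HOURS_PER_DAY)  # one failure per day per 100 chips
future = fit(1, 1, HOURS_PER_DAY)     # one failure per day per chip (near-threshold)

for name, rate in [("past", past), ("present", present), ("future", future)]:
    print(f"{name:>7}: {rate:,.0f} FIT")
print(f"present/past = {present / past:.0f}x, future/present = {future / present:.0f}x")
```

Under these assumptions the per-chip rate rises from roughly 13,889 FIT to 416,667 FIT and then to about 41.7 million FIT. That is a 30x step followed by a 100x step.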

Slide 4: Realizing a Reliability "Pipeline"
- Recent interest in low-cost fault detection: ReStore [DSN'05], SWAT [ASPLOS'08], Shoestring [ASPLOS'10]
  - Not perfect… but very low-cost
- Generally involves some form of rollback/re-execution:
  1) Identify the fault site
  2) Restore the processor to its pre-fault state, before 1)
  3) Resume execution from 1)
- Many low-cost detection techniques rely on hardware speculation support
- Commodity systems present both challenges and opportunities:
  - Challenge: hardware speculation support, if it exists, is limited
  - Challenge: cannot afford expensive, heavyweight software checkpointing
  - Opportunity: typically not running mission-critical applications
- Sacrifice a small degree of reliability: exploit (probabilistic) idempotence in program execution

Slide 5: The Role of Idempotence
- Mathematical definition: an operation that can be applied multiple times without changing the result
- Computer science definition: a region of code without any exposed write-after-read (WAR, anti-) dependencies
- Figure: a non-idempotent region ("… = X" followed by "X++") and an idempotent region ("X = …" followed by "… = X")
- Idempotent code regions can be safely re-executed without additional checkpointing
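
The computer-science definition can be checked mechanically on a straight-line region. The minimal sketch below is an illustration with hypothetical helper names. It models a region as an ordered list of reads and writes to named locations and ignores aliasing and control flow. It flags every location that is written after an exposed read, meaning a read that observes a value produced before the region started. That pattern is exactly an exposed WAR dependence.

```python
from typing import Iterable, Literal, Set, Tuple

Access = Tuple[Literal["read", "write"], str]


def offending_writes(region: Iterable[Access]) -> Set[str]:
    """Locations written after an exposed read within a straight-line region."""
    written: Set[str] = set()    # locations already (re)defined inside the region
    exposed: Set[str] = set()    # locations read before any write in the region
    offending: Set[str] = set()  # exposed locations that the region later overwrites
    for kind, loc in region:
        if kind == "read":
            if loc not in written:
                exposed.add(loc)
        else:  # a write clobbers a region input if that location was exposed
            if loc in exposed:
                offending.add(loc)
            written.add(loc)
    return offending


def is_idempotent(region) -> bool:
    return not offending_writes(region)


# "... = X" then "X++": X is read, then read and written again, so a WAR on X.
non_idem = [("read", "X"), ("read", "X"), ("write", "X")]
# "X = ..." then "... = X": X is overwritten before it is read.
idem = [("write", "X"), ("read", "X")]
print(is_idempotent(non_idem), is_idempotent(idem))  # False True
```

Re-running `non_idem` after it has completed would read the incremented X, whereas re-running `idem` always recomputes X before using it.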

Slide 6: Does Idempotence Exist?
- Selectively checkpointing a *few* offending stores

Slide 7: Challenges to Exploiting Idempotence
- Must identify where to resume execution:
  1) Control flow
  2) Rollback distance
- Statically identifying the optimal rollback distance is inherently intractable
  - ↑ rollback distance → ↑ Pr(recoverable)
  - ↓ rollback distance → ↑ Pr(idempotent)
- Simplifying engineering solution based on single-entry, multiple-exit (SEME) regions
- Figure: an execution path through basic blocks bb1 through bb7 (and bb'), with fault sites marked X

Slide 8: Encore Vision
- Flow: source code → code partitioning (CFG-based) → idempotence analysis (per region) → instrumentation (per region)
- Idempotence analysis classifies each region as idempotent or non-idempotent; a region containing "… = X" followed by "X++" is non-idempotent
- Instrumentation adds a checkpoint of X (Chkpt X) and a recovery block
- Runtime behavior (post-fault): fault detected → redirect control → restore state

Slide 9: Identifying Idempotence (High-level)
- With respect to a point, p, in the CFG:
  - Reachable Stores (RS): stores that may execute after p
  - Guarded Addresses (GA): addresses guaranteed to be overwritten before reaching p
  - Exposed Addresses (EA): addresses that may be referenced by an unguarded load prior to p
  - Idempotent iff EA ∩ RS = Ø
- Additional details:
  1) Applies to both memory and registers (static, conservative alias analysis)
  2) Scalable hierarchical analysis (handles cyclic code)
- Figure: an example CFG with blocks bb1 through bb8, shown both flat and as a region hierarchy
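
One way to apply the EA ∩ RS = Ø test is a forward dataflow pass. In this pass, GA is a must-set, intersected over predecessors, and EA is a may-set, unioned over predecessors. Any store whose address is already in EA at that point is an offending store. The sketch below is a simplification, not Encore's analysis. It assumes an acyclic region and exact address names with no aliasing, whereas the slide's analysis uses conservative alias analysis and a hierarchical scheme for cyclic code. The function name and the example CFG are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str]  # ("load" | "store", address)


def offending_stores(blocks: Dict[str, List[Instr]],
                     succs: Dict[str, List[str]],
                     entry: str) -> List[Tuple[str, int, str]]:
    """Return (block, index, address) for each store whose address is exposed."""
    preds: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    # Visit blocks so that every predecessor is processed first (acyclic region).
    order = TopologicalSorter({b: preds[b] for b in blocks}).static_order()

    ga_out: Dict[str, Set[str]] = {}
    ea_out: Dict[str, Set[str]] = {}
    found: List[Tuple[str, int, str]] = []
    for b in order:
        if b == entry or not preds[b]:
            ga, ea = set(), set()
        else:
            ga = set.intersection(*(ga_out[p] for p in preds[b]))  # guarded on all paths
            ea = set.union(*(ea_out[p] for p in preds[b]))         # exposed on some path
        for i, (op, addr) in enumerate(blocks[b]):
            if op == "load" and addr not in ga:
                ea.add(addr)
            elif op == "store":
                if addr in ea:  # this reachable store overwrites an exposed address
                    found.append((b, i, addr))
                ga.add(addr)
        ga_out[b], ea_out[b] = ga, ea
    return found


# Hypothetical diamond-shaped region (not the CFG from the slides):
#   bb1 -> bb2, bb3;  bb2 -> bb4;  bb3 -> bb4
blocks = {
    "bb1": [("store", "A"), ("load", "B")],
    "bb2": [("store", "C")],
    "bb3": [("load", "C")],
    "bb4": [("load", "A"), ("store", "C"), ("store", "B")],
}
succs = {"bb1": ["bb2", "bb3"], "bb2": ["bb4"], "bb3": ["bb4"], "bb4": []}
print(offending_stores(blocks, succs, "bb1"))
```

Here the load of A in bb4 is guarded because A is stored in bb1 on every path. C is exposed along the bb3 path and B is exposed in bb1, so the stores to C and B in bb4 are the only ones that would need checkpointing.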

Slide 10: Code Instrumentation
- Figure: an example region (bb1 through bb8) containing the instructions 1: Store A, 2: Store B, 3: Store C, 4: Load A, 5: Store C, 6: Load B, 7: Load B, 8: Load C, 9: Store A, 10: Store B, 11: Load C, 12: Store C
- Live-in checkpointing (bb0, at region entry): Save R1, Save R2, …, Save Rn
- "On-demand" checkpointing (before an offending store): MemCopy B, Save Address[B]
- Recovery code (bb r), entered upon fault detection: *Restore B, Restore R1, Restore R2, …, Restore Rn
- Encore heuristics:
  1) Selectively prune dynamically-dead code: ↓ offending stores → ↑ Pr(idempotent)
  2) Selectively fuse adjacent regions: ↑ region size → ↑ Pr(recoverable)
  3) Selectively instrument profitable regions
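
The instrumentation scheme can be simulated end to end. The region saves its live-in registers on entry and logs B's old value before the offending store. When a fault is detected, the recovery code undoes the logged stores and restores the registers before the region is re-executed. The sketch below uses hypothetical names (`RegionCheckpoint`, `region`) and models registers and memory as Python dicts. It is an illustration of the idea, not Encore's generated code.

```python
from typing import Dict, List, Tuple


class RegionCheckpoint:
    """Hypothetical per-region checkpoint state (an illustration, not Encore's runtime)."""

    def __init__(self) -> None:
        self.live_in: Dict[str, int] = {}
        self.mem_log: List[Tuple[str, int]] = []  # (address, old value) pairs

    def enter_region(self, regs: Dict[str, int], live_in_regs: List[str]) -> None:
        # Live-in checkpointing: save the registers the region reads on entry.
        self.live_in = {r: regs[r] for r in live_in_regs}
        self.mem_log.clear()

    def before_store(self, mem: Dict[str, int], addr: str) -> None:
        # On-demand checkpointing: log the old value only before an offending store.
        self.mem_log.append((addr, mem[addr]))

    def recover(self, regs: Dict[str, int], mem: Dict[str, int]) -> None:
        # Recovery code: undo logged stores newest-first, then restore live-in registers.
        for addr, old in reversed(self.mem_log):
            mem[addr] = old
        self.mem_log.clear()
        regs.update(self.live_in)


def region(regs, mem, ckpt, inject_fault=False) -> bool:
    """Computes B = B + R1; R2 = 2 * B. Returns False if a (simulated) fault is detected."""
    ckpt.enter_region(regs, ["R1"])
    b = mem["B"]                      # exposed load of B ...
    ckpt.before_store(mem, "B")       # ... so the store to B is offending: checkpoint it
    mem["B"] = b + regs["R1"]
    regs["R2"] = 2 * mem["B"]
    if inject_fault:
        regs["R2"] ^= 1               # simulated transient bit flip
        return False                  # assume the detector fires before the region exits
    return True


regs, mem = {"R1": 5, "R2": 0}, {"B": 10}
ckpt = RegionCheckpoint()
if not region(regs, mem, ckpt, inject_fault=True):
    ckpt.recover(regs, mem)           # redirect control to recovery, restore state
    region(regs, mem, ckpt)           # re-execute the region from its entry
print(mem["B"], regs["R2"])           # 15 30
```

Without the logged value of B, re-executing the region after the fault would read the already-updated B = 15 and produce B = 20. The single memory checkpoint is what makes this non-idempotent region safely re-executable.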

Slide 11: Lightweight Checkpointing
- Figure: a call stack holding input parameters, return address, local variables, live-in registers, and checkpoint pairs (data_0, addr_0) through (data_N, addr_N); labels distinguish the traditional call stack from the Encore extensions, and the frame pointer and stack pointer are marked
- Cost per checkpoint: 1 reg2mem store, 1 mem2mem copy, 1 stack pointer increment
- The stack grows dynamically to accommodate checkpoint storage
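
The stack-based checkpoint storage can be modeled as a growable list of (addr, data) entries. Because entries are simply pushed, a loop that checkpoints the same address repeatedly just grows the stack. Restoring newest-first then leaves each address holding its oldest saved value, which is its value at region entry. The class below is a hypothetical model. The mapping of the slide's three per-checkpoint operations onto the counters is our reading: the address goes from a register to memory, and the old data is copied memory to memory.

```python
from typing import Dict, List, Tuple


class CheckpointStack:
    """Hypothetical model of checkpoint entries appended to the call stack.

    Each entry is an (addr, data) pair, like the (data_i, addr_i) slots on the
    slide. The op counters follow the slide's per-checkpoint cost, under our
    reading: 1 reg2mem store (the address), 1 mem2mem copy (the old data),
    and 1 stack-pointer increment.
    """

    def __init__(self, memory: Dict[str, int]) -> None:
        self.memory = memory
        self.slots: List[Tuple[str, int]] = []  # grows dynamically, like the stack
        self.ops = {"reg2mem": 0, "mem2mem": 0, "sp_inc": 0}

    def checkpoint(self, addr: str) -> None:
        self.slots.append((addr, self.memory[addr]))
        self.ops["reg2mem"] += 1
        self.ops["mem2mem"] += 1
        self.ops["sp_inc"] += 1

    def restore_all(self) -> None:
        # Pop newest-first so the oldest saved value of each address is applied last.
        while self.slots:
            addr, data = self.slots.pop()
            self.memory[addr] = data


mem = {"X": 1, "Y": 7}
cs = CheckpointStack(mem)
for _ in range(3):       # a loop whose body checkpoints X before each store
    cs.checkpoint("X")
    mem["X"] += 1
cs.checkpoint("Y")
mem["Y"] = 0
print(len(cs.slots), cs.ops)   # 4 {'reg2mem': 4, 'mem2mem': 4, 'sp_inc': 4}
cs.restore_all()
print(mem)                     # {'X': 1, 'Y': 7}
```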

Slide 12: Evaluation Methodology
- Program analysis/instrumentation performed in the LLVM compiler
- In-order, single-issue, embedded-class processor
- Dynamic instruction model based on profiled execution
- Reliability coverage:
  - Analytical model in lieu of traditional fault injection
  - Decouples evaluation from microarchitectural details

Slide 13: Inherent Idempotence
- Figure legend: 0% (dynamically-dead), <5%, <10%
- 76% of application code is naturally idempotent

Slide 14: Dynamic Execution Breakdown
- Impact of detection latency: if control has left the region containing the original fault site, re-execution cannot correct the error
- 91% of execution time is spent within recoverable regions
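
The detection-latency argument can be made concrete with a toy model. A fault at dynamic instruction k is recoverable only if the detector fires, `latency` instructions later, while control is still inside the same dynamic region instance. The function below is a hypothetical illustration, not the analytical model used in the evaluation. The trace of region lengths is made up.

```python
from typing import Sequence


def recoverable_fraction(instances: Sequence[int], latency: int) -> float:
    """Fraction of fault sites whose detection lands in the same region instance.

    `instances[k]` is the dynamic region-instance id executing at instruction k.
    Each entry into a region gets a fresh id, so ids form contiguous runs. Fault
    sites whose detection point would fall past the end of the trace are ignored.
    """
    sites = range(len(instances) - latency)
    if not sites:
        return 0.0
    ok = sum(1 for k in sites if instances[k] == instances[k + latency])
    return ok / len(sites)


# Made-up trace: three region instances of 50, 200, and 50 dynamic instructions.
trace = [0] * 50 + [1] * 200 + [2] * 50
for lat in (10, 100):
    print(lat, round(recoverable_fraction(trace, lat), 3))
```

In this toy trace, a 10-instruction latency leaves 270 of 290 fault sites recoverable (about 0.931). At 100 instructions the fraction drops to 0.5, because faults near the end of each region are detected only after control has moved on.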

Slide 15: Full System "Coverage"
- Detection latency scenarios: existing (~100 instructions), future (~10 instructions), future (~1000 instructions)
- 93% to 99.99% coverage, highly application-dependent

Slide 16: Overheads
- 3% to 22% performance degradation

Slide 17: Summary
- Large portions of applications, across domains, are (probabilistically) idempotent
- Encore is a software-only solution that exploits this property to provide low-cost fault recovery
- On average, 97% of faults are recoverable with current detection schemes, at a 15% performance penalty
- Implementing Encore in a runtime system or virtual machine has the potential to yield even better results:
  - Larger dynamic traces vs. static intervals
  - Dynamic vs. static memory analysis

Slide 18: Questions?