UW-Madison Computer Sciences Vertical Research Group© 2010 Relax: An Architectural Framework for Software Recovery of Hardware Faults Marc de Kruijf Shuou.

Presentation transcript:

Relax: An Architectural Framework for Software Recovery of Hardware Faults
Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaralingam
UW-Madison Computer Sciences, Vertical Research Group © 2010

ISCA Executive Summary
• Problem
  - Technology is driving simple hardware
  - Fault recovery requires complex hardware
• Software Recovery
  - Enables simple hardware
  - High energy efficiency
• Relax: An Architectural Framework for Software Recovery
  - ISA: a well-defined interface for software recovery
  - Software: support to use the ISA
  - Hardware: support to implement the ISA

Trends
• Architecture Trend: energy efficiency, hardware simplification
  - Simple hardware: no speculative state
  - Complex hardware: speculative state
• Applications Trend: data-intensive, error-tolerant applications
  - Search, Computer Vision, Data Mining, Media Processing, Scientific Computing, …
• CMOS Trend: device variability, wear-out, soft errors
  - [Figure: CMOS inverter with Vdd, In, and Out labeled]

Recovery Support Is Needed
• Hardware Recovery
  - Inefficient
  - No flexibility
  - Checkpoints conservative
• Software Recovery
  - Efficient
  - Error tolerance
  - Natural recovery points

Relax
• Software Recovery
• Hardware Detection
• ISA

ISA, Software, Hardware (Relax)

ISA
• Software defines the recovery handler
• Hardware detects faults, jumps to the handler on a fault, and is allowed to commit corrupted state*

    rlx RECOVER
    ...
    RECOVER:
    ...

[Diagram labels: application error tolerance, software-defined recovery, simplicity, energy efficiency, flexibility]

SIMPLE HARDWARE

* Details in paper

ISA, Software, Hardware

Software
SAD (Sum of Absolute Differences) Example (adapted from an H.264 video encoder)

    #include <stdlib.h>  /* abs */

    int sad(int *left, int *right, int len) {
      int sum = 0;
      for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
      }
      return sum;
    }

Software
SAD (Sum of Absolute Differences) Example (adapted from an H.264 video encoder)

    #include <stdlib.h>  /* abs */

    int sad(int *left, int *right, int len) {
      int sum = 0;
      for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
      }
      return sum;
    }

[Figure labels: raw, encoded]

1. No writes to memory
2. Idempotent
3. Recoverable by re-execution

SIMPLE + INTUITIVE + FLEXIBLE
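The three properties listed for the SAD region can be illustrated with a minimal sketch of retry-style recovery emulated in plain C. This is not the Relax language construct or ISA from the slides: `fault_detected` and `faults_to_inject` are hypothetical stand-ins for hardware fault detection, and the sketch does not model actual data corruption. It only shows why a region that writes no memory and re-initializes its own state can be recovered by re-execution.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for hardware fault detection (an assumption, not
 * part of Relax): reports a fault for the first `faults_to_inject`
 * executions of the region, then reports none. */
static int faults_to_inject = 0;

static bool fault_detected(void) {
    if (faults_to_inject > 0) {
        --faults_to_inject;
        return true;
    }
    return false;
}

/* SAD with retry recovery emulated in software. The region writes only
 * the local `sum`, which is reset at the top of every attempt, so
 * re-executing the region after a fault is safe (idempotent). */
int sad_retry(const int *left, const int *right, int len) {
    int sum;
    do {
        sum = 0;
        for (int i = 0; i < len; ++i) {
            sum += abs(left[i] - right[i]);
        }
    } while (fault_detected());  /* on a fault, throw away sum and retry */
    return sum;
}
```

With `faults_to_inject` set to 2, the loop body runs three times and the returned value is the same as in the fault-free case, since nothing outside the region observed the discarded attempts.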

ISA, Hardware, Software

ISCA  Microarchitecture 1.Fine-grained hardware detection (e.g. Argus) 2.Recovery PC register + control logic Hardware SIMPLE MICROARCHITECTURE

Hardware Organization
• Homogeneous Relax: all cores with no hardware recovery support
• Statically Heterogeneous Relax: some cores with hardware recovery, some cores without
• Dynamically Heterogeneous Relax: hardware recovery adaptively disabled

Legend:
• "Relaxed" cores: no hardware recovery
• Normal cores: hardware recovery

FLEXIBLE DESIGN

ISA, Software, Hardware, Evaluation

Evaluation
• Is it useful?
• How useful is it?

Is it Useful?
• 7 applications
• Language support using LLVM
• One relax region per application (most dominant function)
• Retry and discard behavior

Application Name        Percent Execution Time Contribution of Function
BarnesHut (Lonestar)    >99.9%
bodytrack (PARSEC)      21.9%
canneal (PARSEC)        89.4%
ferret (PARSEC)         15.7%
kmeans (MineBench)      83.3%
raytrace (PARSEC)       49.4%
x264 (PARSEC)           49.2%

IT WORKS!
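For contrast with retry, here is a minimal sketch of one possible reading of discard behavior on the SAD example, again emulated in plain C. The slides do not show the discard code, so the granularity (one region per loop iteration) and the `fault_mask` parameter are assumptions made for illustration: `fault_mask[i]` is a hypothetical input that says whether a fault was detected during iteration `i`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* SAD with discard recovery emulated in software. Each loop iteration is
 * treated as its own region (an assumption). When a fault is flagged for
 * an iteration, its contribution is dropped instead of being recomputed,
 * so the result is approximate. */
int sad_discard(const int *left, const int *right, int len,
                const bool *fault_mask) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        int diff = abs(left[i] - right[i]);  /* value may be corrupted */
        if (fault_mask[i]) {
            continue;  /* discard: skip this contribution, no retry */
        }
        sum += diff;
    }
    return sum;
}
```

The returned value now depends on where faults occur, which is the property the deck later calls hard to reason about: the application has to tolerate a result that differs from the fault-free one.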

How Useful Is It?
• Software recovery for timing speculation

Methodology
• Instruction-level fault injection
• Execution time model
• Statically Heterogeneous Architecture
• Energy model
• Energy-delay product (EDP)
• Analytical model for hardware efficiency

Results – Execution Time
• Execution time overhead is less than 10%; 1% is typical
• Discard performance is comparable to retry

* error rates range from to errors/cycle

Results – Energy-delay
• Relax achieves energy improvements for timing speculation

* error rates range from to errors/cycle

Future Work
• Better software support
  - Compiler automation?
  - Binary instrumentation?
  - Nesting relax blocks?
• Hardware support
  - What are the chip-level area and power savings?
  - Is Relax hardware truly simpler?
• Other domains
  - Software rollback for hardware transactional memory?
• Tools to assist analysis of “discard”
  - Discard is hard to reason about; non-deterministic

Summary
• Emerging Architectures
  - Many-core architectures are simple
  - Hardware fault recovery is complex
• Emerging Applications
  - Error tolerant
  - Large idempotent regions
• Software Recovery is a natural fit
• Relax: an architectural framework for software recovery
  - ISA: an interface to define it
  - Software: support for applications to use it
  - Hardware: hardware that enables it

Questions?

ISA Semantics
• Errors must be “spatially contained” to the target resources of a relax block
  - Misdirected stores and register writes are not recoverable by Relax!
• Errors must be “temporally contained” to the scope of a relax block
  - ECC (or other technique) necessary for memory
  - Cache coherence, cache writeback, etc. require other mechanisms
• Control flow must be “legal” (follow static control flow edges)
  - Includes hardware exceptions (must wait on detection before trap)
• Atomic operations (e.g. atomic increment) are problematic
  - Not supported (sorry)

Fault Detection
• Short latencies important for
  - Detecting misdirected stores
  - Detecting misdirected register writes
• Otherwise, latencies depend on region sizes
  - 50 cycle regions + 5 cycle latency = 10% overhead
  - Average region sizes in paper = 1000 cycles
  - Then, 10 cycle latency = 1% overhead
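The overhead figures above are consistent with a simple ratio: detection latency divided by region size, on the assumption that execution waits out the detection latency once at the end of each region. The sketch below encodes that reading; the function name `detection_overhead` is hypothetical and the model is inferred from the two data points on the slide, not taken from the paper.

```c
/* Fractional execution-time overhead when each region of
 * `region_cycles` cycles must wait `latency_cycles` cycles for fault
 * detection to complete before leaving the region. This model is an
 * assumption inferred from the slide's numbers. */
double detection_overhead(double latency_cycles, double region_cycles) {
    return latency_cycles / region_cycles;
}
```

Under this model, `detection_overhead(5.0, 50.0)` evaluates to 0.10 and `detection_overhead(10.0, 1000.0)` to 0.01, matching the 10% and 1% figures; larger regions amortize the same latency over more useful work.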

“Optimal” Error Rate
[Figure: plot with axes labeled Error rate, EDP, and Time; curves labeled Hardware Efficiency, Execution Time, and Overall Efficiency; the optimum is marked.]