NC STATE UNIVERSITY ASPLOS-XII Understanding Prediction-Based Partial Redundant Threading for Low-Overhead, High-Coverage Fault Tolerance Vimal Reddy Sailashri.

Slides:



Advertisements
Similar presentations
NC STATE UNIVERSITY Transparent Control Independence (TCI) Ahmed S. Al-Zawawi Vimal K. Reddy Eric Rotenberg Haitham H. Akkary* *Dept. of Electrical & Computer.
Advertisements

Topics Left Superscalar machines IA64 / EPIC architecture
UPC Compiler Support for Trace-Level Speculative Multithreaded Architectures Antonio González λ,ф Carlos Molina ψ Jordi Tubella ф INTERACT-9, San Francisco.
NC STATE UNIVERSITY 1 Assertion-Based Microarchitecture Design for Improved Fault Tolerance Vimal K. Reddy Ahmed S. Al-Zawawi, Eric Rotenberg Center for.
IMPACT Second Generation EPIC Architecture Wen-mei Hwu IMPACT Second Generation EPIC Architecture Wen-mei Hwu Department of Electrical and Computer Engineering.
EECS 470 Lecture 8 RS/ROB examples True Physical Registers? Project.
Federation: Repurposing Scalar Cores for Out- of-Order Instruction Issue David Tarjan*, Michael Boyer, and Kevin Skadron* University of Virginia Department.
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors Fred A. Bower, Daniel J. Sorin, and Sule Ozev.
Transient Fault Detection and Recovery via Simultaneous Multithreading Nevroz ŞEN 26/04/2007.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Advanced Computer Architecture Lab University of Michigan MASE Eric Larson MASE: Micro Architectural Simulation Environment Eric Larson, Saugata Chatterjee,
IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults Songjun Pan 1,2, Yu Hu 1, and Xiaowei Li 1 1 Key Laboratory of.
Using Hardware Vulnerability Factors to Enhance AVF Analysis Vilas Sridharan RAS Architecture and Strategy AMD, Inc. International Symposium on Computer.
NC STATE UNIVERSITY DSN 2007 © Vimal Reddy Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy.
CS 7810 Lecture 25 DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design T. Austin Proceedings of MICRO-32 November 1999.
Microarchitectural Approaches to Exceeding the Complexity Barrier © Eric Rotenberg 1 Microarchitectural Approaches to Exceeding the Complexity Barrier.
Hardware Fault Recovery for I/O Intensive Applications Pradeep Ramachandran, Intel Corporation, Siva Kumar Sastry Hari, NVDIA Manlap (Alex) Li, Latham.
Efficient and Flexible Architectural Support for Dynamic Monitoring YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS UIUC.
Glenn Reinman, Brad Calder, Department of Computer Science and Engineering, University of California San Diego and Todd Austin Department of Electrical.
UPC Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures Carlos Molina ψ, ф Jordi Tubella ф Antonio González λ,ф ISHPC-VI,
An Integrated Framework for Dependable Revivable Architectures Using Multi-core Processors Weiding Shi, Hsien-Hsin S. Lee, Laura Falk, and Mrinmoy Ghosh.
Trace Caches J. Nelson Amaral. Difficulties to Instruction Fetching Where to fetch the next instruction from? – Use branch prediction Sometimes there.
Cost-Efficient Soft Error Protection for Embedded Microprocessors
Slipstream Processors by Pujan Joshi1 Pujan Joshi May 6 th, 2008 Slipstream Processors Improving both Performance and Fault Tolerance.
CS 7810 Lecture 21 Threaded Multiple Path Execution S. Wallace, B. Calder, D. Tullsen Proceedings of ISCA-25 June 1998.
1 Multi-Level Error Detection Scheme based on Conditional DIVA-Style Verification Kevin Lacker and Huifang Qin CS252 Project Presentation 12/10/2003.
1/25 HIPEAC 2008 TurboROB TurboROB A Low Cost Checkpoint/Restore Accelerator Patrick Akl and Andreas Moshovos AENAO Research Group Department of Electrical.
Presenter: Jyun-Yan Li Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors Pramod Subramanyan, Virendra.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
Roza Ghamari Bogazici University.  Current trends in transistor size, voltage, and clock frequency, future microprocessors will become increasingly susceptible.
Transient Fault Detection via Simultaneous Multithreading Shubhendu S. Mukherjee VSSAD, Alpha Technology Compaq Computer Corporation.
1 Transient Fault Recovery For Chip Multiprocessors Mohamed Gomaa, Chad Scarbrough, T. N. Vijaykumar and Irith Pomeranz School of Electrical and Computer.
Dynamic Verification of Cache Coherence Protocols Jason F. Cantin Mikko H. Lipasti James E. Smith.
ReSlice: Selective Re-execution of Long-retired Misspeculated Instructions Using Forward Slicing Smruti R. Sarangi, Wei Liu, Josep Torrellas, Yuanyuan.
Code Size Efficiency in Global Scheduling for ILP Processors TINKER Research Group Department of Electrical & Computer Engineering North Carolina State.
Implicitly-Multithreaded Processors Il Park and Babak Falsafi and T. N. Vijaykumar Presented by: Ashay Rane Published in: SIGARCH Computer Architecture.
|Processors designed for low power |Architectural state is correct at basic block granularity rather than instruction granularity 2.
Relyzer: Exploiting Application-level Fault Equivalence to Analyze Application Resiliency to Transient Faults Siva Hari 1, Sarita Adve 1, Helia Naeimi.
Idempotent Processor Architecture Marc de Kruijf Karthikeyan Sankaralingam Vertical Research Group UW-Madison MICRO 2011, Porto Alegre.
11 Online Computing and Predicting Architectural Vulnerability Factor of Microprocessor Structures Songjun Pan Yu Hu Xiaowei Li {pansongjun, huyu,
CS717 1 Hardware Fault Tolerance Through Simultaneous Multithreading (part 3) Jonathan Winter.
1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.
Sandeep Navada © 2013 A Unified View of Non-monotonic Core Selection and Application Steering in Heterogeneous Chip Multiprocessors Sandeep Navada, Niket.
Computer Architecture: Multithreading (IV) Prof. Onur Mutlu Carnegie Mellon University.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Methodology to Compute Architectural Vulnerability Factors Chris Weaver 1, 2 Shubhendu S. Mukherjee 1 Joel Emer 1 Steven K. Reinhardt 1, 2 Todd Austin.
Low-cost Program-level Detectors for Reducing Silent Data Corruptions Siva Hari †, Sarita Adve †, and Helia Naeimi ‡ † University of Illinois at Urbana-Champaign,
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
1 Adapted from UC Berkeley CS252 S01 Lecture 17: Reducing Cache Miss Penalty and Reducing Cache Hit Time Hardware prefetching and stream buffer, software.
CS717 1 Hardware Fault Tolerance Through Simultaneous Multithreading (part 2) Jonathan Winter.
Samira Khan University of Virginia Feb 9, 2016 COMPUTER ARCHITECTURE CS 6354 Precise Exception The content and concept of this course are adapted from.
1/25 HIPEAC 2008 TurboROB TurboROB A Low Cost Checkpoint/Restore Accelerator Patrick Akl 1 and Andreas Moshovos AENAO Research Group Department of Electrical.
University of Michigan Electrical Engineering and Computer Science 1 Low Cost Control Flow Protection Using Abstract Control Signatures Daya S Khudia and.
CS203 – Advanced Computer Architecture ILP and Speculation.
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
Computer Architecture: Multithreading (III)
nZDC: A compiler technique for near-Zero silent Data Corruption
PowerPC 604 Superscalar Microprocessor
Hwisoo So. , Moslem Didehban#, Yohan Ko
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Computer Architecture: Multithreading (IV)
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt
Mengjia Yan† , Jiho Choi† , Dimitrios Skarlatos,
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
Patrick Akl and Andreas Moshovos AENAO Research Group
Transparent Control Independence (TCI)
ECE 721 Alternatives to ROB-based Retirement
ECE 721 Modern Superscalar Microarchitecture
Spring 2019 Prof. Eric Rotenberg
Presentation transcript:

NC STATE UNIVERSITY ASPLOS-XII Understanding Prediction-Based Partial Redundant Threading for Low-Overhead, High-Coverage Fault Tolerance Vimal Reddy Sailashri Parthasarathy † Eric Rotenberg Dept. of Electrical & Computer Engineering North Carolina State University Raleigh, NC † Architecture Modeling Infrastructure Group Intel Corporation Hudson, MA

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 2 Transient Fault Tolerance Transient faults –Temporary hardware faults –Worsening with shrinking technology Soft errors Noise Prominent solution: Redundant Multithreading –Full program duplication –Complete fault tolerance –High overheads

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 3 Full Redundant Execution Full duplication 100% fault coverage fault

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 4 Partial Redundant Threading (PRT) Only partially duplicate a program Shorter the redundant thread, lesser the overhead Approach taken to create partial thread affects fault tolerance

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 5 fault Conventional PRT

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 6 fault Conventional PRT Arbitrary duplication Non-duplicated portions lose fault coverage

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 7 Prediction-based PRT Confident prediction of A Freq. Case: Correct prediction (e.g., 99.9% of the time) fault

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 8 fault Prediction-based PRT Confident prediction of A Freq. Case: Correct prediction (e.g., 99.9% of the time)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 9 fault Prediction-based PRT Confident prediction of A Rare case: Incorrect prediction (e.g., 0.1% of the time)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 10 Prediction-based PRT Confident predictions are good proxies for redundant execution Predictions break thread inter-dependence Near-100% fault coverage

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 11 Relaxing checking constraints - S(p) and T(p) are removable

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 12 Relaxing checking constraints - S(p) and T(p) are removable - Such removal can shorten partial thread significantly - But, no predictions to replace them fault

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 13 PRT Spectrum Case study: PRT on Slipstream M. Gomaa T. N. Vijaykumar ISCA 2005 Z. Purser K. Sundermoorthy E. Rotenberg MICRO 2000 ASPLOS 2000 N. Wang S. J. Patel DSN 2004

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 14 Slipstream Overview Delay Buffer Branch + Value outcomes verify SMT Processor Verified Architectural State R-stream Unverified Architectural State A-stream Outcome mismatch copy restart

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 15 Slipstream Overview Delay Buffer Branch + Value outcomes verify Verified Architectural State R-streamA-stream Unverified Architectural State Slipstream Components remove Detect predictable instructions Predict based on repeated detection (confidence) SMT Processor monitor

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 16 Slipstream Overview Delay Buffer Branch + Value outcomes verify Verified Architectural State R-streamA-stream Unverified Architectural State Slipstream Components remove SMT Processor Predictions act as proxies Detect predictable instructions Predict based on repeated detection (confidence) monitor

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 17 Slipstream Overview Delay Buffer Branch + Value outcomes verify Verified Architectural State R-streamA-stream Unverified Architectural State Slipstream Components remove Confident branches Confident dead writes Confident silent writes Slices whose leaves are any of the above SMT Processor Predictions act as proxies Detect predictable instructions Predict based on repeated detection (confidence) monitor

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 18 Confident Branch Correct Branch Prediction (Taken) Main (R-stream)Partial (A-stream) (Not Taken) fault

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 19 Confident Branch Main (R-stream)Partial (A-stream) (Not Taken) fault Incorrect Branch Prediction (Not Taken)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 20 Main (R-stream)Partial (A-stream) fault Correct Prediction (Dead) Confident Dead Write

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 21 Main (R-stream)Partial (A-stream) fault Incorrect Prediction (Not Dead) ? Confident Dead Write

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 22 Main (R-stream)Partial (A-stream) Confident Silent Write/Store 72

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 23 Main (R-stream)Partial (A-stream) Confident Silent Write/Store 33 33

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 24 Main (R-stream)Partial (A-stream) Confident Silent Write/Store V VV Correct Prediction (Silent) fault X V X

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 25 Main (R-stream)Partial (A-stream) Confident Silent Write/Store stale Incorrect Prediction (Not Silent) fault X

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 26 Slipstream Fault Detection Coverage We showed confident predictions detect faults –Additional coverage = Correctly predicted confident instr. + Their backward slices –Mispredictions vulnerable, but rare in Slipstream –Mispredictions + backward slices = only 0.1% instr. Hence, non-duplicated instr. well covered Slipstream has high fault detection coverage (99.9%) Prior research: Only duplicated instr. covered

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 27 Fault Coverage: Detection vs. Recovery Detection Coverage Represents ability to detect faults Slipstream: 99.9% of instr. Recovery Coverage Ability to rollback to ‘golden’ state on fault detection Slipstream’s recovery coverage?

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 28 Slipstream Fault Recovery fault detected Flawed Rollback (Conventional Slipstream) Correct Rollback

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 29 Reorder Buffer (ROB) fault head commit to arch. state tail Flush Restart Arch. State Corrupted Base Slipstream Recovery detected

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 30 Reorder Buffer (ROB) head commit to arch. state tail Flush to ROB head Restart fault Enhancement: ROB-head Recovery detected

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 31 Reorder Buffer (ROB) head commit to arch. state tail Flush to ROB head fault Enhancement: ROB-head Recovery detected Arch. State Corrupted Limited rollback distance: R-stream retires quickly – accelerated by leading A-stream

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 32 Reorder Buffer (ROB) head commit to arch. state tail Enhancement: ROB occupancy management Threshold Delay in retirement, hence performance hit Increased rollback distance detected

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 33 Reorder Buffer (ROB) head commit to arch. state tail fault Enhancement: History Buffer head tail Undo changes Performance friendly: R-stream mispredictions are rare

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 34 Main (R-stream)Partial (A-stream) Indirect Check of Silent Write/Store V Correct Prediction (Silent) fault X Base Slipstream Recovery

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 35 Main (R-stream)Partial (A-stream) Direct Check of Silent Write/Store V Correct Prediction (Silent) fault X existing value new value V Direct Check Base Slipstream Recovery Need Enhanced Recovery

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 36 Novel Framework to Analyze Coverage Each instr. considered “candidate faulty” Coverage = # of instr. checked before committal to arch. state Mispredicted instr. and backward slices marked unchecked

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 37 Prior WorkNew Analysis Checkers Non-checkers Coverage Analysis

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 38 Logically masked Masked? Coverage Analysis Rollback point Masked?

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 39 Clarification Analysis framework is a coverage measurement tool Not in the actual hardware

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 40 Results: Microarchitecture models L1 I & D caches 64KB, 4-way, 64B line, LRU, L1hit = 1 cycle, L1miss/L2hit = 10 cycles L2 unified cache 1MB, 8-way, 64B line, LRU, L1miss/L2miss = 100 cycles superscalar core dispatch/issue/retire bandwidth: 8 (4) reorder buffer (ROB): 256 (128) load/store queue: 64 (32) issue queue: 64 (32) cache ports (read/write): 4 (2)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 41 Breakdown of Instructions SPEC2K Integer Benchmarks 65% duplicated 35% non-duplicated > 50% non-duplicated on some bench.

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 42 Breakdown of Instructions 23% confident predictions 12% backward slices of confident predictions See paper for more detailed breakdown

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 43 Fault Coverage Slip + Direct + Recovery Enhancements Prior Work (65%): Only dup. instr. New result (78%): Correct confident predictions covered ROB-head recovery improves with occupancy threshold: 95% to 98% Direct silent write/ store checks improve coverage (88%) 65% 78% 88% 95% 97% 98% 99% History buffer schemes provide high coverage (98% to 99%)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 44 Slipstream Performance (SMT 8-wide) Average slowdown: Full redundant execution:14% Slipstream:1.3%

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 45 Performance Impact of Enhanced Recovery ROB-occupancy management delays retirement, causes slowdown (gradual decrease from RH0 to RH48) History buffer approach is performance friendly (negligible slowdown)

NC STATE UNIVERSITY ASPLOS-XII© 2006 Vimal Reddy 46 Conclusions Confident predictions can replace duplication –Slipstream case study : Redundant thread reduced by up to 57% while retaining near-100% coverage Prediction-based PRT offers a new avenue for efficient fault tolerant computing