1 Multi-Level Error Detection Scheme based on Conditional DIVA-Style Verification Kevin Lacker and Huifang Qin CS252 Project Presentation 12/10/2003.

Presentation transcript:

1 Multi-Level Error Detection Scheme based on Conditional DIVA-Style Verification Kevin Lacker and Huifang Qin CS252 Project Presentation 12/10/2003

2 Outline
– Motivation
– Verification system architecture
– Error statistics
– Performance results
– Conclusion and future ideas

3 DIVA and Others: Existing Processor Dynamic Verification Schemes
(Original table columns: Approaches, Algorithm Used, Weakness)
Watchdog μP
– Control flow analysis: signature comparison, frame construction
– Data reasonableness check: Gaussian elimination
– Memory access validation: capability-based addressing
– Weakness: complexity in algorithm development; sometimes requires pre-compiling; effective for special-purpose programs
SW multithreaded execution
– Algorithm: dynamically scheduled execution of program copies
– Weakness: complexity in designing dynamic scheduling; can't detect permanent faults
Dynamic Implementation Verification Architecture (DIVA)
– Simple scheme
– High error coverage
– Exploits the abundant computation power modern technology provides
– Reference: T. M. Austin, "DIVA: a reliable substrate for deep submicron microarchitecture design," ACM/IEEE International Symposium on Microarchitecture, 1999

4 What's the Price DIVA Is Paying?
High degree of redundancy in re-executing every single instruction
– Performance overhead with limited ROB size and checker speed
– High power consumption
Especially inefficient given the small error rate in most situations
– How often does an error hit a running processor? For example, cosmic rays cause 4000 FIT (failures in 10^9 hours) for a modern processor with on-chip caches, i.e. 1 soft error every 28.5 years.
– When an error happens, how does it affect execution correctness? Speculation, ineffectual computation, uninvolved logic, stall cycles and dead values all help mask errors.
Can we reduce DIVA activity efficiently to save these costs?
[Figure labels: state mismatch; time out; error not manifested. Source: Sanjay J. Patel, "Assertion/recovery: a micro-architecture for error-tolerant computing systems," C2S2 workshop, 2003]
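
The FIT figure on this slide can be checked with a few lines of arithmetic. The sketch below converts a FIT rate into a mean time between failures in years; the function name and the 365-day year are assumptions made for illustration, not part of the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: ignore leap years


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a FIT rate (failures per 10^9 device-hours) to mean years between failures."""
    hours_between_failures = 1e9 / fit
    return hours_between_failures / HOURS_PER_YEAR


if __name__ == "__main__":
    # 4000 FIT is the cosmic-ray figure quoted on the slide.
    print(round(fit_to_mtbf_years(4000), 1))
```

With 4000 FIT this gives 250,000 hours between failures, which is roughly the 28.5 years quoted above.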

5 Conditional DIVA-Style Verification: Go Faster with Less Power
[Diagram labels: Core μP; "Indicating a possible error?" (Y/N); DIVA checker; enhanced DIVA checker; instruction commit; Level 1 error detection; Level 2 error detection; error recovery control]
Idea: DIVA only checks when a possible error is indicated
– The core processor runs faster, with less interference from the DIVA checker.
– The DIVA checker burns less power because its workload is reduced.
– The advantages of the DIVA scheme, such as simplicity and high error coverage, are inherited.
Penalty: error coverage will not be 100%, because the error indicator can miss errors.
Questions:
– What are the effective error indicators?
– What's the optimum point of the design tradeoff?

6 Conditional DIVA Scheme: System Implementation
[Diagram labels: Core μP; ROB; possible error marker; DIVA checker; instruction commit; error recovery control]
Rules:
– The DIVA checker only checks instructions marked as possible victims of error.
– In case of ROB congestion, the oldest finished instructions are retired directly, so there is no performance hit; otherwise, marked instructions are checked.
Error recovery model:
– Flush the core processor when the DIVA checker finds an error.
– Fatal error recovery by re-executing 10 instructions before the crash point.
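
The retirement rules on this slide can be read as a small decision function. The sketch below is one possible reading, not the authors' implementation: the function name, the two boolean inputs and the returned labels are all hypothetical.

```python
def commit_action(marked_suspect: bool, rob_congested: bool) -> str:
    """Decide what happens to the oldest finished instruction in the ROB.

    marked_suspect: the Level 1 indicator flagged a possible error (assumed input).
    rob_congested:  the ROB is full enough that waiting would stall the core (assumed input).
    """
    if not marked_suspect:
        # Unmarked instructions never visit the checker.
        return "retire"
    if rob_congested:
        # Performance-favorable policy: retire directly, giving up coverage.
        return "retire_unchecked"
    # Marked instruction and room to wait: send it to the DIVA checker.
    return "check"


if __name__ == "__main__":
    for suspect in (False, True):
        for congested in (False, True):
            print(suspect, congested, commit_action(suspect, congested))
```

Only the "retire_unchecked" case costs error coverage, which is why the checker bandwidth matters in the results that follow.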

7 Conditional DIVA Scheme: Design for Optimum Tradeoffs
Design tradeoffs:
– Error coverage
– Hardware/power costs
– Effectiveness of error indicators (Level 1)
– Performance overhead
– ROB overflow handling scheme
– DIVA checker latency (Level 2)
Goal of this design: find the most effective error indicators to maximize error coverage and minimize costs.
– We assume a performance-favorable ROB overflow handling scheme. The DIVA checker never interferes with core processor execution, so the performance overhead is minimized.
– DIVA checker latency will be chosen to balance error coverage against hardware/power costs.

8 Simulation Setup & Error Model
SimpleScalar/PISA 3.0 tool set, instruction bandwidth 4
SPEC2000 benchmarks (gzip, vpr, gcc, mcf, parser, vortex, bzip2, twolf, mesa, art, equake)
All hardware and transient errors covered by DIVA:
Source of real errors | Tested? | Error injection model
Crosstalk: electrical disturbances in logic values held in circuits and wires | Yes | Random bit flips in register files storing the ALU calculation results
Radiation interference: gamma rays and alpha particles | Yes | Random bit flips in register files
Transmission errors during communication between two levels of memory or between cache and processor | Yes | Random bit flips in register files storing the memory access results
Circuit flaws caused by process defects and variations in deep sub-micron technologies | No | N/A
Computer architecture bugs caused by increased design complexity | No | N/A
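
All three tested error sources use the same injection model: a random bit flip in a register. The sketch below shows that idea in isolation; it is not SimpleScalar code, and the function names, the 32-bit width and the list-based register file are assumptions for illustration.

```python
import random


def flip_random_bit(value: int, width: int = 32, rng=random) -> int:
    """Return `value` with one uniformly chosen bit (0..width-1) inverted."""
    bit = rng.randrange(width)
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_fault(register_file: list, rng=random) -> int:
    """Flip one random bit in one randomly chosen register; return its index."""
    index = rng.randrange(len(register_file))
    register_file[index] = flip_random_bit(register_file[index], rng=rng)
    return index


if __name__ == "__main__":
    rng = random.Random(2003)  # fixed seed so a run is repeatable
    regs = [0] * 32
    hit = inject_fault(regs, rng)
    print(hit, hex(regs[hit]))
```

Restricting which registers are eligible (ALU results versus memory access results) would distinguish the three table rows; that selection is left out here.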

9 Error Statistics: Effective Indicators of Possible Error
Data prediction is effective
– High success rate in recognizing a correct instruction result, and a very low miss rate.
– Multiple data predictors can be combined. If any of them predicts the result correctly, mark the instruction as non-error.
Good data predictors
– Constant stride (s): 2, 4, 6, 8, …
– Repeat pattern (r_n): 6, 4, 6, 4, …
– Incremental (p_n): 1, 2, 3, 4, …
The combination of s and r_1 to r_4 is selected.
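
The combined indicator can be sketched as a small value-history structure. This is a hedged illustration, not the authors' predictor: the class and method names are hypothetical, and it assumes that r_n means "the result equals the value produced n results ago", which the slide does not define.

```python
from collections import deque


class ValueHistory:
    """Recent results of one static instruction, newest last (hypothetical structure)."""

    def __init__(self, depth: int = 4):
        self.values = deque(maxlen=depth)

    def stride_prediction(self):
        # Constant stride (s): last value plus the last observed difference.
        if len(self.values) < 2:
            return None
        return self.values[-1] + (self.values[-1] - self.values[-2])

    def repeat_prediction(self, n: int):
        # Repeat pattern (r_n), assumed to mean "same value as n results ago".
        if len(self.values) < n:
            return None
        return self.values[-n]

    def looks_correct(self, result: int) -> bool:
        # Combine s with r_1..r_4: any matching predictor marks the result as non-error.
        predictions = [self.stride_prediction()]
        predictions += [self.repeat_prediction(n) for n in range(1, 5)]
        return any(p is not None and p == result for p in predictions)

    def observe(self, result: int) -> None:
        self.values.append(result)


if __name__ == "__main__":
    h = ValueHistory()
    for v in (6, 4, 6):
        h.observe(v)
    print(h.looks_correct(4), h.looks_correct(5))
```

A result that no predictor matches is the one that would be marked as a possible error and sent to the checker.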

10 To Improve Error Coverage: Queueing Theory
Assume the checker queue before modifications has utilization $u > 1$.
If our only tool is dropping instructions indiscriminately:
– Dropping each instruction with probability $d$ leads to $u' = (1-d)u$
– To prevent overflow, $u' \le 1$, so $d \ge 1 - 1/u$
– We miss errors at a rate of at least $1 - 1/u$
If we have an indicator which marks non-errors at a rate of $a$ and marks errors at a rate of $e$:
– Drop all marked instructions and drop the unmarked ones at a rate of $d$
– $u' = (1-a)(1-d)u$
– To prevent overflow, $u' \le 1$, so $d \ge 1 - 1/((1-a)u)$
– We miss errors at a rate of at least $1 - (1/u)(1-e)/(1-a)$
Conclusion: if $a > e$, conditional verification is a net gain.
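
The two miss-rate bounds can be compared numerically. In the sketch below an error is caught only if it is neither marked (probability 1 - e) nor dropped (probability 1 - d), so the miss rate is 1 - (1-e)(1-d); substituting the smallest allowed d gives the bound on the slide. The function names and the sample values of u, a and e are illustrative assumptions, not measurements from the presentation.

```python
def min_drop_rate(effective_utilization: float) -> float:
    """Smallest drop probability d that brings utilization down to at most 1."""
    return max(0.0, 1.0 - 1.0 / effective_utilization)


def miss_rate_blind(u: float) -> float:
    """Indiscriminate dropping: every error is missed with probability d."""
    return min_drop_rate(u)


def miss_rate_with_indicator(u: float, a: float, e: float) -> float:
    """Drop all marked instructions, then drop unmarked ones at the minimum rate d.

    a: fraction of non-error instructions the indicator marks (requires a < 1).
    e: fraction of erroneous instructions the indicator marks.
    """
    d = min_drop_rate((1.0 - a) * u)
    return 1.0 - (1.0 - e) * (1.0 - d)


if __name__ == "__main__":
    u, a, e = 2.0, 0.25, 0.05  # illustrative values only
    print(miss_rate_blind(u), miss_rate_with_indicator(u, a, e))
```

When (1-a)u is already at most 1, no further dropping is needed and the miss rate falls to e, the share of errors the indicator itself mislabels.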

11 Error Coverage and Check Activity Reduction — Performance with Bandwidth-1 Checker

12 Error Coverage and Check Activity Reduction: Performance with Bandwidth-2 Checker
– Check bandwidth higher than 2 provides only limited gain in error coverage.
– At a bandwidth of 2, up to 25% of DIVA activity can be saved, with at most 0.8% of errors missed.

13 Conclusions
The proposed scheme achieves
– Zero performance overhead in core processor execution
– For bandwidth 2, average checker workload reduced by 10%
– For bandwidth 1, average checker workload reduced by 45%
The penalty
– For bandwidth 2, average error coverage is 99.2%
– For bandwidth 1, average error coverage is 83.9%
Not perfect, but a factor of 6 improvement in mean time between uncaught errors
Future ideas
– Program-specific indicators
– Correlate error properties with program correctness
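
One way to read the "factor of 6" figure is as the ratio between the raw error rate and the rate of errors that escape detection. The sketch below works that out under this assumed interpretation; the function name is hypothetical and the slides do not state how the factor was derived.

```python
def uncaught_mtbf_improvement(coverage: float) -> float:
    """Factor by which mean time between uncaught errors grows, relative to no checking.

    Assumes errors arrive at a fixed rate and a fraction `coverage` of them is caught,
    so uncaught errors arrive (1 - coverage) times as often.
    """
    return 1.0 / (1.0 - coverage)


if __name__ == "__main__":
    # Average coverage figures quoted on the slide.
    print(round(uncaught_mtbf_improvement(0.839), 1))  # bandwidth-1 checker
    print(round(uncaught_mtbf_improvement(0.992), 1))  # bandwidth-2 checker
```

Under this reading, 83.9% coverage corresponds to a factor of about 6.2, in line with the slide, and 99.2% coverage to a factor of about 125.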