Saad Arrabi, 2/24/2010, CS 8501. Outline: Definition of soft errors; Motivation of the paper; Goals of this paper; ACE and un-ACE bits; Results; Conclusion.

Presentation transcript:

Slide 1
Saad Arrabi, 2/24/2010, CS 8501

Slide 2
- Definition of soft errors
- Motivation of the paper
- Goals of this paper
- ACE and un-ACE bits
- Results
- Conclusion and comments

Slide 3
- Cosmic rays interact with the atmosphere
- Protons bombard O2 and N2
- Shower of secondary particles
  - Charged and neutral
- Cascade of interactions
Illustration credit: S. Swordy / U. Chicago / NASA

Slide 4
- Particle strikes sometimes cause a current spike in a transistor
- Transient faults vs. permanent faults
- Future trend: susceptibility grows as transistors get smaller
- Some strikes lead to errors and others do not

Slide 5
- Transient simulation of a particle strike

Slide 6
- Techniques to prevent soft errors exist
- They are expensive in performance and cost
- You don't want to overdo it
- Sun is documented to have lost a major customer to IBM because of this phenomenon [3].
- Things will get tougher with time
  - Multi-bit upsets
  - Lower voltage

Slide 7
- Reduce the cost of soft error protection
- Know the required cost of protection early in the design process
- Give good estimates of AVF while still being conservative
- Provide a systematic method that can be expanded in the future

Slide 8
- *AVF: Architectural Vulnerability Factor
- *ACE: Architecturally Correct Execution
- *FIT: Failures in Time
- *SDC: Silent Data Corruption
- MTBF: Mean Time Between Failures
- DUE: Detected Unrecoverable Error
- FDD: First-level Dynamically Dead
- TDD: Transitively Dynamically Dead
* Focus of the paper
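
The terms above are linked by simple arithmetic. One FIT is one failure per billion device-hours, and a structure's effective error rate is its raw error rate scaled by its AVF. The following is a minimal sketch of that arithmetic; the function names and the sample numbers are illustrative assumptions, not values from the paper.

```python
HOURS_PER_YEAR = 24 * 365          # ignoring leap years
DEVICE_HOURS_PER_FIT_UNIT = 1e9    # 1 FIT = 1 failure per 10^9 device-hours


def effective_fit(raw_fit: float, avf: float) -> float:
    """Scale a structure's raw FIT rate by its AVF (a fraction in [0, 1])."""
    if not 0.0 <= avf <= 1.0:
        raise ValueError("AVF must lie between 0 and 1")
    return raw_fit * avf


def mtbf_years(fit: float) -> float:
    """Convert a FIT rate to mean time between failures, in years."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return DEVICE_HOURS_PER_FIT_UNIT / fit / HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical structure: raw rate of 1000 FIT, AVF of 25%.
    fit = effective_fit(1000.0, 0.25)
    print(f"effective FIT = {fit:.1f}, MTBF = {mtbf_years(fit):.0f} years")
```

A lower AVF estimate translates directly into a longer predicted MTBF, which is why an overly conservative AVF can lead to over-designed protection.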

Slide 9
- Microarchitectural un-ACE bits
  - Idle or invalid state
  - Mis-speculated state
  - Predictor structures
  - Ex-ACE state
- Architectural un-ACE bits
  - NOP instructions
  - Performance-enhancing instructions
  - Predicated-false instructions
  - Dynamically dead instructions (FDD and TDD)
  - Logical masking
[Figure labels, garbled in extraction: opcode; valid bits; invalid; validity; instruction; which branch is predicted; unused instruction or data; NOP; un-ACE bits; prefetch; address; false; result]
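
To illustrate the FDD and TDD categories, here is a simplified sketch that labels instructions in a register-only trace. It assumes a hypothetical `Instr` record with one destination register and a tuple of sources, treats instructions without a destination (stores, branches) as always needed, and conservatively treats every register in `live_out` as read after the trace ends. It ignores memory and any finite analysis window, so it is not the paper's implementation.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    dest: Optional[str]      # destination register, None for stores/branches
    srcs: Tuple[str, ...]    # source registers


def classify_dead(trace: List[Instr], live_out: Set[str]) -> List[str]:
    """Label each instruction 'ACE', 'FDD', or 'TDD' with one backward pass."""
    live = set(live_out)       # registers read later by a non-dead instruction
    read_any = set(live_out)   # registers read later by any instruction
    labels = [""] * len(trace)
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        if ins.dest is None or ins.dest in live:
            labels[i] = "ACE"              # result (or side effect) is needed
            if ins.dest is not None:
                live.discard(ins.dest)     # this write ends the older value's life
            live.update(ins.srcs)
        elif ins.dest in read_any:
            labels[i] = "TDD"              # read, but only by dead instructions
        else:
            labels[i] = "FDD"              # overwritten before any read
        if ins.dest is not None:
            read_any.discard(ins.dest)
        read_any.update(ins.srcs)
    return labels


if __name__ == "__main__":
    trace = [
        Instr("r1", ("r2", "r3")),   # read only by the dead instruction below
        Instr("r4", ("r1",)),        # r4 is overwritten before being read
        Instr("r4", ("r2", "r3")),
        Instr("r1", ("r4", "r2")),
        Instr(None, ("r1",)),        # store: always treated as needed
    ]
    print(classify_dead(trace, {"r1", "r2", "r3", "r4"}))
```

The backward pass keeps two sets because the two categories ask different questions: FDD needs "was the value read at all?", while TDD needs "was it read by anything that matters?".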

Slide 10
- Uses a performance simulator
- Adds window analysis to track ACE and un-ACE instructions
- Validity is checked with microbenchmarks
- The typical alternative is fault injection, which
  - Requires an RTL model
  - Is slow and available only later in the design process
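
Once a simulator has classified what occupies a structure, the structure's AVF is the average fraction of its bits that hold ACE state per cycle. The sketch below computes that ratio from residency records; the `Residency` record and its field names are hypothetical, and a real performance simulator would produce this information in its own format.

```python
from typing import Iterable, NamedTuple


class Residency(NamedTuple):
    bits: int     # number of bits this entry occupies
    enter: int    # cycle the entry is written into the structure
    leave: int    # cycle the entry leaves (exclusive)
    ace: bool     # True if the bits are ACE, or conservatively unknown


def structure_avf(residencies: Iterable[Residency],
                  total_bits: int, total_cycles: int) -> float:
    """AVF = ACE bit-cycles / (bits in structure * simulated cycles)."""
    if total_bits <= 0 or total_cycles <= 0:
        raise ValueError("total_bits and total_cycles must be positive")
    ace_bit_cycles = sum(r.bits * (r.leave - r.enter)
                         for r in residencies if r.ace)
    return ace_bit_cycles / (total_bits * total_cycles)


if __name__ == "__main__":
    # Hypothetical two-entry, 32-bit-wide structure observed for 100 cycles.
    records = [
        Residency(bits=32, enter=0, leave=50, ace=True),    # needed result
        Residency(bits=32, enter=0, leave=100, ace=False),  # e.g. a NOP
    ]
    print(structure_avf(records, total_bits=64, total_cycles=100))
```

Marking unknown state as ACE keeps the estimate conservative: the computed AVF can only overstate, never understate, the true vulnerability.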

[Slides 11-14: figures only; no text was captured in the transcript]

Slide 15
- Reduces the AVF estimate
- Done with a performance simulator, so it is available early in the design process and is application oriented
- The level of detail is limited by the detail of the simulator
- Identifies the AVF of individual structures rather than an average
- Less accurate than fault-injection models
- Is the way they tested their accuracy good enough?
- Why be so conservative?
- Will performance be included in the future?

Slide 16
Questions?
Some slides are courtesy of Nishant George