Using Hardware Vulnerability Factors to Enhance AVF Analysis Vilas Sridharan RAS Architecture and Strategy AMD, Inc. International Symposium on Computer.

Presentation transcript:

Using Hardware Vulnerability Factors to Enhance AVF Analysis
Vilas Sridharan, RAS Architecture and Strategy, AMD, Inc.
David R. Kaeli, ECE Department, Northeastern University
International Symposium on Computer Architecture, June 23, 2010

What is this talk about?
- Transient faults
  - Cause data corruption without damage to the underlying device
  - Modeled as a bit flip in the microarchitecture (0 → 1 or vice versa)
- Vulnerability analysis
  - Determines which faults matter and which do not
  - Allows us to make informed decisions about which structures to protect
  - We do this today using the Architectural Vulnerability Factor (AVF)
- This talk focuses primarily on the techniques
  - For results, please refer to the paper

Architectural Vulnerability Factor (AVF)
- The fraction of bits in a hardware structure H that, when corrupted, will result in incorrect program output (an error)
  - These bits are required for Architecturally Correct Execution (ACE bits)
  - Other bits are unACE bits
- B_H: size of H in bits
- N: number of cycles
Source: S. S. Mukherjee et al., Int’l Symposium on Microarchitecture, Dec
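The equation on this slide did not survive in the transcript. Assuming it follows the standard definition from the cited Mukherjee et al. paper, and using the slide's symbols B_H and N, it would read roughly as follows (the summation index n is introduced here only for illustration):

```latex
\mathrm{AVF}_H = \frac{\sum_{n=1}^{N} \left(\text{ACE bits in } H \text{ at cycle } n\right)}{B_H \times N}
```

Read this way, AVF is the average fraction of the structure's bits that hold ACE state over the N simulated cycles.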

Motivating Example
[Figure, two panels: "Constant workload / Variable microarchitecture" and "Variable workload / Constant microarchitecture"]
- AVF depends on hardware and on software
- This talk focuses on quantifying hardware vulnerability
Source: V. Sridharan and D. R. Kaeli, Int’l Symposium on High Performance Computer Architecture, Feb

Outline
- Introduction
- Quantifying Hardware Vulnerability
- Using HVF for Microarchitectural Exploration
- Estimating AVF at Runtime
- Conclusions

A Typical System
[Figure: system layers User Program, Operating System, Virtual Machine, Microarchitecture, Devices; annotated with AVF and TVF]

The System Vulnerability Stack
[Figure: system layers User Program, Operating System, Virtual Machine, Microarchitecture, Devices, separated by the ABI and ISA interfaces. The functional layers are labeled Program VF, Operating System VF, Virtual Machine VF, and Hardware VF; the Devices layer is labeled Timing VF]
VF = Vulnerability Factor
ISA = Instruction Set Architecture
ABI = Application Binary Interface

Fault Visibility
[Figure: hardware-visible state (Physical Registers, Physical Memory, Reorder Buffer, Issue Queue, Load Buffer, Store Buffer) and program-visible state (Architected Registers, Process (Virtual) Memory), with one example hardware-visible fault and one example program-visible fault]

Consequences of a Visible Fault
[Figure: the same structures as the previous slide (Issue Queue, Reorder Buffer, Load Buffer, Store Buffer, Physical Registers, Physical Memory, Architected Registers, Process (Virtual) Memory). Hardware-visible faults are labeled as masked, exposed, or activated; a program-visible fault is also shown]

Hardware Vulnerability Factor
- The fraction of activated and exposed hardware-visible faults in hardware structure H
  - These are the faults that cause a perturbation of the ISA
  - Masked hardware-visible faults do not contribute to HVF
- B_H: size of H in bits
- N: number of cycles
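The slide's equation is missing from the transcript. The following is a minimal sketch of how such a fraction could be computed, assuming HVF has the same shape as the AVF ratio (per-cycle counts summed and divided by B_H × N). The function name, parameter names, and the per-cycle input format are hypothetical and not taken from the talk.

```python
from typing import Sequence


def hvf(unmasked_bits_per_cycle: Sequence[int], size_bits: int) -> float:
    """Fraction of bit-cycles in a structure holding non-masked state.

    unmasked_bits_per_cycle[n] is the number of bits in the structure
    where a fault during cycle n would be activated or exposed
    (that is, not masked). size_bits plays the role of B_H, and the
    length of the sequence plays the role of N.
    """
    n_cycles = len(unmasked_bits_per_cycle)
    if size_bits <= 0 or n_cycles == 0:
        raise ValueError("need a positive structure size and at least one cycle")
    for count in unmasked_bits_per_cycle:
        if count < 0 or count > size_bits:
            raise ValueError("per-cycle count must lie between 0 and size_bits")
    return sum(unmasked_bits_per_cycle) / (size_bits * n_cycles)
```

For an 8-bit structure observed over four cycles with 2, 4, 0, and 2 non-masked bits, this gives 8 / 32 = 0.25.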

Outline
- Introduction
- Quantifying Hardware Vulnerability
- Using HVF for Microarchitectural Exploration
- Estimating AVF at Runtime
- Conclusions

Using HVF for Microarchitectural Exploration
- Full AVF analysis is possible at hardware design time
  - Software workloads are available at design time
- Can HVF help?
  - Provides additional insight to hardware designers
  - Accelerates AVF simulation

Additional Insight Generated by HVF
[Figure: timeline over cycles of writes and reads to physical registers P1 to P4, with each read annotated as live or dead. AVF = 10%]

Additional Insight Generated by HVF
[Figure: the same timeline of writes and reads to physical registers P1 to P4, without the live/dead annotations. HVF = 40%; HVF = 70%]
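The gap between the AVF and HVF figures on these two slides can be illustrated with a small sketch. It assumes that a register value counts as vulnerable from its write until its last relevant read, that an AVF-style analysis ignores dead reads, and that an HVF-style analysis counts every read. The event timeline below is hypothetical and does not reproduce the slide's figure.

```python
def vulnerable_cycles(events, count_dead_reads):
    """Sum the write-to-last-read intervals for one register.

    events is a list of (cycle, kind) pairs, where kind is "write",
    "live_read", or "dead_read". With count_dead_reads=False only live
    reads extend a value's vulnerable interval (AVF-style view); with
    count_dead_reads=True every read does (HVF-style view).
    """
    total = 0
    write_cycle = None
    last_read = None
    for cycle, kind in sorted(events):
        if kind == "write":
            # Close out the previous value before starting a new one.
            if write_cycle is not None and last_read is not None:
                total += last_read - write_cycle
            write_cycle, last_read = cycle, None
        elif kind == "live_read" or (kind == "dead_read" and count_dead_reads):
            if write_cycle is not None:
                last_read = cycle
    if write_cycle is not None and last_read is not None:
        total += last_read - write_cycle
    return total


# Hypothetical 20-cycle window for a single register.
WINDOW = 20
EXAMPLE_EVENTS = [
    (0, "write"), (4, "live_read"), (6, "dead_read"),
    (10, "write"), (16, "dead_read"),
]

avf_style = vulnerable_cycles(EXAMPLE_EVENTS, count_dead_reads=False) / WINDOW
hvf_style = vulnerable_cycles(EXAMPLE_EVENTS, count_dead_reads=True) / WINDOW
```

Under these assumptions the register is vulnerable for 4 of 20 cycles in the AVF-style view and 12 of 20 in the HVF-style view. Whether a read is dead is a property of the program, not of the hardware, so the hardware-only measure comes out higher.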

Insight from HVF: Real-World Example
[Figure: benchmarks equake and mgrid, with regions of similar register usage marked; AVF ≈ 8% and AVF ≈ 15%]

Outline
- Introduction
- Quantifying Hardware Vulnerability
- Using HVF for Microarchitectural Exploration
- Estimating AVF at Runtime
- Conclusions

Estimating AVF at Runtime
- Allows a system to adapt to a changing vulnerability environment
  - Enable redundancy when AVF is high
  - Increase performance when AVF is low
- Prior predictors don’t let software designers influence the AVF estimate
  - Predictors are entirely encoded in hardware
  - Rely on training benchmarks or invariants (e.g., stored data is vulnerable)
  - Assumptions fall apart in atypical programs (e.g., SW redundancy, games)
- We split AVF estimation into HVF and PVF components
  - Allow software designers to measure PVF using a profiling step
  - Estimate HVF in hardware at runtime using an HVF Monitor Unit
  - < 3% error between measured and estimated AVF (see paper for details)
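The split into HVF and PVF components could be used along the following lines. This is only a sketch: it assumes the two components combine multiplicatively over a measurement interval, and the function names and the 10% threshold are hypothetical. The paper should be consulted for the actual estimator and the HVF Monitor Unit design.

```python
def estimate_avf(hvf: float, pvf: float) -> float:
    """Combine a runtime HVF sample with a profiled PVF value.

    Assumes (for illustration only) that AVF over an interval can be
    approximated as the product of the two components.
    """
    for name, value in (("hvf", hvf), ("pvf", pvf)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return hvf * pvf


def should_enable_redundancy(hvf: float, pvf: float, threshold: float = 0.10) -> bool:
    """Policy sketch: turn redundancy on when the estimated AVF is high.

    The default threshold is an arbitrary placeholder, not a value
    from the talk.
    """
    return estimate_avf(hvf, pvf) >= threshold
```

With a measured HVF of 0.8 and a profiled PVF of 0.5 the estimate is 0.4, so this policy would enable redundancy. With an HVF of 0.2 and a PVF of 0.1 the estimate is 0.02, and the system would favor performance instead.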

Summary
- Transient faults are a challenge for all processor manufacturers
  - AVF analysis is a key part of understanding transient fault behavior
- HVF quantifies hardware vulnerability to transient faults
  - HVF provides additional insight to hardware designers
  - HVF simulation can accelerate AVF modeling during hardware design
- Runtime AVF estimation can be split into HVF and PVF components
  - Software designers can influence runtime AVF estimates
HVF generates meaningful insight into system vulnerability to transient faults

Using Hardware Vulnerability Factors to Enhance AVF Analysis
Questions?

References
- V. Sridharan and D. R. Kaeli, Using Hardware Vulnerability Factors to Enhance AVF Analysis, Int’l Symp. on Computer Architecture (ISCA-37), June
- V. Sridharan and D. R. Kaeli, Eliminating Microarchitectural Dependency from Architectural Vulnerability, Int’l Symp. on High-Performance Computer Architecture (HPCA-15), February
- A. Dixit et al., Trends from Ten Years of Soft Error Experimentation, Workshop on Silicon Errors in Logic – System Effects, March
- V. Sridharan and D. R. Kaeli, The Effect of Input Data on Program Vulnerability, Workshop on Silicon Errors in Logic – System Effects (SELSE-5), March
- V. Sridharan and D. R. Kaeli, Reliability in the Shadow of Long-Stall Instructions, Workshop on Silicon Errors in Logic – System Effects (SELSE-3), April
- R. Baumann, Radiation-Induced Soft Errors in Advanced Semiconductor Technologies, IEEE Trans. on Device and Materials Reliability, September
- S. S. Mukherjee et al., A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor, Int’l Symp. on Microarchitecture (MICRO-36), December
- P. Roche et al., Comparisons of Soft Error Rate for SRAMs in Commercial SOI and Bulk Below the 130-nm Technology Node, IEEE Trans. on Nuclear Science, December
- J. D. Dirk et al., Terrestrial Thermal Neutrons, IEEE Trans. on Nuclear Science, December