Evaluating Impact of Soft-Errors in an Embedded System - Vijay Sheshadri, Graduate Student, Dept. of Electrical Engineering

May 3, What is a Soft-error?
A soft error is a transient fault caused by cosmic-ray particles.
A charged particle incident on a component creates electron-hole pairs (EHPs), which are collected by the drain.
Sufficient charge collection causes an erroneous bit flip between 1 and 0.

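Soft-error in a System
Outcome of a fault in a bit:
a) Bit is not read: benign fault, no error.
b) Bit is read and has error protection that can correct the error (e.g., ECC): no error.
c) Bit is read and has error protection that only detects the error (e.g., parity with no recovery): detected, but unrecoverable error (DUE).
d) Bit is read, has no error protection, and the bit matters: silent data corruption (SDC).
e) Bit is read, has no error protection, and the bit does not matter: benign fault, no error.
Source: Shubu Mukherjee et al., "Radiation-Induced Soft Errors: An Architectural Perspective," HPCA 2005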
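The decision flow on this slide can be written as a small classifier. This is only a sketch of the flowchart as transcribed here; the function name, the `Outcome` enum, and the string values used for the protection argument are illustrative assumptions, not part of the original slides.

```python
from enum import Enum


class Outcome(Enum):
    BENIGN = "benign fault, no error"
    NO_ERROR = "no error (corrected)"
    DUE = "detected, but unrecoverable error"
    SDC = "silent data corruption"


def classify_fault(bit_read, protection, bit_matters):
    """Classify a faulty bit following the slide's flowchart.

    protection is a hypothetical encoding: None (no protection),
    "detect" (e.g., parity, no recovery) or "correct" (e.g., ECC).
    """
    if not bit_read:
        return Outcome.BENIGN          # fault is never observed
    if protection == "correct":
        return Outcome.NO_ERROR        # error is corrected on read
    if protection == "detect":
        return Outcome.DUE             # error is flagged but cannot be repaired
    if protection is None:
        # Unprotected bit: the outcome depends on whether the bit matters.
        return Outcome.SDC if bit_matters else Outcome.BENIGN
    raise ValueError(f"unknown protection kind: {protection!r}")
```

For example, `classify_fault(True, None, True)` returns `Outcome.SDC`, the case that no protection scheme catches.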

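Masking of Soft-error
[Figure: combinational logic between two register banks, with inputs I1 to I7, internal nodes B, C, D and E, and outputs. The figure marks a particle strike and the three masking mechanisms (logical masking, electrical masking, and latching-window masking) that decide between "soft error" and "no soft error".]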

FIT Equation: Vulnerability Factors
$\mathrm{FIT} = \sum_{i} (\text{intrinsic error rate}_i \times \text{vulnerability factor}_i)$, summed over each vulnerable device $i$.
$\text{Vulnerability Factor} = \text{Timing Vulnerability Factor} \times \text{Architectural Vulnerability Factor}$
Timing Vulnerability Factor (TVF): fraction of time the bit is vulnerable.
Architectural Vulnerability Factor (AVF): fraction of time the bit matters for the final output of a program.
Source: Shubu Mukherjee et al., "Radiation-Induced Soft Errors: An Architectural Perspective," HPCA 2005
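As a rough illustration of the FIT equation, the sum can be evaluated directly once each device has an intrinsic error rate, a TVF, and an AVF. The function names and the device numbers in the usage note are made up for illustration; they do not come from the slides.

```python
def device_fit(intrinsic_fit, tvf, avf):
    """FIT contribution of one device: intrinsic rate x (TVF x AVF)."""
    vulnerability_factor = tvf * avf
    return intrinsic_fit * vulnerability_factor


def total_fit(devices):
    """Sum the contributions of all vulnerable devices.

    devices is an iterable of (intrinsic_fit, tvf, avf) tuples.
    """
    return sum(device_fit(rate, tvf, avf) for rate, tvf, avf in devices)
```

With two hypothetical devices, `total_fit([(100.0, 0.5, 0.2), (50.0, 1.0, 1.0)])` adds a derated contribution of about 10 FIT to an underated 50 FIT.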

Architectural Vulnerability Factor
AVF is the fraction of time a bit matters for the final output of a program.
Branch predictor: does not matter at all (AVF = 0%).
Program counter: almost always matters (AVF ~ 100%).
Two ways of computing AVF for complex structures: statistical fault injection, and ACE (Architecturally Correct Execution) analysis.
Source: Shubu Mukherjee et al., "Radiation-Induced Soft Errors: An Architectural Perspective," HPCA 2005

Soft-error & Automobiles
Mar, NHTSA enlisted the NASA Engineering and Safety Center (NESC) to investigate "Unintended Acceleration".
Apr, 2011: NESC discounts SEU in its report to NHTSA, stating that the ICs were manufactured using SOI (silicon-on-insulator) technology.
As per the AEC-Q100 standard, SEU testing is required for automobile electronics with RAM > 1 Mb.

An Example
Predicted block RAM upset rate for a Virtex-5 FPGA = 635 FIT/Mb = 1.5E-05 upsets per day per Mb (1 FIT is one failure per 10^9 device-hours, so 635 x 24 / 10^9 = 1.52E-05).
Ref: A. Lesea, “Continuing Experiments of Atmospheric Neutron Effects on Deep Submicron Integrated Circuits,” WP286 (v1.0), Xilinx, Inc.
Assume this FPGA is used in a throttle control module. If the vendor produces 500,000 such vehicles, then total upsets per day = 1.5E-05 x 500,000 = 7.6 vehicle upsets per day. The arithmetic in effect counts 1 Mb of block RAM per vehicle, and 7.6 follows from the unrounded rate of 1.52E-05.
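The unit conversion behind this example can be checked with a few lines. This is a sketch of the arithmetic only; the function names are illustrative, and the default of 1 Mb of block RAM per vehicle is an assumption inferred from the slide's numbers rather than something the slide states.

```python
HOURS_PER_FIT_INTERVAL = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_upsets_per_day(fit_per_mb):
    """Convert a FIT/Mb rate into upsets per day per Mb."""
    return fit_per_mb * 24 / HOURS_PER_FIT_INTERVAL


def fleet_upsets_per_day(fit_per_mb, vehicles, mb_per_vehicle=1.0):
    """Expected upsets per day across a fleet.

    mb_per_vehicle=1.0 is an assumption implied by the slide's arithmetic.
    """
    return fit_to_upsets_per_day(fit_per_mb) * mb_per_vehicle * vehicles


rate = fit_to_upsets_per_day(635)            # about 1.52E-05 per day per Mb
fleet = fleet_upsets_per_day(635, 500_000)   # about 7.6 per day
```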

Soft-error Mitigation
Robust (radiation-hardened) circuit designs are resilient to soft errors.
Soft-error mitigation can be applied at three levels:
a) Device level: silicon-on-insulator, triple-well.
b) Circuit level: DICE cell, triple-modular redundancy.
c) Architecture level: RMT, lock-stepping, ECC.

Soft-error Mitigation
Soft-error mitigation techniques incur penalties in area (spatial redundancy) and timing (temporal redundancy).
Selective hardening of the components reduces the penalty; it is often based on logical, electrical, or timing derating.
A low-cost mitigation technique is proposed for critical applications, based on application derating: certain applications can mask or recover from transient faults*.
*Ref: V. Wong et al., “Soft Error Resilience of Probabilistic Inference Applications,” SELSE II, 2006

Critical Application - An Analogy
A micro-controller embedded in a car dashboard may be handling many applications: climate monitor/display, airbag deployment, GPS, cruise control.
A critical application in this case could be ‘Airbag deployment’. A soft error (SE) during this application could be catastrophic.

Target Module
PWM: the output is a pulse, the width of which decides the speed of the motor.
Etpwmi0 module: ~800 FFs and ~3000 logic gates.
180-nm CMOS technology, 80 MHz frequency.
[Figure: block diagram with ADC, CPU core, PWM, and motor.]

Basic Simulation Steps*
Pre-analysis: identify the components utilized by the critical application.
Fault injection: inject a single fault at a random time instance by depositing the opposite value on the component.
Error metrics:
a) Error count: number of mismatches between the output and the reference.
b) PW count: number of clock cycles the output is ‘1’, as compared to the reference.
*Ref: J. Blome et al., “Cost-Efficient Soft Error Protection for Embedded Microprocessors,” CASES, 2006
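The two error metrics can be illustrated on waveforms sampled once per clock cycle. This is a minimal sketch: representing the PWM output as a list of 0/1 samples, and the function names, are assumptions made for illustration and not the authors' tooling.

```python
def error_count(output, reference):
    """Number of clock cycles in which output and reference disagree."""
    if len(output) != len(reference):
        raise ValueError("waveforms must cover the same number of cycles")
    return sum(1 for o, r in zip(output, reference) if o != r)


def pw_count(signal):
    """Pulse-width count: number of clock cycles the signal is 1."""
    return sum(1 for s in signal if s == 1)


reference = [0, 1, 1, 1, 0, 0]
delayed = [0, 0, 1, 1, 1, 0]   # same pulse, shifted by one cycle

mismatches = error_count(delayed, reference)
width_difference = pw_count(delayed) - pw_count(reference)
```

In this example the delayed pulse mismatches the reference in two cycles while its pulse width is unchanged, which is why the two metrics are reported separately.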

Simulation tools
Verilog netlist simulated with timing information, using Synopsys VCS.
Fault-injection module coded in C. It uses VPI (Verilog Procedural Interface) functions to:
a) access a net in the netlist (vpiHandle);
b) read the value of the net (vpi_get_value);
c) overwrite the value of the net (vpi_put_value).

Simulation - Pre-analysis
Categorize FFs based on their activity:
a) Low-activity FFs (no. of toggles less than 2)
b) High-activity FFs (no. of toggles higher than 2)
Opposite values are forced onto the FFs and the output pulse is observed for errors.
FFs in which errors were observed are identified and subjected to fault injection.
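The activity-based grouping can be sketched as follows, assuming each flip-flop's value is available as a per-cycle trace. The function names and trace format are illustrative. The slide does not say where a flip-flop with exactly 2 toggles belongs; this sketch places it in the high-activity group, which is an assumption.

```python
def count_toggles(trace):
    """Number of value changes between consecutive samples of a trace."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a != b)


def categorize_by_activity(traces, threshold=2):
    """Split flip-flops into low- and high-activity groups.

    traces maps a flip-flop name to its list of sampled values.
    Fewer than `threshold` toggles counts as low activity; exactly
    `threshold` toggles is treated as high activity (assumption).
    """
    groups = {"low": [], "high": []}
    for name, trace in traces.items():
        key = "low" if count_toggles(trace) < threshold else "high"
        groups[key].append(name)
    return groups


groups = categorize_by_activity({
    "ff_a": [0, 0, 0, 1],      # 1 toggle
    "ff_b": [0, 1, 0, 1, 0],   # 4 toggles
})
```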

Simulation - Fault-injection
For the FFs obtained from pre-analysis, inject a fault at a random instant of time (within the time interval of the first output pulse).
Measure error count and PW count.
Identify FFs whose error is within acceptable limits.
[Figure: the test bench (Verilog) passes the original value to the fault-injection module (C + VPI), which returns the modified value; the fault-injection window is marked on the output pulse.]
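The injection step can be modelled outside the simulator for illustration. The sketch below is a toy Python model, not the authors' C + VPI module: the state dictionary, the function names, and measuring the window in clock cycles are all assumptions.

```python
import random


def pick_injection_cycle(window_start, window_end, rng):
    """Pick a random cycle inside the fault-injection window [start, end)."""
    return rng.randrange(window_start, window_end)


def inject_fault(state, ff_name):
    """Return a copy of the flip-flop state with one bit flipped.

    Depositing the opposite value on a single flip-flop models one
    soft error; the original state is left untouched.
    """
    faulty = dict(state)
    faulty[ff_name] ^= 1
    return faulty


rng = random.Random(0)                      # seeded for repeatable runs
cycle = pick_injection_cycle(10, 20, rng)   # window of the first output pulse
state = {"ff_a": 1, "ff_b": 0}
faulty_state = inject_fault(state, "ff_a")
```

Seeding the random generator keeps a fault-injection campaign repeatable, which helps when a particular faulty run needs to be replayed.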

Absolute error vs. Acceptable error
Absolute error: raise the error flag for any mismatch between the output pulse and the reference.
Acceptable error: raise the error flag only if the mismatch between the output pulse and the reference lies outside the tolerance limit*.
Examples: delayed pulse, self-correcting pulse.
[Figures: in each example a fault is injected at the target FF and the actual output is compared with the reference copy; in the delayed-pulse example the actual output lags the reference by a delay.]
*Ref: X. Li et al., “Exploiting Soft Computing for Increased Fault Tolerance,” Workshop on Architectural Support for Gigascale Integration, 2006
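The difference between the two flags can be shown on a delayed pulse. This is only a sketch: the slides do not say how the tolerance limit is measured, so expressing it as a maximum number of mismatching clock cycles is an assumption, as are the function names.

```python
def mismatch_cycles(output, reference):
    """Number of clock cycles in which output and reference disagree."""
    return sum(1 for o, r in zip(output, reference) if o != r)


def absolute_error_flag(output, reference):
    """Flag any mismatch at all."""
    return mismatch_cycles(output, reference) > 0


def acceptable_error_flag(output, reference, tolerance_cycles):
    """Flag only mismatches beyond the tolerance limit.

    tolerance_cycles is a hypothetical tolerance expressed in cycles.
    """
    return mismatch_cycles(output, reference) > tolerance_cycles


reference = [0, 1, 1, 1, 0, 0]
delayed = [0, 0, 1, 1, 1, 0]   # pulse delayed by one cycle

strict = absolute_error_flag(delayed, reference)
relaxed = acceptable_error_flag(delayed, reference, tolerance_cycles=2)
```

Here the delayed pulse raises the absolute-error flag but not the acceptable-error flag, so a flip-flop producing it would not need hardening under the relaxed criterion.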

Simulations - Combinational logic
Fault-injection steps:
a) The SE is modeled as a 1 ns pulse (system clock frequency = 80 MHz).
b) The transient pulse is injected onto the gate output.
c) The target combinational circuit is selected at random.
Example: 2-input NAND gate.
[Figure: waveforms of inputs A and B and output Y, for the reference copy and for the actual output with the injected fault.]

Results
Pre-analysis: ~18% of FFs are used by the application.
Fault injection: the number of faults injected is proportional to the number of flip-flops in the group. Low-toggle FFs are more numerous, hence the number of faults injected in low-toggle FFs is higher.

Results
Low-toggle FFs are more vulnerable to soft errors, since an erroneous bit flip may persist without being overwritten.
High-toggle FFs are written very often, so an erroneous bit flip has a higher probability of being overwritten.

Computing AVF
$\mathrm{AVF} = P_e \times \%\,\text{component}$
$P_e$ = probability that a fault injected in the component results in an error = (no. of errors) / (no. of faults injected).
% component = the percentage of that component with respect to the total number of components.
Example: for a latch,
a) if # errors = 50% of injected faults ($P_e = 0.5$), and
b) if latches make up 20% of the circuit,
then AVF = 0.5 x 0.2 = 0.1.
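The AVF formula on this slide is a two-step calculation, sketched below with the latch example. The function names are illustrative, and the fault and error counts are hypothetical values chosen to give the slide's 50% error ratio.

```python
def error_probability(errors, faults_injected):
    """P_e: fraction of injected faults that resulted in an error."""
    if faults_injected <= 0:
        raise ValueError("at least one fault must be injected")
    return errors / faults_injected


def component_avf(errors, faults_injected, component_fraction):
    """AVF of a component group: P_e x fraction of the circuit it makes up."""
    return error_probability(errors, faults_injected) * component_fraction


# Latch example from the slide: 50% of faults cause errors, latches are 20% of the circuit.
latch_avf = component_avf(errors=50, faults_injected=100, component_fraction=0.2)
```

The result is 0.1, matching the slide's 0.5 x 0.2.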

AVF - Results
Low-activity FFs have a higher $P_e$ and are more numerous; hence they have a higher AVF.
Combinational logic, though high in number, has $P_e$ ~4E-03, causing its AVF to drop.

Summary
Fault-resilience scheme for critical applications using application derating and inherent error tolerance.
For the application considered:
a) ~12% of the sequential logic was safety critical (previous work reports 30% of sequential logic hardened for 99% fault coverage in an ARM embedded processor running an image-processing algorithm);
b) failures in combinational logic were negligible.
The worst-case scenario would only be the same as radiation-hardening a generic system, i.e., all the hardware is identified as safety-critical.

Future Work
Perform fault-injection analysis on the processor core managing the control loop.
Conduct neutron-beam experiments on the circuit to compare with simulations and find the FIT rate.
Implement circuit hardening and test the system to ascertain its robustness.