Soft Error Analysis of FPGA under ISO 26262 Standard Mohammad Mahdi Karimi PhD candidate Electrical and Computer Engineering Department
Overview: Introduction to ISO 26262; Types of Soft Error; Challenges in fault detection; Our approach to Functional Safety Analysis of FPGA
Introduction to Safety Standards: On May 7, 2016 at 3:40 p.m. on U.S. 27 near the BP Station west of Williston, a 45-year-old Ohio man was killed when he drove under the trailer of an 18-wheel semi. The top of Joshua Brown’s 2015 Tesla Model S vehicle was torn off by the force of the collision… When the truck made a left turn onto NE 140th Court in front of the car, the car’s roof struck the underside of the trailer as it passed under the trailer. The first duty of an engineer is to ensure that dangerous equipment does not fail with catastrophic results.
Introduction to ISO 26262: ISO 26262 is the state-of-the-art standard for functional safety of E/E systems in passenger vehicles. It covers the functional safety aspects of every automotive product development phase: specification, design, implementation, verification, validation, and production release.
ISO 26262 Standard: ISO 26262 is a risk-based safety standard. By assessing the risk of hazardous operational situations, it provides an automotive-specific, risk-based approach for determining risk classes, expressed as Automotive Safety Integrity Levels (ASILs). Timeline: Nov 2011, ISO 26262 (first edition); Jul 2016, ISO 19451 (semiconductors draft standard); Jan 2018, upcoming ISO 26262 second edition.
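As an illustration of how a risk class can be derived, the sketch below assumes the three classification parameters used in the hazard analysis of ISO 26262-3, which these slides do not spell out: severity (S1 to S3), exposure (E1 to E4), and controllability (C1 to C3). It relies on a widely quoted shortcut, summing the three class indices, that reproduces the standard's ASIL table. The function name `asil` is hypothetical, and the standard itself remains the authority.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return the ASIL ("QM", "A", "B", "C" or "D") for classes S, E, C.

    Assumption: classes are given as integers S1-S3, E1-E4, C1-C3.
    The S0/E0/C0 classes (no ASIL required) are not handled here.
    """
    if not 1 <= severity <= 3:
        raise ValueError("severity must be 1..3")
    if not 1 <= exposure <= 4:
        raise ValueError("exposure must be 1..4")
    if not 1 <= controllability <= 3:
        raise ValueError("controllability must be 1..3")

    # Shortcut: the sum of the class indices selects the ASIL.
    total = severity + exposure + controllability
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


if __name__ == "__main__":
    # Worst case on every axis.
    print(asil(3, 4, 3))
    # Lowest class on every axis.
    print(asil(1, 1, 1))
```

Only the worst class on all three axes yields ASIL D; every one-step reduction on any axis lowers the result by one level until it reaches QM (quality management, no ASIL requirement).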
Section II: Soft Errors
Technology Scaling: Added functionality raises the transistor count of a design, which in turn raises the probability that an error occurs somewhere in it.
Soft Errors: Soft errors are radiation-induced transient errors, that is, logical faults in a circuit’s operation that do not reflect a permanent malfunction of the device. They can be caused by space radiation, thermal neutrons, atmospheric muons, and alpha particles.
De-Rating Effect
Not all radiation-induced faults propagate and produce errors, because of numerous masking effects:
Electrical De-Rating: pulses whose amplitude never reaches Vtr are masked.
Logic De-Rating: depending on the state of the circuit, propagation of the fault may be blocked by the logic.
(Costenaro, Enrico. Techniques for the evaluation and the improvement of emergent technologies’ behavior facing random errors. Diss. Université Grenoble Alpes, 2015.)
De-Rating Effect (continued)
Temporal De-Rating: the window of opportunity during which a fault (SET or SEU) can be latched in a downstream memory element.
Functional De-Rating: an upset does propagate to downstream state points, but its effect is not significant at the system level.
Memory De-Rating: the portion of time during which the data stored in a memory will eventually be read and thus used by the application.
Our Problem: The problem is to identify the effect of soft errors through system-level analysis. The automotive market for FPGAs is ramping up, and these devices consist of many IP core blocks that depend on their models and applications. To maintain functional safety and a reasonable residual risk under ISO 26262, the system must be analyzed thoroughly to ensure that every possible safety-related transient failure is addressed.
Challenges
FPGAs consist of millions of elements whose functionality changes with the application and software.
Every subsystem must be analyzed against the functional safety goals defined at the system level.
The framework must identify how the system safety goals depend on every subsystem failure.
The framework must be automated so that it can handle software updates and design changes and scale to larger systems.
Our Approach: We will develop a systematic technique that analyzes subsystem failure propagation. Each design will be modeled in a hardware description language (e.g., SystemVerilog) and, where possible, as a compressed mathematical representation. The system will be evaluated with multiple fault injection techniques (e.g., SystemVerilog test bench, Boolean satisfiability) to provide diagnostic fault coverage.
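A fault injection campaign of this kind can be sketched in a few lines. The example below is not the framework described above. It is a stand-in written in Python rather than SystemVerilog, and it assumes a hypothetical 8-bit stored word protected by a single even-parity bit as the safety mechanism. It injects every possible pattern of one or two bit flips and reports the fraction detected, weighting all faults equally (a real diagnostic coverage figure would be weighted by failure rate).

```python
from itertools import combinations

WIDTH = 8  # hypothetical data width in bits


def parity(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def encode(data: int) -> int:
    """Append an even-parity bit as the least significant bit."""
    return (data << 1) | parity(data)


def check(stored: int) -> bool:
    """Return True when the stored codeword passes the parity check."""
    return parity(stored) == 0


def inject(stored: int, bits: tuple[int, ...]) -> int:
    """Flip the given bit positions, emulating an upset in storage."""
    for bit in bits:
        stored ^= 1 << bit
    return stored


def coverage(data: int, n_flips: int) -> float:
    """Fraction of all n_flips-bit upsets that the parity check detects."""
    golden = encode(data)
    total = 0
    detected = 0
    for bits in combinations(range(WIDTH + 1), n_flips):
        total += 1
        if not check(inject(golden, bits)):
            detected += 1
    return detected / total


if __name__ == "__main__":
    print("single-bit upsets detected:", coverage(0xA5, 1))
    print("double-bit upsets detected:", coverage(0xA5, 2))
```

Parity catches every single-bit upset and no double-bit upset, which illustrates why the achieved coverage depends on the fault model chosen for the campaign.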
References
Ramanarayanan, R., et al. "Analysis of soft error rate in flip-flops and scannable latches." SOC Conference, 2003. Proceedings. IEEE International [Systems-on-Chip]. IEEE, 2003.
ISO 26262 Road vehicles – Functional safety – Part 5: Product development: hardware level.
Silburt, Allan L., et al. "Design for soft error resiliency in internet core routers." IEEE Transactions on Nuclear Science 56.6 (2009): 3551-3555.
Costenaro, Enrico. Techniques for the evaluation and the improvement of emergent technologies’ behavior facing random errors. Diss. Université Grenoble Alpes, 2015.
Evans, Adrian. Abstraction techniques for scalable soft error analysis and mitigation. Diss. Université Grenoble Alpes, 2014.
Thank you. Mohammad Mahdi Karimi, PhD candidate, Electrical and Computer Engineering Department, mmkarimi@aggies.ncat.edu