Cost-Efficient Soft Error Protection for Embedded Microprocessors

Presentation transcript:

Cost-Efficient Soft Error Protection for Embedded Microprocessors
Jason Blome (1), Shuguang Feng (1), Shantanu Gupta (1), Scott Mahlke (1), Daryl Bradley (2)
(1) University of Michigan, (2) ARM, Ltd.
This work was done in collaboration with Daryl Bradley from ARM Ltd., along with Shuguang Feng, Shantanu Gupta, and Scott Mahlke from the University of Michigan.
1

The Soft Error Problem
[Figure: two flip-flop diagrams (D, Q, CLK) labeled "transient fault" and "soft error"]
To begin, we’re going to start with a little background on the soft error problem: what it is, and some projected trends. A soft error is a transient piece of incorrect hardware state, also known as a single event upset or a transient error. Soft errors can be caused by a number of phenomena, ranging from electrical noise such as crosstalk to high-energy particle strikes caused by radiation from the atmosphere or semiconductor packaging materials. In this work we focus on the projected trends for soft errors caused by radiated particles; however, the detection and correction techniques presented here are not particular to any specific cause.
In general, a soft error can occur in a number of ways. For example, a high-energy particle can strike a sequential state element and potentially invert the value stored in that element. A soft error may also occur if a particle strikes a combinational logic node, causing an incorrect value to be temporarily present at the output of that node. If this incorrect value then propagates to a state element and is stored, again, we have a soft error. In this work, we refer to the incorrect value within combinational logic as a transient fault, and to an incorrect stored value as a soft error.
There are also a number of natural mechanisms within a semiconductor device that may mask the effects of a transient fault. We’ll briefly discuss these mechanisms now.
2

Fault Masking
Architectural/Software: incorrect state is overwritten before it is read
Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
Latching-Window: the fault pulse does not reach a state element within the latching window
Logical: the faulted value does not affect the logical operation of the circuit
[Figure: clock waveform (CLK) marking tsetup and thold; register file with decoder running the code sequence mov r2, 4; mov r5, 8; add r6, r2, r5]
The first mechanism presented here is logical masking, which occurs when the output of a combinational circuit is unaffected by a transient fault within the circuit. Here we have a fault occurring where the faulted value is logically ANDed with 0, thus masking the effects of the transient fault.
The next fault-masking mechanism is latching-window masking. Latching-window masking occurs when the transient fault does not propagate to a state element during the required setup/hold time window.
Electrical attenuation occurs as a result of the electrical properties of a chain of logic gates, where each gate may reduce the severity of a voltage spike.
Lastly, architectural masking occurs when an erroneous value is overwritten before it is read. For example, in this code sequence, a fault occurs in r5 in the first cycle, but because the value in r5 is overwritten in the second cycle, the result of the add instruction in the third cycle is unaffected.
In this work we model and study the effects of logical, latching-window, and architectural masking, but do not account for the effects of electrical attenuation.
3
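The masking mechanisms described in this slide can be illustrated with a small toy model. The sketch below is not taken from the presentation: the function names, the flipped bit, and the timing values are hypothetical, and the latching-window check assumes that a pulse can be captured only if it overlaps the setup/hold window around the clock edge.

```python
def and_gate(a: int, b: int) -> int:
    """Logical masking: if the other input is 0, a faulted input cannot change the output."""
    return a & b


def run_sequence(fault_in_r5: bool) -> int:
    """Architectural masking, following the slide's sequence:
    mov r2, 4 ; mov r5, 8 ; add r6, r2, r5."""
    regs = {"r2": 0, "r5": 0, "r6": 0}
    # Cycle 1: mov r2, 4. Optionally flip a bit of r5 during this cycle (hypothetical bit choice).
    regs["r2"] = 4
    if fault_in_r5:
        regs["r5"] ^= 0x10
    # Cycle 2: mov r5, 8 overwrites r5 before anything reads it.
    regs["r5"] = 8
    # Cycle 3: add r6, r2, r5 reads only correct values.
    regs["r6"] = regs["r2"] + regs["r5"]
    return regs["r6"]


def latched(pulse_start: float, pulse_end: float,
            t_edge: float, t_setup: float, t_hold: float) -> bool:
    """Latching-window masking (simplified assumption): the pulse can be captured
    only if it overlaps the window [t_edge - t_setup, t_edge + t_hold]."""
    return pulse_start < t_edge + t_hold and pulse_end > t_edge - t_setup


if __name__ == "__main__":
    print(and_gate(1, 0) == and_gate(0, 0))            # faulted input masked by the 0 input
    print(run_sequence(True) == run_sequence(False))   # fault in r5 is overwritten
    print(latched(3.0, 3.5, 10.0, 0.2, 0.1))           # pulse far from the clock edge
```

Electrical attenuation is left out of the sketch, matching the talk's statement that it is not modeled.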

Soft Error Rate Contributions
[Figures: "Soft Error Rate Contributions" (Mitra 2005) and "Soft Error Rate Trends" (Shivakumar 2002)]
Increasing contribution of faults in combinational logic to the overall soft error rate
The graph on the left is data presented by Subhashish Mitra from Intel that breaks down the contribution of different design elements to the overall soft error rate of a high-performance microprocessor. The yellow portion of this chart is the contribution from sequential state elements such as registers and latches, the portion in purple is the contribution from unprotected SRAMs, and the portion in blue is the contribution from combinational logic. On the right is a graph presented by Shivakumar from UT-Austin which predicts that the SER for combinational logic will increase dramatically in the next few technology generations, while the SER for SRAM cells and sequential elements is expected to remain relatively constant.
Relating this to the embedded design space, we would expect the blue section to be a much more significant portion of the total soft error rate, simply because embedded designs are not nearly so aggressively pipelined, leading to a larger ratio of combinational logic to sequential state elements. Further, as shown in the graph on the right, the SER of logic is expected to increase over the next few technology generations, which would also broaden the effects of faults in combinational logic.
The important point to take away here is that the effects of faults in combinational logic are expected to become much more significant; however, most soft error solutions focus simply on well-structured SRAMs and state arrays, leaving designs vulnerable to faults in logic. Further, the most common techniques for protecting against faults in logic require structural duplication, which is prohibitively expensive within the embedded domain.
4

Outline
Soft error analysis setup
Summary of fault analysis results
Fault tolerance techniques
  Register value cache
  Strategic deployment of fault detectors
Conclusion
5

Fault Analysis Framework
[Figure: ARM926EJ-S block diagram (instruction fetch, instruction decode, register bank, ALU, multiply, shift, address logic, data interface, caches, MMU, write buffer, bus interface), a chart of core area by design element, and the fault injection/error analysis framework (testbench, test and reference designs, benchmark, fault injection scheduler, error checking and logging, report generation)]
In this work we conducted our experiments using a Verilog model of an ARM926EJ-S microprocessor core. The ARM926 is a Harvard architecture consisting of a standard five-stage pipeline with 4KB instruction and data caches. This model was synthesized using an Artisan cell library characterized for a 130nm process with a maximum clock frequency of 200MHz. The chart in the top right corner depicts the area consumption of different design elements within the processor. This chart shows that while the core area is dominated by SRAM arrays, the area consumed by combinational logic is greater than that consumed by sequential state elements.
Next we have a high-level diagram of the fault analysis framework used to study the effects of soft errors within the processor core. In this framework, a testbench instantiates two identical copies of the ARM926 processor core. At the beginning of each simulation, the fault injection scheduler schedules a time at which to inject a fault into one of the cores. If the experiment is meant to model the effects of faults occurring in state elements, a random clock cycle is selected for fault injection, and at the beginning of that cycle, the output of a randomly selected register is inverted for the duration of the clock cycle. If the experiment is meant to model the effects of faults occurring in combinational logic, a random time instant is selected, as well as a random duration on the interval of ¼ of a clock cycle to an entire clock cycle. The output of a random logic gate is then selected at fault injection time and its output is inverted.
Since the fault injection time and duration are uniformly random and ignorant of clock-cycle boundaries, these experiments are used to measure both logical masking and latching-window masking. After a fault is injected into the system, at each subsequent positive edge of the clock signal, every register within the processor core is compared with the same register in the second core. If a mismatch occurs, the location of the error and the clock cycle are logged.
6
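The state-element experiment described above (two identical cores, one injected fault, a register-by-register comparison after every clock edge) can be sketched in a few lines. This is only an illustration under stated assumptions: the three-register `step` function is a hypothetical stand-in for a processor core, not the ARM926 model, and `schedule_fault` and `run_experiment` are names invented for this sketch.

```python
import random


def step(regs):
    """Hypothetical stand-in for one clock cycle of a core with three 8-bit registers."""
    acc, cnt, tmp = regs
    return [(acc + cnt) & 0xFF, (cnt + 1) & 0xFF, acc ^ cnt]


def schedule_fault(rng, cycles, n_regs):
    """Pick a uniformly random injection cycle and target register."""
    return rng.randrange(cycles), rng.randrange(n_regs)


def run_experiment(cycles, inject_cycle, inject_reg):
    """Run a reference and a test copy in lockstep, inject one fault into the
    test copy, and log (cycle, register index) for every mismatch."""
    ref = [0, 1, 0]
    test = list(ref)
    log = []
    for cycle in range(cycles):
        if cycle == inject_cycle:
            test[inject_reg] ^= 0xFF  # invert the selected register's value
        ref, test = step(ref), step(test)
        for index, (good, seen) in enumerate(zip(ref, test)):
            if good != seen:
                log.append((cycle, index))
    return log


if __name__ == "__main__":
    rng = random.Random(0)
    cycle, reg = schedule_fault(rng, cycles=10, n_regs=3)
    print(cycle, reg, run_experiment(10, cycle, reg))
```

In this toy core the third register is rewritten every cycle from the other two, so a fault injected there never shows up in the log, while a fault in the first register is logged from the injection cycle onward.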

Observed Error Rates
At the software interface, error rates within 3%

Faults Occurring in Registers
Error Site                  Error Rate
Microarchitectural State    94%
Architectural State         7%

Faults Occurring in Combinational Logic
Error Site                  Error Rate
Microarchitectural State    16%
Architectural State         4%

In our first experiment we demonstrate the amount of observed logical masking within the ARM926 design while running an image processing kernel representing a typical workload on an embedded design. For this experiment we load the instruction memory with the benchmark, set the PC appropriately, and allow the test and reference designs to execute for 3000 cycles. Then, a uniform random distribution is used to select a point within the subsequent 3000 cycles at which a fault should be injected. The fault injection site may be either a logic node or a state element, and the injection site is selected based on a random distribution over the set of either logic gates or state elements. In this experiment a fault is modeled as a logic state inversion, occurring at the cycle boundary and lasting for the duration of an entire clock cycle.
Here we show the observed logical masking rates for microarchitectural state, architectural state, and top-level ports in the design, where the masking rate is simply the complement of the error rate (one minus the error rate). The microarchitectural state is the set of all registers within the design, whereas the architectural state consists of the 31 32-bit GPRs and 6 status registers defined by the ARM ISA. The top-level ports are the entry and exit points for data in and out of the core. We can see that the microarchitectural masking rate for faults occurring in state elements is considerably less than for faults occurring in combinational logic, meaning that when a fault occurs at a state element, it is much more likely to be expressed in the subsequent cycle.
However, the rates of errors occurring at the software-visible level, in architectural state and at top-level ports, potentially sending incorrect data out on the memory bus, typically only differ by about 6%. Also interesting to note here is the average number of bits corrupted when an error is observed. When a single fault is injected into a state element, it is typically expressed as a single microarchitectural bit error, whereas when a fault occurring in logic is expressed, it typically causes multiple state elements to hold incorrect values.
7
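As a small worked example, the masking rates implied by the slide's error-rate tables can be computed directly, taking the masking rate as 100% minus the error rate. The dictionary layout and names below are illustrative only; the four percentages are the ones shown on the slide.

```python
# Error rates (percent) as listed on the slide, keyed by (fault site, error site).
error_rates = {
    ("registers", "microarchitectural"): 94,
    ("registers", "architectural"): 7,
    ("combinational logic", "microarchitectural"): 16,
    ("combinational logic", "architectural"): 4,
}


def masking_rate(error_rate_percent: int) -> int:
    """Masking rate is the complement of the error rate."""
    return 100 - error_rate_percent


masking = {key: masking_rate(rate) for key, rate in error_rates.items()}

# Gap between the two architectural-state error rates, in percentage points.
architectural_gap = abs(
    error_rates[("registers", "architectural")]
    - error_rates[("combinational logic", "architectural")]
)

if __name__ == "__main__":
    for key, rate in masking.items():
        print(key, f"{rate}% masked")
    print("architectural gap:", architectural_gap, "points")
```

The 3-point gap in architectural-state error rates corresponds to the slide's "within 3%" statement.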

Impact of Fault Injection 8

Targeting the Faults that Count
ARM926EJ-S register file consumes 8.7% of total core area
Responsible for 57.4% of architectural errors
Register file area dominated by combinational logic
ECC cost, efficacy?
9

The Register Value Cache
[Figure: register file with decoder (Read/Write Addr/Data in, Read Result out) beside the register value cache; comparators (CMP) check the two read results and trigger a Stall/Check CRC step on a mismatch]
Read/write values are tee’d off before the register file, so the scheme can catch faults in either the register file or the cache, but not both.
10

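The Register Value Cache
[Figure: register value cache internals: Index Array, Valid bits, and Value Array holding previous read values and CRCs; inputs Read/Write Addr, Read Data, and Write Data; a comparator (CMP) raises Error; separate paths shown for the Write, Read, and Error Check operations]
11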
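The slides give only block diagrams for the register value cache, so the following is a loose behavioral sketch rather than the authors' design. It assumes that each cached value is stored with a CRC, that a read compares the register file result against the cached copy, and that on a mismatch the CRC decides which copy is faulty. The class name, method names, and return strings are hypothetical, and capacity and replacement policy are omitted.

```python
import zlib


def crc(value: int) -> int:
    """CRC-32 of a 32-bit register value."""
    return zlib.crc32(value.to_bytes(4, "little"))


class RegisterValueCache:
    """Behavioral sketch: keeps a shadow copy of register values plus a CRC each."""

    def __init__(self):
        self.entries = {}  # register index -> (value, crc of value)

    def write(self, index: int, value: int) -> None:
        """Write operation: store the value and its CRC alongside the register file write."""
        self.entries[index] = (value, crc(value))

    def check_read(self, index: int, rf_value: int) -> str:
        """Read operation: compare the register file result with the cached copy."""
        entry = self.entries.get(index)
        if entry is None:
            return "miss"
        cached, stored_crc = entry
        if cached == rf_value:
            return "ok"
        # Mismatch: the stored CRC tells us whether the cached copy is still intact.
        if crc(cached) == stored_crc:
            return "register_file_fault"
        return "cache_fault"


if __name__ == "__main__":
    rvc = RegisterValueCache()
    rvc.write(2, 4)                 # mov r2, 4
    print(rvc.check_read(2, 4))     # register file and cache agree
    print(rvc.check_read(2, 5))     # register file returned a corrupted value
```

Because each value is checked against only one other copy, the sketch shares the limitation stated on the slide: it can attribute a fault to the register file or to the cache, but not handle both being corrupted at once.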

Example
[Figure: step-by-step example showing register file and register cache contents for the instructions mov r2, 4; mov r5, 8; add r3, r2, r5 (also shown as add r3, r1, r4); the cache holds the values 4 and 8 with their CRCs, and a mismatch (x) triggers a Check CRC step]
12

RVC Fault Coverage 57.4% 13

RVC Overhead 14

What About the Rest? Leverage fault fanout to place detectors at likely targets 15

Fault Fanout 16

Transient Fault Detector
[Figure: main flip-flop (Q, CLK) paired with a shadow latch clocked through a delay; a comparison of the two produces the Error signal]
We want to detect these events when they happen. We don’t want to place these detectors everywhere…
Source: A Self-Tuning DVS Processor Using Delay-Error Detection and Correction, S. Das, 2006
17
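The detector on this slide is shown only as a diagram, so the sketch below is an interpretation rather than the circuit itself. It assumes the main flip-flop samples its input at the clock edge, the shadow latch samples the same input a fixed delay later, and a disagreement between the two samples raises the error signal. The function names, signal model, and timing values are hypothetical.

```python
from typing import Callable


def detect_glitch(signal: Callable[[float], int], t_edge: float, delay: float) -> bool:
    """Return True if the value sampled at the clock edge differs from the
    value sampled `delay` time units later (main flip-flop vs. shadow latch)."""
    main_sample = signal(t_edge)
    shadow_sample = signal(t_edge + delay)
    return main_sample != shadow_sample


def stable_zero(t: float) -> int:
    """Input that stays at logic 0."""
    return 0


def glitchy(t: float) -> int:
    """Logic 0 with a transient pulse to 1 during [9.8, 10.3)."""
    return 1 if 9.8 <= t < 10.3 else 0


if __name__ == "__main__":
    print(detect_glitch(stable_zero, t_edge=10.0, delay=0.5))  # no pulse, no error
    print(detect_glitch(glitchy, t_edge=10.0, delay=0.5))      # pulse spans the clock edge
    print(detect_glitch(glitchy, t_edge=20.0, delay=0.5))      # pulse far from the edge
```

In this model a pulse is flagged only when it is present at one sample point and gone at the other, which is why the choice of delay relative to the expected pulse width matters.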

Glitch Detector Coverage
[Figure: Power and Area panels, each with axes labeled Coverage and Percent Overhead]
18

Combined Technique Coverage
[Figure: Power and Area panels, each with axes labeled Coverage and Percent Overhead]
19

Conclusion
Circuit level soft error analysis offers significant insight
Faults in combinational logic do not require structural duplication
Coverage versus cost tradeoffs available
Significant benefits in compromise
  85% fault coverage for only 5.5% area
  2-3x increase in MTTF
20

Questions? 21

RVC Hit Rates 22