Soft-Error Detection Through Software Fault-Tolerance Techniques, by Gökhan Tufan and İsmail Yıldız

Objective
- The paper describes a systematic approach for automatically introducing data and code redundancy into an existing program written in a high-level language.
- The transformations aim at making the program able to detect most of the soft errors affecting data and code, independently of any Error Detection Mechanisms (EDMs) the hardware may implement.
- Since the transformations can be applied automatically as a pre-compilation phase, the programmer is freed from the cost and responsibility of introducing suitable EDMs into the code.

Agenda
1. Introduction and Literature
2. Transformation Rules
3. Experimental Results
4. Conclusion

Introduction and Literature
- Trend: the increasing popularity of low-cost, safety-critical, computer-based applications calls for new methods for designing dependable systems.
- Major concern: cost (and hence design and development time).
- Solutions:
  - Adopting commercial hardware is a common practice.
  - Relying on software techniques for dependability usually means accepting some overhead: larger code size and reduced performance.

Software Fault Tolerance
- A way of coping with the consequences of hardware errors, in particular those originating from transient faults caused, for example, by small particles hitting the circuit.
- No software bugs: the code is assumed to be correct, and faulty behavior is due only to transient faults affecting the system.

Software Error Detection Techniques
- Algorithm-Based Fault Tolerance
- Assertions
- Control Flow Checking
- Procedure Duplication
- Automatic Transformations

Main Features
- Data and code redundancy is introduced by a set of transformations performed on the high-level source code.
- The transformations detect errors affecting:
  - Code: by duplicating the code implementing each operation and adding checks that verify the consistency of the executed operations.
  - Data: by duplicating each variable and adding consistency checks after every read operation.

Advantages
1. Can be applied automatically to high-level source code.
2. Completely independent of the underlying hardware.
3. Complements other existing error detection mechanisms.
4. Detects a wide range of faults and is not limited to a specific fault model.

Properties of Transformation Rules
- Applied to the high-level code.
- Introduce data and code redundancy.
- Make no assumption about the cause or the type of the fault.
- Assume that an error corresponds to one or more bits whose value is erroneously changed while stored in memory, a cache, or a register, or while transmitted on a bus.

Properties of Transformation Rules
- Although devised for transient faults, the rules can also detect most permanent faults that may exist in the system.
- Compared to other error detection methods, the detection capability of these rules is much higher. They address any error affecting the data, with no limitation on the number of modified bits or on the physical location of the bits themselves.

Basic Rules: Errors in Data
- Rule #1: every variable x must be duplicated; let x0 and x1 be the names of the two copies.
- Rule #2: every write operation performed on x must be performed on both x0 and x1.
- Rule #3: after each read operation on x, the two copies x0 and x1 must be checked for consistency, and an error detection procedure must be activated if an inconsistency is detected.

Code modification for errors affecting data

Original code:
    a = b;
Modified code:
    a0 = b0;
    a1 = b1;
    if (b0 != b1)
        error();

Original code:
    a = b + c;
Modified code:
    a0 = b0 + c0;
    a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1))
        error();

Rules imply that…
- Any variable v must be split into two copies, v0 and v1, that always store the same value.
- A consistency check on v0 and v1 must be performed each time the variable is read.
- The check must be performed immediately after the read operation, in order to stop the fault effect from propagating.
- Variables must also be checked when they appear in any expression used as a condition for branches or loops.
- Each instruction that writes variable v must also be duplicated, so that both copies are updated.
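As a minimal sketch of these implications, the summation loop below is hardened by hand in C. The function name hardened_sum and the counting error() handler are hypothetical; the slides call error() without defining it, and this version only counts detections. A check follows every read, including the reads in the loop condition. For brevity the function returns a single value; the procedure rules that follow would also duplicate it.

```c
/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* Original code:
 *     int sum = 0;
 *     for (int i = 0; i < n; i++) sum = sum + v[i];
 *     return sum;
 * Hardened by hand with Rules #1-#3: every variable has two copies
 * (suffixes 0 and 1), every write updates both copies, and every read,
 * including those in the loop condition, is followed by a check. */
int hardened_sum(const int *v0, const int *v1, int n0, int n1)
{
    int sum0 = 0, sum1 = 0;
    int i0 = 0, i1 = 0;

    while (i0 < n0) {
        /* operands read by the loop condition */
        if ((i0 != i1) || (n0 != n1)) error();
        sum0 = sum0 + v0[i0];
        sum1 = sum1 + v1[i1];
        /* operands read by the addition; the old sum copies are compared
           through their updated values */
        if ((sum0 != sum1) || (v0[i0] != v1[i1])) error();
        i0 = i0 + 1;
        i1 = i1 + 1;
    }
    /* the loop condition was read once more before exiting */
    if ((i0 != i1) || (n0 != n1)) error();
    if (sum0 != sum1) error();   /* sum is read by the return */
    return sum0;
}
```

A single flipped bit in either copy of the array is reported as soon as the corrupted element is read.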

In case of a procedure…
- The parameters passed to a procedure, as well as the returned values, are treated as variables.
- The rules above are therefore extended as follows:
  - every procedure parameter is duplicated;
  - each time the procedure reads a parameter, it checks the two copies for consistency;
  - the return value is also duplicated.

Modification for errors affecting procedure parameters

Original code:
    res = search(a);
    …
    int search(int p)
    {
        int q;
        …
        q = p + 1;
        …
        return (1);
    }

Modified code:
    search(a0, a1, &res0, &res1);
    …
    void search(int p0, int p1, int *r0, int *r1)
    {
        int q0, q1;
        …
        q0 = p0 + 1;
        q1 = p1 + 1;
        if (p0 != p1)
            error();
        …
        *r0 = 1;
        *r1 = 1;
        return;
    }
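The search example elides its body, so the sketch below applies the same pattern to a small, hypothetical procedure next_value so that the code compiles and runs. The procedure, the caller call_next_value and the counting error() handler are all assumptions. Both the parameter and the result travel as two copies, and every read of a copy is checked.

```c
/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* Original (hypothetical) procedure:
 *     int next_value(int p) { int q; q = p + 1; return q; }
 * Hardened: the parameter arrives as two copies, each read is checked,
 * and the result is returned as two copies through pointer parameters. */
void next_value(int p0, int p1, int *r0, int *r1)
{
    int q0, q1;

    q0 = p0 + 1;
    q1 = p1 + 1;
    if (p0 != p1) error();   /* p was read by the two additions */

    *r0 = q0;
    *r1 = q1;
    if (q0 != q1) error();   /* q was read to build the return value */
}

/* Caller: the original "res = next_value(a);" becomes a call with
   duplicated arguments and duplicated results. */
int call_next_value(int a0, int a1)
{
    int res0, res1;

    next_value(a0, a1, &res0, &res1);
    if (res0 != res1) error();   /* res is read by the return below */
    return res0;
}
```

If the two argument copies disagree, as after a bit flip in a1, the mismatch is reported inside the procedure. It is reported again when the diverging results are read by the caller.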

Statements
- Type S1: statements affecting data only (assignments, arithmetic expression computations).
- Type S2: statements affecting the execution flow (tests, loops, procedure calls and returns).

Errors affecting the code
- Type E1: errors changing the operation performed by a statement without changing the code execution flow (e.g., changing an add operation into a sub).
- Type E2: errors changing the execution flow (e.g., transforming an add operation into a jump, or vice versa).

Classification of the effects of the errors

E1 errors affecting S1 statements
- Detected automatically by simply applying the transformation rules introduced above for errors affecting data.
- Consider a statement executing an addition between two operands: Rules #2 and #3 also guarantee the detection of any E1 error that transforms the addition into another operation.
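The sketch below simulates such an E1 error in C. The e1_fault flag is only a simulation device, not part of the transformed code; it turns the first copy's addition into a subtraction. The function name and the counting error() handler are hypothetical. The operand check of Rule #3 passes, because the operands themselves are intact. The wrong result is caught the next time a is read.

```c
/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* Original code:
 *     a = b + c;
 *     d = a * 2;
 * e1_fault simulates an E1 error that turns the first copy's addition
 * into a subtraction; the second copy is unaffected. */
int add_then_use(int b0, int b1, int c0, int c1, int e1_fault)
{
    int a0, a1, d0, d1;

    a0 = e1_fault ? (b0 - c0) : (b0 + c0);   /* possibly corrupted operation */
    a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1)) error();   /* operands are consistent */

    d0 = a0 * 2;
    d1 = a1 * 2;
    if (a0 != a1) error();                   /* wrong a caught when a is read */

    if (d0 != d1) error();                   /* d is read by the return */
    return d0;
}
```

The detection depends on the two copies actually diverging. With c equal to 0, b - c and b + c coincide, so the corrupted operation has no visible effect.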

E2 errors affecting S1 statements
- Example: an error that transforms an addition into a jump.
- The solution is based on tracking the execution flow and detecting differences with respect to the correct behavior.
- First, all the basic blocks composing the code are identified.
- A basic block is a sequence of statements that is always executed indivisibly (it is branch-free).

Rules
- Rule #4: an integer value k_i is associated with every basic block i in the code.
- Rule #5: a global execution check flag (ecf) variable is defined;
  - a statement assigning the value k_i to ecf is introduced at the very beginning of every basic block i;
  - a test on the value of ecf is introduced at the end of the basic block.

Example of code transformation for E2 errors affecting S1 statements

Original code:
    /* basic block beginning */
    …
    /* basic block end */

Modified code:
    /* basic block beginning #371 */
    ecf = 371;
    …
    if (ecf != 371)
        error();
    /* basic block end */

Rules
- These rules check whether an error has modified the correct execution flow by introducing a jump to an incorrect target address, for example:
  - an error modifying the field that contains the target address of a jump instruction;
  - an error that changes an ALU instruction (e.g., an add) into a branch.

Faults that cannot be detected by the proposed rules
- any erroneous jump within the same basic block;
- any error producing a jump to the first assembly instruction of a basic block (the one assigning to ecf the value corresponding to the block).
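The sketch below illustrates Rules #4 and #5 together with one of these blind spots. The goto statements only simulate erroneous jumps and are not part of the transformed code. The function run_blocks, the second signature 372 and the counting error() handler are hypothetical. A jump into the middle of block #372 leaves ecf at 371, so the check at the end of block #372 catches it. A jump to the first statement of block #372 reassigns ecf and goes undetected, even though the rest of block #371 was skipped.

```c
/* Global execution check flag (Rule #5). */
static int ecf = 0;

/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* jump selects the simulated fault:
 *   0: no fault
 *   1: erroneous jump into the middle of block #372 (detected)
 *   2: erroneous jump to the first statement of block #372 (undetected) */
int run_blocks(int jump)
{
    int x = 0;

    /* basic block beginning #371 */
    ecf = 371;
    x = x + 1;
    /* fault simulation only, not part of the transformed program */
    if (jump == 1) goto middle_of_372;
    if (jump == 2) goto start_of_372;
    x = x + 10;
    if (ecf != 371) error();
    /* basic block end */

start_of_372:
    /* basic block beginning #372 */
    ecf = 372;
middle_of_372:
    x = x + 100;
    if (ecf != 372) error();
    /* basic block end */

    return x;
}
```

With jump set to 0, 1 and 2, run_blocks returns 111, 101 and 101, respectively. Only the second case increments errors_detected. The third case produces the same wrong result silently.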

Errors affecting S2 statements
- The issue is how to verify that the correct execution flow is followed.
- To detect errors affecting a test statement, the following rule is introduced:
  - Rule #6: for every test statement, the test is repeated at the beginning of the target basic block of both the true and the (possible) false clause. If the two versions of the test (the original and the newly introduced one) produce different results, an error is signaled.

Code transformation for a test statement

Original code:
    if (condition) {
        /* Block A */
        …
    } else {
        /* Block B */
        …
    }

Modified code:
    if (condition) {
        /* Block A */
        if (!condition)
            error();
        …
    } else {
        /* Block B */
        if (condition)
            error();
        …
    }
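Here is a runnable sketch of Rule #6, assuming the concrete condition x > 0. To simulate a control-flow error, the branch decision is first stored in go_true. A hypothetical flip_branch flag can then invert it, sending control to the wrong clause as a corrupted branch would. The repeated test at the start of each clause re-evaluates the original condition and signals the mismatch. The function name and the counting error() handler are also assumptions.

```c
/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* Original code:
 *     if (x > 0) { return 1; } else { return -1; }
 * flip_branch is a simulation device: when set, control reaches the
 * clause opposite to the one selected by the condition. */
int sign_class(int x, int flip_branch)
{
    int go_true = (x > 0);
    if (flip_branch)
        go_true = !go_true;        /* simulated control-flow error */

    if (go_true) {
        /* Block A */
        if (!(x > 0)) error();     /* Rule #6: repeated test */
        return 1;
    } else {
        /* Block B */
        if (x > 0) error();        /* Rule #6: repeated test */
        return -1;
    }
}
```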

Procedure call and return statements
- Rule #7: an integer value k_j is associated with every procedure j in the code.
- Rule #8: immediately before every return statement of procedure j, the value k_j is assigned to ecf; a test on the value of ecf is also introduced after every call to the procedure.

Code transformation for the procedure call and return statements

Original code:
    …
    ret = my_proc(a);    /* procedure call */
    …
    /* procedure definition */
    int my_proc(int a)
    {
        /* procedure body */
        …
        return (0);
    }

Modified code:
    …
    /* call of procedure #790 */
    ret = my_proc(a);
    if (ecf != 790)
        error();
    …
    /* procedure definition */
    int my_proc(int a)
    {
        /* procedure body */
        …
        ecf = 790;
        return (0);
    }

Errors detected by Rules #7 and #8
- errors causing a jump into the procedure code;
- errors causing a jump to the statement following the call statement;
- errors affecting the target address of the call instruction;
- errors affecting the register storing the procedure return address.
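The sketch below applies Rules #7 and #8 to a version of my_proc with two return statements. It simulates one of the errors listed above: a jump to the statement following the call, which skips the procedure entirely. The skip_call flag, the caller function and the counting error() handler are hypothetical. Resetting ecf to 0 before the call is an extra assumption of this sketch. It prevents a signature left by an earlier call from satisfying the check.

```c
/* Global execution check flag. */
static int ecf = 0;

/* Hypothetical error handler: counts detections instead of aborting. */
static int errors_detected = 0;
static void error(void) { errors_detected++; }

/* Procedure #790. Rule #8: the signature is assigned to ecf
   immediately before every return statement. */
int my_proc(int a)
{
    if (a < 0) {
        ecf = 790;
        return -a;
    }
    ecf = 790;
    return a;
}

/* skip_call simulates an erroneous jump to the statement following the
   call, so the procedure body never runs. */
int caller(int a, int skip_call)
{
    int ret = 0;

    ecf = 0;                  /* clear any signature left by earlier calls */
    if (!skip_call)
        ret = my_proc(a);     /* call of procedure #790 */
    if (ecf != 790) error();  /* Rule #8: test after the call */
    return ret;
}
```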

Experiment Process
- Phase 1: select a set of simple C programs to be used as benchmarks.
- Phase 2: apply the proposed approach by manually modifying their source code according to the previously introduced rules.
- Phase 3: perform a set of fault injection experiments to assess the detection capabilities of the resulting system.

Benchmark Programs
- Bubble Sort: an implementation of the bubble sort algorithm, run on a vector of 10 integer elements.
- Matrix: multiplication of two matrices composed of 10x10 integer values.
- Parser: a syntactic analyzer for arithmetic expressions written in ASCII format.

Effects of the proposed transformations
Program | Source code size increase | Executable code size increase | Performance slow-down
Bubble | | |
Matrix | | |
Parser | | |
Average | | |

Fault Injection Environment
- Fault injection is performed with an ad hoc hardware device that monitors the program execution and triggers a fault injection procedure when a given point is reached.
- For the experiments, the adopted fault model is the single-bit flip in memory locations.
- Faults are randomly generated.

Fault Classification
- Fail Silent: the fault did not produce any difference in the program behavior.
- SW-detected: detected by the error procedure activated according to the proposed transformation rules.
- HW-detected: detected by a hardware EDM.
- Fail Silent Violations: not detected by any EDM, and the program produces a different behavior.
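The experiments used a hardware injection device and randomly generated faults. As a rough, software-only illustration of the same classification, the sketch below injects single-bit flips into one copy of a duplicated array during a hardened summation. It tries every bit, every element and every injection time, an exhaustive sweep instead of random sampling, and counts the outcomes. Several details are assumptions of the sketch. The functions run_once and campaign and the workload are hypothetical. The loop index is not duplicated, for brevity. HW-detected outcomes cannot occur in a pure software simulation.

```c
#define N 10

/* Outcome classes from the slide (HW-detected omitted, see above). */
enum outcome { FAIL_SILENT, SW_DETECTED, FAIL_SILENT_VIOLATION };

/* One run of a hardened sum over duplicated data. Just before loop
   iteration inject_at, bit `bit` of v1[loc] is flipped. */
static enum outcome run_once(int inject_at, int loc, int bit)
{
    unsigned v0[N], v1[N], sum0 = 0, sum1 = 0, golden = 0;
    int detected = 0;

    for (int k = 0; k < N; k++) {
        v0[k] = v1[k] = (unsigned)(k + 1);
        golden += (unsigned)(k + 1);      /* fault-free reference result */
    }
    for (int i = 0; i < N; i++) {
        if (i == inject_at)
            v1[loc] ^= 1u << bit;         /* single-bit flip in one copy */
        sum0 = sum0 + v0[i];
        sum1 = sum1 + v1[i];
        if ((sum0 != sum1) || (v0[i] != v1[i]))
            detected = 1;                 /* where Rule #3 would call error() */
    }
    if (detected)
        return SW_DETECTED;
    return (sum0 == golden) ? FAIL_SILENT : FAIL_SILENT_VIOLATION;
}

/* Exhaustive campaign over injection time, location and bit. */
void campaign(int counts[3])
{
    counts[FAIL_SILENT] = counts[SW_DETECTED] = counts[FAIL_SILENT_VIOLATION] = 0;
    for (int t = 0; t < N; t++)
        for (int loc = 0; loc < N; loc++)
            for (int bit = 0; bit < 32; bit++)
                counts[run_once(t, loc, bit)]++;
}
```

Out of 3200 runs, campaign counts 1440 Fail Silent and 1760 SW-detected outcomes, with no Fail Silent Violations. A flip in an element that has already been read is never read again and leaves the result unchanged. A flip in an element still to be read is caught by the consistency check.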

Fault injection results for faults in the CODE area
Program | Total | Fail Silent | HW-detected | SW-detected | Fail Silent Violations
Bubble | | | | |
Matrix | | | | |
Parser | | | | |
Average | | | | |

Fault injection results for faults in the DATA area
Program | Total | Fail Silent | HW-detected | SW-detected | Fail Silent Violations
Bubble | | | | |
Matrix | | | | |
Parser | | | | |
Average | | | | |

Conclusion
- The proposed transformation rules are suitable for automatic implementation in a compiler as a pre-processing phase, thus:
  - becoming completely transparent to the programmer;
  - reducing the cost of developing safe programs and increasing confidence in the obtained safety level.
- Experimental results show that the rules reach a very high coverage of the faults that can occur in a microprocessor-based system.

Conclusion
- Applying the method:
  - increases the code size by an average factor of 2;
  - slows down performance by a factor of 5.
- However, in most safety-critical systems only a limited portion of the code must be fault tolerant, while other parts are not crucial for the correct behavior of the whole system.
- Therefore, the slow-down and code size increase factors for the whole system are generally lower.
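The last point can be roughly quantified with a weighted average. This sketch makes three assumptions. The reported factors (2 for code size, 5 for slow-down) apply uniformly to the hardened portion. That portion is measured as a fraction f of the original execution time or code size. The rest of the system is left unchanged. The function name overall_factor is hypothetical.

```c
/* Overall growth factor when only a fraction f (0 <= f <= 1) of the
   original execution time, or of the original code size, is hardened
   and that fraction grows by factor k; the rest is left unchanged. */
double overall_factor(double f, double k)
{
    return (1.0 - f) + f * k;
}
```

For example, hardening code that accounts for 20% of the original execution time gives overall_factor(0.2, 5.0), about 1.8, instead of 5. If that code also makes up 20% of the code size, overall_factor(0.2, 2.0) gives a size factor of about 1.2.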

References
1. M. Rebaudengo, M. Sonza Reorda, M. Torchiano, "Soft-error Detection through Software Fault-Tolerance Techniques."
2. M. Zenha Rela, H. Madeira, J. G. Silva, "Experimental Evaluation of the Fail-Silent Behavior in Programs with Consistency Checks."
3. A. Benso, P. L. Civera, M. Rebaudengo, M. Sonza Reorda, "An Integrated HW and SW Fault Injection Environment for Real-Time Systems."
