ED4I: Error Detection by Diverse Data and Duplicated Instructions
Greg Bronevetsky

Background
ED4I is a code transformation system developed at the Stanford Center for Reliable Computing by Nahmsuk Oh, Subhasish Mitra, and Edward J. McCluskey. ED4I lets us run a program on two slightly different inputs and still compare the results at the end.

Motivation
The simplest way to detect Byzantine faults is to run the same program on multiple processors and compare the results. ED4I provides Byzantine fault detection for uniprocessors, so it must take into account both temporary and permanent faults.

Definitions
Temporary faults: any fault that affects a processor temporarily, for long enough to span the execution of several instructions. Examples: radiation hitting wires, frayed wires.
Permanent faults: a fault that affects a processor for a long period of time. Examples: spilling Coke on the chip, cut wires.

Problem Statement
We can detect Byzantine failures by running each program or procedure twice and comparing the results. However, this does not guard against permanent faults: a permanent fault corrupts both runs in the same way, so their results still match. The two runs must be made different so that the same fault affects their results differently. Overhead = 100%.

Key Idea
Let's feed the program two different sets of data and then compare the results. Key insight: if the program uses only arithmetic operations, we can alter the input by multiplying every input number by a constant. The modified output is then the real output multiplied by the constant. Thus we can verify that the two computations succeeded, AND a fault will affect the two computations differently.
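To make the problem statement and the key idea concrete, here is a toy simulation (a sketch, not ED4I itself). It assumes a hypothetical 16-bit adder whose output bit 3 is permanently stuck at 1, a program that only sums its inputs, and k = -2; `faulty_add` and `program` are illustrative names. Plain duplication produces two identical wrong answers, while the k-scaled run disagrees with k times the original result, which exposes the fault.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def to_signed(u: int) -> int:
    """Interpret the low WIDTH bits of u as a two's-complement integer."""
    u &= MASK
    return u - (1 << WIDTH) if u >> (WIDTH - 1) else u


def faulty_add(x: int, y: int, stuck_bit: int = 3) -> int:
    """Hypothetical adder with a permanent fault: output bit `stuck_bit` is stuck at 1."""
    return to_signed(((x + y) & MASK) | (1 << stuck_bit))


def program(values: list) -> int:
    """Toy program that only adds: it sums its inputs using the faulty adder."""
    acc = 0
    for v in values:
        acc = faulty_add(acc, v)
    return acc


inputs = [1, 2, 3]   # fault-free sum would be 6
k = -2

run1 = program(inputs)
run2 = program(inputs)                      # plain duplication: same inputs again
diverse = program([k * v for v in inputs])  # diverse run on k-scaled inputs

print(run1, run2, run1 == run2)               # 14 14 True: both runs are wrong, yet they agree
print(diverse, k * run1, diverse == k * run1) # -4 -28 False: the mismatch exposes the fault
```

Without the stuck bit, the diverse run would return -12, exactly k times the correct sum.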

New Program
If we alter the input to the program, we must also alter the program to work with the modified input. The transformation is given a constant k (called the “diversity factor”) and produces the “k-factor diverse program”. The new program has the same control flow graph as the original program, but every variable is k times the corresponding original variable.

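Transformations
- If k < 0, reverse all comparisons (> ↔ <, ≥ ↔ ≤).
- All constants in the code are multiplied by k.
- Addition and subtraction of variables are unchanged, since $kv_1 \pm kv_2 = k(v_1 \pm v_2)$.
- Multiplication: $v_1 v_2 \cdots v_n \rightarrow (v_1 v_2 \cdots v_n)/k^{n-1}$
- Division: $v_1 / v_2 \rightarrow (v_1 / v_2) \cdot k$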
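The transformation rules can be applied by hand to a small integer function. In this sketch `original` is an invented example, `diverse` is its k-factor diverse version written out manually following the rules, and k = -2 is an arbitrary choice; `ed4i_check` compares the diverse result with k times the original result. Python's floor division is used throughout; it stays exact here because the scaled product is always divisible by k, and the quotient of two k-scaled values equals the quotient of the unscaled ones.

```python
K = -2  # diversity factor; an arbitrary example value


def original(a: int, b: int, c: int) -> int:
    s = a + b              # addition of variables
    p = s * c              # product of two variables
    q = p // b             # quotient of two variables (assumes b != 0)
    if q > 10:             # comparison against a constant
        return q - 3
    return q + 3


def diverse(a: int, b: int, c: int) -> int:
    """Hand-written k-factor diverse version: inputs, constants and results are K times the originals."""
    s = a + b              # unchanged: Ka + Kb = K(a + b)
    # s and c are each K times the original, so s*c is K^2 times the original product:
    # divide by K^(2-1). The division is exact because s*c is divisible by K.
    p = (s * c) // K
    # p and b are both K-scaled, so p // b equals the original quotient: multiply by K.
    q = (p // b) * K
    # The constant 10 becomes 10*K, and the comparison is reversed when K < 0.
    taken = q > 10 * K if K > 0 else q < 10 * K
    if taken:
        return q - 3 * K   # constants are multiplied by K
    return q + 3 * K


def ed4i_check(a: int, b: int, c: int) -> bool:
    """Run both versions: the diverse result must equal K times the original result."""
    return diverse(K * a, K * b, K * c) == K * original(a, b, c)


for args in [(3, 2, 4), (7, 1, 3), (-5, 2, 1)]:
    print(args, original(*args), diverse(*(K * x for x in args)), ed4i_check(*args))
```

Writing the diverse version by hand keeps the example short; a real transformer would apply the same rules mechanically to every statement.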

Fault Detection Probability
For a functional unit $h_i$ (such as the adder), a fault $f$, and a diversity factor $k$:
- $X_i$ is the set of inputs to $h_i$.
- $E_i \subseteq X_i$ is the subset of inputs that produce an erroneous output because of the fault.
- $E'_i \subseteq E_i$ is the subset of erroneous inputs that escape detection.
- $C_i(k)$ is the probability of catching an error in $h_i$.

Data Integrity Probability
For the same functional unit $h_i$, fault $f$, diversity factor $k$, and sets $X_i$ (all inputs), $E_i$ (inputs giving an erroneous output), and $E'_i$ (erroneous inputs that escape detection):
- $D_i(k)$ is the probability that no error in $h_i$ is missed.
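Reading the two probabilities directly off these set definitions, and assuming (our assumption; the slides do not say) that every input in $X_i$ is equally likely, they can be written as ratios of set sizes:

$$
C_i(k) = \frac{|E_i| - |E'_i|}{|E_i|}, \qquad D_i(k) = 1 - \frac{|E'_i|}{|X_i|}
$$

Both depend on $k$ only through $E'_i$: an erroneous input $x \in E_i$ escapes when the faulty unit's output on the scaled input $kx$ equals $k$ times its faulty output on $x$, so the two runs still agree.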

Choosing the value of k
For some functional units, $C_i(k)$ and $D_i(k)$ can be derived analytically for each k. This is too hard in general, so instead we try a range of k values and determine $C_i(k)$ and $D_i(k)$ empirically.

Bus Signal Line
A bus wire stuck at either 0 or 1. Derived results for a 12-bit bus:
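A minimal sketch of the empirical approach for a stuck-at bus line, assuming a 12-bit two's-complement bus, equally likely inputs, and that inputs whose k-multiple would not fit on the bus are simply skipped (the slides do not say how overflow was handled here). `stuck_line` and `bus_metrics` are hypothetical helpers; C is computed as the fraction of erroneous inputs that are caught and D as the fraction of all inputs with no escaped error. This is an illustration of the method, not a reproduction of the slide's results.

```python
WIDTH = 12
LO, HI = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1


def stuck_line(x: int, line: int, value: int) -> int:
    """Value seen after a WIDTH-bit bus whose signal `line` is stuck at `value` (0 or 1)."""
    u = x & ((1 << WIDTH) - 1)                         # two's-complement bit pattern on the wires
    u = u | (1 << line) if value else u & ~(1 << line)
    return u - (1 << WIDTH) if u >> (WIDTH - 1) else u  # back to a signed integer


def bus_metrics(k: int, line: int, value: int) -> tuple:
    """(C, D) for one stuck-at fault, over all inputs whose k-multiple fits on the bus."""
    total = erroneous = escaped = 0
    for x in range(LO, HI + 1):
        if not LO <= k * x <= HI:
            continue                                   # assumption: overflowing inputs are skipped
        total += 1
        out = stuck_line(x, line, value)               # what the original run sees
        if out != x:
            erroneous += 1
            if stuck_line(k * x, line, value) == k * out:
                escaped += 1                           # diverse run agrees, so the error escapes
    c = (erroneous - escaped) / erroneous if erroneous else 1.0
    d = 1 - escaped / total
    return c, d


# Try a range of k and report the worst case over all 2 * WIDTH stuck-at faults.
for k in (-1, -2, 2, 3):
    results = [bus_metrics(k, line, v) for line in range(WIDTH) for v in (0, 1)]
    print(k, min(c for c, _ in results), min(d for _, d in results))
```

With k = 1 the sketch degenerates into plain duplication, and C drops to 0 for every fault that causes errors.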

Adder
Experimental results for a 12-bit ripple-carry adder:
Experimental results for a 12-bit carry-lookahead adder:
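The adder experiments can be imitated at a much smaller scale. The sketch below models a gate-level ripple-carry adder with a stuck-at fault on one internal carry line. It assumes 6-bit operands (instead of the slide's 12 bits) so every operand pair can be enumerated quickly, and it skips pairs that would overflow in either run; `ripple_add` and `adder_detection` are hypothetical names, and the returned value is the fraction of erroneous additions whose error is caught.

```python
from typing import Optional, Tuple

WIDTH = 6                                    # assumption: 6 bits (not 12) to keep the search fast
MASK = (1 << WIDTH) - 1
LO, HI = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1


def to_signed(u: int) -> int:
    """Interpret the low WIDTH bits of u as a two's-complement integer."""
    u &= MASK
    return u - (1 << WIDTH) if u >> (WIDTH - 1) else u


def ripple_add(a: int, b: int, stuck: Optional[Tuple[int, int]] = None) -> int:
    """Gate-level ripple-carry adder; stuck=(i, v) forces the carry out of bit i to v."""
    carry = result = 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1       # two's-complement bits (Python shifts arithmetically)
        result |= (ai ^ bi ^ carry) << i          # sum bit i
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry out of bit i
        if stuck is not None and stuck[0] == i:
            carry = stuck[1]                      # permanent stuck-at fault on this carry line
    return to_signed(result)


def adder_detection(k: int, stuck: Tuple[int, int]) -> float:
    """Fraction of erroneous sums caught by comparing ripple_add(k*a, k*b) with k times the faulty sum."""
    erroneous = escaped = 0
    for a in range(LO, HI + 1):
        for b in range(LO, HI + 1):
            if not (LO <= k * a <= HI and LO <= k * b <= HI and LO <= k * (a + b) <= HI):
                continue                          # assumption: pairs that overflow in either run are skipped
            out = ripple_add(a, b, stuck)
            if out != a + b:
                erroneous += 1
                if ripple_add(k * a, k * b, stuck) == k * out:
                    escaped += 1
    return (erroneous - escaped) / erroneous if erroneous else 1.0


# Worst case over stuck-at-0/1 faults on the internal carry lines (the MSB carry-out is unused).
for k in (1, -1, 2, -2):
    print(k, min(adder_detection(k, (i, v)) for i in range(WIDTH - 1) for v in (0, 1)))
```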

Multiplier & Divider
Experimental results for:
- 12-bit array multiplier
- 8-bit Wallace tree multiplier
- SRT divider

Shifter
Experimental results for a 16-bit multiplexer-based shifter:

Using Benchmarks to pick k
We need to determine how much each functional unit is used by an average program:
- Add, sub, mult, and shift use the obvious functional units.
- “Memory access” uses the memory bus.
- “Branch” uses a carry-lookahead adder.

Benchmarked Data Integrity
Data integrity calculated from $D_i(k)$ and the usage statistics above (a high $D_i(k)$ is the top priority). The highlighted columns give the best data integrity for each benchmark.

Benchmarked Detection Probability
Detection probability calculated from $C_i(k)$ and the usage statistics above. The highlighted columns give the best detection probability for each benchmark.

Optimum k
The optimum k is selected as follows: first maximize the data integrity $D_i(k)$; among the values of k that achieve the maximum $D_i(k)$, maximize $C_i(k)$. For each program, we should estimate how it uses the different functional units and pick k accordingly.
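The selection rule can be sketched as a usage-weighted average per k followed by a lexicographic maximum (data integrity first, detection probability as the tie-breaker). The weighting scheme, the unit names, and every number below are made up for illustration; they are not the slides' benchmark data. `weighted` and `pick_k` are hypothetical helpers.

```python
def weighted(per_unit: dict, usage: dict) -> float:
    """Usage-weighted average of a per-unit metric (D_i(k) or C_i(k)) for one k."""
    total = sum(usage.values())
    return sum(weight * per_unit[unit] for unit, weight in usage.items()) / total


def pick_k(d_table: dict, c_table: dict, usage: dict) -> int:
    """Maximize weighted data integrity first; among ties, maximize weighted detection probability."""
    return max(d_table, key=lambda k: (weighted(d_table[k], usage), weighted(c_table[k], usage)))


# Made-up illustrative numbers (NOT taken from the slides).
usage = {"adder": 0.5, "multiplier": 0.2, "bus": 0.3}
d_table = {-2: {"adder": 0.99, "multiplier": 0.97, "bus": 1.0},
           -1: {"adder": 0.99, "multiplier": 0.97, "bus": 1.0},
            3: {"adder": 0.95, "multiplier": 0.99, "bus": 1.0}}
c_table = {-2: {"adder": 0.90, "multiplier": 0.80, "bus": 1.0},
           -1: {"adder": 0.95, "multiplier": 0.85, "bus": 1.0},
            3: {"adder": 0.97, "multiplier": 0.90, "bus": 1.0}}
print(pick_k(d_table, c_table, usage))   # -1: ties with -2 on D, wins on C
```

Python compares the key tuples element by element, which is exactly the "maximize D, then C" priority on the slide.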

Dealing with Overflow
Multiplying every variable by k may make some of them overflow. Options:
- Scale variables up to the next larger type.
- Scale variables down by dividing by k. The low-order bits are then lost, so only the higher-order bits can be checked when comparing the new results to the results of the original program.
- Use compile-time range checking to determine the program's vulnerability to overflow and pick k accordingly.
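The first and third options can be sketched in a few lines. Python integers never overflow, so this sketch assumes a hypothetical `wrap` helper that imitates a fixed-width signed type; `fits` and `safe_ks` are likewise illustrative. The range check assumes each variable's range [lo, hi] is already known at compile time.

```python
def fits(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def wrap(value: int, bits: int) -> int:
    """What a fixed-width signed type would store: two's-complement wrap-around."""
    u = value & ((1 << bits) - 1)
    return u - (1 << bits) if u >> (bits - 1) else u


def safe_ks(lo: int, hi: int, bits: int, candidates: list) -> list:
    """Range check: keep each k for which k*x fits in `bits` bits for every x in [lo, hi].
    Because k*x is linear in x, checking the two endpoints is enough."""
    return [k for k in candidates if fits(k * lo, bits) and fits(k * hi, bits)]


x, k = 20000, -2
print(wrap(k * x, 16))                          # 25536: a 16-bit type silently wraps k*x
print(wrap(k * x, 32))                          # -40000: the next larger type holds k*x exactly
print(safe_ks(-1000, 1000, 16, [-2, 2, 40]))    # [-2, 2]: k = 40 could overflow 16 bits
```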

Floating Point Numbers
The technique above fails for floating-point numbers. In IEEE 754 format, k = -2 changes only the sign bit and some bits of the exponent, so the mantissa is identical in both runs and a mantissa fault corrupts both runs the same way. Solution: pick separate values of k for the exponent and the mantissa, and run the program once with each k. Overhead = 200%.
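The failure is easy to see by decoding the three IEEE 754 single-precision fields with Python's standard `struct` module. In this sketch `fields` is an illustrative helper and the sample values are arbitrary; for each one, the fraction (mantissa) bits of x and -2x come out identical, which is why a mantissa fault can go unnoticed with k = -2.

```python
import struct


def fields(x: float) -> tuple:
    """Split an IEEE 754 single-precision value into (sign, biased exponent, 23-bit fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


for x in (3.14159, 0.1, 1234.5):
    s, e, f = fields(x)
    s2, e2, f2 = fields(-2 * x)
    # Multiplying by -2 flips the sign and adds 1 to the exponent; the fraction is untouched.
    print(x, (s, e, f), (s2, e2, f2), f == f2)
```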

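Picking k for the mantissa
To find errors in the mantissa, pick $k = 3/2$. Write a variable as $x = m \cdot 2^{e}$ with mantissa $1 \le m < 2$, and consider a stuck-at-1 fault on the mantissa bit of weight $2^{-i}$ (assume that bit is 0 in both the original and the transformed value). In the original program, $x$ is corrupted to $(m + 2^{-i}) \cdot 2^{e}$. In the transformed program, the variable is $\frac{3}{2}x = \frac{3}{2}m \cdot 2^{e}$. However, the mantissa must be less than 2, so if $\frac{3}{2}m \ge 2$ the mantissa is right-shifted by 1 and renormalized.

Transformed variables
So now the value in the transformed program is $\frac{3}{4}m \cdot 2^{e+1}$, and the value in the original program is $m \cdot 2^{e}$.

Fault Detection in Mantissa
If there is a stuck-at-1 fault, the value in the transformed program is $\left(\frac{3}{4}m + 2^{-i}\right) \cdot 2^{e+1}$, while the value in the original program multiplied by $k$ (for checking) is $\frac{3}{2}\left(m + 2^{-i}\right) \cdot 2^{e} = \left(\frac{3}{4}m + \frac{3}{4} \cdot 2^{-i}\right) \cdot 2^{e+1}$.

We can detect Mantissa errors!
Note that the error terms in the original and the transformed programs are different ($2^{-i}$ versus $\frac{3}{4} \cdot 2^{-i}$ at the same exponent), so the comparison exposes the fault. We actually use $k = -3/2$ in order to also flip the sign bit, for improved detection capability.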
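The mantissa argument can be checked numerically with Python's standard `struct` module, which packs a float into IEEE 754 single-precision bits. This sketch assumes k = -3/2 (our reading of the sign-flipping mantissa factor), the value x = 1.5 (so m = 1.5, e = 0, and 1.5 · m = 2.25 ≥ 2 forces renormalization), and a stuck-at-1 fault on the mantissa bit of weight 2^-2; `stuck_at_1` is a hypothetical fault-injection helper. Exactly representable values are used so that rounding plays no role.

```python
import struct


def f32_bits(x: float) -> int:
    """Raw 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def from_bits(b: int) -> float:
    """Float value of a raw 32-bit single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def stuck_at_1(x: float, weight_exp: int) -> float:
    """Force the mantissa bit of weight 2**-weight_exp to 1 (fraction bit 22 has weight 2**-1)."""
    return from_bits(f32_bits(x) | (1 << (23 - weight_exp)))


K = -1.5
x = 1.5          # m = 1.5, e = 0
i = 2            # the stuck bit has weight 2**-2

original_faulty = stuck_at_1(x, i)      # (m + 2**-i) * 2**e = 1.75
diverse_faulty = stuck_at_1(K * x, i)   # -((3/4)m + 2**-i) * 2**(e+1) = -2.75
print(original_faulty, diverse_faulty, K * original_faulty)  # 1.75 -2.75 -2.625
print("detected:", diverse_faulty != K * original_faulty)    # detected: True
```

The two faulty values differ by exactly $2 \cdot (2^{-2} - \frac{3}{4} \cdot 2^{-2}) = 0.125$, matching the difference between the error terms in the derivation.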

k for exponents
To flip every bit of the exponent, the program must be transformed with two further diversity factors, chosen so that between them they flip all the exponent bits. If a fault corrupts a bit of the exponent, it will be detected by comparing against the exponent from one of the two transformed programs.

Effectiveness for Mantissa
Effectiveness of the mantissa diversity factor (for IEEE 754 single precision).

Effectiveness for Exponent
Effectiveness of the exponent diversity factors (for IEEE 754 single precision).

Summary
- ED4I effectively detects Byzantine failures in numerical applications on uniprocessors.
- It is a purely software solution based on data diversity.
- It detects both permanent and temporary faults.
- It works with fixed-point and floating-point numbers.
- It is compatible with arithmetic and logical operations (probably with any bitwise logical operation that can be recast as arithmetic).
- Its overhead is high: 100% or 200%.