The Complexity of Adding Failsafe Fault-tolerance
Sandeep S. Kulkarni, Ali Ebnenasir


Motivations
- Why automatic addition of fault-tolerance?
- Why begin with a fault-intolerant program?
  - Reuse of the fault-intolerant program
  - Separation of concerns (functionality vs. fault-tolerance)
  - Potential to preserve properties such as efficiency
- One obstacle: adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT 2000]

Motivation (Continued)
Approaches for dealing with complexity:
- Heuristics [SRDS 2001]
- Weaker forms of tolerance
  - Failsafe: safety only in the presence of faults
  - Nonmasking: safety may be temporarily violated
- Restricting input
  - Programs
  - Specifications

Motivation (Continued)
Why failsafe fault-tolerance?
- Simplify the design of masking fault-tolerance
- Partial automation of masking fault-tolerance (using TSE '98)
[Diagram boxes: Intolerant program; Nonmasking fault-tolerant; Failsafe fault-tolerant; Masking fault-tolerant. Label: Automate]

Outline of the Talk
- Problem of adding fault-tolerance
- Difficulties caused by distribution
- Complexity of failsafe fault-tolerance
- Class of programs and specifications for which polynomial synthesis is possible

Basic Concepts: Programs and Faults
- State space S_p
- Program transitions delta_p, fault transitions delta_f
- Invariant S, fault-span T
- Specification spec: safety is specified by transitions (s_j, s_k) that should not be executed
[Figure: invariant S and fault-span T, with transitions labeled p/f, p, and f]
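
As a rough illustration of these definitions, the following Python sketch models a tiny program explicitly. The three states, the example transitions, and the helper names `is_closed` and `violates_safety` are assumptions made for illustration and do not come from the slides. The closure checks follow the usual convention in this line of work (the invariant is closed under program transitions, the fault-span under program and fault transitions), which is also assumed here.

```python
# Hypothetical three-state example; every name below is an illustrative assumption.
states = {"s0", "s1", "s2"}              # state space S_p
delta_p = {("s0", "s1"), ("s1", "s0")}   # program transitions
delta_f = {("s1", "s2")}                 # fault transitions
S = {"s0", "s1"}                         # invariant
T = {"s0", "s1", "s2"}                   # fault-span
bad_transitions = {("s2", "s0")}         # safety: transitions that must not be executed


def is_closed(region, transitions):
    """True if no transition that starts inside the region ends outside it."""
    return all(dst in region for (src, dst) in transitions if src in region)


def violates_safety(transitions):
    """True if any of the given transitions is forbidden by the safety specification."""
    return bool(set(transitions) & bad_transitions)


print(is_closed(S, delta_p))               # invariant under program transitions
print(is_closed(S, delta_f))               # faults may leave the invariant
print(is_closed(T, delta_p | delta_f))     # fault-span under program and fault transitions
print(violates_safety(delta_p | delta_f))  # does any transition break safety?
```

In this toy instance the fault transition leaves S but stays inside T, which is the situation the fault-span is meant to capture.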

Problem Statement
- Inputs: program p, invariant S, faults f, specification spec
- Outputs: program p', invariant S'
- Requirements: only fault-tolerance is added; no new functional behavior is added
[Figure: invariant of the fault-intolerant program and invariant of the fault-tolerant program. Labels: "No new transition here"; "New transitions may be added here"]
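
One possible way to make the "no new functional behavior" requirement checkable is sketched below. The formalization is an assumption, not something stated on the slide: the new invariant S' must be a subset of S, and every transition of p' that stays inside S' must already be a transition of p. The function name `adds_only_fault_tolerance` and the example states are hypothetical.

```python
def adds_only_fault_tolerance(p, S, p_new, S_new):
    """Check an assumed reading of 'only fault-tolerance is added'.

    p, p_new: sets of (source, target) transitions.
    S, S_new: sets of states (old and new invariant).
    """
    # Assumption: the new invariant may shrink but must not grow.
    if not S_new <= S:
        return False
    # Assumption: inside the new invariant, p_new introduces no transition absent from p.
    inside = {(a, b) for (a, b) in p_new if a in S_new and b in S_new}
    return inside <= p


p = {("a", "b"), ("b", "a")}
S = {"a", "b"}

# A new transition outside the invariant is allowed.
print(adds_only_fault_tolerance(p, S, p | {("c", "a")}, {"a", "b"}))
# A new transition inside the invariant is rejected.
print(adds_only_fault_tolerance(p, S, p | {("a", "a")}, {"a", "b"}))
```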

Difficulties with Distribution
Read/write restrictions:
- Two Boolean variables, a and b
- The process cannot read b
- Can we include the following transition?
    a=0,b=0 --> a=1,b=0
- Only if we also include the transition
    a=0,b=1 --> a=1,b=1
Groups of transitions (instead of individual transitions) must be chosen.
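
The grouping effect can be sketched in Python as follows. The function name `transition_group` and the dictionary encoding of states are assumptions for illustration. The sketch also assumes that a process cannot write a variable it cannot read, so unreadable variables keep their value across a transition.

```python
from itertools import product


def transition_group(src, dst, unreadable, domain=(0, 1)):
    """Return every transition the process cannot distinguish from src -> dst.

    src, dst: dicts mapping variable names to values.
    unreadable: list of variables the process cannot read.
    """
    # Assumption: unreadable variables cannot be written, so they must not change.
    for var in unreadable:
        if src[var] != dst[var]:
            raise ValueError(f"transition changes unreadable variable {var!r}")
    group = set()
    # The same update must be allowed for every value of the unreadable variables.
    for values in product(domain, repeat=len(unreadable)):
        hidden = dict(zip(unreadable, values))
        s = {**src, **hidden}
        d = {**dst, **hidden}
        group.add((tuple(sorted(s.items())), tuple(sorted(d.items()))))
    return group


group = transition_group({"a": 0, "b": 0}, {"a": 1, "b": 0}, unreadable=["b"])
for s, d in sorted(group):
    print(dict(s), "-->", dict(d))
```

For the slide's example, the group contains both the b=0 and the b=1 variants of the transition, so including one forces including the other.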

Reduction from 3-SAT
[Figure: structure of the reduction. Transition labels: included iff x_0 is false; included iff x_0 is true; included iff x_j is false; included iff x_k is true; included iff x_l is false. Clause shown: c_j = x_j \/ ¬x_k \/ x_l. State labels: a_0, a_n = a_0]

Dealing with the Complexity of Adding Failsafe Fault-tolerance
- For what class of problems can failsafe fault-tolerance be added in polynomial time?
- Restrictions on:
  - Fault-tolerant programs
  - Specifications
  - Faults
- Our approach to restrictions: in the absence of faults, preserve all computations of the fault-intolerant program

Restrictions on Programs and Specifications
- Monotonicity requirements
  - Capture the notion that safe assumptions can be made about variables that cannot be read
- Focus on specifications and transitions of fault-intolerant programs

Monotonicity of Specifications
Definition: A specification spec is positive monotonic with respect to variable x iff, for every s0, s1, s'0, s'1 such that
- the values of all other variables in s0 and s'0 are the same, and
- the values of all other variables in s1 and s'1 are the same:
if (s0, s1) with x = false does not violate safety, then (s'0, s'1) with x = true does not violate safety.
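
For small state spaces, the definition above can be checked by brute force. The sketch below assumes that every variable is Boolean and that the safety specification is given as a predicate on transitions. The names `is_positive_monotonic_spec` and `example_violates`, and the variables `x` and `bad`, are hypothetical.

```python
from itertools import product


def is_positive_monotonic_spec(violates, variables, x):
    """Brute-force check of positive monotonicity with respect to Boolean variable x.

    violates(s0, s1) returns True if the transition s0 -> s1 violates safety.
    States are dicts from variable names to Booleans (an assumed encoding).
    """
    others = [v for v in variables if v != x]
    for vals0 in product([False, True], repeat=len(others)):
        for vals1 in product([False, True], repeat=len(others)):
            base0 = dict(zip(others, vals0))
            base1 = dict(zip(others, vals1))
            s0, s1 = {**base0, x: False}, {**base1, x: False}
            s0p, s1p = {**base0, x: True}, {**base1, x: True}
            # Safe with x = false must imply safe with x = true.
            if not violates(s0, s1) and violates(s0p, s1p):
                return False
    return True


def example_violates(s0, s1):
    # Hypothetical spec: reaching a state with bad = true is unsafe unless x holds.
    return (not s1["x"]) and s1["bad"]


print(is_positive_monotonic_spec(example_violates, ["x", "bad"], "x"))
print(is_positive_monotonic_spec(example_violates, ["x", "bad"], "bad"))
```

The example specification is positive monotonic with respect to `x` but not with respect to `bad`, since setting `bad` to true can turn a safe transition into an unsafe one.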

Monotonicity of Programs
Definition: Program p with invariant S is negative monotonic with respect to variable x iff, for every s0, s1, s'0, s'1 such that
- the values of all other variables in s0 and s'0 are the same, and
- the values of all other variables in s1 and s'1 are the same:
[Figure: transition (s0, s1) inside invariant S with x = true; transition (s'0, s'1) with x = false]
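
The condition itself is carried by the figure, so the sketch below rests on an assumed reading by analogy with the specification case: whenever p has a transition (s0, s1) inside S with x = true, it must also have the corresponding transition (s'0, s'1) with x = false. The function name and the tuple encoding of states are hypothetical, and all variables are assumed Boolean.

```python
def is_negative_monotonic_program(p, in_invariant, variables, x):
    """Check the assumed reading of negative monotonicity with respect to x.

    p: set of (s0, s1) transitions; states are tuples of Booleans ordered like `variables`.
    in_invariant(state): True if the state belongs to the invariant S.
    """
    i = variables.index(x)

    def with_x(state, value):
        # Copy of the state with only x changed.
        return state[:i] + (value,) + state[i + 1:]

    for s0, s1 in p:
        if s0[i] and s1[i] and in_invariant(s0) and in_invariant(s1):
            # The x = false counterpart of the transition must also be in p.
            if (with_x(s0, False), with_x(s1, False)) not in p:
                return False
    return True


variables = ("x", "y")
everywhere = lambda state: True  # hypothetical invariant: all states

# Sets y regardless of x: the x = false counterpart always exists.
p_ok = {((xv, False), (xv, True)) for xv in (False, True)}
# Sets y only when x is true: the x = false counterpart is missing.
p_bad = {((True, False), (True, True))}

print(is_negative_monotonic_program(p_ok, everywhere, variables, "x"))
print(is_negative_monotonic_program(p_bad, everywhere, variables, "x"))
```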

Theorem
Adding failsafe fault-tolerance can be done in polynomial time if either:
- the program is negative monotonic and the spec is positive monotonic, or
- the program is positive monotonic and the spec is negative monotonic.
If only one of these conditions is satisfied, adding failsafe fault-tolerance is still NP-hard.
For many problems, these requirements are easily met.

Example: Byzantine Agreement
Processes: general g and three non-generals j, k, and l
Variables:
- d.g : {0, 1}
- d.j, d.k, d.l : {0, 1, ⊥}
- b.g, b.j, b.k, b.l : {true, false}
- f.g, f.j, f.k, f.l : {0, 1}
Fault-intolerant program transitions:
- d.j = ⊥ /\ f.j = 0  -->  d.j := d.g
- d.j ≠ ⊥ /\ f.j = 0  -->  f.j := 1
Fault transitions:
- ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l  -->  b.j := true
- b.j  -->  d.j, f.j := 0|1, 0|1
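
The two fault-intolerant actions of a non-general can be simulated with a few lines of Python. This is only a sketch: the state encoding, the name `step`, and the use of `None` for ⊥ are assumptions, and the b variables and fault transitions are left out.

```python
BOT = None  # stands for the undecided value ⊥ (an encoding chosen for this sketch)


def step(state, j):
    """Return the state after non-general j runs its enabled action, or None if none is enabled."""
    d, f = dict(state["d"]), dict(state["f"])
    if d[j] is BOT and f[j] == 0:
        d[j] = d["g"]   # d.j = ⊥ /\ f.j = 0  -->  d.j := d.g
    elif d[j] is not BOT and f[j] == 0:
        f[j] = 1        # d.j ≠ ⊥ /\ f.j = 0  -->  f.j := 1
    else:
        return None     # j has finalized; no action is enabled
    return {"d": d, "f": f}


start = {"d": {"g": 1, "j": BOT, "k": BOT, "l": BOT},
         "f": {"j": 0, "k": 0, "l": 0}}
after_copy = step(start, "j")
after_finalize = step(after_copy, "j")
print(after_finalize["d"]["j"], after_finalize["f"]["j"])
```

Starting from an undecided state, process j first copies the general's decision and then finalizes it, after which neither action is enabled.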

Example: Byzantine Agreement (Continued)
Safety specification:
- Agreement: no two non-Byzantine non-generals can finalize with different decisions
- Validity: if g is not Byzantine, no process can finalize with a decision different from that of g
Read/write restrictions:
- Process j can read b.j, d.j, f.j and d.g, d.k, d.l
- Process j can write d.j, f.j
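
The safety specification can be written as a predicate on states, as in the sketch below. Two assumptions are made here: safety is checked on a single state rather than on transitions, and validity is applied only to non-Byzantine non-generals. The function name `violates_safety` and the dictionary encoding are hypothetical.

```python
BOT = None  # stands for ⊥ (an encoding chosen for this sketch)
NON_GENERALS = ("j", "k", "l")


def violates_safety(d, f, b):
    """True if the state violates agreement or validity (assumed state-based reading)."""
    # Non-Byzantine non-generals that have finalized their decision.
    finalized = [p for p in NON_GENERALS if not b[p] and f[p] == 1]
    # Agreement: finalized non-Byzantine non-generals must hold the same decision.
    for i, p in enumerate(finalized):
        for q in finalized[i + 1:]:
            if d[p] != d[q]:
                return True
    # Validity: if g is not Byzantine, finalized decisions must match d.g.
    if not b["g"] and any(d[p] != d["g"] for p in finalized):
        return True
    return False


honest = {"g": False, "j": False, "k": False, "l": False}
d = {"g": 1, "j": 1, "k": 0, "l": BOT}
f = {"j": 1, "k": 1, "l": 0}
print(violates_safety(d, f, honest))                 # j and k finalized differently
print(violates_safety(d, f, {**honest, "k": True}))  # k is Byzantine, so it is ignored
```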

Example: Byzantine Agreement (Continued)
- Observation 1: Positive monotonicity of the specification with respect to b.j
- Observation 2: Negative monotonicity of the program, consisting of the transitions of j, with respect to b.k
- Observation 3: Negative monotonicity of the specification with respect to f.j
- Observation 4: Positive monotonicity of the program, consisting of the transitions of j, with respect to f.k

Summary
- Complexity analysis for failsafe fault-tolerance
  - Reduction from 3-SAT
- Restrictions on specifications and programs for which polynomial synthesis is possible
  - Several problems fall in this category: Byzantine agreement, consensus, commit, ...
- Necessity of these restrictions

Future Work
- Simplifying the design of masking fault-tolerance using the two-step approach
- Refining the boundary between classes for which polynomial synthesis is possible and those for which exponential complexity is inevitable
- Using monotonicity requirements for simplifying masking fault-tolerance

Thank You
Questions?

Conclusion and Future Work
Conclusion:
- Specifying the boundary between problems where
  - fault-tolerance addition can be done in polynomial time
  - exponential complexity is inevitable
  - Goal: what problems can benefit from automation?
- Necessity and sufficiency of monotonicity requirements
Future work:
- How can we change a non-monotonic program into a monotonic one by modifying its invariant?
- How can we strengthen a non-monotonic specification into a monotonic one?
- How can a nonmasking program be designed manually to satisfy the monotonicity requirements?

Basic Concepts: Fault-tolerant Program
Fault-tolerance in the presence of faults:
- Failsafe: satisfies its safety specification
- Nonmasking: satisfies its liveness specification (safety may be violated temporarily)
- Masking: satisfies both its safety and liveness specifications

The Complexity of Adding Failsafe Fault-tolerance
- Adding (failsafe/nonmasking/masking) fault-tolerance in the high atomicity model is in P
- Adding masking fault-tolerance to distributed programs is in NP
- How about failsafe?
  - Adding failsafe fault-tolerance to distributed programs is NP-hard (proof in the paper)
  - Reduction of 3-SAT to the problem of failsafe fault-tolerance addition

Our Approach
Stepwise towards masking fault-tolerance:
- Automating the addition of failsafe fault-tolerance
- How hard is adding failsafe fault-tolerance?
- Polynomial-time boundaries for failsafe tolerance addition?
