The Formal Verification of SPIDER
Lee Pike
Department of Computer Science, Indiana University, Bloomington



Thanks to
● Steven Johnson, Indiana University, Bloomington
● The National Institute of Aerospace
● The NASA LaRC Formal Methods Team, especially Paul Miner

Overview
● SPIDER Overview
● Reasoning about Faults
● The Old vs. New Interactive Consistency (IC) Protocol
● SPIDER Formal Verification Goals & Future Work
● References

SPIDER Overview: Why?
● Develop a fault-tolerant architecture based on an ultra-reliable bus
● Scalable
● Handle a large number of possibly simultaneous faults, specifically transient faults from electromagnetic effects
● Provide reintegration services
● Case study for the FAA
● Developed in accordance with RTCA DO-254: Design Assurance Guidance for Airborne Electronic Hardware
● Provide a test-bed for techniques in the specification and verification of safety-critical electronic systems
These sorts of architectures are the foundation of tomorrow's X-by-wire safety-critical systems.


SPIDER Overview: What?
● Scalable Processor-Independent Design for Electromagnetic Resilience
● Processor Elements (PEs)
● Reliable Optical BUS (ROBUS)
● Time Division Multiple Access (TDMA) bus
● Maintains synchrony between PEs
● Prevents babbling idiots and PE-to-PE interference
● The services of the ROBUS are the focus of the verification effort.
[Diagram: PEs attached to the ROBUS]

ROBUS Overview: Topology
[Diagram: the ROBUS, containing BIU1 (to PE), BIU2, BIU3, RMU1, RMU2, RMU3]
● n Bus Interface Units (BIUs)
● m Redundancy Management Units (RMUs)
● The BIUs and RMUs are called nodes.
● Every BIU is directly connected to every RMU.
● No two BIUs are directly connected, and no two RMUs are directly connected.

ROBUS Overview: Services (Protocols)
● Interactive Consistency. Purpose: reliably broadcast messages between PEs.
● Clock Synchronization. Purpose: maintain synchrony between all nodes and PEs.
● Distributed Diagnosis. Purpose: convict faulty nodes in the ROBUS.
The focus of this talk is Interactive Consistency.

Global Fault Classifications
● Good: not faulty. [Diagram: the node sends d to every receiver]
● Benign: broadcasts only detectably faulty messages. [Diagram: the node sends garbage]
● Symmetric: broadcasts the same arbitrary message to all. [Diagram: the node sends d' to every receiver]
● Asymmetric (Byzantine): arbitrarily sends arbitrary messages. [Diagram: the node sends d', d'', and d to different receivers]
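The four fault classes can be modeled as a rule for what one node delivers to each receiver in a single broadcast. This is only an illustrative sketch: the names `Fault`, `GARBAGE`, and `broadcast`, and the `arbitrary` parameter that stands in for the faulty node's choices, are assumptions made for this example and do not come from the SPIDER models.

```python
from enum import Enum


class Fault(Enum):
    GOOD = "good"
    BENIGN = "benign"
    SYMMETRIC = "symmetric"
    ASYMMETRIC = "asymmetric"


# Hypothetical marker standing in for any detectably faulty message.
GARBAGE = "garbage"


def broadcast(fault, value, receivers, arbitrary=None):
    """Return {receiver: message} for one broadcast by a node of class `fault`.

    `arbitrary` models the faulty node's free choice: a single value for a
    symmetric node, a {receiver: value} mapping for an asymmetric node.
    """
    if fault is Fault.GOOD:
        # A good node sends the intended value to everyone.
        return {r: value for r in receivers}
    if fault is Fault.BENIGN:
        # A benign node sends only detectably faulty messages.
        return {r: GARBAGE for r in receivers}
    if fault is Fault.SYMMETRIC:
        # A symmetric node sends the same arbitrary message to all receivers.
        return {r: arbitrary for r in receivers}
    # An asymmetric node may send a different message to each receiver.
    return {r: arbitrary[r] for r in receivers}


rmus = ["RMU1", "RMU2", "RMU3"]
good_msgs = broadcast(Fault.GOOD, "d", rmus)
sym_msgs = broadcast(Fault.SYMMETRIC, "d", rmus, arbitrary="d'")
asym_msgs = broadcast(Fault.ASYMMETRIC, "d", rmus,
                      arbitrary={"RMU1": "d'", "RMU2": "d''", "RMU3": "d"})
```

Only the asymmetric case can produce disagreement among receivers directly, which is why the assumptions later in the talk treat asymmetric nodes separately.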

Local Fault Information: Each Node Maintains
● Accusations: a node accuses other nodes based on the messages it receives as well as indirect information.
● Convictions: periodically, the distributed diagnosis protocol is executed; nodes exchange accusations to produce convictions.
● NOTE: While a good node knows that all good nodes have the same convictions, it does not know that all good nodes have the same accusations.
● Eligible Voters (EV): for each BIU, the set of RMUs that it neither accuses nor convicts. Similarly, for each RMU, the set of BIUs that it neither accuses nor convicts.
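The eligible-voter set is a set difference over a node's local fault information. A minimal sketch, assuming accusations and convictions are kept as plain sets of node names (the function name and the representation are illustrative, not taken from the SPIDER specification):

```python
def eligible_voters(candidates, accused, convicted):
    """Return the candidate nodes that are neither accused nor convicted."""
    return set(candidates) - set(accused) - set(convicted)


# Hypothetical local state of one BIU: it accuses RMU2, and RMU3 has been
# convicted by the distributed diagnosis protocol.
rmus = {"RMU1", "RMU2", "RMU3"}
biu1_ev = eligible_voters(rmus, accused={"RMU2"}, convicted={"RMU3"})
print(sorted(biu1_ev))
```

Because accusations are local, two good BIUs with the same convictions can still end up with different eligible-voter sets.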

Interactive Consistency Protocol: External View
● Purpose: reliably communicate data between processing elements (PEs) over the ROBUS.
● A PE sends its data to the ROBUS.
● The IC Protocol is executed in the ROBUS.
● The ROBUS broadcasts data back out to the PEs.
[Diagrams: the sender PE passes data in to the ROBUS; the ROBUS runs the IC Protocol; data out is delivered to the PEs]

Old Interactive Consistency Protocol: Internal View
[Diagram: the ROBUS with BIU1 (to PE), BIU2, BIU3 and RMU1, RMU2, RMU3; the sender passes data in]

1. A BIU broadcasts data to the RMUs. If the BIU is good, the same value is broadcast to all RMUs.
2. Each good RMU that receives data that isn't detectably faulty passes the data received back to each BIU. Otherwise, it sends source_error.
3. Each BIU eliminates from its EV those RMUs that sent detectably faulty messages. [Diagram: BIU1 receives d from good RMU1, garbage from benign faulty RMU2, and d from the third RMU; BIUs 2 and 3 do likewise]
4. Each BIU takes a majority vote over the data sent by the RMUs in its EV. [Diagram: vote = d; BIUs 2 and 3 do likewise]
5. If a majority of those RMUs sent the same data, that data is sent to the BIU's PE. Otherwise, source_error is sent to the BIU's PE. BIUs 2 and 3 similarly send data.
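Steps 2 through 5 can be sketched for a single BIU as follows. The sketch makes several assumptions that the slides do not state: a detectably faulty message is modeled as `None`, "majority" is read as a strict majority of the RMUs left in the EV after step 3, and an empty vote yields source_error. The function names are illustrative.

```python
from collections import Counter

SOURCE_ERROR = "source_error"
GARBAGE = None  # assumption: stands in for any detectably faulty message


def rmu_relay(received):
    """Step 2, for a good RMU: relay the data, or source_error if it was bad."""
    return SOURCE_ERROR if received is GARBAGE else received


def biu_vote(relayed, eligible):
    """Steps 3-5 for one BIU.

    relayed:  {rmu_name: message received from that RMU}
    eligible: set of RMU names in this BIU's EV before the round
    """
    # Step 3: drop RMUs whose message is detectably faulty (or missing).
    voters = {r for r in eligible if relayed.get(r, GARBAGE) is not GARBAGE}
    # Step 4: tally the messages of the remaining eligible voters.
    tally = Counter(relayed[r] for r in voters)
    if not tally:
        return SOURCE_ERROR
    value, count = tally.most_common(1)[0]
    # Step 5: deliver the value only if a strict majority sent it.
    return value if 2 * count > len(voters) else SOURCE_ERROR


# A good sender broadcasts d; RMU1 and RMU3 are good, RMU2 is benign faulty.
sent = {"RMU1": "d", "RMU2": "d", "RMU3": "d"}
relayed = {
    "RMU1": rmu_relay(sent["RMU1"]),
    "RMU2": GARBAGE,
    "RMU3": rmu_relay(sent["RMU3"]),
}
result = biu_vote(relayed, {"RMU1", "RMU2", "RMU3"})
```

Here the benign RMU2 is filtered out in step 3, so the vote is taken over RMU1 and RMU3 only.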

IC Protocol Guarantees
● Validity: if the broadcasting BIU is good, not convicted, and sends data d, then the result of the vote for a good BIU is d.
● Agreement: any two good BIUs vote the same result for the broadcast value (even if the sender is asymmetric!).


Old Assumptions to ensure guarantees hold
Environment Assumptions
The Maximum Fault Assumption (MFA):
1. There are more good BIUs than symmetric + asymmetric BIUs.
2. Similarly for the RMUs.
3. There are either no asymmetric BIUs or no asymmetric RMUs.
System Assumptions
● Symmetric Agreement: if a node is not asymmetric, then all good nodes assign it the same accusation.
● Good Trusting: good nodes aren't accused by good nodes.
● Conviction Agreement: all good nodes have the same convictions.
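The three MFA clauses translate directly into a predicate over a global fault assignment. A hedged sketch, assuming each node's status is one of the strings "good", "benign", "symmetric", or "asymmetric" (the function name and data layout are invented for illustration):

```python
def mfa_holds(biu_faults, rmu_faults):
    """Check the three clauses of the Maximum Fault Assumption.

    biu_faults, rmu_faults: {node_name: fault_class_string}
    """
    def more_good_than_arbitrary(faults):
        classes = list(faults.values())
        good = classes.count("good")
        arbitrary = classes.count("symmetric") + classes.count("asymmetric")
        return good > arbitrary

    clause1 = more_good_than_arbitrary(biu_faults)
    clause2 = more_good_than_arbitrary(rmu_faults)
    # Clause 3: asymmetric faults may occur among BIUs or RMUs, not both.
    clause3 = ("asymmetric" not in biu_faults.values()
               or "asymmetric" not in rmu_faults.values())
    return clause1 and clause2 and clause3


bius = {"BIU1": "good", "BIU2": "asymmetric", "BIU3": "good"}
rmus_ok = {"RMU1": "good", "RMU2": "benign", "RMU3": "good"}
rmus_bad = {"RMU1": "good", "RMU2": "good", "RMU3": "asymmetric"}

print(mfa_holds(bius, rmus_ok))
print(mfa_holds(bius, rmus_bad))
```

The second configuration satisfies clauses 1 and 2 but has an asymmetric node on both sides, so clause 3 rejects it. Note that benign nodes are not counted against the majority in clauses 1 and 2.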

Validity Proof Sketch
● Assume the broadcasting BIU is good and sends data d. [Diagram: the good sender sends d to RMU1, RMU2, and RMU3]
● Thus, all good RMUs send d back to the BIUs. [Diagram: good RMU1 sends d to each BIU; similarly for RMUs 2 and 3]
● Each good BIU filters out the bad messages received. By the MFA, a majority of the RMUs remaining in its EV are good. [Diagram: BIU1 receives d, d, and garbage; similarly for BIUs 2 and 3]
● Since all good RMUs sent d, the result of the vote is d. q.e.d.

Agreement Proof Sketch
Either the broadcasting BIU is asymmetric or it is not.
Case 1: suppose the broadcasting BIU is asymmetric. [Diagram: the asymmetric sender sends d, d', and d'' to the three RMUs]
● Then no RMU is asymmetric, by the MFA. So every RMU sends the same data to every BIU.
● Since no RMU is asymmetric, by Symmetric Agreement, the EV of each BIU is the same.
● Thus, the result of the vote for each BIU is the same. [Diagram: BIUs 2 and 3 receive the same values as BIU1]
Case 2: suppose the sending BIU is not asymmetric. [Diagram: the sender sends d to all three RMUs]
● Most of the RMUs are good, by the MFA. Since all good RMUs received the same value, they send the same value.
● By Good Trusting, no good BIU accuses a good RMU. Since most RMUs are good, a majority of the RMUs in the EV of each good BIU are good, after filtering benign RMUs.
● Thus, the result of the votes will be the same for all good BIUs. q.e.d.


New Assumptions to reason about reintegration
Environment Assumptions
The Dynamic Maximum Fault Assumption (DMFA):
1. For each good BIU, its EV consists of more good RMUs than symmetric + asymmetric RMUs.
2. Similarly for good RMUs.
3. Either no asymmetric RMU is in the EV of a good BIU or no asymmetric BIU is in the EV of a good RMU.
System Assumptions
● Symmetric Agreement: if a node is not asymmetric, then all good nodes assign it the same accusation.
● Good Trusting: good nodes aren't accused by good nodes.
● Conviction Agreement: all good nodes have the same convictions.
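Unlike the MFA, the DMFA is stated relative to each good node's eligible voters, so a checker needs the EV sets as input. The following is a sketch under assumed data structures (fault classes as strings, EVs as sets of node names); none of the names come from the SPIDER PVS theories.

```python
def dmfa_holds(biu_faults, rmu_faults, biu_ev, rmu_ev):
    """Check the three clauses of the Dynamic Maximum Fault Assumption.

    biu_faults, rmu_faults: {node_name: fault_class_string}
    biu_ev[b]: set of RMUs in the EV of BIU b
    rmu_ev[r]: set of BIUs in the EV of RMU r
    """
    def more_good_than_arbitrary(ev, faults):
        good = sum(faults[n] == "good" for n in ev)
        arbitrary = sum(faults[n] in ("symmetric", "asymmetric") for n in ev)
        return good > arbitrary

    good_bius = [b for b, f in biu_faults.items() if f == "good"]
    good_rmus = [r for r, f in rmu_faults.items() if f == "good"]

    clause1 = all(more_good_than_arbitrary(biu_ev[b], rmu_faults)
                  for b in good_bius)
    clause2 = all(more_good_than_arbitrary(rmu_ev[r], biu_faults)
                  for r in good_rmus)
    no_asym_rmu_trusted = all(rmu_faults[r] != "asymmetric"
                              for b in good_bius for r in biu_ev[b])
    no_asym_biu_trusted = all(biu_faults[b] != "asymmetric"
                              for r in good_rmus for b in rmu_ev[r])
    return clause1 and clause2 and (no_asym_rmu_trusted or no_asym_biu_trusted)


# Hypothetical configuration: an asymmetric BIU2 and an asymmetric RMU3.
bius = {"BIU1": "good", "BIU2": "asymmetric", "BIU3": "good"}
rmus = {"RMU1": "good", "RMU2": "good", "RMU3": "asymmetric"}
all_rmus = {"RMU1", "RMU2", "RMU3"}
biu_ev = {"BIU1": all_rmus, "BIU3": all_rmus}  # good BIUs trust every RMU
rmu_ev = {"RMU1": {"BIU1", "BIU3"},            # good RMUs accuse BIU2
          "RMU2": {"BIU1", "BIU3"}}

holds = dmfa_holds(bius, rmus, biu_ev, rmu_ev)
```

In this configuration asymmetric nodes exist on both sides of the bus, which the MFA forbids, yet the DMFA holds because the asymmetric BIU is in no good RMU's EV.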

Agreement Breaks! Under the New Assumptions (courtesy of Wilfredo)
● Suppose the sender is asymmetric, but is in the EV of no good RMU. Suppose there is an asymmetric RMU in the EV of both good BIUs. This satisfies the DMFA. [Diagram: sender BIU2 is asymmetric and sends d, d', and d''; RMU1 and RMU2 are good and accuse BIU2; RMU3 is asymmetric; the good BIUs trust all RMUs]
● The two good RMUs relay the values received, and since RMU3 can relay arbitrary data, it sends d to BIU1 and d' to the other good BIU.
● The votes of the two good BIUs differ: one votes d, the other d'. Agreement is violated!
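The counterexample can be replayed numerically. This sketch assumes the two good receivers are BIU1 and BIU3 (with BIU2 as the asymmetric sender), that the good RMUs receive d and d' respectively, and that each BIU takes a strict majority over the three relayed values; these concrete choices are a reading of the diagram labels, not something the slides spell out.

```python
from collections import Counter


def majority(values):
    """Return the strict-majority value, or source_error if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else "source_error"


# Asymmetric sender BIU2 sends a different value to each RMU.
sent = {"RMU1": "d", "RMU2": "d'", "RMU3": "d''"}


def relayed_to(biu):
    """Messages a good BIU receives under the old protocol."""
    # Good RMU1 and RMU2 relay what they received, even though they
    # accuse the sender; asymmetric RMU3 picks a value per receiver.
    rmu3 = "d" if biu == "BIU1" else "d'"
    return [sent["RMU1"], sent["RMU2"], rmu3]


votes = {biu: majority(relayed_to(biu)) for biu in ("BIU1", "BIU3")}
print(votes)
```

The asymmetric RMU acts as the tie-breaker between the two good RMUs and can tip each good BIU toward a different value.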


Revised IC Protocol
In the new IC Protocol, the RMUs relay source_error when
● They receive bad messages and
● They accuse the sender.
The revised IC protocol satisfies both validity and agreement (verified in PVS).
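To see the effect of the revised relay rule on the counterexample scenario, the sketch below reads the two bullets as alternative triggers (a good RMU relays source_error if it received a bad message, or if it accuses the sender). It also assumes that source_error is counted in the BIU vote like any other value and that the good receivers are BIU1 and BIU3. All three points are interpretations, and the names are illustrative.

```python
from collections import Counter

SOURCE_ERROR = "source_error"
GARBAGE = None  # assumption: stands in for a detectably faulty message


def revised_rmu_relay(received, accuses_sender):
    """Relay rule of a good RMU under the revised protocol (as read here)."""
    if received is GARBAGE or accuses_sender:
        return SOURCE_ERROR
    return received


def majority(values):
    """Return the strict-majority value, or source_error if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else SOURCE_ERROR


# Same scenario as the counterexample: good RMU1 and RMU2 accuse the
# asymmetric sender and received d and d' from it.
received = {"RMU1": "d", "RMU2": "d'"}
good_relays = [revised_rmu_relay(v, accuses_sender=True)
               for v in received.values()]

# Asymmetric RMU3 still sends d to BIU1 and d' to BIU3.
votes = {
    "BIU1": majority(good_relays + ["d"]),
    "BIU3": majority(good_relays + ["d'"]),
}
print(votes)
```

Under these assumptions both good BIUs vote source_error, so the asymmetric RMU can no longer break the tie in different directions.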

Formal Verification: Why Level 3 Verification?
● A math proof is proof enough, right?
● Level 3 verification can require significant time to complete.
In other words...

Using PVS

Formal Verification: Why Level 3 Verification?
● A math proof is proof enough, right?
● Level 3 verification can require orders of magnitude more time to complete than level 1 or level 2 verification.
But...
● Proofs for fault-tolerant protocols for distributed architectures are tedious and large (there are nearly 400 lemmas & theorems in our current unfinished set of proofs).
● Proofs are not checked by a community of mathematicians like other mathematical results are.
In other words...

You don't have to be a Laurel or Hardy to make an oversight in an informal proof. Small changes in assumptions can invalidate guarantees.


Some Goals & Current Work in verifying SPIDER
● Robust Specifications/Proofs
● Hold for arbitrary configurations of SPIDER
● Hold for all accusation & conviction policies satisfying the system requirements
● Specification/Proof “Reuse” (Economic specs/proofs)
● Specification/Proof Hierarchy
● Property specifications
● Relational specifications
● Functional composition specifications
● State machine specifications

References
● SPIDER Homepage:
● PVS Homepage:
● Butler, Ricky et al. NASA Langley's Research and Technology-Transfer Program in Formal Methods. Available at http://shemesh.larc.nasa.gov/fm/fm-welcome.html.
● Rushby, John. Formal Methods and Digital Systems Validation for Airborne Systems. NASA Contractor Report. Available at: