Slide 1 ITUA Not for public distribution. Intrusion Tolerance by Unpredictable Adaptation Presented by Partha Pal and William Sanders OASIS PI Meeting,


Slide 1
Intrusion Tolerance by Unpredictable Adaptation (ITUA)
Presented by Partha Pal and William Sanders
OASIS PI Meeting, August 21, 2002

Slide 2
Outline
What is ITUA?
Status report
Technical updates
  – Decentralized management
  – Intrusion tolerant gateway
Future plans
Accomplishments

Slide 3
What is ITUA
Four areas: Development of Intrusion Tolerance Technology, Architecture, Validation, Tech Transfer
Development of Intrusion Tolerance Technology
  – Intrusion Tolerance at Multiple Levels: IT Gateway, IT GCS, Loops, Application Level
  – Range of Adaptive Response, Local & Rapid to Coordinated: Replace corrupt replica; Convict corrupt replica; Isolate corrupt security domains; Block IP address / Restore file; Select application objects
  – Theme of Unpredictability: Restore: just a file? Tree? Placement of new replica. Block? Drop? Timed? Which application object?
Architecture
  – An architecture where ITUA tolerance technologies are integrated
Validation
  – Create Scientific Basis for Probabilistically Quantifying Survivability
  – Validate ITUA Technologies: Probabilistic Evaluation, Measurement, Internal Red Teaming
Tech Transfer
  – Boeing’s IEIST Application
  – Army CECOM SMS
  – OASIS Dem/Val

Slide 4
Status Report
Started July 2000
End Date: December 2003
Approximately 65 percent done
Ongoing tasks
  – Completion of the decentralized manager implementation
  – Combining IT replication, group membership, and reliable multicast into intrusion-tolerant gateway
  – Including more IEIST Components (Boeing) in the integrated demonstration
  – Validation: Methodology development and evaluation of ITUA technology

Slide 5
Technical Update 1: Decentralized Redundancy Management
[Figure: two panels, "Initial Concept" and "Current Approach", each spanning Domain 1, Domain 2, and Domain 3; nodes are labeled S and M]
Scalable, two-stage information dissemination is more corruption prone and harder to analyze
Elimination of subordinate groups makes it simpler
Scalability of multicast need not be fused with the issue of surviving corruption

Slide 6
Organizing Management Components
Manager: one per host; each consists of two components: Replication Controller and Security Adviser
Hosts are organized into “security domains” consisting of hosts that are at risk of being compromised together.
Managers communicate with each other through a manager group composed of all managers.
[Figure: a Manager Group spanning Domain 1, Domain 2, Domain 3, and Domain 4; each HOST runs one Manager (M)]

Slide 7
Manager Features
Decentralized management of redundant resources:
  – Not possible to compromise the management function completely unless a majority of the managers are corrupt
  – No notion of a leader: probabilistic and consensus-based algorithms
Coordinated response, but actions are taken locally:
  – A manager can start and stop replicas on its own machine
  – A manager may place a “suspected” manager in its own purgatory
Validation study: one manager per domain yields better survivability characteristics

Slide 8
The Notion of Purgatory
A facility/abstraction to temporarily prevent replicas started by a suspected manager from joining a replication group
  – Suspicion: bad behavior, but without sufficient proof
      Example: timing delays in a partially synchronous model
  – If M1 suspects M2, M2 is in M1’s purgatory
  – If enough managers put M2 in their purgatory, replicas started by M2 cannot join a replication group
  – If a replica R owned by M2 misbehaves (and enough managers agree), R will be excluded by the IT GCS, and M2 will be asked to remove R
  – If M2 does not oblige, that becomes provably bad behavior on M2’s part
      Consequence: eliminate M2’s domain
Different times for different crimes: past actions and the type of crime may bring different sentences.
By not (always) permanently ostracizing a domain, temporary non-malicious problems do not remove domains permanently
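The purgatory bookkeeping described on this slide can be sketched as follows. This is only an illustration: the class name `PurgatoryTracker`, its methods, and the explicit `threshold` parameter are assumptions, since the slide says "enough managers" without giving a number.

```python
from collections import defaultdict


class PurgatoryTracker:
    """Records which managers each manager currently holds in its purgatory."""

    def __init__(self, threshold):
        # Assumed parameter: how many managers must suspect an owner before
        # replicas started by that owner are refused admission.
        self.threshold = threshold
        self.purgatory = defaultdict(set)  # suspector -> set of suspected managers

    def suspect(self, suspector, suspected):
        """`suspector` places `suspected` in its own purgatory."""
        self.purgatory[suspector].add(suspected)

    def release(self, suspector, suspected):
        """Purgatory is temporary: `suspector` stops suspecting `suspected`."""
        self.purgatory[suspector].discard(suspected)

    def suspicion_count(self, manager):
        """Number of managers that currently hold `manager` in purgatory."""
        return sum(1 for held in self.purgatory.values() if manager in held)

    def may_join(self, owner):
        """Replicas started by `owner` may join unless enough managers suspect it."""
        return self.suspicion_count(owner) < self.threshold
```

With `threshold=3`, a single suspicion against M2 leaves `may_join("M2")` true; once three distinct managers suspect M2 it becomes false, and it turns true again when one of them releases M2.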

Slide 9
Signed Messages and Provable Bad Behavior
Intrusion-Tolerant Gateway reports misbehavior by signed messages:
  – If a replica becomes “corrupt,” owners of the remaining replicas will be notified
      Each owner (the manager that started that replica) receiving the notification verifies the signature and multicasts a signed message to the manager group
  – A manager is expected to receive independent corroboration of the event, vouched for by multiple owners
  – By checking the message content and signature, it is possible to accurately identify whether a manager is injecting bad data
      Signing but not checking values: a corrupt manager can pass on bad information
      Checking values but not signing: anyone can pass bad information
The domain of a manager caught this way will be isolated
  – ITUA does not currently consider replenishing dropped domains
ITUA assumes a trusted CA

Slide 10
Probabilistic Replica Start Algorithm
When a member R of a replication group G leaves (removed by managers, or crashes), managers that don’t have a member of G become “available” to start one
An available manager M decides to start a replica with probability p, where
  p = 1 / ((# domains without a replica) * (# managers in M’s domain)), if M’s domain does not have a member of G
  p = 0, otherwise
If M decides to start a replica, it will consult the trusted CA to obtain the new replica’s credentials, and multicast its intention and the replica’s public key. The private key will be handed to the new replica process via stdin when it starts, after the mcast operation returns
Other managers with members of G receiving this message will instruct replicas to admit the newcomer if M is not in their purgatory and M is the first to propose in M’s domain
Note: multiple managers in a domain may propose, and multiple replicas may start in multiple domains, but on average one will be started. If none starts within a specified period, retry
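A short sketch of the start probability shows why about one replica is started on average: each domain without a replica contributes (managers in domain) × p = 1 / (# domains without a replica), and summing over those domains gives 1. The function names and the `domain_sizes` list below are illustrative assumptions, not part of the ITUA code.

```python
import random


def start_probability(domains_without_replica, managers_in_domain, domain_has_member):
    """Probability p that an available manager decides to start a replica."""
    if domain_has_member:
        return 0.0
    return 1.0 / (domains_without_replica * managers_in_domain)


def expected_starts(domain_sizes):
    """Expected number of replicas started in one round.

    domain_sizes lists the number of managers in each domain that lacks
    a member of G (hypothetical input format for this sketch).
    """
    d = len(domain_sizes)
    return sum(size * start_probability(d, size, False) for size in domain_sizes)


def simulate_round(domain_sizes, rng):
    """Let every available manager decide independently; count the starts."""
    d = len(domain_sizes)
    return sum(
        1
        for size in domain_sizes
        for _ in range(size)
        if rng.random() < start_probability(d, size, False)
    )
```

For three replica-less domains with 3, 2, and 4 managers, `expected_starts([3, 2, 4])` evaluates to 1 (up to floating-point rounding). Individual rounds from `simulate_round` can still yield zero or several starts, which is why the slide calls for a retry after a timeout.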

Slide 11
Tolerated Manager Failures
N: set of domains in the system
F: set of faulty domains
M(X): number of managers in domain set X
We assume that loss of faulty domains does not cause the system to stop operating, i.e.,
  – [formula missing] is “small”, e.g. [formula missing]
  – [formula missing] and [formula missing]
In order to successfully multicast to the manager group (gateway level), more than 2/3 of the participating managers need to be non-faulty.
Given these, manager algorithms (above the gateway level) will operate correctly as long as a majority of participating managers are non-faulty.
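The two thresholds stated on this slide can be checked with integer arithmetic. The sketch below assumes only the two stated conditions (more than 2/3 non-faulty for gateway-level multicast, a majority non-faulty for the manager algorithms); the function names are hypothetical.

```python
def multicast_ok(total, faulty):
    """Gateway level: more than 2/3 of participating managers are non-faulty."""
    return 3 * (total - faulty) > 2 * total


def manager_algorithms_ok(total, faulty):
    """Above the gateway: a majority of participating managers are non-faulty."""
    return 2 * (total - faulty) > total


def max_faulty_for_multicast(total):
    """Largest number of faulty managers that still permits multicast.

    The condition reduces to faulty < total / 3, so for total >= 1 the
    largest admissible integer is (total - 1) // 3.
    """
    return (total - 1) // 3
```

With 10 participating managers, multicast tolerates up to 3 faulty managers, while the majority condition alone would tolerate 4. The multicast requirement is therefore the stricter of the two.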

Slide 12
Technical Update 2: Intrusion Tolerant Gateway
ITUA IT Gateway
  – Provides CORBA interface to applications.
  – Implements specific communication strategies (handlers).
  – Adaptable to multiple GCS implementations.
ITUA IT GCS
  – Provides the necessary multicast and group membership properties.
[Figure: gateway architecture. Labels: ORB, Naming Service, Gateway, Application Object, Protocol stack, DII Processor, Handler Factory, Replication Group Factory, Handler Type A (Communication Strategy), Handler Type B (Communication Strategy), Group Member, GCS Adaptation Layer, Intrusion-Tolerant GCS, Replication Group, Connection Group]

Slide 13
Infrastructure Details
Group Member
  – Communicates with the handlers and DII Processor in the Gateway above and the GCS Adaptor below.
  – Replication/Connection group members are derived from this class
  – Facilitates secure state transfer
Three components
  – Sending Processor
  – Receiving Processor
  – Secure State Transfer Processor
GCS Adaptation Layer acts as interface to the GCS below
  – The gateway is designed to work on top of several GCSs providing reliable delivery, group membership, and total order.
[Figure: layer diagram. Labels: Replication Group, Connection Group, GCS Adaptation Layer, Sending Processor, Receiving Processor, Secure State Transfer Processor, Group Communication System]

Slide 14
IT Handler Algorithm
Step 1
  – Multicast signed messages in replication group
Step 2
  – Multicast in connection group, with proof (2f+1 signatures)
Step 3
  – Validate proof
  – Multicast in replication group to provide total order
[Figure: message-flow diagram; nodes labeled L]
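The proof check in step 3 can be sketched as counting distinct valid signatures against the 2f+1 bound. This is an assumption-laden illustration: HMAC with per-replica keys stands in for real public-key signatures, and `sign`, `verify_proof`, and the dictionary layouts are hypothetical names, not the gateway's API.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in signature (HMAC-SHA256); the real system would use public-key signatures."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_proof(message: bytes, proof: dict, keys: dict, f: int) -> bool:
    """Accept the message only if at least 2f+1 distinct replicas signed it.

    proof maps replica id -> signature; keys maps replica id -> verification key.
    Using a dict keyed by replica id guarantees each signer is counted once.
    """
    valid_signers = {
        replica_id
        for replica_id, signature in proof.items()
        if replica_id in keys
        and hmac.compare_digest(signature, sign(keys[replica_id], message))
    }
    return len(valid_signers) >= 2 * f + 1
```

With f = 1, a proof carrying three valid signatures passes, while a proof with two valid signatures and one forged signature fails.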

Slide 15
Example: PseudoCode - Message Arrival Step 2

FromConnectionCastStep2(m)
    if (VerifyProof(m))
        if (m.source == myRepGroup)
            MulticastLatencyTimer.stop()
            if (MajorityReached(m))
                MajorityBuffer.remove(m)
            else
                MajorityDelayBuffer.store(m)
        else
            if (isLeader())
                ReplicationGroupMulticast(m)
            else if (McastDelayBuf.find(m))
                McastDelayBuf.remove(m)
                Deliver(m)
            else
                totalOrderBuffer.add(m)
                RebroadcastTimer.start()

Timers are used to ensure that leaders broadcast in a timely manner
  – They prevent leaders from being able to stall the protocol.
Buffers are used to ensure that messages are delivered in a consistent total order even if a leader fails.
Proofs are used to ensure that a leader cannot falsely invoke a request in another group without consensus from the replication group.

Slide 16
Fault Reporting in Handler
  – Vote for a request that never reaches a majority.
  – Never sent votes for a particular message (step 1).
  – Invalid signature accompanying replication group broadcast in step 1.
  – Failure to include sufficient proof in a connection group cast in step 2.
  – Sending a message that does not match the majority of requests generated by other replicas.
  – Leader sends a broadcast with an incorrect message or an out-of-sequence message in step 3.
Report_Error(BAD_VOTE, msgSeqNo, ReplicaID)
Report_Error(NO_VOTE, msgSeqNo, ReplicaID)
Report_Error(BAD_SIGNATURE, msgSeqNo, ReplicaID)
Report_Error(BAD_PROOF, msgSeqNo, ReplicaID)
Report_Error(BAD_SIGNATURE, msgSeqNo, ReplicaID)

Slide 17
Recent Accomplishments/Next Steps
Technology Development
  – Interaction of Security Adviser and Replication Controller in manager
  – Key handoff between management level and gateway/GCS
  – Next-generation IT GCS and its integration
Validation (more in next presentation)
  – Model-based studies, whiteboard/red team (internal) type experiment
Transition
  – IEIST: Addition of intrusion tolerant capabilities to increase the survivability of the fighter guardian agent
  – SMS and Dem/Val
Papers/Demos/Reports:
  – DSN 2002
      Full Paper on GCS
      Validation Fast Abstract
      Workshop papers on Gateway and ITUA architecture
      DSN Red Teaming Session
  – DARPA Tech Demo
  – Validation Report (in final review)
  – Pacific Rim: Full Paper on formal verification of group membership protocol (to appear)