EEC 688/788 Secure and Dependable Computing

Lecture 7 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Maulik no show 10/5/2009

EEC688/788: Secure & Dependable Computing
Outline
- Reminder
  - Deadline for lab report: Thursday 7/6 in class (hardcopy preferred)
  - Deadline for project proposal submission: Thursday 6/29 midnight
- Data and service replication
- Group communication systems
  - Reliable, ordered multicast
  - Types of total ordering
  - GCS services
  - How to implement GCS
1/18/2019 EEC688/788: Secure & Dependable Computing

Data and Service Replication
- Replication uses space redundancy to achieve high availability
  - Instead of running a single copy of the service, multiple copies are used
  - The copies are usually deployed across a group of physical nodes for fault isolation
- Data replication and service replication usually use different approaches
  - Transactional data replication
  - Optimistic replication (omitted)
  - Balance consistency and performance: CAP theorem (omitted)

Data and Service Replication
- Service replication: state machine replication
  - Each replica is modeled as a state machine: state, interface, deterministic state change via the interface
  - Replica consistency issue: coordination needed
    - Total order of requests to the server replicas
    - Sequential execution of requests
- Data replication: direct access to data
  - Operations on data: read or write
  - Context: transaction processing => concurrent access to replicated data is essential
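To illustrate why state machine replication needs both deterministic processing and a total order of requests, here is a minimal sketch. The `CounterReplica` class, its `add`/`set` operations, and the `run` helper are hypothetical names chosen for illustration; they are not part of the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class CounterReplica:
    """Hypothetical deterministic state machine: the state is a dict of counters."""
    state: dict = field(default_factory=dict)

    def apply(self, request):
        """Apply one request; the outcome depends only on the state and the request."""
        op, key, amount = request
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "set":
            self.state[key] = amount
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.state[key]


def run(requests):
    """Execute the requests sequentially on a fresh replica and return its state."""
    replica = CounterReplica()
    for request in requests:
        replica.apply(request)
    return replica.state


ordered = [("set", "x", 1), ("add", "x", 5), ("set", "x", 2)]

# Three replicas that execute the same requests in the same total order.
same_order = [run(ordered) for _ in range(3)]

# A replica that received the same requests in a different order.
different_order = run([ordered[2], ordered[0], ordered[1]])

print(same_order)
print(different_order)
```

The three replicas that share one order end in the same state, while the replica that saw a different order diverges, which is the consistency problem that coordination has to prevent.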

Service Replication
- State is encapsulated
- Clients interact with exported interfaces (APIs)
- A replication algorithm is used to coordinate the replicas (for consistency)
- Fault tolerance middleware

Replication Styles
- Active replication
  - Every input (request) is executed by every replica
  - Every replica generates the outputs (replies)
  - Voting is needed to cope with non-fail-stop faults
- Passive replication
  - One of the replicas is designated as the primary replica
  - Only the primary replica executes requests
  - The state of the primary replica is transferred to the backups periodically or after every request is processed
- Semi-active replication
  - One of the replicas is designated as the leader (or primary)
  - The leader determines the order of execution
  - Every input is executed by every replica per the leader's instruction

Active Replication
[Figure: actively replicated client object A invoking actively replicated server object B. Labels: Duplicate Invocation Suppressed, Duplicate Responses Suppressed, RM.]

Active Replication with Voting
Question: to cope with f (non-malicious) faults, how many replicas are needed?
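One way to reason about the question above is to assume majority voting over the replies, with faulty replicas possibly returning wrong but not maliciously coordinated values. Under that assumption, a common answer is 2f + 1 replicas: at least f + 1 correct replies then agree, and the f faulty ones cannot reach that threshold. The `vote` function below is a hypothetical sketch of such a voter, not the lecture's own algorithm.

```python
from collections import Counter


def vote(replies, f):
    """Return the value reported by at least f + 1 replicas, or None if no value
    reaches that threshold (assumption: at most f replies are wrong)."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


# With f = 1 and 2f + 1 = 3 replicas, one wrong reply is outvoted.
print(vote([42, 42, 7], f=1))

# With only 2 replicas, a single wrong reply cannot be resolved.
print(vote([42, 7], f=1))
```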

Passive Replication
[Figure: passively replicated client object A invoking passively replicated server object B. Labels: Primary Replica, Invocation, Response, State Transfer, State, RM.]
Question: can passive replication tolerate non-fail-stop faults?

Semi-Active Replication
[Figure: semi-actively replicated client object A invoking semi-actively replicated server object B. Labels: Primary Replica, Invocation, Response, Ordering info, RM.]

Implementation of Service Replication: Ensuring Strong Replica Consistency
- For active replication, use a group communication system or a consensus algorithm that guarantees total ordering of all messages (plus deterministic processing in each replica)
- Passive replication with systematic checkpointing
- Semi-active replication
- Use two-phase commit

Total Ordering of Messages
- What is total ordering of messages?
  - All replicas receive the same set of messages in the same order
  - Atomic multicast: if a message is delivered to one replica, it is also delivered to all non-faulty replicas
- With replication, we need to ensure total ordering of messages sent by a group of replicas to another group of replicas
  - FIFO ordering between one sender and a group is not sufficient
[Figure labels: m1, m2]
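The point that FIFO ordering alone is not sufficient can be checked with a small sketch. The helper names `is_fifo` and `is_total_order`, and the sender and replica names, are hypothetical; the scenario assumes two senders that each multicast one message (m1 and m2) to two receiving replicas.

```python
def is_fifo(sent, delivered):
    """True if every replica delivers each sender's messages in sending order.

    sent:      {sender: [messages in send order]}
    delivered: {replica: [messages in delivery order]}
    """
    for log in delivered.values():
        for msgs in sent.values():
            positions = [log.index(m) for m in msgs if m in log]
            if positions != sorted(positions):
                return False
    return True


def is_total_order(delivered):
    """True if every pair of replicas delivers their common messages in the same order."""
    logs = list(delivered.values())
    for a in logs:
        for b in logs:
            common = set(a) & set(b)
            if [m for m in a if m in common] != [m for m in b if m in common]:
                return False
    return True


sent = {"A1": ["m1"], "A2": ["m2"]}
delivered = {"B1": ["m1", "m2"], "B2": ["m2", "m1"]}

print(is_fifo(sent, delivered))    # per-sender order is respected
print(is_total_order(delivered))   # but the replicas disagree on the order
```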

Potential Sources of Non-determinism
- Multithreading
  - The order in which different threads access shared data might not be the same at different replicas
- System calls/library calls
  - A call at one replica might succeed while the same call fails at another replica, e.g., memory allocation, file access
- Host/process-specific information
  - Host name, process id, etc.
- Local clocks: gettimeofday()
- Interrupts
  - Delivered and handled asynchronously: a big problem
Not required

Data Replication
- Transactional data replication
  - Read/write ops on a set of data items within the scope of a transaction
  - At the transaction level, executions appear to be sequential (one-copy serializable)
  - Actual ops on each data item are often concurrent
- Optimistic data replication
  - Eventual consistency: eventually, all updates will be propagated to all data items

Transactional Data Replication
- One-copy serializable
  - A transactional data replication algorithm should ensure that the replicated data appear to the clients as a single copy
  - The interleaved execution of the transactions must be equivalent to a sequential execution of those transactions on a single copy of the data
- Make read ops cheaper than updates: read ops are more prevalent
- It is challenging to design sound replication algorithms

Wrong Data Replication Algorithms
- Write-all
  - A read op on a data item x can be mapped to any replica of x
  - A write on x must be applied to all replicas of x
- Problem: what if a replica becomes faulty?
  - Blocking! Any single replica fault brings down the entire system!

Wrong Data Replication Algorithms
- Write-all-available
  - A read op on a data item x can be mapped to any replica of x
  - A write on x is applied to the available replicas of x
- Problem: cannot ensure one-copy serializable execution!

Attempting to Fix Write-All-Available
- The problem is caused by accessing the not-fully-recovered replica => how about preventing this?
- Still won't work
  - Ti: R(x), W(y)
  - Tj: R(y), W(x)
  - Ti does not precede Tj because Tj reads y before Ti writes to y
  - Tj does not precede Ti because Ti reads x before Tj writes to x
  - Hence, Ti and Tj are not serializable!
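The argument above can be checked mechanically with a precedence (conflict) graph: an edge T1 -> T2 means a conflicting operation of T1 comes before one of T2, and a cycle means no equivalent serial order exists. This is a sketch under the assumption that the interleaving is R(x) by Ti, R(y) by Tj, W(y) by Ti, W(x) by Tj; the function names are hypothetical.

```python
def precedence_edges(schedule):
    """Build conflict edges from a schedule of (transaction, op, item) tuples.

    Two operations conflict if they belong to different transactions, touch the
    same item, and at least one of them is a write.
    """
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by the edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


interleaved = [("Ti", "R", "x"), ("Tj", "R", "y"), ("Ti", "W", "y"), ("Tj", "W", "x")]
serial = [("Ti", "R", "x"), ("Ti", "W", "y"), ("Tj", "R", "y"), ("Tj", "W", "x")]

print(has_cycle(precedence_edges(interleaved)))  # cycle: not serializable
print(has_cycle(precedence_edges(serial)))       # no cycle: serializable
```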

Insight into the Problem
- The problem is caused by the fact that conflicting operations are performed at different replicas
  - We must prevent this from happening
- A solution: use quorum-based consensus
- What is a quorum?
  - Given a system with n processes, a quorum is formed by a subset of the processes in the system
  - Any two quorums must intersect in at least one process
  - Read quorum: a quorum formed for read ops
  - Write quorum: a quorum formed for write ops

A Quorum-Based Replication Algorithm
- Basic idea
  - Write ops apply to a write quorum
  - Read ops apply to a read quorum
- Fault tolerance: given N replicas in total and a write quorum size W (>= read quorum size R), the system can tolerate up to N-W failures
- Quorum rule
  - Each replica is assigned a positive weight, e.g., 1
  - A read quorum has a minimum total weight RT
  - A write quorum has a minimum total weight WT
  - RT + WT > total weight and 2WT > total weight
- What if RT = 1? WT would include all replicas => not fault tolerant!
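The quorum rule can be expressed as a small check. This sketch assumes integer weights; `is_valid_quorum` and `min_write_weight` are hypothetical helper names, and the five-replica configuration is only an illustration.

```python
def is_valid_quorum(weights, rt, wt):
    """Check the quorum rule: RT + WT > total weight and 2 * WT > total weight."""
    total = sum(weights)
    return rt + wt > total and 2 * wt > total


def min_write_weight(weights, rt):
    """Smallest integer WT that satisfies the quorum rule for a given RT, or None."""
    total = sum(weights)
    for wt in range(1, total + 1):
        if is_valid_quorum(weights, rt, wt):
            return wt
    return None


weights = [1] * 5  # five replicas, each with weight 1 (illustrative configuration)

for rt in (1, 2, 3):
    wt = min_write_weight(weights, rt)
    # With unit weights, WT is the write quorum size W, so N - W failures are tolerated.
    print(f"RT={rt} -> minimum WT={wt}, tolerated failures={len(weights) - wt}")
```

With RT = 1 the minimum WT equals the total weight, so a single replica failure blocks every write, which matches the warning on the slide.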

- Since an update is applied only to a quorum of replicas, we need to track which replica has the latest value => use version numbers
  - The version number is incremented after each update
- Read rule
  - A read on data x is mapped to a read quorum of replicas of x
  - Each replica returns both the value of x and its version number
  - The client selects the value that has the highest version number

- Write rule
  - A write op on data x is mapped to a write quorum of replicas of x
  - First, retrieve the version numbers from the replicas and set v = vmax + 1 for this write op
  - Write the new value and version number v to the replicas in the write quorum
  - Each replica overwrites both its stored value and its version number with the new value and v
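The read and write rules can be sketched as follows. This is a single-process illustration that ignores failures, message passing, and concurrency control between transactions; the `Replica` class and the `quorum_read` and `quorum_write` functions are hypothetical names, and quorums are given as lists of replica indices.

```python
class Replica:
    """One copy of data item x with its version number."""

    def __init__(self):
        self.value = 0
        self.version = 0


def quorum_read(replicas, read_quorum):
    """Read rule: query the read quorum and pick the value with the highest version."""
    newest = max((replicas[i] for i in read_quorum), key=lambda r: r.version)
    return newest.value, newest.version


def quorum_write(replicas, write_quorum, value):
    """Write rule: v = vmax + 1 over the write quorum, then overwrite value and version."""
    v = max(replicas[i].version for i in write_quorum) + 1
    for i in write_quorum:
        replicas[i].value = value
        replicas[i].version = v
    return v


# Five replicas with read and write quorums of size 3 (3 + 3 > 5 and 2 * 3 > 5).
replicas = [Replica() for _ in range(5)]

quorum_write(replicas, [0, 1, 2], "a")
first_read = quorum_read(replicas, [2, 3, 4])   # overlaps the write quorum at replica 2

quorum_write(replicas, [2, 3, 4], "b")
second_read = quorum_read(replicas, [0, 1, 2])  # replicas 0 and 1 are stale, replica 2 is not

print(first_read, second_read)
```

Because every read quorum intersects every write quorum, at least one replica in each read holds the latest version, and the version number identifies it.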

Quorum-Based Replication Algorithm: Example

Group Communication System
- Services provided by the GCS
  - Membership service: who is up and who is down
    - Deals with failure detection and more
  - Reliable, totally ordered multicast service
  - Virtual synchrony service
    - Virtual synchrony synchronizes membership changes with multicasts
- A GCS makes the implementation of state machine replication much easier

Main Approaches to Total Ordering
- Sequencer based: one of the nodes in the membership is assigned the task of giving each application message a global sequence number (representing the total order)
  - Fixed sequencer
  - Rotating sequencer
- Sender based: the nodes in the membership take turns to multicast => all multicast msgs are naturally totally ordered
  - A virtual token is passed around the nodes
- Vector clock based: the causal relationship among different messages can be captured using vector clocks
  - Each multicast message is piggybacked with a vector timestamp
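For the vector clock approach, the following sketch shows how timestamps capture the causal relationship between messages. It uses one common variant of the update rules (increment the sender's own entry on send; take the element-wise maximum and then increment on receive); the `VectorClock` class and `happened_before` function are hypothetical names.

```python
class VectorClock:
    """Vector clock for one process in a group of n processes."""

    def __init__(self, n, owner):
        self.owner = owner
        self.clock = [0] * n

    def send(self):
        """Advance the local entry and return the timestamp to piggyback on the message."""
        self.clock[self.owner] += 1
        return list(self.clock)

    def receive(self, timestamp):
        """Merge a received timestamp (element-wise max), then advance the local entry."""
        self.clock = [max(a, b) for a, b in zip(self.clock, timestamp)]
        self.clock[self.owner] += 1


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


p0, p1, p2 = (VectorClock(3, i) for i in range(3))

t1 = p0.send()      # p0 multicasts m1
p1.receive(t1)      # p1 receives m1 ...
t2 = p1.send()      # ... and then multicasts m2, so m1 causally precedes m2
t3 = p2.send()      # p2 multicasts m3 without having seen m1 or m2

print(happened_before(t1, t2))
print(happened_before(t1, t3), happened_before(t3, t1))
```

Timestamps like `t1` and `t3`, where neither precedes the other, belong to concurrent messages; the causal order alone does not say which of them comes first.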

System Model
- An asynchronous system with N nodes that communicate with each other directly by sending and receiving messages
- A node may become faulty and stop participating in the group communication protocol (i.e., a fail-stop fault model is used)
- A failed node might recover. However, it must rejoin the system via a membership change protocol
- We assume a closed, single-group system: foreign msgs are ignored

Protocol Design
- A group communication system must define two protocols:
  - One for normal operation, when all nodes in the current membership can communicate with each other in a timely fashion
  - The other for membership change, when one or more nodes are suspected to have failed, or when failed nodes are restarted
- These protocols work together to ensure the safety properties and the liveness property of the group communication system

Protocol Design
- Liveness: if a nonfaulty node multicasts a message, it will eventually be delivered in a total order at other nodes
  - Message loss is addressed by retransmission
  - Node failures and extended delays in processing and message propagation are addressed by membership reconfigurations (i.e., view changes)

Two Types of Total Ordering
- Uniform total ordering
  - Given any msg that is broadcast, if it is delivered by a node according to some total order, it is delivered at every other node in the same total order unless the node has failed
- Nonuniform total ordering
  - Given a set of messages that have been broadcast and totally ordered, no node delivers any of them out of the total order
  - However, there is no guarantee that if a node delivers a message, then all other nodes deliver the same message

Example

Implementing Total Ordering
- Use a sequencer to order all multicasts
  - The sequencer determines the order for each message
  - Each sender can take turns serving as the sequencer (rotating sequencer)
- Use a token that moves around
  - The token has a sequence number
  - The sender determines the total order: when you hold the token, you can send the next burst of multicasts
- Use vector clocks
  - Each process maintains a vector clock
  - Each msg is piggybacked with a vector timestamp

Sequencer Based GCS
- First practical solution for GCS
- A system is structured into a combination of two subsystems
  - Multiple senders with a single receiver
  - A single sender with multiple receivers
- The single receiver and single sender are collocated at the same node => all msgs are funneled through this node, i.e., the sequencer

Sequencer Based GCS
- The sequencer is responsible for assigning a global sequence number to each message funneled through it
- Each node delivers a msg once it has received and delivered all msgs with smaller sequence numbers
- Sequencer: a single point of failure
- Rotating sequencer: overcoming the single point of failure
  - Assume up to f nodes could fail, with total number of nodes N > 2f
  - Each node takes a turn acting as the sequencer (e.g., one msg at a time)
  - A node does not deliver a msg until it receives f+1 sequencing msgs
  - Achieves fault tolerance as well as uniform total ordering
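The fixed-sequencer delivery rule (deliver a msg only after all msgs with smaller sequence numbers) can be sketched with a hold-back buffer. This is a simplified, failure-free illustration; the `Sequencer` and `Receiver` classes are hypothetical names, and network reordering is simulated by calling `receive` in different orders.

```python
class Sequencer:
    """Assigns consecutive global sequence numbers to the messages funneled through it."""

    def __init__(self):
        self.next_seq = 0

    def order(self, msg):
        s = self.next_seq
        self.next_seq += 1
        return s, msg


class Receiver:
    """Delivers messages strictly in global sequence order, buffering early arrivals."""

    def __init__(self):
        self.expected = 0      # next global seq# to deliver
        self.pending = {}      # hold-back buffer: seq# -> msg
        self.delivered = []

    def receive(self, s, msg):
        self.pending[s] = msg
        # Deliver as long as the next expected message is available.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
ordered = [sequencer.order(m) for m in ("a", "b", "c")]

r1, r2 = Receiver(), Receiver()
for s, msg in ordered:                              # r1 receives in order
    r1.receive(s, msg)
for s, msg in (ordered[2], ordered[0], ordered[1]):  # r2 receives out of order
    r2.receive(s, msg)

print(r1.delivered, r2.delivered)
```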

Rotating Sequencer: Data Structure
- View number v, list of node ids in the current view
  - Each node has a rank: it knows when it should take over as the next sequencer
- A local sequence number vector M[], each element representing the expected local seq# for the corresponding node: for reliable delivery
  - M[i] refers to the expected local seq# carried by the next msg sent by node i
  - Init each element to 0
- Expected global seq# s carried in the next sequencing msg sent by the sequencer node: for total ordering

Rotating Sequencer: Normal Operation
- Transmitting phase
  - A node i broadcasts a msg, B(v,i,n), to all nodes
    - n: local seq#, initially 0, incremented for each msg broadcast => reliable broadcast
  - It then waits for a sequencing msg for the broadcast msg
  - A node j accepts a msg B(v,i,n) if it is in the same view, and buffers it
- Sequencing phase
- Committing phase

Sequencing phase
- When the sequencer receives a broadcast msg B(v,i,n)
  - It verifies that it is the next expected msg from node i, i.e., M[i] = n
  - It assigns the current global seq# s to B(v,i,n)
  - It broadcasts a sequencing msg: SEQ(s,v,[i,n])
- When a node j receives SEQ(s,v,[i,n]), it accepts it provided that
  - s is the expected global seq#
  - It has B(v,i,n) in its buffer; otherwise, it requests retransmission
- It then updates its data structures:
  - Increment the expected global seq#
  - Increment the expected local seq#
- SEQ(s,v,[i,n]) also serves as a positive ack for broadcast msg B(v,i,n)

Committing phase
- A node does not deliver a broadcast msg B(v,i,n) until it receives SEQ(s,v,[i,n]) and f subsequent SEQ msgs
- Ensuring uniform total ordering
  - Even if f nodes fail, at least one node would have received both B(v,i,n) and SEQ(s,v,[i,n])
  - This node ensures that the message is delivered at other nodes in the same total order
- How to transfer the sequencer role
  - The transfer of the sequencer role can be achieved implicitly by the sending of a new sequencing message
  - The next node i assumes the sequencer role when it receives a SEQ(s) msg and the following conditions are met
    - (s+1)%N = i
    - It has received all previous SEQ msgs and B msgs
  - If no one broadcasts B msgs, the sequencer sends null SEQ msgs
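The commit rule and the implicit transfer of the sequencer role can be sketched as two small checks. This is a simplification: it reads "f subsequent SEQ msgs" as the sequencing msgs with global seq# s+1 through s+f, and it does not model the requirement that all earlier msgs have already been received and delivered. The function names are hypothetical.

```python
def next_sequencer(s, n):
    """Rank of the node that takes over after the sequencing msg with global seq# s,
    following the rule (s + 1) % N = i."""
    return (s + 1) % n


def committable(s, received_seqs, f):
    """True once SEQ s and the f subsequent SEQ msgs (s+1 .. s+f) have been received.

    received_seqs is the set of global seq#s for which this node holds a SEQ msg.
    """
    return all(k in received_seqs for k in range(s, s + f + 1))


N, f = 5, 1

print(committable(100, {100}, f))        # only its own SEQ so far: not yet deliverable
print(committable(100, {100, 101}, f))   # one subsequent SEQ received: deliverable
print(next_sequencer(100, N))            # rank of the node that sequences seq# 101
```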

Normal Operation: Example
- N=5, f=1
- Can a node deliver B as soon as it receives the corresponding SEQ msg?
- When will B(v,4,20) be delivered?

Rotating Sequencer: Membership Change
- A membership change is triggered by
  - The detection of a failure: a node fails to receive the corresponding SEQ msg for its B msg => sequencer failed
  - The recovery of a failed node: when a node recovers from a failure, it tries to rejoin the membership
- Objectives of the membership change protocol
  - Only one valid membership view can be formed by the system
  - If a B msg is committed at some nodes in a view, then all nodes in the new view must commit B in the same total order

Rotating Sequencer: Membership Change
- Operates in three phases
- Phase I: the node that detected a failure (the originator) sets the new view# = v+1 and broadcasts an invitation msg
  - The invitation msg carries the new view#
  - A node accepts the invitation and acks it provided that it has not accepted an invitation for a competing view
    - Note: a node joins at most one membership view at a time
  - The ack carries the node's current view# and the next expected global seq#

Rotating Sequencer: Membership Change, Phase II
- The originator keeps collecting acks until
  - Either it has received an ack from every node in the new membership, or
  - It has collected at least N-f acks and a timeout has occurred (for liveness)
- If all acks are positive, the originator proceeds to build a node list for the new view and broadcasts it
- The originator also learns the msg ordering history of the previous view
  - Highest global seq#: smax
  - Originator's expected global seq#: s0
  - If smax > s0, the originator is missing msgs
    - smax ≥ the global seq# of the last msg committed in the previous view
    - Request retransmission
    - Use smax as the starting global seq# for the new view provided that the originator can receive all missing msgs; otherwise, use the largest s for which the corresponding B has been received
- If negative responses are received, abort and retry
  - A node aborts when (1) it receives an abort msg from the originator, or (2) its membership change times out
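The originator's stopping condition for ack collection, and the comparison of smax against s0, can be sketched as below. This is an illustration under stated assumptions: acks are modeled as a dict from node id to a (current view#, next expected global seq#) pair, smax is taken as the largest expected global seq# reported in the acks, and the function names are hypothetical.

```python
def acks_complete(acks, new_members, n, f, timed_out):
    """Stop collecting when every node in the new membership has acked, or when
    at least N - f acks have arrived and the timeout has fired."""
    if set(acks) >= set(new_members):
        return True
    return timed_out and len(acks) >= n - f


def highest_seq(acks):
    """smax: the largest global seq# reported in the collected acks (assumed encoding)."""
    return max(seq for _view, seq in acks.values())


def is_missing_msgs(acks, s0):
    """The originator is missing msgs if smax exceeds its own expected global seq# s0."""
    return highest_seq(acks) > s0


acks = {1: (5, 100), 2: (5, 102), 3: (5, 101)}
new_members = [1, 2, 3, 4]

print(acks_complete(acks, new_members, n=5, f=1, timed_out=True))   # 3 acks < N - f = 4
print(acks_complete(acks, new_members, n=5, f=2, timed_out=True))   # 3 acks >= N - f = 3
print(highest_seq(acks), is_missing_msgs(acks, s0=100))
```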

Rotating Sequencer: Membership Change, Phase III
- The originator collects responses to its new membership view msg
  - If it receives positive responses from every node in the new view, it commits to the new view
  - Otherwise, it aborts, waits for a random amount of time, and retries with a larger view number

Rotating Sequencer: Membership Change Examples
Competing originators

Premature timeout

Network partitioning

Token Based GCS: Totem
- Totem consists of:
  - Total ordering protocol
  - Membership protocol
  - Recovery protocol
  - Flow control mechanisms
- Total ordering msg delivery types
  - Safe delivery: a message is delivered only when all correct processes have received it => uniform total ordering
  - Agreed delivery: a message is delivered as long as it is the next message in total order => nonuniform total ordering

Exercise 1: Quorum-based data replication
Consider five replicas, R1 to R5, with (value, version number) pairs R1 (0,0), R2 (0,0), R3 (0,0), R4 (0,0), R5 (0,0). The following sequence of read/write operations is performed on the replicas:
1. Read operation on R1, R2, R3
2. Write operation on R3, R4, R5 with value 2
3. Read operation on R4, R3, R2
4. Write operation on R4, R5, R1 with value 5
5. Write operation on R2, R4, R5 with value 7
Give the final values of R1 to R5 after all the above operations are performed.

Exercise 2: Quorum-based data replication
Data replication.
1. Assume we have a distributed system with 6 replicas. If the read quorum has a size of 3, what is the minimum size of the write quorum?
2. Assume a read quorum consists of 2 replicas. A read operation on data x is mapped to two replicas, one with value 2 and version number 1, and the other with value 3 and version number 2. Which value will be selected?

Exercise 3: Rotating Sequencer GCS
Consider a set of 6 nodes communicating with each other in a view with view number 5. The 6 nodes are N1 to N6, with ids 1 to 6 respectively. Up to 2 nodes can be faulty. Group communication proceeds in normal operation using sequencing.
1. If N4 broadcasts a message B1(5,4,30), how many sequencing messages must be received in response to B1 in order to commit the broadcast message?
2. If N2 times out on sequencer N1 and sends a view change request to all the nodes, what is the minimum number of positive responses that must be received to install the new view? Also, what will the new view number be?
3. N3 broadcasts a message B2(6,3,35) with N2 as the sequencer and receives sequencing messages SEQ(100,6,[3,35]), SEQ(101,6,[]), SEQ(102,6,[]), SEQ(103,6,[]). After this, N4 broadcasts a message B3 with N6 as the sequencer, and the local sequence number of N4 is 40. Provide the details of B3 and the next expected sequencing message SEQ.
