EEC 688/788 Secure and Dependable Computing Lecture 5 Wenbing Zhao Cleveland State University wenbing@ieee.org
EEC688 Midterm Result Average: 73.6, low: 34, high: 100 (2 of you!) P1-38.5/50, P2-15.3/20, P3-8.5/10, P4-11.4/20
Outline Checkpointing and logging System models Checkpoint-based protocols Uncoordinated checkpointing Coordinated checkpointing Logging-based protocols Pessimistic logging Optimistic logging Causal logging
Checkpointing and Logging: Checkpointing and logging are the most essential techniques to achieve dependability By themselves, they provide rollback recovery They are used for more sophisticated dependability schemes Checkpoint: a copy of the system state Can be used to recover the system to the state when the checkpoint was taken Checkpointing: the action of taking a copy of the system state, typically periodically Logging: log incoming/outgoing messages, etc.
Rollback Recovery vs. Rollforward Recovery
System Models Distributed system model Global state: consistent, inconsistent Distributed system model redefined Piecewise deterministic assumption Output commit Stable storage
System Models Distributed system A DS consists of N processes A process may interact with other processes only by means of sending and receiving messages A process may interact with another process within the DS, or a process in the outside world Fault Model: fail stop
System Models Process state Defined by its entire address space in the OS Relevant info can be captured by user-supplied APIs Global state The state of the entire distributed system Not a simple aggregation of the states of the processes
Capturing Global State Global state can be captured using a set of individual checkpoints Inconsistent state: the checkpoints reflect a message as received but not as sent
Capturing Global State: Example P0: bank account A, P1: bank account B m0: deposit $100 to B (after A has debited A) P0 takes checkpoint C0 before debit op P1 takes checkpoint C1 after depositing $100 Scenario: P0 crashes after sending m0, and P1 crashes after taking C1 If the global state is reconstructed based on C0 and C1, it would appear that P1 got $100 from nowhere
Capturing Global State: Example P0 takes checkpoint C0 after sending m0 (reflect debit of $100) P1 takes checkpoint C1 after depositing $100 Dependency of P0 and P1 is captured by C0 and C1 Global state can be reconstructed based on C0 and C1 correctly
Capturing Global State: Example P0 takes checkpoint C0 after sending m0 (reflect debit of $100) P1 takes checkpoint C1 before receiving m0 but after sending m1 P2 takes checkpoint C2 before receiving m1 If using C0, C1, C2 to reconstruct global state, it would appear that m0 is sent but not received Debit $100 from A, but not deposited to B However, the reconstructed global state is still regarded as consistent because this state could have happened: m0 and m1 are still in transit => channel state
Distributed System Model Redefined A distributed system consists of the following: A set of N processes Each process consists of a set of states and a set of events One of the states is the initial state The change of states is caused by an event A set of channels Each channel is a uni-directional reliable communication channel between two processes The state of a channel consists of the set of messages in transit in the channel A pair of neighboring processes are connected by a pair of channels, one in each direction. An event (such as the sending or receiving of a message) at a process may change the state of the process and the state of the channel it is associated with, if any
Back on the Global State Example Global state consists of C0, C1, and C2 Channel state from P0 to P1: m0 Channel state from P1 to P2: m1
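The consistency rule from the examples above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the `Checkpoint` record, the `analyze` function, and the `routes` table are hypothetical names, and each checkpoint is assumed to record the ids of the messages sent and received before it was taken.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical record: ids of messages sent/received before the checkpoint."""
    sent: set = field(default_factory=set)
    received: set = field(default_factory=set)


def analyze(checkpoints, routes):
    """checkpoints: {process: Checkpoint}; routes: {msg_id: (sender, receiver)}.
    Returns (is_consistent, channel_state)."""
    consistent = True
    channel_state = {}
    for msg, (src, dst) in routes.items():
        was_sent = msg in checkpoints[src].sent
        was_received = msg in checkpoints[dst].received
        if was_received and not was_sent:
            consistent = False  # the message appears to come from nowhere
        elif was_sent and not was_received:
            # sent but not yet received: the message is in transit
            channel_state.setdefault((src, dst), []).append(msg)
    return consistent, channel_state


routes = {"m0": ("P0", "P1"), "m1": ("P1", "P2")}

# First example: P0 checkpoints before the debit, P1 after the deposit.
bad = {"P0": Checkpoint(), "P1": Checkpoint(received={"m0"}), "P2": Checkpoint()}
# Third example: m0 and m1 have been sent but not yet received.
good = {"P0": Checkpoint(sent={"m0"}), "P1": Checkpoint(sent={"m1"}), "P2": Checkpoint()}

print(analyze(bad, routes))
print(analyze(good, routes))
```

In the second call the in-transit messages are returned as the channel state, which is why that set of checkpoints still counts as consistent.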
Piecewise Deterministic Assumption Using checkpoints to restore system state (after a crash) would mean that any execution after a checkpoint is lost Logging of events in between two checkpoints would ensure full recovery Piecewise deterministic assumption: All nondeterministic events can be identified Sufficient information (referred to as determinant) that can be used to recreate the event deterministically must be logged for each event Examples: receiving of a message, system calls, timeouts, etc. Note that the sending of a message is not a nondeterministic event (it is determined by another nondeterministic event or the initial state)
Output Commit Once a message is sent to the outside world, the state of the distributed system may be exposed to the outside world Should a failure occur, the outside world cannot be relied upon for recovery Output commit problem: To ensure that the recovered state is consistent with the external view, sufficient recovery information must be logged prior to the sending of a message to the outside world. A distributed system usually receives messages from, and sends messages to, the outside world E.g., the clients of the services provided by the distributed system
Stable Storage Checkpoints and events must be logged to stable storage that can survive failures for recovery Various forms of stable storage Redundant disks: RAID-1, RAID-5 Replicated file systems: GFS
Checkpoint-Based Protocols Uncoordinated protocols Coordinated protocols
Uncoordinated Checkpointing Uncoordinated checkpointing: full autonomy, appears to be simple. However, we do not recommend it for two reasons Checkpoints taken might not be useful to reconstruct a consistent global state Cascading rollback to the initial state (domino effect) To enable the selection of a set of consistent checkpoints during a recovery, the dependency of checkpoints has to be determined and recorded together with each checkpoint Extra overhead and complexity => not simple after all
Cascading Rollback Problem Last checkpoint: C1,1 by P1, before P1 crashed Cannot use C0,1 at P0 because it is inconsistent with C1,1 => P0 rolls back to C0,0 P2 would have to roll back to C2,1 because C0,0 does not reflect the sending of m9 Cannot use C2,1 at P2 because it fails to reflect the sending of m6 => P2 rolls back to C2,0 Cannot use C3,1 and C3,0 as a result => P3 rolls back to initial state
Cascading Rollback Problem The rollback of P3 to initial state would invalidate C2,0 => P2 rolls back to initial state P1 rolls back to C1,0 due to the rollback of P2 to initial state This would invalidate the use of C0,0 at P0 => P0 rolls back to initial state The rollback of P0 to initial state would invalidate the use of C1,0 at P1 => P1 rolls back to initial state
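The rollback propagation described above can be sketched as a fixed-point computation. This is a simplified illustration under stated assumptions, not the scenario in the figure: `recovery_line` is a hypothetical helper, every process is assumed to restart from a checkpoint, each checkpoint records the ids of messages sent and received so far, and index 0 stands for the initial state.

```python
def recovery_line(checkpoints, routes):
    """checkpoints: {proc: [(sent_ids, received_ids), ...]}, oldest first,
    with index 0 the initial state (both sets empty).
    routes: {msg_id: (sender, receiver)}.
    Returns {proc: index of the checkpoint to restore}."""
    # Start optimistically from the latest checkpoint of every process.
    line = {p: len(cps) - 1 for p, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for msg, (src, dst) in routes.items():
            sent = msg in checkpoints[src][line[src]][0]
            received = msg in checkpoints[dst][line[dst]][1]
            if received and not sent:
                # The receiver's checkpoint depends on a send that was
                # rolled back, so the receiver must roll back as well.
                line[dst] -= 1
                changed = True
    return line


# Two-process pattern: P1 checkpoints after receiving m1 but before sending m2;
# P0 checkpoints before sending m1 and again after receiving m2.
routes = {"m1": ("P0", "P1"), "m2": ("P1", "P0")}
checkpoints = {
    "P0": [(set(), set()), (set(), set()), ({"m1"}, {"m2"})],
    "P1": [(set(), set()), (set(), {"m1"})],
}
print(recovery_line(checkpoints, routes))
```

Here the loss of m2 invalidates the latest checkpoint of P0, which in turn invalidates the only checkpoint of P1, so P1 ends up at its initial state and P0 at a checkpoint that holds no more than its initial state.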
Tamir and Sequin Global Checkpointing Protocol One of the processes is designated as the coordinator Others are participants The coordinator uses a two-phase commit protocol for consistency on the checkpoints Global checkpointing is carried out atomically: all or nothing First phase: create a quiescent point of the distributed system Second phase: ensure the atomic switchover from old checkpoint to the new one
Tamir and Sequin Global Checkpointing Protocol Control messages for coordination CHECKPOINT message: initiate a global checkpoint & to create quiescent point SAVED message: to inform the coordinator that local checkpoint is done by participant FAULT message: a timeout occurred, global checkpointing should abort RESUME message: to inform participants that it is time to resume normal operation Sending a control message: broadcast to all
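The two phases and the four control messages can be sketched as a toy, single-threaded simulation. This is only an outline under simplifying assumptions: the `Participant` class and `global_checkpoint` function are hypothetical, a missing SAVED reply stands in for a timeout, and the logging of channel state and the relaying of control messages are left out.

```python
class Participant:
    def __init__(self, name, state, fails=False):
        self.name, self.state, self.fails = name, state, fails
        self.committed = None    # last permanent checkpoint
        self.tentative = None    # checkpoint being taken in this round
        self.suspended = False

    def on_checkpoint(self):
        """CHECKPOINT: stop normal operation and save a tentative checkpoint."""
        if self.fails:
            return None          # no reply; the coordinator would time out
        self.suspended = True
        self.tentative = dict(self.state)
        return "SAVED"

    def on_resume(self):
        """RESUME: switch over to the new checkpoint and continue."""
        self.committed, self.tentative = self.tentative, None
        self.suspended = False

    def on_fault(self):
        """FAULT: abort the round and keep the old checkpoint."""
        self.tentative = None
        self.suspended = False


def global_checkpoint(participants):
    # Phase 1: create the quiescent point and collect SAVED replies.
    replies = [p.on_checkpoint() for p in participants]
    ok = all(r == "SAVED" for r in replies)
    # Phase 2: all-or-nothing switchover.
    for p in participants:
        if ok:
            p.on_resume()
        else:
            p.on_fault()
    return ok


a = Participant("P1", {"x": 1})
b = Participant("P2", {"y": 2})
first = global_checkpoint([a, b])       # both save, new checkpoint committed

a.state["x"] = 5
c = Participant("P3", {"z": 3}, fails=True)
second = global_checkpoint([a, c])      # P3 never replies, round aborted
print(first, second, a.committed)
```

After the aborted round, P1 still holds the checkpoint from the first round, which is the all-or-nothing behavior the protocol aims for.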
Tamir and Sequin Global Checkpointing Protocol
Tamir and Sequin Global Checkpointing Protocol Relay is needed only for non-fully-connected topology
Tamir and Sequin Global Checkpointing Protocol: Example
Tamir and Sequin Global Checkpointing Protocol: Proof of Correctness The protocol produces consistent global state Proof: a consistent global state consists of only two scenarios: All msgs sent by one process prior to its taking a local checkpoint have been received prior to the other process taking its local checkpoint This is the case if no process sends any msg after the global checkpoint is initiated Some msgs sent by one process prior to its taking a local checkpoint might arrive after the other process has checkpointed its state, but they are logged for replay Msgs received after the initiation of global checkpointing are logged, but not executed, ensuring this property Note that if a process fails, the global checkpointing would abort
Chandy and Lamport Distributed Snapshot Protocol CL snapshot protocol is a nonblocking protocol TS checkpointing protocol is blocking CL protocol is more desirable for applications that do not wish to suspend normal operation However, CL protocol is only concerned with how to obtain a consistent global checkpoint CL Protocol: no coordinator, any node may initiate a global checkpointing Data structure Marker message: equivalent to the CHECKPOINT message Marker certificate: keeps track of whether a marker has been received from every incoming channel
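The marker rules of the CL protocol can be sketched as a small single-threaded simulation. This is an illustration under assumptions, not the figure from the slides: channels are modeled as FIFO queues, the process state is a single balance, and the `Process`, `build`, and `deliver` names are hypothetical. The `recording` set plays the role of the marker certificate.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance
        self.out = {}            # destination name -> FIFO channel
        self.snapshot = None     # recorded local state
        self.chan_state = {}     # source name -> msgs recorded as in transit
        self.recording = set()   # incoming channels with no marker yet

    def send(self, dst, amount):
        self.balance -= amount
        self.out[dst].append((self.name, amount))

    def take_snapshot(self, incoming):
        # Record local state, then send a marker on every outgoing channel.
        self.snapshot = self.balance
        self.recording = set(incoming)
        self.chan_state = {src: [] for src in incoming}
        for ch in self.out.values():
            ch.append((self.name, MARKER))

    def receive(self, src, payload, incoming):
        if payload == MARKER:
            if self.snapshot is None:      # first marker: checkpoint now
                self.take_snapshot(incoming)
            self.recording.discard(src)    # channel state for src is complete
        else:
            if src in self.recording:      # arrived after our checkpoint,
                self.chan_state[src].append(payload)  # before src's marker
            self.balance += payload        # normal operation continues


def build(names, balances):
    procs = {n: Process(n, b) for n, b in zip(names, balances)}
    chans = {(a, b): deque() for a in names for b in names if a != b}
    for (a, b), ch in chans.items():
        procs[a].out[b] = ch
    return procs, chans


def deliver(procs, chans, src, dst):
    sender, payload = chans[(src, dst)].popleft()
    incoming = [n for n in procs if n != dst]
    procs[dst].receive(sender, payload, incoming)


procs, chans = build(["P0", "P1", "P2"], [100, 100, 100])
procs["P1"].send("P0", 10)                  # this message is now in transit
procs["P0"].take_snapshot(["P1", "P2"])     # P0 initiates the snapshot
while any(chans.values()):
    for (src, dst), ch in chans.items():
        if ch:
            deliver(procs, chans, src, dst)

total = sum(p.snapshot for p in procs.values()) + sum(
    sum(msgs) for p in procs.values() for msgs in p.chan_state.values())
print({n: p.snapshot for n, p in procs.items()}, procs["P0"].chan_state, total)
```

The recorded process states plus the recorded channel states add up to the original 300, so no money is created or lost in the snapshot even though no process stopped working.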
CL Distributed Snapshot Protocol
Example P0 channel state: m0 (p1 to p0 channel) P2 channel state: empty
Comparison of TS & CL Protocols Similarity Both rely on control msgs to coordinate checkpointing Both capture channel state in virtually the same way Start logging channel state upon receiving the 1st checkpoint msg from another channel Stop logging channel state after receiving the checkpoint msg on the incoming channel Communication overhead similar
Comparison of TS & CL Protocols Differences: strategies in producing a global checkpoint TS protocol suspends normal operation upon 1st checkpoint msg while CL does not TS protocol captures channel state prior to taking a checkpoint, while CL captures channel state after taking a checkpoint TS protocol more complete and robust than CL Has fault handling mechanism
Log Based Protocols Work might be lost upon recovery using checkpoint-based protocols By logging messages, we may be able to recover the system to where it was prior to the failure System model: the execution of a process is modeled as a set of consecutive state intervals Each interval is initiated by a nondeterministic event or the initial state We assume the only type of nondeterministic event is receiving of a message
Log Based Protocols In practice, logging is always used together with checkpointing Limits the recovery time: start with the latest checkpoint instead of from the initial state Limits the size of the log: after taking a checkpoint, previously logged events can be purged Logging protocol types: Pessimistic logging: msgs are logged prior to execution Optimistic logging: msgs are logged asynchronously Causal logging: nondeterministic events that are not yet logged (to stable storage) are piggybacked with each msg sent For optimistic and causal logging, dependency of processes has to be tracked => more complexity, longer recovery time
Pessimistic Logging Synchronously log every incoming message to stable storage prior to execution Each process periodically checkpoints its state: no need for coordination Recovery: a process restores its state using the last checkpoint and replays all logged incoming msgs
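The log-then-execute discipline and the recovery by replay can be sketched as follows. This is a minimal sketch under assumptions: the `Account` class and its message format are hypothetical, a local file forced with `os.fsync` stands in for stable storage, and the checkpoint is returned as an in-memory dictionary for brevity, whereas a real one would also go to stable storage.

```python
import json
import os
import tempfile


class Account:
    """Hypothetical process whose entire state is one balance."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.balance = 0

    def on_message(self, msg):
        # Pessimistic: force the message to the log before executing it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.execute(msg)

    def execute(self, msg):
        self.balance += msg["amount"]

    def checkpoint(self):
        # After a checkpoint, previously logged messages can be purged.
        snapshot = {"balance": self.balance}
        open(self.log_path, "w").close()
        return snapshot

    @classmethod
    def recover(cls, log_path, snapshot):
        p = cls(log_path)
        p.balance = snapshot["balance"]
        with open(log_path) as f:
            for line in f:
                p.execute(json.loads(line))   # replay in the logged order
        return p


path = os.path.join(tempfile.mkdtemp(), "p0.log")
p = Account(path)
p.on_message({"amount": 100})
snap = p.checkpoint()
p.on_message({"amount": -30})
p.on_message({"amount": 5})
# Simulated crash: rebuild the state from the checkpoint plus the log only.
recovered = Account.recover(path, snap)
print(recovered.balance)
```

Recovery uses nothing but the local checkpoint and the local log, which is why no coordination with other processes is needed.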
Pessimistic Logging: Example Pessimistic logging can cope with concurrent failures and the recovery of two or more processes
Benefits of Pessimistic Logging Processes do not need to track their dependencies Logging mechanism is easy to implement and less error prone Output commit is automatically ensured No need to carry out coordinated global checkpointing By replaying the logged msgs, a process can always bring itself to be consistent with other processes Recovery can be done completely locally Only impact to other processes: duplicate msgs (can be discarded)
Pessimistic Logging: Discussion Reconnection A process must be able to cope with temporary connection failures and be ready to accept reconnections from other processes Application logic should be made independent from the transport level events: event-based or document-based computing paradigm Message duplicate detection Messages may be replayed during recovery => duplicate messages Transport level duplicate detection irrelevant. Must add mechanism in application level protocols, e.g., WS-ReliableMessaging Atomic message receiving and logging A process may fail right after the receiving of a message before it has a chance to log it to stable storage Need application-level reliable messaging mechanism
Application-Level Reliable Messaging Sender buffers each message sent until it receives an application-level ack
Application-Level Reliable Messaging Benefits of application-level reliable messaging Atomic message receiving and logging Facilitate distributed system recovery from process failures: enables reconnection Enables optimization: message received can be executed immediately and the logging can be deferred until another message is to be sent Logging and msg execution can be done concurrently If a process sends out a message after receiving several msgs, logging of msgs can be batched A process does not ack until it has logged the message No outgoing message => no impact to other processes
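The buffer-until-ack rule on the sender and the log-before-ack rule on the receiver can be sketched together. This is a simplified illustration: the `Sender` and `Receiver` classes are hypothetical, a Python list stands in for stable storage, and messages from the single sender are assumed to arrive in sequence order.

```python
class Sender:
    def __init__(self):
        self.next_seq = 1
        self.unacked = {}                 # seq -> payload, kept until acked

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def retransmit(self):
        return sorted(self.unacked.items())


class Receiver:
    def __init__(self, stable_log):
        self.stable_log = stable_log      # stands in for stable storage
        self.max_seq = max((seq for seq, _ in stable_log), default=0)

    def on_message(self, seq, payload):
        if seq > self.max_seq:            # duplicates are not logged again
            self.stable_log.append((seq, payload))   # log first ...
            self.max_seq = seq
        return seq                        # ... and only then acknowledge


log = []
s, r = Sender(), Receiver(log)
s.on_ack(r.on_message(*s.send("a")))      # normal case: logged, then acked
s.send("b")                               # receiver crashes before logging "b"

r = Receiver(log)                         # receiver restarts from its log
for seq, payload in s.retransmit():       # sender resends what was never acked
    s.on_ack(r.on_message(seq, payload))
print(log, s.unacked)
```

Because the ack is sent only after the message is logged, a message lost in a crash is still in the sender's buffer and is delivered again after reconnection.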
Sender Based Message Logging Basic idea Log the message at the sending side in volatile memory Should the receiving process fail, it could obtain the messages logged at the sending processes for recovery. To avoid restarting from the initial state after a failure, a process can periodically checkpoint its local state and write the message log in stable storage (as part of the checkpoint) asynchronously Tradeoff Relative ordering of messages must be explicitly supplied by the receiver to the sender (quite counter-intuitive!) The receiver must wait for an explicit ack for the ordering message before it sends any msgs to other processes (however, it can execute the message received immediately without delay) The mechanism is to prevent the formation of orphan messages and orphan processes
Orphan Message and Orphan Process An orphan message is one that was sent by a process prior to a failure, but cannot be guaranteed to be regenerated upon the recovery of the process An orphan process is a process that receives an orphan message If a process sends out a message and subsequently fails before the determinants of the messages it has received are properly logged, the message sent becomes an orphan message
Sender Based Message Logging Protocol: Data Structures A counter, seq_counter, used to assign a sequence number (using the current value of the counter) to each outgoing message Needed for duplicate detection A table for duplicate detection Each entry has the form <process_id, max_seq>, where max_seq is the maximum sequence number that the current process has received from a process with an identifier of process_id. A message is deemed a duplicate if it carries a sequence number lower than or equal to max_seq for the corresponding process Another counter, rsn_counter, used to record the receiving/execution order of an incoming message The counter is initialized to 0 and incremented by one for each message received
Sender Based Message Logging Protocol: Data Structures A message log (in volatile memory) for msgs sent by the process. In addition to the msg sent, the following metadata is also recorded: Destination process id: receiver_id Sending sequence number: seq Receiving sequence number: rsn A history list for the messages received since the last checkpoint. It is used to find the receiving order number for a duplicate msg. Upon receiving a duplicate message, the process should supply the corresponding (original) receiving order number so that the sender of the message can log such ordering information properly Each entry in the list has the following information: Sending process id: sender_id Receiving sequence number: rsn (assigned by the current process).
What Should be Checkpointed? All the data structures described above except the history list must be checkpointed together with the process state The two counters, one for assigning the message sequence number and the other for assigning the message receiving order, are needed so that the process can continue doing so upon recovery using the checkpoint The table for duplicate detection is needed for a similar reason. Why must the message log be checkpointed? The log is needed for the receiving processes to recover from a failure, and hence, cannot be garbage collected upon a checkpointing operation Additional mechanism is necessary to ensure that the message log does not grow indefinitely
Sender Based Message Logging Protocol: Message Types REGULAR: It is used for sending regular messages generated by the application process, and it has the form <REGULAR, seq, rsn, m> ORDER: It is used by the receiving process to notify the sending process of the receiving order of the message. An order message has the form <ORDER, [m], rsn>, where [m] is the message identifier consisting of a tuple <sender_id, receiver_id, seq> ACK: It is used by the sending process (of a regular message) to acknowledge the receipt of the order message. It has the form <ACK, [m]>
Sender Based Message Logging Protocol: Normal Operation The protocol operates in three steps for each message: A regular message, <REGULAR, seq, rsn, m>, is sent from one process, e.g., Pi, to another process, e.g., Pj. Process Pj determines the receiving/execution order, rsn, of the regular message and reports this determinant to Pi in an order message <ORDER, [m], rsn>. Process Pj waits until it has received the corresponding acknowledgment message, <ACK, [m]>, before it sends out any regular message.
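The three-step exchange can be sketched with the data structures named on the earlier slides. This is an illustration under assumptions rather than the protocol's reference code: the `Process` class and its method names are hypothetical, the sender id is passed alongside the message instead of through a transport, duplicate handling is left out, and the rsn counter is incremented before its value is assigned, so the first rsn is 1.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.seq_counter = 0
        self.rsn_counter = 0
        self.log = {}               # (receiver_id, seq) -> {"m", "rsn"}, volatile
        self.awaiting_ack = set()   # ids [m] whose ORDER has not been acked

    def send_regular(self, receiver_id, m):
        if self.awaiting_ack:
            return None             # step 3: no regular msg until acks arrive
        self.seq_counter += 1
        self.log[(receiver_id, self.seq_counter)] = {"m": m, "rsn": None}
        return ("REGULAR", self.seq_counter, None, m)

    def on_regular(self, sender_id, msg):
        _, seq, _, m = msg
        self.rsn_counter += 1       # step 2: decide the receiving order
        mid = (sender_id, self.pid, seq)
        self.awaiting_ack.add(mid)
        return ("ORDER", mid, self.rsn_counter)

    def on_order(self, msg):
        _, mid, rsn = msg
        _, receiver_id, seq = mid
        self.log[(receiver_id, seq)]["rsn"] = rsn   # determinant now logged
        return ("ACK", mid)

    def on_ack(self, msg):
        _, mid = msg
        self.awaiting_ack.discard(mid)


pi, pj = Process("Pi"), Process("Pj")
regular = pi.send_regular("Pj", "deposit 100")    # step 1
order = pj.on_regular("Pi", regular)              # step 2
blocked = pj.send_regular("Pi", "reply")          # Pj may not send yet
ack = pi.on_order(order)
pj.on_ack(ack)                                    # step 3 completes
reply = pj.send_regular("Pi", "reply")
print(order, ack, blocked, reply)
```

Holding back the reply until the ack arrives means that any message Pj sends depends only on determinants that are already logged at a sender, which is what prevents orphan messages.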
Sender Based Message Logging Protocol: Normal Operation, Example
Sender Based Message Logging Protocol: Recovery Mechanism On recovering from a failure, a process first restores its state using the latest local checkpoint, and then it must broadcast a request to all other processes in the system to retransmit all their logged messages that were sent to the process The recovering process also retransmits the regular messages or the ack messages in its own restored message log based on the following rule: If the entry in the log for a message contains no rsn value, then a regular message is retransmitted because the intended receiving process might not have received this message. If the entry in the log for a message contains a valid rsn value, then an ack message is sent so that the receiving process can send regular messages When a process receives a regular message, it always sends a corresponding order message in response
Actions upon Receiving a Regular Message A process always sends a corresponding order msg in response Three scenarios with recovery The msg is not a duplicate: the current rsn counter value is assigned to the msg and the order msg is sent. The process must wait until it receives the ack msg before it can send any regular msg The msg is a duplicate, and the corresponding rsn is found in the history list: actions are identical to above except rsn is not newly assigned The msg is a duplicate, and no rsn is found in the history list: the process must have checkpointed its state after receiving the msg and the msg is no longer needed for recovery. Hence, the order msg includes a special constant indicating so. The sender can then purge the msg in its log The recovering process may receive two types of retransmitted regular messages: Those with a valid rsn value: the rsn must be already part of the checkpoint. It executes the msg according to the order Those without: can assign the msg to any order
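The three scenarios for choosing the rsn in the order message can be sketched as one function. This is a hypothetical outline: the `Receiver` class is illustrative, `ALREADY_CHECKPOINTED` is a made-up name for the special constant, the history list is keyed by (sender_id, seq) as an assumption, and the wait for the ack is omitted.

```python
ALREADY_CHECKPOINTED = "ALREADY_CHECKPOINTED"   # hypothetical special constant


class Receiver:
    def __init__(self, pid):
        self.pid = pid
        self.rsn_counter = 0
        self.max_seq = {}    # sender_id -> highest seq received from it
        self.history = {}    # (sender_id, seq) -> rsn, since last checkpoint

    def on_regular(self, sender_id, seq):
        mid = (sender_id, self.pid, seq)
        if seq > self.max_seq.get(sender_id, 0):
            # Scenario 1: not a duplicate, assign a new rsn.
            self.max_seq[sender_id] = seq
            self.rsn_counter += 1
            self.history[(sender_id, seq)] = self.rsn_counter
            return ("ORDER", mid, self.rsn_counter)
        rsn = self.history.get((sender_id, seq))
        if rsn is not None:
            # Scenario 2: duplicate, report the original rsn again.
            return ("ORDER", mid, rsn)
        # Scenario 3: duplicate already covered by a checkpoint.
        return ("ORDER", mid, ALREADY_CHECKPOINTED)

    def checkpoint(self):
        # The history list only covers messages since the last checkpoint.
        self.history.clear()


pj = Receiver("Pj")
first = pj.on_regular("Pi", 1)
again = pj.on_regular("Pi", 1)
pj.checkpoint()
late = pj.on_regular("Pi", 1)
print(first, again, late)
```

Only the first call assigns a new rsn; the retransmissions either repeat it or tell the sender that the message can be purged from its log.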
Limitations of Sender Based Msg Logging Protocol Won’t work in the presence of 2 or more concurrent failures Determinant for some regular msgs (i.e., rsn) might be lost => orphan processes and cascading rollbacks P2 may become an orphan process if P0 and P1 both crash: received mt that no one has sent
Truncating Sender’s Message Log Once a process completes a local checkpoint, it broadcasts a message containing the highest rsn value for the messages that it has executed prior to the checkpoint. All messages sent by other processes to this process that were assigned a value that is smaller than or equal to this rsn value can now be purged from their message logs (including those in stable storage as part of a checkpoint) Alternatively, this highest rsn value can be piggybacked with each message (regular or control messages) sent to another process to enable asynchronous purging of the logged messages that are no longer needed
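The purge rule can be sketched as a small helper on the sender's side. This is an illustration with assumed names: `purge_log` is hypothetical, and the log is modeled as a dictionary keyed by (receiver_id, seq) whose entries carry the rsn reported by the receiver, or None if no order message has arrived yet.

```python
def purge_log(log, receiver_id, highest_rsn):
    """Drop entries sent to receiver_id whose rsn is covered by its checkpoint.
    log: {(receiver_id, seq): {"m": payload, "rsn": int or None}}"""
    covered = [
        key for key, entry in log.items()
        if key[0] == receiver_id
        and entry["rsn"] is not None
        and entry["rsn"] <= highest_rsn
    ]
    for key in covered:
        del log[key]
    return len(covered)


log = {
    ("Pj", 1): {"m": "a", "rsn": 3},
    ("Pj", 2): {"m": "b", "rsn": 7},
    ("Pj", 3): {"m": "c", "rsn": None},   # receiving order not yet known
    ("Pk", 1): {"m": "d", "rsn": 2},      # sent to a different process
}
removed = purge_log(log, "Pj", highest_rsn=5)   # Pj checkpointed up to rsn 5
print(removed, sorted(log))
```

Entries without an rsn and entries for other receivers are kept, since the broadcast says nothing about them.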
Exercise 1. Identify the set of most recent checkpoints that can be used to recover the system shown here after the crash of P1 11/30/2018 EEC688: Secure & Dependable Computing Wenbing Zhao
Exercise 2. The Chandy and Lamport distributed snapshot protocol is used to produce a consistent global state of the system shown below. Draw all control msgs sent in the CL protocol, the checkpoints taken at P1 and P2, and specify the channel state for the P0 to/from P1 channels, the P1 to/from P2 channels, and P2 to/from P0 channels