1
EEC 688/788 Secure and Dependable Computing Lecture 5 Wenbing Zhao Cleveland State University wenbing@ieee.org
2
Outline
- Checkpointing and logging
- System models
- Checkpoint-based protocols: uncoordinated checkpointing, coordinated checkpointing
- Logging-based protocols: pessimistic logging, optimistic logging, causal logging
3
EEC688 Midterm Result
Average: 83.5, low: 60, high: 100 (4 of you!)
P1: 42/50, P2: 17.4/20, P3: 9.5/10, P4: 4.4/10, P5: 10/10
4
Checkpointing and Logging
- Checkpointing and logging are the most essential techniques for achieving dependability
- By themselves, they provide rollback recovery; they also serve as building blocks for more sophisticated dependability schemes
- Checkpoint: a copy of the system state, which can be used to restore the system to the state at the time the checkpoint was taken
- Checkpointing: the act of taking a copy of the system state, typically periodically
- Logging: recording incoming/outgoing messages, etc.
5
Rollback Recovery vs. Rollforward Recovery
6
System Models
- Distributed system model
- Global state: consistent, inconsistent
- Distributed system model redefined
- Piecewise deterministic assumption
- Output commit
- Stable storage
7
System Models: Distributed System
- A distributed system (DS) consists of N processes
- A process may interact with other processes only by sending and receiving messages
- A process may interact with another process within the DS, or with a process in the outside world
- Fault model: fail-stop
8
System Models: Process State and Global State
- Process state: defined by the process's entire address space in the OS; the relevant information can also be captured via user-supplied APIs
- Global state: the state of the entire distributed system; it is not a simple aggregation of the states of the processes
9
Capturing Global State
- The global state can be captured using a set of individual checkpoints
- Inconsistent state: a checkpoint reflects a message as received that no checkpoint reflects as sent
10
Capturing Global State: Example
- P0: bank account A, P1: bank account B; m0: deposit $100 to B (after $100 has been debited from A)
- P0 takes checkpoint C0 before the debit operation; P1 takes checkpoint C1 after depositing the $100
- Scenario: P0 crashes after sending m0, and P1 crashes after taking C1
- If the global state is reconstructed from C0 and C1, it would appear that P1 got $100 from nowhere
11
Capturing Global State: Example
- P0 takes checkpoint C0 after sending m0 (reflecting the debit of $100); P1 takes checkpoint C1 after depositing the $100
- The dependency between P0 and P1 is captured by C0 and C1, so the global state can be reconstructed correctly from C0 and C1
12
Capturing Global State: Example
- P0 takes checkpoint C0 after sending m0 (reflecting the debit of $100)
- P1 takes checkpoint C1 before receiving m0 but after sending m1
- P2 takes checkpoint C3 before receiving m1
- If C0, C1, and C3 are used to reconstruct the global state, it would appear that m0 was sent but not received: $100 was debited from A but not yet deposited to B
- However, the reconstructed global state is still regarded as consistent, because this state could have happened: m0 and m1 are simply still in transit => channel state
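The sent/received rule in the examples above can be sketched in code. This is an illustrative helper of my own (the function name and message-set representation are not from the slides): a cut is inconsistent if some message appears received but never sent; a message sent but not yet received is merely channel state.

```python
# Sketch (names are mine, for illustration): deciding whether a set of
# checkpoints forms a consistent global state. A message recorded as
# received but nowhere recorded as sent makes the cut inconsistent; one
# sent but not yet received is simply channel state.
def classify_cut(sent, received):
    orphans = received - sent       # received but never sent
    in_transit = sent - received    # still in the channel
    if orphans:
        return ("inconsistent", orphans)
    return ("consistent", in_transit)

# First banking example: C0 predates the send of m0, C1 reflects its
# receipt -> inconsistent.
assert classify_cut(set(), {"m0"}) == ("inconsistent", {"m0"})
# Later example: m0 sent but not yet received -> consistent; m0 is
# channel state.
assert classify_cut({"m0"}, set()) == ("consistent", {"m0"})
```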
13
Distributed System Model Redefined
A distributed system consists of the following:
- A set of N processes. Each process consists of a set of states (one of which is the initial state) and a set of events; the change of state is caused by an event
- A set of channels. Each channel is a unidirectional, reliable communication channel between two processes; the state of a channel consists of the set of messages in transit in the channel. A pair of neighboring processes are connected by a pair of channels, one in each direction
An event (such as the sending or receiving of a message) at a process may change the state of the process and the state of the channel it is associated with, if any
14
Back to the Global State Example
- The global state consists of C0, C1, and C2
- Channel state from P0 to P1: m0
- Channel state from P1 to P2: m1
15
Piecewise Deterministic Assumption
- Using checkpoints alone to restore the system state after a crash means that any execution after the checkpoint is lost
- Logging the events between two checkpoints ensures full recovery
- Piecewise deterministic assumption: all nondeterministic events can be identified, and for each event, sufficient information (referred to as its determinant) must be logged so that the event can be recreated deterministically
- Examples: the receiving of a message, system calls, timeouts, etc.
- Note that the sending of a message is not a nondeterministic event (it is determined by another nondeterministic event or by the initial state)
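The assumption can be illustrated with a tiny replay sketch (the transition function is an arbitrary stand-in, not from the slides): once the determinants of all nondeterministic events are logged, replaying them through the same deterministic code reproduces the exact pre-crash state.

```python
# Sketch: under the piecewise deterministic assumption, a process whose
# nondeterministic events (here, message receipts) were logged can be
# replayed deterministically to reproduce the exact same state.
def run(determinants, state=0):
    for d in determinants:
        state = state * 2 + d   # any deterministic transition function
    return state

log = [3, 1, 4]                 # determinants recorded during execution
assert run(log) == run(log)     # replaying the log is fully deterministic
```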
16
Output Commit
- Once a message is sent to the outside world, the state of the distributed system may be exposed to the outside world
- Should a failure occur, the outside world cannot be relied upon for recovery
- Output commit problem: to ensure that the recovered state is consistent with the external view, sufficient recovery information must be logged prior to sending a message to the outside world
17
Stable Storage
- Checkpoints and event logs must be kept in stable storage that can survive failures, for use during recovery
- Various forms of stable storage: redundant disks (RAID-1, RAID-5), replicated file systems (GFS)
18
Checkpoint-Based Protocols
- Uncoordinated protocols
- Coordinated protocols
19
Uncoordinated Checkpointing
- Uncoordinated checkpointing offers full autonomy and appears simple. However, we do not recommend it, for two reasons:
- The checkpoints taken might not be useful for reconstructing a consistent global state
- Rollback may cascade all the way back to the initial state (the domino effect)
- To enable the selection of a consistent set of checkpoints during recovery, the dependencies among checkpoints must be determined and recorded together with each checkpoint
- Extra overhead and complexity => not so simple after all
20
Cascading Rollback Problem
- Last checkpoint: C1,1, taken by P1 before P1 crashed
- C0,1 at P0 cannot be used because it is inconsistent with C1,1 => P0 rolls back to C0,0
- P2 would have to roll back to C2,1 because C0,0 does not reflect the sending of m9
- C2,1 at P2 cannot be used because it fails to reflect the sending of m6 => P2 rolls back to C2,0
- C3,1 and C3,0 cannot be used as a result => P3 rolls back to its initial state
21
Cascading Rollback Problem
- The rollback of P3 to its initial state invalidates C2,0 => P2 rolls back to its initial state
- P1 rolls back to C1,0 due to the rollback of P2 to its initial state
- This invalidates the use of C0,0 at P0 => P0 rolls back to its initial state
- The rollback of P0 to its initial state invalidates the use of C1,0 at P1 => P1 rolls back to its initial state
22
Tamir and Sequin Global Checkpointing Protocol
- One of the processes is designated as the coordinator; the others are participants
- The coordinator uses a two-phase commit protocol to reach agreement on the checkpoints
- Global checkpointing is carried out atomically: all or nothing
- First phase: create a quiescent point of the distributed system
- Second phase: ensure the atomic switchover from the old checkpoint to the new one
23
Tamir and Sequin Global Checkpointing Protocol
Control messages for coordination:
- CHECKPOINT: initiates a global checkpoint and creates the quiescent point
- SAVED: informs the coordinator that a participant has completed its local checkpoint
- FAULT: a timeout occurred; global checkpointing should abort
- RESUME: informs participants that it is time to resume normal operation
Each control message is broadcast to all processes
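The two-phase flow of these control messages can be simulated in a few lines. This is a hypothetical, highly simplified sketch: the message names come from the slide, but the sequential in-memory "network" and the `fail` parameter are my own illustration, not the real protocol machinery.

```python
# Hypothetical, simplified simulation of the Tamir-Sequin control-message
# flow (message names from the slide; the sequential "network" here is an
# assumption for illustration).
def global_checkpoint(participants, fail=None):
    trace = ["CHECKPOINT"]                  # phase 1: quiesce and save
    for p in participants:
        if p == fail:
            trace.append("FAULT")           # timeout -> abort; keep old checkpoint
            return trace, False
        trace.append(f"SAVED:{p}")          # participant finished local checkpoint
    trace.append("RESUME")                  # phase 2: atomic switchover, resume
    return trace, True

trace, ok = global_checkpoint(["P1", "P2"])
# trace == ["CHECKPOINT", "SAVED:P1", "SAVED:P2", "RESUME"], ok == True
trace, ok = global_checkpoint(["P1", "P2"], fail="P2")
# trace == ["CHECKPOINT", "SAVED:P1", "FAULT"], ok == False
```

The all-or-nothing property of the slide shows up directly: on any FAULT the old checkpoint stays in effect, and only after every SAVED does RESUME switch over.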
24
Tamir and Sequin Global Checkpointing Protocol
25
Tamir and Sequin Global Checkpointing Protocol
26
Tamir and Sequin Global Checkpointing Protocol: Example
27
Tamir and Sequin Global Checkpointing Protocol: Proof of Correctness
Claim: the protocol produces a consistent global state
Proof: a consistent global state admits only two scenarios:
- All messages sent by a process prior to taking its local checkpoint have been received before the other process takes its local checkpoint. This is the case if no process sends any message after the global checkpoint is initiated
- Some messages sent by a process prior to taking its local checkpoint might arrive after the other process has checkpointed its state, but they are logged for replay. Messages received after the initiation of global checkpointing are logged but not executed, ensuring this property
Note that if a process fails, the global checkpointing aborts
28
Chandy and Lamport Distributed Snapshot Protocol
- The CL snapshot protocol is nonblocking, whereas the TS checkpointing protocol is blocking; the CL protocol is thus more desirable for applications that do not wish to suspend normal operation
- However, the CL protocol is concerned only with how to obtain a consistent global checkpoint
- The CL protocol has no coordinator: any node may initiate a global checkpointing
- Data structures:
- Marker message: equivalent to the CHECKPOINT message
- Marker certificate: tracks whether a marker has been received from every incoming channel
29
CL Distributed Snapshot Protocol
30
Example
- P0 channel state: m0 (P1-to-P0 channel)
- P1 channel state: m1 (P2-to-P1 channel)
- P2 channel state: empty
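The marker rules behind this example can be sketched for a single process. This is a minimal illustration of my own (class and channel names like "from_P0" are invented; sending markers on outgoing channels is elided): the first marker makes the process record its own state and start recording other incoming channels; a later marker on a channel finalizes that channel's recorded state.

```python
# Minimal sketch (not from the slides) of the Chandy-Lamport marker
# rules at a single process; channel names are illustrative.
class Process:
    def __init__(self, pid, state):
        self.pid, self.state = pid, state
        self.snapshot = None    # local state recorded for the snapshot
        self.recording = {}     # incoming channel -> messages being recorded
        self.done = {}          # incoming channel -> final recorded channel state

    def on_marker(self, channel, in_channels):
        if self.snapshot is None:
            # First marker: record own state; the marker's channel is empty,
            # and every other incoming channel starts recording.
            self.snapshot = self.state
            self.done[channel] = []
            self.recording = {ch: [] for ch in in_channels if ch != channel}
            # (A real process would now send a marker on each outgoing channel.)
        else:
            # Later marker: stop recording that channel.
            self.done[channel] = self.recording.pop(channel)

    def on_message(self, channel, msg):
        if channel in self.recording:
            self.recording[channel].append(msg)  # part of channel state
        self.state = msg                         # normal processing

p = Process("P1", state="s0")
p.on_marker("from_P0", in_channels=["from_P0", "from_P2"])
p.on_message("from_P2", "m1")   # arrives before the marker on from_P2
p.on_marker("from_P2", in_channels=["from_P0", "from_P2"])
# p.snapshot == "s0"; p.done == {"from_P0": [], "from_P2": ["m1"]}
```

Note how m1, delivered between the two markers, ends up recorded as channel state rather than lost, matching the example above.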
31
Comparison of TS & CL Protocols
Similarities:
- Both rely on control messages to coordinate checkpointing
- Both capture channel state in virtually the same way: start logging channel state upon receiving the first checkpoint/marker message from another channel, and stop logging a channel after receiving the checkpoint/marker message on that incoming channel
- Similar communication overhead
32
Comparison of TS & CL Protocols
Differences: strategies for producing a global checkpoint
- The TS protocol suspends normal operation upon the first checkpoint message, while the CL protocol does not
- The TS protocol captures channel state prior to taking a checkpoint, while the CL protocol captures channel state after taking a checkpoint
- The TS protocol is more complete and robust than the CL protocol: it has a fault-handling mechanism
33
Log-Based Protocols
- With checkpoint-based protocols, work might be lost upon recovery; by logging messages, we may be able to recover the system to where it was just prior to the failure
- System model: the execution of a process is modeled as a set of consecutive state intervals, each initiated by a nondeterministic event or by the initial state
- We assume the only type of nondeterministic event is the receiving of a message
34
Log-Based Protocols
- In practice, logging is always used together with checkpointing: checkpointing limits the recovery time (recovery starts from the latest checkpoint instead of from the initial state) and limits the size of the log (after a checkpoint is taken, previously logged events can be purged)
- Logging protocol types:
- Pessimistic logging: messages are logged prior to execution
- Optimistic logging: messages are logged asynchronously
- Causal logging: nondeterministic events that have not yet been logged to stable storage are piggybacked on each message sent
- For optimistic and causal logging, the dependencies among processes have to be tracked => more complexity and longer recovery time
35
Pessimistic Logging
- Synchronously log every incoming message to stable storage prior to execution
- Each process periodically checkpoints its state; no coordination is needed
- Recovery: a process restores its state from the last checkpoint and replays all logged incoming messages
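The log-then-execute discipline and the replay-based recovery can be sketched directly. This is an illustrative toy (the list standing in for stable storage, and "execute" meaning appending to a state list, are my own simplifications):

```python
# Sketch (illustrative, not the book's code): pessimistic logging
# appends each incoming message to a simulated stable log BEFORE
# executing it, so recovery = restore last checkpoint + replay the log.
stable_log = []          # stands in for stable storage

def receive(msg, state):
    stable_log.append(msg)     # log synchronously first...
    return state + [msg]       # ...then execute (here: append to state)

state = []
for m in ["m1", "m2", "m3"]:
    state = receive(m, state)

# Recovery after a crash: restore the checkpoint (the initial empty
# state here) and replay every logged message in order.
recovered = []
for m in stable_log:
    recovered = recovered + [m]
assert recovered == state      # the pre-crash state is reproduced
```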
36
Pessimistic Logging: Example
Pessimistic logging can cope with concurrent failures and the recovery of two or more processes
37
Benefits of Pessimistic Logging
- Processes do not need to track their dependencies
- The logging mechanism is easy to implement and less error-prone
- Output commit is automatically ensured
- No need to carry out coordinated global checkpointing: by replaying the logged messages, a process can always bring itself to a state consistent with other processes
- Recovery can be done completely locally; the only impact on other processes is duplicate messages (which can be discarded)
38
Pessimistic Logging: Discussion
- Reconnection: a process must be able to cope with temporary connection failures and be ready to accept reconnections from other processes. Application logic should be made independent of transport-level events: an event-based or document-based computing paradigm
- Message duplicate detection: messages may be replayed during recovery => duplicate messages. Transport-level duplicate detection is irrelevant here; a mechanism must be added in application-level protocols, e.g., WS-ReliableMessaging
- Atomic message receiving and logging: a process may fail right after receiving a message, before it has a chance to log it to stable storage => need an application-level reliable messaging mechanism
39
Application-Level Reliable Messaging
The sender buffers each message sent until it receives an application-level acknowledgment
40
Application-Level Reliable Messaging
Benefits of application-level reliable messaging:
- Atomic message receiving and logging
- Facilitates recovery of the distributed system from process failures: enables reconnection
- Enables optimization: a received message can be executed immediately, and its logging can be deferred until another message is about to be sent
- Logging and message execution can be done concurrently
- If a process sends out a message only after receiving several messages, the logging of those messages can be batched
- A process does not acknowledge a message until it has logged it; with no outgoing message, there is no impact on other processes
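The sender side of this scheme can be sketched as a small buffer keyed by sequence number. Class and method names here are my own illustration (not an API from the slides): a message stays buffered until the application-level ack arrives, so anything unacked can be retransmitted after a reconnection.

```python
# Sketch of application-level reliable messaging on the sender side
# (class and method names are invented for illustration): each message
# is buffered until an application-level ack arrives.
class Sender:
    def __init__(self):
        self.unacked = {}      # seq -> message, awaiting app-level ack
        self.seq = 0

    def send(self, msg):
        self.seq += 1
        self.unacked[self.seq] = msg   # buffer until acked
        return self.seq                # (seq, msg) would go on the wire

    def on_ack(self, seq):
        self.unacked.pop(seq, None)    # receiver has logged it; safe to drop

    def retransmit(self):
        return sorted(self.unacked.items())  # resend after a reconnection

s = Sender()
a = s.send("m1")
s.send("m2")
s.on_ack(a)                 # receiver logged m1
# s.retransmit() == [(2, "m2")] -- only the unacked message is resent
```

Because the receiver acks only after logging, a crash between receipt and logging simply leaves the message unacked at the sender, which restores the atomic receive-and-log property discussed above.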
41
Sender-Based Message Logging
Basic idea:
- Log each message at the sending side in volatile memory; should the receiving process fail, it can obtain the messages logged at the sending processes for recovery
- To avoid restarting from the initial state after a failure, a process can periodically checkpoint its local state and write the message log to stable storage (as part of the checkpoint) asynchronously
Tradeoffs:
- The relative ordering of messages must be explicitly supplied by the receiver to the sender (quite counter-intuitive!)
- The receiver must wait for an explicit acknowledgment of the ordering message before it sends any messages to other processes (however, it can execute the received message immediately, without delay)
- This mechanism prevents the formation of orphan messages and orphan processes
42
Orphan Message and Orphan Process
- An orphan message is one that was sent by a process prior to a failure but cannot be guaranteed to be regenerated upon the recovery of that process
- An orphan process is a process that has received an orphan message
- If a process sends out a message and subsequently fails before the determinants of the messages it has received are properly logged, the message it sent becomes an orphan message
43
Sender-Based Message Logging Protocol: Data Structures
- A counter, seq_counter, used to assign a sequence number (the current value of the counter) to each outgoing message; needed for duplicate detection
- A table for duplicate detection. Each entry has the form (process_id, max_seq), where max_seq is the maximum sequence number that the current process has received from the process with identifier process_id. A message is deemed a duplicate if it carries a sequence number lower than or equal to max_seq for the corresponding process
- Another counter, rsn_counter, used to record the receiving/execution order of incoming messages; initialized to 0 and incremented by one for each message received
44
Sender-Based Message Logging Protocol: Data Structures
- A message log (in volatile memory) for messages sent by the process. In addition to each message sent, the following metadata is recorded: destination process id (receiver_id), sending sequence number (seq), and receiving sequence number (rsn)
- A history list for the messages received since the last checkpoint, used to find the receiving order number for a duplicate message. Upon receiving a duplicate message, the process supplies the corresponding (original) receiving order number so that the sender of the message can log the ordering information properly. Each entry in the list contains: sending process id (sender_id), sending sequence number (seq), and receiving sequence number (rsn, assigned by the current process)
45
What Should Be Checkpointed?
- All the data structures described above, except the history list, must be checkpointed together with the process state
- The two counters (one for assigning the message sequence number, the other for assigning the message receiving order) are needed so that the process can continue assigning them upon recovery from the checkpoint; the duplicate-detection table is needed for a similar reason
- Why must the message log be checkpointed? The log is needed for the receiving processes to recover from a failure and hence cannot be garbage-collected upon a checkpointing operation
- An additional mechanism is necessary to ensure that the message log does not grow indefinitely
46
Sender-Based Message Logging Protocol: Message Types
- REGULAR: used for sending regular messages generated by the application process; carries the sending sequence number seq along with the message payload
- ORDER: used by the receiving process to notify the sending process of the receiving order of a message; carries the message identifier [m] (a tuple of the sending process id and seq) together with the assigned rsn
- ACK: used by the sending process (of a regular message) to acknowledge the receipt of the ORDER message
47
Sender-Based Message Logging Protocol: Normal Operation
The protocol operates in three steps for each message:
- A regular message is sent from one process, say Pi, to another process, say Pj
- Process Pj determines the receiving/execution order, rsn, of the regular message and sends this determinant information to Pi in an ORDER message
- Process Pj waits until it has received the corresponding ACK message before it sends out any regular message of its own
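The three steps above can be sketched as one round of message exchange. This is a compressed illustration of my own (the dict stands in for Pi's volatile log, and the network exchange is collapsed into a single function), showing how the rsn travels from receiver back to sender:

```python
# Sketch of one round of sender-based message logging between Pi and Pj
# (field names follow the slides; the dict stands in for Pi's volatile log).
def sbml_round(sender_log, seq, data, receiver_rsn):
    # Step 1: Pi sends a REGULAR message and logs it, rsn still unknown.
    sender_log[seq] = {"data": data, "rsn": None}
    # Step 2: Pj assigns the receive sequence number and returns it in
    # an ORDER message (Pj may execute the message right away).
    rsn = receiver_rsn + 1
    # Step 3: Pi records the rsn in its log and replies with an ACK;
    # only after the ACK may Pj send regular messages of its own.
    sender_log[seq]["rsn"] = rsn
    return rsn

log = {}
rsn = sbml_round(log, seq=1, data="m", receiver_rsn=0)
# rsn == 1 and log[1] == {"data": "m", "rsn": 1}
```

After step 3, Pi's log holds both the message and its receive order, which is exactly what the recovery mechanism below relies on.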
49
Sender-Based Message Logging Protocol: Recovery Mechanism
- On recovering from a failure, a process first restores its state using its latest local checkpoint, and then broadcasts a request to all other processes in the system to retransmit all their logged messages that were sent to it
- The other processes retransmit regular messages or ACK messages based on the following rules:
- If the log entry for a message contains no rsn value, a regular message is retransmitted, because the intended receiving process might not have received it
- If the log entry for a message contains a valid rsn value, an ACK message is sent so that the receiving process can send regular messages
- When a process receives a regular message, it always sends a corresponding ORDER message in response
50
Actions upon Receiving a Regular Message
A process always sends a corresponding ORDER message in response. Three scenarios arise during recovery:
- The message is not a duplicate: the current rsn counter value is assigned to the message and an ORDER message is sent. The process must wait until it receives the ACK message before it can send any regular message
- The message is a duplicate and the corresponding rsn is found in the history list: actions are identical to the above, except that the rsn is not newly assigned
- The message is a duplicate and no rsn is found in the history list: the process must have checkpointed its state after receiving the message, so the message is no longer needed for recovery. The ORDER message includes a special constant indicating so, and the sender can then purge the message from its log
The recovering process may receive two types of retransmitted regular messages:
- Those with a valid rsn value: the rsn must already be part of the checkpoint, so the process executes the message in that order
- Those without: the process can assign the message any order
51
Limitations of the Sender-Based Message Logging Protocol
- It does not work in the presence of two or more concurrent failures: the determinant (i.e., rsn) for some regular messages might be lost => orphan processes and cascading rollbacks
- Example: P2 may become an orphan process if P0 and P1 both crash: it has received mt, a message that no surviving process is guaranteed to regenerate
52
Truncating the Sender's Message Log
- Once a process completes a local checkpoint, it broadcasts a message containing the highest rsn value among the messages it executed prior to the checkpoint. All messages sent to this process by other processes that were assigned an rsn smaller than or equal to this value can now be purged from their message logs (including copies in stable storage as part of a checkpoint)
- Alternatively, this highest rsn value can be piggybacked on each message (regular or control) sent to another process, to enable asynchronous purging of logged messages that are no longer needed
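The purge rule can be expressed as a single filter over a sender's log. This is an illustrative sketch of my own (the log representation is invented): entries for the checkpointing receiver whose rsn is at or below the broadcast bound are dropped; entries without an rsn, or for other receivers, are kept.

```python
# Sketch of sender-side log truncation: after checkpointing, the receiver
# broadcasts the highest rsn it executed before the checkpoint; senders
# purge every logged message to that receiver with rsn <= that bound.
def truncate(sender_log, receiver_id, checkpointed_rsn):
    return {seq: e for seq, e in sender_log.items()
            if not (e["receiver"] == receiver_id
                    and e["rsn"] is not None
                    and e["rsn"] <= checkpointed_rsn)}

log = {1: {"receiver": "P1", "rsn": 1},
       2: {"receiver": "P1", "rsn": 3},
       3: {"receiver": "P2", "rsn": 1}}
log = truncate(log, "P1", checkpointed_rsn=2)
# entry 1 is purged; entries 2 and 3 remain
```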
53
Exercise 1. Identify the set of most recent checkpoints that can be used to recover the system shown here after the crash of P1
54
Exercise 2. The Chandy and Lamport distributed snapshot protocol is used to produce a consistent global state of the system shown below. Draw all control messages sent in the CL protocol and the checkpoints taken at P1 and P2, and specify the channel state for the P0 to/from P1 channels, the P1 to/from P2 channels, and the P2 to/from P0 channels