CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.


Checkpoints Many fault-tolerant systems need to create and recover from some form of checkpoint Many systems use “transactions”, our main topic next week. These systems can be understood as periodically entering a checkpoint state A common method for dealing with failure is to simply restart the program from a checkpoint For long-running scientific computing, checkpoint creation often uses compiler techniques An important issue if your program will run for weeks!

Common approach Periodically make a “big” checkpoint Then can more frequently make an incremental addition to it For example: the checkpoint could be copies of some files or of a database Looking ahead, the incremental data could be “operations run on the database since the last transaction finished (committed)”
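A minimal sketch of the "big checkpoint plus incremental additions" idea, assuming a small key-value store. The class and method names (`CheckpointedStore`, `take_checkpoint`, `recover`) are hypothetical, and both the checkpoint and the log are held in memory here, whereas a real system would write them to stable storage.

```python
import json


class CheckpointedStore:
    """Key-value store with a full checkpoint plus an incremental operation log."""

    def __init__(self):
        self.data = {}
        self.checkpoint = json.dumps({})  # last "big" checkpoint, serialized
        self.log = []                     # operations applied since that checkpoint

    def put(self, key, value):
        self.log.append(("put", key, value))  # cheap incremental record
        self.data[key] = value

    def delete(self, key):
        self.log.append(("delete", key, None))
        self.data.pop(key, None)

    def take_checkpoint(self):
        self.checkpoint = json.dumps(self.data)  # expensive full copy
        self.log.clear()                         # the log restarts from here

    def recover(self):
        """Rebuild state: load the big checkpoint, then replay the log."""
        state = json.loads(self.checkpoint)
        for op, key, value in self.log:
            if op == "put":
                state[key] = value
            else:
                state.pop(key, None)
        return state
```

Recovery cost is then one checkpoint load plus a replay whose length depends on how long ago the last big checkpoint was taken.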

What needs to be in a checkpoint? Scientific computing system might have massive data structures while it runs But perhaps not all needs to be “checkpointed” For example, if it uses big but unchanging tables, why write them out? In general, checkpoint only needs to include data that can’t be “regenerated” in some simple, quick way

Checkpoints have other uses We discussed two styles of air traffic control systems Recall that in the French system, normal programs can join “groups” within which data is replicated One group per pattern of replication Data might be, e.g., “information about ATC sector D-9”

Checkpoints Programs r, s, t join group for ATC sector “D-9” Group members replicate the associated data [Figure: timelines for p, q, r, s, t with group views D-9_0 = {p,q}, D-9_1 = {p,q,r,s}, D-9_2 = {q,r,s}, D-9_3 = {q,r,s,t}. Labels: r, s request to join; p makes a checkpoint; r, s initialize from checkpoint; r, s added, state transfer; p fails (crash); t requests to join; q makes a checkpoint; t initializes from checkpoint; t added, state transfer]

What needs to be in a state transfer? Depends on the situation We use state transfer in a group replicating some variable, set of variables, or data structure Checkpoint should include the data in that group Just like printing the data, or writing it to a file, except that we use an in-memory structure In C#, this is called an “in-memory serialization method” Interface is “ISerializable” and you can write the code to produce the serialized version yourself

State transfer So to transfer state Pick an appropriate point in the “timeline” Basically, a cut with respect to incoming multicasts Ask some member of the group to checkpoint Also called the “leader election” problem Easy solution: the “oldest” current member It writes this checkpoint to a byte stream Data sent to the new member(s) They rebuild the data structure out of the data as they read it in We’ll see how these subproblems can be solved in lectures over the coming weeks
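The steps above can be sketched roughly as follows. This is an illustration only: the `Member` class, the `joined_at` field, and the example state are assumptions, JSON stands in for whatever serialization the group uses, and the choice of a cut with respect to incoming multicasts is not modeled.

```python
import json


class Member:
    def __init__(self, name, joined_at):
        self.name = name
        self.joined_at = joined_at  # hypothetical logical join time
        self.state = {}

    def checkpoint(self) -> bytes:
        """Write the replicated data to a byte stream."""
        return json.dumps(self.state, sort_keys=True).encode("utf-8")

    def initialize_from(self, stream: bytes):
        """Rebuild the data structure from the bytes as they are read in."""
        self.state = json.loads(stream.decode("utf-8"))


def pick_leader(members):
    """Easy solution from the slide: the oldest current member."""
    return min(members, key=lambda m: m.joined_at)


def state_transfer(members, joiner):
    leader = pick_leader(members)
    joiner.initialize_from(leader.checkpoint())
    members.append(joiner)
    return leader
```

Because the joiner rebuilds its own copy from bytes, it shares no in-memory objects with the member that made the checkpoint.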

Still more uses of checkpoints In primary-backup systems We can stream data to the backup and keep it “warm” Or we can create checkpoints and have the backup restart from them Sometimes this is done using some form of shared media, like a dual-ported disk Backup reads the checkpoint when appropriate

Extreme checkpointing At the extreme, a checkpoint could include the entire state of a process Write out its memory “layout” Contents of all pages Contents of registers Now can restart the process by simply reloading its entire state Windows XP does this for its “hibernate” feature; Linux has a similar feature Potentially, very fast

Extreme checkpointing Worry here is that if a program is “temporarily deterministic” it may Crash due to a corrupt data structure Roll back and reload that same structure Crash again Advantage of “rebuilding” data structures is that we avoid this risk

Checkpointing limitations Coping with input channels At a minimum must reopen them, set seek position Dealing with non-determinism Sources include multi-threading Applications that receive user input, timer interrupts, I/O from devices, or messages on multiple connections Basic concern: What if, after roll back to a checkpoint, the application doesn’t repeat the actions that occurred “last time” the process was in that same state?

Problems with checkpoints P and Q are interacting Each makes checkpoints now and then [Figure: timelines for p and q with a request and a reply]

Problems with checkpoints Q crashes and rolls back to checkpoint [Figure: timelines for p and q with a request and a reply]

Problems with checkpoints Q crashes and rolls back to checkpoint It will have “forgotten” message from P [Figure: timelines for p and q with the request only]

Problems with checkpoints … Yet Q may even have replied. Who would care? Suppose reply was “OK to release the cash. Account has been debited” [Figure: timelines for p and q with a request and a reply]

Two related concerns First, Q needs to see that request again, so that it will reenter the state in which it sent the reply Need to regenerate the input request But if Q is non-deterministic, it might not repeat those actions even with identical input So that might not be “enough”

Rollback can leave inconsistency! In this example, we see that checkpoints must somehow be coordinated with communication If we allow programs to communicate and don’t coordinate checkpoints with message passing, system state becomes inconsistent even if individual processes are otherwise healthy

More problems with checkpoints P crashes and rolls back [Figure: timelines for p and q with a request and a reply]

More problems with checkpoints P crashes and rolls back Will P “reissue” the same request? Recall our non-determinism assumption: it might not! [Figure: timelines for p and q with a request and a reply]

Solution? One idea: if a process rolls back, roll others back to a consistent state If a message was sent after checkpoint, roll receiver back to a state before that message was received If a message was received after checkpoint roll the sender back to a state prior to sending it Assumes channels will be “empty” after doing this

Problems with checkpoints Q crashes and rolls back [Figure: timelines for p and q with a request and a reply]

Problems with checkpoints Q crashes and rolls back [Figure: timelines for p and q with a request and a reply; annotation: q rolled back to a state before this was received, or reply was sent]

Problems with checkpoints P must also roll back Now it won’t upset us if P happens not to resend the same request [Figure: timelines for p and q]

Problems with checkpoints But now we can get a cascade effect [Figure: timelines for p and q]

Problems with checkpoints Q crashes, restarts from checkpoint… [Figure: timelines for p and q]

Problems with checkpoints Forcing P to roll back for consistency… [Figure: timelines for p and q]

Problems with checkpoints New inconsistency forces Q to roll back ever further [Figure: timelines for p and q]

Problems with checkpoints New inconsistency forces P to roll back ever further [Figure: timelines for p and q]

This is a “cascaded” rollback It arises when the creation of checkpoints is uncoordinated w.r.t. communication Can force a system to roll back to initial state Clearly undesirable in the extreme case… Could be avoided in our example if we had a log for the channel from P to Q
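To make the cascade concrete, here is a small sketch that applies the rollback rule from the "Solution?" slide until nothing changes. It assumes no channel logs and a made-up event encoding: each process history is a list of `("ckpt",)`, `("send", id)`, and `("recv", id)` tuples. The function names and the example history are hypothetical.

```python
def latest_checkpoint_at_or_before(events, pos):
    """Return the largest k <= pos such that event k-1 is a checkpoint, else 0."""
    for k in range(pos, 0, -1):
        if events[k - 1][0] == "ckpt":
            return k
    return 0  # no checkpoint available: back to the initial state


def recovery_line(histories, crashed):
    """Return, per process, how many of its events survive the rollback."""
    pos = {p: len(events) for p, events in histories.items()}
    pos[crashed] = latest_checkpoint_at_or_before(histories[crashed], pos[crashed])

    # Index where each message was sent and where it was received.
    send_at, recv_at = {}, {}
    for p, events in histories.items():
        for i, (kind, *rest) in enumerate(events):
            if kind == "send":
                send_at[rest[0]] = (p, i)
            elif kind == "recv":
                recv_at[rest[0]] = (p, i)

    changed = True
    while changed:
        changed = False
        for msg, (sender, si) in send_at.items():
            if msg not in recv_at:
                continue
            receiver, ri = recv_at[msg]
            sent = si < pos[sender]
            received = ri < pos[receiver]
            if received and not sent:
                # Send was undone: roll the receiver back to before the receive.
                pos[receiver] = latest_checkpoint_at_or_before(histories[receiver], ri)
                changed = True
            elif sent and not received:
                # Receive was undone and nothing was logged: roll the sender back.
                pos[sender] = latest_checkpoint_at_or_before(histories[sender], si)
                changed = True
    return pos


# Checkpoints staggered against the message exchange, as in the figures.
uncoordinated = {
    "P": [("send", "m1"), ("ckpt",), ("recv", "m2"),
          ("send", "m3"), ("ckpt",), ("recv", "m4")],
    "Q": [("recv", "m1"), ("send", "m2"), ("ckpt",),
          ("recv", "m3"), ("send", "m4")],
}
print(recovery_line(uncoordinated, crashed="Q"))
```

For this history the result is position 0 for both P and Q: a single crash of Q drives the whole system back to its initial state, even though each process had made checkpoints.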

Sometimes action is “external” to system Suppose that P is an ATM Asks: Can I give Ken $100? Q debits account and says “OK” P gives out the money We can’t roll P back in this case since the money is already gone

External actions In fact dealing with external actions is a bit like Sam and Jill’s lunch date At best we can checkpoint right before issuing cash from the ATM We can’t get a stronger certainty… so may have to audit the ATM after a nasty crash and rollback We won’t discuss this more, but keep in mind that the world is full of limits… sigh…

Bigger issue is non-determinism P’s actions could be tied to something random For example, perhaps a timeout caused P to send this message After rollback these non-deterministic events might occur in some other order Results in a different behavior, like not sending that same request… yet Q saw it, acted on it, and even replied!

Issue has two sides One involves reconstructing P’s message to Q in our examples We don’t want P to roll back, since it might not send the same message But if we had a log with P’s message in it we would be fine, could just replay it The other is that Q might not send the same response (non-determinism) If Q did send a response and doesn’t send the identical one again, we must roll P back

Options? One idea is to coordinate the creation of checkpoints and logging of messages In effect, find a point at which we can pause the system All processes make a checkpoint in a coordinated way (“consistent snapshot”) Then resume Protocols for doing this are well known and isomorphic to consistent cuts
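A toy sketch of the pause, checkpoint, resume pattern, under strong assumptions: all processes live in one program, a coordinator can pause them directly, and channels are drained before checkpointing instead of having their contents recorded. `Proc`, `coordinated_checkpoint`, and `rollback_all` are hypothetical names; real snapshot protocols do this with messages and without a global stop.

```python
class Proc:
    def __init__(self, name, state=0):
        self.name = name
        self.state = state
        self.inbox = []     # channel contents: sent to us, not yet processed
        self.paused = False
        self.saved = None   # last coordinated checkpoint

    def send(self, other, amount):
        if self.paused:
            raise RuntimeError(f"{self.name} is paused for a checkpoint")
        self.state -= amount
        other.inbox.append(amount)

    def deliver_all(self):
        while self.inbox:
            self.state += self.inbox.pop(0)


def coordinated_checkpoint(procs):
    for p in procs:
        p.paused = True      # 1. pause: no new sends
    for p in procs:
        p.deliver_all()      # 2. drain channels so they are empty
    for p in procs:
        p.saved = p.state    # 3. everyone checkpoints at the same cut
    for p in procs:
        p.paused = False     # 4. resume


def rollback_all(procs):
    for p in procs:
        p.state = p.saved
        p.inbox.clear()      # messages sent after the cut are discarded
```

Since every checkpoint belongs to the same cut, rolling everyone back together cannot leave a message that was received but never sent.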

Why isn’t this common? Often we can’t control processes we didn’t code ourselves Most systems have many black-box components Can’t expect them to implement the checkpoint/rollback policy Hence it isn’t really practical to do coordinated checkpointing if it includes system components

Why isn’t this common? Further concern: not every process can make a checkpoint “on request” Might be in the middle of a costly computation that left big data structures around Or might adopt the policy that “I won’t do checkpoints while I’m waiting for responses from black box components” This interferes with coordination protocols

Implications? Some researchers have studied ensuring that devices, timers, etc, can behave identically if we roll a process back and then restart it This approach was common in the 1980s For example, the “Swallow” operating system Knowing that programs will re-do identical actions eliminates need to cascade rollbacks

Implications? Must also cope with thread preemption Occurs when we use lightweight threads, as in Java or C# Thread scheduler might context switch at times determined by when an interrupt happens Must force the same behavior again later, when restarting, or program could behave differently Schneider/Bressoud: showed how to do this with a special “microcycle” timer register in hardware But not common on modern CPUs

Determinism Despite these issues, often see mechanisms that assume determinism Basically they are saying Either don’t use threads, timers, I/O from multiple incoming channels, shared memory, etc Or use a “determinism forcing mechanism” like the Schneider/Bressoud idea

With determinism… We can revisit the checkpoint rollback problem and do much better Eliminates need for cascaded rollbacks But we do need a way to replay the identical inputs that were received after the checkpoint was made Forces us to think about keeping logs of the channels between processes

Three popular options Receiver based logging Log received messages; like an “extension” of the checkpoint Sender based logging Log messages when you send them, ensures you can resend them if needed Mixed mode (Alvisi) Does both, optimizes to log where doing so is most efficient (results in smallest log/overhead)
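As an illustration of receiver based logging, here is a minimal sketch assuming a deterministic process whose state depends only on the sequence of messages it has received. The `LoggedAccount` class is hypothetical, and the log is kept in memory where a real system would force it to stable storage before processing.

```python
class LoggedAccount:
    """Deterministic process: same inputs in the same order give the same replies."""

    def __init__(self):
        self.balance = 0
        self.checkpoint = 0
        self.recv_log = []  # messages received since the checkpoint

    def handle(self, amount):
        self.balance += amount
        return f"balance={self.balance}"  # reply sent to the requester

    def receive(self, amount):
        self.recv_log.append(amount)  # log first: an "extension" of the checkpoint
        return self.handle(amount)

    def take_checkpoint(self):
        self.checkpoint = self.balance
        self.recv_log.clear()

    def recover(self):
        """Restart from the checkpoint and replay the logged inputs in order."""
        self.balance = self.checkpoint
        return [self.handle(amount) for amount in self.recv_log]
```

After `recover()`, the process has regenerated the replies it sent before the crash, so its correspondents do not need to roll back. This only holds because `handle` is deterministic.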

Why do these work? Recall the reasons for cascaded rollback A cascade occurs if Q received a message, then rolls back to “before” that happened Now, Q can regenerate the input and re-read the message Only works for messages sent if we have deterministic processes, but often some are deterministic even if others aren’t

With these varied options When Q rolls back we can Re-run Q with identical inputs if Q is deterministic, or Nobody saw messages from Q after checkpoint state was recorded, or We roll back the receivers of those messages An issue: deterministic programs often crash in the identical way if we force identical execution But here we have flexibility to either force identical executions or do a coordinated rollback

Alvisi developed a general theory Imagine a set of “dials” for each program One shows the estimated cost of sender logging Another shows estimated cost of receiver logging One is a switch: deterministic/non-deterministic A meter gives “current cost of doing a checkpoint” Alvisi can collect this sort of input and offer choices to the system Idea is to mix and match, picking cheap solutions Then do coordinated rollback selectively

Connection to consistent cuts A system-wide checkpoint is just a consistent snapshot Checkpoints for each process, plus Log of contents of each channel Key insight is that we can get this behavior in ways that also exploit optimizations where possible In fact the algorithm is extremely similar and we won’t cover it in detail today

Take-aways? Fault-tolerant systems often use forms of replication to gain availability Including replicating process state by keeping a checkpoint And messages, by logging at sender or receiver And perhaps even recording unpredictable stuff like scheduling decisions, PC when a thread was preempted, or what a system call returned Checkpoint/rollback is best seen as an instance of a broader approach

Take-aways? What makes it hard? It can be slow/expensive to checkpoint in some situations Coordinating the actions of processes is often needed because programs don’t live in isolation Consistency is an underlying theme Outside user shouldn’t be able to tell that restart occurred – it should be “hidden”

Open questions? We haven’t discussed failure detection Discovery of a fault triggers recovery But what if our detector makes mistakes? Could, for example, mistake a timeout for evidence of a crash In upcoming lectures we’ll look at mechanisms a system can use to track its own state and maintain consistency
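A small sketch of why a timeout is weak evidence of a crash. It assumes a heartbeat-style detector with explicit timestamps passed in by the caller; the `TimeoutDetector` class and its methods are hypothetical and not taken from the lecture.

```python
class TimeoutDetector:
    """Suspect any process not heard from within `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}

    def heartbeat(self, process, now):
        self.last_heard[process] = now

    def suspects(self, now):
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}


d = TimeoutDetector(timeout=3)
d.heartbeat("p", now=0)
d.heartbeat("q", now=0)
d.heartbeat("p", now=4)
print(d.suspects(now=5))  # q is suspected, although it may only be slow
d.heartbeat("q", now=6)   # q's delayed heartbeat arrives: the suspicion was a mistake
print(d.suspects(now=6))
```

If recovery had been triggered at time 5, the system would have rolled back or replaced a process that was still running.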