© Chinese University, CSE Dept. Distributed Systems / 8 - 1 Distributed Systems Topic 8: Fault Tolerance and Replication Dr. Michael R. Lyu Computer Science.

Distributed Systems Topic 8: Fault Tolerance and Replication
Dr. Michael R. Lyu
Computer Science & Engineering Department
The Chinese University of Hong Kong

Outline
1 Introduction
2 Transaction Recovery
3 Failure Classification and Masking
4 Replication
5 Summary

Introduction
• Achieving reliability: fault tolerance and recovery
• Fault-tolerant applications:
  – transaction based
  – process control
• Recovery aspects of distributed transactions
• The design of real-time services:
  – fail-stop vs Byzantine failure
• Masking failures in a service

Basic Approaches
• Fault detection:
  – Push model: server objects send heartbeat messages to the Fault Manager.
  – Pull model: the Fault Manager polls (or pings) server objects through their is_alive() interface.
• Data recovery:
  – Checkpoint and rollback: save the server objects' states; roll back to the checkpointed states at recovery.
  – Message logging and replay: log all messages; replay them at recovery.
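
The two fault-detection models can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's implementation. The FaultManager class, its methods and the injectable clock are hypothetical. Only the is_alive() interface name comes from the slide. A timeout cannot tell a crashed server from a slow one, so the sketch reports suspected servers rather than failed ones.

```python
from typing import Callable, Dict, List


class FaultManager:
    """Hypothetical fault manager supporting both the push and the pull model."""

    def __init__(self, timeout: float, clock: Callable[[], float]):
        self.timeout = timeout   # seconds of silence before a server is suspected
        self.clock = clock       # e.g. time.monotonic; injectable for testing
        self.last_heartbeat: Dict[str, float] = {}

    # Push model: each server object calls this periodically.
    def heartbeat(self, server_id: str) -> None:
        self.last_heartbeat[server_id] = self.clock()

    def suspected_by_push(self) -> List[str]:
        now = self.clock()
        return sorted(s for s, t in self.last_heartbeat.items()
                      if now - t > self.timeout)

    # Pull model: the manager polls each server object's is_alive() interface.
    def suspected_by_pull(self, servers: Dict[str, Callable[[], bool]]) -> List[str]:
        suspected = []
        for server_id, is_alive in servers.items():
            try:
                alive = is_alive()
            except Exception:     # an unreachable server may raise instead of answering
                alive = False
            if not alive:
                suspected.append(server_id)
        return sorted(suspected)


now = [0.0]                                   # fake clock for the example
fm = FaultManager(timeout=3.0, clock=lambda: now[0])
fm.heartbeat("s1")
fm.heartbeat("s2")
now[0] = 2.0
fm.heartbeat("s1")                            # s2 falls silent
now[0] = 4.0
print(fm.suspected_by_push())                 # ['s2']


def crashed_is_alive() -> bool:
    raise ConnectionError("no reply")         # stands in for an unreachable server


print(fm.suspected_by_pull({"s1": lambda: True, "s2": crashed_is_alive}))  # ['s2']
```

In the push model, the servers pay the messaging cost. In the pull model, the Fault Manager controls when checks happen.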

Transaction Recovery
• Recovery concerns data durability (for both permanent and volatile data) and failure atomicity.
• A server keeps data in volatile memory and records committed data in a recovery file.
• The recovery manager:
  – saves data items in permanent storage
  – restores the server's data items after a crash
  – reorganizes the recovery file for better performance
  – reclaims storage space in the recovery file

Entries in Recovery File
Type of entry      | Description of contents of entry
Object             | A value of an object.
Transaction status | Transaction identifier, transaction status (prepared, committed, aborted) and other status values used for two-phase commit.
Intentions list    | Transaction identifier and a sequence of intentions, each of which consists of <object identifier, Pi>, where Pi is the position in the recovery file of the value of the object.

Intentions List
• For each transaction, a server keeps an intentions list: the names and new values of the data items the transaction has altered.
• The server uses the intentions list when the transaction commits (to apply the new values) or aborts (to discard them).
• When a server prepares to commit, it must already have saved the intentions list in its recovery file.
• The recovery files therefore contain enough information to ensure that the transaction is committed by all the servers.
• Two approaches: logging and shadow versions.
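
The following minimal sketch shows how a server might use an intentions list. It assumes a hypothetical TransactionalServer whose data items are integers. A plain Python list stands in for the recovery file on stable storage. Writes are buffered per transaction, saved on prepare, applied on commit and dropped on abort.

```python
from typing import Dict, List, Tuple

Intention = Tuple[str, int]   # (data item name, new value)


class TransactionalServer:
    """Hypothetical server that buffers each transaction's writes in an intentions list."""

    def __init__(self, data: Dict[str, int]):
        self.data = dict(data)                              # committed values
        self.intentions: Dict[str, List[Intention]] = {}    # transaction id -> intentions list
        self.recovery_file: List[Tuple[str, List[Intention]]] = []  # stand-in for stable storage

    def write(self, tid: str, name: str, value: int) -> None:
        self.intentions.setdefault(tid, []).append((name, value))

    def read(self, tid: str, name: str) -> int:
        # A transaction sees its own latest tentative write, else the committed value.
        for item, value in reversed(self.intentions.get(tid, [])):
            if item == name:
                return value
        return self.data[name]

    def prepare(self, tid: str) -> None:
        # The intentions list must be saved before the server votes to commit.
        self.recovery_file.append((tid, list(self.intentions.get(tid, []))))

    def commit(self, tid: str) -> None:
        for name, value in self.intentions.pop(tid, []):
            self.data[name] = value        # later writes to the same item win

    def abort(self, tid: str) -> None:
        self.intentions.pop(tid, None)     # discard the tentative values


s = TransactionalServer({"A": 100, "B": 200})
s.write("T", "A", 80)
s.write("T", "B", 220)
print(s.read("T", "A"), s.read("U", "A"))  # 80 100
s.prepare("T")
s.commit("T")
s.write("U", "A", 0)
s.abort("U")
print(s.data)                              # {'A': 80, 'B': 220}
```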

Logging
• A log contains the history of all the transactions performed by a server.
• The recovery file contains a recent snapshot of the values of all the data items in the server, followed by a history of transactions.
• When a server is prepared to commit, the recovery manager appends all the data items in its intentions list to the recovery file.
• The recovery manager associates a unique identifier with each data item.
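
The logging steps above can be sketched as an append-only list of entries. This sketch makes several assumptions. The RecoveryLog class and its (entry type, contents) format are hypothetical. Data item names serve as the unique identifiers. The intentions list and the "prepared" status are written as separate entries, which is one possible layout. Each intention records <object identifier, Pi>, with Pi being the position of the new value in the file.

```python
from typing import Any, Dict, List, Tuple

Entry = Tuple[str, Any]   # (entry type, contents)


class RecoveryLog:
    """Hypothetical append-only recovery file written by the recovery manager."""

    def __init__(self, snapshot: Dict[str, Any]):
        # The file starts with a snapshot (checkpoint) of all committed values.
        self.entries: List[Entry] = [("checkpoint", dict(snapshot))]

    def _append(self, kind: str, contents: Any) -> int:
        self.entries.append((kind, contents))
        return len(self.entries) - 1       # position Pi of the new entry

    def prepare(self, tid: str, writes: Dict[str, Any]) -> None:
        # 1. Append the new value of every data item in the intentions list.
        intentions = [(item, self._append("object", value))
                      for item, value in writes.items()]
        # 2. Append the intentions list as <object identifier, Pi> pairs.
        self._append("intentions", (tid, intentions))
        # 3. Record that the transaction is prepared.
        self._append("status", (tid, "prepared"))

    def commit(self, tid: str) -> None:
        self._append("status", (tid, "committed"))

    def abort(self, tid: str) -> None:
        self._append("status", (tid, "aborted"))


log = RecoveryLog({"A": 100, "B": 200, "C": 300})
log.prepare("T", {"A": 80, "B": 220})
log.commit("T")
for pos, entry in enumerate(log.entries):
    print(pos, entry)
```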

Log for Banking Service

Recovery by Logging
• Recovery of data items:
  – The recovery manager is responsible for restoring the server's data items.
  – The most recent information is at the end of the log, so the recovery manager reads the recovery file backwards, starting from the end.
  – The recovery manager gets the corresponding intentions lists from the recovery file.
• Reorganizing the recovery file:
  – Checkpointing: the process of writing the current committed values (a checkpoint) to a new recovery file.
  – Checkpointing can be done periodically or right after recovery.
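
Backward recovery can be sketched as follows. The entry format is a hypothetical one with four kinds of entry: ("checkpoint", values), ("object", value), ("intentions", (tid, [(item, Pi), ...])) and ("status", (tid, status)). Reading backwards, the first status found for a transaction is its latest one. Only committed transactions are applied. A restored value is never overwritten by an older one. This sketch simply leaves prepared or uncertain transactions unapplied, whereas a two-phase commit participant must also take the actions listed under Recovery of 2PC.

```python
from typing import Any, Dict, List, Tuple

Entry = Tuple[str, Any]   # hypothetical entry format: (entry type, contents)


def recover(entries: List[Entry]) -> Dict[str, Any]:
    """Restore committed values by reading the recovery file backwards."""
    restored: Dict[str, Any] = {}
    latest_status: Dict[str, str] = {}   # first status met going backwards is the latest
    for kind, contents in reversed(entries):
        if kind == "status":
            tid, status = contents
            latest_status.setdefault(tid, status)
        elif kind == "intentions":
            tid, intentions = contents
            if latest_status.get(tid) != "committed":
                continue                 # prepared or aborted transactions are skipped
            for item, pos in intentions:
                # A later committed value (already restored) takes precedence.
                restored.setdefault(item, entries[pos][1])
        elif kind == "checkpoint":
            for item, value in contents.items():
                restored.setdefault(item, value)
            break                        # nothing older than the checkpoint is needed
    return restored


log = [
    ("checkpoint", {"A": 100, "B": 200, "C": 300}),
    ("object", 80), ("object", 220),
    ("intentions", ("T", [("A", 1), ("B", 2)])),
    ("status", ("T", "prepared")),
    ("status", ("T", "committed")),
    ("object", 278), ("object", 242),
    ("intentions", ("U", [("C", 6), ("B", 7)])),
    ("status", ("U", "prepared")),
]
print(recover(log))   # {'A': 80, 'B': 220, 'C': 300}
```

Checkpointing would then write the result of recover() to a new recovery file as its opening snapshot. Entries for transactions that are still undecided would also need to be carried over.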

© Chinese University, CSE Dept. Distributed Systems / Shadow Versions  Shadow versions technique uses a map to locate versions of the server’s data items in a file called a version store.  The versions written by each transaction are shadows of the previous committed versions.  When prepared to commit, any changed data are appended to the version store.  When committing, a new map is made. When complete, new map replaces the old map.

Shadow Versions Example
Map at start     Map when T commits
A → P0           A → P1
B → P0'          B → P2
C → P0''         C → P0''
Version store: P0 P0' P0'' P1 P2 P3 P4, where P0, P0' and P0'' form the checkpoint.
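
The example can be reproduced with the sketch below. It assumes a hypothetical ShadowVersionStore in which list positions stand for file positions. Positions 0, 1 and 2 play the role of P0, P0' and P0'', and positions 3 and 4 play P1 and P2. The data values are made up for illustration. On a real disk, the switch from the old map to the new one must be atomic, for example by rewriting a single pointer. Here, one Python assignment stands in for that switch.

```python
from typing import Any, Dict, List


class ShadowVersionStore:
    """Hypothetical shadow-versions store: an append-only version store plus a map."""

    def __init__(self, initial: Dict[str, Any]):
        self.store: List[Any] = []        # version store; positions play the role of P0, P1, ...
        self.map: Dict[str, int] = {}     # data item -> position of its committed version
        for item, value in initial.items():
            self.map[item] = self._append(value)
        self.pending: Dict[str, Dict[str, int]] = {}   # tid -> positions of its shadows

    def _append(self, value: Any) -> int:
        self.store.append(value)
        return len(self.store) - 1

    def prepare(self, tid: str, writes: Dict[str, Any]) -> None:
        # Shadow versions are appended but stay invisible until the map changes.
        self.pending[tid] = {item: self._append(v) for item, v in writes.items()}

    def commit(self, tid: str) -> None:
        new_map = dict(self.map)
        new_map.update(self.pending.pop(tid))
        self.map = new_map                # the single switch from old map to new map

    def abort(self, tid: str) -> None:
        self.pending.pop(tid, None)       # shadows become garbage in the version store

    def read(self, item: str) -> Any:
        return self.store[self.map[item]]


s = ShadowVersionStore({"A": 100, "B": 200, "C": 300})
s.prepare("T", {"A": 80, "B": 220})
print(s.map)   # {'A': 0, 'B': 1, 'C': 2}  (map at start)
s.commit("T")
print(s.map)   # {'A': 3, 'B': 4, 'C': 2}  (map when T commits)
```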

Log and 2PC
Trans: T         | prepared                 | intentions list
Coordinator: T   | participant list: ...
Trans: T         | committed
Trans: U         | prepared                 | intentions list
Participant: U   | coordinator: ..
Trans: U         | uncertain
Trans: U         | committed

Recovery of 2PC
Role        | Status    | Action of recovery manager
Coordinator | prepared  | No decision had been reached before the server failed. It sends abortTransaction to all the servers in the participant list and adds the transaction status aborted to its recovery file. The same action is taken for state aborted. If there is no participant list, the participants will eventually time out and abort the transaction.
Coordinator | committed | A decision to commit had been reached before the server failed. It sends a doCommit to all the participants in its participant list (in case it had not done so before) and resumes the two-phase protocol at step 4 (Fig 17.5).
Participant | committed | The participant sends a haveCommitted message to the coordinator (in case this was not done before it failed). This allows the coordinator to discard information about this transaction at the next checkpoint.
Participant | uncertain | The participant failed before it knew the outcome of the transaction. It cannot determine the status of the transaction until the coordinator informs it of the decision, so it sends a getDecision to the coordinator. When it receives the reply, it commits or aborts accordingly.
Participant | prepared  | The participant has not yet voted and can abort the transaction.
Coordinator | done      | No action is required.
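
The table can be read as a dispatch on (role, status). The sketch below uses a hypothetical recovery_action function that returns the messages or local steps to perform, as (action, destination) pairs. The message names abortTransaction, doCommit, haveCommitted and getDecision come from the table. Everything else is illustrative, and the actual sending is left out.

```python
from typing import List, Sequence, Tuple

Action = Tuple[str, str]   # (message or local step, destination)


def recovery_action(role: str, status: str, participants: Sequence[str] = (),
                    coordinator: str = "coordinator") -> List[Action]:
    """Hypothetical dispatcher mirroring the 'Recovery of 2PC' table."""
    if role == "coordinator":
        if status in ("prepared", "aborted"):
            # No commit decision survived the crash: abort at every participant.
            # With no participant list, participants time out and abort on their own.
            aborts = [("abortTransaction", p) for p in participants]
            return aborts + [("record status aborted", "recovery file")]
        if status == "committed":
            # The decision was commit: (re)send doCommit, then resume 2PC at step 4.
            return [("doCommit", p) for p in participants]
        if status == "done":
            return []
    elif role == "participant":
        if status == "committed":
            return [("haveCommitted", coordinator)]   # lets the coordinator forget it
        if status == "uncertain":
            return [("getDecision", coordinator)]     # then commit or abort per reply
        if status == "prepared":
            return [("abort", "local")]               # has not voted, may abort alone
    raise ValueError(f"no recovery rule for {role}/{status}")


print(recovery_action("coordinator", "prepared", ["P1", "P2"]))
print(recovery_action("participant", "uncertain"))
```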

Failure Classification and Masking
• Two contrasting points about distributed systems:
  – The operation of a service depends on the correct operation of other services.
  – A set of servers executing jointly can be less likely to fail than any one of its individual components.
• Designers of a service should specify its correct behavior and the ways it may fail.
• Failure semantics: a description of the ways a service may fail. Clients can use it to mask the service's failures.
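
The two contrasting points can be made concrete with a short availability calculation. It assumes that failures are independent and fail-stop, which the slide does not state. The function names are hypothetical. A service that depends on several components works only if all of them work. A group that needs only one working member fails only if every member fails.

```python
from math import prod
from typing import Sequence


def series_availability(parts: Sequence[float]) -> float:
    """A service that needs every component works only if all of them work."""
    return prod(parts)


def group_availability(replicas: Sequence[float]) -> float:
    """A group that needs any one replica fails only if all replicas fail
    (assumes fail-stop, independent failures)."""
    return 1 - prod(1 - a for a in replicas)


print(series_availability([0.99, 0.99, 0.99]))   # about 0.970: dependencies lower availability
print(group_availability([0.9, 0.9, 0.9]))        # about 0.999: a group raises it
```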

Characteristics of Failures
Class of failure | Subclass                 | Description
Omission failure |                          | A server omits to respond to a request.
Response failure |                          | A server responds incorrectly to a request.
                 | Value failure            | Returns a wrong value.
                 | State transition failure | Has the wrong effect on resources (for example, sets wrong values in data items).
Timing failure   |                          | Response not within a specified time interval.
Crash failure    |                          | Repeated omission failure: a server repeatedly fails to respond to requests until it is restarted.
                 | Amnesia-crash            | A server starts in its initial state, having forgotten its state at the time of the crash.
                 | Pause-crash              | A server restarts in the state before the crash.
                 | Halting-crash            | A server never restarts.

Fail-Stop vs Byzantine Failures
• A fail-stop failure is one in which a server fails cleanly: it either functions correctly or else it crashes.
• Byzantine failure behavior describes the worst possible failure semantics of a server: it fails maliciously or arbitrarily.
• Byzantine agreement aims at correct behavior, within response-time requirements, in the presence of faulty hardware.
• How many servers it needs depends on whether messages can be authenticated.

Byzantine Generals

Byzantine Agreement Algorithms
• Byzantine agreement algorithms send more messages and use more active servers than algorithms that only tolerate fail-stop failures.
• When messages can be authenticated, 2N+1 servers are required to tolerate N bad servers.
• When messages cannot be authenticated, 3N+1 servers are required.
• With enough good servers, solutions require O(N^2) messages with constant delay time.
• Fortunately, the good news is...
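
The unauthenticated case can be illustrated with Lamport, Shostak and Pease's oral-messages algorithm OM(m). This algorithm needs at least 3m+1 generals to tolerate m traitors, which matches the 3N+1 bound above. The sketch below is a minimal recursive version under one important assumption: a traitor alternates between "attack" and "retreat" across receivers, which is just one arbitrary choice of malicious behaviour. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

ATTACK, RETREAT = "attack", "retreat"


def majority(values: List[str]) -> str:
    """Strict-majority value, defaulting to RETREAT when there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else RETREAT


def send(sender: int, receivers: List[int], value: str, traitors: Set[int]) -> Dict[int, str]:
    """A loyal sender sends the same value to everyone; a traitor (assumed
    behaviour) alternates between ATTACK and RETREAT."""
    if sender not in traitors:
        return {r: value for r in receivers}
    return {r: ATTACK if i % 2 == 0 else RETREAT for i, r in enumerate(receivers)}


def om(m: int, commander: int, lieutenants: List[int], value: str,
       traitors: Set[int]) -> Dict[int, str]:
    """Oral-messages algorithm OM(m): the value each lieutenant decides on."""
    received = send(commander, lieutenants, value, traitors)
    if m == 0:
        return received
    # Each lieutenant i relays what it received by acting as commander in OM(m - 1).
    relayed: Dict[int, List[str]] = {j: [received[j]] for j in lieutenants}
    for i in lieutenants:
        others = [j for j in lieutenants if j != i]
        for j, v in om(m - 1, i, others, received[i], traitors).items():
            relayed[j].append(v)
    return {j: majority(vals) for j, vals in relayed.items()}


# Four generals (0 is the commander) tolerate one traitor: 4 >= 3*1 + 1.
print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))   # loyal lieutenants 1 and 2 attack
print(om(1, 0, [1, 2, 3], ATTACK, traitors={0}))   # all loyal lieutenants agree
```

Each round multiplies the number of messages, which is why Byzantine agreement is costly compared with fail-stop masking.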

Hierarchical Masking of Failures
• We describe two approaches to masking failures: hierarchical failure masking and group failure masking.
• In hierarchical failure masking, a higher-level server tries to mask failures at the lower level.
• When a lower-level failure cannot be masked, it is converted into a higher-level exception.
• Example: in the request-reply (RR) protocol, a server crash that cannot be masked is reported to the client by raising an exception.
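
Hierarchical masking can be sketched as a client-side wrapper. The wrapper masks a lower-level failure (a lost reply, signalled here by TimeoutError) by retrying. It converts a failure it cannot mask into a higher-level exception. Retrying is an assumed masking strategy, and rr_call and ServiceUnavailable are hypothetical names. Retries are only safe for idempotent requests, or when the server filters duplicates.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical higher-level exception reported to the client."""


def rr_call(send_request: Callable[[], T], retries: int = 3) -> T:
    """Mask lost replies by retrying; convert an unmaskable failure into an exception."""
    last_error = None
    for _ in range(retries):
        try:
            return send_request()
        except TimeoutError as err:     # lower-level failure: no reply in time
            last_error = err
    raise ServiceUnavailable(f"no reply after {retries} attempts") from last_error


attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("reply lost")   # masked by the retries
    return "ok"


def crashed() -> str:
    raise TimeoutError("server down")      # cannot be masked


result = rr_call(flaky)
try:
    rr_call(crashed)
except ServiceUnavailable as err:
    error_message = str(err)
print(result, "|", error_message)
```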

© Chinese University, CSE Dept. Distributed Systems / Group Failure Masking  A service can be made fault tolerant by implementing it by a group of servers.  A group is t-fault tolerant if it can tolerate up to t member failures.  For fail-stop failures, t+1 servers are needed.  For Byzantine failures, 2t+1 servers needed.  To ensure correctness, the server program must be deterministic, and each operation must be atomic w.r.t. other operations.

© Chinese University, CSE Dept. Distributed Systems / Group Failure Masking  A group can be closely synchronized or loosely synchronized.  In a closely synchronized group of servers: –All members execute requests immediately. –Server programs are both deterministic and atomic. –Suitable for real time system and Byzantine failures.  In a loosely synchronized group of servers: –One server (primary) performs requests, others (backup) log the requests and take over if needed. –Requires less resource but takes longer to recover.

© Chinese University, CSE Dept. Distributed Systems / Replication  Replication is the maintenance of on-line copies of data and resources  For performance, availability, fault tolerance.  Basic architectural model.  The process group approach.  The primary-backup replication model.  The active replication model.  The gossip architecture.

Replication Issues
• Replica management models consider the trade-off between accuracy and response time:
  – simple asynchronous model
  – totally synchronous model
  – quorum-based schemes
  – causally ordered schemes
• Multicast updates to a process group.
• Read/write ratio.
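
A quorum-based scheme can be sketched as a replicated register with version numbers. The sketch assumes a hypothetical QuorumStore with n replicas, read quorum r and write quorum w. It requires r + w > n, so that every read quorum overlaps the last write quorum, and 2w > n, so that any two write quorums overlap. The reachable list stands in for whichever replicas answer. Concurrency is ignored.

```python
from typing import Any, List, Sequence, Tuple


class QuorumStore:
    """Hypothetical quorum-based replicated register with version numbers."""

    def __init__(self, n: int, r: int, w: int) -> None:
        # Read and write quorums must overlap, and so must any two write quorums.
        if not (r + w > n and 2 * w > n):
            raise ValueError("need r + w > n and 2w > n")
        self.replicas: List[Tuple[int, Any]] = [(0, None)] * n   # (version, value)
        self.r, self.w = r, w

    def write(self, value: Any, reachable: Sequence[int]) -> None:
        if len(reachable) < self.w:
            raise RuntimeError("write quorum unavailable")
        quorum = reachable[: self.w]
        # Overlap with every earlier write quorum guarantees we see the latest version.
        version = max(self.replicas[i][0] for i in quorum) + 1
        for i in quorum:
            self.replicas[i] = (version, value)

    def read(self, reachable: Sequence[int]) -> Any:
        if len(reachable) < self.r:
            raise RuntimeError("read quorum unavailable")
        quorum = reachable[: self.r]
        # Overlap with the last write quorum puts the newest version among these.
        return max((self.replicas[i] for i in quorum), key=lambda vv: vv[0])[1]


store = QuorumStore(n=5, r=3, w=3)
store.write("x", reachable=[0, 1, 2])
store.write("y", reachable=[2, 3, 4])
print(store.read(reachable=[0, 1, 3]))   # 'y': replica 3 holds the newest version
```

Choosing a smaller r makes reads cheaper at the cost of a larger w. The best split therefore depends on the read/write ratio.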

Basic Architectural Model
[Figure: clients (C) exchange requests and replies with front ends (FE), which communicate with the replica managers (RM) that together provide the service]

System Model
• Request: the FE issues the request to one or more RMs.
• Coordination: the RMs coordinate in preparation for executing the request consistently.
  – FIFO ordering: if an FE issues request r and then r', then all RMs handle r before r'.
  – Causal ordering: if r → r' (r happened-before r'), then all RMs handle r before r'.
  – Total ordering: if an RM handles r before r', then all other RMs handle r before r'.
• Execution: the RMs execute the request tentatively.
• Agreement: the RMs reach consensus on the effect.
• Response: one or more RMs respond to the FE.
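
Total ordering can be implemented in several ways. The sketch below uses a sequencer, which is one common approach and not necessarily the one the course uses. A hypothetical Sequencer stamps each request with a global sequence number. Each RM holds back requests that arrive early and handles them strictly in sequence order, so all RMs handle requests in the same order even when the network reorders messages.

```python
import heapq
from typing import List, Tuple


class Sequencer:
    """Hypothetical sequencer that stamps each request with a global sequence number."""

    def __init__(self) -> None:
        self.next_seq = 0

    def order(self, request: str) -> Tuple[int, str]:
        stamped = (self.next_seq, request)
        self.next_seq += 1
        return stamped


class ReplicaManager:
    """Handles requests strictly in sequence-number order, holding back early arrivals."""

    def __init__(self) -> None:
        self.expected = 0
        self.holdback: List[Tuple[int, str]] = []   # min-heap keyed on sequence number
        self.handled: List[str] = []

    def receive(self, stamped: Tuple[int, str]) -> None:
        heapq.heappush(self.holdback, stamped)
        while self.holdback and self.holdback[0][0] == self.expected:
            _, request = heapq.heappop(self.holdback)
            self.handled.append(request)
            self.expected += 1


seq = Sequencer()
msgs = [seq.order(r) for r in ["r1", "r2", "r3"]]
rm_a, rm_b = ReplicaManager(), ReplicaManager()
for m in msgs:
    rm_a.receive(m)
for m in (msgs[2], msgs[0], msgs[1]):     # the network reorders messages for rm_b
    rm_b.receive(m)
print(rm_a.handled, rm_b.handled)         # both ['r1', 'r2', 'r3']
```

The sequencer is a single point of failure and a potential bottleneck, which is the main cost of this design.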

Process Group Approach
• Process groups and group communication
• Group structure:
  – peer group
  – server group
  – client-server group
  – subscription group
  – hierarchical groups

Process Group Services
• Group membership management:
  – create
  – join
  – leave
• Group address expansion
• Multicast communication:
  – unreliable multicast
  – reliable multicast
  – atomic multicast
[Figure: a process group, with group membership management (join, leave, fail), group address expansion and multicast communication (group send)]

Passive (Primary-Backup) Replication
[Figure: clients (C) communicate through front ends (FE) with the primary RM, which is backed up by backup RMs]

Active Replication
[Figure: clients (C) communicate through front ends (FE), which send each request to all the RMs]

The Gossip Architecture
[Figure: clients issue query and update operations to the service, communicating via front ends (FE). For a query, an FE sends (Query, prev) to an RM and receives (Val, new); for an update, it sends (Update, prev) and receives an update id. RMs exchange gossip messages; prev and new are vector timestamps.]
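
The role of the vector timestamps can be sketched with a much simplified replica manager. The following are all assumptions. The GossipRM value is a grow-only set of update ids, so gossip can merge replicas in any order. An RM whose timestamp does not dominate the FE's prev returns None instead of holding the request back. A real gossip RM also keeps an update log and applies updates in causal order once they are stable.

```python
from typing import List, Optional, Set, Tuple

VectorTS = List[int]


def leq(a: VectorTS, b: VectorTS) -> bool:
    return all(x <= y for x, y in zip(a, b))


def merge(a: VectorTS, b: VectorTS) -> VectorTS:
    return [max(x, y) for x, y in zip(a, b)]


class GossipRM:
    """Hypothetical, simplified gossip replica manager whose value is a set of update ids."""

    def __init__(self, index: int, n: int) -> None:
        self.index = index
        self.ts: VectorTS = [0] * n      # updates applied, counted per accepting RM
        self.value: Set[str] = set()

    def update(self, uid: str, prev: VectorTS) -> Optional[VectorTS]:
        if not leq(prev, self.ts):
            return None                  # simplification: the FE retries after gossip
        self.ts[self.index] += 1
        self.value.add(uid)
        return list(self.ts)             # new timestamp for the FE to merge into prev

    def query(self, prev: VectorTS) -> Optional[Tuple[List[str], VectorTS]]:
        # Answer only if this RM is at least as up to date as the FE's prev.
        if not leq(prev, self.ts):
            return None
        return sorted(self.value), list(self.ts)

    def gossip_from(self, other: "GossipRM") -> None:
        self.value |= other.value
        self.ts = merge(self.ts, other.ts)


a, b = GossipRM(0, 2), GossipRM(1, 2)
prev = [0, 0]                            # the FE's vector timestamp
prev = merge(prev, a.update("u1", prev))
print(b.query(prev))                     # None: b has not seen u1 yet
b.gossip_from(a)
print(b.query(prev))                     # (['u1'], [1, 0])
```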

Summary
• Transaction recovery:
  – long-lived applications and data integrity
  – the atomic commit protocol is the key
  – checkpoints and logging in a recovery file
• Fault tolerance and replication:
  – real-time applications
  – importance of failure semantics
  – primary-backup servers for fail-stop failures
  – closely synchronized groups for Byzantine failures
• Read the textbook: Chapter 15.5 for the Byzantine Generals Problem, Chapter 17.6 for Transaction Recovery, and Chapter 18 for Replication.