Providing High Availability Using Lazy Replication. Rivka Ladin, Barbara Liskov, Liuba Shrira, Sanjay Ghemawat. Presented by Huang-Ming Huang.



Outline
- Model
- Algorithm
- Performance Analysis
- Discussion

Replication Model
[Figure: clients send requests to front ends (FE), which forward them to the replication managers (RM) that together implement the service.]
Excerpt from "Distributed Systems – Concepts and Design" by Coulouris, Dollimore and Kindberg

System Guarantees
- Each client obtains a consistent service over time.
- Consistency between replicas is relaxed.
- Updates are applied with ordering guarantees that keep the replicas sufficiently similar.

Operation Classification
[Figure: a client issues queries and updates through an FE. The FE sends (Query, prev) and receives (Val, new); it sends (Update, prev) and receives an update id. RMs exchange gossip messages among themselves.]
Excerpt from "Distributed Systems – Concepts and Design" by Coulouris, Dollimore and Kindberg

Update Operation Classification
- Causal update: performed in an order consistent with the causal (happened-before) order among updates.
- Forced update: performed in the same order (relative to one another) at all replicas.
- Immediate update: performed at all replicas in the same order relative to all other operations.

Vector Timestamps
Given two timestamps T = (t1, t2, …, tn) and S = (s1, s2, …, sn):
- T ≤ S ≡ ti ≤ si for all i
- merge(T, S) = (max(t1, s1), …, max(tn, sn))
Each component of a vector timestamp corresponds to one replica manager in the system.
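The comparison and merge operations can be sketched in a few lines of Python (an illustrative sketch; representing timestamps as lists is an assumption, not something the paper prescribes):

```python
# Vector timestamps as fixed-length lists, one slot per replica manager.

def vt_leq(t, s):
    """T <= S iff t_i <= s_i for every component i."""
    return all(ti <= si for ti, si in zip(t, s))

def vt_merge(t, s):
    """Component-wise maximum of two vector timestamps."""
    return [max(ti, si) for ti, si in zip(t, s)]
```

Note that vt_leq is only a partial order: two timestamps can be incomparable, which is exactly how concurrent updates are detected.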

RM components Replica timestamp Update log Value Timestamp Value Timestamp table Executed operation table FE Other replicas Gossip Messages Updates Operationprevid Replica TimestampReplica log stable updates Excerpt from “Distributed Systems – Concept and Design” by Coulouris, Dollimore and Kindberg

Query
- The replica manager blocks a query q until the condition q.prev ≤ valueTS holds.
- The replica manager then returns the value and valueTS (as "new") to the FE.
- The FE updates its own timestamp: frontEndTS := merge(frontEndTS, new).
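A minimal sketch of the query rule, assuming the list representation of vector timestamps; the function names (handle_query, fe_merge) are illustrative, not from the paper:

```python
def vt_leq(t, s):
    return all(ti <= si for ti, si in zip(t, s))

def vt_merge(t, s):
    return [max(ti, si) for ti, si in zip(t, s)]

def handle_query(q_prev, value, value_ts):
    """Answer once q.prev <= valueTS; else return None (caller blocks/retries)."""
    if not vt_leq(q_prev, value_ts):
        return None                      # query must wait for missing updates
    return value, list(value_ts)         # 'new' in the reply is valueTS

def fe_merge(front_end_ts, new):
    """The FE folds the returned timestamp into its own."""
    return vt_merge(front_end_ts, new)
```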

Causal Update
When replication manager i receives update u = (u.op, u.prev, u.id) from an FE:
- Increment the i-th component of its replica timestamp: (r1, r2, …, ri+1, …, rn).
- Assign u the timestamp ts: u.prev = (p1, p2, …, pn) with its i-th component replaced by the incremented ri+1.
- Append logRecord = (i, ts, u.op, u.prev, u.id) to the update log and return ts to the FE.
- When r.u.prev ≤ valueTS, the update is stable: apply(value, r.u.op), valueTS := merge(valueTS, r.ts), executed := executed ∪ {r.u.id}.
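The steps above can be sketched as follows (an illustrative Python sketch under the list representation of timestamps; the record layout and function names are assumptions):

```python
def vt_leq(t, s):
    return all(a <= b for a, b in zip(t, s))

def vt_merge(t, s):
    return [max(a, b) for a, b in zip(t, s)]

def receive_update(i, replica_ts, log, u_op, u_prev, u_id):
    """RM i timestamps and logs the update; returns ts to the FE."""
    replica_ts[i] += 1                   # bump own component
    ts = list(u_prev)                    # u.prev with slot i replaced
    ts[i] = replica_ts[i]
    log.append({"i": i, "ts": ts, "op": u_op, "prev": u_prev, "id": u_id})
    return ts

def apply_stable(log, value, value_ts, executed, apply_op):
    """Apply each logged update whose prev <= valueTS (i.e., it is stable)."""
    for r in log:
        if r["id"] not in executed and vt_leq(r["prev"], value_ts):
            value = apply_op(value, r["op"])
            value_ts = vt_merge(value_ts, r["ts"])
            executed.add(r["id"])
    return value, value_ts
```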

Gossip Messages
Goal: bring the states of the replication managers up to date.
A gossip message consists of:
- the sender's replica timestamp
- the sender's update log
Upon receiving gossip, an RM:
- merges the arriving log with its own
- applies any unexecuted stable updates
- eliminates redundant entries from the log and the executed operation table
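A sketch of gossip processing under the same assumed representation (function and field names are hypothetical; log entries are deduplicated by update id):

```python
def vt_leq(t, s):
    return all(a <= b for a, b in zip(t, s))

def vt_merge(t, s):
    return [max(a, b) for a, b in zip(t, s)]

def receive_gossip(m_log, m_ts, log, replica_ts, value, value_ts, executed, apply_op):
    # 1. Merge the arriving log with the local one.
    have = {r["id"] for r in log}
    log.extend(r for r in m_log if r["id"] not in have)
    replica_ts = vt_merge(replica_ts, m_ts)
    # 2. Apply updates that have become stable (prev <= valueTS), repeating
    #    until no more become stable, since one application may unblock another.
    changed = True
    while changed:
        changed = False
        for r in log:
            if r["id"] not in executed and vt_leq(r["prev"], value_ts):
                value = apply_op(value, r["op"])
                value_ts = vt_merge(value_ts, r["ts"])
                executed.add(r["id"])
                changed = True
    return log, replica_ts, value, value_ts
```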

Controlling the Size of the Update Log
The timestamp table keeps the most recent replica timestamps received in messages from all the other replicas. A log record r, created at replica r.i, can be removed from the log once every replica has received it:
r.ts[r.i] ≤ timestamp_table[j][r.i], for all j
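The truncation condition can be checked directly (a sketch; representing timestamp_table as a list of timestamp rows, one per replica, is an assumption):

```python
def can_discard(r_ts, r_i, timestamp_table):
    """True when r.ts[r.i] <= timestamp_table[j][r.i] for all replicas j,
    i.e., every replica's latest known timestamp already covers record r."""
    return all(row[r_i] >= r_ts[r_i] for row in timestamp_table)
```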

Controlling the Size of the Executed Operation Table
- Each update carries an extra time field.
- The FE returns an ACK after receiving the response for an update from an RM; the ACK contains the FE's clock time.
- The RM inserts the received ACK into the log.

Controlling the Size of the Executed Operation Table (cont'd)
- A message m from an FE is late if m.time + δ < the replica's clock time.
- A late update is discarded.
- An ACK is kept at least until it is late.
- An entry c is removed from the executed operation table when an ACK for c's update has been received and all log records for c's update have been discarded.

Forced Update
- A primary assigns each forced update a globally unique identifier (UID).
- The primary carries out a two-phase protocol for updates.

Two-Phase Protocol
- Upon receiving a forced update, the primary sends it to all other replicas.
- Upon receiving responses from a sub-majority of the backups (at least half), the primary commits the update by inserting its record into the log.
- The backups learn of the commitment from gossip messages.

Failure Recovery
- The new coordinator informs the participants about the failure.
- The participants inform the coordinator about their most recent forced updates.
- After a sub-majority of the replicas has responded, the coordinator resumes assigning UIDs, starting from the largest UID it knows of.

Immediate Update
The primary uses a three-phase protocol:
- pre-prepare
- prepare
- commit

Three-Phase Protocol
[Figure: the FE sends the update to the primary; the primary asks each backup "give me your log and timestamp", collects the logRecords into its update log, and returns the update id to the FE.]

Number of Messages per Operation
- Query: 2
- Causal: 2 + (N−1)/K
- Forced: 2⌈N/2⌉ + (N−1)/K
- Immediate: 2N + 2(⌈N/2⌉ − 1) + (N−1)/K
where N is the number of replicas and K is the number of update/ack pairs carried in one gossip message.
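Written out as code for a quick sanity check (a sketch; the formulas are those on the slide, with the gossip term (N−1)/K amortized per update):

```python
import math

def messages(op, n, k):
    """Message count per operation for n replicas and k update/ack
    pairs batched into each gossip message."""
    gossip = (n - 1) / k                 # amortized gossip cost per update
    if op == "query":
        return 2
    if op == "causal":
        return 2 + gossip
    if op == "forced":
        return 2 * math.ceil(n / 2) + gossip
    if op == "immediate":
        return 2 * n + 2 * (math.ceil(n / 2) - 1) + gossip
    raise ValueError(op)
```

For example, with N = 3 and K = 2, a forced update costs 2·2 + 1 = 5 messages.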

Capacity of a 3-Replica System
[Figure: capacity measurements for the 3-replica system.]
Excerpt from "Providing High Availability Using Lazy Replication" by Ladin, Liskov, Shrira and Ghemawat

Capacity of the Unreplicated System
[Figure: capacity measurements for the unreplicated system.]
Excerpt from "Providing High Availability Using Lazy Replication" by Ladin, Liskov, Shrira and Ghemawat

Discussion
- No timing guarantee for gossip messages: the scheme is not generally suitable for real-time applications, such as a real-time conference updating a shared document.
- Scalability: the vector-timestamp size grows with the number of replicas; scalability can be improved by making most of the replicas read-only.

Questions?