Flexible Update Propagation for Weakly Consistent Replication
Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers
Presented by: Ryan Huebsch, CS294-4 P2P Systems – 10/13/03

Outline
- Anti-Entropy
- Goals
- Data Structures
- Orderings
- The Algorithm
- Creation and Retirement
- Discussion
- Performance
- P2P discussion/questions

Anti-Entropy
Entropy: a process of degradation or running down, a trend toward disorder. Anti-entropy brings two replicas up to date.
Three major design decisions:
- Pairwise communication between replicas
- Exchange of update operations, not database contents
- Ordered propagation of operations

Goals
- Support for arbitrary communication topologies
- Operation over low-bandwidth networks
- Incremental progress
- Eventual consistency
- Efficient storage management
- Light-weight management of dynamic replica sets
- Arbitrary policy choices

Data Structures
Each server (replica) maintains:
- Database: the full replicated database state
- Write log: committed writes first, then tentative writes; a prefix of the log may have been truncated to save space
- Clock: the logical clock used to assign accept-stamps
- Version vector V: for each server A, the highest A.Clock value (accept-stamp) of a write accepted by A that is covered by the log
- Omitted vector O: for each server A, the highest accept-stamp of a write accepted by A that has been truncated from the log
- CSN: the commit sequence number of the latest committed write the server knows of
- OSN: the CSN of the latest committed write that has been truncated from the log (all log entries with CSN < OSN are gone)
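A minimal sketch of this per-server state in Python (the names ServerState and Write, and the dict/list representations, are my own; the paper does not prescribe a concrete layout):

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Write:
        accept_stamp: int           # logical clock value assigned by the accepting server
        server_id: object           # id of the server that accepted the write
        op: object = None           # the update operation itself
        csn: Optional[int] = None   # commit sequence number; None while tentative

    @dataclass
    class ServerState:
        db: dict = field(default_factory=dict)    # the full database replica
        log: list = field(default_factory=list)   # write log: committed prefix, then tentative writes
        clock: int = 0                            # logical clock used to assign accept-stamps
        V: dict = field(default_factory=dict)     # version vector: server_id -> highest accept-stamp in log
        O: dict = field(default_factory=dict)     # omitted vector: server_id -> highest accept-stamp truncated
        csn: int = -1                             # highest commit sequence number known
        osn: int = -1                             # highest CSN among writes truncated from the log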

Orderings
- Prefix property: if replica R has a write Wi that was accepted by server X, then R also has all writes X accepted before Wi.
- Stable (committed) order: decided by the primary replica, which assigns each write its final CSN (committed writes have CSN < infinity; tentative writes are effectively at infinity). New CSNs are propagated to the other servers.
- Accept order: the order in which a particular server accepted writes, captured by the accept-stamp (a logical or real-time clock). Each server's stamps totally order its own writes; across servers this gives only a partial order.
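The prefix property is what lets a single version-vector entry summarize exactly which of a server's writes a replica holds. A one-line sketch, reusing the ServerState and Write types above:

    def has_write(R: ServerState, w: Write) -> bool:
        # By the prefix property, R holds w exactly when R's vector entry for
        # w's accepting server is at least w's accept-stamp.
        return w.accept_stamp <= R.V.get(w.server_id, -1)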

Orderings, continued
- Causal-accept order: the accept-stamp is a logical clock, and a server advances its clock whenever it receives a write (through anti-entropy) with a higher accept-stamp.
- This improves the chances of a client seeing the same database state at different servers: if two servers hold the same writes, even uncommitted ones, they will be in the same order.
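A sketch of the clock rules (the accept_write/receive_write names are mine; log insertion and database rollback/replay are elided):

    def accept_write(S: ServerState, my_id, op) -> Write:
        # A locally accepted write gets the next accept-stamp from S's clock.
        S.clock += 1
        w = Write(accept_stamp=S.clock, server_id=my_id, op=op)
        S.log.append(w)
        S.V[my_id] = w.accept_stamp
        return w

    def receive_write(S: ServerState, w: Write) -> None:
        # Causal-accept order: advance the clock past any stamp received via
        # anti-entropy, so S's later writes sort after everything it has seen.
        S.clock = max(S.clock, w.accept_stamp)
        S.V[w.server_id] = max(S.V.get(w.server_id, -1), w.accept_stamp)
        # Inserting w into the log in (accept_stamp, server_id) order, and the
        # rollback/replay of the database, are omitted here.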

The Algorithm (Quick Version)
Replica R is being updated by replica S. S first retrieves R.V and R.CSN. (A consolidated Python sketch of all four steps appears after Step 4 below.)

Step 1: decide whether a full database transfer is needed.

    IF (S.OSN > R.CSN) THEN
        [S has truncated committed writes that R still needs, so the log alone is not enough]
        Rollback S's database to the state corresponding to S.O
        [i.e., undo the effects of every write still in S's log]
        OutputDatabase(S.DB)
        OutputVector(S.O)
        OutputOSN(S.OSN)
        [R now has the same database, and a write log truncated to the same point, as S]
    END

The Algorithm, continued
Step 2: bring R up to date on the committed writes it is missing.

    IF (R.CSN < S.CSN) THEN
        [R is missing committed writes]
        w = first committed write in S.log with CSN > R.CSN
        WHILE (w) DO
            IF (w.accept-stamp <= R.V(w.server-id)) THEN
                [R already holds the write tentatively; it only needs to learn the CSN]
                OutputCommitNotification(w)
            ELSE
                OutputWrite(w)
            END
            w = next committed write in S.log
        END
    END

The Algorithm, continued
Step 3: bring R up to date on the remaining tentative (uncommitted) writes.

    w = first tentative write in S.log
    WHILE (w) DO
        IF (R.V(w.server-id) < w.accept-stamp) THEN
            [R's version vector shows it does not yet have this write]
            OutputWrite(w)
        END
        w = next write in S.log
    END

Step 4: finish up.

    OutputCSN(S.CSN)
    OutputVector(S.V)
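Putting the four steps together: a minimal sender-side sketch in Python, reusing the ServerState and Write types above. The send callback, the simple in-log scans, and the omission of S's actual database rollback are my simplifications, not the paper's wire protocol.

    def anti_entropy(S: ServerState, send, R_V: dict, R_csn: int) -> None:
        # One anti-entropy session: S pushes updates to receiver R,
        # given R's version vector R_V and commit sequence number R_csn.

        # Step 1: full database transfer if S has truncated writes R still needs.
        if S.osn > R_csn:
            send("database", S.db)   # (rolling S.db back to the truncated state is elided)
            send("omitted_vector", S.O)
            send("osn", S.osn)
            R_csn = S.osn            # R is now current through S's truncation point

        # Step 2: committed writes R is missing.
        for w in S.log:
            if w.csn is not None and w.csn > R_csn:
                if w.accept_stamp <= R_V.get(w.server_id, -1):
                    # R holds the write tentatively; only the CSN is news.
                    send("commit_notification", (w.server_id, w.accept_stamp, w.csn))
                else:
                    send("write", w)

        # Step 3: tentative writes R is missing.
        for w in S.log:
            if w.csn is None and R_V.get(w.server_id, -1) < w.accept_stamp:
                send("write", w)

        # Step 4: finish up; R can advance its CSN and version vector.
        send("csn", S.csn)
        send("vector", S.V)

For testing, send can simply append (kind, payload) pairs to a list; applying writes on the receiver, with rollback and reordering of the tentative suffix, is a separate piece of the protocol.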

Creation and Retirement
- Server creation and retirement are handled just like writes (elegant).
- A new server Si joins by asking an existing server Sk to accept a creation write on its behalf. If that write gets accept-stamp Tk,i, then Si's server id is the pair <Tk,i, Sk>, and Si initializes its clock to Tk,i + 1.
- Note that the new server id is globally unique and recursive (Sk's id may itself embed another server's id), so ids can grow long.
- The creation write propagates to the other servers through normal anti-entropy.
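A sketch of the recursive naming, reusing accept_write from above (the tuple encoding of ids is my own):

    def create_server(Sk: ServerState, Sk_id):
        # Sk accepts a creation write on behalf of the joining server Si.
        w = accept_write(Sk, Sk_id, op=("create_server",))
        T_ki = w.accept_stamp
        # Si's globally unique, recursive id, plus the initial value of Si's clock.
        Si_id = (T_ki, Sk_id)
        return Si_id, T_ki + 1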

Creation and Retirement, continued
Suppose server S is updating server R, and S.V has an entry for server Si = <Tk,i, Sk> while R.V does not. There are exactly two cases, and R's vector entry for Sk distinguishes them:
- R has not yet seen the creation of Si: then R.V(Sk) < Tk,i.
- R has already seen the retirement of Si: then R.V(Sk) >= Tk,i, because by the prefix property R would then hold Sk's creation write for Si, so the entry can only be missing because R also saw Si retire.
Why does this work? Creation and retirement are recorded as ordinary writes, so the prefix property applies to them. Recursive naming helps too: even if Sk has itself retired, the id can be traced back to decide the proper state. The paper formalizes this as the virtual complete version vector (CompleteV).
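A hedged sketch of that check, with server ids as the (T, parent_id) tuples from above (the recursion is my reading of the paper's CompleteV construction, not its exact definition):

    def knows_is_retired(R_V: dict, missing_id: tuple) -> bool:
        # R_V has no entry for missing_id = (T_ki, Sk_id): decide whether R has
        # seen that server retire (True) or simply never saw it created (False).
        T_ki, Sk_id = missing_id
        if Sk_id in R_V:
            # Prefix property: if R has Sk's writes through T_ki, it saw the
            # creation write, so the entry can only be missing due to retirement.
            return R_V[Sk_id] >= T_ki
        # Sk itself is missing from R_V; its recursive id lets us trace back.
        # (Assumes ids bottom out at the first server, present in every vector.)
        return knows_is_retired(R_V, Sk_id)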

Discussion

Discussion, continued
- Most of the individual properties are not special in themselves; it is the combination that is novel.
- The different design decisions are mostly independent of one another.
- The ideas can be applied to systems other than Bayou.
- Security: certificates are used to ensure a user is allowed to make an update. Not much detail is given, and it is later used as an excuse for high overheads.
- Many policy decisions are left open: when to reconcile, with whom, and when to truncate the log.

Performance
- 1316 bytes of overhead per update; 520 bytes of that is the certificate
- Network transfer is the most significant cost

Performance, continued
- It is hard to know whether these numbers are good; there is nothing to compare them to.
- It would have been nice to see a larger deployment with measurements of propagation delay, consistency, etc.

P2P?
- Is anti-entropy applicable to P2P systems? Review the goals: arbitrary topology, low bandwidth, aggressive storage management...
- There is a centralized component (the serializer, i.e. the primary): is this okay? Can it handle failures and churn?
- Security: what happens if there is a faulty node?