Flexible Update Propagation for Weakly Consistent Replication Karin Petersen, Mike K. Spreitzer, Douglas B. Terry, Marvin M. Theimer and Alan J. Demers.

Presented by: Ryan Huebsch
CS294-4 P2P Systems, 10/13/03

Outline
- Anti-Entropy
- Goals
- Data Structures
- Ordering
- The Algorithm
- Creation and Retirement
- Discussion
- Performance
- P2P discussion/questions

Anti-Entropy
- Entropy: a process of degradation or running down, or a trend to disorder
- Anti-entropy: bring 2 replicas up-to-date
- Three major design decisions
  - Pairwise communication between replicas
  - Exchange of update operations
  - Ordered propagation of operations

Goals
- Support for arbitrary communication topologies
- Operation over low-bandwidth networks
- Incremental progress
- Eventual consistency
- Efficient storage management
- Light-weight management of dynamic replica sets
- Arbitrary policy choices

Data Structures
- Replica: Database, Write Log
- Server: Clock, V, O, CSN, OSN, …
- Write log: the oldest part has been truncated (< OSN); the part up to CSN is committed (< CSN)
- V: for each server A, the highest A.Clock of a write from A that is in the log
- O: for each server A, the highest A.Clock of a write from A that has been truncated
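The per-replica state on this slide can be sketched roughly as below. This is only an illustration: the class names, the `accept`, `commit` and `truncate` helpers, and the choice to treat OSN as "truncated through this CSN, inclusive" are assumptions made for the example, not the paper's interface.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Write:
    server_id: str          # server that accepted the write
    accept_stamp: int       # that server's clock when it accepted the write
    op: str = ""            # the update operation (opaque here)
    csn: float = math.inf   # commit sequence number; infinity while tentative

@dataclass
class Replica:
    name: str
    clock: int = 0
    V: dict = field(default_factory=dict)   # highest accept-stamp per server in the log
    O: dict = field(default_factory=dict)   # highest accept-stamp per server truncated
    CSN: int = 0                            # highest commit sequence number known
    OSN: int = 0                            # highest commit sequence number truncated
    log: list = field(default_factory=list)

    def accept(self, op):
        """Accept a new local write and stamp it with this server's clock."""
        self.clock += 1
        w = Write(self.name, self.clock, op)
        self.log.append(w)
        self.V[self.name] = self.clock
        return w

    def commit(self, w, csn):
        """Record that write w has been assigned commit sequence number csn."""
        w.csn = csn
        self.CSN = max(self.CSN, csn)
        # Keep the log in stable order: committed writes by CSN, then tentative ones.
        self.log.sort(key=lambda x: (x.csn, x.accept_stamp, x.server_id))

    def truncate(self, upto_csn):
        """Drop committed writes with csn <= upto_csn (assumed inclusive bound)."""
        if upto_csn > self.CSN:
            raise ValueError("only committed writes may be truncated")
        kept = []
        for w in self.log:
            if w.csn <= upto_csn:
                self.O[w.server_id] = max(self.O.get(w.server_id, 0), w.accept_stamp)
            else:
                kept.append(w)
        self.log = kept
        self.OSN = max(self.OSN, upto_csn)
```

For example, a replica that accepts two writes, commits the first and truncates it ends up with V ahead of O, and only the tentative write left in its log.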

Orderings
- Prefix Property
  - If R has write W_i that was accepted by server X, it has all writes X accepted before W_i
- Stable (Committed) Order
  - Decided by the primary replica
  - The primary assigns the final CSN, which is < infinity (tentative writes have CSN = infinity)
  - The new CSN is propagated to the other nodes
- Accept Order
  - Total order over the writes accepted by a particular server; only a partial order over all writes
  - Accept-stamp: logical or real-time clock

Orderings, continued
- Causal-Accept Order
  - Accept-stamp is a logical clock
  - Clock is advanced when a write with a higher accept-stamp is received (through anti-entropy)
  - Gives a node a better chance of seeing the same database from different servers
  - Servers that hold the same writes, even uncommitted ones, apply them in the same order
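The clock rule for causal-accept order can be sketched as a small logical clock. The class and method names are hypothetical, chosen only for this illustration.

```python
class LogicalClock:
    """Logical clock used to produce accept-stamps (illustrative sketch)."""

    def __init__(self):
        self.value = 0

    def stamp_local_write(self):
        """Return the accept-stamp for a write accepted at this server."""
        self.value += 1
        return self.value

    def observe(self, accept_stamp):
        """Advance the clock when anti-entropy delivers a write with a higher stamp."""
        if accept_stamp > self.value:
            self.value = accept_stamp


clock = LogicalClock()
first = clock.stamp_local_write()   # local write
clock.observe(10)                   # write received through anti-entropy
later = clock.stamp_local_write()   # stamped after everything seen so far
```

Because the clock never falls behind a stamp it has seen, any write accepted after receiving another write gets a strictly larger accept-stamp.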

The Algorithm (Quick Version)
R is being updated by S
S retrieves R.V and R.CSN

STEP 1: Decide if a full transfer is needed
IF (S.OSN > R.CSN) THEN   [S does not have enough log]
    Rollback S's database to the state corresponding to S.O   [Remove all writes that S has a log for]
    OutputDatabase(S.DB)
    OutputVector(S.O)
    OutputOSN(S.OSN)
    [R now has the same database and has truncated the write log to the same point as S]
END
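Step 1 can be sketched as a function that returns the messages S would send. The dictionary keys (`OSN`, `O`, `db_at_O`) and the tuple-shaped messages are assumptions for this example; `db_at_O` stands for S's database already rolled back to the state corresponding to S.O.

```python
def full_transfer_messages(s, r_csn):
    """Return the step 1 output, or an empty list if S's log reaches back far enough.

    s is a dict with hypothetical keys:
      'OSN'     - S's omitted sequence number
      'O'       - S's omitted vector
      'db_at_O' - S's database rolled back to the state corresponding to O
    """
    if s["OSN"] > r_csn:
        # S has truncated committed writes that R has not seen yet,
        # so the log alone cannot bring R up to date.
        return [
            ("database", s["db_at_O"]),
            ("vector", dict(s["O"])),
            ("osn", s["OSN"]),
        ]
    return []


sender = {"OSN": 40, "O": {"A": 12, "B": 9}, "db_at_O": {"x": 1}}
behind = full_transfer_messages(sender, r_csn=25)   # R is missing truncated writes
caught_up = full_transfer_messages(sender, r_csn=40)
```

In the first call R's CSN is below S's OSN, so a full database transfer is emitted; in the second the log suffices and nothing is sent in this step.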

The Algorithm, continued
STEP 2: Bring R up-to-date with remaining committed writes
IF R.CSN < S.CSN THEN   [R is missing committed writes]
    w = first committed write in S.log after R.CSN
    WHILE (w) DO
        IF w.accept-stamp <= R.V(w.server-id) THEN   [Check R's vector to see if it has the write]
            OutputCommitNotification(w)
        ELSE
            OutputWrite(w)
        END
        w = next committed write in S.log
    END
END

The Algorithm, continued
STEP 3: Bring R up-to-date with remaining uncommitted writes
w = first tentative write in S.log
WHILE (w) DO
    IF R.V(w.server-id) < w.accept-stamp THEN   [Check R's vector to see if it has the write]
        OutputWrite(w)
    END
    w = next write in S.log
END

STEP 4: Finish up
OutputCSN(S.CSN)
OutputVector(S.V)
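Steps 2 through 4 can be sketched in one function that collects what S would output. The `Write` tuple, the function name `propagate`, and the tuple-shaped messages are assumptions for this illustration; the sketch also assumes that S's log is already in stable order (committed writes by CSN, then tentative writes) and that, if step 1 ran, the caller passes R's vector and CSN as they stand after the full transfer.

```python
import math
from collections import namedtuple

# csn is math.inf for a tentative (uncommitted) write.
Write = namedtuple("Write", "server_id accept_stamp csn")

def propagate(s_log, s_csn, s_v, r_v, r_csn):
    """Return the messages S sends to R for steps 2-4 (illustrative sketch)."""
    out = []

    # Step 2: committed writes that R does not yet know to be committed.
    if r_csn < s_csn:
        for w in s_log:
            if w.csn == math.inf or w.csn <= r_csn:
                continue
            if w.accept_stamp <= r_v.get(w.server_id, 0):
                # R already holds the write; it only needs the commit information.
                out.append(("commit", w.server_id, w.accept_stamp, w.csn))
            else:
                out.append(("write", w))

    # Step 3: tentative writes that R's version vector does not cover.
    for w in s_log:
        if w.csn == math.inf and r_v.get(w.server_id, 0) < w.accept_stamp:
            out.append(("write", w))

    # Step 4: finish up with S's CSN and version vector.
    out.append(("csn", s_csn))
    out.append(("vector", dict(s_v)))
    return out


s_log = [
    Write("A", 1, 1),
    Write("B", 1, 2),
    Write("A", 2, math.inf),
    Write("B", 2, math.inf),
]
messages = propagate(s_log, s_csn=2, s_v={"A": 2, "B": 2},
                     r_v={"A": 1, "B": 2}, r_csn=1)
```

Here R already holds B's first write but not its commit, so it gets only a commit notification for it, followed by the one tentative write from A that it is missing.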

Creation and Retirement
- Treated just like a write (elegant)
- S_i is trying to join via server S_k
  - S_k creates a new write, stamped with accept-stamp T_{k,i}
  - <T_{k,i}, S_k> is S_i's server id; S_i sets its clock to T_{k,i} + 1
  - Notice the new server id is globally unique, recursive, and could be long
- The write is propagated to other nodes through anti-entropy

Creation and Retirement, continued
- Server S is updating server R
- S.V has an entry for server S_i, while R.V does not
- 2 cases:
  - R has not seen the creation of S_i: then R.V(S_k) < T_{k,i}
  - R has seen the retirement of S_i, which S has not yet seen: then R.V(S_k) >= T_{k,i}
- Why? Creation and retirement are recorded as normal writes, so the prefix property holds
- Recursive naming helps too: if S_k has retired, R can still trace back and decide the proper state
- This is explained as the virtual CompleteV in the paper
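The two cases can be sketched as a recursive lookup in the spirit of the CompleteV idea mentioned on the slide. This is a reading of the slide, not the paper's exact definition: the tuple encoding `(T_ki, creator_id)` for server ids, the plain string id for the first server, and the use of plus or minus infinity as the answers are all assumptions for this example.

```python
import math

def complete_v(v, server_id):
    """Extend version vector v to servers that have no explicit entry.

    A server id is either a plain string (the first server, assumed to be
    present in v) or a tuple (creation_stamp, creator_id).
    """
    if server_id in v:
        return v[server_id]
    if not isinstance(server_id, tuple):
        raise KeyError("no entry and no creator to trace back to: %r" % (server_id,))
    creation_stamp, creator_id = server_id
    if complete_v(v, creator_id) >= creation_stamp:
        # The creation write has been seen (prefix property), yet the entry
        # is gone, so the retirement must have been seen too.
        return math.inf
    # The creation write has not been seen yet.
    return -math.inf


s_i = (5, "S0")                                   # created by S0 at T = 5
not_created_yet = complete_v({"S0": 3}, s_i)
already_retired = complete_v({"S0": 7}, s_i)
still_active = complete_v({"S0": 7, s_i: 9}, s_i)
nested = complete_v({"S0": 7}, (2, s_i))          # created by the retired S_i
```

A receiver for which the lookup yields minus infinity should be sent the missing writes, while plus infinity means the receiver already knows everything about that server and nothing from it needs to be sent.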

Discussion

Discussion, continued
- Most properties are not special in themselves; the combination is novel
  - Different decisions are mostly independent
  - Ideas can be applied to systems other than Bayou
- Security
  - Use certificates to ensure a user can make an update
  - Not much detail given
  - Used later on as an excuse for high overheads
- Lots of policy decisions to be made
  - When to reconcile, with whom, when to truncate the log

Performance
- 1316 bytes of update overhead
- 520 bytes for certificate
- Network transfer is the most significant cost

Performance, continued
- Hard to know if the numbers are good; there is nothing to compare them to
- Would have been nice to see a larger deployment and measurements of propagation delay, consistency, etc.

P2P?
- Is Anti-Entropy applicable to P2P systems?
- Review the goals… arbitrary topology, low b/w, aggressive storage management…
- There is a centralized component (the serializer)… is this okay?
- Can it handle failures/churn?
- Security: what happens if there is a faulty node?
