1
OPODIS 05 Reconfigurable Distributed Storage for Dynamic Networks Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman
2
Goals
Reconfigurable Distributed Storage (RDS)
– Atomic consistency (read/write)
– Fault tolerance
…in dynamic and asynchronous systems.
3
Distributed Storage
4
Distributed Storage Data is replicated at several network locations
5
Distributed Storage Write Read Operation policy
6
…in Dynamic Networks
7
Distributed Storage in Dynamic Networks
8
Distributed Storage in Dynamic Networks leaving nodes joining nodes
9
Distributed Storage in Dynamic Networks
10
Distributed Storage in Dynamic Networks …requires a reconfiguration process.
11
Distributed Storage in Dynamic Networks …by achieving agreement.
12
Model
Distributed
– Connected set of processors
– Each processor has a unique id i ∈ I
– MWMR: any processor is a potential client
Asynchronous
– Asynchronous processors
– Point-to-point asynchronous unreliable channels
Dynamic
– Processors join and leave the system
– Processors may crash
13
What is a configuration?
Configuration
– members: a set of processors
– read-quorums, write-quorums: two sets of quorums
– ∀ RQ ∈ read-quorums, ∀ WQ ∈ write-quorums: RQ ⊆ members, WQ ⊆ members, RQ ∩ WQ ≠ ∅ (only for a given configuration)
Every client maintains a set of configurations, initially containing the default one.
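The configuration definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Configuration`, the method `is_valid` and the helper `majority_configuration` are hypothetical names, not part of RDS, and majorities are used simply because they are the easiest quorum system that satisfies the intersection requirement.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Configuration:
    """Hypothetical container for one configuration (names are assumptions)."""
    members: frozenset
    read_quorums: frozenset   # a frozenset of frozensets of processor ids
    write_quorums: frozenset  # a frozenset of frozensets of processor ids

    def is_valid(self) -> bool:
        # Every quorum must be a subset of the members.
        quorums_inside = all(
            q <= self.members for q in self.read_quorums | self.write_quorums
        )
        # Every read quorum must intersect every write quorum.
        intersecting = all(
            rq & wq for rq in self.read_quorums for wq in self.write_quorums
        )
        return quorums_inside and intersecting


def majority_configuration(members):
    """Build a configuration whose read and write quorums are all majorities."""
    members = frozenset(members)
    size = len(members) // 2 + 1
    majorities = frozenset(
        frozenset(group) for group in combinations(sorted(members), size)
    )
    return Configuration(members, majorities, majorities)
```

Any two majorities of the same member set share at least one processor, so `majority_configuration({1, 2, 3}).is_valid()` holds, while a configuration with read quorum {1} and write quorum {2} does not.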
14
Single Object Operations Overview
After [ABD95]
tag ∈ ℕ × I, val a possible value
val = Read()_i: (tag, val) = query(); [prop(tag, val);]
Write(val)_i: (tag, val’) = query(); prop(tag, val);
1. (tag, val) ← query(NULL): gathers the (tag, val) pairs of all processors of a RQ and returns the one with the largest tag.
2. NULL ← prop(tag, val): updates the (tag, val) pairs at all processors of a WQ.
[Figure labels: Write tag, Read tag]
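The query/prop structure above can be illustrated with a small in-memory sketch. It is not the RDS implementation: there is no network, no failure handling and no reconfiguration, and the names `Replica`, `read` and `write` are hypothetical. One assumption is labeled in the code: the writer picks a new tag by incrementing the largest sequence number it saw, which is the usual multi-writer ABD convention; the slide text does not show how the new tag is chosen.

```python
class Replica:
    """Hypothetical in-memory replica holding one (tag, val) pair."""

    def __init__(self, initial=None):
        # A tag is a pair (sequence number, writer id), compared lexicographically.
        self.tag = (0, 0)
        self.val = initial


def query(read_quorum):
    """Gather (tag, val) from every replica of a read quorum; return the largest tag's pair."""
    best = max(read_quorum, key=lambda replica: replica.tag)
    return best.tag, best.val


def prop(write_quorum, tag, val):
    """Update every replica of a write quorum that holds an older tag."""
    for replica in write_quorum:
        if tag > replica.tag:
            replica.tag, replica.val = tag, val


def read(read_quorum, write_quorum):
    tag, val = query(read_quorum)
    prop(write_quorum, tag, val)  # the slide marks this phase as optional for reads
    return val


def write(writer_id, val, read_quorum, write_quorum):
    (seq, _), _ = query(read_quorum)
    # Assumption: the new tag increments the largest sequence number seen.
    prop(write_quorum, (seq + 1, writer_id), val)
```

With three replicas `r0, r1, r2`, a write through read quorum `[r0, r1]` and write quorum `[r1, r2]` is visible to a later read through the same quorums, because the two quorums share `r1`.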
15
Reconfiguration Design Goals
Sound
– Totally ordered configurations
Flexible
– No dependencies between configurations
Non-intrusive
– Allows concurrent read/write operations
Fast
– Strengthens fault tolerance
16
Decoupling Reconfiguration
Reconfiguration = Replacing Configurations
– {I} Installing a new configuration
– {R} Removing old configuration(s)
If {R} ≺ {I}: operations are delayed
If {I} ≺ {R}: a stronger configuration viability assumption is required
17
Solution
Instead of ({R} ≺ {I}) or ({I} ≺ {R}): {I} // {R}
Tighter coupling between removal and installation
18
RDS Reconfiguration
Reconfiguration is based on Paxos (a three-phase, leader-based consensus algorithm)
– l is the leader
– c is the current configuration
– configs is the set of active configurations
– A ballot has a unique identifier b and a value v, which is a configuration
Paxos phases:
– Prepare: l creates a new ballot and chooses/gets the value to propose.
– Propose: l proposes and gathers votes from a majority.
– Propagate: l propagates the decision.
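A ballot with a unique identifier b and a configuration value v could be represented as follows. This is a sketch under assumptions: the slide only states that identifiers are unique, so the (counter, leader id) pair used here is a common Paxos convention rather than something taken from RDS, and `BallotId`, `Ballot` and `next_ballot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class BallotId:
    """Assumed identifier layout: compared by counter first, then by leader id."""
    counter: int
    leader: int


@dataclass
class Ballot:
    id: BallotId
    value: Optional[str] = None  # the proposed configuration (a name, for brevity)


def next_ballot(largest_seen: BallotId, leader: int, proposal: str) -> Ballot:
    """Create a ballot strictly larger than any identifier the leader has seen."""
    return Ballot(BallotId(largest_seen.counter + 1, leader), proposal)
```

Including the leader id in the pair keeps identifiers unique even when two leaders pick the same counter, and the lexicographic order gives every pair of ballots a well-defined "larger" one.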
19
RDS Reconfiguration: Recon(c,c’) [Figure: leader l, RQ, WQ]
20
RDS Reconfiguration, Prepare phase: Recon(c,c’). The leader l creates a new, larger ballot b.
21
RDS Reconfiguration, Prepare phase: Recon(c,c’)
22
RDS Reconfiguration, Prepare phase: Recon(c,c’). The leader l updates its ballot’s value v with the one received and updates its configs set.
23
RDS Reconfiguration, Propose phase: Recon(c,c’)
24
RDS Reconfiguration, Propose phase: Recon(c,c’). Processors update their tag and val and add v to their configs set.
25
RDS Reconfiguration, Propagation phase: Recon(c,c’). Processors update their tag and val and remove configuration c from their configs set.
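The effect of the Propose and Propagation phases on a processor's local state can be sketched as below. Only the state updates named on the slides are modeled (tag, val and the configs set); ballots, voting, quorum collection and message passing are left out. The class `NodeState` and its handler names are hypothetical, and adopting a received (tag, val) only when the tag is larger is an assumption in the spirit of the prop operation.

```python
class NodeState:
    """Hypothetical local state of one processor during reconfiguration."""

    def __init__(self, initial_config):
        self.configs = {initial_config}  # set of active configurations
        self.tag = (0, 0)
        self.val = None

    def _adopt(self, tag, val):
        # Assumption: keep the pair with the larger tag.
        if tag > self.tag:
            self.tag, self.val = tag, val

    def on_propose(self, new_config, tag, val):
        """Propose phase: update tag and val, add the proposed configuration."""
        self._adopt(tag, val)
        self.configs.add(new_config)

    def on_propagate(self, old_config, tag, val):
        """Propagation phase: update tag and val, remove the old configuration."""
        self._adopt(tag, val)
        self.configs.discard(old_config)
```

Between the two handlers the processor holds both c and c’ in `configs`, which is the window in which both configurations are active.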
26
Proving Atomicity
– Ordering configurations. Theorem 1: The set of installed configurations in the system is totally ordered.
– Ordering operations. Theorem 2: If operation 1 precedes operation 2, then the tag of operation 1 is not larger than the tag of operation 2.
27
Additional Assumptions
Eventual stabilization with
– Unique leader l
– Message delay bound d (unknown to the algorithm)
– Gossip with period d
– Restricted reconfiguration rate
– Some quorums remain alive in active configurations
t_s: system stabilization time
t_l: algorithm stabilization time
Let t_r be the request time
[Timeline labels: t_s, 2d, t_l]
28
Reconfiguration Latency
Worst-case scenario: the last reconfiguration was done by a different leader.
[Timeline: starting at max(t_l, t_r): Prepare, Propose (2d), Propagate (d), ending at t_e]
t_e: end time, reconfiguration is complete
Total: 5d
29
Reconfiguration Latency
Other cases: the leader made the previous reconfiguration.
[Timeline: starting at max(t_l, t_r): Propose (2d), Propagate (d), ending at t_e]
t_e: end time, reconfiguration is complete
Total: 3d
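The two reconfiguration timelines can be written as a pair of bounds. This reads the slide totals as upper bounds on the end time t_e. The 2d attributed to the Prepare phase is an inference (the 5d total minus the 2d and d shown for Propose and Propagate), not a value printed on the slide.

```latex
\begin{align*}
t_e &\le \max(t_l, t_r)
      + \underbrace{2d}_{\text{Prepare (inferred)}}
      + \underbrace{2d}_{\text{Propose}}
      + \underbrace{d}_{\text{Propagate}}
     = \max(t_l, t_r) + 5d
     && \text{(previous reconfiguration by a different leader)} \\
t_e &\le \max(t_l, t_r)
      + \underbrace{2d}_{\text{Propose}}
      + \underbrace{d}_{\text{Propagate}}
     = \max(t_l, t_r) + 3d
     && \text{(previous reconfiguration by the same leader)}
\end{align*}
```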
30
Operation Latency
Phase latency: 2d is sufficient for the phase round trip. In some cases (pending reconfiguration), the phase might be delayed twice.
Operation latency: operations are bounded by 8d. In some cases, the propagation phase of the read operation can be skipped, leading to a possible bound of 2d.
[Timeline labels: 1st round trip, 2nd round trip, 2d, new configuration discovered]
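One way to read the 8d bound, assuming an operation consists of two phases (query, then prop) and that a phase needs at most two round trips of 2d each when a new configuration is discovered during the first one:

```latex
\begin{align*}
\text{phase latency} &\le 2 \times 2d = 4d
  && \text{(second round trip after discovering a new configuration)} \\
\text{operation latency} &\le 2 \times 4d = 8d
  && \text{(query phase followed by propagation phase)} \\
\text{read without propagation, no delay} &\le 1 \times 2d = 2d
  && \text{(single round trip of the query phase)}
\end{align*}
```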
31
Experimental Results
– IOA translated to Java code following a set of rules.
– Implementation of the Attiya, Bar-Noy, and Dolev algorithm «ABD» (without reconfiguration) and of RDS, which shares parts of the ABD code.
– Using majority-based configurations.
Measuring operation latency
1. While varying configuration size
2. While varying algorithm instances
32
Experimental Results
Operation latency of RDS is competitive with ABD, confirming the theory.
Reconfiguration messages contain operation information, which might accelerate operations in RDS.
33
Conclusion
RDS: Reconfigurable Distributed Storage, with sound, flexible, non-intrusive and fast reconfiguration.
– It solves two problems in one: configuration replacement and consensus.
– Reconfiguration is inexpensive (in time).
– Fault tolerance is strengthened.
– RAMBO can become more aggressive: that is exactly what we did here!