Replica Control for Peer-to-Peer Storage Systems
P2P
Peer-to-peer (P2P) has emerged as an important paradigm for sharing resources at the edges of the Internet. The most widely exploited resource is storage, as typified by P2P music file sharing:
–Napster
–Gnutella
Following the great success of P2P file sharing, a natural next step is to develop wide-area P2P storage systems that aggregate storage across the Internet.
Replica Control Protocol
Replication – maintain multiple copies of some critical data
–to increase availability
–to reduce read access times
Replica control protocol
–to avoid inconsistent updates
–to guarantee a consistent view of the replicated data
Resiliency Requirement
Need data replication
–Even if some nodes fail, the computation can progress
Consistency requirement
–Failures may partition the network
–Rejoining partitions need to run consistency control algorithms
One-Copy Equivalence Consistency Criterion
The set of replicas must behave as if there were only a single copy. Conditions to ensure one-copy equivalence (see the sketch below):
–no two write operations can proceed at the same time
–no read operation can proceed at the same time as a write operation
–a read operation always returns the value written by the last write operation
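The three conditions can be made concrete with a minimal Python sketch. This is an illustration only, not a protocol from these slides: a single-process toy in which one coarse lock rules out concurrent write-write and read-write pairs, and write-all propagation guarantees that a read sees the last write. The class and method names are hypothetical.

```python
import threading

class OneCopyRegister:
    """Toy model: a set of replicas that behaves as a single copy."""

    def __init__(self, num_replicas):
        self.replicas = [dict() for _ in range(num_replicas)]
        # Conditions 1 and 2: the lock forbids two concurrent writes
        # and a read overlapping a write.
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            for replica in self.replicas:   # update every copy before returning
                replica[key] = value

    def read(self, key):
        with self.lock:
            # Condition 3: every replica now holds the last written value,
            # so reading any one of them returns it.
            return self.replicas[0].get(key)
```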
Replica Control Methods
Optimistic
–Proceed with computation on the available subgroup
–Optimistically reconcile for consistency when partitions rejoin
Pessimistic
–Restrict computations under worst-case assumptions
–Approaches: primary site, voting
Optimistic Approach
Version vector for file f
–an N-element vector, where N is the number of nodes on which f is stored
–the i-th element counts the updates done by node i
A vector V dominates V' if
–every element of V is >= the corresponding element of V'
Two vectors conflict if neither dominates the other
Optimistic (cont'd)
Consistency resolution (sketched below)
–If V dominates V', the replicas are inconsistent but resolvable: copy the data of V over that of V'
–If V and V' conflict, the inconsistency cannot be resolved automatically
Version vectors can detect only update conflicts; they cannot resolve read-write conflicts
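Dominance checking is simple enough to sketch in a few lines of Python; dominates and compare are hypothetical helper names, and version vectors are plain lists indexed by node ID.

```python
def dominates(v, w):
    """True if every element of v is >= the corresponding element of w."""
    return all(a >= b for a, b in zip(v, w))

def compare(v, w):
    """Classify two version vectors of the same file."""
    if dominates(v, w) and dominates(w, v):
        return "identical"
    if dominates(v, w):
        return "v newer"     # resolvable: copy v's data over w's
    if dominates(w, v):
        return "w newer"     # resolvable: copy w's data over v's
    return "conflict"        # concurrent updates: not automatically resolvable

print(compare([2, 1, 0], [1, 1, 0]))   # "v newer"
print(compare([2, 0, 0], [1, 1, 0]))   # "conflict": nodes 0 and 1 updated independently
```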
Primary Site Approach
Data are replicated on at least k+1 nodes (for k-resiliency)
One node acts as the primary site (PS); see the sketch below
–Any read request is served by the PS
–Any write request is copied to all other back-up sites
–Any write request arriving at a back-up site is forwarded to the PS
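The read/write paths can be sketched as a single-process Python toy. The PrimarySite/BackupSite classes and the plain-dict stores are assumptions for illustration; a real system would replace the direct method calls with messages.

```python
class PrimarySite:
    def __init__(self):
        self.store = {}
        self.backups = []                    # k back-up sites -> k-resilient

    def read(self, key):                     # all reads are served by the PS
        return self.store.get(key)

    def write(self, key, value):
        self.store[key] = value
        for b in self.backups:               # copy the write to every back-up
            b.store[key] = value

class BackupSite:
    def __init__(self, primary):
        self.store = {}
        self.primary = primary

    def write(self, key, value):
        self.primary.write(key, value)       # writes at a back-up are forwarded to the PS

ps = PrimarySite()
ps.backups = [BackupSite(ps), BackupSite(ps)]   # k = 2
ps.backups[0].write("x", 42)                 # forwarded to the PS, then copied out
assert ps.read("x") == 42
```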
PS Failure Handling
If a back-up site fails, there is no interruption in service
If the PS fails, there are two possibilities
–If the network is not partitioned
Choose another node in the set as the primary
If checkpointing has been active, only a restart from the previous checkpoint is needed
–If the network is partitioned
Only the partition containing the PS can progress
The other partitions stop updating the data
It is necessary to distinguish between site failures and network partitions
Witnesses
Witness – a small entity that maintains just enough information to identify the replicas that contain the most recent version of the data
–this information could be a timestamp recording the time of the latest update
–the timestamp can be replaced by a version number: an integer incremented each time the data are updated
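A witness can thus be as small as a single counter. The sketch below is a hypothetical illustration of the version-number variant:

```python
class Witness:
    """Stores no data, only enough state to identify up-to-date replicas."""

    def __init__(self):
        self.version = 0               # incremented on each update of the data

    def record_update(self):
        self.version += 1

    def is_current(self, replica_version):
        return replica_version == self.version
```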
Voting Approach
V votes are distributed among n replicas such that (see the check below)
–Vr + Vw > V
–Vw + Vw > V
A read must obtain Vr or more votes
A write must obtain Vw or more votes
Quorum systems are more general than voting
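The first constraint forces every read quorum to intersect every write quorum (so a read always sees the latest write), and the second forces any two write quorums to intersect (so writes are totally ordered). A minimal Python check, with hypothetical names:

```python
def valid_assignment(votes, v_r, v_w):
    """Check Gifford-style voting constraints for a vote assignment."""
    V = sum(votes)                         # total votes across the n replicas
    return (v_r + v_w > V) and (v_w + v_w > V)

def quorum_reached(granted, threshold):
    """A read (write) may proceed once Vr (Vw) votes have been collected."""
    return sum(granted) >= threshold

# Example: 5 replicas with one vote each (V = 5)
print(valid_assignment([1, 1, 1, 1, 1], v_r=2, v_w=4))   # True
print(valid_assignment([1, 1, 1, 1, 1], v_r=2, v_w=3))   # False: v_r + v_w = 5 is not > V
```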
Quorum Systems
–Trees
–Grid-based (array-based)
–Torus
–Hierarchical
–Multi-column
–and so on…
Classification of P2P Storage Systems
Unstructured
–"Replication Strategies for Highly Available Peer-to-peer Storage"
–"Replication Strategies in Unstructured Peer-to-peer Networks"
Structured
–Read-only: CFS, PAST
–Read/write (mutable): LAR, Ivy, Oasis, Om, Eliot, Sigma (a mutual exclusion primitive)
Ivy
Ivy stores a set of logs with the aid of distributed hash tables: it keeps, for each participant, a log storing all of that participant's updates, and maintains data consistency optimistically (in a best-effort manner) by performing conflict resolution among all the logs. The logs must be kept indefinitely, and a participant must scan all the logs related to a file to look up the up-to-date file data. Thus, Ivy is suitable only for small groups of participants.
Eliot
Eliot relies on a reliable, fault-tolerant, immutable P2P storage substrate (Charles) to store data blocks, and uses an auxiliary metadata service (MS) to store mutable metadata. It supports NFS-like consistency semantics, but the traffic between the MS and the client is high under these semantics. It also supports AFS open-close consistency semantics, which, however, may cause lost updates. The MS is provided by a conventional replicated database, which may not fit dynamic P2P environments.
Oasis
Oasis is based on Gifford's weighted-voting quorum concept and allows dynamic quorum membership. It spreads versioned metadata along with data replicas over the P2P network. To complete an operation on a data object, a client must first find metadata related to the object and determine the total number of votes, the votes required for read/write operations, the replica list, and so on, in order to form a quorum. One drawback of Oasis is that if a node happens to use stale metadata, data consistency may be violated.
Om
Om is based on the concepts of automatic replica regeneration and replica membership reconfiguration. Consistency is maintained by two quorum systems: a read-one-write-all quorum system for accessing replicas, and a witness-modeled quorum system for reconfiguration. Om allows replica regeneration from a single replica. However, a write in Om is always first forwarded to the primary copy, which serializes all writes and uses a two-phase procedure to propagate each write to all secondary replicas. The drawbacks of Om are (1) the primary replica may become a bottleneck, (2) the overhead incurred by the two-phase procedure may be too high, and (3) reconfiguration by the witness model has some probability of violating consistency.
Sigma
The Sigma protocol collects states from all replicas to achieve mutual exclusion. The basic idea is as follows. A node u wishing to become the winner of the mutual exclusion sends a timestamped request to each of the n (n = 3k+1) replicas and waits for replies. On receiving a request from u, a node v puts u's request into a local queue ordered by timestamp, takes as the winner the node whose request is at the front of the queue, and replies to u with that winner's ID.
Sigma When the number of replies received by u exceeds m (m=2k+1), u acts according to the following conditions: (1) if more than m replies take v as the winner, then u is the winner. (2) if more than m replies take w (w u) as the winner, then w is the winner and u just keeps waiting. (3) if no node is regarded as the winner by more than m replies, then u sends YIELD message to cancel its request temporarily and then re-inserts its request again. In this manner, one node can eventually be elected as the winner even when communication delay variance is large. A drawback of the Sigma protocol is that a node needs to send requests to all replicas and gets advantaged replies from a large portion ( 2/3) of nodes to be the winner of the mutual exclusion, which will incur large overhead. Moreover, the overhead will even be larger under an environment of high contention.
MUREX comes to the rescue!