Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea.


1 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea

2 What is PAST? Archival storage and content distribution utility Not a general-purpose file system Stores multiple replicas of files Caches additional copies of popular files in the local file system

3 How it works Built over a self-organizing, Internet-based overlay network Based on the Pastry routing scheme Offers persistent storage services for replicated read-only files Owners can insert/reclaim files Clients just look files up

4 PAST Nodes The collection of PAST nodes forms an overlay network Minimally, a PAST node is an access point Optionally, it contributes storage and participates in routing

5 PAST operations fileId = Insert(name, owner-credentials, k, file); file = Lookup(fileId); Reclaim(fileId, owner-credentials);
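The three operations above can be sketched as a minimal in-memory mock. The class and method names are illustrative assumptions; the real system routes each call through Pastry and stores k replicas, which is collapsed here into a single dict (k and the salt are ignored in this sketch).

```python
import hashlib

# Minimal in-memory mock of the PAST API; names are illustrative.
class PastNode:
    def __init__(self):
        self._store = {}  # fileId -> (owner-credentials, file contents)

    def insert(self, name: str, owner: str, k: int, data: bytes) -> str:
        file_id = hashlib.sha1((name + owner).encode()).hexdigest()
        self._store[file_id] = (owner, data)
        return file_id

    def lookup(self, file_id: str):
        entry = self._store.get(file_id)
        return entry[1] if entry else None

    def reclaim(self, file_id: str, owner: str) -> None:
        # Weak consistency: after reclaim, a lookup may fail but is not
        # guaranteed to fail immediately; the mock simply deletes the entry.
        entry = self._store.get(file_id)
        if entry and entry[0] == owner:
            del self._store[file_id]
```

Note that reclaim checks the owner credentials, matching the slide's signature: only the owner who inserted a file can reclaim it.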

6 Insertion fileId computed as the secure hash of the file name, the owner’s public key, and a salt Stores the file on the k nodes whose nodeIds are numerically closest to the 128 msb of the fileId Remember from Pastry: each node has a 128-bit nodeId (circular namespace)
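A sketch of the fileId computation described above, assuming SHA-1 as the secure hash (fileIds are 160 bits; the 128 most significant bits are compared against 128-bit Pastry nodeIds). The function names are hypothetical:

```python
import hashlib

# fileId = secure hash of (name, owner's public key, salt); SHA-1 assumed.
def compute_file_id(name: str, owner_pubkey: bytes, salt: bytes) -> int:
    digest = hashlib.sha1(name.encode() + owner_pubkey + salt).digest()
    return int.from_bytes(digest, "big")  # 160-bit fileId

def msb128(file_id: int) -> int:
    # Keep only the 128 most significant bits of the 160-bit fileId,
    # the part that is matched against 128-bit nodeIds.
    return file_id >> 32
```

The salt lets a client derive a fresh fileId for the same file name, which matters later when an insert is rejected and must be retried.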

7 Insert contd The required storage is debited against the owner’s storage quota A file certificate is returned Signed with the owner’s private key Contains: fileId, hash of content, replication factor, and others The file & certificate are routed via Pastry Each of the k replica-storing nodes attaches a store receipt An ack is sent back after all k nodes have accepted the file

8 Lookup & Reclaim Lookup: Pastry locates a “near” node that has a copy and retrieves it Reclaim: weak consistency After it, a lookup is no longer guaranteed to retrieve the file But it does not guarantee that the file is no longer available

9 Security Each PAST node and each user of the system holds a smartcard A private/public key pair is associated with each card Smartcards generate and verify certificates and maintain storage quotas

10 More on Security Smartcards ensure the integrity of nodeId and fileId assignments Store receipts prevent malicious nodes from creating fewer than k copies File certificates allow storage nodes and clients to verify the integrity and authenticity of stored content, and to enforce storage quotas

11 Storage Management Based on local coordination among nodes with nearby nodeIds Responsibilities: Balance the free storage among nodes Maintain the invariant that the replicas of each file are stored on the k nodes closest to its fileId

12 Causes for storage imbalance & solutions The number of files assigned to each node may vary The size of the inserted files may vary The storage capacity of PAST nodes differs Solutions Replica diversion File diversion

13 Replica diversion Recall: each node maintains a leaf set, the l nodes with nodeIds numerically closest to its own If a node A cannot accommodate a copy locally, it considers replica diversion A chooses a node B in its leaf set and asks it to store the replica A then enters a pointer to B’s copy in its table and issues a store receipt

14 Policies for accepting a replica If (file size / remaining free storage) > t, reject t is a fixed threshold t has different values for primary replicas (nodes among the k numerically closest) and diverted replicas (nodes in the same leaf set, but not among the k closest) t(primary) > t(diverted)
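The acceptance policy above can be sketched as follows. The concrete threshold values are illustrative assumptions; the design only requires t(primary) > t(diverted):

```python
# Illustrative thresholds; only t_primary > t_diverted is required.
T_PRIMARY = 0.1    # for primary replicas (nodes among the k numerically closest)
T_DIVERTED = 0.05  # for diverted replicas (leaf-set nodes outside the k closest)

def accepts(file_size: int, free_space: int, primary: bool) -> bool:
    # Reject when the file would consume too large a fraction of the
    # node's remaining free storage.
    if free_space <= 0:
        return False
    t = T_PRIMARY if primary else T_DIVERTED
    return file_size / free_space <= t
```

The stricter threshold for diverted replicas biases nodes toward storing files they are primarily responsible for, keeping diversion a fallback rather than the common case.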

15 File diversion When one of the k nodes declines to store a replica → try replica diversion If the node chosen for the diverted replica also declines → the entire file is diverted A negative ack is sent; the client generates another fileId and starts again After three rejections the user is notified of the failure

16 Maintaining replicas Pastry uses keep-alive messages and adjusts the leaf set after failures The same adjustment takes place at join What happens to the copies stored by a failed node? What about the copies stored by a node that leaves or enters a new leaf set?

17 Maintaining replicas contd To maintain the invariant (k copies) → the replicas have to be re-created in the previous cases Big overhead Proposed solution for join: lazy re-creation First insert a pointer to the node that holds them, then migrate them gradually

18 Caching The k replicas are maintained in PAST for availability The fetch distance is measured in overlay-network hops (which does not necessarily reflect distance in the underlying network) Caching is used to improve performance

19 Caching contd PAST nodes use the “unused” portion of their advertised disk space to cache files When storing a new primary or diverted replica, a node evicts one or more cached copies How it works: a file routed through a node by Pastry (insert or lookup) is inserted into the local cache if its size < c c is a fraction of the current cache size
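A minimal sketch of this cache admission rule, with an illustrative value for c and simple oldest-first eviction (the paper's actual replacement policy is GreedyDual-Size):

```python
# Cache admission sketch: admit only files that are small relative to
# the cache; evict oldest-first (a simplification of GreedyDual-Size).
C_FRACTION = 0.125  # illustrative: cache a file only if size < c * capacity

class NodeCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.files = {}  # fileId -> size, in insertion order
        self.used = 0

    def maybe_cache(self, file_id: str, size: int) -> bool:
        if size >= C_FRACTION * self.capacity:
            return False  # too large relative to the cache
        # Evict cached copies until the new file fits; cached copies are
        # expendable because the k primary replicas still exist elsewhere.
        while self.used + size > self.capacity and self.files:
            old_id, old_size = next(iter(self.files.items()))
            del self.files[old_id]
            self.used -= old_size
        self.files[file_id] = size
        self.used += size
        return True
```

Because cached copies are redundant with the k replicas, evicting them (or losing them to a node failure) never threatens availability, only performance.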

20 Conclusions Together with Tapestry, Chord (CFS), and CAN, Pastry/PAST represents a family of peer-to-peer routing and location schemes for storage The ideas are broadly similar in all of them Questions raised at SOSP about them: Is there any real application for them? Who will trust these infrastructures to store their files?

