1
Storage Management and Caching in PAST, a large-scale, persistent peer-to-peer storage utility. Authors: Antony Rowstron (Microsoft Research), Peter Druschel (Rice Univ). Presented by: Rama Alebouyeh
2
Outline Goals PAST Security Storage management Caching Experimental results Notes
3
Goals: Strong persistence, by providing persistent storage for replicated read-only files. High availability, through replication and caching. Scalability, by obtaining high storage utilization via local cooperation. Security, by using smart cards and store receipts. PAST is an archival storage and content distribution utility. PAST is not a replacement for traditional file systems, but it assumes that traditional file systems could be used as a local cache for PAST.
4
PAST overview: PAST is built on PASTRY. fileId = 160 bits; nodeId = 128 bits. fileIds and nodeIds are uniformly distributed in their respective domains. A fileId is computed as a secure hash (SHA-1) of the file’s name, the owner’s public key, and a salt. PAST stores the file on the k PAST nodes whose nodeIds are numerically closest to the 128 most significant bits (msb) of the fileId.
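The fileId derivation and replica placement described on this slide can be sketched as follows. This is a minimal illustration, not the PAST implementation: the function names, the byte encoding of the hash inputs, and the circular distance over the 128-bit nodeId space are all assumptions made for the example.

```python
import hashlib

NODE_ID_BITS = 128
FILE_ID_BITS = 160


def compute_file_id(name: str, owner_public_key: bytes, salt: bytes) -> int:
    """Return a 160-bit fileId as SHA-1 over name, owner public key, and salt.

    The way the three inputs are concatenated is an assumption.
    """
    h = hashlib.sha1()
    h.update(name.encode("utf-8"))
    h.update(owner_public_key)
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def routing_key(file_id: int) -> int:
    """Return the 128 most significant bits of a 160-bit fileId."""
    return file_id >> (FILE_ID_BITS - NODE_ID_BITS)


def k_closest_nodes(node_ids, key, k):
    """Return the k nodeIds numerically closest to key.

    Distance wraps around the 128-bit space (assumed circular).
    """
    space = 1 << NODE_ID_BITS

    def distance(node_id):
        d = abs(node_id - key)
        return min(d, space - d)

    return sorted(node_ids, key=distance)[:k]
```

A different salt gives a different fileId for the same name and owner, which is what lets a client re-target an insert to another region of the nodeId space.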
5
PAST operations. fileId = Insert(name, owner-credentials, k, file): k is the user-specified number of file replicas; k replicas are maintained over the lifetime of the file. file = Lookup(fileId): the client must provide the fileId; the file is retrieved from the live node closest to the client. Reclaim(fileId, owner-credentials): does not guarantee deletion of all replicas; after a reclaim, Lookup is no longer guaranteed to return the file.
6
PAST operations (2). Insert: A file certificate is issued and signed with the owner’s private key. The file certificate contains the fileId, the SHA-1 hash of the file content, k, the salt, the date, and file metadata. The file and its associated certificate are routed to the node whose nodeId is closest to the 128 msb of the fileId. On success, a store receipt is sent back to the client; otherwise an error is reported to the client.
7
PAST operations (3). Lookup: The client sends a request message with the fileId as the destination. As soon as the request reaches a node that holds the file, that node sends back the file and its certificate and stops forwarding the request. Reclaim: Analogous to Insert; the client issues a reclaim certificate.
8
PAST Security: PAST provides security by: smart cards (node and user); file and reclaim certificates; store and reclaim receipts; a randomized PASTRY routing scheme; routing table entries signed by the associated nodes.
9
Storage management: The goal is to achieve high global storage utilization and graceful degradation as the system reaches its maximum utilization. The responsibilities of storage management are to: balance the remaining free space among nodes as utilization approaches its maximum; maintain the invariant that copies of each file are held by the k nodes with the nodeIds closest to the fileId. It relies on local coordination among nodes.
10
Replica diversion: If a node A cannot store a replica, it chooses a node B in its leaf set and diverts the replica to it. B must not be among the k closest nodes, and B must not already hold a diverted replica of the file. A keeps a pointer to B in its table and issues a store receipt. A also enters a pointer to B on the (k+1)th closest node C. If B fails, a replacement replica is created. If C fails, A installs another pointer on the current (k+1)th closest node.
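The choice of the diversion target B can be sketched as a filter over A's leaf set. This is a hedged illustration only: the `Node` class and `choose_diversion_target` are hypothetical names, the acceptance check uses the file-size to free-space threshold for diverted replicas (with t_div = 0.05, the value used in the experiments), and picking the eligible node with the most free space is an assumption of this sketch rather than something stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    free_space: int
    # fileIds for which this node already holds a diverted replica
    diverted: set = field(default_factory=set)


def choose_diversion_target(leaf_set, k_closest_ids, file_id, file_size, t_div=0.05):
    """Pick a node B from the leaf set that may hold a diverted replica, or None."""
    candidates = [
        n for n in leaf_set
        if n.node_id not in k_closest_ids        # B is not among the k closest
        and file_id not in n.diverted            # B holds no diverted replica of the file
        and n.free_space > 0
        and file_size / n.free_space <= t_div    # B accepts the file as a diverted replica
    ]
    if not candidates:
        return None
    # Assumption: prefer the eligible node with the most remaining free space.
    return max(candidates, key=lambda n: n.free_space)
```

If no eligible node exists, the function returns None, which corresponds to the case where the insert is rejected and the client falls back to file diversion.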
11
File diversion: The goal is to balance the remaining free storage space among different portions of the nodeId space. When a client receives a NACK in response to an Insert operation, it creates another fileId with a different salt and retries the Insert operation, up to three times.
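The client-side retry loop for file diversion might look like the sketch below. The names `insert_with_file_diversion`, `try_insert`, and `new_salt` are hypothetical, and reading "three times" as three retries after the first attempt is an interpretation, so the count is left as a parameter.

```python
import os
from typing import Callable, Optional


def insert_with_file_diversion(
    try_insert: Callable[[bytes], Optional[int]],
    max_retries: int = 3,
    new_salt: Callable[[], bytes] = lambda: os.urandom(8),
) -> Optional[int]:
    """Attempt an insert, retrying with a fresh salt after each NACK.

    try_insert(salt) is assumed to return the fileId on success and None on a NACK.
    """
    for _ in range(1 + max_retries):   # first attempt plus the retries
        salt = new_salt()              # a new salt yields a new fileId
        file_id = try_insert(salt)
        if file_id is not None:
            return file_id
    return None                        # every attempt was rejected
```

Because each salt produces a different fileId, each retry targets a different set of k nodes, which is what spreads load across the nodeId space.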
12
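Storage management policy. File acceptance policy: node N accepts file D if S_D / F_N <= t, where S_D is the size of file D and F_N is the free storage space of node N. The threshold t is t_pri for the k nodes closest to the fileId, and t_div for nodes that are not among the k closest.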
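As a small worked example of the acceptance rule, the sketch below applies the ratio test with the two thresholds. The function names are hypothetical, the default values 0.1 and 0.05 are the ones reported in the experiments, and treating a node with no free space as always rejecting is an assumption.

```python
T_PRI = 0.1    # threshold for the k nodes closest to the fileId
T_DIV = 0.05   # threshold for nodes outside the k closest


def accepts(file_size: float, free_space: float, threshold: float) -> bool:
    """Accept file D on node N if S_D / F_N <= t."""
    if free_space <= 0:          # assumption: a full node rejects everything
        return False
    return file_size / free_space <= threshold


def node_accepts(file_size: float, free_space: float, among_k_closest: bool) -> bool:
    """Apply t_pri to the k closest nodes and t_div to all other nodes."""
    return accepts(file_size, free_space, T_PRI if among_k_closest else T_DIV)
```

For a file of size 8 and a node with 100 units free, the ratio is 0.08: one of the k closest nodes accepts it (0.08 <= 0.1), while a node outside the k closest rejects it (0.08 > 0.05). Because t_div is smaller than t_pri, nodes are more reluctant to take diverted replicas than replicas they are directly responsible for.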
13
Maintaining replicas: Nodes are aware of their neighbors through the periodic keep-alive messages exchanged within the PASTRY leaf set. When a node joins or comes back online, it enters a pointer to the replica of each file it becomes responsible for and gradually transfers the files. Nodes also exchange explicit keep-alive messages with the nodes that hold their diverted replicas. Under high utilization, a node may ask the two most distant nodes in its leaf set to locate a node in their leaf sets that can store the file. Under high utilization, it is possible that the number of replicas drops below k.
14
Caching: The goal is to minimize client access latency, maximize query throughput, and balance the query load in the system. The unused portion of the advertised storage is used as a cache. Cached files can be evicted at any time. A file is cached when it is routed through a node as part of a lookup or insert, provided its size is smaller than a fraction (c) of the node’s current cache size. The cache replacement policy is GreedyDual-Size (GD-S).
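The caching rule and GD-S replacement can be illustrated with the sketch below. It follows the general GreedyDual-Size idea (each file gets a weight of inflation + cost/size, the lowest-weight file is evicted, and the inflation value is raised to the evicted weight). The class name, the fixed `capacity` standing in for the node's current cache size, and a uniform cost of 1 are assumptions made for the example.

```python
class GreedyDualSizeCache:
    def __init__(self, capacity: int, max_fraction: float = 1.0):
        self.capacity = capacity          # stand-in for the current cache size
        self.max_fraction = max_fraction  # the fraction c from the caching rule
        self.used = 0
        self.inflation = 0.0              # GD-S inflation value L
        self.entries = {}                 # file_id -> (size, weight)

    def _weight(self, size: int, cost: float = 1.0) -> float:
        return self.inflation + cost / size

    def lookup(self, file_id) -> bool:
        """On a hit, refresh the file's weight using the current inflation value."""
        if file_id not in self.entries:
            return False
        size, _ = self.entries[file_id]
        self.entries[file_id] = (size, self._weight(size))
        return True

    def insert(self, file_id, size: int) -> bool:
        """Cache a file routed through the node if it is small enough."""
        if self.lookup(file_id):
            return True
        if size <= 0 or size >= self.max_fraction * self.capacity:
            return False                  # too large relative to the cache
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda f: self.entries[f][1])
            victim_size, victim_weight = self.entries.pop(victim)
            self.used -= victim_size
            self.inflation = victim_weight  # age the remaining entries
        self.entries[file_id] = (size, self._weight(size))
        self.used += size
        return True
```

With a uniform cost, large files have the lowest weight and are evicted first, so the cache favors keeping many small files unless a large file is hit again after the inflation value has risen.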
15
Experimental Results: Two data sets: one from 8 web proxy logs, another from a file system. k=5, b=4 (PASTRY), N=2250. First experiment with no diversion: t_pri=1, t_div=0. 51.1% of file insertions failed. Global storage utilization was only 60.8%. The results demonstrate the need for storage management in a system like PAST.
16
Experimental Results: t_pri=0.1, t_div=0.05, l=16 or 32. l=16: utilization > 94%. l=32: utilization > 98%. A larger leaf set increases the scope for load balancing. A larger l increases the cost of node arrivals/departures.
17
Experimental Results: Varying t_pri. The lower the value of t_pri, the less likely it is that a large file can be stored on a node. Many small files can be stored, so the number of files stored increases as t_pri decreases. Utilization drops because large files are rejected at low utilization levels.
18
Experimental Results: Varying t_div. As t_div is increased, fewer file insertions succeed but storage utilization is higher.
19
Impact of File and Replica Diversion: File diversion is negligible if storage utilization is below 83%. The number of diverted replicas remains small even at high utilization: 10% at 80% utilization.
20
Impact of caching
21
Notes: Key lookup and directory search are needed. The immutable file property and the lack of directory search limit the applications of PAST. The effect of file reclaim on performance is not measured.