Presentation is loading. Please wait.

Presentation is loading. Please wait.

Scalable peer-to-peer substrates: A new foundation for distributed applications? Peter Druschel, Rice University Antony Rowstron, Microsoft Research Cambridge,

Similar presentations


Presentation on theme: "Scalable peer-to-peer substrates: A new foundation for distributed applications? Peter Druschel, Rice University Antony Rowstron, Microsoft Research Cambridge,"— Presentation transcript:

1 Scalable peer-to-peer substrates: A new foundation for distributed applications? Peter Druschel, Rice University Antony Rowstron, Microsoft Research Cambridge, UK Collaborators: Miguel Castro, Anne-Marie Kermarrec, MSR Cambridge Y. Charlie Hu, Sitaram Iyer, Animesh Nandi, Atul Singh, Dan Wallach, Rice University

2 Outline Background Pastry Pastry proximity routing PAST SCRIBE Conclusions

3 Background Peer-to-peer systems distribution decentralized control self-organization symmetry (communication, node roles)

4 Peer-to-peer applications Pioneers: Napster, Gnutella, FreeNet File sharing: CFS, PAST [SOSP’01] Network storage: FarSite [Sigmetrics’00], Oceanstore [ASPLOS’00], PAST [SOSP’01] Web caching: Squirrel[PODC’02] Event notification/multicast: Herald [HotOS’01], Bayeux [NOSDAV’01], CAN-multicast [NGC’01], SCRIBE [NGC’01], SplitStream [submitted] Anonymity: Crowds [CACM’99], Onion routing [JSAC’98] Censorship-resistance: Tangler [CCS’02]

5 Common issues Organize, maintain overlay network –node arrivals –node failures Resource allocation/load balancing Resource location Network proximity routing Idea: provide a generic p2p substrate

6 Architecture TCP/IP Pastry Network storage Event notification Internet P2p substrate (self-organizing overlay network) P2p application layer ?

7 Structured p2p overlays One primitive: route(M, X): route message M to the live node with nodeId closest to key X nodeIds and keys are from a large, sparse id space

8 Distributed Hash Tables (DHT) k6,v6 k1,v1 k5,v5 k2,v2 k4,v4 k3,v3 nodes Operations: insert(k,v) lookup(k) P2P overlay network p2p overlay maps keys to nodes completely decentralized and self-organizing robust, scalable

9 Why structured p2p overlays? Leverage pooled resources (storage, bandwidth, CPU) Leverage resource diversity (geographic, ownership) Leverage existing shared infrastructure Scalability Robustness Self-organization

10 Outline Background Pastry Pastry proximity routing PAST SCRIBE Conclusions

11 Pastry Generic p2p location and routing substrate Self-organizing overlay network Lookup/insert object in < log 16 N routing steps (expected) O(log N) per-node state Network proximity routing

12 Pastry: Related work Chord [Sigcomm’01] CAN [Sigcomm’01] Tapestry [TR UCB/CSD-01-1141] PNRP [unpub.] Viceroy [PODC’02] Kademlia [IPTPS’02] Small World [Kleinberg ’99, ‘00] Plaxton Trees [Plaxton et al. ’97]

13 Pastry: Object distribution objId Consistent hashing [Karger et al. ‘97] 128 bit circular id space nodeIds (uniform random) objIds (uniform random) Invariant: node with numerically closest nodeId maintains object nodeIds O 2 128 -1

14 Pastry: Object insertion/lookup X Route(X) Msg with key X is routed to live node with nodeId closest to X Problem: complete routing table not feasible O 2 128 -1

15 Pastry: Routing Tradeoff O(log N) routing table size O(log N) message forwarding steps

16 Pastry: Routing table (# 65a1fcx) log 16 N rows Row 0 Row 1 Row 2 Row 3

17 Pastry: Routing Properties log 16 N steps O(log N) state d46a1c Route(d46a1c) d462ba d4213f d13da3 65a1fc d467c4 d471f1

18 Pastry: Leaf sets Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively. routing efficiency/robustness fault detection (keep-alive) application-specific local coordination

19 Pastry: Routing procedure if (destination is within range of our leaf set) forward to numerically closest member else let l = length of shared prefix let d = value of l-th digit in D’s address if ( R l d exists) forward to R l d else forward to a known node that (a) shares at least as long a prefix (b) is numerically closer than this node

20 Pastry: Performance Integrity of overlay/ message delivery: guaranteed unless L/2 simultaneous failures of nodes with adjacent nodeIds Number of routing hops: No failures: < log 16 N expected, 128/b + 1 max During failure recovery: –O(N) worst case, average case much better

21 Pastry: Self-organization Initializing and maintaining routing tables and leaf sets Node addition Node departure (failure)

22 Pastry: Node addition d46a1c Route(d46a1c) d462ba d4213f d13da3 65a1fc d467c4 d471f1 New node: d46a1c

23 Node departure (failure) Leaf set members exchange keep-alive messages Leaf set repair (eager): request set from farthest live node in set Routing table repair (lazy): get table from peers in the same row, then higher rows

24 Pastry: Experimental results Prototype implemented in Java emulated network deployed testbed (currently ~25 sites worldwide)

25 Pastry: Average # of hops L=16, 100k random queries

26 Pastry: # of hops (100k nodes) L=16, 100k random queries

27 Pastry: # routing hops (failures) L=16, 100k random queries, 5k nodes, 500 failures 2.73 2.96 2.74 2.6 2.65 2.7 2.75 2.8 2.85 2.9 2.95 3 No FailureFailureAfter routing table repair Average hops per lookup

28 Outline Background Pastry Pastry proximity routing PAST SCRIBE Conclusions

29 Pastry: Proximity routing Assumption: scalar proximity metric e.g. ping delay, # IP hops a node can probe distance to any other node Proximity invariant: Each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix. Locality-related route qualities: Distance traveled Likelihood of locating the nearest replica

30 Pastry: Routes in proximity space d46a1c Route(d46a1c) d462ba d4213f d13da3 65a1fc d467c4 d471f1 NodeId space d467c4 65a1fc d13da3 d4213f d462ba Proximity space

31 Pastry: Distance traveled L=16, 100k random queries, Euclidean proximity space

32 Pastry: Locality properties 1) Expected distance traveled by a message in the proximity space is within a small constant of the minimum 2) Routes of messages sent by nearby nodes with same keys converge at a node near the source nodes 3) Among k nodes with nodeIds closest to the key, message likely to reach the node closest to the source node first

33 d467c4 65a1fc d13da3 d4213f d462ba Proximity space Pastry: Node addition New node: d46a1c d46a1c Route(d46a1c) d462ba d4213f d13da3 65a1fc d467c4 d471f1 NodeId space

34 Pastry delay vs IP delay GATech top.,.5M hosts, 60K nodes, 20K random messages

35 Pastry: API route(M, X): route message M to node with nodeId numerically closest to X deliver(M): deliver message M to application forwarding(M, X): message M is being forwarded towards key X newLeaf(L): report change in leaf set L to application

36 Pastry: Security Secure nodeId assignment Secure node join protocols Randomized routing Byzantine fault-tolerant leaf set membership protocol

37 Pastry: Summary Generic p2p overlay network Scalable, fault resilient, self-organizing, secure O(log N) routing steps (expected) O(log N) routing table size Network proximity routing

38 Outline Background Pastry Pastry proximity routing PAST SCRIBE Conclusions

39 PAST: Cooperative, archival file storage and distribution Layered on top of Pastry Strong persistence High availability Scalability Reduced cost (no backup) Efficient use of pooled resources

40 PAST API Insert - store replica of a file at k diverse storage nodes Lookup - retrieve file from a nearby live storage node that holds a copy Reclaim - free storage associated with a file Files are immutable

41 PAST: File storage fileId Insert fileId

42 PAST: File storage Storage Invariant: File “replicas” are stored on k nodes with nodeIds closest to fileId (k is bounded by the leaf set size) fileId Insert fileId k=4

43 PAST: File Retrieval fileId file located in log 16 N steps (expected) usually locates replica nearest client C Lookup k replicas C

44 PAST: Exploiting Pastry Random, uniformly distributed nodeIds –replicas stored on diverse nodes Uniformly distributed fileIds –e.g. SHA-1(filename,public key, salt) –approximate load balance Pastry routes to closest live nodeId –availability, fault-tolerance

45 PAST: Storage management Maintain storage invariant Balance free space when global utilization is high –statistical variation in assignment of files to nodes (fileId/nodeId) –file size variations –node storage capacity variations Local coordination only (leaf sets)

46 Experimental setup Web proxy traces from NLANR –18.7 Gbytes, 10.5K mean, 1.4K median, 0 min, 138MB max Filesystem –166.6 Gbytes. 88K mean, 4.5K median, 0 min, 2.7 GB max 2250 PAST nodes (k = 5) –truncated normal distributions of node storage sizes, mean = 27/270 MB

47 Need for storage management No diversion (t pri = 1, t div = 0): –max utilization 60.8% –51.1% inserts failed Replica/file diversion (t pri =.1, t div =.05): –max utilization > 98% –< 1% inserts failed

48 PAST: File insertion failures

49 PAST: Caching Nodes cache files in the unused portion of their allocated disk space Files caches on nodes along the route of lookup and insert messages Goals: maximize query xput for popular documents balance query load improve client latency

50 PAST: Caching fileId Lookup topicId

51 PAST: Caching

52 PAST: Security No read access control; users may encrypt content for privacy File authenticity: file certificates System integrity: nodeIds, fileIds non- forgeable, sensitive messages signed Routing randomized

53 PAST: Storage quotas Balance storage supply and demand user holds smartcard issued by brokers –hides user private key, usage quota –debits quota upon issuing file certificate storage nodes hold smartcards –advertise supply quota –storage nodes subject to random audits within leaf sets

54 PAST: Related Work CFS [SOSP’01] OceanStore [ASPLOS 2000] FarSite [Sigmetrics 2000]

55 Outline Background Pastry Pastry locality properties PAST SCRIBE Conclusions

56 SCRIBE: Large-scale, decentralized multicast Infrastructure to support topic-based publish-subscribe applications Scalable: large numbers of topics, subscribers, wide range of subscribers/topic Efficient: low delay, low link stress, low node overhead

57 SCRIBE: Large scale multicast topicId Subscribe topicId Publish topicId

58 Scribe: Results Simulation results Comparison with IP multicast: delay, node stress and link stress Experimental setup –Georgia Tech Transit-Stub model –100,000 nodes randomly selected out of.5M –Zipf-like subscription distribution, 1500 topics

59 Scribe: Topic popularity gsize(r) = floor(Nr -1.25 + 0.5); N=100,000; 1500 topics

60 Scribe: Delay penalty Relative delay penalty, average and maximum

61 Scribe: Node stress

62 Scribe: Link stress One message published in each of the 1,500 topics

63 Related works Narada Bayeux/Tapestry CAN-Multicast

64 Summary Self-configuring P2P framework for topic- based publish-subscribe Scribe achieves reasonable performance when compared to IP multicast –Scales to a large number of subscribers –Scales to a large number of topics –Good distribution of load

65 Status Functional prototypes Pastry [Middleware 2001] PAST [HotOS-VIII, SOSP’01] SCRIBE [NGC 2001, IEEE JSAC] SplitStream [submitted] Squirrel [PODC’02] http://www.cs.rice.edu/CS/Systems/Pastry

66 Current Work Security –secure routing/overlay maintenance/nodeId assignment –quota system Keyword search capabilities Support for mutable files in PAST Anonymity/Anti-censorship New applications Free software releases

67 Conclusion For more information http://www.cs.rice.edu/CS/Systems/Pastry


Download ppt "Scalable peer-to-peer substrates: A new foundation for distributed applications? Peter Druschel, Rice University Antony Rowstron, Microsoft Research Cambridge,"

Similar presentations


Ads by Google