Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. Peter Druschel (Rice University), Antony Rowstron (Microsoft Research Cambridge, UK). Presentation transcript:

1 Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems Peter Druschel, Rice University Antony Rowstron, Microsoft Research Cambridge, UK Modified and Presented by: JinYoung You 1

2 Outline Background Pastry Pastry proximity routing Related Work Conclusions 2

3 Background Peer-to-peer systems distribution decentralized control self-organization symmetry (communication, node roles) 3

4 Common issues Organize, maintain overlay network – node arrivals – node failures Resource allocation/load balancing Resource location Network proximity routing Idea: provide a generic p2p substrate 4

5 Architecture (layer diagram): p2p application layer (network storage, event notification, ...) on top of the p2p substrate (Pastry, a self-organizing overlay network), which runs over TCP/IP on the Internet. 5

6 Structured p2p overlays One primitive: route(M, X) routes message M to the live node with nodeID closest to key X. nodeIDs and keys are drawn from a large, sparse id space. 6

7 Distributed Hash Tables (DHT) Operations: insert(k,v), lookup(k). The p2p overlay network maps keys (k1,v1 ... k6,v6 in the figure) to nodes; completely decentralized and self-organizing; robust, scalable. 7
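As a rough illustration of how this slide's operations sit on the single routing primitive, here is a minimal Java sketch (class and method names are invented for this transcript, not taken from the paper): the overlay delivers each message to the node whose nodeID is numerically closest to the key, and that node stores or serves the value.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

abstract class DhtNode {
    // Supplied by the overlay: route message m to the live node closest to key.
    abstract void route(Message m, BigInteger key);

    record Message(String op, BigInteger key, byte[] value) {}

    private final Map<BigInteger, byte[]> store = new HashMap<>();  // local shard

    void insert(BigInteger key, byte[] value) { route(new Message("INSERT", key, value), key); }
    void lookup(BigInteger key)               { route(new Message("LOOKUP", key, null), key); }

    // Invoked when a routed message arrives at the responsible node.
    void deliver(Message m) {
        if (m.op().equals("INSERT")) store.put(m.key(), m.value());
        // A LOOKUP would read store.get(m.key()) and reply to the requester (omitted).
    }
}
```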

8 Why structured p2p overlays? Leverage pooled resources (storage, bandwidth, CPU) Leverage resource diversity (geographic, ownership) Leverage existing shared infrastructure Scalability Robustness Self-organization 8

9 Outline Background Pastry Pastry proximity routing Related Work Conclusions 9

10 Pastry Generic p2p location and routing substrate Self-organizing overlay network Lookup/insert object in < log_16 N routing steps (expected) O(log N) per-node state Network proximity routing 10

11 Pastry: Object distribution Consistent hashing [Karger et al. '97]: 128-bit circular id space (0 to 2^128 - 1); nodeIDs (uniform random); objIDs (uniform random). Invariant: the node with the numerically closest nodeID maintains the object. 11
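A small Java sketch of the id-space arithmetic on this slide. The slide only states that nodeIDs and objIDs are uniformly distributed on a 128-bit circle and that the numerically closest node is responsible; the SHA-1-folded-to-128-bits choice and the helper names below are assumptions for illustration.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.List;

class IdSpace {
    static final BigInteger MOD = BigInteger.ONE.shiftLeft(128);   // 2^128

    // Uniformly distributed id from arbitrary data (hash, folded into 128 bits).
    static BigInteger id(byte[] data) throws Exception {
        byte[] h = MessageDigest.getInstance("SHA-1").digest(data);
        return new BigInteger(1, h).mod(MOD);
    }

    // Circular distance between two ids on the ring.
    static BigInteger dist(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).mod(MOD);
        return d.min(MOD.subtract(d));
    }

    // The node responsible for objId: the numerically closest nodeID.
    static BigInteger closest(List<BigInteger> nodeIds, BigInteger objId) {
        return nodeIds.stream()
                .min((x, y) -> dist(x, objId).compareTo(dist(y, objId)))
                .orElseThrow();
    }
}
```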

12 Pastry: Object insertion/lookup A message with key X is routed (Route(X)) to the live node with nodeID closest to X on the circular id space (0 to 2^128 - 1). Problem: a complete routing table is not feasible. 12

13 Pastry: Routing Tradeoff O(log N) routing table size O(log N) message forwarding steps 13

14 Pastry: Routing table (of node 65a1fcx): log_16 N populated rows (the figure shows Row 0 through Row 3), one column per hexadecimal digit. 14
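The routing table's shape can be sketched as a small data structure (illustrative only; the field names and the fixed 32-row bound are assumptions based on 128-bit ids and base-16 digits):

```java
class RoutingTable {
    static final int BASE = 16;            // b = 4 bits per digit, hexadecimal
    static final int ROWS = 128 / 4;       // 32 rows for 128-bit ids; only ~log_16 N are populated

    final String localId;                  // local nodeID in hex, e.g. "65a1fc..."
    // entries[l][d]: a node whose id shares the first l digits with localId
    // and has digit value d in position l (null if no such node is known).
    final String[][] entries = new String[ROWS][BASE];

    RoutingTable(String localId) { this.localId = localId; }

    String entry(int row, int digit) { return entries[row][digit]; }
}
```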

15 Pastry: Routing Properties: log_16 N steps, O(log N) state. (Figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f, d462ba to d467c4; d471f1 also shown.) 15

16 Pastry: Leaf sets Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIDs, respectively. routing efficiency/robustness fault detection (keep-alive) application-specific local coordination 16
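A structural sketch of a leaf set in Java (illustrative; maintenance of the set on joins and failures is omitted, and wrap-around of the circular id space is ignored for brevity):

```java
import java.math.BigInteger;
import java.util.TreeSet;

class LeafSet {
    final int half;                                   // L/2 entries on each side
    final BigInteger localId;
    final TreeSet<BigInteger> smaller = new TreeSet<>();   // closest smaller nodeIDs
    final TreeSet<BigInteger> larger  = new TreeSet<>();   // closest larger nodeIDs

    LeafSet(BigInteger localId, int l) { this.localId = localId; this.half = l / 2; }

    // Is the key covered by the leaf set, i.e. between its extreme members?
    boolean covers(BigInteger key) {
        if (smaller.isEmpty() || larger.isEmpty()) return true;   // tiny network
        return key.compareTo(smaller.first()) >= 0 && key.compareTo(larger.last()) <= 0;
    }
}
```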

17 Pastry: Routing procedure (for a message with destination key D)
if (D is within the range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared by D and the local nodeID
    let d = value of the l-th digit of D
    if (routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that (a) shares at least as long a prefix with D and (b) is numerically closer to D than this node
17
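The same procedure as a compact Java sketch; the leaf-set, routing-table, and fallback helpers are left abstract, and all names are illustrative rather than the prototype's.

```java
import java.math.BigInteger;

abstract class Router {
    abstract boolean leafSetCovers(BigInteger key);          // destination within leaf-set range?
    abstract String closestLeaf(BigInteger key);             // numerically closest leaf member
    abstract String tableEntry(int row, int digit);          // R[l][d], or null if empty
    abstract String fallback(int sharedLen, BigInteger key); // rare case (a)+(b) from the slide

    final String localId;                                    // local nodeID as a hex string
    Router(String localId) { this.localId = localId; }

    String nextHop(String keyHex) {
        BigInteger key = new BigInteger(keyHex, 16);
        if (leafSetCovers(key)) return closestLeaf(key);
        // Here the key differs from localId, so l < keyHex.length().
        int l = sharedPrefixLen(localId, keyHex);            // length of shared prefix
        int d = Character.digit(keyHex.charAt(l), 16);       // l-th digit of the key
        String entry = tableEntry(l, d);
        return (entry != null) ? entry : fallback(l, key);
    }

    static int sharedPrefixLen(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```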

18 Pastry: Performance Integrity of overlay/message delivery: guaranteed unless L/2 nodes with adjacent nodeIDs fail simultaneously Number of routing hops: No failures: < log_16 N expected, 128/b + 1 max (with b = 4 bits per digit here, hence base 16) During failure recovery: – O(N) worst case, average case much better 18

19 Pastry: Self-organization Initializing and maintaining routing tables and leaf sets Node addition Node departure (failure) 19

20 Pastry: Node addition New node: d46a1c. (Figure: the join message Route(d46a1c) travels from 65a1fc via d13da3, d4213f, d462ba to d467c4; d471f1 also shown.) 20

21 Node departure (failure) Leaf set members exchange keep-alive messages Leaf set repair (eager): request set from farthest live node in set Routing table repair (lazy): get table from peers in the same row, then higher rows 21

22 Pastry: Experimental results Prototype implemented in Java; evaluated on an emulated network and on a deployed testbed (currently ~25 sites worldwide) 22

23 Pastry: Average # of hops L=16, 100k random queries 23

24 Pastry: # of hops (100k nodes) L=16, 100k random queries 24

25 Pastry: # routing hops (failures) L=16, 100k random queries, 5k nodes, 500 failures. Average hops per lookup: 2.73 (no failure), 2.96 (failure), 2.74 (after routing table repair). 25

26 Outline Background Pastry Pastry proximity routing Related Work Conclusions 26

27 Pastry: Proximity routing Assumption: a scalar proximity metric (e.g. ping delay, # IP hops); a node can probe its distance to any other node. Proximity invariant: each routing table entry refers to a node that is close to the local node (in the proximity space), among all nodes with the appropriate nodeID prefix. 27
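A sketch of how the proximity invariant can be maintained (an illustration, not the paper's exact heuristic): whenever a node learns of another node that fits a routing-table slot, it keeps whichever candidate is closer under the proximity metric.

```java
class ProximitySlots {
    static final int ROWS = 32, BASE = 16;               // 128-bit ids, base-16 digits
    final String[][] entry = new String[ROWS][BASE];     // chosen nodeID per slot
    final double[][] proximity = new double[ROWS][BASE]; // its measured distance (e.g. ping delay)

    // Called whenever we learn of a node that qualifies for slot (l, d),
    // e.g. during joins or periodic routing-table exchanges.
    void offer(int l, int d, String nodeId, double dist) {
        if (entry[l][d] == null || dist < proximity[l][d]) {
            entry[l][d] = nodeId;        // keep the proximity-closest qualifying node
            proximity[l][d] = dist;
        }
    }
}
```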

28 Pastry: Routes in proximity space (Figure: the route Route(d46a1c) from 65a1fc via d13da3, d4213f, d462ba to d467c4, shown both in nodeID space and in proximity space.) 28

29 Pastry: Distance traveled L=16, 100k random queries, Euclidean proximity space 29

30 Pastry: Locality properties 1) Expected distance traveled by a message in the proximity space is within a small constant of the minimum. 2) Routes of messages sent by nearby nodes with the same key converge at a node near the source nodes. 3) Among the k nodes with nodeIDs closest to the key, the message is likely to reach the node closest to the source node first. 30

31 Pastry: Node addition New node: d46a1c. (Figure: the join route Route(d46a1c) from 65a1fc via d13da3, d4213f, d462ba to d467c4, shown in both nodeID space and proximity space; d471f1 also shown.) 31

32 Pastry delay Georgia Tech topology, 0.5M hosts, 60K Pastry nodes, 20K random messages 32

33 Pastry: API route(M, X): route message M to node with nodeID numerically closest to X deliver(M): deliver message M to application forwarding(M, X): message M is being forwarded towards key X newLeaf(L): report change in leaf set L to application 33
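The API above rendered as Java interfaces (signatures paraphrased; the register hook and the byte-array message type are assumptions, not the prototype's exact types):

```java
import java.math.BigInteger;

// Callbacks implemented by the application on top of Pastry.
interface PastryApp {
    void deliver(byte[] msg);                        // M arrived at the responsible node
    void forwarding(byte[] msg, BigInteger key);     // M is being forwarded towards key X
    void newLeaf(BigInteger[] leafSet);              // leaf set changed
}

// The substrate's side of the contract.
interface PastrySubstrate {
    void route(byte[] msg, BigInteger key);          // route M to the node closest to X
    void register(PastryApp app);                    // hook up the callbacks above
}
```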

34 Pastry: Security Secure nodeID assignment Secure node join protocols Randomized routing Byzantine fault-tolerant leaf set membership protocol 34

35 Outline Background Pastry Pastry proximity routing Related Work Conclusions 35

36 Pastry: Related work Chord [Sigcomm’01] CAN [Sigcomm’01] Tapestry [TR UCB/CSD-01-1141] PAST [SOSP’01] SCRIBE [NGC’01] 36

37 Outline Background Pastry Pastry proximity routing Related Work Conclusions 37

38 Conclusions Generic p2p overlay network Scalable, fault resilient, self-organizing, secure O(log N) routing steps (expected) O(log N) routing table size Network proximity routing For more information http://www.cs.rice.edu/CS/Systems/Pastry 38

39 Thanks Any Questions? 39

42 Outline PAST SCRIBE 42

43 PAST: Cooperative, archival file storage and distribution Layered on top of Pastry Strong persistence High availability Scalability Reduced cost (no backup) Efficient use of pooled resources 43

44 PAST API Insert - store replica of a file at k diverse storage nodes Lookup - retrieve file from a nearby live storage node that holds a copy Reclaim - free storage associated with a file Files are immutable 44
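The PAST operations as a Java interface sketch (types are illustrative; the real API also carries client credentials and file certificates, omitted here):

```java
import java.math.BigInteger;

interface Past {
    // Store replicas of an immutable file at k diverse storage nodes; returns its fileID.
    BigInteger insert(String filename, byte[] contents, int k);

    // Retrieve the file from a nearby live storage node that holds a copy.
    byte[] lookup(BigInteger fileId);

    // Free the storage associated with the file (files are immutable, so no update).
    void reclaim(BigInteger fileId);
}
```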

45 PAST: File storage (Figure: a file is inserted by routing an Insert message with its fileID as the key.) 45

46 PAST: File storage Storage invariant: file "replicas" are stored on the k nodes with nodeIDs closest to fileID (k is bounded by the leaf set size). (Figure: Insert fileID with k=4.) 46

47 PAST: File retrieval File located in log_16 N steps (expected); usually locates the replica nearest the client C. (Figure: client C issues Lookup with fileID; k replicas exist.) 47

48 PAST: Exploiting Pastry Random, uniformly distributed nodeIDs – replicas stored on diverse nodes Uniformly distributed fileIDs – e.g. SHA-1(filename,public key, salt) – approximate load balance Pastry routes to closest live nodeID – availability, fault-tolerance 48
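A sketch of the fileID construction mentioned on this slide: SHA-1 over the filename, the owner's public key, and a salt, giving uniformly distributed ids. The exact encoding and any truncation into the 128-bit id space are assumptions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class FileIds {
    static BigInteger fileId(String filename, byte[] ownerPublicKey, byte[] salt)
            throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(filename.getBytes(StandardCharsets.UTF_8));
        sha1.update(ownerPublicKey);
        sha1.update(salt);
        // 160-bit digest; map into the overlay's id space as needed.
        return new BigInteger(1, sha1.digest());
    }
}
```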

49 PAST: Storage management Maintain storage invariant Balance free space when global utilization is high – statistical variation in assignment of files to nodes (fileID/nodeID) – file size variations – node storage capacity variations Local coordination only (leaf sets) 49

50 Experimental setup Web proxy traces from NLANR – 18.7 GB total; file sizes: mean 10.5 KB, median 1.4 KB, min 0, max 138 MB Filesystem – 166.6 GB total; file sizes: mean 88 KB, median 4.5 KB, min 0, max 2.7 GB 2250 PAST nodes (k = 5) – truncated normal distributions of node storage sizes, mean = 27/270 MB 50

51 Need for storage management No diversion (t_pri = 1, t_div = 0): – max utilization 60.8% – 51.1% of inserts failed Replica/file diversion (t_pri = 0.1, t_div = 0.05): – max utilization > 98% – < 1% of inserts failed 51

52 PAST: File insertion failures 52

53 PAST: Caching Nodes cache files in the unused portion of their allocated disk space Files are cached on nodes along the routes of lookup and insert messages Goals: maximize query throughput for popular documents balance query load improve client latency 53

54 PAST: Caching (Figure: a Lookup for a fileID; copies are cached along the route.) 54

55 PAST: Caching 55

56 PAST: Security No read access control; users may encrypt content for privacy File authenticity: file certificates System integrity: nodeIDs, fileIDs non- forgeable, sensitive messages signed Routing randomized 56

57 PAST: Storage quotas Balance storage supply and demand. Users hold smartcards issued by brokers – hide the user's private key and usage quota – debit the quota upon issuing a file certificate. Storage nodes hold smartcards – advertise supply quota – are subject to random audits within leaf sets. 57

58 PAST: Related Work CFS [SOSP’01] OceanStore [ASPLOS 2000] FarSite [Sigmetrics 2000] 58

59 Outline PAST SCRIBE 59

60 SCRIBE: Large-scale, decentralized multicast Infrastructure to support topic-based publish-subscribe applications Scalable: large numbers of topics, subscribers, wide range of subscribers/topic Efficient: low delay, low link stress, low node overhead 60

61 SCRIBE: Large scale multicast (Figure: nodes Subscribe to a topicID; a publisher sends Publish messages routed to the same topicID.) 61
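The Scribe operations in the figure as a Java interface sketch (names are illustrative; only the calls shown on the slide are included):

```java
import java.math.BigInteger;

interface Scribe {
    // Join the multicast tree rooted at the node responsible for topicId.
    void subscribe(BigInteger topicId);

    // Route an event towards topicId; it is disseminated down the tree to subscribers.
    void publish(BigInteger topicId, byte[] event);
}
```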

62 Scribe: Results Simulation results Comparison with IP multicast: delay, node stress and link stress Experimental setup – Georgia Tech Transit-Stub model – 100,000 nodes randomly selected out of.5M – Zipf-like subscription distribution, 1500 topics 62

63 Scribe: Topic popularity gsize(r) = floor(N * r^(-1.25) + 0.5); N = 100,000; 1500 topics 63
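A small Java check of the group-size formula; the printed values are simply what the formula yields for a few ranks, not measured data.

```java
class TopicPopularity {
    // gsize(r) = floor(N * r^(-1.25) + 0.5)
    static long gsize(int rank, int n) {
        return (long) Math.floor(n * Math.pow(rank, -1.25) + 0.5);
    }

    public static void main(String[] args) {
        int n = 100_000;
        for (int r : new int[]{1, 10, 100, 1500})
            System.out.println("rank " + r + " -> " + gsize(r, n) + " subscribers");
        // The most popular topic has all 100,000 nodes; the 1500th has about a dozen.
    }
}
```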

64 Scribe: Delay penalty Relative delay penalty, average and maximum 64

65 Scribe: Node stress 65

66 Scribe: Link stress One message published in each of the 1,500 topics 66

67 Scribe: Summary Self-configuring P2P framework for topic- based publish-subscribe Scribe achieves reasonable performance when compared to IP multicast – Scales to a large number of subscribers – Scales to a large number of topics – Good distribution of load For more information http://www.cs.rice.edu/CS/Systems/Pastry 67

68 Status Functional prototypes Pastry [Middleware 2001] PAST [HotOS-VIII, SOSP’01] SCRIBE [NGC 2001, IEEE JSAC] SplitStream [submitted] Squirrel [PODC’02] http://www.cs.rice.edu/CS/Systems/Pastry 68

69 Current Work Security – secure routing/overlay maintenance/nodeID assignment – quota system Keyword search capabilities Support for mutable files in PAST Anonymity/Anti-censorship New applications Free software releases 69

