

Slide 1: A Measurement Study of Peer-to-Peer File Sharing Systems
by Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble
Presentation by Nanda Kishore Lella (Lella.2@wright.edu)

Slide 2: Outline
– P2P Overview
  – What is a peer?
  – Example applications
  – Benefits of P2P
– P2P Content Sharing
  – Challenges
  – Group management/data placement approaches
  – Measurement studies
– Conclusion

Slide 3: What is Peer-to-Peer (P2P)?
Most people think of P2P as music sharing.
Examples: Napster, Gnutella

Slide 4: What is a peer?
Contrasted with the client-server model:
– Servers are centrally maintained and administered.
– A client has fewer resources than a server.

Slide 5: What is a peer?
– A peer's resources are similar to the resources of the other participants.
– P2P: peers communicating directly with other peers and sharing resources.

Slide 6: P2P Application Taxonomy
P2P Systems:
– Distributed Computing
– File Sharing
– Collaboration
– Platforms (e.g., JXTA)

Slide 7: P2P Goals/Benefits
– Cost sharing
– Resource aggregation
– Improved scalability/reliability
– Increased autonomy
– Anonymity/privacy
– Dynamism
– Ad-hoc communication

Slide 8: P2P File Sharing
– Content exchange: Gnutella
– File systems: OceanStore
– Filtering/mining: OpenCOLA

Slide 9: Research Areas
– Peer discovery and group management
– Data location and placement
– Reliable and efficient file exchange
– Security/privacy/anonymity/trust

Slide 10: Current Research
– Group management and data placement: Chord, CAN, Tapestry, Pastry
– Anonymity: Publius
– Performance studies: Gnutella measurement study, etc.

Slide 11: Management/Placement Challenges
– Per-node state
– Bandwidth usage
– Search time
– Fault tolerance/resiliency

Slide 12: Approaches
– Centralized
– Flooding
– Document routing

Slide 13: Centralized
The Napster model.
Benefits:
– Efficient search
– Limited bandwidth usage
– Efficient network handling
Drawbacks:
– Central point of failure
– Limited scale
[figure: peers Bob, Alice, Jane, and Judy connected to a central index]
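
To make the centralized approach concrete, here is a minimal Python sketch (the class and names are ours for illustration, not Napster's actual protocol): the server keeps an inverted index from file names to the peers sharing them, so a search is a single lookup, but the server is also a single point of failure.

    # Minimal sketch of a Napster-style central index (illustrative only).
    class CentralIndex:
        def __init__(self):
            self.files = {}  # file name -> set of peers sharing it

        def register(self, peer, file_names):
            # Peers report their shared files to the server on connect.
            for name in file_names:
                self.files.setdefault(name, set()).add(peer)

        def search(self, name):
            # One dictionary lookup: efficient search, low bandwidth,
            # but the server is a central point of failure.
            return self.files.get(name, set())

    index = CentralIndex()
    index.register("Alice", ["song.mp3"])
    index.register("Bob", ["song.mp3", "talk.pdf"])
    print(index.search("song.mp3"))  # {'Alice', 'Bob'}; the file itself is then fetched peer-to-peer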

Slide 14: Flooding
The Gnutella model.
Benefits:
– No central point of failure
– Limited per-node state
Drawbacks:
– Slow searches
– Bandwidth intensive
[figure: peers Bob, Alice, Jane, Judy, and Carl forwarding queries to their neighbors]
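
A minimal sketch of TTL-limited flooding (our toy graph representation, not the Gnutella wire protocol): each peer rebroadcasts a query to its neighbors and decrements a TTL, which is why searches are slow and bandwidth-hungry.

    # Toy sketch of Gnutella-style query flooding with a TTL.
    def flood_query(graph, shared, start, wanted, ttl):
        """graph: peer -> neighbors; shared: peer -> set of files."""
        seen = {start}
        frontier, hits = [start], []
        while frontier and ttl > 0:
            next_frontier = []
            for peer in frontier:
                for nbr in graph[peer]:
                    if nbr in seen:
                        continue  # real Gnutella dedupes via message IDs
                    seen.add(nbr)
                    if wanted in shared.get(nbr, ()):
                        hits.append(nbr)
                    next_frontier.append(nbr)
            frontier = next_frontier
            ttl -= 1  # every rebroadcast costs one TTL hop
        return hits

    graph = {"Alice": ["Bob", "Jane"], "Bob": ["Alice", "Carl"],
             "Jane": ["Alice", "Judy"], "Carl": ["Bob"], "Judy": ["Jane"]}
    print(flood_query(graph, {"Carl": {"song.mp3"}}, "Alice", "song.mp3", ttl=3))  # ['Carl']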

Slide 15: Document Routing
The FreeNet, Chord, CAN, Tapestry, and Pastry model.
Benefits:
– More efficient searching
– Limited per-node state
Drawbacks:
– Limited fault tolerance vs. redundancy trade-off
[figure: nodes 001, 012, 212, 305, 332 routing a query for document 212]

Slide 16: Document Routing – CAN
Associate with each node and each item a unique ID in a d-dimensional space.
Goals:
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes
Properties:
– Routing table size O(d)
– Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
(Slide modified from another presentation.)
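
To put illustrative numbers on that bound (ours, not from the slide): for n = 100,000 nodes, increasing the dimension d shortens routes at the cost of a larger O(d) routing table.

    # Illustrative evaluation of CAN's d * n^(1/d) worst-case path length.
    n = 100_000
    for d in (2, 3, 4):
        print(f"d={d}: at most {round(d * n ** (1 / d))} steps")  # 632, 139, 71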

Slide 17: CAN Example: Two-Dimensional Space
– The space is divided between the nodes.
– Together, the nodes cover the entire space.
– Each node covers either a square or a rectangular area.
Example:
– Node n1:(1, 2) is the first node to join → it covers the entire space.
[figure: 8×8 coordinate grid containing n1]
(Slide modified from another presentation.)

Slide 18: CAN Example: Two-Dimensional Space
Node n2:(4, 2) joins → the space is divided between n1 and n2.
[figure: grid split between n1 and n2]
(Slide modified from another presentation.)

Slide 19: CAN Example: Two-Dimensional Space
Node n3:(3, 5) joins → the space is divided again, now among n1, n2, and n3.
[figure: grid split among n1, n2, and n3]
(Slide modified from another presentation.)

Slide 20: CAN Example: Two-Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join.
[figure: grid split among n1–n5]
(Slide modified from another presentation.)

Slide 21: CAN Example: Two-Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[figure: nodes and items placed in the grid]
(Slide modified from another presentation.)

Slide 22: CAN Example: Two-Dimensional Space
Each item is stored by the node that owns the item's mapping in the space.
[figure: each item assigned to the node whose zone contains it]
(Slide modified from another presentation.)
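
A minimal sketch of the ownership rule (the zones below are hand-picked to echo the figure, not derived from CAN's actual split history): an item lives at the node whose rectangular zone contains the item's point.

    # Sketch: items are stored at the node whose zone contains their point.
    zones = {  # node -> ((x_lo, y_lo), (x_hi, y_hi)); illustrative zones only
        "n1": ((0, 0), (4, 8)),
        "n2": ((4, 0), (8, 4)),
        "n4": ((4, 4), (8, 8)),
    }

    def owner(point):
        x, y = point
        for node, ((x0, y0), (x1, y1)) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                return node

    print(owner((7, 5)))  # item f4:(7, 5) -> "n4" under these example zones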

Slide 23: CAN: Query Example
– Each node knows its neighbors in the d-dimensional space.
– A query is forwarded to the neighbor closest to the query ID.
– Routing can proceed around some failures; some failures require local flooding.
Example: assume n1 queries f4.
[figure: nodes n1–n5 and items f1–f4 in the grid]
(Slide modified from another presentation.)
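
A sketch of the greedy forwarding step (positions and neighbor lists below loosely follow the example; a real CAN node forwards through the neighbors of its own zone):

    # Sketch of CAN greedy routing: forward to the neighbor closest to the target.
    import math

    positions = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
    neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
                 "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

    def route(start, target):
        path, node = [start], start
        while True:
            best = min(neighbors[node], key=lambda p: math.dist(positions[p], target))
            if math.dist(positions[best], target) >= math.dist(positions[node], target):
                return path  # no neighbor is closer: this node owns the target's zone
            node = best
            path.append(node)

    print(route("n1", (7, 5)))  # ['n1', 'n3', 'n4', 'n5'] -- hops toward f4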

Slides 24–26: CAN: Query Example (continued)
[figures: successive hops of n1's query for f4 through the grid]

Slide 27: Node Failure Recovery
Simple failures:
– Each node knows its neighbors' neighbors.
– When a node fails, one of its neighbors takes over its zone.
More complex failure modes:
– Simultaneous failure of multiple adjacent nodes
– Scoped flooding to discover neighbors
– Hopefully a rare event
(Slide modified from another presentation.)
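
A toy sketch of the simple-failure case (the heir-selection rule is simplified; CAN actually has takeover candidates race timers scaled by zone volume so the smallest zone wins):

    # Toy sketch: on a simple failure, the neighbor with the smallest zone
    # takes over the dead node's zone.
    def zone_area(zone):
        (x0, y0), (x1, y1) = zone
        return (x1 - x0) * (y1 - y0)

    def take_over(failed, zones, neighbors):
        """zones: node -> list of rectangles it currently answers for."""
        live = [p for p in neighbors[failed] if p in zones]
        heir = min(live, key=lambda p: sum(zone_area(z) for z in zones[p]))
        zones[heir].extend(zones.pop(failed))  # heir answers for both zones for now
        return heir

    zones = {"a": [((0, 0), (4, 4))], "b": [((4, 0), (8, 4))]}
    print(take_over("a", zones, {"a": ["b"], "b": ["a"]}))  # 'b' absorbs a's zone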

Slide 28: Document Routing – Chord
– MIT project
– One-dimensional ID space
– Each node keeps track of O(log N) other nodes
– A lookup contacts O(log N) nodes to find the desired key
[figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19]

Slide 29: Document Routing – Chord (2)
– Each node and each key is assigned an ID on the ring.
– A node looking up a key consults its finger table; if no entry points directly at the key, it forwards the query to the closest preceding node in the table, which repeats the process until the key's successor is found.
– A lookup therefore contacts O(log N) nodes.
[figure: lookup for key K19 routed around the ring]
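
A minimal sketch of the lookup's end result (we cheat by consulting a full sorted node list to show the successor rule; a real Chord node only knows its O(log N) fingers and forwards the query):

    # Sketch of Chord's placement rule: key K is stored at the first node
    # whose ID is >= K on the ring (the key's "successor").
    import bisect

    RING = 2 ** 7  # toy 7-bit ID space, IDs 0..127
    nodes = [5, 10, 20, 32, 60, 80, 99, 110]  # the slide's N5..N110, sorted

    def successor(key):
        i = bisect.bisect_left(nodes, key % RING)
        return nodes[i % len(nodes)]  # wrap around the ring past the top

    print(successor(19))  # K19 -> 20, i.e. node N20 as in the figure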

Slide 30: Document Routing – Tapestry/Pastry
– Global mesh of nodes
– Suffix-based routing
– Uses underlying network distance when constructing the mesh
[figure: mesh of nodes 13FE, ABFE, 1290, 239E, 73FE, 9990, F990, 993E, 04FE, 43FE]

Slide 31: Naming in Tapestry
– Every node has a 4-digit hexadecimal name (each digit takes one of 16 values), similar to an IP address.
– The keys stored at a node correspond to the node's name.
[figure: mesh of nodes 13FE, ABFE, 1290, 239E, 73FE, 9990, F990, 993E, 04FE, 43FE]

Slide 32: Tapestry Routing
[figure: a message from node B437 to node 4598 is routed through nodes matching a growing suffix of the destination, e.g., B4F8 (…8) → 9098 (…98) → 7598 (…598) → 4598; node 6789 also shown]
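
A sketch of the digit-at-a-time idea (our node list and next-hop selection; a real Tapestry node consults a per-level routing table built with network-distance measurements):

    # Sketch of suffix-based routing: every hop must match at least one more
    # trailing digit of the destination ID than the current node does.
    def suffix_len(a, b):
        n = 0
        while n < len(a) and a[-1 - n] == b[-1 - n]:
            n += 1
        return n

    def route(source, dest, nodes):
        path, node = [source], source
        while node != dest:
            k = suffix_len(node, dest)
            better = [p for p in nodes if suffix_len(p, dest) > k]
            node = min(better, key=lambda p: suffix_len(p, dest))  # one digit per hop
            path.append(node)
        return path

    nodes = ["6789", "B4F8", "9098", "7598", "4598"]
    print(route("B437", "4598", nodes))  # ['B437', 'B4F8', '9098', '7598', '4598']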

Slide 33: Remaining Problems?
– Hard to handle highly dynamic environments.
– These methods don't consider peer characteristics.

Slide 34: Measurement Studies
Gnutella vs. Napster

Slide 35: Napster vs. Gnutella architectures
[figure: in Napster, peers (P) send queries (Q) to the central napster.com server (S) and receive responses (R), then download files (D) directly from other peers; in Gnutella, queries and responses propagate from peer to peer, with downloads still made directly]
Legend: P = peer, S = server, Q = query, R = response, D = file download

Slide 36: Methodology
Two stages:
1. Periodically crawl Gnutella/Napster to discover peers and their metadata.
2. Feed the output from the crawl into measurement tools:
   – bottleneck bandwidth: SProbe
   – latency: SProbe
   – peer availability: LF
   – degree of content sharing: Napster crawler
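
SProbe is built on the packet-pair technique; a toy sketch of the core estimate follows (function names and sample numbers are ours, and real tools must filter cross-traffic far more carefully):

    # Toy packet-pair estimate: back-to-back packets leave the bottleneck
    # link spaced by packet_size / bottleneck_bandwidth, so the receiver-side
    # inter-arrival gap reveals the bandwidth; the median damps noisy samples.
    from statistics import median

    def packet_pair_bandwidth(packet_size_bytes, arrival_gaps_s):
        return packet_size_bytes * 8 / median(arrival_gaps_s)  # bits/second

    gaps = [0.0119, 0.0121, 0.0133, 0.0120, 0.0450]  # seconds; made-up samples
    print(f"{packet_pair_bandwidth(1500, gaps) / 1e3:.0f} Kbps")  # ~992 Kbps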

Slide 37: Crawling
May 2001.
Napster crawl:
– Query the index server and keep track of the results.
– Query about the returned peers.
– Doesn't capture users sharing unpopular content.
Gnutella crawl:
– Send out ping messages with a large TTL.
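
For reference, a Gnutella v0.4 ping is just the 23-byte descriptor header with an empty payload, so a crawler's probe is cheap to build (sketch below; the connection handshake and socket handling are omitted):

    # Sketch: construct a Gnutella v0.4 Ping descriptor. Layout: 16-byte
    # descriptor ID, payload type (0x00 = Ping), TTL, hops, then a 4-byte
    # little-endian payload length (0, since Ping carries no payload).
    import os
    import struct

    def make_ping(ttl=7):
        descriptor_id = os.urandom(16)  # unique ID; peers use it to dedupe
        return descriptor_id + struct.pack("<BBBI", 0x00, ttl, 0, 0)

    ping = make_ping(ttl=7)  # a crawler uses a large TTL to reach many peers
    assert len(ping) == 23   # 16 + 1 + 1 + 1 + 4 bytes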

Slide 38: [figure]

Slide 39: Measurement Study
– How many peers are server-like? How many are client-like?
– Bandwidth, latency
– Connectivity
– Who is sharing what?

Slide 40: [figure: CDFs of downstream and upstream bottleneck bandwidths, discussed on the next slide]

Slide 41: Graph results
CDF: cumulative distribution function.
From this graph we see that while 78% of the participating peers have downstream bottleneck bandwidths of at least 1000 Kbps, only 8% of the peers have upstream bottleneck bandwidths of at least 10 Mbps. 22% of the participating peers have upstream bottleneck bandwidths of 100 Kbps or less.

Slide 42: [figure]

Slide 43: Reported Bandwidth

Slide 44: Graph results
– The percentage of Napster users connected with modems (64 Kbps or less) is about 25%, while the percentage of Gnutella users with similar connectivity is as low as 8%.
– 50% of Napster users and 60% of Gnutella users use broadband connections.
– Only about 20% of Napster users and 30% of Gnutella users have very high bandwidth connections.
– Overall, Gnutella users on average tend to have higher downstream bottleneck bandwidths than Napster users.

Slide 45: [figure: CDF of peer latencies]

Slide 46: Graph results
Approximately 20% of the peers have latencies of at least 280 ms, while another 20% have latencies of at most 70 ms.

Slide 47: [figure: scatter plot of bottleneck bandwidth vs. latency]

Slide 48: Graph results
This graph shows two clusters: a smaller one at roughly (20–60 Kbps, 100–1,000 ms) and a larger one above (1,000 Kbps, 60–300 ms). The horizontal bands in the graph indicate that measured latency also depends on a peer's location relative to the measurement site.

Slide 49: Measured Uptime

Slide 50: [figure: measured uptime distributions]

Slide 51: Number of Shared Files

Slide 52: [figure: distribution of the number of shared files]

Slide 53: Correlation of Free-Riding with Bandwidth

Slides 54–56: [figures accompanying the free-riding analysis]

Slide 57: Power law
A connected cluster of peers that spans the entire network survives even in the presence of a large percentage p of random peer breakdowns, where p can be as large as the critical threshold reconstructed below; m is the minimum node degree, K is the maximum node degree, and the power-law exponent satisfies α < 3.
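
The threshold itself did not survive the transcript. The result the slide appears to cite is Cohen et al.'s random-breakdown threshold ("Resilience of the Internet to Random Breakdowns", 2000), which for a power-law degree distribution with exponent α, minimum degree m, and maximum degree K reads:

    % Reconstruction of the missing formula (Cohen et al. 2000):
    % critical fraction p_c of random node failures the network survives.
    p_c = 1 - \frac{1}{\kappa - 1},
    \qquad
    \kappa = \frac{\langle k^{2}\rangle}{\langle k\rangle}
           = \left(\frac{2-\alpha}{3-\alpha}\right)
             \frac{K^{\,3-\alpha} - m^{\,3-\alpha}}{K^{\,2-\alpha} - m^{\,2-\alpha}}
    % For \alpha < 3, \kappa diverges as K grows, so p_c approaches 1:
    % the network stays connected under almost any rate of random failures.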

Slide 58: [figure]

Slide 59: Gnutella
Fri Feb 16 05:21:52-05:23:22 PST; 1771 hosts.
Popular sites: 212.239.171.174, adams-00-305a.Stanford.EDU, 0.0.0.0

Slide 60: 30% random failures
Fri Feb 16 05:21:52-05:23:22 PST; 1771 – 471 – 294 hosts.

Slide 61: 4% orchestrated failures
Fri Feb 16 05:21:52-05:23:22 PST; 1771 – 63 hosts.

Slide 62: Results Overview
– Lots of heterogeneity between peers: systems should consider peer capabilities.
– Peers lie: systems must be able to verify reported peer capabilities or measure true capabilities.

Slide 63: Points of Discussion
– Is it all hype? Should P2P be a research area?
– Do P2P applications/systems have common research questions?
– What are the "killer apps" for P2P systems?

Slide 64: Conclusion
– P2P is an interesting and useful model.
– There are lots of technical challenges to be solved.

