Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble.


1 Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble University of Washington

2 Peer-to-Peer Frenzy Both research and industrial excitement –CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy Basic Premise –wide-area, distributed system –voluntary, ad-hoc, dynamic home-user peers exchange information (mostly large files) Many proposals, yet nobody knows the participating peers’ characteristics and behavior

3 Napster & Gnutella [Figure: Napster (central napster.com servers) and Gnutella (peer overlay) architectures. Legend: P = peer, S = server, Q = query, R = response, D = file download]

4 Methodology 2 stages: 1. periodically crawl Gnutella/Napster to discover peers and their metadata 2. feed output from crawl into measurement tools: bottleneck bandwidth – SProbe; latency – SProbe; peer availability – LF; degree of content sharing – Napster crawler

5 Network Bandwidth Scenarios Network measurements Dynamic server/peer selection P2P overlay formation –or application-level multicast Placement of content replicas

6 Network Bandwidth 1. Throughput: –number of bytes transferred during a fixed interval of time 2. Available bandwidth: –the maximum attainable throughput of a newly started flow 3. Bottleneck bandwidth: –maximum throughput ideally obtained across the slowest link Hard to measure: –throughput, available bandwidth Easier to measure: –bottleneck bandwidth

7 One-Packet Model [Plot: traversal time versus packet size, 1 probing packet] $\text{slope} = \frac{1}{\text{bandwidth}_{\text{bottleneck}}}$
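The one-packet relation can be illustrated with a small sketch: if traversal time grows linearly with packet size, the reciprocal of the fitted slope approximates the bottleneck bandwidth. The function name, the sample values, and the use of an ordinary least-squares fit are assumptions made for illustration; the slides do not say how the line is fitted.

```python
def estimate_bottleneck_bandwidth(samples):
    """Estimate bandwidth (bytes/second) from (packet_size_bytes, traversal_time_s) pairs.

    Fits traversal_time = latency + packet_size / bandwidth by least squares
    and returns 1 / slope.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    covariance = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    variance = sum((size - mean_size) ** 2 for size, _ in samples)
    if variance == 0:
        raise ValueError("packet sizes must differ")
    slope = covariance / variance
    if slope <= 0:
        raise ValueError("non-positive slope; samples are too noisy")
    return 1.0 / slope


# Hypothetical, noise-free path: 10 ms fixed latency, 125,000 bytes/s (1 Mbps) bottleneck.
LATENCY_S = 0.010
BANDWIDTH_BYTES_PER_S = 125_000.0
samples = [(size, LATENCY_S + size / BANDWIDTH_BYTES_PER_S) for size in (100, 500, 1000, 1500)]
estimate = estimate_bottleneck_bandwidth(samples)
```

Real measurements are noisy, so a practical tool would need many probes per packet size and some filtering before the fit.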

8 Packet-Pair Model Time dispersion is inversely proportional to bottleneck bandwidth: $\Delta t = \frac{\text{size}_{\text{packet}}}{\text{bandwidth}_{\text{bottleneck}}}$
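As a worked instance of the packet-pair relation, the sketch below solves for the bottleneck bandwidth given a packet size and the measured time dispersion between two back-to-back packets. The function names and sample numbers are hypothetical, and taking the median over several pairs is an assumed way of damping cross-traffic noise, not something the slides prescribe.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes, dispersion_seconds):
    """Return bottleneck bandwidth in bits per second: size / dispersion."""
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_seconds


def median_bandwidth(packet_size_bytes, dispersions_seconds):
    """Median estimate over several packet pairs (assumed noise filter)."""
    return median(packet_pair_bandwidth(packet_size_bytes, d) for d in dispersions_seconds)


# Hypothetical measurement: 1500-byte packets arriving 12 ms apart.
single = packet_pair_bandwidth(1500, 0.012)  # about 1 Mbps
several = median_bandwidth(1500, [0.012, 0.012, 0.030])  # the outlier pair is ignored
```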

9 Vital Properties of an Ideal Tool Accurate Fast: –1 min/measurement too slow Scalable: –flooding the network will not work Works in Uncooperative Environments –can’t deploy software at both endpoints

10 Properties of an Ideal Tool Active: –existent traffic might not be suitable TCP/UDP based: –ICMP heavily filtered Cross-traffic resilient: –should detect and give up in the face of cross traffic Works on Asymmetric Paths Flexible to Bandwidth Changes Controlled Evaluations

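11 Current Tools [Table: desired properties (rows) versus tools (columns); the per-cell marks are not preserved in this transcript] Tools: pathchar, pchar, clink, bprobe, pathrate, Nettimer, SProbe. Desired properties: Accurate; Fast; Uncooperative Environments *; Scalable; TCP/UDP; Active; Cross-traffic *; Asymmetric; Bandwidth changes; Controlled Evaluations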

12 SProbe Uses TCP Tricks From local host to remote host –No cooperation needed [Diagram labels: Local, Remote, SYN packet, RST packet]


24 SProbe Uses TCP Tricks From remote host to local host –Involuntary cooperation of application layer [Diagram labels: Local, Remote (Web), HTTP GET request, data packet, ACK (last data packet)]

25 SProbe’s Accuracy

27 More SProbe Bottleneck Bandwidth; Latency; Availability (LF): –send a SYN packet –receive: SYN/ACK – host active; RST – host inactive, but online; nothing – host offline
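The availability rule on this slide maps the reply to a SYN onto one of three host states. The sketch below encodes that mapping and approximates the probe with an ordinary TCP connect from the Python standard library. This is an assumption for illustration: the tool in the slides sends a raw SYN, whereas `socket.create_connection` completes the full handshake, and the function names here are hypothetical.

```python
import socket


def classify_reply(reply):
    """Map the observed reply to a SYN onto the host state from the slide."""
    states = {
        "SYN/ACK": "active",
        "RST": "inactive but online",
        None: "offline",
    }
    if reply not in states:
        raise ValueError(f"unexpected reply: {reply!r}")
    return states[reply]


def probe(host, port, timeout=3.0):
    """Approximate a SYN probe with a TCP connect (assumption: not a raw SYN).

    Returns "SYN/ACK" if the connection is accepted, "RST" if it is refused,
    and None if nothing usable comes back before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "SYN/ACK"
    except ConnectionRefusedError:
        return "RST"
    except OSError:  # timeouts, unreachable hosts, name resolution failures
        return None
```

A caller would combine the two, for example `classify_reply(probe(peer_address, peer_port))`, where both arguments come from the crawl output.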

28 P2P Characteristics How many peers are “server-like”? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?

30 Higher Downstream Bandwidths

31 Most Peers have Cable Modem-like Bandwidths

32 Yes, Lots of Cable Modems

33 Closest 20% are 4X closer than furthest 20%

34 Two horizontal bands – East Coast and Transoceanic Links

35 Availability Periodic probes yield data like: [timeline of session segments, each with a start and an end]

36 Availability Periodic probes yield data like: [timeline of session segments, each with a start and an end; 12 hours marked] Divide into two periods Keep segments that: –start in 1st period –end in 1st or 2nd period –draw conclusions only on segments no longer than the 2nd period
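The segment-selection rule above can be sketched as a filter over observed sessions. The representation is an assumption: each session is a `(start, end)` pair in hours, with `None` standing for a start or end that fell outside the trace, and the trace is split at its midpoint into the two periods.

```python
def usable_session_lengths(sessions, trace_start, trace_end):
    """Return durations of sessions that satisfy the slide's three rules.

    sessions: list of (start, end) in hours; None means "not observed".
    """
    midpoint = trace_start + (trace_end - trace_start) / 2
    second_period_length = trace_end - midpoint
    kept = []
    for start, end in sessions:
        if start is None or end is None:
            continue  # start or end fell outside the trace
        starts_in_first = trace_start <= start < midpoint
        ends_in_trace = start <= end <= trace_end
        if not (starts_in_first and ends_in_trace):
            continue
        duration = end - start
        if duration <= second_period_length:
            kept.append(duration)
    return kept


# Hypothetical 24-hour trace split into two 12-hour periods.
sessions = [
    (1, 2),      # kept: 1 hour
    (3, 20),     # dropped: 17 hours is longer than the 2nd period
    (13, 14),    # dropped: starts in the 2nd period
    (5, None),   # dropped: end never observed
    (None, 4),   # dropped: start never observed
    (10, 15),    # kept: 5 hours
]
lengths = usable_session_lengths(sessions, trace_start=0, trace_end=24)
```

Capping the duration at the length of the second period means every session that starts in the first period has an equal chance of being seen to completion, which avoids biasing the distribution toward short sessions.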

37 Median Session is about one hour (same for both systems)

38 Gnutella/Napster Uptime

39 P2P Characteristics How many peers are “server-like”? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?

40 Who Has the Files?

42 Correlation of Free-Riding with B/W

43 P2P Characteristics How many peers are “server-like”? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?

44 It’s all about incentive!

45 Lack of Knowledge is Universal

46 P2P Characteristics How many peers are “server-like”? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?

47 Power-Law Networks are here to Stay Barabasi and Albert showed that networks which… –grow by continuous addition of new nodes –exhibit preferential attachment (likelihood of connecting to a node depends on the node’s degree) …power-law distribution of vertex degree Internet, WWW, Gnutella
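The two growth conditions can be illustrated with a toy generator: nodes are added one at a time, and each new node links to existing nodes with probability proportional to their degree. The function name, the seed clique, and the parameters `n` and `m` are assumptions chosen for the sketch, not values from the talk.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph of n nodes; each new node attaches to m distinct existing nodes.

    Targets are drawn from a pool in which every node appears once per incident
    edge, so the chance of being picked is proportional to current degree.
    """
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and n > m")
    rng = random.Random(seed)
    edges = []
    pool = []
    # Seed the process with a small clique of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u):
            edges.append((u, v))
            pool.extend((u, v))
    for new_node in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in sorted(chosen):
            edges.append((new_node, target))
            pool.extend((new_node, target))
    return edges


edges = preferential_attachment_graph(n=1000, m=2)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
# A histogram of degree.values() on log-log axes would show the heavy tail.
```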

48 Resilience to Failures Power-law networks (Cohen et al.): –very resilient in face of random node failures a giant spanning cluster still exists –fairly resilient in face of cascading failures –very vulnerable in face of orchestrated attacks (towards high-degree nodes)

49 Gnutella Fri Feb 16 05:21:52-05:23:22 PST 1771 hosts Popular sites: 212.239.171.174 adams-00-305a.Stanford.EDU 0.0.0.0

50 30% random failures Fri Feb 16 05:21:52-05:23:22 PST 1771 – 471 – 294 hosts

51 4% orchestrated failures Fri Feb 16 05:21:52-05:23:22 PST 1771 – 63 hosts
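The contrast between random failures and orchestrated attacks on high-degree nodes can be reproduced on a toy graph by removing nodes and measuring the largest surviving connected component. Everything in this sketch is hypothetical (function names, the star topology, the node counts); it is not the crawl data shown on the slides.

```python
import random
from collections import deque


def largest_component(adjacency, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for source in adjacency:
        if source in seen:
            continue
        seen.add(source)
        queue = deque([source])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def random_failures(adjacency, count, seed=0):
    """Pick `count` nodes uniformly at random."""
    return set(random.Random(seed).sample(sorted(adjacency), count))


def targeted_failures(adjacency, count):
    """Pick the `count` highest-degree nodes (an orchestrated attack)."""
    ranked = sorted(adjacency, key=lambda node: len(adjacency[node]), reverse=True)
    return set(ranked[:count])


# Extreme toy case: one hub (node 0) connected to 24 leaves.
star = {0: set(range(1, 25))}
for leaf in range(1, 25):
    star[leaf] = {0}

after_attack = largest_component(star, targeted_failures(star, 1))  # hub removed
after_random = largest_component(star, random_failures(star, 7))    # about 30% removed
```

On the star, removing the single hub isolates every leaf, while random removals leave the graph connected unless they happen to hit the hub.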

52 Discussion Heterogeneity: –3 orders of magnitude of bandwidth 50Kbps-100Mbps –6 orders of magnitude of latency 10us-10s –>4 orders of magnitude in availability 1%-99.99% Peers should not be treated as equals

53 Cooperating, Well-Behaved Peers Incentive: –game-theoretic approaches of enforcing local behavior for global benefit System enforcement: –peers can: measure each other's characteristics (SProbe) enforce the reported ones –a reported 56Kbps peer should not download content at higher speed

54 Feedback to Current Proposals CAN, Chord, Past: –great memory and lookup algorithms: log(N) time and space –at the price of maintaining rigid network structure: hypercubes, butterflies, Plaxton trees –unclear how network structure is maintained given heterogeneity and dynamics of peers Conjecture – these networks will have a hard time stabilizing: –will need lots of routine, maintenance traffic

55 Instead Gnutella… Easy join procedure: –this simplicity gave Gnutella its power-law shape Easy to implement protocol (broadcast) Lots of maintenance traffic already –although the protocol has become smarter with its subsequent versions Searching is a nightmare

56 Document Popularity Follows Zipf distribution –long-tailed Popular documents become more popular with Napster/Gnutella Currently, need to resubmit queries in the hope that someone will answer Wish-list based system
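To make the long-tailed shape concrete, the sketch below computes request probabilities under a Zipf law, where the document at rank r is requested with probability proportional to 1/r^s. The exponent, the catalogue size, and the function names are assumptions for illustration; the slides give no fitted parameters.

```python
def zipf_probabilities(document_count, exponent=1.0):
    """Return request probabilities for ranks 1..document_count under Zipf's law."""
    if document_count < 1:
        raise ValueError("need at least one document")
    weights = [1.0 / rank ** exponent for rank in range(1, document_count + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def share_of_top(probabilities, top_count):
    """Fraction of all requests that go to the top_count most popular documents."""
    return sum(probabilities[:top_count])


# Hypothetical catalogue of 10,000 documents with exponent 1.
probabilities = zipf_probabilities(10_000)
top_share = share_of_top(probabilities, 100)  # share of requests for the top 1%
```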

57 Wide-area Network Measurements Sending a few packets can be identified with hostile behavior Even a few SYN packets are sufficient to trigger software firewalls –dialogue box pops up – possible scan from washington.edu, click OK or Cancel Many confused, angry, threatening e-mails sent to many people (security, root, Ed): –active Internet measurements are not simple to perform

58 Excerpt from e-mail “Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned.”

59 Current Work Quantify and show that current proposals are too rigid for Napster/Gnutella-like peer dynamics Wish-list, delayed exchange system –big distributed scheduling problem SGet –a downloading tool with automatic server selection –no bandwidth is wasted

60 Questions? Beautiful Sieg Hall “Pride of UW”

