Measuring and Analyzing the Characteristics of Napster and Gnutella Hosts S. Saroiu, P. Gummadi, and S. Gribble Multimedia Systems Journal Volume 8, Issue.


Measuring and Analyzing the Characteristics of Napster and Gnutella Hosts S. Saroiu, P. Gummadi, and S. Gribble Multimedia Systems Journal Volume 8, Issue 5 November 2002

Introduction (1 of 2) Peer-to-Peer (P2P) file sharing systems have created interest in P2P architectures Exact definition debatable, but P2P –Lacks centralized infrastructure –Depends upon voluntary participation for resources Membership ad-hoc and dynamic –Capacity, latency, availability of peers change → Must be aware of these when choosing a suitable peer for allocating resources

Introduction (2 of 2) However, few architectures are evaluated considering suitability of peers –Due to lack of data on host characteristics This paper –Studies Napster and Gnutella (were the two most popular) –Seeks to precisely characterize the population of end-user hosts Typically home machines on the “edge” of the Internet

This Paper Characterization –Bottleneck capacities –Latencies –Availability –Number of files –Correlations between above stats Lessons –Heterogeneity – 3-5 orders of magnitude –Peers deliberately mis-report information if they have incentive to do so. Need: Built-in incentives to tell the truth Ability for system to verify peer information

Outline Introduction (done) Methodology (next) Results Recommendations Conclusions

Measurement Methodology Periodically crawl each system –Gather snapshots: IP and port and reported information Do some active measurements Sub-sections –Architectures –Crawling –Active Measurements –Limitations

Napster and Gnutella Architectures Napster has centralized server index –Servers keep track of peer information Gnutella has overlay network –floods requests (TTL to limit scope) –ping and pong messages to discover peers Peers function as both client and server Query for file, download from peer
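A minimal sketch of TTL-limited flooding as described on this slide. The overlay is a hypothetical in-memory adjacency map, and `flood_query` is an illustrative name, not part of any Gnutella client; real peers forward messages over network connections and suppress duplicates by message ID.

```python
from collections import deque

def flood_query(overlay, origin, ttl):
    """Return the set of peers a query reaches when flooded from origin.

    overlay maps each peer to its list of neighbors (assumed toy model).
    Each hop decrements the TTL; a peer holding TTL 0 does not forward.
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # scope limit hit, stop forwarding from this peer
        for neighbor in overlay.get(peer, ()):
            if neighbor not in reached:  # each peer handles a query once
                reached.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return reached

# Hypothetical five-peer overlay used only for illustration.
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
```

With this toy overlay, a TTL of 1 reaches only the direct neighbors of `a`, while a TTL of 3 reaches every peer, which is why the TTL bounds the traffic a single query can generate.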

The Napster Crawler Server architecture –~160 Napster servers, peers connect to 1 –Server reports “local” and “remote” users Actively query popular song artists, see what peers responded (do in parallel, so only takes 3-4 min.) –By comparing to global server stats, captured 40-60% of peers with 80-90% of traffic –Distribution of remainder traffic stats similar For each peer discovered, request –Capacity of peer as reported by peer –Number of files being shared –Number of uploads and downloads in progress –Names and sizes of files –IP address of peer

The Gnutella Crawler (1 of 2) Connect to several well-known peers –gnutellahosts.com, router.limewire.com Send ping messages with large TTL Add new peers based on pong messages –Gives IP, number and total size of files Should be no bias since not using “popular” songs Allow ~2 minutes, report peers –Usually, about hosts

The Gnutella Crawler (2 of 2) Based on clip2 (gnutella measurement) –about 25-50% of hosts at that time
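The crawl loop from the two slides above might look roughly like the sketch below. This is not the authors' crawler: `Pong`, `crawl`, and the `send_ping` callable are hypothetical names, and the network is replaced by a caller-supplied function so the logic can be followed without sockets.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple

class Pong(NamedTuple):
    """Fields a pong reports, per the slide: address, file count, total size."""
    address: str
    num_files: int
    total_kb: int

def crawl(seeds: Iterable[str],
          send_ping: Callable[[str], List[Pong]],
          max_peers: int = 10_000) -> Dict[str, Pong]:
    """Ping outward from well-known seeds and record every peer seen in a pong.

    send_ping(address) is an assumed helper returning the pongs that come
    back through that peer. max_peers is a soft cap checked between pings.
    """
    snapshot: Dict[str, Pong] = {}
    contacted = set()
    pending = list(seeds)
    while pending and len(snapshot) < max_peers:
        address = pending.pop()
        if address in contacted:
            continue
        contacted.add(address)
        for pong in send_ping(address):
            if pong.address not in snapshot:
                snapshot[pong.address] = pong   # new peer: record it
                pending.append(pong.address)    # and ping it later
    return snapshot
```

A real crawl would also stop on a time budget (the slide mentions about 2 minutes) rather than only on a peer count.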

Measurement Methodology Periodically crawl each system –Gather snapshots: IP and port and reported information Do some active measurements Sub-sections –Architectures –Crawling –Active Measurements –Limitations

Active Measurements P2P sys only report limited info (and sometimes not accurate) about peers –Peers may choose not to report capacity –Peers may lie to discourage downloads For each snapshot, gather direct data –Capacity, latency, num files, lifetime Next, discuss: –Bottleneck capacity measurements –Latency measurements –Lifetime Measurements

Bottleneck Capacity Measurements Real number would be available capacity –But would require TCP connection, so costly Instead, try to measure maximum capacity –An approximation of available –Report bottleneck (lowest) capacity Existing techniques (flood one packet, or several packet-pairs) not acceptable –“flood” causes too much traffic –Several packet pairs can take 1 minute each, so about 1 week for 10k measurements –Cannot deploy custom software on all hosts → Design SProbe

SProbe (1 of 4) Dispersion of two large packets gives measure of bottleneck –The larger, the slower the link How to get peers to report? Rely upon response
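The packet-pair idea on this slide reduces to one division: the bottleneck link spaces two back-to-back packets by the time it needs to transmit one of them. A small sketch follows; the function name and the example numbers are assumptions for illustration, not values from the paper.

```python
def bottleneck_bps(packet_bytes: int, dispersion_s: float) -> float:
    """Estimate bottleneck capacity in bits per second from a packet pair.

    packet_bytes: size of each probe packet.
    dispersion_s: time gap between the two packets (or their responses)
                  after they have crossed the bottleneck.
    """
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_bytes * 8 / dispersion_s

# Illustrative numbers: 1500-byte packets arriving 12 ms apart suggest
# a bottleneck of roughly 1 Mbps; a 120 ms gap suggests roughly 100 Kbps.
fast = bottleneck_bps(1500, 0.012)
slow = bottleneck_bps(1500, 0.120)
```

Larger dispersion gives a smaller estimate, matching "the larger, the slower the link".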

SProbe (2 of 4) Send two TCP SYN packets back-to-back –Add large payload –If port inactive, get RST packet back Measure dispersion of RST packets Note, some firewalls drop SYN to inactive port –SProbe cannot tell difference

SProbe (3 of 4) Cross traffic can interfere with dispersion –Current approaches send many probes, but that doesn’t scale and takes too long Send packet train, small at ends, large in middle If dispersion of small is larger than large, assume there may be cross traffic and return “unknown”
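The cross-traffic check on this slide can be sketched as a guard around the packet-pair estimate. The exact comparison SProbe uses is not given here, so the rule below (reject if any small-packet gap exceeds the large-pair gap) is an assumption, as are the function and parameter names.

```python
from typing import Optional, Sequence

def train_capacity_bps(small_gaps_s: Sequence[float],
                       large_gap_s: float,
                       large_bytes: int) -> Optional[float]:
    """Return a capacity estimate in bits per second, or None for "unknown".

    small_gaps_s: gaps measured between the small packets at the train's ends.
    large_gap_s:  gap measured between the two large packets in the middle.
    large_bytes:  size of each large packet.
    """
    if large_gap_s <= 0:
        return None
    # Small packets should be spread less than large ones by the bottleneck.
    # If they are spread more, something else (cross traffic) was queued
    # between them, so the measurement is not trusted.
    if any(gap > large_gap_s for gap in small_gaps_s):
        return None
    return large_bytes * 8 / large_gap_s
```

Returning `None` instead of a number keeps doubtful samples out of the results, at the cost of fewer usable measurements.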

SProbe (4 of 4) For upstream, need peer to send Initiate Gnutella handshake Wait so that large packets build up When the peer sends, measure dispersion

Latency Measurements TCP throughput directly dependent upon latency (round-trip time) –T = k / [RTT x sqrt(p)] Measure time for 40 byte TCP packet exchange (minimize bottleneck transmission) While P2P may be different than P2Server, distribution to well-connected server still of interest
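To make the formula on this slide concrete, the sketch below evaluates T = k / (RTT x sqrt(p)) for two round-trip times. The constant `k` and the loss rate are placeholder values chosen only for illustration; only the ratio between the two results is meaningful.

```python
import math

def tcp_throughput(rtt_s: float, loss_rate: float, k: float = 1.0) -> float:
    """Evaluate T = k / (RTT * sqrt(p)) from the slide.

    rtt_s:     round-trip time in seconds.
    loss_rate: packet loss probability p, with 0 < p <= 1.
    k:         constant of proportionality (placeholder value by default).
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate <= 1")
    return k / (rtt_s * math.sqrt(loss_rate))

# Same loss rate, RTT of 70 ms versus 280 ms (assumed example values).
near = tcp_throughput(0.070, 0.01)
far = tcp_throughput(0.280, 0.01)
ratio = near / far  # throughput scales inversely with RTT
```

At equal loss, a peer with one quarter of the round-trip time gets about four times the throughput, which is why latency matters when choosing a peer to download from.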

Lifetime Measurements States: –Offline – not connected or behind firewall –Inactive – connected but not doing P2P –Active – participating in P2P Send TCP SYN to P2P port –If no packet, then offline –If RST then inactive –If SYN/ACK then active
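The three-way classification on this slide can be written as a small lookup. The sketch assumes the probe result has already been reduced to `None` (no packet), `"RST"`, or `"SYN/ACK"`; sending the SYN itself needs raw sockets and is left out. `PeerState`, `classify`, and `active_fraction` are illustrative names.

```python
from enum import Enum
from typing import Optional, Sequence

class PeerState(Enum):
    OFFLINE = "offline"    # no reply: not connected or behind a firewall
    INACTIVE = "inactive"  # host is up but the P2P port is closed
    ACTIVE = "active"      # P2P application is accepting connections

def classify(response: Optional[str]) -> PeerState:
    """Map the reply to a TCP SYN on the P2P port to a peer state."""
    if response is None:
        return PeerState.OFFLINE
    if response == "RST":
        return PeerState.INACTIVE
    if response == "SYN/ACK":
        return PeerState.ACTIVE
    raise ValueError(f"unexpected response: {response!r}")

def active_fraction(responses: Sequence[Optional[str]]) -> float:
    """Fraction of probes of one peer that found it active."""
    if not responses:
        raise ValueError("no probes recorded")
    active = sum(classify(r) is PeerState.ACTIVE for r in responses)
    return active / len(responses)
```

Probing the same peer at a fixed interval and taking `active_fraction` over the results gives an application-level uptime figure of the kind reported later in the results.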

Summary of Active Measurements Lifetime - random subset of peers –17,125 Gnutella peers over 60 hours, every 7 minutes –7,000 Napster peers over 25 hours, every 2 minutes Bottleneck and Latency –Tried 595,974 Gnutella peers, only 223,552 reliable downstream, 16,252 upstream, 339,502 latency –Tried 4079 Napster peers, with 2049 successful (complaints of “intrusive” forced to stop early)

Limitations of Methodology Ideal should include workload –So could see how to tune system Ideal should know “birth” rate so know how will scale May incorrectly classify peers –IP addresses may be shared (multiple hosts behind NAT box), but are counted as one peer –IP addresses may be re-used (DHCP), so “same” peer moves Little (scientific) knowledge about broadband so unclear of effects on performance –Packet loss, congestion … (queues!)

Outline Introduction (done) Methodology (done) Results (next) Recommendations Conclusions

Peers with Server-Like Capacity? (Measured) Gnutella Peers Upstream -Only 8% > 10Mbps -22% < 100Kbps Asymmetric -Good for downloads -Bad for server

Measured Download Capacity Broadband -Napster 50% -Gnutella 60% Modems -Napster 25% -Gnutella 8% -Gnutella needs flooding, so more capacity -Rumor is that Gnutella users are more technical, so they have more capacity

Reported Capacities for Napster Unknowns may be mis-reporting to avoid downloads (MLC: maybe they don’t know? Other ratios match)

Measured Latencies for Gnutella 20% > 280ms 20% < 70ms → 4x closer! 5% > 1000ms

Server Like Gnutella Peers? Europe East Coast

Peers with Server-Like Uptimes? (Measured) (Uptime percentage) IP uptimes similar Napster peers participate more. Perhaps due to “chat” and “MP3” in Napster? Or maybe more useful?

Session Durations - ½ have uptime < 1 hour + About the time to download some songs - Since number of peers constant, the ½ is replaced by another ½

Number of Shared Files - Gnutella 25% “free riders” 7% offer more than all the others combined (Don’t have data on 0 files for Napster)

Number of Shared Files (“Zero” Files Removed) -Slightly more consistent in Napster -Still, suggests many free riders in Napster, too

Number of Shared Files – Napster (Reported Capacity) -Capacity has little correlation with number of files shared

Number of Downloads -Lower bandwidth users do most of the downloads?

Downloads versus Sharing - Napster -Users that share less perhaps are less interested

Shared Files versus Total Size -Slope is about 3.7 MB (typical MP3) -Gnutella allows any file, so varies more Napster Gnutella

Measured Downstream Bottleneck -30% report modem but over 100k + Intentional? -Only 10% T1’s low -Correlation between SProbe and Reported is good

Measured Downstream Bottleneck -Suggests “unknown” users really do not know

Resiliency in the Face of Failure - Predicted resiliency based on measured connectivity - Uses power-law that is based on degree of connectivity + But nodes “prefer” good nodes so some well-connected -But what if a “malicious” attack at best connected nodes?

Gnutella Topologies 1771 Peers, 02/16/01 30% randomly removed 4% best connected removed -Malicious, well-placed attack can shatter even resilient P2P networks!
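The contrast between random failures and a targeted attack can be reproduced on a toy graph. The sketch below is not the analysis from the paper: it uses a hypothetical hub-and-spoke overlay and simply measures the largest surviving connected component after removing nodes.

```python
def largest_component(adjacency, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are gone."""
    alive = set(adjacency) - set(removed)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first walk over surviving nodes
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor in alive and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best

def best_connected(adjacency, count):
    """The `count` nodes with the highest degree (targets of an attack)."""
    return sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)[:count]

# Hypothetical overlay: one well-connected hub and eight leaf peers.
leaves = [f"leaf{i}" for i in range(8)]
star = {"hub": list(leaves), **{leaf: ["hub"] for leaf in leaves}}

after_random_loss = largest_component(star, {"leaf0"})           # one leaf fails
after_attack = largest_component(star, best_connected(star, 1))  # hub removed
```

Losing a leaf leaves the other eight nodes connected, while removing the single best-connected node isolates every remaining peer. Real Gnutella topologies are far less extreme than a star, but the same skew in connectivity is what makes a targeted attack effective.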

Recommendations P2P systems assume equal peers –But extreme heterogeneity across capacity, latencies, lifetimes and shared data → Instead, should delegate across hosts based on physical characteristics P2P systems assume equal participation –But clearly some download most, serve least → Maybe impose equality if want equal performance P2P systems assume users want to cooperate –But users will misrepresent if it gives them advantage → Instead, should measure rather than trust

Conclusions Measured popular P2P file sharing systems with many voluntary users Lessons: –Significant heterogeneity –Clear asymmetric behavior in users –Peers deliberately mis-report information if it helps them to do so

Future Work (MLC) Other systems: KaZaA, eDonkey, BitTorrent… New P2P systems based on lessons from this paper –Delegate based on capabilities, for example Measure total downloads, characteristics of content