Measurements of Peer-to-Peer Systems
Pradnya Karbhari
Nov 25th, 2003
CS 8803: Network Measurements Seminar



Introduction to Peer-to-Peer (P2P) systems
- End-systems (peers) can act as both clients and servers of data, hence the system is scalable and reliable
- Peer participation is voluntary and membership is dynamic, hence the topology keeps changing
- Most popularly used for file sharing, hence peer-to-peer systems have become synonymous with peer-to-peer file-sharing networks

Classification of P2P systems
- P2P computation (e.g. …)
- P2P communication (instant messaging)
- P2P file-sharing networks
  - Centralized (e.g. Napster)
  - Decentralized
    - Structured (e.g. Chord, CAN, Pastry, Tapestry)
    - Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey, eMule, Direct Connect, …)

Popularity of unstructured decentralized P2P networks
- Gnutella host count, maintained by Limewire ( )
- good scope for measurement studies because:
  - deployed and widely used
  - use a lot of bandwidth during data transfer, hence a concern for network operators
- quite a few measurement studies have been done on these systems, some of which we will discuss in this seminar

Outline
- Characterization of users of P2P systems
  - Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
- Effect of P2P traffic on the underlying network
  - Sen et al., “Analyzing peer-to-peer traffic across large networks”, IMW’02
- Peer-to-Peer Topologies
  - Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002
- Searching on the P2P network
  - Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
- Deciphering proprietary P2P systems (like Kazaa)
  - Leibowitz et al., “Deconstructing the Kazaa Network”, WIAPP, 2003

Gnutella protocol overview
- Connecting to the Gnutella network
  - bootstrap using the GWebCache system and a locally cached host list
  - Ping/Pong messages are exchanged with potential neighbors
- Searching on the network
  - Query messages are flooded on the network
  - QueryHit messages are received (back-propagated along the Query path) from peers having the requested content
- Downloading the content
  - peers download files directly from peers having the requested content
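The search step can be illustrated with a small simulation. This is a minimal sketch, not the Gnutella wire protocol: the overlay is a plain adjacency dictionary, `flood_query` and its parameters are hypothetical names, and the default TTL of 7 is an assumption. It shows a Query being flooded with a hop limit and duplicate suppression, and each QueryHit retracing the path the Query took.

```python
from collections import deque

def flood_query(overlay, shared, origin, keyword, ttl=7):
    """Flood a query from `origin`; return {responder: reverse path back to origin}."""
    parent = {origin: None}          # peer -> neighbor it first received the Query from
    queue = deque([(origin, ttl)])   # (peer, hops the Query may still travel)
    hits = {}
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and keyword in shared.get(peer, ()):
            # The QueryHit is back-propagated hop by hop along the Query path.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits[peer] = path
        if remaining == 0:
            continue                 # TTL exhausted: do not forward further
        for neighbor in overlay.get(peer, ()):
            if neighbor not in parent:   # drop duplicates of an already-seen Query
                parent[neighbor] = peer
                queue.append((neighbor, remaining - 1))
    return hits

# Hypothetical five-peer overlay; D and E share the requested file.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
shared = {"D": {"song.mp3"}, "E": {"song.mp3"}}
```

With a TTL of 2 the query from A reaches D but not E, which is the trade-off flooding makes between search coverage and message load.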

Characterization of Users of P2P systems
S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN’02.
- first paper to characterize P2P file-sharing systems
- Goal: to analyze the following user characteristics
  - latency
  - lifetime of peers
  - bottleneck bandwidth
  - number of files shared and downloaded
  - degree of cooperation
- methodology: active crawling
- systems studied: Napster and Gnutella
- data collection: May 2001

Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
Measurement Methodology
- active crawling of the Napster and Gnutella systems
  - Napster: issued queries for popular content, and then queried the central server for peer information
  - Gnutella: used ping/pong messages in the protocol to get metadata about peers, then their neighbors, and so on
- parallel measurement for:
  - peer lifetime: periodic probing of peers obtained from the crawlers
    - offline if no response to TCP SYN
    - inactive if the response to TCP SYN is a TCP RST
    - active if the peer accepts the incoming TCP connection on that port
  - latency: RTT measurements from one host
  - bottleneck link bandwidth: active probing using Sprobe, a tool they developed based on the packet-pair dispersion technique
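The three peer states used for the lifetime measurement map naturally onto the outcome of a single TCP connection attempt. The following is a minimal sketch of that classification, not the authors' crawler: `probe_peer` is a hypothetical name, the 3-second timeout is an assumption, and any failure other than an explicit refusal (timeout, unreachable network) is treated as "offline".

```python
import socket

def probe_peer(host, port, timeout=3.0):
    """Classify a peer as 'active', 'inactive' or 'offline' from one TCP connect attempt."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # The SYN was answered with a RST: the host is up but the application is not listening.
        return "inactive"
    except OSError:
        # Timeout or unreachable: no usable response to the SYN (assumed to mean offline).
        return "offline"
    conn.close()
    # The handshake completed: the application accepts connections on that port.
    return "active"
```

Repeating this probe periodically for every crawled peer yields the per-peer on/off time series from which uptime and session duration can be derived.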

Host Lifetime analysis
- 20% of peers in Napster and Gnutella have an IP-level uptime of 93% or more
- Napster peers have higher application uptimes than Gnutella peers
  - the best 20% of Napster peers have an uptime of 83% or more, and the best 20% of Gnutella peers have an uptime of 45% or more
- median session duration is 60 minutes for both Napster and Gnutella

Latency analysis (Gnutella)
- 20% of peers have a latency of at most 70ms and 20% have a latency of at least 280ms
- correlation between downstream bottleneck bandwidth and latency: two clusters for modems (20-60Kbps, ms) and broadband (1Mbps, ms)

Bottleneck Bandwidth Analysis (Gnutella)
- 92% of Gnutella peers have a downstream bottleneck bandwidth of at least 100Kbps
- 22% of peers have an upstream bottleneck bandwidth of 100Kbps or less
  - these peers are unsuitable to serve content
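The bottleneck figures come from packet-pair probing (Sprobe, per the methodology slide). The idea behind packet-pair dispersion is that two back-to-back packets are spread apart by the slowest link, so the bottleneck bandwidth is roughly the packet size divided by the arrival gap. The sketch below illustrates only that arithmetic; it is not Sprobe's implementation, the function names are hypothetical, and taking the median over several pairs is an assumed way to damp cross-traffic noise.

```python
import statistics

def packet_pair_estimate(packet_size_bytes, dispersion_seconds):
    """Bottleneck bandwidth in bits/s implied by one packet pair."""
    return packet_size_bytes * 8 / dispersion_seconds

def bottleneck_bandwidth(packet_size_bytes, dispersions):
    """Median of the per-pair estimates (an assumed filter, not taken from the paper)."""
    return statistics.median(
        packet_pair_estimate(packet_size_bytes, d) for d in dispersions
    )
```

For example, 1500-byte packets arriving 0.12 s apart imply a bottleneck of about 100Kbps, the threshold used on this slide.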

Downloads, Uploads and Shared Files
- relative number of downloads and uploads varies significantly across bandwidth classes
- clear client/server behavior of different classes

Shared files vs. Shared Data (Napster and Gnutella)
- strong correlation between the number of files shared and the amount of shared data in MB
- slope of both lines is 3.7MB per file, the size of a typical MP3 audio file

Degree of Cooperation (Napster)
- 30% of the peers report their bandwidth as 64Kbps or less, but actually have significantly higher bandwidth
- 10% of the peers reporting higher bandwidths (3Mbps or higher) actually have significantly lower bandwidth

Effect of P2P traffic on underlying network
S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
- Goal: to characterize P2P traffic at three aggregation levels: IP, prefix and AS
  - host distribution and host connectivity
  - traffic volume and mean bandwidth usage
  - traffic patterns over time
  - connection duration and on-time
- methodology: passive measurements at routers (port based)
- systems studied: FastTrack (Kazaa), Gnutella, Direct Connect
- analysis of flow-level data collected from multiple border routers across a large tier-1 ISP’s backbone

S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
Measurement Methodology
- flow records from multiple border routers matching ports:
  - 6346/6347: Gnutella
  - 1214: FastTrack (Kazaa)
  - 411/412: Direct Connect
- processed data to eliminate
  - private IP addresses
  - invalid AS numbers
- final data set contained 800 million flow records
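Port-based classification of flow records can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' pipeline: the flow fields passed to `classify_flow` are hypothetical, only the private-address filter is shown (the AS-number check needs routing data), and a flow is attributed to a system if either endpoint uses one of its default ports.

```python
import ipaddress

# Well-known default ports: Gnutella 6346/6347, FastTrack (Kazaa) 1214,
# Direct Connect 411/412.
P2P_PORTS = {
    6346: "Gnutella", 6347: "Gnutella",
    1214: "FastTrack",
    411: "DirectConnect", 412: "DirectConnect",
}

def classify_flow(src_ip, dst_ip, src_port, dst_port):
    """Return the P2P system a flow belongs to, or None if filtered or unmatched."""
    for ip in (src_ip, dst_ip):
        if ipaddress.ip_address(ip).is_private:
            return None              # discard flows involving private addresses
    for port in (src_port, dst_port):
        if port in P2P_PORTS:
            return P2P_PORTS[port]
    return None
```

Port matching undercounts peers that run on non-default ports, which is a known limitation of this kind of passive measurement.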

Datasets used for analysis
- FastTrack is the most popular in terms of the number of hosts participating and the average traffic volume per day
- rapid growth of P2P traffic is mainly caused by the increasing number of hosts in the system
- Direct Connect systems have a higher traffic volume per IP address

Host distribution analysis
- number of IP addresses in FastTrack ranges from 0.5 to 2 million
- ratio of the number of IP addresses in FastTrack:Gnutella:DirectConnect is 150:30:1
- density of a prefix is the number of unique active IP addresses belonging to it
- density of an AS is the number of unique prefixes belonging to it
- FastTrack hosts are distributed more densely than Gnutella and Direct Connect hosts (64:16:4)
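The prefix density defined on this slide can be computed directly from a list of active addresses and a prefix table. The sketch below is an assumption-laden illustration: `prefix_density` is a hypothetical helper, the prefix table is supplied by the caller (in practice it would come from routing data), and each address is assigned to its longest matching prefix.

```python
import ipaddress
from collections import defaultdict

def prefix_density(active_ips, prefixes):
    """Map each prefix to the number of unique active IP addresses it contains."""
    # Try longer (more specific) prefixes first so each address lands in its best match.
    networks = sorted((ipaddress.ip_network(p) for p in prefixes),
                      key=lambda net: net.prefixlen, reverse=True)
    members = defaultdict(set)
    for ip in active_ips:
        addr = ipaddress.ip_address(ip)
        for net in networks:
            if addr in net:
                members[str(net)].add(addr)   # a set, so repeated sightings count once
                break
    return {prefix: len(addrs) for prefix, addrs in members.items()}
```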

Host connectivity analysis (FastTrack)
- 48% of individual IPs communicate with at most one IP and 89% with at most 10 IPs
- 75% of prefixes and ASes communicate with at least 2 prefixes or ASes
- very few hosts have very high connectivity and most hosts have very low connectivity

Traffic volume analysis
- CDF of traffic volume per IP/prefix/AS for FastTrack (one day)
- distribution of P2P upstream traffic volume across three months

Mean bandwidth usage (FastTrack and Direct Connect)
- FastTrack: 33% of IP addresses have a mean downstream bandwidth of 56Kbps or less; 50% have a mean upstream bandwidth of 56Kbps or less
- Direct Connect: 20% of IP addresses have a mean downstream bandwidth of 56Kbps or less; 33% have a mean upstream bandwidth of 56Kbps or less

Traffic patterns over time (FastTrack)
- traffic volume transferred every hour among FastTrack hosts
- number of unique IP addresses, prefixes and ASes active every hour
- number of active unique IP addresses in each bin of various sizes
- system is very dynamic: hosts join and leave frequently

Connection duration and On-time (FastTrack)
- 50% of the IPs are online for less than one minute/day
- 60% of IPs, 40% of prefixes and 30% of ASes stay for less than 10 mins/day
- 65% of the IPs join only once
- at the AS and prefix level, membership is not very transient

Peer-to-Peer Topologies
M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing Journal, 2002
- Goal: to discover and analyze the Gnutella overlay topology and evaluate the generated traffic
- methodology: active crawling
- datasets: Nov 2000, March 2001 and May 2001

Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems”, 2002
Gnutella Network Growth
- number of nodes in the largest connected component in the Gnutella network
- significantly larger network found during Memorial Day and Thanksgiving
- 50 times increase within 6 months

Distribution of node-to-node shortest paths
- more than 95% of node pairs are at most 7 hops away
- longest node-to-node path is 12 hops
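A shortest-path distribution like this one can be obtained from a crawled topology by running a breadth-first search from every node. The sketch below assumes an undirected overlay stored as an adjacency dictionary in which every node appears as a key; `hop_distribution` is a hypothetical name, and the all-pairs approach is only practical for modest graph sizes.

```python
from collections import Counter, deque

def hop_distribution(graph):
    """Count unordered, connected node pairs by shortest-path length in hops."""
    counts = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # standard BFS over unit-weight edges
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for target, hops in dist.items():
            if source < target:           # count each pair once
                counts[hops] += 1
    return counts

# Hypothetical four-node chain: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

From the resulting counts, the fraction of pairs within 7 hops and the longest path follow directly.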

Average node connectivity
- average number of connections per node remains constant at 3.4

Node connectivity distribution
- Nov 2000: node connectivity in Gnutella follows a power-law distribution
- March 2001: connectivity does not look like a power law for all nodes; the power-law distribution is preserved for nodes with more than 10 links; for nodes with fewer than 10 links, the distribution is almost constant

Searching on the P2P network
K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001, 2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
- methodology: passive measurements at one or two peers that joined the Gnutella network, logging the queries and query messages routed through them
- data sets: Dec 2000, Jan 2001

K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
Top 20 most popular query types
- 17% of queries contained non-ASCII strings and were filtered out
- most queries are for artists, adult content and file extensions (audio)
- some queries are for books, software etc.

Query popularity distribution
- two distinct distributions of document popularity, with a break at query rank 100
- the most popular documents are equally popular
- less popular documents follow a Zipf-like distribution, with alpha between 0.63 and 1.24
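In a Zipf-like distribution the frequency of the item at rank r is proportional to r^(-alpha), so alpha is the negated slope of a straight line on a log-log plot of frequency against rank. The sketch below estimates alpha with an ordinary least-squares fit; this is one simple estimator chosen for illustration, not necessarily the method used in the paper, and `fit_zipf_alpha` is a hypothetical name.

```python
import math

def fit_zipf_alpha(frequencies):
    """Estimate alpha in freq ~ rank**(-alpha) by least squares in log-log space."""
    ranked = sorted(frequencies, reverse=True)          # rank 1 = most popular
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]            # frequencies must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope
```

Given the break at rank 100 noted on this slide, a fit like this would be applied only to the ranks beyond the flat head of the distribution.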

Deciphering proprietary P2P systems
N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
- methodology: passive content-based data collection at a caching server installed at the border of a large ISP
  - an L4 switch inspects the first few packets of each TCP connection to detect Kazaa download traffic
  - it redirects Kazaa download traffic through the caching server
  - focus on download traffic only, not control traffic (since it is encrypted)

N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
Characteristics of Collected Traces
- 38% of all download sessions do not use the standard Kazaa port (1214)

File download distribution by bytes
- CDF of byte popularity distribution for the 10% and 1% most popular files
- 0.8% of all files account for 80% of the generated traffic
- 0.1% of the most bandwidth-hungry files (top 1% of all files) generate 50% of the traffic
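Concentration figures of this kind come from ranking files by the bytes they generated and summing the head of the ranking. A minimal sketch of that computation follows, assuming per-file byte totals are already available from the trace; `traffic_share_of_top` is a hypothetical helper, not code from the paper.

```python
def traffic_share_of_top(bytes_per_file, top_fraction):
    """Fraction of total bytes generated by the top `top_fraction` of files."""
    ranked = sorted(bytes_per_file, reverse=True)        # most bandwidth-hungry first
    top_count = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_count]) / sum(ranked)

# Hypothetical trace: one movie-sized file, nine songs, ninety pictures (sizes in MB).
sample = [700] + [5] * 9 + [1] * 90
```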

File size distribution
- note the log scale on the X-axis
- 3 distinct modes
  - 100KB for pictures
  - 2-5MB for music files
  - 700MB for movies

Quantity and Rate of Distinct Files
- new files seen at different time scales: every day, hour and minute
- 150,000 distinct files during a 17-day period
- daily graph: the number of new files seen continued to decrease, but no steady-state value (the rate of injection of files into the network) was reached
- hourly graph: time-of-day effect
- per-minute graph: 50 new files seen every minute on average

Rate of change of popularity of files
- percentage of files that make it to the list of the N most popular files (a) in consecutive intervals and (b) after T intervals, compared with the first list
- measurement interval is 24 hours
- 15% of the highly popular files remain popular throughout the experiment, and the rest are popular only for short time intervals

Open Questions
- Mapping a global snapshot of the entire Gnutella topology
- Bootstrapping of peers in unstructured peer-to-peer systems (work in progress)
- More efficient searching on P2P networks: efforts in this direction include random walks, Bloom-filter based techniques etc.
- End-point privacy/anonymity is absent in most of these peer-to-peer networks

References
- Papers covered in the seminar:
  - S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
  - S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
  - M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002
  - K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001
  - N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
- Papers not covered in the seminar:
  - J. Chu, K. Labonte and B. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems”, SPIE, July
  - F. Bustamante and Y. Qiao, “Friendships that last: Peer lifespan and its role in P2P protocols”, WCW
  - R. Bhagwan, S. Savage and G. Voelker, “Understanding Availability”, IPTPS
  - Saroiu et al., “An Analysis of Internet Content Delivery Systems”, OSDI
  - Markatos et al., “Tracing a large-scale Peer-to-Peer System: An hour in the life of Gnutella”, CCGrid 2002