"A Measurement Study of Peer-to-Peer File Sharing Systems" Stefan Saroiu, P. Krishna Gummadi Steven D. Gribble, "A Measurement Study of Peer-to-Peer File.

Presentation transcript:

"A Measurement Study of Peer-to-Peer File Sharing Systems" Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble, "A Measurement Study of Peer-to-Peer File Sharing Systems", Proceedings of the Multimedia Computing and Networking (MMCN), San Jose, January 2002.

Peer-to-Peer. Membership: ad hoc, dynamic. Goals: design an architecture that encourages cooperation. Examples: Napster, Gnutella, BitTorrent.

Locating Files (Napster): The server maintains an index of the files held by connected peers. A peer queries the server for a file. The server returns a peer that has the file and then queries other servers. The file is transferred over a direct link between the peers.
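The centralized lookup described on this slide can be sketched as follows. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class, its method names, and the peer and file names are all hypothetical, and the step where the server queries other servers is left out.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical stand-in for a Napster-style index server."""

    def __init__(self):
        # Maps a file name to the set of peers that currently share it.
        self._peers_by_file = defaultdict(set)

    def register(self, peer, filenames):
        """Record the files a newly connected peer is sharing."""
        for name in filenames:
            self._peers_by_file[name].add(peer)

    def unregister(self, peer):
        """Forget a peer when it disconnects."""
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def lookup(self, filename):
        """Return the peers holding the file; the download itself is peer-to-peer."""
        return sorted(self._peers_by_file.get(filename, ()))


index = CentralIndex()
index.register("peer-a", ["song.mp3", "talk.mp3"])
index.register("peer-b", ["song.mp3"])
holders = index.lookup("song.mp3")
```

Only the lookup goes through the server; once a holder is known, the requesting peer contacts it directly.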

Locating Files (Gnutella): Queries are performed by flooding the network. Each peer forwards the query to its neighbors and returns a response only if it has the file. Files are downloaded via a direct link between two peers.
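A TTL-limited flood of this kind can be sketched as a breadth-first traversal. The function name `flood_query`, the dictionary-based graph, and the toy topology are assumptions made for illustration; real Gnutella messages carry identifiers and are routed back along the query path, which this sketch does not model.

```python
from collections import deque


def flood_query(neighbors, files, origin, wanted, ttl):
    """Return the peers that would answer a query flooded from `origin`.

    neighbors: dict mapping a peer to the list of its overlay neighbors
    files:     dict mapping a peer to the set of files it shares
    ttl:       maximum number of hops the query may travel
    """
    seen = {origin}          # peers that have already received the query
    responders = set()
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # A peer responds only if it actually has the file.
        if peer != origin and wanted in files.get(peer, ()):
            responders.add(peer)
        if remaining == 0:
            continue         # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return responders


# Toy overlay (hypothetical): a - b - d - e, plus a - c.
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
             "d": ["b", "e"], "e": ["d"]}
files = {"c": {"x.mp3"}, "d": {"x.mp3"}, "e": {"x.mp3"}}
two_hops = flood_query(neighbors, files, "a", "x.mp3", ttl=2)
```

With a TTL of 2 the copy on peer `e`, three hops away, is never found, which illustrates how the TTL limits the scope of a search.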

Collecting Metrics (Napster): Napster crawler. Query for “popular” files using multiple simultaneous connections and maintain a list of the peers returned by the server. For each peer, collect metadata from the server: the peer's reported bandwidth; number of files shared; current number of uploads/downloads; names and sizes of all files shared; IP address.

Collecting Metrics (Gnutella): Gnutella crawler. Connect to popular peers. Send ping messages with large TTLs to known peers. Add newly discovered peers based on pong messages. Pong messages include metadata about the peer: number of files; total size of files.
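The ping/pong discovery loop can be sketched as below. The `ping` callable is a hypothetical stand-in for sending a real ping and collecting the pongs it triggers; the tuple layout and the toy data are likewise assumptions, not the crawler used in the study.

```python
from collections import deque


def crawl(seed_peers, ping):
    """Discover peers breadth-first from a list of seed peers.

    ping(peer) is assumed to return an iterable of
    (address, num_files, total_kb) tuples, one per pong received.
    """
    metadata = {}                 # address -> metadata taken from its pong
    queue = deque(seed_peers)
    queued = set(seed_peers)      # peers already scheduled for a ping
    while queue:
        peer = queue.popleft()
        for address, num_files, total_kb in ping(peer):
            metadata[address] = {"files": num_files, "kb": total_kb}
            if address not in queued:
                queued.add(address)
                queue.append(address)   # newly discovered peer
    return metadata


# Toy pong table (hypothetical) used in place of a live network.
toy_pongs = {
    "p1": [("p2", 10, 4000), ("p3", 0, 0)],
    "p2": [("p1", 5, 900), ("p4", 120, 50000)],
    "p3": [],
    "p4": [("p2", 10, 4000)],
}
found = crawl(["p1"], lambda peer: toy_pongs.get(peer, []))
```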

Metadata collected: bottleneck bandwidth, latency, number of shared files, lifetime/uptime, distribution across DNS domains.

Approximately 60% of peers have a session duration shorter than 1 hour. Napster peers are up a larger percentage of the time than Gnutella peers. Gnutella: 20% of hosts are available >45% of the time. Napster: 20% of hosts are available >80% of the time.

Approximately 30% of Napster peers misreport their bandwidth (Modems + ISDN < 64Kbps) Authors argue that misreporting is one indication that many users in a p2p system are not willing to cooperate.

Gnutella: approximately 25% of peers share no files; 75% of peers share fewer than 100 files; 7% of peers share more than 1000 files each (together more than the other 93% combined). Napster: 40%-60% of users share only 5-20% of files.

(a) Gnutella network of 1771 peers. (b) 30% of peers randomly removed: 1106 of the remaining 1300 nodes are still connected. (c) 4% of peers selectively removed (the 63 best-connected peers): the network becomes highly fragmented.
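The contrast between random and targeted removal can be reproduced on a toy graph by measuring the largest connected component that survives. The helper names and the two-hub topology below are hypothetical and far smaller than the measured network; the sketch only illustrates the method, not the reported numbers.

```python
import random


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    alive = set(adj) - set(removed)
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt in alive and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def top_degree(adj, count):
    """The `count` best-connected nodes."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:count]


# Toy overlay (hypothetical): two linked hubs, each with five leaves.
adj = {"h1": ["h2"] + [f"a{i}" for i in range(5)],
       "h2": ["h1"] + [f"b{i}" for i in range(5)]}
for i in range(5):
    adj[f"a{i}"] = ["h1"]
    adj[f"b{i}"] = ["h2"]

random_victims = random.Random(0).sample(sorted(adj), 2)
after_random = largest_component(adj, random_victims)
after_targeted = largest_component(adj, top_degree(adj, 2))
```

Removing the two hubs leaves only isolated leaves, whereas removing two leaves keeps the remaining ten nodes connected.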

"Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems" Daniel Stutzbach, Reza Rejaie, and Subhabrata Sen, "Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems," Networking, 2008.

Gnutella Characteristics in Depth. Previous studies, as claimed in the paper: –The captured snapshots lack accuracy. –The crawlers used were not fast enough, resulting in distorted snapshots and a partial view of the topology. –Simulations rest on invalid assumptions (e.g. a power-law degree distribution). –“Finally, to our knowledge, the dynamics of unstructured P2P overlay topologies have not been studied in detail in any prior work.”

Cruiser: Fast and Accurate Crawler. “Cruiser can accurately capture a complete snapshot of the Gnutella network with more than one million peers in just a few minutes”. Faster than any crawler previously built, as claimed. This made it possible to capture the overlay dynamics, which led to a more accurate characterization. Data set: 18,000 snapshots captured over an 11-month period. Weekly intervals and daily random captures.

Some of the Findings. As mentioned, the study refutes the power-law distribution of node degree. Onion-like architecture. LimeWire vs. BearShare. Reachability: 30-38% of peers are unreachable. Previous studies ignored this factor, although these peers form a non-negligible fraction of the network.

Two-Tier Topology [figure labels: top-level overlay, ultrapeer, leaf]

Fast Crawler vs. Slow Crawler. Why is the power-law distribution a measurement artifact? Slow crawler: Cruiser with fewer concurrent connections. Forms a two-piece power-law distribution.

Top-Level Overlay Analysis. The actual distribution is a two-piece power-law distribution. It peaks at degree 30 because of the preconfigured settings of LimeWire and BearShare. Peers below the peak are new peers still approaching degree 30. Peers above it have been reconfigured, or run other implementations, to allow a higher degree.

Leaves Overlay Analysis. LimeWire: 30 leaves. BearShare: 45 leaves. The majority have 3 or fewer parents. < 0.02% parent. Other: 75 leaves.

Reachability. Flood-based query: new peers are discovered exponentially up to a certain point. Pair-wise distance: 60% of pairs have a shortest-path length of 4. Effect of the two-tier topology on leaves: –One parent: the distribution is similar to that of the ultrapeers, shifted by 2. –More than one parent: 50% -> length of 5 (+1), 50% -> length of 6 (+2).

Overlay Dynamics As the number of peers increases, the uptime (in hours) decreases exponentially.

Overlay Dynamics-2. What are the causes? –Protocol-driven: peers select their neighbors according to the protocol. –User-driven: peer participation. Definition of a stable peer: –A peer is stable if its connection duration reaches time t; the study uses t = 48 hours.

Internal Connectivity of Stable Core. Stable core: peers with t >= 48 hours [excluding the connections between unstable peers]. 88-94% remained connected. Observations: –Stable-core peers are clustered together. –Peers with higher uptime are more biased to establish connections with each other.

External Connectivity of Stable Core. Peers that are not in the stable core follow the same behavior (i.e. they are biased to connect with peers that have equal or higher uptime). This behavior leads to onion-like connections. –The core of the onion is the stable core. SC(t) < P1(t) … Pn-1(t) < Pn(t)

Overlay Dynamics.. The Main Cause User driven dynamics are the major factor of the overlay dynamics.

"Making Gnutella-like P2P Systems Scalable" Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker, "Making Gnutella-like P2P Systems Scalable," ACM SIGCOMM 2003. Dongyu Qiu and R. Srikant, "Modeling and performance analysis of BitTorrent-like peer-to-peer networks", Proceedings of ACM SIGCOMM, 2004.

Review of Gnutella Uses an unstructured overlay network Distributed download AND search Floods query across this overlay with a limited scope Notorious for poor scaling

Scaling Gnutella. Previously proposed solution: apply distributed hash tables to wide-area file search. The paper advocates keeping the simplicity of an unstructured system while adding new mechanisms: –It proposes a solution using aspects of a system similar to KaZaA and its supernode model. –Supernodes have higher-bandwidth connectivity. –Searches are routed to supernodes, which hold pointers to peer data.

Distributed Hash Tables (structured overlay) Pros: –Looking up using DHT requires O(log n) steps vs. Gnutella's O(n) steps Cons: Doesn’t deal well with… –Transience of nodes in P2P network: DHTs require repair operations to preserve efficiency and correctness of routing. –Situations where keyword searches are more prevalent, and important, than exact-match queries –Situations where most queries are for relatively well-replicated files not single copies of files

Gia’s Improvements. All messages exchanged by clients are tagged at their origin with a globally unique identifier. Explicitly accounts for node heterogeneity and capacity constraints. Replaces flooding with “biased” random walks. It is not any single one of the following components but the combination of them all that provides Gia's large performance advantage…

Dynamic topology adaptation: –Puts most nodes within short reach of high-capacity nodes and makes sure that the well-connected, high-degree nodes can actually handle the large number of queries, by calculating a “satisfaction level” (discussed later). Active flow control: –Avoids overloading by assigning flow-control tokens based on "available capacity". –Dropping queries is not an option in Gia, where random walks are used instead of floods. –A node that advertises high capacity to handle incoming queries is in turn assigned more tokens for its own outgoing queries.

One-hop replication of pointers to content offered by immediate neighbors: –This way a node can also answer queries on behalf of its neighbors (resulting in fewer hops). –When a node goes offline, its information is flushed from its neighbors to maintain consistency and accuracy. Search protocol: –Based on biased random walks that direct queries towards high-capacity nodes (instead of purely random ones). –TTLs (and MAX_RESPONSES) bound the duration of the biased random walks, and book-keeping avoids redundant paths.
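The interplay of one-hop replication and the biased walk can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the dictionary-based overlay, and the rule "prefer the highest-capacity unvisited neighbor" are choices made here, and flow-control tokens are not modeled at all.

```python
import random


def biased_walk(neighbors, capacity, files, origin, wanted,
                ttl, max_responses, rng=None):
    """Walk towards high-capacity nodes, answering from one-hop pointers."""
    rng = rng or random.Random()
    visited = {origin}           # book-keeping to avoid redundant paths
    hits = set()
    current = origin
    hops_left = ttl
    while True:
        # One-hop replication: the current node also holds pointers to
        # its neighbors' content, so it can answer on their behalf.
        for peer in [current, *neighbors[current]]:
            if peer != origin and wanted in files.get(peer, ()):
                hits.add(peer)
        # Stop once enough responses are in (possibly a few more than
        # max_responses, since one node can report several at once).
        if len(hits) >= max_responses or hops_left == 0:
            break
        unvisited = [n for n in neighbors[current] if n not in visited]
        candidates = unvisited or list(neighbors[current])
        if not candidates:
            break
        # Bias: forward to the highest-capacity candidate, ties at random.
        top = max(capacity[n] for n in candidates)
        current = rng.choice([n for n in candidates if capacity[n] == top])
        visited.add(current)
        hops_left -= 1
    return hits


# Toy overlay (hypothetical): a - b - d - e, plus a - c.
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
             "d": ["b", "e"], "e": ["d"]}
capacity = {"a": 1, "b": 10, "c": 1, "d": 100, "e": 1}
files = {"e": {"x"}}
found = biased_walk(neighbors, capacity, files, "a", "x",
                    ttl=3, max_responses=1)
```

In this toy run the walk moves a -> b -> d, and node `d` answers for its neighbor `e` without the query having to reach `e` itself.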

“Satisfaction Level” A measure of how close the sum of the capacities of all of a node's neighbors (normalized by their degrees) is to the node's own capacity. To add a new neighbor, a node randomly selects a small number of candidate entries. From these randomly chosen entries, it selects the node with maximum capacity greater than its own capacity. If no such candidate entry exists, it selects one at random. (see Algorithm 1)
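The description above can be turned into a rough sketch. Everything here is an assumption layered on the slide text rather than the paper's Algorithms 1 and 2: the function names, the clamp of the result to 1.0, the `sample_size` default, and the toy overlay are all illustrative.

```python
import random


def satisfaction_level(node, neighbors, capacity):
    """Sum of neighbor capacities, each divided by that neighbor's degree,
    relative to the node's own capacity. Clamped to 1.0 (an assumption)."""
    total = sum(capacity[n] / len(neighbors[n]) for n in neighbors[node])
    return min(1.0, total / capacity[node])


def pick_neighbor(node, candidates, capacity, sample_size=3, rng=None):
    """Choose a new neighbor from a non-empty list of candidate entries."""
    rng = rng or random.Random()
    # Randomly select a small number of candidate entries.
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    # Prefer the highest-capacity candidate that beats our own capacity.
    better = [c for c in sample if capacity[c] > capacity[node]]
    if better:
        return max(better, key=lambda c: capacity[c])
    # Otherwise fall back to a random candidate.
    return rng.choice(sample)


# Toy overlay (hypothetical): y - x - z - w.
neighbors = {"x": ["y", "z"], "y": ["x"], "z": ["x", "w"], "w": ["z"]}
capacity = {"x": 100, "y": 10, "z": 40, "w": 1}
s_x = satisfaction_level("x", neighbors, capacity)   # (10/1 + 40/2) / 100
choice = pick_neighbor("y", ["x", "z", "w"], capacity)
```

Node `x` is far from satisfied because its neighbors offer little capacity relative to its own, so it would keep looking for better-connected neighbors.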

Nodes with low satisfaction levels perform topology adaptation more frequently than satisfied nodes. I = T x K^(-(1-S)), where S is the satisfaction level (see Algorithm 2), I is the adaptation interval, T is the maximum interval between adaptation iterations, and K is the aggressiveness of the adaptation. After each interval I, if a node's S < 1.0, it attempts to add a new neighbor. If S = 1.0, it still continues to iterate through the adaptation process, checking its satisfaction level every T seconds.
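A small worked example of the interval formula may help. It assumes the exponent is negative, I = T x K^(-(1-S)), since that is the form under which T is the maximum interval and less satisfied nodes adapt more often. The values T = 10 seconds and K = 256 are illustrative choices, not figures taken from the slides.

```python
def adaptation_interval(satisfaction, max_interval, aggressiveness):
    """I = T * K ** -(1 - S): shorter intervals for less satisfied nodes."""
    return max_interval * aggressiveness ** -(1.0 - satisfaction)


T = 10.0    # illustrative maximum interval, in seconds
K = 256.0   # illustrative aggressiveness

satisfied = adaptation_interval(1.0, T, K)      # S = 1   -> I = T
halfway = adaptation_interval(0.5, T, K)        # S = 0.5 -> I = T / sqrt(K)
unsatisfied = adaptation_interval(0.0, T, K)    # S = 0   -> I = T / K
```

A fully satisfied node waits the full T between checks, while a completely unsatisfied node retries K times as often.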

Algorithms (regarding Satisfaction)

Gia, Flood, Random Walk over Random Topology, and Supernode models. When the query load increases, we notice a sharp "knee" in the curves beyond which the success rate drops sharply and delays increase rapidly. The hop-count holds steady until the knee-point and then decreases.

Gia, Flood, Random Walk over Random Topology, and Supernode models. Compare by observing the Collapse Point (CP) and the Hop-count before Collapse Point (CP-HC).

Single Search Responses. The aggregate system capacity of Gia is 3 to 5 orders of magnitude higher than that of Flood and Random Walk over Random Topology (RWRT). RWRT typically performs better than Flood but can be about the same when there are fewer nodes, since RWRT may end up visiting practically all the nodes anyway. Flood and Supernode models retain low hop counts, whereas Gia may sometimes need to traverse many hops. RWRT, being random, has hop counts that are inversely proportional to the replication factor. Flood and Supernode models have a collapse point that falls as the number of nodes increases, due to the greater query load at each node.

Multiple Search Responses. The MAX_RESPONSES parameter only affects Gia and RWRT, since Flood and Supernode flood throughout the network. Naturally, a higher MAX_RESPONSES leads to higher hop counts in the Gia and RWRT models. The collapse point also drops when the replication factor is low. Note: A search for k responses at r% replication is equivalent to one for a single answer at r/k% replication.

Node failure. When a node leaves the network, its queued queries are resumed by the nodes that originally generated them. How do we make sure a query isn't lost? –Keep-alive messages (query responses sent back to the originator of the query act as implicit keep-alives). If no actual matches have been found yet, we’ll still just send a dummy query response message.

Next bottleneck: Download process? The authors believe that the technique of directing queries towards higher-capacity nodes helps alleviate this issue. To make this benefit more significant, it would be better to have the higher-capacity nodes store more files (rather than simply pointers). A simple solution would be to have popular files at low-capacity nodes replicated to the higher-capacity nodes.