Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems

Daniel Stutzbach – University of Oregon
Reza Rejaie – University of Oregon
Subhabrata Sen – AT&T Labs

Internet Measurement Conference, Berkeley, CA, USA, October 19th, 2005
Slide 2: Motivation

- P2P file-sharing systems are very popular in practice:
  - Several million simultaneous users collectively
  - 60% of all Internet traffic [CacheLogic Research 2005]
  - Most use an unstructured overlay
- Understanding overlay properties is important for:
  - Understanding how existing P2P systems function
  - Developing and evaluating new systems
- Unstructured overlays are not well understood.
- We studied overlay properties in Gnutella:
  - Size: one of the largest P2P systems, with more than 1 million users
  - Mature: in use for several years, so older studies are available for comparison
  - Open: no reverse engineering needed
Slide 3: Defining the Problem

- Gnutella uses a two-tier overlay, which improves scalability:
  - Ultrapeers form an unstructured mesh (the top-level overlay).
  - Leaf peers connect to ultrapeers.
  - eDonkey and FastTrack are similar.
- Studying the overlay requires snapshots:
  - Snapshots capture the overlay as a graph.
  - Individual snapshots reveal graph properties.
  - Consecutive snapshots reveal dynamics.
- However, capturing accurate snapshots is difficult.

[Figure: the two-tier overlay, with leaf peers attached to ultrapeers in the top-level overlay]
Slide 4: Challenges in Capturing Accurate Snapshots

- Snapshots are captured iteratively by a crawler.
- An ideal snapshot is instantaneous, but the overlay is large and rapidly changing.
- Therefore, captured snapshots are distorted.
- Sampling: partial snapshots are less distorted, but may be unrepresentative.
  - For some types of analysis, the whole graph is needed.
- Previous studies capture either complete snapshots slowly, or partial snapshots.
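To make the iterative capture concrete, here is a minimal sketch of such a crawler in Python. It is an illustration under stated assumptions, not the authors' implementation: the hypothetical get_neighbors() stands in for whatever protocol exchange retrieves a peer's neighbor list, and the single-threaded queue is exactly what makes slow crawls slow. The longer this loop takes to drain, the more churn distorts the snapshot, which motivates the parallel crawler on the next slide.

from collections import deque

def get_neighbors(peer):
    """Hypothetical stand-in: contact `peer` and ask for its current
    neighbor list (in Gnutella, via a crawler handshake). Returns a
    list of peer addresses, or raises if the peer is unreachable."""
    raise NotImplementedError

def crawl(seed_peers):
    """Iteratively discover the overlay: contact each known peer,
    record its edges, and enqueue newly discovered peers."""
    frontier = deque(seed_peers)
    visited = set(seed_peers)
    edges = set()
    while frontier:
        peer = frontier.popleft()
        try:
            neighbors = get_neighbors(peer)
        except Exception:
            continue  # peer departed or unreachable; the snapshot loses it
        for n in neighbors:
            edges.add(frozenset((peer, n)))  # undirected edge
            if n not in visited:
                visited.add(n)
                frontier.append(n)
    return visited, edges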
Slide 5: Cruiser, a Fast Gnutella Crawler

- Features:
  - Distributed, highly parallelized implementation
  - Dynamic adaptation to bandwidth and CPU constraints
- Cruiser is orders of magnitude faster than previous crawlers:
  - Captures one million nodes in around 7 minutes
  - 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02]
- We investigated the effects of crawl speed on distortion: 4% node distortion and 15% edge distortion.
- Daniel Stutzbach and Reza Rejaie, "Capturing Accurate Snapshots of the Gnutella Network," Global Internet Symposium, March 2005.
Slide 6: Data Set

- More than 80,000 snapshots captured over the past year.
- To examine static properties, we focus on four snapshots:

  Date     | Total Nodes | Leaves  | Ultrapeers | Top-level Edges
  9/27/04  | 725,120     | 614,912 | 110,208    | 1,212,772
  10/11/04 | 779,535     | 662,568 | 116,967    | 1,244,219
  10/18/04 | 806,948     | 686,719 | 120,229    | 1,331,745
  2/2/05   | 1,031,471   | 873,130 | 158,345    | 1,964,121

- To examine dynamic properties, we use slices:
  - Each slice is 2 days of ~500 back-to-back snapshots.
  - Captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04.
Slide 7: Summary of Characterizations

Graph properties:
- Implementation heterogeneity
- Degree distribution: top-level degree distribution, ultrapeer-leaf connectivity, degree-distance correlation
- Reachability: path lengths, eccentricity, small-world properties, resiliency

Dynamic properties:
- Existence of a stable core: uptime distribution, biased connectivity
- Properties of the stable core: largest connected component, path lengths, clustering coefficient
Slide 8: Top-level Degree

- This is the degree distribution among ultrapeers.
- There are obvious peaks at 30 and 70 neighbors.
- A substantial number of ultrapeers have fewer than 30.
- What happened to the power law seen in prior studies?

[Figure: top-level degree distribution, with regions labeled "Max 30 in most clients," "Max 75 in some clients," and "Custom"]
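As an aside on method, a degree histogram like the one on this slide is straightforward to compute once a snapshot exists as an edge list. A minimal sketch (the (u, v) pair representation is an assumption about how a snapshot is stored, not the paper's format):

from collections import Counter

def degree_distribution(edges):
    """Tally node degrees from undirected (u, v) edge pairs,
    then count how many nodes have each degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())  # degree -> number of nodes

# Tiny example; on a real snapshot the peaks at 30 and 70 would
# appear as large counts at those degrees.
for d, n in sorted(degree_distribution([("a", "b"), ("a", "c"), ("b", "c")]).items()):
    print(d, n)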
Slide 9: What Happened to the Power Law?

- When a crawl is slow, many short-lived peers report long-lived peers as neighbors.
- However, those neighbors are not all present at the same time, so long-lived peers appear to have inflated degrees.
- As a result, the degree distribution from a slow crawl resembles prior results [Ripeanu 02 ICJ].
Slide 10: Shortest-Path Distances

- Distribution of distances among ultrapeers and among all peers.
- In the top-level overlay, 70% of distances are exactly 4 hops.
- Across all peers, most distances are 5 or 6 hops.
- This shows the effect of the two-tier structure, where each leaf connects to multiple ultrapeer parents.
- Despite the overlay's large size, distances are short.
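A distance distribution like this can be measured with repeated breadth-first search over the snapshot graph. The sketch below assumes adjacency sets; on a million-node snapshot one would BFS from a random sample of sources rather than from every node (sampling is my note, not necessarily the paper's procedure):

from collections import deque, Counter

def bfs_hops(adj, src):
    """Hop count from src to every reachable node; adj maps each
    node to its set of neighbors."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_distribution(adj):
    """Histogram of pairwise shortest-path lengths."""
    hist = Counter()
    for src in adj:
        for v, d in bfs_hops(adj, src).items():
            if v != src:
                hist[d] += 1
    return hist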
Slide 11: Is Gnutella a Small World?

- Small worlds arise naturally in many places: movie actors, the power grid, co-authors of papers.
- They have short distances but significant clustering, compared to a similar random graph.

           | Mean Distance | Clustering Coefficient
  Gnutella | 4.2           | 0.018
  Random   | 3.8           | 0.00038

- Conclusion: Gnutella is a small world.
- Very high clustering would adversely affect flooding queries, but Gnutella isn't clustered enough to affect performance.
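For reference, here is a sketch of one common way to compute a clustering coefficient: the mean of each node's local clustering (the Watts-Strogatz definition). Whether the paper uses this variant or the global triangle-density variant is an assumption on my part.

def clustering_coefficient(adj):
    """Mean local clustering over nodes with degree >= 2: for each
    node, the fraction of its neighbor pairs that are themselves
    connected. adj maps each node to its set of neighbors."""
    total, counted = 0.0, 0
    for u, neighbors in adj.items():
        nbrs = list(neighbors)
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
        counted += 1
    return total / counted if counted else 0.0

The small-world test then compares this value, along with the mean distance, against a random graph of similar size: per the table above, Gnutella's clustering is nearly 50 times the random baseline while its mean distance is nearly the same.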
Slide 12: Resiliency to Node Failure

- After removing nodes, the figure shows how many of the remaining nodes stay connected.
- The Gnutella topology is extremely resilient to random node failure.
- It's resilient even when the highest-degree nodes are removed first.
- Complex algorithms are not necessary for ensuring resilience.

[Figure: fraction of remaining nodes still connected vs. fraction removed, with curves for random removal and highest-degree-first removal]
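The removal experiment is easy to replicate on a snapshot. A sketch under the usual formulation, measuring the largest connected component among survivors (the exact connectivity metric behind the figure is an assumption):

import random
from collections import deque

def largest_component_fraction(adj, removed):
    """Fraction of surviving nodes in the largest connected
    component after deleting `removed` from the graph."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(alive) if alive else 0.0

def failed_nodes(adj, fraction, strategy="random"):
    """Pick nodes to remove: uniformly at random, or highest degree
    first (the adversarial curve in the figure)."""
    n = int(fraction * len(adj))
    if strategy == "random":
        return set(random.sample(list(adj), n))
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:n])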
Slide 13: What about Dynamic Properties?

- Prior work suggests many peers are short-lived while others are very long-lived. How do these nodes interact?
- Methodology:
  - Capture a long series of back-to-back snapshots.
  - Annotate the last snapshot with the uptime of each peer.
  - Examine the properties of the annotated topology.
  - Group peers by uptime.

[Figure: timeline of snapshots showing newly arrived and departed peers, e.g., one peer present for 2 snapshots and another present for 5]
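A sketch of the annotation step, with one simplifying assumption of mine: snapshots are evenly spaced, and a peer's observed uptime is the length of its most recent unbroken run of appearances (a lower bound on true uptime, since a peer may predate the first snapshot).

def annotate_uptime(snapshots, minutes_per_snapshot=5):
    """Given back-to-back snapshots (each a set of peer IDs, oldest
    first), estimate each final peer's observed uptime."""
    uptime = {}
    for peer in snapshots[-1]:
        run = 0
        for snap in reversed(snapshots):
            if peer not in snap:
                break
            run += 1
        uptime[peer] = run * minutes_per_snapshot
    return uptime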
Slide 14: Stable Core

- Most peers are recent arrivals; other peers have been around for a long time.
- We can select a set of peers based on a minimum uptime threshold. We call this the stable core.
- Does the longevity of a peer affect who its neighbors are?

[Figure: uptime distribution, with thresholds marked at > 10 h and > 20 h]
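Selecting the stable core is then a one-line filter over the annotated snapshot; a sketch using per-peer uptimes in minutes, as produced by the fragment above (the 10 h and 20 h thresholds match the marks on the figure):

def stable_core(uptime_minutes, threshold_hours):
    """Peers whose observed uptime meets the minimum threshold."""
    return {p for p, m in uptime_minutes.items()
            if m >= threshold_hours * 60}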
Slide 15: Biased Connectivity

- Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes.
- Rationale: once connected, they stay connected, and the longer they're around, the more opportunities they have to become neighbors.
- Approach: check for biased connectivity.
  - Randomize the edges to create a graph without biased connectivity, then compare.
  - Are there more edges within the observed stable core than within the randomized graph's stable core?
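One standard way to build the bias-free comparison graph is a degree-preserving double-edge swap: repeatedly pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), which scrambles who connects to whom while leaving every node's degree unchanged. Whether the paper used exactly this randomization is an assumption; the sketch below illustrates the idea.

import random

def double_edge_swap(edges, swaps):
    """Degree-preserving randomization, rejecting swaps that would
    create self-loops or duplicate edges."""
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i = random.randrange(len(edge_list))
        j = random.randrange(len(edge_list))
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would create a self-loop
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_set

def core_edge_count(edges, core):
    """Edges with both endpoints inside the stable core."""
    return sum(1 for e in edges if set(e) <= core)

Comparing core_edge_count on the observed and randomized edge sets is the test: an excess in the observed graph is exactly the biased connectivity the next slide reports.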
Slide 16: Stable Core Edges

- There are 20% to 40% more edges within the stable core than in the randomized graph.
- There is an onion-like bias: long-lived peers are more likely to be connected to other long-lived peers.
- We examined other properties of the stable core (largest connected component, path lengths, clustering coefficient).
- Despite high churn, there is a relatively stable "backbone."
Slide 17: Summary

- Characterizations of recent and accurate snapshots.
- Graph properties:
  - The degree distribution in Gnutella is not power law.
  - Gnutella exhibits small-world characteristics.
  - Gnutella is resilient.
- Dynamic properties:
  - There is a stable core within the topology.
  - Peer churn causes the stable core to have an onion-like shape.
  - This effect is likely to occur in any unstructured system.
Slide 18: Future Work

- Examining long-term trends in Gnutella using many snapshots
- Characterizing churn
- Characterizing properties of other widely-deployed P2P systems:
  - Kad (a DHT with more than 1 million users)
  - BitTorrent
- Developing sampling techniques for P2P
Slide 19: Ultrapeer-to-Leaf Degree

- LimeWire ultrapeers have a limit of 30 leaf peers.
- BearShare ultrapeers have a limit of 45 leaf peers.
- There are distinct spikes at those limits, with a roughly even distribution of ultrapeers having fewer leaf peers.

[Figure: ultrapeer-to-leaf degree distribution, broken down by LimeWire, BearShare, other, and custom clients]