Evaluating the Accuracy of Captured Snapshots by Peer-to-Peer Crawlers
Daniel Stutzbach and Reza Rejaie – University of Oregon
http://mirage.cs.uoregon.edu/P2P

1. Motivation
Peer-to-Peer (P2P) applications have millions of users and make up a significant and growing fraction of Internet traffic. Little is known about the properties and dynamics of unstructured overlays in deployed P2P applications. Characterizing P2P overlays requires capturing accurate and fine-grained snapshots of the overlays.
Snapshots (as graphs) are captured with a crawler, which records peers (as nodes) and connections (as edges). Snapshots captured by a crawler can be distorted (or stretched) for two reasons:
1) Dynamic changes of the overlay during a crawl
2) Peers unreachable by the crawler
Previous studies used slow crawlers and did not examine the accuracy of their snapshots:
1) 8K peers in 1 hr, speed: 133 peers/min [Clip2 00]
2) 30K peers in 2+ hrs, speed: 250 peers/min [Ripeanu 02]
However, average peer uptime is just minutes!

2. Approach
We developed a fast and tunable crawler, Cruiser. Cruiser uses a master-slave architecture and parallel crawling, and it leverages the two-tier topology adopted in popular P2P applications. Cruiser captures a Gnutella snapshot with 1 million nodes in around 7 minutes (140,000 peers/min). Cruiser enables us to examine the effect of various crawling parameters on snapshot accuracy.
There are two dimensions of snapshot accuracy:
1) Completeness: the fraction of the topology captured
2) Distortion: the percentage difference between the snapshot and the real topology

3. Two-Tier Topologies
Gnutella, FastTrack and eDonkey use a two-tier overlay topology. Top-level nodes form the core overlay. Leaves connect to a few top-level nodes. We initially focus on Gnutella.

4. Results
Completeness
The incremental value of contacting more peers indicates that snapshots are reasonably complete (Fig. 1). Top-level peers are discovered quickly. Leaf nodes and top-level links are well discovered by the end of the crawl.
Completeness-Granularity Tradeoff
There is a fundamental tradeoff between completeness and granularity. Longer crawls are more complete but reduce granularity for studying dynamics. There is a sweet spot (Fig. 2).
Distortion and Speed
Distortion and granularity are determined primarily by crawl speed (Fig. 3). Decreasing speed significantly increases distortion. Cruiser captures complete and accurate snapshots (Fig. 1, 2 & 3).
Effects on Derived Characterization
Distorted snapshots lead to inaccurate characterization of the overlay topology (Fig. 4). A slow crawler reports a power-law tail for the node degree distribution.

5. Future Work
Characterizing graph-related properties of individual snapshots of the overlay topology:
1) Degree distribution
2) Resiliency of the overlay to node departure
3) Distribution of pair-wise path lengths
4) Small-world properties
Characterizing dynamics of overlay topologies:
1) Peer departure/arrival (or churn)
2) Changes in connectivity among peers
3) Properties of long-lived versus short-lived peers
Building an overlay topology generator for simulation
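The two accuracy dimensions and the crawl speeds quoted on this poster can be made concrete with a small sketch. It is illustrative only and is not Cruiser's code. It assumes a snapshot is stored as a set of undirected edges, that a reference topology is available for comparison (in practice the real topology is unknown and must be approximated), and that distortion is measured as the size of the symmetric difference relative to the union of the two edge sets. The poster does not give formulas, so that normalization and the names edge_set, completeness, distortion and crawl_rate are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# An undirected edge is an unordered pair of peer identifiers.
Edge = FrozenSet[str]


def edge_set(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Build a set of undirected edges, ignoring self-loops."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def completeness(snapshot: Set[Edge], reference: Set[Edge]) -> float:
    """Fraction of the reference edges that the snapshot captured."""
    if not reference:
        raise ValueError("reference topology has no edges")
    return len(snapshot & reference) / len(reference)


def distortion(snapshot: Set[Edge], reference: Set[Edge]) -> float:
    """Percentage of edges on which snapshot and reference disagree.

    Assumed definition: |symmetric difference| / |union| * 100.
    """
    union = snapshot | reference
    if not union:
        return 0.0
    return 100.0 * len(snapshot ^ reference) / len(union)


def crawl_rate(peers: int, minutes: float) -> float:
    """Average crawl speed in peers per minute."""
    if minutes <= 0:
        raise ValueError("crawl duration must be positive")
    return peers / minutes


if __name__ == "__main__":
    reference = edge_set([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])
    snapshot = edge_set([("b", "a"), ("b", "c"), ("c", "e")])
    print(completeness(snapshot, reference))  # 2 of 4 reference edges found
    print(distortion(snapshot, reference))    # 3 of 5 distinct edges differ
    print(round(crawl_rate(1_000_000, 7)))    # Cruiser's quoted crawl
```

With the toy graphs above, the snapshot recovers half of the reference edges while disagreeing on three of the five distinct edges. This shows that the two measures are separate: a crawl can add spurious edges without missing real ones, and vice versa. The crawl_rate helper reproduces the poster's speed figures from the quoted peer counts and durations.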