We developed a fast and tunable crawler, Cruiser. Cruiser uses a master-slave architecture, parallel crawling, and leverages the two-tier topology.


Evaluating the Accuracy of Captured Snapshots by Peer-to-Peer Crawlers
Daniel Stutzbach and Reza Rejaie, University of Oregon
http://mirage.cs.uoregon.edu/P2P

1. Motivation
• Peer-to-Peer (P2P) applications have millions of users and make up a significant and growing fraction of Internet traffic.
• Little is known about the properties and dynamics of unstructured overlays in deployed P2P applications.
• Characterization of P2P overlays requires capturing accurate and fine-grain snapshots of the overlays.
• Snapshots (as graphs) are captured with a crawler, recording peers (as nodes) & connections (as edges).
• Snapshots captured by a crawler can be distorted (or stretched) for two reasons:
  • Dynamic changes of the overlay during a crawl
  • Peers unreachable by the crawler
• Previous studies used slow crawlers and have not examined the accuracy of their snapshots:
  • 8K peers, speed: 133 peers/min, in 1 hr [Clip2 00]
  • 30K peers, speed: 250 peers/min, in 2+ hrs [Ripeanu 02]
• However, average peer uptime is just minutes!

2. Approach
• We developed a fast and tunable crawler, Cruiser.
• Cruiser uses a master-slave architecture, parallel crawling, and leverages the two-tier topology adopted in popular P2P applications.
• Cruiser captures a Gnutella snapshot with 1 million nodes in around 7 minutes (140,000 peers/min).
• Cruiser enables us to examine the effect of various crawling parameters on snapshot accuracy.
• There are two dimensions of snapshot accuracy:
  1) Completeness: the fraction of the topology captured
  2) Distortion: the percentage difference between the snapshot and the real topology

3. Two-Tier Topologies
• Gnutella, FastTrack and eDonkey use a two-tier overlay topology.
• Top-level nodes form the core overlay.
• Leaves connect to a few top-level nodes.
• We initially focus on Gnutella.

4. Results
• The incremental value of contacting more peers indicates snapshots are reasonably complete (Fig. 1).
• Top-level peers are discovered quickly.
• Leaf nodes and top-level links are well-discovered by the end of the crawl.
• There is a fundamental tradeoff between completeness and granularity. Longer crawls are more complete but reduce granularity for studying dynamics. There is a sweet spot (Fig. 2).
• Distortion and granularity are determined primarily by crawl speed (Fig. 3).
• Decreasing speed significantly increases distortion.
• Cruiser captures complete & accurate snapshots (Fig. 1, 2 & 3).
• Distorted snapshots lead to inaccurate characterization of overlay topology (Fig. 4).
• A slow crawler reports a power-law tail for node degree distribution.

Figures: Fig. 1 Completeness; Fig. 2 Completeness-Granularity Tradeoff; Fig. 3 Distortion and Speed; Fig. 4 Effects on Derived Characterization.

5. Future Work
• Characterizing graph-related properties of individual snapshots of overlay topology:
  1) Degree distribution
  2) Resiliency of overlay to node departure
  3) Distribution of pair-wise path lengths
  4) Small-world properties
• Characterizing dynamics of overlay topologies:
  1) Peer departure/arrival (or churn)
  2) Changes in connectivity among peers
  3) Properties of long-lived versus short-lived peers
• Building an overlay topology generator for simulation
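
The two accuracy dimensions named in the poster, completeness and distortion, can be made concrete with a small sketch. This is only one plausible reading and not necessarily how the authors compute them: it assumes a topology is a set of undirected edges, that completeness is the share of real edges present in the snapshot, and that distortion is the number of edges appearing in exactly one of the two sets, as a percentage of the real edge count. The helper names (`edge`, `completeness`, `distortion`) and the toy peer labels are hypothetical, and in a live network the real topology is not directly observable, so the ground truth here is invented for illustration.

```python
from typing import FrozenSet, Set

# An undirected overlay connection, modelled as an unordered pair of peer IDs.
Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Build an undirected edge between two peers (hypothetical helper)."""
    return frozenset((a, b))


def completeness(snapshot: Set[Edge], real: Set[Edge]) -> float:
    """Fraction of the real topology's edges that the snapshot captured."""
    if not real:
        raise ValueError("real topology has no edges")
    return len(snapshot & real) / len(real)


def distortion(snapshot: Set[Edge], real: Set[Edge]) -> float:
    """Assumed definition: edges found in exactly one of the two sets
    (missed or spurious), as a percentage of the real edge count."""
    if not real:
        raise ValueError("real topology has no edges")
    return 100.0 * len(snapshot ^ real) / len(real)


# Toy ground truth: a four-peer ring.
real = {edge("A", "B"), edge("B", "C"), edge("C", "D"), edge("D", "A")}

# Toy snapshot: misses D-A and records a stale link C-E to a departed peer.
snapshot = {edge("A", "B"), edge("B", "C"), edge("C", "D"), edge("C", "E")}

print(completeness(snapshot, real))  # 0.75
print(distortion(snapshot, real))    # 50.0
```

In this toy case the snapshot captures three of the four real edges (completeness 0.75), while one missed edge and one stale edge give a distortion of 50 percent under the assumed definition.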

