On Unbiased Sampling for Unstructured Peer-to-Peer Networks Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Nick Duffield –

Slides:



Advertisements
Similar presentations
Characterizing Overlay Topologies & Dynamics in Peer-to-Peer Networks Daniel Stutzbach, Reza Rejaie University of Oregon Subhabrata Sen AT&T Labs IEEE.
Advertisements

Peer-to-Peer and Social Networks An overview of Gnutella.
Respondent-driven Sampling for Characterizing Unstructured Overlays A. H. Rasti University of Oregon M. Torkjazi R. Rejaie N. Duffield AT&T Labs - Research.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Highly-Resilient, Energy-Efficient Multipath Routing in Wireless Sensor Networks Computer Science Department, UCLA International Computer Science Institute,
Walter Willinger AT&T Research Labs Reza Rejaie, Mojtaba Torkjazi, Masoud Valafar University of Oregon Mauro Maggioni Duke University HotMetrics’09, Seattle.
Amir Rasti Reza Rejaie Dept. of Computer Science University of Oregon.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Farnoush Banaei-Kashani and Cyrus Shahabi Criticality-based Analysis and Design of Unstructured P2P Networks as “ Complex Systems ” Mohammad Al-Rifai.
1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.
QBM117 Business Statistics Statistical Inference Sampling 1.
Small-world Overlay P2P Network
1 Statistical Inference H Plan: –Discuss statistical methods in simulations –Define concepts and terminology –Traditional approaches: u Hypothesis testing.
Evaluating Search Engine
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
An Analysis of the Optimum Node Density for Ad hoc Mobile Networks Elizabeth M. Royer, P. Michael Melliar-Smith and Louise E. Moser Presented by Aki Happonen.
UNIVERSITY OF JYVÄSKYLÄ Yevgeniy Ivanchenko Yevgeniy Ivanchenko University of Jyväskylä
Improving Lookup Performance over a Widely-Deployed DHT Daniel Stutzbach Reza Rejaie The ION P2P Project University of.
Taming Dynamic and Selfish Peers “Peer-to-Peer Systems and Applications” Dagstuhl Seminar March 26th-29th, 2006 Stefan Schmid Distributed Computing Group.
Improving ISP Locality in BitTorrent Traffic via Biased Neighbor Selection Ruchir Bindal, Pei Cao, William Chan Stanford University Jan Medved, George.
 We developed a fast and tunable crawler, Cruiser.  Cruiser uses a master-slave architecture, parallel crawling, and leverages the two-tier topology.
1 Denial-of-Service Resilience in P2P File Sharing Systems Dan Dumitriu (EPFL) Ed Knightly (Rice) Aleksandar Kuzmanovic (Northwestern) Ion Stoica (Berkeley)
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
1 Energy-Efficient localization for networks of underwater drifters Diba Mirza Curt Schurgers Department of Electrical and Computer Engineering.
1 SLIC: A Selfish Link-based Incentive Mechanism for Unstructured P2P Networks Qixiang Sun Hector Garcia-Molina Stanford University.
Characterizing the Two-Tier Gnutella Topology  Gnutella, FastTrack, and eDonkey use two-tier overlay topologies.  Our initial study focuses on Gnutella.
Data Mining.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)
Vassilios V. Dimakopoulos and Evaggelia Pitoura Distributed Data Management Lab Dept. of Computer Science, Univ. of Ioannina, Greece
Understanding Churn in Peer-to-Peer Networks Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Internet Measurement Conference.
1 Characterizing Files in the Modern Gnutella Network: A Measurement Study Shanyu Zhao, Daniel Stutzbach, Reza Rejaie University of Oregon SPIE Multimedia.
6/28/2015Reza Rejaie INFOCOM 07 1 Nazanin Magharei, Reza Rejaie University of Oregon PRIME: P2P Receiver-drIven MEsh based.
Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon.
1 Characterizing Selfishly Constructed Overlay Routing Networks March 11, 2004 Byung-Gon Chun, Rodrigo Fonseca, Ion Stoica, and John Kubiatowicz University.
Minas Gjoka, UC IrvineWalking in Facebook 1 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant ‡, Carter Butts,
Amir Rasti Daniel Stutzbach Reza Rejaie The ION P2P Project University of Oregon On the Long-term Evolution of the Two-Tier.
UNIVERSITY OF JYVÄSKYLÄ Resource Discovery in Unstructured P2P Networks Distributed Systems Research Seminar on Mikko Vapa, research student.
Ryan Kinworthy 2/26/20031 Chapter 7- Local Search part 2 Ryan Kinworthy CSCE Advanced Constraint Processing.
On Self Adaptive Routing in Dynamic Environments -- A probabilistic routing scheme Haiyong Xie, Lili Qiu, Yang Richard Yang and Yin Yale, MR and.
Distributed Quality-of-Service Routing of Best Constrained Shortest Paths. Abdelhamid MELLOUK, Said HOCEINI, Farid BAGUENINE, Mustapha CHEURFA Computers.
Multiple testing correction
Multigraph Sampling of Online Social Networks Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou 1Multigraph sampling.
The Research Enterprise in Psychology. The Scientific Method: Terminology Operational definitions are used to clarify precisely what is meant by each.
WALKING IN FACEBOOK: A CASE STUDY OF UNBIASED SAMPLING OF OSNS junction.
1 CS 425 Distributed Systems Fall 2011 Slides by Indranil Gupta Measurement Studies All Slides © IG Acknowledgments: Jay Patel.
CCAN: Cache-based CAN Using the Small World Model Shanghai Jiaotong University Internet Computing R&D Center.
Multimedia Computing & Networking Shanyu Zhao, Daniel Stutzbach, Reza Rejaie Multimedia & Internetworking Research Group (Mirage) Computer & Information.
Quantitative Evaluation of Unstructured Peer-to-Peer Architectures Fabrício Benevenuto José Ismael Jr. Jussara M. Almeida Department of Computer Science.
PRIME: P2P Receiver-drIven MEsh based Streaming Nazanin Magharei, Reza Rejaie University of Oregon Presenter Jungsik Yoon.
Lecture 4: Statistics Review II Date: 9/5/02  Hypothesis tests: power  Estimation: likelihood, moment estimation, least square  Statistical properties.
1 A Framework for Measuring and Predicting the Impact of Routing Changes Ying Zhang Z. Morley Mao Jia Wang.
Sampling Techniques for Large, Dynamic Graphs Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Nick Duffield – AT&T Labs—Research.
Evaluating VR Systems. Scenario You determine that while looking around virtual worlds is natural and well supported in VR, moving about them is a difficult.
Gerhard Haßlinger Search Methods in Dynamic Wireless Networks  Challenges for search in wireless networks  Random walks and flooding for search with.
1 Page Quality: In Search of an Unbiased Web Ranking Presented by: Arjun Dasgupta Adapted from slides by Junghoo Cho and Robert E. Adams SIGMOD 2005.
Peter R Pietzuch and Jean Bacon Peer-to-Peer Overlay Networks in an Event-Based Middleware DEBS’03, San Diego, CA, USA,
Internet research Needs Better Models Sally Floyd, Eddie Kohler ISCI Center for Internet Research, Berkeley, California Presented by Max Podlesny.
A Simulation-Based Study of Overlay Routing Performance CS 268 Course Project Andrey Ermolinskiy, Hovig Bayandorian, Daniel Chen.
On the Placement of Web Server Replicas Yu Cai. Paper On the Placement of Web Server Replicas Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker Infocom.
Large-Scale Monitoring of DHT Traffic Ghulam Memon – University of Oregon Reza Rejaie – University of Oregon Yang Guo – Corporate Research, Thomson Daniel.
Performance Comparison of Ad Hoc Network Routing Protocols Presented by Venkata Suresh Tamminiedi Computer Science Department Georgia State University.
Dynamic Network Analysis Case study of PageRank-based Rewiring Narjès Bellamine-BenSaoud Galen Wilkerson 2 nd Second Annual French Complex Systems Summer.
1 “Hybrid Search Schemes for Unstructured Peer- to-Peer Networks” “Random Walks in Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Peer-to-Peer and Social Networks
Chapter 10 Verification and Validation of Simulation Models
Early Measurements of a Cluster-based Architecture for P2P Systems
GIA: Making Gnutella-like P2P Systems Scalable
Presentation transcript:

On Unbiased Sampling for Unstructured Peer-to-Peer Networks Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Nick Duffield – AT&T Labs—Research Subhabrata Sen – AT&T Labs—Research Walter Willinger – AT&T Labs—Research Internet Measurement Conference Rio de Janeiro, Brazil October 25 th, 2006

Daniel Stutzbach The ION P2P Project 2/19 Motivation P2P systems are very popular in practice. Several million simultaneous users collectively. 60% of all Internet traffic [CacheLogic Research 2005] Measurement studies aid understanding existing systems and user behavior. Capturing a accurate global picture is often infeasible. P2P systems are distributed, large, and rapidly changing. Capturing a global picture is time-consuming, resulting in a blurry picture. Sampling is a natural approach, and has been used implicitly in most earlier P2P measurement studies. But how do we know the samples are representative?

Daniel Stutzbach The ION P2P Project 3/19 The Problem We focus on sampling peer properties. Number of neighbors (degree) Link bandwidth Number of shared files Remaining uptime Sampling peer properties occurs in two steps: Discover and select peers Collect measurements from the selected peers Selecting peers uniformly at random is hard. Temporal: Peer dynamics can introduce bias. Topological: The graph topology can introduce bias. We first examine these two problems in isolation. We then examine them together.

Daniel Stutzbach The ION P2P Project 4/19 Sampling with Dynamics Define V t as the set of peers present at time t. We gather samples over a measurement window of length Δ. The most common approach is to gather peers from the set present during the window:

Daniel Stutzbach The ION P2P Project 5/19 Bias towards Short-Lived Peers Time Short-lived peers Long-lived peer Consider a simple two-peer system, containing: One long-lived peer One rapidly-changing short-lived peer The common approach over-selects short-lived peers. Short-lived

Daniel Stutzbach The ION P2P Project 6/19 Handling Temporal Causes of Bias The common approach is intuitive but incorrect. Sampling peers is the wrong goal. We want to sample peer properties. Two samples from the same peer, but at different times, are distinct. Allow sampling the same peer more than once, at different points in time.

Daniel Stutzbach The ION P2P Project 7/19 Example of avoiding bias towards Short-Lived Peers Time Short-lived peers Long-lived peer Allowing re-selecting a peer solves the problem. The long-lived peer will be selected half the time, reflecting the actual state of the system. How do we select a peer uniformly at random at a particular moment? Short-lived

Daniel Stutzbach The ION P2P Project 8/19 Sampling from Static Graphs Assume for the moment a static graph… Goal: Select a peer uniformly from the graph Discover: Begin with one peer. Query peers to discover neighbors. Classic algorithms: Breadth-First Search, Depth- First Search Select: Choose a subset of discovered peers Gather samples from the selected peers

Daniel Stutzbach The ION P2P Project 9/19 Advantages of Random Walks Problems with classic approaches: Peers are correlated by their neighbor relationship Peers with higher degree discovered more often A peer can only be selected once. Random walks are a promising alternative: The information in the starting location is “lost” by repeatedly injecting randomness at each step. The results are biased, but the bias is precisely known. Random walks can implicitly visit the same peer twice.

Daniel Stutzbach The ION P2P Project 10/19 Random walks, formally Random walks can be described with a transition matrix, P(x,y). P(x,y) is the probability of moving from x to y: P r (x,y) is the probability of moving from x to y after r moves Random walks converge to a stationary distribution: Problem: we want a uniform distribution:

Daniel Stutzbach The ION P2P Project 11/19 The Metropolis—Hastings Method The Metropolis—Hastings method modifies the transition matrix to yield the desired distribution: Proven for static graphs Plugging in our P(x,y) and μ(x): Select a neighbor y of x uniformly at random Transition to y with probability deg(x) / deg(y) Otherwise, self-transition to x.

Daniel Stutzbach The ION P2P Project 12/19 Sampling from Dynamic Graphs Adapting to vanishing peers We maintain a stack of visited peers If a query times out, go back in the stack Hypothesis: A Metropolized random walk will yield approximately unbiased samples in practice. Trivially valid for extremely slowly changing graphs Trivially false for extremely rapidly changing graphs Where is the transition? Methodology: Session-level simulations of a wide variety of situations Determine what conditions lead to biased samples Do those conditions arise in practice?

Daniel Stutzbach The ION P2P Project 13/19 Metrics: Fundamental properties We focus on three fundamental properties that affect the walk: Degree Session length Query latency (in paper only) We compute the KS statistic (D) for each distribution versus a snapshot from an oracle. We evaluate these metrics under a variety of conditions: Several models of churn Several models of degree distribution Four different peer discovery mechanisms

Daniel Stutzbach The ION P2P Project 14/19 Base case Base case: Session length distribution is Weibull (k=0.59, λ=40) Maximum degree: 30 Target degree: 15 Peer discovery mechanism: FIFO rendezvous point Sampled and expected distributions are visually indistinguishable. Very low KS statistic: D < 0.004

Daniel Stutzbach The ION P2P Project 15/19 Varying churn Each point represents a simulation; y-axis show KS statistic (D) Error is low over a wide range of session lengths Becomes significant for median < 2 min High for median < 30 s Type of distribution does not have a large impact

Daniel Stutzbach The ION P2P Project 16/19 Varying topology Little bias when target degree > 2 Degree ≤ 2 means network fragmentation History mechanism bias is due to ~2% of peers with no neighbors. More simulation results in the paper

Daniel Stutzbach The ION P2P Project 17/19 Empirical results We developed the technique into a tool called ion-sampler. Available from our website bash$./ion-sampler gnutella --hops 25 -n : : : : : : : : : :44380

Daniel Stutzbach The ION P2P Project 18/19 Empirical validation Empirical validation is tricky because there is no perfect baseline for comparison. Full crawling performed by Cruiser [Stutzbach 05 IMC] The full crawl may be slightly biased towards higher degree Ion-sampler records slightly fewer higher degree peers than a full crawl Conclusion: ion-sampler is close to a full crawl in accuracy, and may even be more accurate!

Daniel Stutzbach The ION P2P Project 19/19 Conclusions and Future Work Summary Temporal and topological bias can lead to sampling error. We present the Metropolized Random Walk with Backtracking technique. Extensive simulations show that it gathers nearly unbiased samples in a wide variety of circumstances. Ion-sampler is a tool for gathering nearly unbiased samples from real P2P systems. Future work Explore improving sampling efficiency for uncommon events. Evaluate MRWB under flash crowd scenarios. Develop additional plug-ins for ion-sampler.