“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks”, Christos Gkantsidis, Milena Mihail, Amin Saberi.

Presentation transcript:

1 “Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks”, Christos Gkantsidis, Milena Mihail, Amin Saberi. Presented by Paul Bogdan, February 28th, 2007.

2 “Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”, Christos Gkantsidis, Milena Mihail, Amin Saberi.

3 Outline: Random Graph Models; Flooding and Normalization; Random Walks and Replication; Generalized Search Schemes; Experimental Evaluation.

4 Motivation. Flooding with a small time-to-live (TTL) performs well in regular graphs. Performance metric: number of exchanged messages per distinct response. Flooding performance degrades when the TTL increases or when the network is irregular. Random walks perform better than flooding in terms of scalability and granularity. Hybrid and generalized search schemes: random walks with lookahead, random walks with 1-step replication.

5 Contribution. Random walks (RW) with shallow flooding offer good performance (analytic justification). R1: In a random graph model with $O(n)$ nodes of constant degree and $O(n^{1/2})$ nodes of degree $O(n^{1/2})$, the expected time to discover $\Omega(n)$ nodes is $O(n^{1/2})$. R2: Random walks with lookahead 1 or 1-step replication perform better when there is a discrepancy in the degrees of the underlying topology. Normalized Flooding (NF) solution. R3: NF achieves performance comparable to flooding in regular graphs. R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication. R5: Local information about the network (node degrees) offers global benefit. Generalized Search Schemes.

6 Random Graph Models. Random regular graphs, $G_{n,d}$: a graph with $n$ nodes in which every node has degree $d$; the sum of degrees is $D = nd$. Random graphs with supernodes, $G_{n,d,\alpha,\beta}$: given constants $\alpha$ and $\beta$, $G_{n,d,\alpha,\beta}$ denotes a graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (the large vertices) and the remaining nodes of degree $d$ (the small vertices); the sum of degrees is $D = (\alpha\beta + d)n$, up to lower-order terms.
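The degree bookkeeping for the supernode model can be checked with a short sketch. The function name and the rounding of $\alpha n^{1/2}$ and $\beta n^{1/2}$ to integers are illustrative assumptions, not part of the slides; the sketch only builds a degree sequence and compares its sum with $(\alpha\beta + d)n$.

```python
import math

def supernode_degree_sequence(n, d, alpha, beta):
    """Degree sequence for the supernode model (hypothetical helper).

    About alpha*sqrt(n) large vertices get degree about beta*sqrt(n);
    every remaining vertex gets degree d. Rounding is an assumption.
    """
    root = math.sqrt(n)
    num_large = round(alpha * root)
    large_degree = round(beta * root)
    return [large_degree] * num_large + [d] * (n - num_large)

n, d, alpha, beta = 10_000, 3, 2, 5
degrees = supernode_degree_sequence(n, d, alpha, beta)
exact_sum = sum(degrees)               # alpha*beta*n + d*(n - alpha*sqrt(n))
leading_term = (alpha * beta + d) * n  # the slide's D = (alpha*beta + d) n
print(exact_sum, leading_term, leading_term - exact_sum)
```

The gap between the two sums is $d\,\alpha\,n^{1/2}$, which is negligible next to $n$ for large networks.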

7 Flooding and Normalization. Theorem 3.1: Consider a random regular graph $G_{n,d}$ and flooding from a node $v$ with time-to-live $\tau$; let $S$ be the set of distinct nodes queried by the flooding, with $|S| \le |V|/2$. Claims: (1) (2) (3)
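As an illustration of the flooding scenario in Theorem 3.1, the following is a minimal sketch of TTL-limited flooding that counts both the distinct nodes reached and the messages exchanged, which is the performance metric used in these slides. The adjacency-dictionary representation and the rule that a node forwards only on first receipt, to every neighbor except the sender, are assumptions made for the example.

```python
def flood(adj, source, ttl):
    """TTL-limited flooding; returns (set of nodes reached, messages sent).

    Assumption: a node forwards a query only the first time it receives it,
    and sends it to every neighbor except the one it came from.
    """
    visited = {source}
    frontier = [(source, None)]  # (node, neighbor it received the query from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nb in adj[node]:
                if nb == parent:
                    continue
                messages += 1
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return visited, messages

# Toy 2-regular graph: a ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
reached, messages = flood(ring, source=0, ttl=3)
print(len(reached), messages, messages / len(reached))
```

On the ring, the last round sends two messages to the same node, which shows how duplicate messages start to appear as the TTL grows.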

8 (1) Proof:

9 (2) Proof:

10

11 (3) Proof:

12 Flooding and Normalization. Theorem 3.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider flooding from a node $v$ of degree $d$ with time-to-live $\tau$. Claim: for some $\tau = O(\log\log n)$, the number of distinct responses is $\Omega(n)$. Proof: Consider flooding with $\tau = c\log_{d-1}(\log n) + 1$ and the vertices visited with TTL $\tau-1$. Assumption: this set of visited nodes does not contain a large-degree vertex. From $d$-regular graphs we know that this set contains at least $(d-1)^{\tau-1}$ edges. The probability that no vertex in $\Gamma(S_{\tau-1}(v))$ is a large vertex is bounded by $(d/(d+\alpha\beta))^{(d-1)^{\tau-1}} = (d/(d+\alpha\beta))^{c\log n}$, so within the first $O(\log\log n)$ steps we see a large vertex.

13 Flooding and Normalization. Theorem 3.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider normalized flooding from a node $v$ with TTL $\tau$. Then the number of distinct responses is $\Omega((d-1)^{\tau-1})$ and the number of messages per response is $O(1)$. Proof: From Theorem 3.1, the number of minigroups seen is $(d-1)^{\tau-1}$. The expected number of small vertices is $Q = d(d-1)^{\tau-1}/(d+\alpha\beta)$. Let $X_i$, $i = 1,\dots,N$, be random variables with $P[X_i=1]=p_i$ and $P[X_i=0]=1-p_i$. By a Chernoff bound on these variables, the probability that fewer than $Q/2$ small vertices are seen is vanishingly small.
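The slides do not spell out the forwarding rule of normalized flooding. The sketch below assumes one common reading: each node forwards the query to at most a fixed number of randomly chosen neighbors (the parameter `fanout`, a name introduced here), so that a high-degree node does not multiply the message count. It is an illustration under that assumption, not the authors' exact procedure.

```python
import random

def normalized_flood(adj, source, ttl, fanout, rng):
    """Flooding where every node forwards to at most `fanout` random neighbors.

    Assumption: `fanout` plays the role of the minimum degree of the graph;
    the sender of a query is excluded from the forwarding candidates.
    """
    visited = {source}
    frontier = [(source, None)]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            candidates = [nb for nb in adj[node] if nb != parent]
            if len(candidates) > fanout:
                candidates = rng.sample(candidates, fanout)
            for nb in candidates:
                messages += 1
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return visited, messages

# Toy irregular graph: one hub (node 0) connected to 10 leaves.
star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
reached, messages = normalized_flood(star, source=1, ttl=2, fanout=3,
                                     rng=random.Random(0))
print(len(reached), messages)
```

With plain flooding the hub would send 9 messages in the second round; with the cap it sends 3, which is the effect the theorem relies on to keep the number of messages per response bounded.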

14 Random Walks and Replication. Random walk with lookahead: a random walk with shallow flooding at each step of the walk. RW with lookahead 1 visits $\Omega(n)$ nodes with response time $O(n^{1/2})$. Theorem 4.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider a random walk from a node $v$. Then, in the 1-step replication scenario, the expected number of messages and response time to obtain distinct responses is
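A random walk with lookahead 1 can be sketched as follows: at every step the walker moves to a uniformly random neighbor and then queries all neighbors of the node it is standing on. The message accounting (one message per walk step plus one per neighbor queried) is an assumption made for illustration.

```python
import random

def random_walk_with_lookahead(adj, source, steps, rng):
    """Random walk that queries all neighbors of each visited node.

    Returns (set of nodes discovered, messages sent). Counting one message
    per hop and one per queried neighbor is an illustrative assumption.
    """
    discovered = {source} | set(adj[source])
    messages = len(adj[source])           # lookahead at the starting node
    current = source
    for _ in range(steps):
        current = rng.choice(adj[current])
        messages += 1                     # the walk step itself
        messages += len(adj[current])     # shallow flood (lookahead 1)
        discovered.add(current)
        discovered.update(adj[current])
    return discovered, messages

# One hub connected to 10 leaves: a single step from a leaf reaches the hub,
# and the lookahead then discovers every other leaf.
star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
found, messages = random_walk_with_lookahead(star, source=1, steps=1,
                                             rng=random.Random(0))
print(len(found), messages)
```

The toy example mirrors the intuition on the slide: once the walk lands on a high-degree vertex, a single lookahead step discovers a large part of the network.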

15 Theorem 4.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider normalized flooding from $v$ with TTL $\tau \approx (\log n)/(2\log(d-1))$. Then, in the 1-step replication scenario, the number of distinct responses is at least and the number of messages is at most Proof: The number of minigroups seen is $(d-1)^{\tau-1}$, and by the Chernoff bounds there will be minigroups corresponding to large vertices.

16 Generalized Search Schemes. Searching procedure: A node of degree $d$ initiates a search with a budget $k$, where the budget is the number of messages that are propagated in the network. Among its $d$ neighbors the node picks quantities $k_1, k_2, \dots, k_d$ such that $k_1 + k_2 + \dots + k_d = k$. To every neighbor $i$ the initiating node forwards the message with budget $k_i$ (for $k_i = 0$ the message is not transmitted). Each neighbor $i$ reduces its budget by 1 unit and repeats the process as long as the budget is greater than 0. Every node that receives the message a second time from another neighbor forwards the message with the corresponding budget. The scheme covers both random walks and flooding.
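The budget-splitting procedure can be sketched as below. The two splitting policies are illustrative assumptions: `split_evenly` spreads the budget over all neighbors (flooding-like behavior) and `split_single` hands the whole budget to one random neighbor (random-walk-like behavior). All function names are introduced here for the example.

```python
import random
from collections import deque

def split_evenly(budget, neighbors, rng):
    """Flooding-like policy: spread the budget as evenly as possible."""
    shares = [budget // len(neighbors)] * len(neighbors)
    for i in rng.sample(range(len(neighbors)), budget % len(neighbors)):
        shares[i] += 1
    return shares

def split_single(budget, neighbors, rng):
    """Random-walk-like policy: give the whole budget to one neighbor."""
    shares = [0] * len(neighbors)
    shares[rng.randrange(len(neighbors))] = budget
    return shares

def budgeted_search(adj, source, budget, split, rng):
    """Propagate a query with a message budget; returns (visited, messages)."""
    visited = {source}
    messages = 0
    queue = deque([(source, budget)])
    while queue:
        node, k = queue.popleft()
        if k <= 0 or not adj[node]:
            continue
        for nb, share in zip(adj[node], split(k, adj[node], rng)):
            if share == 0:
                continue                  # k_i = 0: nothing is transmitted
            messages += 1
            visited.add(nb)
            queue.append((nb, share - 1))  # receiving the message costs 1 unit
    return visited, messages

ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
for policy in (split_evenly, split_single):
    visited, messages = budgeted_search(ring, 0, 7, policy, random.Random(1))
    print(policy.__name__, len(visited), messages)
```

Because every transmitted message consumes exactly one unit, the total number of messages equals the initial budget whenever every node has at least one neighbor, regardless of the splitting policy.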

17 Experimental Evaluation. Methodology. Performance metrics: median and mean number of distinct peers discovered (hits); minimum, maximum, and standard deviation of the number of hits; number of messages; granularity of the number of messages; response time. Topologies: random d-regular graphs; power law graphs; bimodal topologies; clustered topologies.

18 Normalized Flooding (NF). Mean number of unique peers discovered as a function of the initial TTL. NF and standard flooding behave similarly in regular graphs. NF controls the number of messages and provides higher efficiency.

19 Normalized Flooding (NF). With NF, the number of unique peers increases exponentially with the TTL. In topologies with high degrees, the number of peers increases faster than exponentially with the TTL.

20 Random Walk with 1-step replication

21 Random Walk with Lookahead (RWLA). RWLA performance is similar to that of a long RW without lookahead (in terms of unique peers discovered). RWLA response time is much smaller than that of a standard RW.

22 Edge Criticality & Searching with Weights. Generalized searching performs similarly to standard flooding in regular graphs. Generalized searching behaves similarly to standard flooding in other topologies if normalized edge criticality is used.

23 Conclusions. Normalized Flooding (NF) could replace standard flooding in irregular graphs. RW with 1-step replication performs better than RW and NF in irregular graphs. Open for improvement: generalized schemes (analytic investigation); quantifying directional flooding.

24 “Random Walks in Peer-to-Peer (P2P) Networks”, Christos Gkantsidis, Milena Mihail, Amin Saberi.

25 Outline: Motivation; Statistical Estimation and Random Walks (RW); Searching: methodology and importance of topologies; Construction; Summary.

26 Motivation. Random walks (RW) have been proposed for building search and topology-maintenance protocols in P2P networks. RW improve search performance compared to flooding (Cao et al., 2002). A RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion). Claim: RW are a good candidate for simulating uniform sampling; the number of simulation steps required can be as low as the number of samples in independent uniform sampling. Searching and overlay topology construction: RW searching performs better than flooding for the same number of messages in clustered and slowly changing topologies; P2P networks can be constructed by random walks.

27 Statistical Estimation & Random Walks. Coupon collection and Chernoff bounds. There are $n$ types of coupons, and each draw yields one type uniformly at random. $T_n$: time by which coupons of all $n$ types have been drawn. $T_{\alpha n}$: time by which $\alpha n$ distinct types have been encountered, $0 < \alpha < 1$. $X_1,\dots,X_k$: independent Bernoulli trials with $P[X_i=1]=p_i$ and $P[X_i=0]=1-p_i$. $p$: probability that a randomly drawn object has a particular property. The probability that the property is found in substantially fewer draws than its frequency in the search space, and the quality of the estimator $X/k$, are bounded by
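The coupon-collection quantities $T_n$ and $T_{\alpha n}$ can be explored with a small sketch. It uses the standard expectation for uniform draws, $E[T_m] = \sum_{i=0}^{m-1} n/(n-i)$ for collecting $m$ distinct types out of $n$ (so $E[T_n] = n H_n$), and a simple simulation. The function names are introduced here, and the slide's own bounds, which were lost in the transcript, are not reproduced.

```python
import random

def expected_draws(n, m):
    """Expected number of uniform draws to see m distinct types out of n."""
    return sum(n / (n - i) for i in range(m))

def simulate_draws(n, m, rng):
    """Draw uniformly at random until m distinct types have been seen."""
    seen = set()
    draws = 0
    while len(seen) < m:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 1000
rng = random.Random(0)
for alpha in (0.5, 0.9, 1.0):
    m = int(alpha * n)
    trials = [simulate_draws(n, m, rng) for _ in range(50)]
    print(alpha, round(expected_draws(n, m)), sum(trials) / len(trials))
```

Collecting a constant fraction $\alpha n$ of the types takes a number of draws linear in $n$, while collecting all $n$ types costs an extra logarithmic factor; this gap is what makes partial coverage cheap in sampling-based search.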

28 Statistical Estimation & Random Walks. Random walks (RW), convergence and cover time. $G = (V,E)$ is an undirected graph with $|V| = n$, and $d_i$ is the degree of vertex $i$. $A_{ij}$: adjacency matrix. $P$: transition matrix, which satisfies $f: V \to \{0,1\}$, which satisfies Convergence rate metric: the rate at which the RW approaches the stationary distribution. Cover time metric: the time by which all nodes have been visited. Trajectory sample average: the rate at which the value of $f$, averaged over successive vertices of the RW trajectory, approaches $p$.
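The formula for $P$ did not survive in the transcript. The sketch below assumes the standard simple random walk, $P_{ij} = A_{ij}/d_i$, whose stationary distribution on a connected undirected graph is $\pi_i = d_i / \sum_j d_j$. It checks the fixed-point property $\pi P = \pi$ with exact fractions on a three-node path.

```python
from fractions import Fraction

def transition_matrix(A):
    """Simple random walk (assumed): P[i][j] = A[i][j] / d_i."""
    return [[Fraction(a, sum(row)) for a in row] for row in A]

def stationary_distribution(A):
    """pi_i = d_i / (sum of all degrees) for an undirected graph."""
    total = sum(sum(row) for row in A)
    return [Fraction(sum(row), total) for row in A]

def step(dist, P):
    """One step of the walk applied to a distribution: dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)
pi = stationary_distribution(A)
print(pi, step(pi, P) == pi)
```

Because the stationary distribution is proportional to degree, an unmodified walk samples high-degree peers more often than low-degree ones.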

29 Statistical Estimation & Random Walks. The convergence rate is related to the second eigenvalue of $P$ (1); $y_t$ denotes the vertex that the RW visits at time $t$. Cover time (2). Trajectory sample average (3). References: (1): [11]; (2): [12, 13]; (3): [3, 4, 5, 6].

30 Statistical Estimation & Random Walks. Second eigenvalue, expansion and conductance. $S$: a subset of $V$; $C(S)$: the cutset of $S$ (i.e. the edges with one endpoint in $S$ and the other in $V \setminus S$); $vol(S)$: the sum of the degrees of the vertices in $S$. Expansion. Conductance. Known bound [11, 14, 15, 16, 17, 18, 19].
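The exact formulas for expansion and conductance are missing from the transcript. The sketch below uses common textbook forms as an assumption: for a subset $S$, expansion is $|C(S)|/|S|$ and conductance is $|C(S)|/\min(vol(S), vol(V \setminus S))$. The graph-level quantities would be minima of these ratios over subsets, which the sketch does not attempt to compute.

```python
from fractions import Fraction

def cut_size(adj, S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    """Sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)

def expansion(adj, S):
    """Assumed definition: |C(S)| / |S|."""
    return Fraction(cut_size(adj, S), len(S))

def conductance(adj, S):
    """Assumed definition: |C(S)| / min(vol(S), vol(V \\ S))."""
    rest = set(adj) - set(S)
    return Fraction(cut_size(adj, S), min(volume(adj, S), volume(adj, rest)))

# Ring of 6 nodes; S is a contiguous half of the ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
S = {0, 1, 2}
print(cut_size(ring, S), volume(ring, S), expansion(ring, S), conductance(ring, S))
```

A ring is a poor expander: half of its nodes can be separated by cutting only two edges, so both ratios shrink as the ring grows.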

31 Searching. Performance metrics for flooding and RW: the average number of distinct copies of an item located in the search, and the number of messages used by the searching algorithm. RW performs better than flooding when there are multiple search requests for the same item with a slowly changing topology, or when peers are clustered (see [20, 21, 22, 23, 24, 25] for details). Searching analysis: methodology; flat topologies with uniformly distributed content; topologies with peer clustering; re-issuing the same query; real topologies.

32 Searching: Methodology. Performance metrics: mean number of distinct copies (Mean), discrepancy around the mean (Std), and the failure probability. Cost: number of messages or queries performed during the search. Peer-to-peer topologies (≈ 1 million nodes): flat regular expanders, two-tier topologies with clustering, power law graphs, samples from real topologies. Dynamic topologies: rewiring. Content placement: content clustering affects the performance of searching.

33 Searching: Flat Topologies. Experiment: one request in a network of 500K peers. Mean hits, minimum number of hits, and Std are similar for flooding and RW; the entire distribution of hits is similar for flooding and RW.

34 Searching: Topologies with Peer Clustering. The cluster topology consists of 5 flat regular graphs of size 40K; from each one, 1000 nodes are picked at random to construct another flat regular graph. The number of hits for RW is more concentrated around the mean than for flooding.

35 Searching: Reissuing the Same Query. Experiment setup: repeat the following procedure 4 times. Each peer sends a request and waits for the response; between requests, 2% of the links are rewired; each peer then initiates a new search. RW have better performance than flooding in terms of mean hits and failure probability.

36 Searching: Reissuing the Same Query. The performance of successive searches depends on the number of topology changes between consecutive searches. The performance of flooding increases as the rate of topological changes increases. RW performance remains the same for small variations.

37 Searching: Real Topologies. The number of hits for RW is more concentrated around the mean than for flooding. P2P networks have good expansion properties.

38 Construction. P2P network construction is concerned with: peers arriving at and leaving the network dynamically; strong and weak decentralization; low network overhead per addition or deletion.

39 Baseline Construction of Expander Graphs. $A_{BASE}$ builds an undirected graph on $n$ vertices in which each vertex chooses $d$ vertices at random; the total number of edges is $nd$ and the expected vertex degree is $2d$. Theorem 4.1: Let $G(V,E)$ be a graph constructed by $A_{BASE}$. Then $G$ is an expander with high probability, and for a positive constant $\alpha < 1$
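The edge and degree counts of the baseline construction can be checked with a short sketch. It assumes that each vertex picks $d$ distinct vertices other than itself and that parallel edges are kept (so the result is a multigraph); the slide does not state either detail.

```python
import random

def build_base(n, d, rng):
    """Each vertex picks d distinct other vertices uniformly at random.

    Assumptions: no self-loops, parallel edges allowed. Returns an edge list.
    """
    edges = []
    for v in range(n):
        for x in rng.sample(range(n - 1), d):
            u = x if x < v else x + 1   # skip v itself
            edges.append((v, u))
    return edges

def degree_counts(n, edges):
    """Degree of every vertex, counting parallel edges separately."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

n, d = 1000, 3
edges = build_base(n, d, random.Random(0))
deg = degree_counts(n, edges)
print(len(edges), sum(deg) / n, min(deg), max(deg))
```

Every vertex has degree at least $d$ from its own choices, and receives on average another $d$ edges from the choices of other vertices, which gives the expected degree $2d$.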

40 Baseline Construction of Expander Graphs with Constant Overhead in Random Bits. $A'_{BASE}$ construction algorithm: start a RW at a random vertex of $H$ (a constant-degree expander graph); whenever $A_{BASE}$ needs a random number, it is taken from the RW on $H$. Theorem 4.2: Let $G(V,E)$ be a graph constructed by $A'_{BASE}$. There are positive constants $\alpha$ and $0 < \beta < 0.5$ such that any subset $S$ of size at least $\beta|V|$ and at most $0.5|V|$ has cutset expansion $\alpha$ almost surely.

41 Distributed Construction of Expanders with Constant Overhead on Network Resources. $A'_H$ construction: there are $d$ daemons, one for each Hamilton cycle. A newly arriving node contacts the daemon associated with the $i$-th Hamilton cycle; after $c$ steps, it attaches between the peer that currently hosts daemon $i$ and one of that peer's neighbors in cycle $i$.
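The insertion step of the Hamilton-cycle construction can be sketched as follows. Each cycle is stored as a successor map, and the new node is spliced in between the peer hosting the daemon and that peer's successor. How the daemons move (the $c$ steps mentioned above) is not modeled; the list `hosts` is simply given as input, and all names are introduced here for illustration.

```python
def join(cycles, hosts, new_node):
    """Insert new_node into every Hamilton cycle right after its daemon host.

    cycles: list of dicts mapping each node to its successor in that cycle.
    hosts:  hosts[i] is the peer currently hosting daemon i (assumed given).
    """
    for succ, host in zip(cycles, hosts):
        nxt = succ[host]
        succ[host] = new_node
        succ[new_node] = nxt

def is_hamilton_cycle(succ):
    """True if following successors from any node visits every node once."""
    start = next(iter(succ))
    seen = set()
    node = start
    while node not in seen:
        seen.add(node)
        node = succ[node]
    return node == start and len(seen) == len(succ)

# Two Hamilton cycles (d = 2) on the nodes 0..3.
cycle_a = {0: 1, 1: 2, 2: 3, 3: 0}
cycle_b = {0: 2, 2: 1, 1: 3, 3: 0}
join([cycle_a, cycle_b], hosts=[1, 3], new_node=4)
print(cycle_a, cycle_b, is_hamilton_cycle(cycle_a), is_hamilton_cycle(cycle_b))
```

Each join touches a constant number of links per cycle, so the cost per arrival depends on $d$ and on the daemon steps but not on the size of the network.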

42 Distributed Construction of Expanders with Constant Overhead on Network Resources. $A'_M$ construction: there are $d$ daemons, one for each Hamilton cycle. The arrival of a new node consists of two nodes, X and Y; X and Y contact the central server to discover the location of the $d$ daemons. X becomes the neighbor of daemon $i$ and Y the neighbor of the initial daemon's neighbor.

43 Summary. For searching, random walks (RW) are superior to flooding. For construction, RW add new peers with constant overhead. Open problems: a strongly decentralized construction algorithm. Can deletions and expansions of small sets be handled better? How do the P2P network parameters (e.g. capacities) affect the performance of RW?