Replication Strategies in Unstructured Peer-to-Peer Networks Edith CohenScott Shenker Some slides are taken from the authors’ original presentation.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Scalable and Dynamic Quorum Systems Moni Naor & Udi Wieder The Weizmann Institute of Science.
Data and Computer Communications
On the Steady-State of Cache Networks Elisha J. Rosensweig Daniel S. Menasche Jim Kurose.
Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen Scott Shenker This is a modified version of the original presentation by the authors.
An Approximate Truthful Mechanism for Combinatorial Auctions An Internet Mathematics paper by Aaron Archer, Christos Papadimitriou, Kunal Talwar and Éva.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
How Bad is Selfish Routing? By Tim Roughgarden Eva Tardos Presented by Alex Kogan.
On Large-Scale Peer-to-Peer Streaming Systems with Network Coding Chen Feng, Baochun Li Dept. of Electrical and Computer Engineering University of Toronto.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Fabian Kuhn, Microsoft Research, Silicon Valley
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Robert Morris Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT.
Modeling and Analysis of Random Walk Search Algorithms in P2P Networks Nabhendra Bisnik, Alhussein Abouzeid ECSE, Rensselaer Polytechnic Institute.
Technion –Israel Institute of Technology Computer Networks Laboratory A Comparison of Peer-to-Peer systems by Gomon Dmitri and Kritsmer Ilya under Roi.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Search in Power-Law Networks Presented by Hakim Weatherspoon CS294-4: Peer-to-Peer Systems Slides also borrowed from the following paper Path Finding Strategies.
Small-world Overlay P2P Network
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
Volcano Routing Scheme Routing in a Highly Dynamic Environment Yashar Ganjali Stanford University Joint work with: Nick McKeown SECON 2005, Santa Clara,
Data Broadcast in Asymmetric Wireless Environments Nitin H. Vaidya Sohail Hameed.
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
1 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen, Scott Shenker ACM SIGCOMM Computer Communication Review, Proceedings of the.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
1 Maximizing Remote Work in Flooding-based P2P Systems Qixiang Sun Neil Daswani Hector Garcia-Molina Stanford University.
Adaptive Content Management in Structured P2P Communities Jussi Kangasharju Keith W. Ross David A. Turner.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)
Making Gnutella-like P2P Systems Scalable Presented by: Karthik Lakshminarayanan Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, and Scott.
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley Asynchronous Distributed Algorithm Proof.
Mercury: Supporting Scalable Multi-Attribute Range Queries A. Bharambe, M. Agrawal, S. Seshan In Proceedings of the SIGCOMM’04, USA Παρουσίαση: Τζιοβάρα.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Replication.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Searching in Unstructured Networks Joining Theory with P-P2P.
STAT 4060 Design and Analysis of Surveys Exam: 60% Mid Test: 20% Mini Project: 10% Continuous assessment: 10%
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Reading Report 4 Yin Chen 26 Feb 2004 Reference: Peer-to-Peer Architecture Case Study: Gnutella Network, Matei Ruoeanu, In Int. Conf. on Peer-to-Peer.
Application Layer – Peer-to-peer UIUC CS438: Communication Networks Summer 2014 Fred Douglas Slides: Fred, Kurose&Ross (sometimes edited)
1 Unstructured P2P overlay. 2 Centralized model  e.g. Napster  global index held by central authority  direct contact between requestors and providers.
DaVinci: Dynamically Adaptive Virtual Networks for a Customized Internet Jennifer Rexford Princeton University With Jiayue He, Rui Zhang-Shen, Ying Li,
Particle Filtering in Network Tomography
Yossi Azar Tel Aviv University Joint work with Ilan Cohen Serving in the Dark 1.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
CCAN: Cache-based CAN Using the Small World Model Shanghai Jiaotong University Internet Computing R&D Center.
Presentation 1 By: Hitesh Chheda 2/2/2010. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT Laboratory for Computer Science.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
DaVinci: Dynamically Adaptive Virtual Networks for a Customized Internet Jiayue He, Rui Zhang-Shen, Ying Li, Cheng-Yen Lee, Jennifer Rexford, and Mung.
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley.
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p.
Minimizing Churn in Distributed Systems P. Brighten Godfrey, Scott Shenker, and Ion Stoica UC Berkeley SIGCOMM’06.
Aug 22, 2002Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR.
Data Indexing in Peer- to-Peer DHT Networks Garces-Erice, P.A.Felber, E.W.Biersack, G.Urvoy-Keller, K.W.Ross ICDCS 2004.
1 Improve search in unstructured P2P overlay. 2 Peer-to-peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g.,
Freenet: Anonymous Storage and Retrieval of Information
Peer-to-Peer Video Systems: Storage Management CS587x Lecture Department of Computer Science Iowa State University.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
P2P Networking: Freenet Adriane Lau November 9, 2004 MIE456F.
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
-1/16- Maximum Battery Life Routing to Support Ubiquitous Mobile Computing in Wireless Ad Hoc Networks C.-K. Toh, Georgia Institute of Technology IEEE.
1 “Hybrid Search Schemes for Unstructured Peer- to-Peer Networks” “Random Walks in Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Distributed Hash Tables (DHT) Jukka K. Nurminen *Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)
Multiway Search Trees Data may not fit into main memory
Peer-to-Peer Video Services
Presentation transcript:

Replication Strategies in Unstructured Peer-to-Peer Networks Edith CohenScott Shenker Some slides are taken from the authors’ original presentation

Search strategies in P2P Networks Search is performed by probing peers –Structured (DHTs): (Freenet, Chord,etc) location is coupled with topology - search is routed by the query. Only exact-match queries, tightly controlled overlay. –Unstructured: (Gnutella, FastTrack); search is “blind” - probed peers are unrelated to query. Resilient to transient peers; versatile queries

Replication in P2P architectures No proactive replication (Gnutella) –Hosts store and serve only what they requested –A copy can be found only by probing a host with a copy Proactive replication of “keys” (= meta data + pointer) for search efficiency (FastTrack, some DHTs) Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)

How to use replication to improve search efficiency in unstructured networks with a proactive replication mechanism ? The Focus of this paper

Search and replication model Search: probe hosts, uniformly at random, until the query is satisfied (or the search max size is exceeded) Goal: minimize average search size (number of probes till query is satisfied) Replication: Each host can store up to r copies (or keys = metadata + pointer) of items. Unstructured networks with replication of keys or copies. Peers probed are unrelated to query/item. Success likelihood can not be better, on average, than random probes.

What is the search size of a query ? Insoluble queries: maximum search size Soluble queries: number of visited nodes until answer is found. We look at the Expected Search Size (ESS) of each item. The ESS is inversely proportional to the replication factor (fraction of peers with a copy of the item). Search size Query is soluble if there are sufficiently many copies of the item. Query is insoluble if item is rare or non existent.

Search Example 2 probes4 probes Looking for the green object Random walk is sometimes a popular tool

Expected Search Size (ESS) Allocation : p 1, p 2, p 3,…, p m ∑ i p i = 1 i th item is allocated p i fraction of storage. (keys placed in p i r fraction of hosts) Consider m items with relative query rates q 1 > q 2 > q 3 > … > q m. ∑ i q i = 1 Search size for i th item is a geometric random variable with mean Ai = 1/(ρ p i ). ESS is ∑ i q i A i = ( ∑ i q i / p i )/ρ

Model and notations

What is an allocation? Allocation means how many copies of each object will be made. A natural question is to relate this to the query rate. So Allocation:

Uniform and Proportional Replication Uniform Allocation: p i = 1/m Simple, resources are divided equally Proportional Allocation: p i = q i “Fair”, resources per item proportional to demand Reflects current P2P practices Example: 3 items, q 1 =1/2, q 2 =1/3, q 3 =1/6 UniformProportional

Basic Questions How do Uniform and Proportional allocations perform/compare ? Which strategy minimizes the Expected Search Size (ESS) ? Is there a simple protocol that achieves optimal replication in unstructured P2P networks ?

Insoluble queries Search always extends to the maximum limit size. If we fix the available storage for copies, the query rate distribution, and the number of items that we wish to be “locatable”, then The maximum required search size depends on the smallest allocation of an item. Thus, This means that uniform allocation minimizes this maximum and thus the cost induced by insoluble queries.

Soluble queries What about the cost of soluble queries? Answer is more surprising …

Soluble queries ESS of an object is derived from a geometric distribution. ESS of an object is inversely proportional to population density of that object. Thus ESS ∝ 1/p i. ρ

ESS under Uniform and Proportional Allocations (soluble queries) Lemma: The ESS under either Uniform or Proportional allocations is m/ρ –Independent of query rates (!!!) –Same ESS for Proportional and Uniform (!!!) Proportional: ESS is ( Σ i q i / p i )/ρ = ( Σ i q i / q i )/ρ = m/ρ Uniform: ESS is ( Σ i q i / p i )/ρ = ( Σ i m q i )/ρ = (m/ρ) Σ i q i = m/ρ Proof…

Space of Possible Allocations Proportional Allocation means q i+1 /q i = p i+1 /p i (“fair share”) Uniform Allocation means p i+1 /p i =1 In-between allocation is: (q i+1 /q i < p i+1 /p i ) and (p i+1 /p i <1) (less popular gets more than its “fair share”) Other possibilities: p i+1 /p i > 1 (more popular gets less) OR q i+1 /q i > p i+1 /p i (less popular gets less than “fair share”)

Space of Possible Allocations Theorem1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional Theorem2: p is worse than Uniform/Proportional if for all i, p i+1 /p i > 1 (more popular gets less) OR for all i, q i+1 /q i > p i+1 /p i (less popular gets less than “fair share”) Proportional and Uniform are the worst “reasonable” strategies (!!!)

So, what is the best strategy for soluble queries ?

Square-Root Allocation Consider a pair of items i, j: q i > q j; and parameterize the allocations by a variable x (0 ≤ x <1) pi = x (pi + pj), pj = (1-x) (pi + pj) Proportional => pi = qi, so x= qi / (qi + qj) Uniform => pi=pj= ½ ESS ∝ qi/x + qj/(1-x) [inversely proportional to population] (convex function) ESS equal for proportional & uniform ESS is minimum when x= sqrt(qi)/(sqrt(qi) + sqrt(qj)) So, pi ∝ sqrt(qi)

Square-Root Allocation p i is proportional to square-root(q i ) Lies “In-between” Uniform and Proportional Theorem: Square-Root allocation minimizes the ESS (on soluble queries)

q2/q1 p2/p1 Space of allocations on 2 items Worse than prop/uni More popular item gets less. Worse than prop/uni More popular gets more than its proportional share Better than prop/uni Uniform Proportional SQUARE R00T

How much can we gain by using SR ? Zipf-like query rates

SR is best for soluble queries Uniform minimizes cost of insoluble queries OPT is a hybrid of Uniform and SR Tuned to balance cost of soluble and insoluble queries. What is the optimal strategy?

UniformSR 10^4 items, Zipf-like w=1.5 All Soluble 85% Soluble All Insoluble

Replication Algorithms Fully distributed where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search. Converge to/obtain SR allocation when query rates remain steady. Uniform and Proportional are “easy” – Uniform: When item is created, replicate its key in a fixed number of hosts. – Proportional: for each query, replicate the key in a fixed number of hosts Desired properties of algorithm:

Model for Copy Creation/Deletion Creation: after a successful search of s, Cs new copies are created at random hosts. Deletion: is independent of the identity of the item; copy survival chances are non-decreasing with age. (i.e., FIFO at each node, per a TTL) average value of C used to replicate i th item. Claim: If / remains fixed over time, then p i /p j  q i /q j Property of the process:

Creation/Deletion Process If then Corollary: Algorithm for square-root allocation needs to have equal to or converge to a value inversely proportional to

SR Replication Algorithms Path replication: number of new copies C(s) is proportional to the size of the search (Freenet) –Converges to SR allocation (+reasonable conditions) –Convergence unstable with delayed creations Sibling memory: each copy remembers the number of sibling copies, –Quickly “on target” –For “good estimates” need to find several copies.

Algorithm 1: Path Replication Number of new copies produced per query,, is proportional to search size 1/p i Creation rate is proportional to q i Steady state: creation rate proportional to allocation p i, thus

Simulation Path replication Sibling number Hosts with copy time Delay = 0.25 * copy lifetime; hosts In this simulation there is delay of 25 time units in copy creation; the copy lifetime is 100 time units; and the inter-request time is 2.

Summary Random Search/replication Model: probes to “random” hosts Proportional allocation – current practice Uniform allocation – best for insoluble queries Soluble queries: Proportional and Uniform allocations are two extremes with same average performance Square-Root allocation minimizes Average Search Size OPT (all queries) lies between SR and Uniform SR/OPT allocation can be realized by simple algorithms.