Aug 22, 2002Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Optimal Scheduling in Peer-to-Peer Networks Lee Center Workshop 5/19/06 Mortada Mehyar (with Prof. Steven Low, Netlab)
Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen Scott Shenker This is a modified version of the original presentation by the authors.
Peer-to-Peer (P2P) Distributed Storage 1Dennis Kafura – CS5204 – Operating Systems.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Intel Research Seattle Sylvia Ratnasamy, Lee Breslau, Scott Shenker, and Nick Lanham.
Fabian Kuhn, Microsoft Research, Silicon Valley
Technion –Israel Institute of Technology Computer Networks Laboratory A Comparison of Peer-to-Peer systems by Gomon Dmitri and Kritsmer Ilya under Roi.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Search in Power-Law Networks Presented by Hakim Weatherspoon CS294-4: Peer-to-Peer Systems Slides also borrowed from the following paper Path Finding Strategies.
Peer to Peer (P2P) Networks and File sharing. By: Ryan Farrell.
Open Problems in Data- Sharing Peer-to-Peer Systems Neil Daswani, Hector Garcia-Molina, Beverly Yang.
Peer-to-Peer Networks as a Distribution and Publishing Model Jorn De Boever (june 14, 2007)
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
FRIENDS: File Retrieval In a dEcentralized Network Distribution System Steven Huang, Kevin Li Computer Science and Engineering University of California,
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
1 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen, Scott Shenker ACM SIGCOMM Computer Communication Review, Proceedings of the.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
Adaptive Content Management in Structured P2P Communities Jussi Kangasharju Keith W. Ross David A. Turner.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao Cisco Systems, Inc. (Joint work with Christine Lv, Edith Cohen, Kai Li and Scott Shenker)
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Searching in Unstructured Networks Joining Theory with P-P2P.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
P2P File Sharing Systems
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
1 Unstructured P2P overlay. 2 Centralized model  e.g. Napster  global index held by central authority  direct contact between requestors and providers.
1 - CS7701 – Fall 2004 Review of: Making Gnutella-like P2P Systems Scalable Paper by: – Yatin Chawathe (AT&T) –Sylvia Ratnasamy (Intel) –Lee Breslau (AT&T)
Peer-to-Peer Networking. Presentation Introduction Characteristics and Challenges of Peer-to-Peer Peer-to-Peer Applications Classification of Peer-to-Peer.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Replication Strategies in Unstructured Peer-to-Peer Networks Edith CohenScott Shenker Some slides are taken from the authors’ original presentation.
PSI Peer Search Infrastructure. Introduction What are P2P Networks? The term "peer-to-peer" refers to a class of systems and applications that employ.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Minimizing Churn in Distributed Systems P. Brighten Godfrey, Scott Shenker, and Ion Stoica UC Berkeley SIGCOMM’06.
1 Improve search in unstructured P2P overlay. 2 Peer-to-peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g.,
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
Freenet: Anonymous Storage and Retrieval of Information
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
Peer-to-Peer Video Systems: Storage Management CS587x Lecture Department of Computer Science Iowa State University.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
P2P Networking: Freenet Adriane Lau November 9, 2004 MIE456F.
Peer-to-Peer (P2P) File Systems. P2P File Systems CS 5204 – Fall, Peer-to-Peer Systems Definition: “Peer-to-peer systems can be characterized as.
Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Distributed Hash Tables (DHT) Jukka K. Nurminen *Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
Peer-to-Peer and Social Networks
EE 122: Peer-to-Peer (P2P) Networks
Peer-to-Peer (P2P) File Systems
Peer-to-Peer Video Services
Peer-to-Peer Storage Systems
Presentation transcript:

Aug 22, 2002Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR

Aug 22, 2002Sigcomm 2002 Peer-to-peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g., music, videos, etc.)

Aug 22, 2002Sigcomm 2002 (Search in) Basic P2P Architectures Centralized : central directory server. (Napster) Supports versatile queries, scope, legal troubles Decentralized: search is performed by probing peers –Structured (DHTs): (Freenet, Can, Chord,…) location is coupled with topology - search is routed by the query. Scope, Only exact-match queries, tightly controlled overlay. –Unstructured: (Gnutella, FastTrack); search is “blind” - probed peers are unrelated to query. Resilient to transient peers; versatile queries; Harsh scope/scalability tradeoff.

Aug 22, 2002Sigcomm 2002 (replication in) P2P architectures No proactive replication (Gnutella) –Hosts store and serve only what they requested –A copy can be found only by probing a host with a copy Proactive replication of “keys” (= meta data + pointer) for search efficiency (FastTrack, DHTs) Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)

Aug 22, 2002Sigcomm 2002 Question: how to use replication to improve search efficiency in unstructured networks with a proactive replication mechanism ?

Aug 22, 2002Sigcomm 2002 Search and replication model Search: probe hosts, uniformly at random, until the query is satisfied (or the search max size is exceeded) Goal: minimize average search size (number of probes till query is satisfied) Replication: Each host can store up to  copies (or keys=metadata+pointer) of items. Unstructured networks with replication of keys or copies. Peers probed (in the search and replication process) are unrelated to query/item - Probe success likelihood can not be better, on average, than random probes.

Aug 22, 2002Sigcomm 2002 What is the search size of a query ? Insoluble queries: maximum search size Soluble queries: number of probes until answer is found. We look at the Expected Search Size (ESS) of each item. The ESS is inversely proportional to the fraction of peers with a copy of the item. Search size Query is soluble if there are sufficiently many copies of the item. Query is insoluble if item is rare or non existent.

Aug 22, 2002Sigcomm 2002 Search Example 2 probes4 probes

Aug 22, 2002Sigcomm 2002 Expected Search Size (ESS) Allocation : p 1, p 2, p 3,…, p m  i p i = 1 i th item is allocated p i fraction of storage. (keys placed in p i  fraction of hosts) m items with relative query rates q 1 > q 2 > q 3 > … > q m.  i q i = 1 Search size for i th item is a Geometric r.v. with mean Ai = 1/(  p i ). ESS is  i q i A i = (  i q i / p i )/ 

Aug 22, 2002Sigcomm 2002 Uniform and Proportional Replication Two natural strategies: Uniform Allocation: p i = 1/m Simple, resources are divided equally Proportional Allocation: p i = q i “Fair”, resources per item proportional to demand Reflects current P2P practices Example: 3 items, q 1 =1/2, q 2 =1/3, q 3 =1/6 UniformProportional

Aug 22, 2002Sigcomm 2002 Basic Questions How do Uniform and Proportional allocations perform/compare ? Which strategy minimizes the Expected Search Size (ESS) ? Is there a simple protocol that achieves optimal replication in decentralized unstructured networks ?

Aug 22, 2002Sigcomm 2002 Insoluble queries Search always extends to the maximum allowed search size. If we fix the available storage for copies, the query rate distribution, and the number if items that we wish to be “locatable”, then The maximum required search size depends on the smallest allocation of an item. Thus, Uniform allocation minimizes this maximum and thus the cost induced by insoluble queries. What about the cost of soluble queries? Answer is more surprising …

Aug 22, 2002Sigcomm 2002 ESS under Uniform and Proportional Allocations (soluble queries) Lemma: The ESS under either Uniform or Proportional allocations is m/  –Independent of query rates (!!!) –Same ESS for Proportional and Uniform (!!!) Proportional: ASS is (  i q i / p i )/  (  i q i / q i )/  m/  Uniform: ASS is (  i q i / p i )/  (  i m q i )/  m/   i q i  m/  Proof…

Aug 22, 2002Sigcomm 2002 Space of Possible Allocations Definition: Allocation p 1, p 2, p 3,…, p m is “in-between” Uniform and Proportional if for 1  i < m, q i+1 /q i < p i+1 /p i < 1 Theorem1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional Theorem2: p is worse than Uniform/Proportional if for all i, p i+1 /p i > 1 (more popular gets less) OR for all i, q i+1 /q i > p i+1 /p i (less popular gets less than “fair share”) Proportional and Uniform are the worst “reasonable” strategies (!!!)

Aug 22, 2002Sigcomm 2002 q2/q1 p2/p1 Space of allocations on 2 items Worse than prop/uni More popular item gets less. Worse than prop/uni More popular gets more than its proportional share Better than prop/uni Uniform Proportional SR

Aug 22, 2002Sigcomm 2002 So, what is the best strategy for soluble queries ?

Aug 22, 2002Sigcomm 2002 Square-Root Allocation p i is proportional to square-root(q i ) Lies “In-between” Uniform and Proportional Theorem: Square-Root allocation minimizes the ESS (on soluble queries) Minimize  i q i / p i such that  i p i = 1

Aug 22, 2002Sigcomm 2002 How much can we gain by using SR ? Zipf-like query rates

Aug 22, 2002Sigcomm 2002 OK SR is best for soluble queries Uniform minimizes cost of insoluble queries OPT is a hybrid of Uniform and SR Tuned to balance cost of soluble and insoluble queries. What is the optimal strategy?

Aug 22, 2002Sigcomm 2002 UniformSR 10^4 items, Zipf-like w=1.5 All Soluble 85% Soluble All Insoluble

Aug 22, 2002Sigcomm 2002 We now know what we need. How do we get there?

Aug 22, 2002Sigcomm 2002 Replication Algorithms Fully distributed where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search. Converge to/obtain SR allocation when query rates remain steady. Uniform and Proportional are “easy” :- – Uniform: When item is created, replicate its key in a fixed number of hosts. – Proportional: for each query, replicate the key in a fixed number of hosts Desired properties of algorithm:

Aug 22, 2002Sigcomm 2002 Model for Copy Creation/Deletion Creation: after a successful search, C(s) new copies are created at random hosts. Deletion: is independent of the identity of the item; copy survival chances are non- decreasing with creation time. (i.e., FIFO at each node) average value of C used to replicate i th item. Claim: If / remains fixed over time, and,  , then p i /p j  q i /q j Property of the process:

Aug 22, 2002Sigcomm 2002 Creation/Deletion Process If then Corollary: Algorithm for square-root allocation needs to have equal to or converge to a value inversely proportional to

Aug 22, 2002Sigcomm 2002 SR Replication Algorithms Path replication: number of new copies C(s) is proportional to the size of the search (Freenet) –Converges to SR allocation (+reasonable conditions) –Convergence unstable with delayed creations Sibling memory: each copy remembers the number of sibling copies, –Quickly “on target” –For “good estimates” need to find several copies. Probe memory: each peer records number and combined search size of probes it sees for each item. C(S) is determined by collecting this info from number of peers proportional to search size. –Immediately “on target” –Extra communication (proportional to that needed for search).

Aug 22, 2002Sigcomm 2002 Alg1: Path Replication Number of new copies produced per query,, is proportional to search size 1/p i Creation rate is proportional to q i Steady state: creation rate proportional to allocation p i, thus

Aug 22, 2002Sigcomm 2002 Simulation Path replication Sibling number Hosts with copy time Delay = 0.25 * copy lifetime; hosts

Aug 22, 2002Sigcomm 2002 Summary Random Search/replication Model: probes to “random” hosts Proportional allocation – current practice Uniform allocation – best for insoluble queries Soluble queries: Proportional and Uniform allocations are two extremes with same average performance Square-Root allocation minimizes Average Search Size OPT (all queries) lies between SR and Uniform SR/OPT allocation can be realized by simple algorithms.