1 Unstructured P2P overlay

2 P2P Application
Centralized model
- e.g. Napster
- global index held by a central authority
- direct contact between requestors and providers
Decentralized model
- e.g. Gnutella, Freenet, Chord
- no global index: local knowledge only (approximate answers)
- contact mediated by a chain of intermediaries

3 Gnutella Search Mechanism
- Each peer keeps a list of the other peers it knows about: its neighbors.
- Increasing the degree of the peers shortens the longest path from one peer to another but requires more storage at each peer.
- Once a peer is connected to the overlay, it can exchange messages with the peers in its neighbor list.

4-10 Gnutella Search Mechanism (animation build)
Steps:
- Node 2 initiates a search for file A.
- It sends the query message to all of its neighbors.
- The neighbors forward the message.
- Nodes that have file A initiate a reply message.
- The query reply message is back-propagated.
- File download.
[Figure: overlay graph in which the query for A spreads outward from node 2; the replies are labelled A:5 and A:7.]

11 Scalability
Whenever a node receives a message (ping or query), it sends copies to all of its other connections.
Existing mechanisms to reduce traffic:
- a TTL counter
- nodes cache information about the messages they have received, so that they do not forward duplicates

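12 Gnutella Search Mechanism
FloodForward(Query q, Source p)
    if (q.id ∈ oldIdsQ) return           // duplicate: this query was already seen
    oldIdsQ = oldIdsQ ∪ {q.id}
    q.TTL = q.TTL - 1
    if (q.TTL <= 0) return               // TTL expired: stop forwarding
    foreach (s ∈ Neighbors)
        if (s <> p) send(s, q)           // never send back to the sender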
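The forwarding rule above can be tried out on a toy overlay. The following is a minimal Python sketch, not Gnutella's actual implementation: the graph, the `holders` set, and the function name `flood` are all hypothetical, and `ttl` is taken to mean the maximum number of overlay hops a query may travel.

```python
from collections import deque

def flood(graph, source, ttl, holders):
    """Flood a query from `source`; return (nodes that answered, messages sent)."""
    seen = {source}                       # nodes that already saw this query id
    queue = deque([(source, None, ttl)])  # (node, sender, remaining TTL)
    hits, messages = set(), 0
    while queue:
        node, sender, t = queue.popleft()
        if node in holders and node != source:
            hits.add(node)                # this node would send a reply
        if t <= 0:
            continue                      # TTL expired: do not forward
        for nb in graph[node]:
            if nb == sender:
                continue                  # never send back to the sender
            messages += 1                 # the copy is sent even if nb drops it
            if nb not in seen:            # duplicate suppression at the receiver
                seen.add(nb)
                queue.append((nb, node, t - 1))
    return hits, messages

# Hypothetical 7-node overlay; nodes 5 and 7 hold the requested file.
overlay = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6], 4: [2, 7],
           5: [2, 7], 6: [3], 7: [4, 5]}
print(flood(overlay, source=2, ttl=2, holders={5, 7}))
```

Counting a message even when the receiver drops it as a duplicate is deliberate: those wasted copies are part of the traffic cost that the scalability slides discuss.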

13 Total Generated Traffic
Ripeanu has determined that Gnutella traffic totals 1 Gbps (or 330 TB/month)!
- Compare with 15,000 TB/month on the US Internet backbone (Dec. 2000).
- This estimate excludes the actual file transfers.
Reasoning:
- QUERY and PING messages are flooded; they form more than 90% of the generated traffic.
- The predominant TTL is 7.

14 Mapping between Gnutella Network and Internet Infrastructure
[Figure: overlay nodes A-H placed on the underlying network; perfect mapping.]

15 Mismatch between Gnutella Network and Internet Infrastructure
[Figure: the same nodes A-H with an inefficient mapping.]
Inefficient mapping: link D-E needs to support six times higher traffic.

16 Free Riding on Gnutella
- 70% of Gnutella users share no files.
- 90% of users answer no queries.
- Those who have files to share may limit the number of connections or the upload speed, resulting in a high download failure rate.
- If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.

17 Query Expressiveness
- There is no standard format or matching semantics for the QUERY string; its interpretation is determined entirely by each node that receives it.
- A node may treat the string as a literal or as a regular expression.
- A node may match it against the directory name, the filename, or the file contents.
- Malicious users may even return files unrelated to the query.

18 Conclusions
- Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work.
- Freedom
- High network traffic cost
- Scalability
- File availability

19 Random Walk
To avoid the message overhead of flooding, unstructured overlays can use some type of random walk.
- A single query message is sent to a randomly selected neighbor.
- The message has a TTL that is decremented at each hop.
- Termination: the query locates a node with the desired object, or the search times out.

20 Random Walk
To improve the response time, several random-walk queries can be issued in parallel.
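A minimal Python sketch of a search with one or more parallel random walkers follows. It is an illustration under assumptions: the name `random_walk_search`, the round-based scheduling of the walkers, and the injectable `rng` are all hypothetical and not taken from any particular protocol.

```python
import random

def random_walk_search(graph, source, holders, walkers=1, ttl=32, rng=None):
    """Advance `walkers` random walks in lock-step for up to `ttl` hops each.

    Returns (node holding the object or None, total messages sent)."""
    rng = rng or random.Random()
    messages = 0
    positions = [source] * walkers
    for _ in range(ttl):                     # TTL is decremented once per hop
        for i, node in enumerate(positions):
            nxt = rng.choice(graph[node])    # forward to a random neighbor
            messages += 1
            if nxt in holders:
                return nxt, messages         # termination: object located
            positions[i] = nxt
    return None, messages                    # termination: search timed out
```

With `walkers=1` the cost is at most `ttl` messages; with k walkers it is at most k times that, which is the price paid for the shorter response time.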

21 Some References
[1] Eytan Adar and Bernardo A. Huberman, "Free Riding on Gnutella"
[2] Igor Ivkovic, "Improving Gnutella Protocol: Protocol Analysis And Research Proposals"
[3] Jordan Ritter, "Why Gnutella Can't Scale. No, Really."
[4] Matei Ripeanu, "Peer-to-Peer Architecture Case Study: Gnutella Network"
[5] "The Gnutella Protocol Specification v0.4"

22 Improving on Flooding and Random Walk

23 Peer-to-peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g., music, videos, etc.)

24 Overviews
Disadvantages
- Flooding: does not scale
- Random walks: take a long time to find an object
Key ideas to improve performance
- Query forwarding criteria
- Overlay topology
- Object placement

25 Overviews
Query forwarding
- Use additional knowledge about where the object is likely to be.
Overlay topology
- Proximity of the peers in the network
- Connecting with high-degree nodes
- Shared properties of the peers
Object placement
- Object popularity

26 Overviews
Metrics
- Overlay hop (hop): one overlay hop may correspond to many network hops!
- Request hit rate
- Latency

27 Topics
Search strategies
- Beverly Yang and Hector Garcia-Molina, “Improving Search in Peer-to-Peer Networks”, ICDCS 2002
- Arturo Crespo and Hector Garcia-Molina, “Routing Indices For Peer-to-Peer Systems”, ICDCS 2002
Shortcuts
- Kunwadee Sripanidkulchai, Bruce Maggs and Hui Zhang, “Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems”, INFOCOM 2003
Replication
- Edith Cohen and Scott Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”, SIGCOMM 2002

28 Improving Search in Peer-to-Peer Networks
ICDCS 2002
Beverly Yang, Hector Garcia-Molina

29 Motivation
The purpose of a data-sharing P2P system is to accept queries from users, and to locate and return data (or pointers to the data).
Metrics
- Cost
  - Average aggregate bandwidth
  - Average aggregate processing cost
- Quality of results
  - Number of results
  - Satisfaction: a query is satisfied if Z or more results are returned, where Z is a value specified by the user.
  - Time to satisfaction

30 Current Techniques
Gnutella
- BFS with depth limit D
- Wastes bandwidth and processing resources
Freenet
- DFS with depth limit D
- Poor response time

31 Broadcast policies Iterative deepening (Expanding ring) Directed BFS

32 Iterative Deepening
In systems where satisfaction is the metric of choice, iterative deepening is a good technique.
Under a policy P = {a, b, c} and a waiting time W:
- The source node S first initiates a BFS of depth a.
- The query is processed and then becomes frozen at all nodes that are a hops from the source.
- S waits for a time period W.

33 Iterative Deepening
- If the query is not satisfied, S starts the next iteration, initiating a BFS of depth b.
- S sends a “Resend” message with a TTL of a.
- A node that receives a Resend message simply forwards it, unless the node is at depth a; in that case it drops the Resend message and unfreezes the corresponding query by forwarding the query message with a TTL of b - a to all its neighbors.
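The effect of a policy such as {a, b, c} can be sketched in Python as below. This is a simplified, synchronous model under assumptions: the waiting time W and the Resend messages are not simulated, the policy is assumed to be strictly increasing, and the helper names `nodes_by_depth` and `iterative_deepening` are hypothetical. Freezing is modelled by never re-processing a depth that an earlier iteration already covered.

```python
def nodes_by_depth(graph, source, max_depth):
    """BFS layers: depth -> nodes first reached at exactly that depth."""
    layers, seen, frontier = {0: [source]}, {source}, [source]
    for d in range(1, max_depth + 1):
        nxt = []
        for u in frontier:
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        layers[d] = nxt
        frontier = nxt
    return layers

def iterative_deepening(graph, source, holders, policy, z):
    """Deepen the search step by step until at least z results are found.

    Returns (results, depth at which the search stopped)."""
    layers = nodes_by_depth(graph, source, policy[-1])
    results, processed_to = set(), 0
    for depth in policy:
        # "Unfreeze": only the depths beyond the previous iteration are new.
        for d in range(processed_to + 1, depth + 1):
            results.update(n for n in layers[d] if n in holders)
        processed_to = depth
        if len(results) >= z:          # query satisfied: stop deepening
            return results, depth
    return results, policy[-1]
```

A query that is satisfied at depth a never pays for the deeper, more expensive rings, which is where the savings over a single BFS of depth c come from.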

34 Directed BFS
If minimizing response time is important to an application, iterative deepening may not be appropriate.
- The source sends query messages to just a subset of its neighbors.
- A node maintains simple statistics on its neighbors:
  - number of results received from each neighbor
  - latency of the connection

35 Directed BFS (cont.)
Candidate neighbors:
- the neighbor that has returned the highest number of results
- the neighbor whose response messages have taken the lowest average number of hops
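Choosing the subset of neighbors from per-neighbor statistics might look like the Python sketch below. The `NeighborStats` record and the `pick_neighbors` helper are hypothetical names introduced for illustration; only two of the heuristics named on the slides are shown.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    results: int = 0      # results received through this neighbor
    hops_total: int = 0   # sum of hop counts over its response messages
    responses: int = 0    # number of response messages seen

    def avg_hops(self):
        # Neighbors with no responses yet rank last.
        return self.hops_total / self.responses if self.responses else float("inf")

def pick_neighbors(stats, k, by="results"):
    """Return the k best neighbors under the chosen heuristic."""
    if by == "results":
        key = lambda n: -stats[n].results     # most results first
    else:
        key = lambda n: stats[n].avg_hops()   # fewest average hops first
    return sorted(stats, key=key)[:k]
```

Only the selected neighbors receive the query from the source; the nodes further along would then continue with a normal BFS.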

36 Routing Indices For Peer-to-Peer Systems Arturo Crespo, Hector Garcia-Molina Stanford University

37 Motivation
A distributed-index mechanism:
- Routing Indices (RIs)
- They give a “direction” towards the document, rather than its actual location.
- By using “routes”, the index size is proportional to the number of neighbors.

38 Peer-to-peer Systems
A P2P system is formed by a large number of nodes that can join or leave the system at any time.
- Each node has a local document database that can be accessed through a local index.
- The local index receives content queries and returns pointers to the documents with the requested content.

39 Routing Indices
- The objective of a Routing Index (RI) is to allow a node to select the “best” neighbors to send a query to.
- An RI is a data structure that, given a query, returns a list of neighbors ranked according to their goodness for the query.
- Each node has a local index for quickly finding local documents when a query is received.
- Nodes also have a compound RI (CRI) containing
  - the number of documents along each path
  - the number of documents on each topic

40 Routing Indices (cont.)
Thus, the number of results in a path can be estimated as
$\text{NumberOfDocuments} \times \prod_i \frac{\text{CRI}(s_i)}{\text{NumberOfDocuments}}$,
where $\text{CRI}(s_i)$ is the path's document count for query topic $s_i$.
Example: search for documents that contain (DB and L)
- Goodness of B: (20/100) * (30/100) * 100 = 6
- Goodness of C: (0/1000) * (50/1000) * 1000 = 0
- Goodness of D: (100/200) * (150/200) * 200 = 75
Note that these numbers are just estimates and are subject to overcounts and/or undercounts.
A limitation of using CRIs is that they do not take into account the difference in cost due to the number of “hops” necessary to reach a document.
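The three goodness values in the example can be reproduced with a few lines of Python. This sketch assumes the estimator multiplies the path's total document count by the per-topic fractions, which is what the B, C and D numbers imply; the dictionary layout and the function name `goodness` are hypothetical.

```python
def goodness(total_docs, topic_counts, query_topics):
    """Estimate how many documents on a path match ALL query topics."""
    if total_docs == 0:
        return 0.0
    estimate = float(total_docs)
    for topic in query_topics:
        # Fraction of the path's documents that are on this topic.
        estimate *= topic_counts.get(topic, 0) / total_docs
    return estimate

# CRI rows from the example: neighbor -> (documents on path, count per topic)
cri = {
    "B": (100, {"DB": 20, "L": 30}),
    "C": (1000, {"DB": 0, "L": 50}),
    "D": (200, {"DB": 100, "L": 150}),
}
query = ["DB", "L"]
ranked = sorted(cri, key=lambda n: goodness(*cri[n], query), reverse=True)
print(ranked)  # neighbors ordered from most to least promising
```

Multiplying the fractions treats the topics as independent, which is one source of the overcounts and undercounts mentioned above.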

41 Using Routing Indices

42 Using Routing Indices (cont.)
Let t be the counter size in bytes, c the number of categories, N the number of nodes, and b the branching factor.
- A centralized index would require t × (c + 1) × N bytes.
- The total for the entire distributed system is t × (c + 1) × b × N bytes.
- Although the RIs require more storage space overall than a centralized index, the cost of that storage is shared among the network nodes.

43 Creating Routing Indices

44 Maintaining Routing Indices
- Maintaining RIs is identical to the process used for creating them.
- For efficiency, we may delay exporting an update for a short time so that several updates can be batched, trading RI freshness for a reduced update cost.

45 Hop-count Routing Indices

46 Hop-count Routing Indices (cont.)
- The estimator of a hop-count RI needs a cost model to compute the goodness of a neighbor.
- We assume that document results are uniformly distributed across the network and that the network is a regular tree with fanout F.
- We define the goodness ($\text{goodness}_{hc}$) of Neighbor$_i$ with respect to query Q for a hop-count RI with horizon h as
  $\text{goodness}_{hc}(\text{Neighbor}_i, Q) = \sum_{j=1}^{h} \frac{\text{goodness}(N_i[j], Q)}{F^{j-1}}$,
  where $N_i[j]$ is the RI entry for documents j hops away through Neighbor$_i$.
- If we assume F = 3, the goodness of X for a query about “DB” documents would be 13 + 10/3 = 16.33, and for Y it would be 0 + 31/3 = 10.33.
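The F = 3 example can be checked with a short Python sketch. It assumes, as the two sums for X and Y suggest, that documents one hop away count in full and documents j hops away are divided by F^(j-1); the function name `goodness_hc` and the list layout are hypothetical.

```python
def goodness_hc(docs_per_hop, fanout):
    """Hop-count goodness: docs_per_hop[0] is the count at 1 hop, [1] at 2 hops, ..."""
    # enumerate() starts at 0, so index j here corresponds to the exponent (hop - 1).
    return sum(docs / fanout ** j for j, docs in enumerate(docs_per_hop))

x = goodness_hc([13, 10], fanout=3)   # X: 13 DB documents at 1 hop, 10 at 2 hops
y = goodness_hc([0, 31], fanout=3)    # Y: 0 at 1 hop, 31 at 2 hops
print(round(x, 2), round(y, 2))
```

Y reaches more documents in total (31 against 23), yet X ranks higher because its documents are closer; that is the hop cost a plain CRI ignores.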

47 Exponentially Aggregated RI
Each entry of the ERI for node N contains a value computed as
$\sum_{j=1}^{th} \frac{\text{goodness}(N[j], T)}{F^{j-1}}$,
where th is the height and F the fanout of the assumed regular tree, goodness() is the compound RI estimator, N[j] is the summary of the local index of neighbor j of N, and T is the topic of interest of the entry.

48 Exponentially aggregated RI (cont.)

49 Cycles in the P2P Network

50 Cycles in the P2P Network
There are three general approaches for dealing with cycles:
- No-op solution: no changes are made to the algorithms. This only works with the hop-count and the exponential RI schemes.
  - Hop-count RI: cycles longer than the horizon will not affect the RI; shorter cycles, however, will.
  - Exponential RI: updates are sent back to the originator, but the effect of the cycle becomes smaller every time the update is sent back (due to the exponential decay).

51 Cycles in the P2P Network
- Cycle avoidance solution: do not allow a node to create an “update” connection to another node if that connection would create a cycle. Obstacle: absence of global information.
- Cycle detection and recovery: this solution detects cycles some time after they are formed and then takes recovery actions to eliminate their effect. Cycles can be detected by having the originating node of a query or an update, say A, include a unique message identifier in the message.

52 Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems

53 Background
- Each peer is connected randomly, and searching is done by flooding.
- Keyword search is supported.
[Figure: example of searching for an mp3 file in the Gnutella network; the query is flooded across the network.]

54 Background
DHT (Chord):
- Given a key, Chord maps the key to a node.
- Each node needs to maintain O(log N) information.
- Each query uses O(log N) messages.
- Key search means searching by exact name.
[Figure: a Chord ring with about 50 nodes. The black lines point to adjacent nodes, while the red lines are “finger” pointers that allow a node to find a key in O(log N) time.]

55 Interest-based Locality
- Peers that have similar interests will share similar content.

56 Architecture Shortcuts are modular. Shortcuts are performance enhancement hints.

57 Creation of Shortcuts
- The peer uses the underlying topology (e.g. Gnutella) for the first few searches.
- One of the peers returned by a search is selected at random and added to the shortcut list.
- The shortcuts are ordered by a metric, e.g. success rate or path latency.
- Subsequent queries go through the shortcut list first.
- If that fails, the lookup falls back to the underlying topology.
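The shortcut procedure can be sketched as a thin layer in front of whatever search the underlying overlay provides. Everything named here is an assumption made for illustration: the class `ShortcutPeer`, the `fallback_search` and `has_item` callables, ranking by success rate only, and the choice to stop adding shortcuts once the list is full instead of evicting the worst one.

```python
import random

class ShortcutPeer:
    def __init__(self, fallback_search, max_shortcuts=10, rng=None):
        self.fallback_search = fallback_search  # item -> list of peers holding it
        self.max_shortcuts = max_shortcuts
        self.rng = rng or random.Random()
        self.stats = {}                         # peer -> [successes, attempts]

    def ranked_shortcuts(self):
        # Order shortcuts by observed success rate, best first.
        rate = lambda p: self.stats[p][0] / max(self.stats[p][1], 1)
        return sorted(self.stats, key=rate, reverse=True)

    def lookup(self, item, has_item):
        # 1. Ask the shortcuts first.
        for peer in self.ranked_shortcuts():
            self.stats[peer][1] += 1
            if has_item(peer, item):
                self.stats[peer][0] += 1
                return peer, "shortcut"
        # 2. Fall back to the underlying topology (e.g. flooding).
        providers = self.fallback_search(item)
        if not providers:
            return None, "miss"
        # 3. Remember one of the returned peers, chosen at random.
        chosen = self.rng.choice(providers)
        if chosen not in self.stats and len(self.stats) < self.max_shortcuts:
            self.stats[chosen] = [1, 1]
        return chosen, "flood"
```

Because the shortcut layer only wraps `fallback_search`, it can sit on top of any overlay that can answer a query, which is the sense in which shortcuts are modular.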

58 Methodology – query workload
Traffic traces are created from real application traffic:
- Boeing firewall proxies
- Microsoft firewall proxies
- Web traffic between CMU and the Internet, collected passively
- Typical P2P traffic (KaZaA, Gnutella), collected passively
The simulation uses exact matching rather than keyword matching.
- “song.mp3” and “my artist – song.mp3” are treated as different.

59 Methodology – underlying peer topology
- Based on the Gnutella connectivity graph in 2001, with 95% of nodes about 7 hops away.
- The search TTL is set to 7.
- For each kind of traffic (Boeing, Microsoft, etc.), the simulation is run 8 times, each for 1 hour.

60 Simulation Results – success rate

61 Enhancement of Interest-based Locality: Using Shortcuts’ Shortcuts
- Idea: add the shortcut’s shortcuts.
- Performance gain of 7% on average.

62 Interest-based Structures
When viewed as an undirected graph:
- In the first 10 minutes there are many connected components, each with only a few peers.
- At the end of the simulation there are few connected components, each with several hundred peers, and each component is well connected.
- The clustering coefficient is about 0.6 to 0.7.

63 Conclusion
- Interest-based shortcuts are modular, performance-enhancing hints layered over an existing P2P topology.
- Shortcuts can improve search efficiency.
- Shortcuts form clusters within a P2P topology, and the clusters are well connected.

64 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR

65 (Replication in) P2P Architectures
No proactive replication (Gnutella)
- Hosts store and serve only what they requested.

66 Question: how can replication be used to improve search efficiency in unstructured networks with a proactive replication mechanism?

67 Search and Replication Model
Unstructured networks with replication of keys or copies. Peers probed (in the search and replication process) are unrelated to the query/item.
- Search: probe hosts, uniformly at random, until the query is satisfied (or the maximum search size is exceeded).
- Replication: each host can store up to $\rho$ copies of items.
- Goal: minimize the average search size (number of probes until the query is satisfied).

68 Search Example
[Figure: two example searches, one satisfied after 2 probes and one after 4 probes.]

69 What is the search size of a query?
- Soluble queries: the search size is the number of probes until an answer is found.
- We look at the Expected Search Size (ESS) of each item. The ESS is inversely proportional to the fraction of peers with a copy of the item.

70 Expected Search Size (ESS)
- $n$ nodes, each with capacity $\rho$; total storage $R = n\rho$.
- $m$ items with relative query rates $q_1 > q_2 > q_3 > \dots > q_m$, $\sum_i q_i = 1$.
- $r_i$ = number of copies of the $i$-th item.
- Allocation: $p_1\,(= r_1/R), p_2, p_3, \dots, p_m$ with $\sum_i p_i = 1$; the $i$-th item is allocated a fraction $p_i$ of the storage.
- The search size for the $i$-th item is a geometric random variable with mean $A_i = n/(p_i R) = 1/(\rho p_i)$.
- The ESS is $\sum_i q_i A_i = \left(\sum_i q_i/p_i\right)/\rho$.

71 Uniform and Proportional Replication
Two natural strategies:
- Uniform allocation: $p_i = 1/m$ ($m$ items). Simple; resources are divided equally.
- Proportional allocation: $p_i = q_i$. “Fair”; resources per item are proportional to demand. Reflects current P2P practices.
Example: 3 items, $q_1 = 1/2$, $q_2 = 1/3$, $q_3 = 1/6$.
[Figure: the Uniform and Proportional allocations for this example.]

72 Basic Questions How do Uniform and Proportional allocations perform/compare ? Which strategy minimizes the Expected Search Size (ESS) ? Is there a simple protocol that achieves optimal replication in decentralized unstructured networks ?

73 ESS under Uniform and Proportional Allocations (soluble queries)
Lemma: the ESS under either the Uniform or the Proportional allocation is $m/\rho$.
- Independent of the query rates (!!!)
- Same ESS for Proportional and Uniform (!!!)
Proof:
- Proportional ($p_i = q_i$): the ESS is $\left(\sum_i q_i/p_i\right)/\rho = \left(\sum_i q_i/q_i\right)/\rho = m/\rho$.
- Uniform ($p_i = (R/m)/R = 1/m$): the ESS is $\left(\sum_i q_i/p_i\right)/\rho = \left(\sum_i m\,q_i\right)/\rho = (m/\rho)\sum_i q_i = m/\rho$.
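The lemma can be checked numerically on the three-item example with query rates 1/2, 1/3 and 1/6. The Python sketch below uses exact fractions; `rho` stands for the per-node capacity and is set to 1 purely for illustration, and the `in_between` allocation is a hypothetical one picked to lie strictly between the two strategies.

```python
from fractions import Fraction as Fr

def ess(q, p, rho):
    """Expected search size: (sum_i q_i / p_i) / rho."""
    return sum(qi / pi for qi, pi in zip(q, p)) / rho

q = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]        # query rates from the example
m, rho = len(q), 1                        # rho = 1 is an assumption
uniform = [Fr(1, m)] * m                  # p_i = 1/m
proportional = list(q)                    # p_i = q_i
in_between = [Fr(5, 12), Fr(1, 3), Fr(1, 4)]  # hypothetical in-between allocation

for name, p in [("uniform", uniform), ("proportional", proportional),
                ("in-between", in_between)]:
    print(name, ess(q, p, rho))
```

Both named strategies give exactly m/rho = 3, while the in-between allocation gives 43/15, which is smaller.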

74 Space of Possible Allocations
Definition: an allocation $p_1, p_2, p_3, \dots, p_m$ is “in-between” Uniform and Proportional if, for $1 \le i < m$, $q_{i+1}/q_i < p_{i+1}/p_i < 1$.
Theorem 1: all (strictly) in-between strategies are (strictly) better than Uniform and Proportional.
Theorem 2: $p$ is worse than Uniform/Proportional if
- for all $i$, $p_{i+1}/p_i > 1$ (more popular gets less), or
- for all $i$, $q_{i+1}/q_i > p_{i+1}/p_i$ (less popular gets less than its “fair share”).
Proportional and Uniform are the worst “reasonable” strategies (!!!)

75 So, what is the best strategy for soluble queries ?

76 Square-Root Allocation
- $p_i$ is proportional to $\sqrt{q_i}$.
- Lies “in-between” Uniform and Proportional.
- Theorem: Square-Root allocation minimizes the ESS (on soluble queries).
- Minimize $\sum_i q_i/p_i$ such that $\sum_i p_i = 1$.
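One way to see why the square root appears is a standard Lagrange-multiplier argument. The derivation below is a sketch and not necessarily the proof used by the authors; $\lambda$ is the multiplier for the constraint and $\rho$ denotes the per-node capacity.

```latex
\begin{aligned}
\mathcal{L}(p,\lambda) &= \sum_i \frac{q_i}{p_i} + \lambda\Big(\sum_i p_i - 1\Big) \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\frac{q_i}{p_i^{2}} + \lambda = 0
  \;\Longrightarrow\; p_i = \sqrt{q_i/\lambda} \\
\sum_i p_i = 1 &\;\Longrightarrow\; p_i = \frac{\sqrt{q_i}}{\sum_j \sqrt{q_j}} \\
\text{ESS} &= \frac{1}{\rho}\sum_i \frac{q_i}{p_i}
           = \frac{1}{\rho}\Big(\sum_i \sqrt{q_i}\Big)^{2}
\end{aligned}
```

Each term $q_i/p_i$ is convex for $p_i > 0$, so this stationary point is the minimum. For query rates 1/2, 1/3 and 1/6 with $\rho = 1$ it gives an ESS of about 2.865, below the value 3 obtained by both Uniform and Proportional.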

77 Replication Algorithms
Desired properties of the algorithm:
- Fully distributed, where peers communicate through random probes; minimal bookkeeping; no more communication than what is needed for search.
- Converges to (obtains) the SR allocation when query rates remain steady.
Uniform and Proportional are “easy”:
- Uniform: when an item is created, replicate its key in a fixed number of hosts.
- Proportional: for each query, replicate the key in a fixed number of hosts.

78 Model for Copy Creation/Deletion
- Creation: after a successful search, C(s) new copies are created at random hosts.
- Deletion: is independent of the identity of the item.
- $\langle C_i \rangle$ is the average value of C used to replicate the $i$-th item.
Property of the process:
- Claim: if $\langle C_i \rangle / \langle C_j \rangle$ remains fixed over time, then $p_i/p_j \to (q_i \langle C_i \rangle)/(q_j \langle C_j \rangle)$.

79 SR Replication Algorithms
- Path replication: the number of new copies C(s) is proportional to the size of the search.
- Probe memory: each peer records the number and combined search size of the probes it sees for each item. C(s) is determined by collecting this information from a number of peers proportional to the search size. This costs extra communication (proportional to that needed for search).

80 Path Replication
- The number of new copies produced per query, $\langle C_i \rangle$, is proportional to the search size $1/p_i$.
- The creation rate is proportional to $q_i \langle C_i \rangle$.
- Steady state: the creation rate is proportional to the allocation $p_i$, thus $p_i \propto q_i \langle C_i \rangle \propto q_i/p_i$, i.e. $p_i \propto \sqrt{q_i}$.
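The steady-state argument for path replication can be illustrated with a small deterministic iteration in Python. This is a fluid-model sketch under assumptions, not the authors' simulation: in every step a fraction `turnover` of all copies is deleted regardless of item, and the same amount is re-created in proportion to q_i / p_i (query rate times a copy count proportional to the search size). The function name, `turnover` and `steps` are hypothetical.

```python
def path_replication(q, steps=500, turnover=0.1):
    """Iterate the allocation p under item-independent deletion and path replication."""
    m = len(q)
    p = [1.0 / m] * m                     # start from the uniform allocation
    for _ in range(steps):
        # Creation rate of item i: q_i queries, each creating ~1/p_i copies.
        created = [qi / pi for qi, pi in zip(q, p)]
        total = sum(created)
        # Delete a `turnover` share of every item, refill it with new copies.
        p = [(1 - turnover) * pi + turnover * ci / total
             for pi, ci in zip(p, created)]
    return p

q = [1 / 2, 1 / 3, 1 / 6]
p = path_replication(q)
norm = sum(x ** 0.5 for x in q)
print(p, [x ** 0.5 / norm for x in q])    # iterated vs. square-root allocation
```

In this model the iteration settles on the square-root allocation without any peer knowing the query rates, which matches the steady-state reasoning on the slide.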

81 Summary
- Random search/replication model: probes to “random” hosts.
- Soluble queries:
  - Proportional and Uniform allocations are two extremes with the same average performance.
  - Square-Root allocation minimizes the Average Search Size.
- OPT (all queries) lies between SR and Uniform.
- SR/OPT allocation can be realized by simple algorithms.