Peer-to-Peer Networks


1 Peer-to-Peer Networks

Overlay Network A logical network laid on top of the Internet. [Figure: nodes A, B, and C attached to the Internet, joined by logical links AB and BC.]

The Formal Model Let V be a set of nodes. The function $id : V \to Z^+$ assigns a unique id to each node in V. The function $rs : V \to \{0, 1\}^*$ assigns a random bit string to each node in V (optional). A family of overlay networks is a map $ON : F \to G$, where F is the set of all triples λ = (V; id; rs) and G is the set of all directed graphs. ON assigns a unique directed graph ON(λ) ∈ G to each labeled set λ = (V; id; rs) of nodes. Each node contains one or more objects. One important objective is SEARCH: any node must be able to access any object as quickly as possible.

Structured vs. Unstructured Overlay networks
Unstructured: No restriction on network topology. Examples: Gnutella, Kazaa, BitTorrent, Skype, etc.
Structured: Network topology satisfies specific invariants. Examples: Chord, CAN, Pastry, Skip Graph, etc.

5 Gnutella The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network was spurred on by Napster's legal demise in early 2001.

6 What is Gnutella? A protocol for distributed search. No central authority. [Figure: peers holding objects such as object1 and object2.]

7 Remarks Gnutella uses the simple idea of searching by flooding, but scalability is an issue, since query flooding wastes bandwidth. Uses TTL to control flooding. Sometimes, existing objects may not be located due to limited TTL. Subsequently, various improved search strategies have been proposed.

8 Searching in Gnutella The topology is dynamic, i.e. constantly changing. How do we model a constantly changing topology? Usually, we begin with a static topology, and later account for the effect of churn. Modeling topology -- Random graph -- Power law graph (measurements provide useful inputs)

9 Random graph: Erdös-Rényi model A random graph G(n, p) is constructed by starting with a set of n vertices, and adding edges between pairs of nodes at random. Every possible edge occurs independently with probability p. Q. Is Gnutella topology a random graph?
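As a minimal sketch of the G(n, p) construction described on this slide, the snippet below flips an independent coin for every possible edge. The function names `gnp_random_graph` and `degrees` and the `seed` parameter are illustrative choices, not part of the slides.

```python
import itertools
import random


def gnp_random_graph(n, p, seed=None):
    """Return the edge set of a G(n, p) graph on vertices 0..n-1."""
    rng = random.Random(seed)
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        # Each of the n(n-1)/2 possible edges is included independently.
        if rng.random() < p:
            edges.add((u, v))
    return edges


def degrees(n, edges):
    """Degree of every vertex, useful for inspecting the degree distribution."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    n, p = 1000, 0.01
    deg = degrees(n, gnp_random_graph(n, p, seed=1))
    print("mean degree:", sum(deg) / n, "expected:", p * (n - 1))
```

Plotting a histogram of `degrees(...)` is one way to compare this model against measured Gnutella degree data.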

10 Gnutella topology Gnutella topology is almost a power-law graph (also called a scale-free graph). What is a power-law graph? The number of nodes with degree k is $c \cdot k^{-r}$. (Contrast this with an exponentially decaying distribution, where the number of nodes with degree k is $c \cdot 2^{-k}$.) Many graphs in nature exhibit power-law characteristics. Examples: in the world-wide web, the number of pages that have k in-links is proportional to $k^{-2}$; the fraction of scientific papers that receive k citations is proportional to $k^{-3}$.

11 AT&T Call Graph How many telephone numbers receive calls from k different telephone numbers? [Plot: number of telephone numbers called versus number of telephone numbers from which calls were made.]

12 Gnutella network Power-law link distribution (summer 2000, data provided by Clip2).

13 A possible explanation Nodes join at different times. The more connections a node has, the more likely it is to acquire new connections (“the rich get richer”). Popular webpages attract new pointers. It has been mathematically shown that such a growth process produces a power-law network.

14 Search strategies Flooding. Random walk: biased random walk, or multiple-walker random walk. Either can be combined with one-hop, two-hop, or k-hop replication.

15 On Random walk Let p(d) be the probability that a random walk on a d-dimensional lattice returns to the origin. In 1921, Pólya proved that (1) p(1) = p(2) = 1, but (2) p(d) < 1 for d > 2. There are similar results on two walkers meeting each other via random walk.

16 Search via random walk Existence of a path does not necessarily mean that such a path can be discovered

17 Search via Random Walk Search metrics: Delay = discovery time in hops. Overhead = total distance covered by the walker. Both should be as small as possible. For a single random walker, these are equal. K random walkers is a compromise. For search by flooding, if delay = h then overhead = $d + d^2 + \dots + d^h$, where d = degree of a node.
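To make the gap between the two overheads concrete, here is a small sketch that evaluates the flooding sum from this slide and compares it with a single walker that finds the object after the same number of hops. The function names are illustrative, and the flooding formula assumes, as the slide does, that every node has degree d.

```python
def flooding_overhead(d, h):
    """Messages sent by flooding out to depth h: d + d^2 + ... + d^h."""
    return sum(d ** i for i in range(1, h + 1))


def single_walker_overhead(h):
    """A single random walker covers exactly as many hops as its delay."""
    return h


if __name__ == "__main__":
    for h in (2, 4, 6):
        print(h, flooding_overhead(4, h), single_walker_overhead(h))
```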

18 A simple analysis of random walk Let p = population of the object, i.e. the fraction of nodes hosting the object, and T = TTL (time to live).
Hop count h | Probability of success
1 | $p$
2 | $(1-p)\,p$
3 | $(1-p)^2\,p$
T | $(1-p)^{T-1}\,p$

19 A simple analysis of random walk Expected hop count $E(h) = 1 \cdot p + 2(1-p)p + 3(1-p)^2 p + \dots + T(1-p)^{T-1} p = \frac{1}{p}\left(1 - (1-p)^T\right) - T(1-p)^T$. With a large TTL, $E(h) \approx 1/p$. With a small TTL, there is a risk that the search will time out before an existing object is located.
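The closed form above can be checked numerically. The sketch below evaluates the truncated sum term by term and compares it with the closed-form expression; the function names are illustrative.

```python
def expected_hops_sum(p, T):
    """E(h) as the explicit sum of h * (1-p)^(h-1) * p for h = 1..T."""
    return sum(h * (1 - p) ** (h - 1) * p for h in range(1, T + 1))


def expected_hops_closed(p, T):
    """E(h) from the closed form (1 - (1-p)^T) / p - T (1-p)^T."""
    q = 1 - p
    return (1 - q ** T) / p - T * q ** T


if __name__ == "__main__":
    p = 0.1
    for T in (5, 20, 200):
        print(T, expected_hops_sum(p, T), expected_hops_closed(p, T))
```

For a large TTL both values approach 1/p, which matches the remark on the slide.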

20 K random walkers As k increases, the overhead increases, but the delay decreases. There is a tradeoff. Assume that all k walkers start in unison. The probability that none finds the object after one hop is $(1-p)^k$. The probability that none succeeds after T hops is $(1-p)^{kT}$. So the probability that at least one walker succeeds is $1 - (1-p)^{kT}$. A typical assumption is that the search is abandoned as soon as at least one walker succeeds. Using these, one can derive a new value of E(h).
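A small sketch of these quantities follows. `success_probability` is the formula from the slide. `expected_hops_k` is one possible way to derive the new E(h), under the assumption that the k walkers succeed independently at each hop, so that the per-hop success probability becomes $1 - (1-p)^k$ and can be substituted into the single-walker closed form. Both function names are illustrative.

```python
def success_probability(p, k, T):
    """Probability that at least one of k walkers succeeds within T hops."""
    return 1 - (1 - p) ** (k * T)


def expected_hops_k(p, k, T):
    """Single-walker closed form with p replaced by p_k = 1 - (1-p)^k.

    Assumption (not from the slides): walkers are independent and move in
    lockstep, so each hop succeeds with probability p_k.
    """
    p_k = 1 - (1 - p) ** k
    q_k = 1 - p_k
    return (1 - q_k ** T) / p_k - T * q_k ** T


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, success_probability(0.01, k, 50), expected_hops_k(0.01, k, 1000))
```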

21 Increasing search efficiency Major strategies: 1. Biased walk utilizing node degree heterogeneity. 2. Utilizing structural properties such as random-graph, power-law, or small-world properties. 3. Topology adaptation for faster search. 4. Introducing two layers in the graph structure using supernodes.

22 One hop replication Each node keeps track of the indices of the files belonging to its immediate neighbors. As a result, high capacity / high degree nodes can provide much better clues to a large number of search queries.

23 Biased random walk Each node records the degree of the neighboring nodes. Search easily gravitates towards high degree nodes that hold more clues. [Figure: a walker choosing among three neighbors with probabilities P=5/10, P=3/10, and P=2/10.]
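One way to realize such a walk, sketched below, is to pick the next hop with probability proportional to each neighbor's degree. That proportional rule is an assumption made for illustration; the slide only gives the three probabilities. The toy graph is constructed so that the neighbors of `s` have degrees 5, 3, and 2, which reproduces 5/10, 3/10, and 2/10.

```python
import random


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def transition_probabilities(graph, node):
    """Probability of stepping to each neighbor, proportional to its degree."""
    neighbors = graph[node]
    total = sum(len(graph[v]) for v in neighbors)
    return {v: len(graph[v]) / total for v in neighbors}


def biased_step(graph, node, rng=random):
    """Take one degree-biased step from `node`."""
    probs = transition_probabilities(graph, node)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


# Hypothetical toy graph: a, b, c have degrees 5, 3, 2 (counting the edge to s).
EDGES = [("s", "a"), ("s", "b"), ("s", "c"),
         ("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "a4"),
         ("b", "b1"), ("b", "b2"),
         ("c", "c1")]

if __name__ == "__main__":
    g = build_graph(EDGES)
    print(transition_probabilities(g, "s"))
    print(biased_step(g, "s"))
```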

Deterministic biased walk [Plot: number of nodes found on a power-law graph.]

25 The next step This growing surge in popularity revealed the limits of the initial protocol's scalability. In early 2001, variations on the protocol improved the scalability. Instead of treating every node as client and server, some resource-rich nodes were used as ultrapeers or “supernodes,” containing indices of the objects in the local neighborhood. Search requests and responses were routed through them leading to faster response.

26 The KaZaA approach Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes. Two-layered architecture. [Figure: a client asks its supernode “Where is ABC?”, the query is passed between supernodes, and ABC is then downloaded.]

The Chord P2P Network Some slides have been borrowed from the original presentation by the authors

Main features of Chord -- Load balancing via consistent hashing -- Small routing tables per node: log n entries -- Small routing delay: log n hops -- Fast join/leave protocol (polylog time)

Consistent Hashing -- Assigns both nodes and objects an m-bit key. -- Orders these nodes around an identifier circle (what does a circle mean here?) according to the order of their keys ($0 \ldots 2^m - 1$). This ring is known as the Chord Ring. An object with key k is assigned to the first node whose key is ≥ k (called the successor node of key k).

Nodes and Objects on the Chord Ring A key k is stored at its successor (node with key ≥ k). [Figure: circular 7-bit ID space with nodes N32, N90, and N105 and keys K5, K20, and K80.]
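The successor rule can be sketched in a few lines. The snippet below uses the node and key IDs from the ring on this slide; the function name `successor` and the use of a sorted list with binary search are illustrative choices, and hashing of names to IDs is left out.

```python
import bisect


def successor(node_ids, key, m=7):
    """First node whose ID is >= key on the 2^m identifier circle."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key % (2 ** m))
    # Keys larger than every node ID wrap around to the smallest node.
    return ring[i % len(ring)]


if __name__ == "__main__":
    nodes = [32, 90, 105]
    for key in (5, 20, 80, 110):
        print(f"K{key} -> N{successor(nodes, key)}")
```

With these nodes, K5 and K20 land on N32 and K80 lands on N90; a key such as 110 wraps around to N32.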

Consistent Hashing [Karger 97] Property 1: If there are N nodes and K object keys, then with high probability, each node is responsible for at most $(1+\epsilon)K/N$ objects. Property 2: When a node joins or leaves the network, the responsibility for at most O(K/N) keys changes hands (only to or from the node that is joining or leaving). When K is large, the impact is quite small.

The log N Fingers Each node knows of only log N other nodes. [Figure: circular (log N)-bit ID space showing the distance of N80's neighbors from N80: 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.]

Finger i points to successor of $n + 2^i$. [Figure: N80's fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, ... of the ring; N120 is also shown.]

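Chord Finger Table Node n's i-th entry: first node ≥ $n + 2^{i-1}$. The finger table actually contains both the ID and the IP address. [Figure: ring with N=128 identifiers and nodes N32, N40, N52, N60, N70, N79, N80, N85, N102, and N113, together with N32's finger table, whose last entry is N102.]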
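The rule "i-th entry: first node ≥ n + 2^(i-1)" can be turned into a short sketch that builds a finger table from a global view of the ring. The node IDs below are read off the ring in this slide and the neighboring lookup slide; a real node would not have this global list and would populate its fingers through lookups, so this is only an illustration of the definition.

```python
import bisect

M = 7  # identifier bits, so the ring has 2**7 = 128 positions
NODES = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])


def successor(key):
    """First node whose ID is >= key, wrapping around the ring."""
    i = bisect.bisect_left(NODES, key % (2 ** M))
    return NODES[i % len(NODES)]


def finger_table(n):
    """Entries 1..M: the successor of n + 2^(i-1) (mod 2^M)."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


if __name__ == "__main__":
    for i, f in enumerate(finger_table(32), start=1):
        print(f"finger {i}: start {32 + 2 ** (i - 1)} -> N{f}")
```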

Lookup Greedy routing. Node 32, lookup(82): 32 → 70 → 80 → 85. [Figure: the ring with the finger tables of N32, N70, and N80.]

New Node Join Assume that the new node N20 knows one of the existing nodes. [Figure: ring with nodes N32, N40, N52, N60, N70, N80, N102, and N113; N20 is joining and its finger table is still empty.]

New Node Join (2) Node 20 asks that node to locate the successors of 21, 22, …, 52, 84. [Figure: the same ring with N20's finger table filled in; its last entry is N102.]

The Join procedure The new node id asks a gateway node n to find the successor of id.
n.find_successor(id)
  if id ∈ (n, successor] then return successor
  else forward the query around the circle
  fi
Needs O(N) messages for a simple Chord ring with N nodes. This is slow.

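Steps in join Joining works like a linked-list insert: the new node id is placed between n and Successor(n). But the transition does not happen immediately. [Figure: the ring segment n, id, Successor(n) before and after the insert.]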

A More Efficient Join
// ask n to find the successor of id
n.find_successor(id)
  if id ∈ (n, successor] then return successor
  else n' = closest_preceding_node(id)
       return n'.find_successor(id)
  fi
// search for the highest predecessor of id
n.closest_preceding_node(id)
  for i = log N downto 1
    if finger[i] ∈ (n, id) then return finger[i]
  return n

Example N20 wants to find out the successor of key 65. [Figure: ring with nodes N20, N32, N40, N52, N60, N70, N80, N102, and N113, and key K65.]
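The finger-based lookup can be traced on this example with a short simulation. This is a sketch under stated assumptions: the finger tables are computed from a global list of node IDs rather than learned through the protocol, fingers are 0-indexed (entry i is the successor of n + 2^i), and the helper names `between`, `build_fingers`, and the `path` bookkeeping are illustrative.

```python
import bisect

M = 7
SIZE = 2 ** M


def between(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past 0


def build_fingers(ids):
    """fingers[n][i] = successor of n + 2^i; fingers[n][0] is n's successor."""
    ids = sorted(ids)

    def succ(key):
        i = bisect.bisect_left(ids, key % SIZE)
        return ids[i % len(ids)]

    return {n: [succ(n + 2 ** i) for i in range(M)] for n in ids}


def closest_preceding_node(fingers, n, key):
    """Highest finger of n that lies strictly between n and key."""
    for f in reversed(fingers[n]):
        if between(f, n, key):
            return f
    return n


def find_successor(fingers, n, key, path=None):
    """Return (successor of key, list of nodes visited)."""
    path = [n] if path is None else path + [n]
    succ = fingers[n][0]
    if between(key, n, succ, inclusive_right=True):
        return succ, path
    nxt = closest_preceding_node(fingers, n, key)
    if nxt == n:  # no closer finger known; fall back to the successor
        return succ, path
    return find_successor(fingers, nxt, key, path)


if __name__ == "__main__":
    fingers = build_fingers([20, 32, 40, 52, 60, 70, 80, 102, 113])
    print(find_successor(fingers, 20, 65))
```

On this ring the query started at N20 visits N20, N52, and N60, and N60 reports N70 as the successor of key 65.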

After join move objects Node 20 moves documents from node 32. Notify nodes that must include N20 in their table: N113[1] = N20, not N32. [Figure: the ring with N20 inserted and N20's finger table, whose last entry is N102.]

Three steps in join Step 1. Initialize the predecessor and fingers of the new node. Step 2. Update the predecessor and the fingers of the existing nodes. (Thus notify nodes that must include N20 in their table: N113[1] = N20, not N32.) Step 3. Transfer objects to the new node as appropriate. (Knowledge of the predecessor is useful in stabilization.)

Concurrent Join [Figure: the ring segment between n1 and n2 before and after the new nodes n and n' join concurrently.]

Stabilization predecessor(successor(n1)) ≠ n1, so n1 adopts predecessor(successor(n1)) = n as its new successor. Periodic stabilization is needed to integrate the new node into the network and restore the invariant. [Figure: new node n between n1 and n2, before and after n1 updates its successor.]
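The following sketch replays this scenario with the stabilize/notify pair from the Chord paper. The node IDs 10, 20, and 30, the `Node` class, and the `join_demo` helper are hypothetical and chosen only for illustration; in a real system `stabilize` would run periodically on every node rather than being called by hand.

```python
def between(x, a, b):
    """True if x lies strictly inside the circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # Ask the successor for its predecessor; adopt it if it sits between us.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        # `other` believes it might be our predecessor.
        if self.predecessor is None or between(other.id, self.predecessor.id, self.id):
            self.predecessor = other


def join_demo():
    n1, n2, n = Node(10), Node(30), Node(20)
    n1.successor, n1.predecessor = n2, n2   # stable two-node ring
    n2.successor, n2.predecessor = n1, n1
    n.successor = n2                        # new node has only looked up its successor
    n.stabilize()                           # n2 learns that n is its predecessor
    n1.stabilize()                          # n1 sees predecessor(n2) = n and adopts it
    return n1, n, n2


if __name__ == "__main__":
    n1, n, n2 = join_demo()
    print(n1.successor.id, n.successor.id, n2.successor.id)
```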

The complexity of join With high probability, any node joining or leaving an N-node Chord network will use $O(\log^2 N)$ messages to re-establish the Chord routing invariants and finger tables.

Chord Summary Log(n) lookup messages and table space. Well-defined location for each ID. Natural load balance due to consistent hashing. No name structure imposed. Minimal join/leave disruption.

Chord Advanced issues

Analysis Theorem. Search takes O(log N) time, where $2^m$ = size of the key space and N = number of nodes. Proof. Each forwarding step at least halves the distance to the key, so after log N forwarding steps the distance to the key is at most $2^m / 2^{\log N} = 2^m / N$. The number of nodes in the remaining range is O(log N) with high probability (a property of consistent hashing). So, by using successors in that range, it will take at most an additional O(log N) forwarding steps.

Analysis (contd.) The O(log N) search time holds if the finger and successor entries are correct. But what if these entries are wrong (which is possible during join or leave operations, or after a process crash)?

Search under peer failures N32 crashed. Lookup for K42 fails (N16 does not know N45). [Figure: ring with m=7 and nodes N16, N32, N45, N80, N96, and N112; the file abcnews.com, with key K42, is stored at N45; N16 asks “Who has abcnews.com?” (hashes to K42).]

Search under peer failures One solution: maintain r successor entries; in case of a failure, use the other successor entries. Reactive vs. proactive approach. [Figure: the same ring with m=7; N32 has failed and the query for K42 is routed around it.]

Search under peer failures Choosing r = 2 log N suffices to maintain correctness “with high probability.” Say 50% of nodes fail (i.e., probability of failure = 1/2). For a given node, Pr(at least one successor alive) = $1 - (1/2)^{2 \log N} = 1 - 1/N^2$.
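As a quick numeric check of this bound, the sketch below evaluates the survival probability for a successor list of length r = 2 log N. It assumes independent failures and a base-2 logarithm, neither of which is stated explicitly on the slide; the function name is illustrative.

```python
import math


def prob_some_successor_alive(N, fail_prob=0.5):
    """Probability that at least one of r = 2*log2(N) successors is alive."""
    r = 2 * math.log2(N)
    return 1 - fail_prob ** r


if __name__ == "__main__":
    for N in (16, 1024, 10 ** 6):
        print(N, prob_some_successor_alive(N), 1 - 1 / N ** 2)
```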

Search under peer failures (2) Lookup fails (N45 is dead). [Figure: ring with m=7 and nodes N16, N32, N45, N80, N96, and N112; the file abcnews.com, with key K42, was stored at N45; N16 asks “Who has abcnews.com?” (hashes to K42).]

Search under peer failures (2) One solution: replicate the file/key at r successors and predecessors. [Figure: the same ring with K42 replicated on the neighbors of N45.]

Dealing with dynamic issues Peers fail. New peers join. Peers leave. Need to update successors and fingers, and ensure keys reside in the right places.

New peers joining Stabilization protocol: Some gateway node directs the new node N40 to its successor N45. N32 updates its successor to N40. N40 initializes its successor to N45, and obtains fingers from it. N40 periodically talks to neighbors to update its finger table. [Figure: ring with m=7 and nodes N16, N32, N45, N80, N96, and N112; N40 is the new node.]

New peers joining (2) N40 may need to copy some files/keys from N45 (files with fileid between 32 and 40), here K34 and K38. [Figure: ring with m=7 and nodes N16, N32, N40, N45, N80, N96, and N112.]

Concurrent join Argue that each node will eventually be reachable. [Figure: ring with m=7 and nodes N16, N32, N45, N80, N96, and N112; N20, N24, and N28 join concurrently; keys K24 and K38 are shown.]

Effect of join on lookup If in a stable network with N nodes, another set of N nodes joins the network, and the join protocol correctly sets their successors, then lookups will take O(log N) steps w.h.p

Effect of join on lookup Consistent hashing guarantees that there are O(log N) new nodes w.h.p. between two consecutive nodes, so a linear scan will locate K24 even while the transfer is pending. [Figure: ring with nodes N16, N32, N45, N80, N96, and N112 and new nodes N20, N24, and N28; keys K24 and K38 are shown.]

Weak and Strong Stabilization Loopy network: ∀u: successor(predecessor(u)) = u. Still, it is weakly stable but not strongly stable. Why? [Figure: nodes N1, N3, N5, N24, N63, N78, and N96.]

What is funny / awkward about this? succ(pred(u)) = u holds, so the network is weakly stable. For strong stability, ∃v: u < v < successor(u) must be false. [Figure: nodes N1, N3, N5, N24, N63, N78, and N96.]

Strong stabilization The key idea of recovery from loopiness is: let each node u ask its successor to walk around the ring until it reaches a node v: u < v ≤ successor(u). If ∃v: u < v < successor(u), then loopiness exists, and reset successor(u) := v. Takes $O(N^2)$ steps. But loopiness is a rare event. No protocol for recovery exists from a split ring.

New peers joining (3) A new peer affects O(log N) other finger entries in the system. So, the number of messages per peer join= O(log(N)*log(N)) Similar set of operations for dealing with peers leaving

Bidirectional Chord Each node u has fingers to u+1, u+2, u+4, u+8 … as well as u-1, u-2, u-4, u-8 … How does it help?

Skip Lists and Skip Graphs Some slides adapted from the original slides by James Aspnes Gauri Shah

68 Definition of Skip List A skip list for a set L of distinct (key, element) items is a series of linked lists $L_0, L_1, \ldots, L_h$ such that: Each list $L_i$ contains the special keys $+\infty$ and $-\infty$. List $L_0$ contains the keys of L in non-decreasing order. Each list is a subsequence of the previous one, i.e., $L_0 \supseteq L_1 \supseteq \ldots \supseteq L_h$. List $L_h$ contains only the two special keys $+\infty$ and $-\infty$.

69 Skip List Dictionary based on a probabilistic data structure. Allows efficient search, insert, and delete operations. Each element in the dictionary typically stores additional useful information beside its search key. Example: [for University of Iowa] [for Daily Iowan] Probabilistic alternative to a balanced tree.

70 Skip List Each node is linked at the next higher level with probability 1/2. [Figure: skip list between HEAD and TAIL; level 0 holds A, G, J, M, R, W; level 1 holds A, J, M; level 2 holds J.]

71 Another example Each element of $L_i$ appears in $L_{i+1}$ with probability p. Higher levels denote express lanes. [Figure: a skip list with levels $L_0$, $L_1$, and $L_2$.]

72 Searching in Skip List Search for a key x in a skip list as follows: Start at the first position of the top list. At the current position P, compare x with y ← key(after(P)):
x = y -> return element(after(P))
x > y -> “scan forward”
x < y -> “drop down”
If we move past the bottom list, then no such key exists.

73 Example of search for 78 At $L_1$, the key after P is bigger than 78, so we drop down. At $L_0$, 78 = 78, so the search is over. [Figure: a skip list with levels $L_0$ to $L_3$ and the search path for 78.]

74 Insertion The insert algorithm uses randomization to decide in how many levels the new item should be added to the skip list. After inserting the new item at the bottom level flip a coin. If it returns tail, insertion is complete. Otherwise, move to next higher level and insert in this level at the appropriate position, and repeat the coin flip.

75 Insertion Example 1) Suppose we want to insert 15. 2) Do a search, and find the spot between 10 and 23. 3) Suppose the coin comes up heads three times. [Figure: the skip list before the insertion, with levels $L_0$ to $L_2$, and after it, with levels $L_0$ to $L_3$ and 15 inserted at positions $p_0$, $p_1$, and $p_2$.]

76 Deletion Search for the given key. If a position with the key is not found, then no such key exists. Otherwise, if a position with the key is found (it will definitely be found on the bottom level), then we remove all occurrences of the key from every level. If the uppermost level is empty, remove it.
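The search, insertion, and deletion procedures from the last few slides can be sketched together as a small class. This is a minimal illustration rather than the slides' exact formulation: the head node stands in for the −∞ sentinel, `None` stands in for +∞, the search always descends to the bottom level instead of returning early on an equal key, and the names `SkipList`, `_Node`, and `MAX_HEIGHT` are illustrative.

```python
import random


class _Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.next = [None] * height  # next[i] is the successor in list L_i


class SkipList:
    MAX_HEIGHT = 32

    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.rng = rng or random.Random()
        self.head = _Node(None, None, self.MAX_HEIGHT)  # plays the role of -infinity
        self.height = 1  # number of levels currently in use

    def _predecessors(self, key):
        """For each level, the last node whose key is smaller than `key`."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for lvl in range(self.height - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]  # scan forward
            preds[lvl] = node          # then drop down one level
        return preds

    def search(self, key):
        cand = self._predecessors(key)[0].next[0]
        return cand.value if cand is not None and cand.key == key else None

    def insert(self, key, value):
        preds = self._predecessors(key)
        cand = preds[0].next[0]
        if cand is not None and cand.key == key:
            cand.value = value  # key already present: overwrite
            return
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < self.p:
            h += 1  # coin came up heads: also insert at the next level
        node = _Node(key, value, h)
        for lvl in range(h):
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node
        self.height = max(self.height, h)

    def delete(self, key):
        preds = self._predecessors(key)
        cand = preds[0].next[0]
        if cand is None or cand.key != key:
            return False
        for lvl in range(len(cand.next)):
            preds[lvl].next[lvl] = cand.next[lvl]  # unlink from every level
        while self.height > 1 and self.head.next[self.height - 1] is None:
            self.height -= 1  # drop empty top levels
        return True
```

A typical use is `sl = SkipList(); sl.insert(15, "x"); sl.search(15); sl.delete(15)`.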

77 Deletion Example 1) Suppose we want to delete 34. 2) Do a search, and find the spot between 23 and 45. 3) Remove all the positions above p. [Figure: the skip list with 34 at positions $p_0$, $p_1$, and $p_2$ before the deletion, and after it, where the emptied top level is removed.]

78 Constant number of pointers Average number of pointers per node = O(1). Total number of pointers = $2n + 2 \cdot n/2 + 2 \cdot n/4 + 2 \cdot n/8 + \dots = 4n$. So, the average number of pointers per node = 4.

79 Number of levels Pr[a given element x is above level c log n] = $1/2^{c \log n} = 1/n^c$. Pr[any element is above level c log n] ≤ $n \cdot 1/n^c = 1/n^{c-1}$. The number of levels = O(log n) w.h.p.

80 Search time Consider a skip list with two levels $L_0$ and $L_1$. To search for a key, first search $L_1$ and then search $L_0$. Cost (i.e. search time) = length($L_1$) + n / length($L_1$). This is minimized when length($L_1$) = n / length($L_1$). Thus length($L_1$) = $n^{1/2}$, and cost = $2 n^{1/2}$. With three lists, minimum cost = $3 n^{1/3}$. With log n lists, minimum cost = $\log n \cdot n^{1/\log n} = 2 \log n$.

81 Skip lists for P2P?
Advantages: O(log n) expected search time. Retains locality. Dynamic node additions/deletions.
Disadvantages: Heavily loaded top-level nodes. Easily susceptible to failures. Lacks redundancy.

82 A Skip Graph Link at level i to nodes with matching prefix of length i. Think of a tree of skip lists that share lower layers. [Figure: keys A, G, J, M, R, and W, each with a 3-bit membership vector; level 0 is the single list A, G, J, M, R, W; level 1 splits it into the lists A, J, M and G, R, W; level 2 splits those further.]
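The prefix rule can be sketched by grouping keys on the first i bits of their membership vectors. The vectors below are an assumption made for illustration (the transcript's vectors are partly garbled, in particular for A and J); they were picked so that level 1 yields the two lists A, J, M and G, R, W named on the slide.

```python
from collections import defaultdict

# Hypothetical membership vectors, not a verified copy of the slide.
MEMBERSHIP = {"A": "000", "G": "100", "J": "001", "M": "011", "R": "110", "W": "101"}


def level_lists(membership, level):
    """Group keys by the first `level` bits of their membership vector.

    Each group is one sorted linked list at that level of the skip graph.
    """
    groups = defaultdict(list)
    for key in sorted(membership):
        groups[membership[key][:level]].append(key)
    return dict(groups)


if __name__ == "__main__":
    for level in range(3):
        print(level, level_lists(MEMBERSHIP, level))
```

Following only the lists that contain one particular key, say J, gives an ordinary skip list for that key.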

83 Properties of skip graphs 1.Efficient Searching. 2.Efficient node insertions & deletions. 3.Independence from system size. 4.Locality and range queries.

84 Searching: avg. O(log n) Same performance as DHTs. Restricting to the lists containing the starting element of the search, we get a skip list. [Figure: the skip graph on keys A, G, J, M, R, and W with levels 0 to 2.]

85 Node Insertion – 1 Starting at the buddy node, find the nearest key at level 0. Takes O(log n) time on average. [Figure: the skip graph on keys A, G, M, R, and W; the new node J, with membership vector 001, contacts a buddy node.]

86 Node Insertion – 2 At each level i, find the nearest node with a matching prefix of the membership vector of length i+1. Total time for insertion: O(log n). DHTs take $O(\log^2 n)$. [Figure: J, with membership vector 001, linked into the lists at levels 0, 1, and 2.]

87 Independent of system size No need to know the size of the keyspace or the number of nodes. Old nodes extend their membership vectors as required with arrivals. DHTs require knowledge of the keyspace size initially. [Figure: J is inserted into a skip graph holding E and Z with levels 0 and 1; afterwards the graph has levels 0, 1, and 2.]

88 Locality and range queries Find key F. Find largest key < x. Find least key > x. Find all keys in interval [D..O]. Initial node insertion at level 0. [Figure: level-0 list of the keys A, D, F, I, L, O, and S.]

89 Applications of locality Version control: e.g. find latest news from yesterday, i.e. find largest key < news:02/13. [Level 0: news:02/09, news:02/10, news:02/11, news:02/12.] Data replication: e.g. find any copy of some Britney Spears song. [Level 0: britney01, britney02, britney03, britney04, britney05.] DHTs cannot do this easily as hashing destroys locality.

90 Load balancing Interested in the average load on a node u, i.e. the number of searches from source s to destination t that use node u. Theorem: Let dist(u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {nodes v: u <= v <= t} and |V| = d+1.

91 Skip list restriction Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level. [Figure: nodes s and u in the lists at levels 0, 1, and 2.]

92 Tallest nodes Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t]. $\Pr[u \in T] = \sum_{k=1}^{d+1} \Pr[|T| = k] \cdot k/(d+1) = E[|T|]/(d+1)$. Heights are independent of position, so distances are symmetric. [Figure: two search paths from s to t, one where u is not on the path and one where u is on the path.]