SKIP GRAPHS. James Aspnes, Gauri Shah. SODA 2003.

P2P system
A set of peers stores resources identified by keys. Peers are subject to crash failures. Goal: locate resources efficiently.

Properties of ideal network
Data availability. Decentralization. Fault-tolerance. Scalability. Load balancing. Maintaining the network. Dynamic node addition/deletion. Self-stabilization. Efficient searching. Incorporating geography. Incorporating locality [temporal, spatial].

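Distributed Hash Tables
[Figure: nodes and keys are hashed (HASH) into a virtual overlay network of nodes v1 to v4 joined by virtual links, laid over a physical network joined by physical links; a virtual route in the overlay corresponds to an actual route in the physical network.]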

DHTs: advantages and disadvantages
Advantages: Load balancing. Decentralization. O(log n) space and search time. O(log^2 n) insert and delete time [search for O(log n) neighbors]. Tolerance of random faults.
Disadvantages: No locality properties. No tolerance to adversarial faults. No self-stabilization. No optimization wrt. geography.

Skip List [Pugh ’90]
Data structure based on a linked list. Each node is linked at the next higher level with probability 1/2.
[Figure: skip list between HEAD and TAIL. Level 0: A G J M R W. Level 1: A J M. Level 2: J.]

Searching in a skip list
Time for search: O(log n) on average. On average, constant number of pointers per node.
[Figure: search for key ‘R’ from HEAD over Level 2 (J), Level 1 (A J M), and Level 0 (A G J M R W), ending in success at ‘R’ or in failure.]
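The search on the example list (Level 0: A G J M R W; Level 1: A J M; Level 2: J) can be sketched as follows. This is a minimal illustration, not the presenters' code: the names `Node`, `build`, and `search` are hypothetical, and the levels are fixed by hand instead of being drawn with probability 1/2.

```python
class Node:
    """Skip list node; forward[i] is the next node at level i (illustrative)."""
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height


def build(levels):
    """Build a skip list from nested, sorted key lists (levels[0] is the base)."""
    head = Node(None, len(levels))
    # A key's height is the number of levels it appears in.
    nodes = {k: Node(k, sum(1 for lvl in levels if k in lvl)) for k in levels[0]}
    for i, lvl in enumerate(levels):
        prev = head
        for key in lvl:
            prev.forward[i] = nodes[key]
            prev = nodes[key]
    return head


def search(head, target):
    """Start at the top level; move right while the next key is <= target,
    otherwise drop one level. Returns (found, visited (level, key) pairs)."""
    node, path = head, []
    for i in reversed(range(len(head.forward))):
        while node.forward[i] is not None and node.forward[i].key <= target:
            node = node.forward[i]
            path.append((i, node.key))
    return node.key == target, path


head = build([list("AGJMRW"), list("AJM"), list("J")])
found, path = search(head, "R")
# found is True; path is [(2, 'J'), (1, 'M'), (0, 'R')]
```

The search for ‘R’ touches one node per level here, which is the behaviour the O(log n) average bound describes for randomly chosen heights.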

Skip lists for P2P?
Advantages: O(log n) expected search time. Retains locality. Dynamic node additions/deletions.
Disadvantages: Heavily loaded top-level nodes. Easily susceptible to random failures. Lacks redundancy.

A Skip Graph
Link at level i to nodes with matching prefix of length i. Think of a tree of skip lists that share lower layers.
Membership vectors: A 000, G 100, J 001, M 011, R 110, W 101.
Level 0: A G J M R W.
Level 1: A J M (prefix 0); G R W (prefix 1).
Level 2: A J (prefix 00); M (prefix 01); G W (prefix 10); R (prefix 11).
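The rule "link at level i to nodes with matching prefix of length i" can be made concrete by grouping keys on their membership-vector prefixes. The sketch below is only an illustration: `MV` and `level_lists` are hypothetical names, and A is taken as 000 here, an assumption (any vector beginning 00 gives the same lists at Levels 0 to 2).

```python
from collections import defaultdict

# Membership vectors of the example skip graph (A = 000 is an assumption).
MV = {"A": "000", "G": "100", "J": "001", "M": "011", "R": "110", "W": "101"}


def level_lists(mv, level):
    """Group keys, in sorted order, by the first `level` bits of their vectors.
    Each group is one doubly linked list of the skip graph at that level."""
    groups = defaultdict(list)
    for key in sorted(mv):
        groups[mv[key][:level]].append(key)
    return dict(groups)


level0 = level_lists(MV, 0)  # {'': ['A', 'G', 'J', 'M', 'R', 'W']}
level1 = level_lists(MV, 1)  # {'0': ['A', 'J', 'M'], '1': ['G', 'R', 'W']}
level2 = level_lists(MV, 2)  # '00': A J, '10': G W, '01': M, '11': R
```

Every key appears in exactly one list per level, which is why no node is a distinguished "top-level" node as in a plain skip list.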

Properties of skip graphs
Searching. Node insertions. Independence from system size. Locality and range queries.

Searching: avg. O(log n)
Restricting to the lists containing the starting element of the search, we get a skip list. Same performance as DHTs.
[Figure: example skip graph, Levels 0 to 2.]
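A search that uses only the lists containing the starting element can be simulated centrally as below. This is a sketch under stated assumptions: `MV`, `neighbor`, and `search` are hypothetical names, the membership vectors are those of the example (with A assumed to be 000), and neighbors are recomputed from a global table, whereas a real node would follow its own stored pointers.

```python
# Membership vectors of the example skip graph (A = 000 is an assumption).
MV = {"A": "000", "G": "100", "J": "001", "M": "011", "R": "110", "W": "101"}


def neighbor(mv, x, level, direction):
    """Next key to the right (+1) or left (-1) of x in x's list at `level`."""
    prefix = mv[x][:level]
    same = [k for k in sorted(mv) if mv[k][:level] == prefix]
    i = same.index(x) + direction
    return same[i] if 0 <= i < len(same) else None


def search(mv, start, target):
    """Walk from `start` toward `target`, top level first, never overshooting.
    Returns the key where the search stops and the keys visited."""
    node, path = start, [start]
    direction = 1 if target > start else -1
    for level in reversed(range(len(mv[start]) + 1)):
        while True:
            nxt = neighbor(mv, node, level, direction)
            overshoot = nxt is not None and (
                nxt > target if direction == 1 else nxt < target
            )
            if nxt is None or overshoot:
                break  # drop to the next lower level
            node = nxt
            path.append(node)
    return node, path


result = search(MV, "A", "R")  # ('R', ['A', 'J', 'M', 'R'])
```

Because every hop stays inside a list that shares a prefix with the start node, the nodes visited are exactly those of a skip list rooted at the start node.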

Node Insertion - 1
Starting at the buddy node, find the nearest key at level 0. This is essentially a range query looking for the key closest to the new key. Takes O(log n) on average.
[Figure: new node J (001) contacts a buddy node in the skip graph on A, G, M, R, W.]

Node Insertion - 2
At each level i, find the nearest node whose membership vector matches the new node's on a prefix of length i+1.
Total time for insertion: O(log n). DHTs take: O(log^2 n).
[Figure: J (001) linked between G and M at Level 0, between A and M at Level 1, and next to A at Level 2.]
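The two insertion steps (locate the level-0 position, then climb one level at a time looking for the nearest node with a longer matching prefix) can be sketched as follows. The names `walk` and `insert_links` are hypothetical, the vectors are those of the example with A assumed to be 000, and scanning a sorted key list stands in for following level-i pointers; the result is the same because a key matching i+1 bits also matches i bits.

```python
# Skip graph before J arrives (A = 000 is an assumption).
MV = {"A": "000", "G": "100", "M": "011", "R": "110", "W": "101"}


def walk(mv, keys, start, step, prefix):
    """From `start`, scan in direction `step` for the first key whose
    membership vector begins with `prefix`; None if there is none."""
    i = keys.index(start) if start is not None else None
    while i is not None and 0 <= i < len(keys):
        if mv[keys[i]].startswith(prefix):
            return keys[i]
        i += step
    return None


def insert_links(mv, new_key, new_vec):
    """Return {level: (left, right)} neighbors the new node should link to."""
    keys = sorted(mv)
    # Step 1: nearest keys at level 0 (in practice found by a search from a buddy).
    left = max((k for k in keys if k < new_key), default=None)
    right = min((k for k in keys if k > new_key), default=None)
    links = {0: (left, right)}
    # Step 2: at each higher level, keep only neighbors sharing a longer prefix.
    for level in range(1, len(new_vec) + 1):
        prefix = new_vec[:level]
        left = walk(mv, keys, left, -1, prefix)
        right = walk(mv, keys, right, +1, prefix)
        if left is None and right is None:
            break  # the new node is alone in its list from here up
        links[level] = (left, right)
    return links


links = insert_links(MV, "J", "001")
# {0: ('G', 'M'), 1: ('A', 'M'), 2: ('A', None)}
```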

Independent of system size
No need to know size of keyspace or number of nodes. Old nodes extend their membership vectors as required with arrivals. DHTs require knowledge of keyspace size initially.
[Figure: inserting J into a skip graph containing E and Z, Levels 0 to 2.]
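One way to picture "old nodes extend membership vector as required" is a vector whose random bits are generated only when a longer prefix is first requested. This is an illustrative sketch; the class name `MembershipVector` and its `prefix` method are hypothetical and not taken from the slides.

```python
import random


class MembershipVector:
    """Random bit string that grows on demand (hypothetical helper)."""

    def __init__(self, rng):
        self._rng = rng
        self._bits = []

    def prefix(self, length):
        """Return the first `length` bits, drawing new random bits if needed.
        Bits already drawn never change, so existing links stay valid."""
        while len(self._bits) < length:
            self._bits.append(self._rng.choice("01"))
        return "".join(self._bits[:length])


vec = MembershipVector(random.Random(0))
short = vec.prefix(2)   # two bits suffice while the graph is small
longer = vec.prefix(5)  # extended later, as more nodes arrive
```

No bound on the number of nodes or the size of the key space is needed up front, since the vector length tracks only how many levels the node currently participates in.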

Locality and range queries
Find key < F, > F. Find largest key < x. Find least key > x. Find all keys in interval [D..O]. Initial node insertion at level 0.
[Figure: Level 0 lists A D F I and A D F I L O S.]
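Because level 0 keeps the keys in sorted order, these queries reduce to order-based lookups. The sketch below uses a sorted Python list as a stand-in for the level-0 list; `KEYS`, `predecessor`, `successor`, and `range_query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

KEYS = ["A", "D", "F", "I", "L", "O", "S"]  # level-0 order from the example


def predecessor(keys, x):
    """Largest key strictly less than x, or None."""
    i = bisect_left(keys, x)
    return keys[i - 1] if i > 0 else None


def successor(keys, x):
    """Least key strictly greater than x, or None."""
    i = bisect_right(keys, x)
    return keys[i] if i < len(keys) else None


def range_query(keys, lo, hi):
    """All keys in the closed interval [lo..hi]."""
    return keys[bisect_left(keys, lo):bisect_right(keys, hi)]


before_f = predecessor(KEYS, "F")      # 'D'
after_f = successor(KEYS, "F")         # 'I'
d_to_o = range_query(KEYS, "D", "O")   # ['D', 'F', 'I', 'L', 'O']
```

In a skip graph the same answers come from a search to one end of the interval followed by a walk along level 0.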

Applications of locality
Version control, e.g. find latest news from yesterday: find largest key < news:10/29.
Level 0: news:10/25 news:10/26 news:10/27 news:10/28 news:10/29
Data replication, e.g. find any copy of some Britney Spears song.
Level 0: britney01 britney02 britney03 britney04 britney05
DHTs cannot do this easily as hashing destroys locality.
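The remark that hashing destroys locality can be checked on the example keys: in key order the replicas of one item sit next to each other, while an order based on a hash of the key is generally unrelated to key order. This is a sketch only; SHA-1 is used as a stand-in for whatever hash a given DHT applies, and `sha1_order` and `is_contiguous` are hypothetical names.

```python
import hashlib

NEWS = ["news:10/25", "news:10/26", "news:10/27", "news:10/28", "news:10/29"]
SONGS = ["britney01", "britney02", "britney03", "britney04", "britney05"]
KEYS = NEWS + SONGS


def sha1_order(keys):
    """Order keys by SHA-1 digest, roughly as a hash-based placement would."""
    return sorted(keys, key=lambda k: hashlib.sha1(k.encode()).hexdigest())


def is_contiguous(order, group):
    """True if the members of `group` occupy consecutive positions in `order`."""
    positions = sorted(order.index(k) for k in group)
    return positions[-1] - positions[0] == len(group) - 1


key_order = sorted(KEYS)
hash_order = sha1_order(KEYS)
songs_adjacent_by_key = is_contiguous(key_order, SONGS)    # True
songs_adjacent_by_hash = is_contiguous(hash_order, SONGS)  # depends on the hash
```

With key order preserved, "any copy of the song" is one neighbor lookup away from any other copy; under hashing each copy has to be looked up by its exact key.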

So far...
Decentralization. Locality properties. O(log n) space per node. O(log n) search, insert, and delete time. Independent of system size.
Coming up...
Load balancing. Tolerance to faults: random faults, adversarial faults. Self-stabilization.

Load balancing
Interested in average load on a node u, i.e. the number of searches from source s to destination t that use node u.
Theorem: Let dist(u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {nodes v: u <= v <= t} and |V| = d+1.

Skip list restriction
Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level.
[Figure: the lists of s at Levels 0 to 2, with node u marked.]

Tallest nodes
Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t].
$\Pr[u \in T] = \sum_{k=1}^{d+1} \Pr[|T| = k] \cdot \frac{k}{d+1} = \frac{E[|T|]}{d+1}$
Heights independent of position, so distances are symmetric.
[Figure: two cases between s and t, one where u is not on the path and one where u is on the path.]

Load on node u
Start with n nodes. Each node goes to next set with prob. 1/2. We want expected size of T = last non-empty set. We show that E[|T|] < 2. Asymptotically: E[|T|] = 1/(ln 2) ± 2x10^-5 ≈ 1.4427… [Trie analysis]
Average load on a node is inversely proportional to the distance from the destination. We also show that the distribution of average load declines exponentially beyond this point.
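The size of the last non-empty set can be estimated by simulating the coin-flipping process directly. This is a rough Monte Carlo sketch, not part of the original analysis; `last_nonempty_size` and `estimate` are hypothetical names, and the values of n, the number of trials, and the seed are arbitrary choices.

```python
import random


def last_nonempty_size(n, rng):
    """Start with n nodes; each survives to the next set with probability 1/2.
    Return the size of the last set that is not empty."""
    size = n
    while True:
        survivors = sum(1 for _ in range(size) if rng.random() < 0.5)
        if survivors == 0:
            return size
        size = survivors


def estimate(n, trials, seed=0):
    """Average |T| over independent trials (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(last_nonempty_size(n, rng) for _ in range(trials)) / trials


mean_t = estimate(n=64, trials=5000)
bound = 2 / (10 + 1)  # the theorem's bound 2/(d+1) for a node at distance d = 10
```

The estimate should land near 1/(ln 2) ≈ 1.44 and below 2, which is what turns Pr[u ∈ T] = E[|T|]/(d+1) into the bound 2/(d+1).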

Experimental result
[Figure: expected load and actual load on each node plotted against node location (76400 to 76650), for destination = 76542; load axis from 0.0 to 1.1.]

Fault tolerance
How do node failures affect skip graph performance?
Random failures: randomly chosen nodes fail. Experimental results.
Adversarial failures: adversary carefully chooses nodes that fail. Bound on expansion ratio.

Random faults 131072 nodes

Searches with random failures 131072 nodes 10000 messages

Adversarial faults
δA = nodes adjacent to A but not in A.
Expansion ratio = min |δA|/|A|, 1 <= |A| <= n/2.
Theorem: A skip graph with n nodes has expansion ratio = Ω(1/log n).
f failures can isolate only O(f log n) nodes.
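The definition of expansion ratio (the minimum, over sets A of at most n/2 nodes, of the number of outside neighbors of A divided by |A|) can be evaluated by brute force on a small graph. The sketch below is illustrative only: `expansion_ratio` is a hypothetical name, the 6-node ring is a toy input and not a skip graph, and the exhaustive enumeration is exponential in n.

```python
from itertools import combinations


def expansion_ratio(adj):
    """Minimum of |boundary(A)| / |A| over all A with 1 <= |A| <= n/2,
    where boundary(A) is the set of nodes adjacent to A but not in A."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            a = set(subset)
            boundary = {v for u in a for v in adj[u]} - a
            best = min(best, len(boundary) / len(a))
    return best


# Toy example: a ring of 6 nodes, each adjacent to its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ratio = expansion_ratio(ring)  # 2/3, reached by three consecutive nodes
```

A large expansion ratio means that removing a few nodes cannot cut off many others, which is the sense in which f failures isolate only O(f log n) nodes.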

Proof intuition
Consider neighbors of set A at level 0.
1. Clumpy sets: low probability of clumpy sets.
2. Non-clumpy sets: non-clumpy sets have many neighbors at level 0. Gives high expansion ratio.
[Figure: a set A and its neighbors δA at Level 0, for a clumpy and a non-clumpy set.]

All sets have low probability of few neighbors at level h. And there are not too many clumpy sets. Low probability that any set A has few neighbors at level 0 or h. This gives expansion ratio = Ω(1/log n). Same analysis applicable to DHTs?

Need for repair mechanism
Node failures can leave skip graph in inconsistent state.
[Figure: example skip graph on A, G, J, M, R, W, Levels 0 to 2.]

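Ideal skip graph
Let $xR_i$ ($xL_i$) be the right (left) neighbor of x at level i.
Invariant: if $xL_i$, $xR_i$ exist, then $xL_i < x < xR_i$ and $xL_iR_i = xR_iL_i = x$.
Successor constraints: $xR_i = xR_{i-1}^{k}$ and $xL_i = xL_{i-1}^{k'}$ for some k, k', i.e. the level-i neighbor is reached by following level-(i-1) pointers repeatedly.
[Figure: x and its neighbors at Level i and Level i-1, with membership vector prefixes ..00.. and ..01..]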
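The invariant and the successor constraints can be checked mechanically on an explicit table of left and right pointers. The following is a centralized sketch under stated assumptions: `build_links`, `reachable`, and `is_ideal` are hypothetical names, the membership vectors are those of the example (A assumed to be 000), and a real node could only inspect its own pointers.

```python
# Membership vectors of the example skip graph (A = 000 is an assumption).
MV = {"A": "000", "G": "100", "J": "001", "M": "011", "R": "110", "W": "101"}


def build_links(mv, max_level):
    """Return (left, right) with left[i][x] and right[i][x] as level-i neighbors."""
    left, right = {}, {}
    for level in range(max_level + 1):
        left[level], right[level] = {}, {}
        groups = {}
        for key in sorted(mv):
            groups.setdefault(mv[key][:level], []).append(key)
        for members in groups.values():
            for a, b in zip(members, members[1:]):
                right[level][a] = b
                left[level][b] = a
    return left, right


def reachable(pointers, x, target):
    """Successor constraint: is `target` reached from x by following
    the given lower-level pointers one or more times?"""
    node = pointers.get(x)
    for _ in range(len(pointers)):
        if node is None or node == target:
            break
        node = pointers.get(node)
    return node == target


def is_ideal(left, right):
    """Check order, mutual consistency, and successor constraints at every level."""
    for level in right:
        for x, r in right[level].items():
            if not x < r or left[level].get(r) != x:
                return False
            if level > 0 and not reachable(right[level - 1], x, r):
                return False
        for x, l in left[level].items():
            if not l < x or right[level].get(l) != x:
                return False
            if level > 0 and not reachable(left[level - 1], x, l):
                return False
    return True


left, right = build_links(MV, 3)
ok_before = is_ideal(left, right)   # True
del right[0]["J"]                   # simulate a lost level-0 pointer at J
ok_after = is_ideal(left, right)    # False
```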

Basic repair
If a node detects a missing neighbor, it tries to patch the link using other levels. Also relink at other lower levels. Successor constraints may be violated by node arrivals or failures.
[Figure: lists 1 5; 1 3 5 6; 1 2 3 4 5 6 at three successive levels.]

Constraint violation
Neighbor at level i not present at level (i-1).
[Figure: lists around node x at Level i and Level i-1, with membership vector prefixes ..00.. and ..01.., labelled "zipper".]

Self-stabilization
Eventually want each connected component of the skip graph to reorganize itself into an ideal skip graph.
[Figure: at Level i, zipperOp messages zOp(A), zOp(B), zOp(D), zOp(E), zOp(F), zOp(I) exchanged between the lists A C D F J and B E G H I.]

Conclusions
Similarities with DHTs: Decentralization. O(log n) space at each node. O(log n) search time. Load balancing properties. Tolerant of random faults.

Differences
Property                           DHTs          Skip Graphs
Insert/Delete time                 O(log^2 n)    O(log n)
Locality                           No            Yes
Repair mechanism                   ?             Partial
Tolerance of adversarial faults
Keyspace size                      Reqd.         Not reqd.

Open Problems
Design efficient repair mechanism. Incorporate geographical proximity. Study multi-dimensional skip graphs. Evaluate performance in practice. Study effect of Byzantine failures.