SKIP GRAPHS James Aspnes Gauri Shah To appear in SODA 2003. Level 0 Level 1 Level 2.

Slides:



Advertisements
Similar presentations
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Advertisements

Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK
Scalable Content-Addressable Network Lintao Liu
CHORD – peer to peer lookup protocol Shankar Karthik Vaithianathan & Aravind Sivaraman University of Central Florida.
SKIP GRAPHS Slides adapted from the original slides by James Aspnes Gauri Shah.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Peer-to-Peer Structured Overlay Networks
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Schenker Presented by Greg Nims.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
1 PASTRY Partially borrowed from Gabi Kliot ’ s presentation.
1 Accessing nearby copies of replicated objects Greg Plaxton, Rajmohan Rajaraman, Andrea Richa SPAA 1997.
Common approach 1. Define space: assign random ID (160-bit) to each node and key 2. Define a metric topology in this space,  that is, the space of keys.
A Scalable Content Addressable Network (CAN)
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Looking Up Data in P2P Systems Hari Balakrishnan M.Frans Kaashoek David Karger Robert Morris Ion Stoica.
Fault-tolerant Routing in Peer-to-Peer Systems James Aspnes Zoë Diamadi Gauri Shah Yale University PODC 2002.
Spring 2003CS 4611 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
A Scalable Content-Addressable Network Authors: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker University of California, Berkeley Presenter:
SkipNet: A Scalable Overlay Network with Practical Locality Properties Nick Harvey, Mike Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman Microsoft Research.
Distributed Lookup Systems
SCALLOP A Scalable and Load-Balanced Peer- to-Peer Lookup Protocol for High- Performance Distributed System Jerry Chou, Tai-Yi Huang & Kuang-Li Huang Embedded.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
SkipNet: A Scaleable Overlay Network With Practical Locality Properties Presented by Rachel Rubin CS294-4: Peer-to-Peer Systems By Nicholas Harvey, Michael.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Or, Providing Scalable, Decentralized Location and Routing Network Services Tapestry: Fault-tolerant Wide-area Application Infrastructure Motivation and.
P2P Course, Structured systems 1 Skip Net (9/11/05)
1 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
ICDE A Peer-to-peer Framework for Caching Range Queries Ozgur D. Sahin Abhishek Gupta Divyakant Agrawal Amr El Abbadi Department of Computer Science.
“Umbrella”: A novel fixed-size DHT protocol A.D. Sotiriou.
1 Distributed Data Structures for a Peer-to-peer system Advisor: James Aspnes Committee: Joan Feigenbaum Arvind Krishnamurthy Antony Rowstron [MSR, Cambridge,
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
1 A scalable Content- Addressable Network Sylvia Rathnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker Pirammanayagam Manickavasagam.
Roger ZimmermannCOMPSAC 2004, September 30 Spatial Data Query Support in Peer-to-Peer Systems Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang Computer.
Other Structured P2P Systems CAN, BATON Lecture 4 1.
Tapestry GTK Devaroy (07CS1012) Kintali Bala Kishan (07CS1024) G Rahul (07CS3009)
1 Plaxton Routing. 2 Introduction Plaxton routing is a scalable mechanism for accessing nearby copies of objects. Plaxton mesh is a data structure that.
1 PASTRY. 2 Pastry paper “ Pastry: Scalable, decentralized object location and routing for large- scale peer-to-peer systems ” by Antony Rowstron (Microsoft.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
Vincent Matossian September 21st 2001 ECE 579 An Overview of Decentralized Discovery mechanisms.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Content Addressable Network CAN. The CAN is essentially a distributed Internet-scale hash table that maps file names to their location in the network.
A Scalable Content-Addressable Network (CAN) Seminar “Peer-to-peer Information Systems” Speaker Vladimir Eske Advisor Dr. Ralf Schenkel November 2003.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Scalable Content- Addressable Networks Prepared by Kuhan Paramsothy March 5, 2007.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Peer-to-Peer Networks. Overlay Network A logical network laid on top of the Internet A B C Internet Logical link AB Logical link BC.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel, Middleware 2001.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Chord Advanced issues. Analysis Theorem. Search takes O (log N) time (Note that in general, 2 m may be much larger than N) Proof. After log N forwarding.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 2: Distributed Hash.
Chord Advanced issues. Analysis Search takes O(log(N)) time –Proof 1 (intuition): At each step, distance between query and peer hosting the object reduces.
Peer to Peer Network Design Discovery and Routing algorithms
BATON A Balanced Tree Structure for Peer-to-Peer Networks H. V. Jagadish, Beng Chin Ooi, Quang Hieu Vu.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Two Peer-to-Peer Networking Approaches Ken Calvert Net Seminar, 23 October 2001 Note: Many slides “borrowed” from S. Ratnasamy’s Qualifying Exam talk.
1 Plaxton Routing. 2 History Greg Plaxton, Rajmohan Rajaraman, Andrea Richa. Accessing nearby copies of replicated objects, SPAA 1997 Used in several.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Peer-to-peer Systems All slides © IG.
Peer-to-Peer Networks 07 Degree Optimal Networks
Accessing nearby copies of replicated objects
SKIP GRAPHS James Aspnes Gauri Shah SODA 2003.
EE 122: Peer-to-Peer (P2P) Networks
SKIP LIST & SKIP GRAPH James Aspnes Gauri Shah
MIT LCS Proceedings of the 2001 ACM SIGCOMM Conference
SKIP GRAPHS (continued)
Presentation transcript:

SKIP GRAPHS James Aspnes Gauri Shah To appear in SODA Level 0 Level 1 Level 2

2 Outline Peer-to-peer systems Existing approach: Distributed Hash Tables Our Approach: Skip Graphs Algorithms and Properties Experimental Results Conclusions and open problems

3 P2P system Bunch of peers. Store resources identified by keys. Peers subject to crash failures. Goal: locate resources efficiently. Resources Peers Key

4 Properties of ideal network Data availability Decentralization Fault-tolerance Scalability Load balancing Maintaining the network Dynamic node addition/deletion Self-stabilization Efficient searching Incorporating geography Incorporating locality [temporal, spatial]

5 Early P2P systems Napster ? x x ? x Central server bottleneck Gnutella Inefficient flooding

6 Tapestry [JKZ’01] Uses Plaxton’s Algorithm: Correct one digit at a time to reach target. Pastry [DR’01] is also similar Node xyz links to *XX, x*X and xy* [* = all digits, X = any digit]

7 CAN [RFHKS’01] (0,0) (1,0) (0,1)(1,1) Partition d-dimensional co-ordinate space into zones. Nodes own zones and keys hashed to them. Greedy routing: forward to neighbor closest to target. d=2 zone

8 Chord [SMKKB‘01] Nodes and resources mapped to identifier circle. Routing table: successor nodes at distances Greedy routing: forward to node in routing table closest to target successors identifier circle (n=8)

9 Distributed Hash Tables v2 HASH v1 v4 v3 v1 v2 v3 v4 Keys Nodes Actual Route Physical Link Virtual Link Virtual Route PHYSICAL NETWORK VIRTUAL OVERLAY NETWORK

10 AdvantagesDisadvantages Load balancing. Decentralization. O(log n) space and search time. O(log 2 n) insert and delete time [search for (log n) neighbors]. Tolerance of random faults. SKIP GRAPHS No locality properties. No tolerance to adversarial faults. No self-stabilization. No optimization wrt. geography.

11 Skip List [Pugh ’90] Data structure based on a linked list. AGJMRW HEAD TAIL Each node linked at higher level with probability 1/2. Level 0 AJM Level 1 J Level 2

12 Searching in a skip list AGJMRW HEADTAIL AJ J Search for key ‘R’ M Time for search: O(log n) on average. On average, constant number of pointers per node. Level 0 Level 1 Level 2 -- ++ success failure

13 Skip lists for P2P? Heavily loaded top-level nodes. Easily susceptible to random failures. Lacks redundancy. Disadvantages Advantages O(log n) expected search time. Retains locality. Dynamic node additions/deletions.

14 A Skip Graph A 001 J M 011 G 100 W 101 R 110 Level 1 G R W AJM Level 2 A G JMRW Level 0 Membership vectors Link at level i to nodes with matching prefix of length i. Think of a tree of skip lists that share lower layers.

15 Properties of skip graphs 1.Searching. 2.Node insertions. 3.Independence from system size. 4.Locality and range queries.

16 Searching: avg. O (log n) Same performance as DHTs. AJM GWR Level 1 G R W AJM Level 2 AGJMRW Level 0 Restricting to the lists containing the starting element of the search, we get a skip list.

17 Use doubly linked lists at each level to account for absence of head and tail nodes. So search can start at any node. Cannot use circular singly-linked list because it is hard to detect and repair an error like this: Level 0 Design aspects

18 Node Insertion – 1 A 001 M 011 G 100 W 101 R 110 Level 1 G R W A M Level 2 A GM R W Level 0 J 001 Starting at buddy node, find nearest key at level 0. Basically a range query looking for key closest to new key. Takes O(log n) on average. buddy new node

19 Node Insertion - 2 At each level i, find nearest node with matching prefix of membership vector of length i+1. A 001 M 011 G 100 W 101 R 110 Level 1 G R W A M Level 2 A GM R W Level 0 J 001 J J Total time for insertion: O(log n) DHTs take: O(log 2 n)

20 Independent of system size No need to know size of keyspace or number of nodes. E Z 10 E Z J insert Level 0 Level 1 E Z 10 E Z J 0 J 0001 E ZJ Level 0 Level 1 Level 2 Old nodes extend membership vector as required with arrivals. DHTs require knowledge of keyspace size initially.

21 Locality and range queries Find key F. Find largest key < x. Find least key > x. Find all keys in interval [D..O]. Initial node insertion at level 0. D F A I D F A I L O S

22 Applications of locality news:10/29 e.g. find latest news from yesterday. find largest key < news:10/29. news:10/27news:10/28news:10/26news:10/25 Level 0 DHTs cannot do this easily as hashing destroys locality. e.g. find any copy of some Britney Spears song. britney05britney03britney04britney02britney01 Level 0 Data Replication Version Control

23 So far... Decentralization. Locality properties. O(log n) space per node. O(log n) search, insert, and delete time. Independent of system size. Coming up... Load balancing. Tolerance to faults. Self-stabilization. Random faults. Adversarial faults.

24 Load balancing Interested in average load on a node u. i.e. the number of searches from source s to destination t that use node u. Theorem: Let dist (u, t) = d. Then the probability that a search from s to t passes through u is < 2/(d+1). where V = {nodes v: u <= v <= t} and |V| = d+1.

25 Nodes u Skip list restriction Level 0 Level 1 Level 2 Node u is on the search path from s to t only if it is in the skip list formed from the lists of s at each level. s

26 Tallest nodes Node u is on the search path from s to t only if it is in T = the set of k tallest nodes in [u..t]. u u t s u is not on path. t u u s u u is on path. Pr [u T] = Pr[|T|=k] k/(d+1) = E[|T|]/(d+1). k=1 d+1 Heights independent of position, so distances are symmetric.

27 Load on node u Start with n nodes. Each node goes to next set with prob. 1/2. We want expected size of T = last non-empty set. = T Average load on a node is inversely proportional to the distance from the destination. We show that: E[|T|] < 2. Asymptotically: E[|T|] = 1/(ln 2)  2x10 -5  … [Trie analysis] We also show that the distribution of average load declines exponentially beyond this point.

28 Expected load Actual load Destination = Node location Load on node Experimental result

29 Fault tolerance How do node failures affect skip graph performance? Random failures: Randomly chosen nodes fail. Experimental results. Adversarial failures: Adversary carefully chooses nodes that fail. Bound on expansion ratio.

30 Random faults nodes

31 Searches with random failures nodes messages

32 Adversarial faults Theorem: A skip graph with n nodes has expansion ratio = (1/log n). A dA dA = nodes adjacent to A but not in A. Expansion ratio = min |dA|/|A|, 1 <= |A| <= n/2. f failures can isolate only O(flog n ) nodes.

33 Proof intuition Consider neighbors of set A at level 0. A Level 0 1. Clumpy sets A 2. Non-clumpy sets Level 0 Low probability of clumpy sets. Non-clumpy sets have many neighbors at level 0. Gives high expansion ratio. A dA

34 Expansion ratio All sets have low probability of few neighbors at level h. And there are not too many clumpy sets. Low probability that any set A has few neighbors at level 0 or h. This gives expansion ratio = (1/log n). Same analysis applicable to DHTs?

35 Need for repair mechanism AJM G WR Level 1 G R W AJM Level 2 AGJ M RW Level 0 Node failures can leave skip graph in inconsistent state.

36 Ideal skip graph Let xR i (xL i ) be the right (left) neighbor of x at level i. xL i < x < xR i. xL i R i = xR i L i = x. Invariant k xR i = xR i-1. k xL i = xL i-1. Successor constraints x Level i Level i-1 i xR i-1 xR 1 i-1 xR 2 x If xL i, xR i exist:

37 Basic repair If a node detects a missing neighbor, it tries to patch the link using other levels Also relink at other lower levels. Successor constraints may be violated by node arrivals or failures.

38 Constraint violation Neighbor at level i not present at level (i-1). x x Level i-1 Level i Level i-1 Level i x x zipper x x

39 AC B D E F GHI J zipperOp message Level i Self-stabilization zOp(I) zOp(F) zOp(E) zOp(D) zOp(B) zOp(A) Eventually want each connected component of the skip graph to reorganize itself into an ideal skip graph.

40 Conclusions Decentralization. O(log n) space at each node. O(log n) search time. Load balancing properties. Tolerant of random faults. Similarities with DHTs

41 PropertyDHTsSkip Graphs Insert/Delete time O(log 2 n)O(log n) LocalityNoYes Repair mechanism?Partial Tolerance of adversarial faults ?Yes Keyspace sizeReqd.Not reqd. Differences

42 Open Problems Evaluate performance in practice. Incorporate geographical proximity. Study multi-dimensional skip graphs. Study effect of byzantine failures. ? Design efficient repair mechanism.