
1 Distributed Data Structures for a Peer-to-Peer System. Gauri Shah. Advisor: James Aspnes. Committee: Joan Feigenbaum, Arvind Krishnamurthy, Antony Rowstron [MSR, Cambridge, UK].

2 P2P system. Very large number of peers (nodes). Resources, identified by keys, are stored at the peers. Peers are subject to crash failures. Question: how do we locate resources efficiently? [Figure: peers storing keyed resources.]

3 A brief history. June 1999: Shawn Fanning starts Napster. Dec. 1999: RIAA sues Napster for copyright infringement. July 2001: Napster is shut down! Napster clones: KaZaA, Gnutella, Morpheus, MojoNation, ... Academic research (distributed computing): CAN, Chord, Pastry, Tapestry, skip graphs, ...

4 Answer: Central server? (Napster) Drawbacks: central server bottleneck; wasted power at the clients; no fault tolerance. Using server farms?

5 Answer: Flooding? (Gnutella) Drawbacks: too much traffic; available resources may be 'out of reach'.

6 Answer: Super-peers? (KaZaA/Morpheus) Inherently unscalable.

7 What would we like? Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

8 Distributed Hash Tables. Node IDs and resource keys are hashed into the same ID space, forming a virtual overlay network on top of the physical network. [Figure: virtual links between nodes v1..v4; a single virtual hop maps to a longer actual route through the physical network.]
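To make the hashing idea concrete, here is a toy sketch (the 16-bit ID space and the names `hash_id` and `owner` are illustrative assumptions, not the API of any particular DHT): node IDs and resource keys are hashed into one circular ID space, and each key is stored at the first node at or after its ID on the ring.

```python
import hashlib

def hash_id(name: str, bits: int = 16) -> int:
    # Hash a name into a small circular ID space (16 bits keeps the
    # example readable; real systems use 128-160 bits).
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)

def owner(node_ids, key_id):
    # A key is stored at the first node clockwise from its ID,
    # wrapping around the ring if necessary.
    ring = sorted(node_ids)
    for n in ring:
        if n >= key_id:
            return n
    return ring[0]

nodes = [hash_id(f"node-{i}") for i in range(8)]
home = owner(nodes, hash_id("some-song.mp3"))
```

Because the hash scatters keys uniformly, load balances across nodes, but any locality among the original key names is destroyed (a point slide 20 returns to).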

9 Existing DHT systems: CAN [RFHKS '01] (d-dimensional coordinate space, e.g. d=2), Chord [SMKKB '01], Pastry [RD '01], Tapestry [ZKJ '01]. O(log n) time per search, O(log n) space per node.

10 What does this give us? Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

11 Analytical model [Aspnes-Diamadi-Shah, PODC 2002] Questions: Performance with failures? Optimal link distribution for greedy routing? Construction and dynamic maintenance?

12 Our approach (based on [Kleinberg 1999]). Simple metric space: the 1-D line. Hash(key) = location in the metric space. 2 short-hop links: immediate neighbors. k long-hop links: inverse-distance distribution, Pr[edge(u,v)] = (1/d(u,v)) / Σ_v' (1/d(u,v')). Greedy routing: forward the message to the neighbor closest to the target in the metric space.
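The link distribution and greedy routing above can be sketched as a small simulation (a sketch under the stated model, not production code; `build_links` and `greedy_route` are illustrative names):

```python
import random

def build_links(n, k, rng):
    # Each node u keeps its two immediate neighbors (short hops) plus
    # k long-hop links drawn with Pr[v] proportional to 1/d(u, v).
    links = {}
    for u in range(n):
        nbrs = {v for v in (u - 1, u + 1) if 0 <= v < n}
        others = [v for v in range(n) if v != u]
        weights = [1.0 / abs(u - v) for v in others]
        nbrs.update(rng.choices(others, weights=weights, k=k))
        links[u] = nbrs
    return links

def greedy_route(links, src, dst):
    # Forward to the neighbor closest to the target. The short-hop
    # links guarantee strict progress, so the walk always terminates.
    cur, hops = src, 0
    while cur != dst:
        cur = min(links[cur], key=lambda v: abs(v - dst))
        hops += 1
    return hops

links = build_links(128, 4, random.Random(1))
hops = greedy_route(links, 0, 127)
```

With inverse-distance long hops, the route typically covers half the remaining distance every O(log n / k) steps, which is where the O((log² n)/k) bound on the next slide comes from.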

13 Performance with failures. Each node has k ∈ [1..log n] long-hop links. Without failures: routing time O((log² n)/k). With failures, where each node/link fails independently with probability p: routing time O((log² n)/((1-p)·k)). [Plot: routing time vs. failure probability p.]

14 Search with random failures. [Plot: fraction of failed searches vs. probability of node failure; log n = 17 links per node; non-faulty source and target.]

15 Lower bounds? Is it possible to design a link distribution that beats the O(log² n) routing bound given by the 1/d distribution? Goal: a lower bound on routing time as a function of the number of links per node.

16 Lower bounds. Random graph G: node x has k links on average, each chosen independently; x always links to (x-1) and (x+1). Let the target be 0. Expected time to reach 0 from a point chosen uniformly from 1..n (*): routing time Ω(log² n / (k log log n)) — worse than the O(log n) achievable with a tree: the cost of assuming symmetry between nodes. (*) Assuming the probability of choosing links is symmetric about 0 and unimodal.

17 Heuristic for construction. A new node chooses neighbors using the inverse-distance distribution, links to the live nodes closest to the chosen (possibly absent) ones, and selects older nodes to point back to it. The same strategy is used for repairing broken links. [Figure: ideal vs. adjusted links; new node x links past an absent node to an older node y.]

18 [Plot: derived vs. ideal link distribution; n = 16384 nodes, log n = 14 links.]

19 So far... Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

20 Disadvantage of DHTs. No support for locality: after a user requests one key, it is likely to request nearby keys, and the system should use information from the first search to improve the performance of the second — but hashing destroys locality, so DHTs cannot do this. No support for complex queries, for the same reason.

21 Skip list [Pugh '90]. A data structure based on a linked list. Each element is also linked at the next higher level with probability 1/2. [Figure: HEAD → A G J M R W → TAIL at level 0; A J M at level 1; J at level 2.]

22 Searching in a skip list. Start at the top level (the list is bounded by -∞ and +∞ sentinels); move right while the next key does not pass the target, else drop down a level. [Figure: search for key 'R' succeeding; search past the target failing.] Time for search: O(log m) on average. Number of pointers per element: O(1) on average [m = number of elements in the skip list].
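A toy version of the structure, flattening the pointer chains into one sorted list per level (an assumption made for brevity; a real skip list keeps per-element forward pointers):

```python
import random
from bisect import bisect_right

class SkipList:
    # Toy skip list: levels[i] holds, in sorted order, the keys present
    # at level i. Every key lives at level 0; a key at level i is also
    # at level i+1 with probability 1/2.
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.levels = [[]]

    def insert(self, key):
        height = 1
        while self.rng.random() < 0.5:   # coin flips set the height
            height += 1
        while len(self.levels) < height:
            self.levels.append([])
        for i in range(height):
            lvl = self.levels[i]
            lvl.insert(bisect_right(lvl, key), key)

    def search(self, key):
        # Descend from the sparsest level; at each level inspect the
        # largest key <= target before dropping down.
        for lvl in reversed(self.levels):
            i = bisect_right(lvl, key)
            if i > 0 and lvl[i - 1] == key:
                return True
        return False

sl = SkipList(seed=42)
for k in "AGJMRW":
    sl.insert(k)
```

Because each level halves in expectation, the descent inspects O(log m) levels, matching the slide's search bound.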

23 Skip lists for P2P? Advantages: O(log m) expected search time; retains locality; supports dynamic additions/deletions. Disadvantages: cannot reduce load on the top-level elements; cannot survive partitioning by failures. Problem: lack of redundancy.

24 A skip graph [Aspnes-Shah, SODA 2003]. Each element gets a random membership vector; elements link at level i to elements with a matching membership-vector prefix of length i. Example (membership vectors A=000, J=001, M=011, G=100, W=101, R=110): Level 0: A G J M R W; Level 1: A J M | G R W; Level 2: A J | M | G W | R. Average O(log m) pointers per element [m = number of resources].
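The membership-vector rule can be sketched directly: at level i, keys sharing a length-i prefix form one list. This toy model represents each list as a sorted group (the doubly linked pointers are elided; `skip_graph_levels` is an illustrative name):

```python
from collections import defaultdict

def skip_graph_levels(elements, max_level):
    # elements: {key: membership-vector string}. At level i, keys that
    # share the same length-i prefix of their membership vector form
    # one sorted doubly linked list (modeled here as a sorted group).
    levels = []
    for i in range(max_level + 1):
        groups = defaultdict(list)
        for key, mv in sorted(elements.items()):
            groups[mv[:i]].append(key)
        levels.append(dict(groups))
    return levels

# The example from the slide:
elems = {"A": "000", "J": "001", "M": "011",
         "G": "100", "W": "101", "R": "110"}
levels = skip_graph_levels(elems, 2)
```

Every element belongs to one list per level, so each of the m elements carries O(log m) pointers on average, as the slide states.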

25 Search: expected O(log m) — the same performance as skip lists and DHTs. Restricting to the lists containing the starting element of the search, we get a skip list. [Figure: the levels reachable from one element form a skip list.]
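A sketch of the greedy search, using the same toy representation (lists are recomputed from membership vectors rather than stored as pointers; `sg_search` is an illustrative name):

```python
def sg_search(elements, start, target, max_level):
    # Greedy search from `start`: at each level, scan only the list
    # containing the current element (keys sharing its membership-
    # vector prefix), step as close to `target` as possible without
    # overshooting, then drop down a level.
    cur = start
    for i in range(max_level, -1, -1):
        prefix = elements[cur][:i]
        lst = [k for k in sorted(elements) if elements[k][:i] == prefix]
        if target >= cur:
            cur = max(k for k in lst if cur <= k <= target)
        else:
            cur = min(k for k in lst if target <= k <= cur)
        if cur == target:
            return True
    return False

# The example from the slide:
elems = {"A": "000", "J": "001", "M": "011",
         "G": "100", "W": "101", "R": "110"}
```

At level 0 every element is in one sorted list, so the search finds the target exactly when it is present.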

26 Resources vs. nodes. Skip graphs: elements are resources. DHTs: elements are nodes. This does not affect search performance or load balancing, but it increases the number of pointers at each node. [Figure: DHT vs. skip graph overlays on the same physical network of nodes A..E.]

27 SkipNet [HJSTW '03]. Combines a distributed hash table with name-ordered lists, so that documents such as com.ibm/m1 ... com.ibm/m4 remain on machines m1..m4 within the com.ibm domain. [Figure: level-0 list ordered by name: com.apple, com.ibm, com.microsoft, com.sun, with documents a.htm ... r.htm on the com.ibm machines.]

28 So far... Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

29 Insertion, step 1. Starting at a buddy (an existing element known to the new element J), find the nearest key at level 0 — a range query looking for the key closest to the new key. Takes O(log m) on average. [Figure: new element J inserted between existing elements.]

30 Insertion, step 2. At each successive level, search for elements matching a prefix of the new element's membership vector of increasing length, and link in. Adds O(1) expected time per level. Total time for insertion: O(log m), the same as most DHTs.

31 So far... Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

32 Locality and range queries. Supported operations: find any key F; find the largest key < F; find the least key > F; find all keys in an interval [D..O]. Elements are inserted at level 0 in key order, so keys such as A, D, F, I, L, O, S sit next to each other.
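Because level 0 is sorted by key, all of these queries reduce to binary search over an ordered sequence. A minimal sketch over a plain sorted list standing in for the level-0 list (function names are illustrative):

```python
from bisect import bisect_left, bisect_right

# Level-0 keys from the slide, kept in sorted order.
keys = sorted(["D", "F", "A", "I", "L", "O", "S"])

def range_query(keys, lo, hi):
    # All keys in the closed interval [lo..hi].
    return keys[bisect_left(keys, lo):bisect_right(keys, hi)]

def largest_below(keys, x):
    # Largest key strictly less than x, or None.
    i = bisect_left(keys, x)
    return keys[i - 1] if i > 0 else None

def least_above(keys, x):
    # Least key strictly greater than x, or None.
    i = bisect_right(keys, x)
    return keys[i] if i < len(keys) else None
```

In the actual skip graph, the higher levels play the role of the binary search: a query descends in O(log m) steps and then walks level 0 to enumerate the interval.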

33 Further applications of locality. 1. Version control: e.g., to find the latest news before today, find the largest key < news:05/14 among keys such as news:01/31, news:03/01, news:03/18, news:04/03 at level 0.

34 2. Data replication: e.g., to find any copy of some Britney Spears song, search for the prefix britney* among replicas britney02, britney03, britney04, which are adjacent at level 0. Provides hot-spot management and survivability.
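The britney* lookup is a prefix search on the sorted level-0 list: all replicas sort contiguously, so jumping to where the prefix would fall finds one of them. A minimal sketch (`find_prefix` and the key names are illustrative):

```python
from bisect import bisect_left

keys = sorted(["britney02", "britney03", "britney04", "chord", "pastry"])

def find_prefix(keys, prefix):
    # Jump to where the prefix would sort; any key at that position
    # starting with the prefix is a hit, since replicas sharing the
    # prefix are contiguous in the sorted level-0 list.
    i = bisect_left(keys, prefix)
    if i < len(keys) and keys[i].startswith(prefix):
        return keys[i]
    return None
```

This is exactly the operation a hash-based DHT cannot offer, since hashing scatters britney02..britney04 across the ID space.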

35 What's left? Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

36 Fault tolerance. How do failures affect skip graph performance? Random failures: randomly chosen elements fail — experimental results (experiments may not give the worst failure pattern). Adversarial failures: an adversary carefully chooses the elements that fail — theoretical results.

37 Random failures. [Plot: performance under random element failures.]

38 Searches with random failures. [Plot: fraction of failed searches vs. number of failed elements and messages; non-faulty source and target.]

39 Adversarial failures. Theorem: a skip graph with m elements has expansion ratio Ω(1/log m) w.h.p., where the expansion ratio is min |δA|/|A| over sets A with 1 ≤ |A| ≤ m/2, and δA is the set of elements adjacent to A but not in A. Since the number of failures needed to isolate a set A is at least |δA| ≥ |A|/log m, f failures can isolate only O(f·log m) elements.

40 Need for repair mechanism. Node failures can leave the skip graph in an inconsistent state. [Figure: skip graph with failed elements leaving broken lists at each level.]

41 Basic repair action. If an element detects a missing neighbor, it tries to patch the link using other levels, and also relinks at the lower levels. Eventually each connected component of the disrupted skip graph reorganizes itself into an ideal skip graph.

42 Ideal skip graph. Let xR_i (xL_i) denote the right (left) neighbor of x at level i. Invariant: if xL_i and xR_i exist, then (1) xL_i < x < xR_i, and (2) (xL_i)R_i = (xR_i)L_i = x. Successor constraints: the level-i neighbors are reachable at level i-1, i.e., xR_i = (xR_{i-1})^{k'} for some k', and xL_i = (xL_{i-1})^k for some k. [Figure: xR_i reached via repeated level-(i-1) right steps.]

43 Constraint violation. A neighbor at level i is not present at level (i-1); the repair mechanism merges the affected lists at level i-1. [Figure: merge across levels i-1, i, i+1.]

44 Additional properties. 1. Low network congestion. 2. No need to know the key space size.

45 Network congestion. We are interested in the average traffic through an element u, i.e., the number of searches from a source s to a destination t that pass through u. Theorem: let d = dist(u, t). Then the probability that a search from s to t passes through u is < 2/(d+1), where V = {elements v : u <= v <= t} and |V| = d+1. Elements near a popular target get loaded, but the effect drops off rapidly with distance.

46 Predicted vs. real load. [Plot: fraction of messages vs. element location; predicted load vs. actual load; destination = 76500.]

47 Knowledge of key space. DHTs require the key space size to be known initially; skip graphs do not! Old elements extend their membership vectors with new random bits as required when new elements arrive. [Figure: inserting J between E and Z forces a new membership-vector bit at existing elements.]
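The lazy extension can be sketched as drawing membership-vector bits on demand (a toy model; `mv_bit` is an illustrative name): a vector grows only when some comparison needs a longer prefix, so no key-space size is ever fixed in advance.

```python
import random

# Two existing elements with one membership-vector bit each,
# as in the slide's example.
elements = {"E": "1", "Z": "0"}
rng = random.Random(7)

def mv_bit(elements, key, i, rng):
    # Return bit i of key's membership vector, appending fresh random
    # bits lazily; earlier bits are never changed, so existing level
    # lists stay valid.
    while len(elements[key]) <= i:
        elements[key] += rng.choice("01")
    return elements[key][i]

b = mv_bit(elements, "E", 2, rng)  # forces E's vector out to 3 bits
```

Since bits are appended and never rewritten, old elements keep all their existing links; new arrivals simply probe deeper prefixes as needed.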

48 Similarities with DHTs. Data availability. Decentralization. Scalability. Load balancing. Fault-tolerance [random failures]. Network maintenance: dynamic node addition/deletion; repair mechanism. Efficient searching: incorporating proximity; incorporating locality.

49 Differences.
Property                        | DHTs      | Skip graphs
Tolerance of adversarial faults | Not yet   | Yes
Locality                        | No        | Yes
Key space size                  | Required  | Not required
Proximity                       | Partially | No

50 Open problems. Design a more efficient repair mechanism. Incorporate proximity. Study the effect of Byzantine/selfish behavior. Provide locality together with state minimization. Some promising approaches: composition of data structures [AS '03, ZSZ '03]; locality-sensitive hashing [LS '96, IMRV '97].

51 Questions, Comments, Criticisms