A Common API for Structured Peer-to-Peer Overlays


A Common API for Structured Peer-to-Peer Overlays
Frank Dabek, Ben Y. Zhao, Peter Druschel, Ion Stoica

Speaker notes: change the motivation to "everyone is using these, but nobody understands what they are using" — break the systems into pieces so we can strip away the commonality and examine the differences. Change the presentation from bottom-up to top-down. Fill in the explanation of the DOLR implementation on the API; absorb anycast into DOLR. Write the system names onto the DOLR box (Tapestry, Pastry+Scribe) and the DHT box (CAN, Chord+DHash).

Structured Peer-to-Peer Overlays
They are:
- Scalable, self-organizing overlay networks
- Provide routing to location-independent names
- Examples: CAN, Chord, Pastry, Tapestry, …
Basic operation:
- Large sparse namespace N (integers: 0–2^128 or 0–2^160)
- Nodes in the overlay network have nodeIds ∈ N
- Given k ∈ N, a deterministic function maps k to its root node (a live node in the network)
- route(msg, k) delivers msg to root(k)
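To make the root mapping concrete, here is a minimal sketch (not from the slides) of one common choice of root(k): the live node whose nodeId is numerically closest to k on the circular namespace, as in Pastry; Chord instead uses the clockwise successor. All names below are illustrative.

```python
# Hypothetical sketch: root(k) as "numerically closest live nodeId" on a
# circular 160-bit ring (Pastry-style); Chord would use the successor instead.
ID_BITS = 160
ID_SPACE = 2 ** ID_BITS

def circular_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on the ring of size ID_SPACE."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def root(key: int, live_node_ids: list[int]) -> int:
    """Deterministically map key k to the live node responsible for it."""
    return min(live_node_ids, key=lambda n: circular_distance(n, key))

# route(msg, k) would then deliver msg to root(k).
nodes = [7, 2**80, 2**159, 2**160 - 5]
print(root(12, nodes))   # -> 7 (distance 5 beats 2**160 - 5 at distance 17)
```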

Current Progress
Lots of applications built on top:
- File systems, archival backup
- Application-level multicast
- Routing for anonymity, attack resilience
But do we really understand them?
- What is the core functionality that applications leverage from them?
- What are the strengths and weaknesses of each protocol? How can they be exploited by applications?
- How can we build new protocols customized to our future needs?

Our Goals
Protocol comparison:
- Compare and contrast protocol semantics
- Identify basic commonalities
- Isolate and understand differences
Towards a common API:
- Easily supportable by old and new protocols
- Enables application portability between protocols
- Enables common benchmarks
- Provides a framework for reusable components

Talk Outline
- Motivation
- DHTs and DOLRs
- A Flexible Routing API
- Usage Examples

Decomposing Functional Layers
Distributed Hash Table (DHT):
- put(key, data), value = get(key)
- Hash table layered across the network
- Handles replication; distributes replicas randomly
- Routes queries towards replicas by name
Decentralized Object Location and Routing (DOLR):
- publish(objectId), route(msg, nodeId), routeObj(msg, objectId, n)
- Application controls replication and placement
- Caches location pointers to replicas; queries quickly intersect pointers and redirect to nearby replica(s)
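The split can be summed up as two small interfaces. The sketch below is an assumption about how one might write them down (abstract Python classes using the method names from this slide); the slides themselves do not prescribe any particular binding.

```python
from abc import ABC, abstractmethod

class DHT(ABC):
    """Stores the data itself; replication and placement are handled internally."""
    @abstractmethod
    def put(self, key: bytes, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: bytes) -> bytes: ...

class DOLR(ABC):
    """Stores only location pointers; the application places and replicates objects."""
    @abstractmethod
    def publish(self, object_id: bytes) -> None: ...
    @abstractmethod
    def route(self, msg: bytes, node_id: bytes) -> None: ...
    @abstractmethod
    def route_obj(self, msg: bytes, object_id: bytes, n: int) -> None: ...
```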

DHT Illustrated (figure)

DOLR Illustrated (figure)

Architecture
- Tier 2: CFS, PAST, SplitStream, i3, OceanStore, Bayeux
- Tier 1: DHT (CAN, Chord+DHash), Multicast, DOLR (Tapestry, Pastry+Scribe)
- Tier 0: Replication, Routing Mesh

Talk Outline
- Motivation
- DHTs and DOLRs
- A Flexible Routing API
- Usage Examples

Flexible API for Routing
Goal:
- Consistent API for leveraging the routing mesh
- Flexible enough to build higher abstractions
- Openness promotes new abstractions
- Allow competitive selection to determine the right abstractions
Three main components:
- Invoking routing functionality
- Accessing namespace mapping properties
- Open, flexible upcall interface

API (routing)
Data types:
- Key, nodeId: 160-bit integer
- Node: address (IP + port) plus nodeId
- Msg: application-specific message of arbitrary size
Invoking routing functionality:
- route(key, msg, [node]): route the message to the node currently responsible for key
- Non-blocking, best effort: the message may be lost or duplicated
- node: transport address of the node last associated with key (proposed first hop, optional)
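As a sketch, the data types and the routing call might look as follows in Python; the dataclass layout and the int/bytes encodings are assumptions, only the names and documented semantics come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    address: str       # transport address: IP plus port
    node_id: int       # 160-bit integer, same space as keys

class RoutingLayer:
    def route(self, key: int, msg: bytes, node: Optional[Node] = None) -> None:
        """Route msg to the node currently responsible for key.

        Non-blocking and best effort: the message may be lost or duplicated.
        node, if supplied, is the transport address of the node last known to
        be associated with key and is only a proposed first hop.
        """
        raise NotImplementedError   # provided by a concrete overlay (Chord, Pastry, ...)
```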

API (namespace properties)
- nextHopSet = local_lookup(key, num, safe): returns a set of at most num nodes from the local routing table that are possible next hops towards key. safe: whether the choice of nodes is randomly chosen.
- nodehandle[] = neighborSet(max_rank): returns an unordered set of nodes that are neighbors of the current node. The neighbor of rank i is responsible for this node's keys should all neighbors of rank < i fail.
- nodehandle[] = replicaSet(key, num): returns an ordered set of up to num nodes on which replicas of the object with key key can be stored. The result is a subset of the neighborSet plus the local node.
- boolean = range(node, rank, lkey, rkey): returns whether the current node would be responsible for the range specified by lkey and rkey, should the previous rank-1 nodes fail.
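A corresponding sketch of the namespace-mapping calls, continuing the hypothetical Node type from the routing sketch above; method bodies are left to a concrete overlay, and only the names and documented semantics come from the slide.

```python
from __future__ import annotations
from typing import List

class RoutingState:
    """Namespace-mapping portion of the API (sketch; Node as defined earlier)."""

    def local_lookup(self, key: int, num: int, safe: bool) -> List[Node]:
        """Up to num nodes from the local routing table that are possible next
        hops towards key; safe controls whether the choice may be randomized."""
        raise NotImplementedError

    def neighbor_set(self, max_rank: int) -> List[Node]:
        """Unordered neighbors of the current node; the rank-i neighbor takes
        over this node's keys if all neighbors of rank < i fail."""
        raise NotImplementedError

    def replica_set(self, key: int, num: int) -> List[Node]:
        """Ordered set of up to num nodes that can store replicas for key;
        a subset of neighbor_set() plus the local node."""
        raise NotImplementedError

    def range(self, node: Node, rank: int, lkey: int, rkey: int) -> bool:
        """Whether the current node would be responsible for [lkey, rkey]
        should the previous rank-1 responsible nodes fail."""
        raise NotImplementedError
```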

API (upcalls)
- Deliver(key, msg): delivers an incoming message to the application. One application per node; demultiplexing is done by including a demux key in msg.
- Forward(&key, &msg, &nextHopNode): synchronous upcall invoked at each node along the route. On return, msg will be forwarded to nextHopNode. The application may modify key, msg, or nextHopNode, or terminate routing by setting nextHopNode to NULL.
- Update(node, boolean joined): upcall invoked to inform the application of a change in the local node's neighborSet, either a new node joining or an old node leaving.
(Figure: the routing layer at each node raises forward upcalls to the application as a message passes through, and a deliver upcall at the destination.)
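A sketch of the upcall side; rendering the by-reference forward() arguments as a mutable Hop record is an assumption, while the names and semantics follow the slide.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hop:
    """Mutable container standing in for the &key, &msg, &nextHopNode references."""
    key: int
    msg: bytes
    next_hop: Optional[Node]     # set to None to terminate routing

class Application:
    def deliver(self, key: int, msg: bytes) -> None:
        """Final delivery at the responsible node; one application per node,
        demultiplexed by a demux key carried inside msg."""

    def forward(self, hop: Hop) -> None:
        """Synchronous upcall at each node along the route; the application may
        modify hop.key, hop.msg, or hop.next_hop before the message moves on."""

    def update(self, node: Node, joined: bool) -> None:
        """Notification that node entered (True) or left (False) the local
        node's neighborSet."""
```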

Talk Outline
- Motivation
- DHTs and DOLRs
- A Flexible Routing API
- Usage Examples

DHT Implementation
Interface:
- put(key, value)
- value = get(key)
Implementation (source S, root R):
- Put: route(key, [PUT, value, S], NULL); reply: route(NULL, [PUT-ACK, key], S)
- Get: route(key, [GET, S], NULL); reply: route(NULL, [value, R], S)
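The exchange above can be exercised end to end with a toy, single-process stand-in for the routing mesh. Everything below is an illustrative sketch: the Overlay class, the tuple message format, and the synchronous reply handling are assumptions; only the PUT / PUT-ACK / GET message shapes follow the slide (the slide's untagged [value, R] reply is tagged VALUE here so deliver() can dispatch on it).

```python
ID_SPACE = 2 ** 16          # shortened id space, enough for the example

class Overlay:
    """Toy stand-in for the routing mesh: route() picks the live node with the
    numerically closest id, or delivers directly when given a first-hop hint."""
    def __init__(self):
        self.nodes = {}                                    # nodeId -> application

    def join(self, node_id, app):
        self.nodes[node_id] = app

    def route(self, key, msg, first_hop=None):
        if key is None and first_hop is not None:          # direct reply to a known node
            self.nodes[first_hop].deliver(None, msg)
            return
        def dist(n):                                        # circular distance to key
            return min((n - key) % ID_SPACE, (key - n) % ID_SPACE)
        self.nodes[min(self.nodes, key=dist)].deliver(key, msg)

class DHTNode:
    def __init__(self, overlay, node_id):
        self.overlay, self.node_id = overlay, node_id
        self.store, self._last_reply = {}, None
        overlay.join(node_id, self)

    # client-side interface -------------------------------------------------
    def put(self, key, value):
        self.overlay.route(key, ("PUT", value, self.node_id))

    def get(self, key):
        self.overlay.route(key, ("GET", self.node_id))
        return self._last_reply                            # synchronous in this toy

    # deliver upcall ----------------------------------------------------------
    def deliver(self, key, msg):
        if msg[0] == "PUT":                                # root stores and acks
            _, value, src = msg
            self.store[key] = value
            self.overlay.route(None, ("PUT-ACK", key), src)
        elif msg[0] == "GET":                              # root replies with the value
            _, src = msg
            self.overlay.route(None, ("VALUE", self.store.get(key), self.node_id), src)
        elif msg[0] == "VALUE":                            # reply reaches the source
            self._last_reply = msg[1]

overlay = Overlay()
a, b = DHTNode(overlay, 100), DHTNode(overlay, 40_000)
a.put(12_345, "hello")
print(b.get(12_345))                                       # -> hello
```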

DOLR Implementation
Interface:
- RouteNode(msg, nodeId)
- Publish(objectId)
- RouteObj(msg, objectId, n)
Implementation (server S, client C, object O):
- RouteNode: route(nodeId, msg, NULL)
- Publish: route(objectId, ["publish", O, S], NULL); upcall at each hop: addLocal([O, S])
- RouteObj: route(objectId, [n, msg], NULL); upcall at each hop: serverSet[] = getLocal(O); for the first n entries in serverSet, route(serverSet[i], msg, NULL); if |serverSet| < n, route(objectId, [n − |serverSet|, msg], NULL)
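A sketch of the per-node upcall logic behind Publish and RouteObj. Everything here is an assumption layered on the routing API: route_fn stands in for route(key, msg, NULL), pointer_table stands in for the addLocal/getLocal pointer store, and the tuple tags are illustrative.

```python
class DOLRNode:
    """Per-node DOLR behaviour expressed as a forward() upcall (sketch)."""

    def __init__(self, route_fn):
        self.route = route_fn            # route(key, msg) supplied by the overlay
        self.pointer_table = {}          # objectId -> set of server nodeIds

    # application-facing calls ----------------------------------------------
    def publish(self, object_id, server_id):
        self.route(object_id, ("publish", object_id, server_id))

    def route_obj(self, msg, object_id, n):
        self.route(object_id, ("routeObj", n, msg))

    # forward upcall: runs at every node the message crosses ------------------
    def forward(self, key, msg, next_hop):
        if msg[0] == "publish":                               # addLocal([O, S])
            _, obj, server_id = msg
            self.pointer_table.setdefault(obj, set()).add(server_id)
            return next_hop                                   # keep routing to the root
        if msg[0] == "routeObj":
            _, n, payload = msg
            servers = list(self.pointer_table.get(key, ()))   # getLocal(O)
            if not servers:
                return next_hop                               # no pointers here; keep routing
            for s in servers[:n]:
                self.route(s, payload)                        # redirect toward known replicas
            missing = n - len(servers)
            if missing > 0:                                   # still short of n copies:
                self.route(key, ("routeObj", missing, payload))   # continue toward the root
            return None                                       # original message stops here
        return next_hop
```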

Conclusion
- Very much ongoing work; feedback is valuable and appreciated
Ongoing work:
- Implementations will support the routing API
- Working towards higher-level abstractions: Distributed Hash Table API, DOLR publish/route API
For more information, see IPTPS 2003. Thank you.

Backup Slides Follow…

Storage API: Overview
- linsert(key, value)
- value = lget(key)

Storage API
- linsert(key, value): stores the tuple <key, value> in local storage. If a tuple with key already exists, it is replaced. The insertion is atomic with respect to failures of the local node.
- value = lget(key): retrieves the value associated with key from local storage. Returns null if no tuple with key exists.
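A minimal sketch of local storage satisfying these two calls; the in-memory dict is an assumption, and durability/atomicity across local-node failures is not modeled here.

```python
class LocalStore:
    """Local tuple store behind linsert/lget (sketch; not crash-durable)."""

    def __init__(self):
        self._tuples = {}

    def linsert(self, key, value):
        self._tuples[key] = value         # replaces any existing tuple with key

    def lget(self, key):
        return self._tuples.get(key)      # None if no tuple with key exists

store = LocalStore()
store.linsert(0x2A, b"block")
print(store.lget(0x2A))                   # b'block'
print(store.lget(0x07))                   # None
```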

To Do
The following slides contain functions that we haven't decided on yet.

Basic DHT API: Overview
- insert(key, value, lease)
- value = get(key)
- release(key)
Upcalls:
- insertData(key, value, lease)

Basic DHT API
insert(key, value, lease): inserts the tuple <key, value> into the DHT. The tuple is guaranteed to be stored in the DHT only for the lease duration. value also includes the type of operation to be performed on insertion. Default operation types include:
- REPLACE: replace the value associated with the same key
- APPEND: append the value to the existing key
- UPCALL: generate an upcall to the application before inserting
- …
Discussion notes:
- Does insert include guarantees? We make no assumptions about the network layer, but what about the semantics of this call? For hard state, decouple it from the kind of availability (number of replicas); for soft state, it is good for adaptability to node changes and lets us quantify the worst-case loss of availability.
- Include insert(key, value, leaseLength), returning true/false (false includes the case where the node cannot store the value for that long); leaseLength = 0 means best effort.
- What are the semantics of caches of soft state? After the timeout, is the assumption that the data may still be there? The answer can make caching of data easier or harder; keep the question on the table. Cached copies should also keep an associated timeout derived from the timeout of the primary copy.
- What happens when you change data, and stale data that has expired has not been deleted? Enforced delete may lend itself to an easy cache-validation protocol. Solution: have lookup(key) ignore invalidated copies; then, from the application's standpoint, deletes are "enforced".
- Fine with route; fine with the semantics of insert. The question is not whether we want append (we do), but how it's done.

Basic DHT API
value = get(key): retrieves the value associated with key. Returns null if no tuple with key exists in the DHT.

Basic DHT API
release(key): releases any tuples with key from the DHT. After this operation completes, tuples with key are no longer guaranteed to exist in the DHT.
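Pulling the three calls together, here is a single-node sketch of the lease-based semantics; the expiry bookkeeping, the list representation for APPEND, and the optional upcall hook are assumptions, and the true/false return mirrors the leaseLength discussion above.

```python
import time

class BasicDHTNode:
    """Single-node view of insert/get/release with leases (illustrative sketch)."""

    def __init__(self, insert_upcall=None):
        self.data = {}                      # key -> (list of values, expires_at)
        self.insert_upcall = insert_upcall  # optional insertData(key, value, lease) hook

    def insert(self, key, value, lease, op="REPLACE"):
        if op == "UPCALL" and self.insert_upcall:
            self.insert_upcall(key, value, lease)          # app sees it before storing
        expires = time.time() + lease if lease > 0 else float("inf")  # lease 0: best effort
        if op == "APPEND" and key in self.data:
            values, old_expires = self.data[key]
            self.data[key] = (values + [value], max(old_expires, expires))
        else:                               # REPLACE is the default
            self.data[key] = ([value], expires)
        return True                         # False would mean "cannot honour that lease"

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] < time.time():        # expired tuples are ignored
            return None
        return entry[0]

    def release(self, key):
        self.data.pop(key, None)            # key no longer guaranteed to exist

node = BasicDHTNode()
node.insert(1, "a", lease=60)
node.insert(1, "b", lease=60, op="APPEND")
print(node.get(1))                          # ['a', 'b']
node.release(1)
print(node.get(1))                          # None
```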

Basic DHT API: Open Questions
- Semantics?
- Verification / access control / multiple DHTs?
- Caching? Replication?
- Should we have leases? They make us dependent on secure time synchronization.

Replicating DHT API
insert(key, value, numReplicas): adds a numReplicas argument to insert. Ensures resilience of the tuple to up to numReplicas − 1 "simultaneous" node failures.
Open questions: consistency.
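One way the numReplicas argument could be realized on top of replicaSet() and the local-storage call from the earlier slides; the send_to transport helper, the LINSERT message tag, and the boolean return are assumptions.

```python
def replicated_insert(key, value, num_replicas, replica_set, send_to):
    """Store <key, value> on num_replicas nodes drawn from replicaSet(key, num).

    replica_set(key, num) and send_to(node, msg) are supplied by the overlay;
    each receiving node is expected to perform linsert(key, value) locally.
    """
    targets = replica_set(key, num_replicas)
    for node in targets[:num_replicas]:
        send_to(node, ("LINSERT", key, value))
    return len(targets) >= num_replicas      # False: not enough distinct replica nodes
```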

Caching DHT API
Same as the basic DHT API. The implementation uses dynamic caching to balance query load.

Resilient DHT API
Same as the replicating DHT API. The implementation uses dynamic caching to balance query load.

Publish API: Overview
- publish(key, object)
- object = lookup(key)
- remove(key, object)

Publish API
- publish(key, object): ensures that the locally stored object can be located using key. Multiple instances of the object may be published under the same key from different locations.
- object = lookup(key): locates the nearest instance of the object associated with key. Returns null if no such object exists.

Publish API
- remove(key, object): after this operation completes, the local instance of object can no longer be located using key.
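For completeness, a flat, single-directory sketch of these three calls; a real DOLR keeps pointers distributed along routing paths and ranks instances by network proximity, neither of which this toy models.

```python
class PublishDirectory:
    """Toy location directory for publish/lookup/remove (illustrative only)."""

    def __init__(self):
        self.instances = {}                        # key -> list of published objects

    def publish(self, key, obj):
        self.instances.setdefault(key, []).append(obj)

    def lookup(self, key):
        entries = self.instances.get(key)
        return entries[0] if entries else None     # "nearest instance" is not modeled

    def remove(self, key, obj):
        remaining = [o for o in self.instances.get(key, []) if o is not obj]
        self.instances[key] = remaining

d = PublishDirectory()
d.publish("song", {"replica": "node-A"})
print(d.lookup("song"))                            # {'replica': 'node-A'}
d.remove("song", d.lookup("song"))
print(d.lookup("song"))                            # None
```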