Applications (2)
Outline: Overlay Networks, Peer-to-Peer Networks


Overlay Network
An overlay network is a virtual network:
- constructed on top of a physical network: each virtual node corresponds to a physical node, and each virtual link corresponds to a (possibly multi-hop) path/tunnel in the physical network
- used to implement application-specific functions that the traditional Internet does not otherwise provide

Overlay Network

Examples of Overlay Networks
- VPN (virtual private network): used as a private/secure overlay network connecting remote company sites; private/secure tunnels connect the sites
- MBone: used for multicast; MBone-aware nodes construct a multicast tree, tunneling through regular routers using unicast
- 6Bone: used for deploying IPv6; IPv6-aware nodes use v6 addresses, tunneling through regular nodes using v4 addressing

Overlay Network
Overlay packets are tunneled: an inner header (IHdr) is encapsulated inside an outer header (OHdr).
- IHdr: can be an IPv6 header, application-specific names, even content attributes
- OHdr: a regular IPv4 header
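
The sketch below illustrates this encapsulation idea; the Packet type and field names are illustrative, not from any real protocol stack.

```python
# Minimal sketch of overlay tunneling: the overlay's inner header (IHdr)
# rides as opaque payload inside a regular IPv4 outer header (OHdr).
from dataclasses import dataclass

@dataclass
class Packet:
    header: dict      # header fields as a simple dict
    payload: bytes    # opaque payload

def encapsulate(inner: Packet, tunnel_src: str, tunnel_dst: str) -> Packet:
    """Wrap an overlay packet in an IPv4 outer header for one tunnel hop."""
    ohdr = {"version": 4, "src": tunnel_src, "dst": tunnel_dst}
    # The whole inner packet (IHdr + data) becomes the outer payload.
    return Packet(header=ohdr, payload=repr(inner).encode())

# Example: an IPv6-style inner packet tunneled across an IPv4 path (the 6Bone idea)
inner = Packet(header={"version": 6, "dst": "2001:db8::1"}, payload=b"hello")
outer = encapsulate(inner, tunnel_src="192.0.2.1", tunnel_dst="198.51.100.7")
print(outer.header)
```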

Examples of Overlay Networks
- End-system multicast: only hosts participate, unlike VPN, MBone, and 6Bone, which require router participation; hosts connect with each other using UDP tunnels, forming an overlay mesh on top of which multicast trees are built
- Resilient overlay networks (RON): for finding more efficient, robust routes
  - Problem: BGP does not guarantee the shortest path, so it is possible that L(A,B) > L(A,C) + L(C,B)
  - Solution: monitor path metrics (delay, bandwidth, loss probability) among all pairs of participating hosts (n×n pairs), select the best path between each pair, and adapt to changes in network conditions (see the sketch below)
  - Result: modest performance improvements but much quicker recovery from failures; not scalable
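
A minimal sketch of the RON path-selection idea, under simplifying assumptions (a static latency matrix and one-hop detours only):

```python
# Sketch of RON-style one-hop detour selection: take the direct path A->B
# unless relaying through some overlay node C is measured to be faster.
def best_path(latency, a, b):
    """latency[x][y] = measured delay from x to y; returns (path, cost)."""
    best = ([a, b], latency[a][b])
    for c in latency:
        if c in (a, b):
            continue
        detour = latency[a][c] + latency[c][b]
        if detour < best[1]:
            best = ([a, c, b], detour)
    return best

# Example where the direct route violates the triangle inequality:
latency = {"A": {"B": 100, "C": 20}, "C": {"B": 30}}
print(best_path(latency, "A", "B"))  # (['A', 'C', 'B'], 50)
```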

Peer-to-Peer Networks
Definition: a P2P network allows a community of peers (users) to share content, storage, processing, bandwidth, etc., in contrast to the traditional client-server model of distributed computing.
Characteristics:
- Decentralized control
- Self-organization
- Symmetric (peer) relationships

Examples
Pioneers:
- Napster: centralized directory, distributed files
- Gnutella: each node knows a couple of peers, forming an irregular mesh overlay network; a node that needs an object floods a query, and a peer that knows the object sends a reply (see the sketch below)
- FreeNet: a P2P network with anonymity and privacy (encrypted content)
Research prototypes:
- Structured vs. unstructured: routing/lookup tables vs. random/broadcast search
- BitTorrent, Pastry, Chord, CAN, …
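
A sketch of Gnutella-style flooding under simple assumptions (a static neighbor map and a TTL bounding the flood radius):

```python
# Sketch of Gnutella-style query flooding over an unstructured mesh.
# 'mesh' maps each node to its neighbors; 'holders' maps nodes to objects they hold.
def flood_query(mesh, holders, start, obj, ttl=3):
    """Return the nodes that reply to a query for 'obj' flooded from 'start'."""
    seen, frontier, replies = {start}, [(start, ttl)], []
    while frontier:
        node, t = frontier.pop()
        if obj in holders.get(node, ()):   # this peer knows the object
            replies.append(node)
        if t == 0:
            continue
        for nbr in mesh.get(node, ()):
            if nbr not in seen:            # suppress duplicate queries
                seen.add(nbr)
                frontier.append((nbr, t - 1))
    return replies

mesh = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(flood_query(mesh, holders={"d": {"song.mp3"}}, start="a", obj="song.mp3"))  # ['d']
```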

BitTorrent
- A file is divided into pieces, which are replicated in the BitTorrent network (called a swarm); pieces are downloaded in random order
- A node becomes a source for a piece as soon as it downloads it: the more nodes download a piece, the more the piece is replicated
- To join the swarm, a peer contacts a tracker (server) to download a partial list of other peers
- Peers exchange bitmaps reporting the pieces they hold
- If a peer's neighbors do not have a particular piece, the peer can contact its tracker for more peers, or wait while downloading other pieces
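
A sketch of the piece-exchange step described above, using random-order selection as stated on this slide:

```python
# Sketch of BitTorrent piece selection: peers exchange bitmaps of held
# pieces and request, in random order, a piece they lack that a neighbor holds.
import random

def next_request(my_bitmap, neighbor_bitmaps):
    """Pick a (piece, neighbor) pair for the next download, or None if stalled."""
    wanted = [i for i, have in enumerate(my_bitmap) if not have]
    random.shuffle(wanted)                  # random-order piece selection
    for piece in wanted:
        owners = [n for n, bm in neighbor_bitmaps.items() if bm[piece]]
        if owners:
            return piece, random.choice(owners)
    return None  # no neighbor has a piece we need: ask the tracker for more peers

me = [True, False, False, False]
neighbors = {"peer1": [True, True, False, False], "peer2": [False, False, True, False]}
print(next_request(me, neighbors))
```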

BitTorrent
Newer versions of BitTorrent support trackerless swarms, since a tracker can be a single point of failure and a performance bottleneck.
- Peer finders (implemented in the client software) form an overlay network
- Each finder belongs to a swarm, and the network has many interconnected swarms
- Each finder maintains a table of peers whose IDs are close to its own ID, plus a few whose IDs are distant
- To find a particular peer, a finder first sends the request to the peers in its table whose IDs are closest to that of the target; the request is routed progressively closer to the target in the ID space (see the sketch below)
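
A sketch of this ID-closeness routing, with plain numeric distance for readability (trackerless BitTorrent's Mainline DHT actually uses Kademlia's XOR metric):

```python
# Sketch of finder-table lookup: greedily forward the request to the table
# entry whose ID is closest to the target, stopping when no entry is closer.
def lookup(tables, start, target):
    """tables[node] = IDs in that finder's table. Returns the route taken."""
    route, node = [start], start
    while node != target:
        nxt = min(tables[node], key=lambda n: abs(n - target))
        if abs(nxt - target) >= abs(node - target):
            break                     # no table entry is closer: stop
        route.append(nxt)
        node = nxt
    return route

tables = {5: [9, 40, 200], 40: [5, 60, 90], 90: [40, 99], 99: [90]}
print(lookup(tables, start=5, target=99))  # [5, 40, 90, 99]
```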

Structured P2P
Consistent hashing: map object names and node (IP) addresses onto the same ID space, typically arranged in a ring.
- hash(obj_name) = obj_id
- hash(node_addr) = node_id
- An object is stored at the node whose node_id is closest to the obj_id
- The hash function turns a name or address into an ID that is uniformly distributed, which is good for load balancing
- Locating an object is then a simple matter of invoking the hash function
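
A minimal sketch of this scheme on a small ring (systems like Pastry use a 2^128-sized space; the ring size and SHA-1 choice here are just for illustration):

```python
# Sketch of consistent hashing: object names and node addresses hash into
# the same circular ID space; an object lives on the node whose ID is
# closest to the object's ID on the ring.
import hashlib

RING = 2 ** 32   # small ring for readability

def h(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def ring_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, RING - d)          # distance wraps around the circle

def responsible_node(obj_name, node_addrs):
    obj_id = h(obj_name)
    return min(node_addrs, key=lambda addr: ring_distance(h(addr), obj_id))

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(responsible_node("song.mp3", nodes))
```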

Common Issues
- Organizing and maintaining the overlay network: node arrivals, node failures
- Resource allocation / load balancing
- Resource location
- Locality (network proximity)
Idea: a generic P2P substrate

Object Distribution
Consistent hashing [Karger et al. '97]: a 128-bit circular ID space (0 to 2^128 - 1); nodeIds and objIds are uniform random.
Invariant: the node with the numerically closest nodeId maintains the object.
Each node has a randomly assigned 128-bit nodeId in a circular namespace. Basic operation: a message with key X, sent by any Pastry node, is delivered to the live node with nodeId closest to X in at most log16 N steps (barring node failures). Pastry uses a form of generalized hypercube routing, where the routing tables are initialized and updated dynamically.

Object Insertion/Lookup
A message with key X is routed to the live node with nodeId closest to X: route(X).
Problem: a complete routing table (every node knowing every other node) is not feasible.

Routing Properties
- log16 N routing steps
- O(log N) state per node

Leaf Sets
Each node maintains the IP addresses of the L/2 nodes with the numerically closest larger nodeIds and the L/2 nodes with the numerically closest smaller nodeIds. Leaf sets provide:
- routing efficiency/robustness
- fault detection (keep-alive)
- application-specific local coordination

Routing Procedure

if (destination X is within range of our leaf set)
    forward to the numerically closest leaf-set member
else
    let l = length of the prefix shared by X and our nodeId
    let d = value of the l-th digit of X
    if (routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with X, and
        (b) is numerically closer to X than this node
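
A sketch of this routing step in code, under simplifying assumptions: nodeIds are fixed-length lowercase hex strings (b = 4), and the leaf set and routing table are plain Python structures.

```python
# Sketch of one Pastry routing step, following the procedure above.
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route_step(my_id, leaf_set, table, x):
    """Return the next hop for key x, or None if this node is the destination."""
    dist = lambda n: abs(int(n, 16) - int(x, 16))
    # Case 1: x falls within the range of our leaf set -> closest member.
    members = leaf_set + [my_id]
    if min(members) <= x <= max(members):
        closest = min(members, key=dist)
        return None if closest == my_id else closest
    # Case 2: use routing table entry R[l][d] for (prefix length, next digit).
    l = shared_prefix_len(my_id, x)
    nxt = table.get((l, x[l]))
    if nxt:
        return nxt
    # Case 3: fall back to any known node that shares >= l prefix digits
    # and is numerically closer to x than we are.
    better = [n for n in members + list(table.values())
              if shared_prefix_len(n, x) >= l and dist(n) < dist(my_id)]
    return min(better, key=dist) if better else None

print(route_step("65a1", ["6221", "66cd"], {(1, "f"): "6fab"}, "6fcd"))  # 6fab
```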

Routing
Integrity of the overlay is guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously.
Number of routing hops:
- No failures: < log16 N expected; 128/b + 1 maximum (b bits per digit, here b = 4)
- During failure recovery: O(N) worst case, but the average case is much better

Node Addition
A joining node X routes a join message, keyed by its own nodeId, to the existing node with the numerically closest nodeId; the nodes along the route supply the rows of X's routing table, and the destination node supplies X's leaf set.

Node Departure (Failure)
- Leaf set members exchange keep-alive messages
- Leaf set repair (eager): request the leaf set from the farthest live node in the set
- Routing table repair (lazy): get replacement entries from peers in the same row, then from higher rows

API
- route(M, X): route message M to the node with nodeId numerically closest to X
- deliver(M): deliver message M to the application
- forwarding(M, X): message M is being forwarded towards key X
- newLeaf(L): report a change in leaf set L to the application
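
A minimal sketch of this API as a class, with route() as the downcall and the remaining operations as upcalls an application overrides (the class shape is illustrative; the slide names only the four operations):

```python
# Sketch of the Pastry common API: route() is invoked by applications;
# deliver(), forwarding(), and new_leaf() are upcalls from the substrate.
class PastryNode:
    def route(self, msg, key):
        """Route msg toward the live node whose nodeId is closest to key."""
        raise NotImplementedError   # provided by the overlay substrate

    # --- upcalls, overridden by applications such as PAST or Scribe ---
    def deliver(self, msg):
        """Called on the node whose nodeId is numerically closest to the key."""

    def forwarding(self, msg, key):
        """Called on each intermediate node as msg is forwarded toward key."""

    def new_leaf(self, leaf_set):
        """Called whenever this node's leaf set changes."""
```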

PAST: Cooperative, Archival File Storage and Distribution
- Layered on top of Pastry
- Strong persistence
- High availability
- Scalability
- Reduced cost (no backup)
- Efficient use of pooled resources

PAST API
- Insert: store replicas of a file at k diverse storage nodes
- Lookup: retrieve the file from a nearby live storage node that holds a copy
- Reclaim: free the storage associated with a file
Files are immutable.
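
A sketch of these three operations expressed over Pastry's route() call; file_id() and the message format are illustrative stand-ins, not the real PAST wire format:

```python
# Sketch of the PAST operations layered on a Pastry node's route(msg, key).
import hashlib

def file_id(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)  # illustrative

class PastClient:
    def __init__(self, pastry):
        self.pastry = pastry   # any object exposing route(msg, key)

    def insert(self, name, data, k):
        # Routed to the node closest to fileId, which replicates the file
        # on the k-1 next-closest nodes in the namespace.
        self.pastry.route({"op": "insert", "data": data, "k": k}, file_id(name))

    def lookup(self, name):
        self.pastry.route({"op": "lookup"}, file_id(name))

    def reclaim(self, name):
        self.pastry.route({"op": "reclaim"}, file_id(name))
```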

PAST: File Storage
PAST file storage is mapped onto the Pastry overlay by maintaining the invariant that replicas of a file are stored on the k nodes numerically closest to the file's numeric fileId. During an insert operation, the request is routed using the fileId as the key; the node closest to fileId replicates the file on the k-1 next-nearest nodes in the namespace.

PAST: File Storage
Storage invariant: file replicas are stored on the k nodes with nodeIds closest to the fileId, where k is bounded by the leaf set size (e.g., k = 4).
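
A one-function sketch of the storage invariant, using plain numeric distance for readability (real nodeId distance wraps around the 2^128 ring):

```python
# Sketch of the PAST storage invariant: a file's replicas live on the k
# nodes whose nodeIds are numerically closest to its fileId.
def replica_set(node_ids, fid, k=4):
    return sorted(node_ids, key=lambda n: abs(n - fid))[:k]

nodes = [10, 25, 37, 52, 61, 90]
print(replica_set(nodes, fid=40, k=4))   # [37, 52, 25, 61]
```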

PAST: File Retrieval
- A lookup request for a fileId is routed in at most log16 N steps (expected) to a node that stores a replica, if one exists
- In practice, whichever of the k replica holders first receives the message serves the file
- Pastry's network-locality properties (not covered here) ensure that this is usually the replica node closest to the client C in the network