Peer-to-peer systems Chapter 10 1. Outline Introduction Napster and its legacy Peer-to-peer middleware Routing overlay Pastry 2.

Slides:



Advertisements
Similar presentations
Peer-to-Peer Infrastructure and Applications Andrew Herbert Microsoft Research, Cambridge
Advertisements

P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK
Scalable Content-Addressable Network Lintao Liu
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Slides for Chapter 10: Peer-to-Peer Systems From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Addison-Wesley.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel Proc. of the 18th IFIP/ACM.
FRIENDS: File Retrieval In a dEcentralized Network Distribution System Steven Huang, Kevin Li Computer Science and Engineering University of California,
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Spring 2003CS 4611 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
Peer-To-Peer Systems Chapter 10 B. Ramamurthy. 6/25/2015B.RamamurthyPage 2 Introduction Monolithic application Simple client-server Multi-tier client-server.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
Wide-area cooperative storage with CFS
1 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Middleware for P2P architecture Jikai Yin, Shuai Zhang, Ziwen Zhang.
Storage management and caching in PAST PRESENTED BY BASKAR RETHINASABAPATHI 1.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
Tapestry GTK Devaroy (07CS1012) Kintali Bala Kishan (07CS1024) G Rahul (07CS3009)
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Bruce Hammer, Steve Wallis, Raymond Ho
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Peer-to-Pee Computing HP Technical Report Chin-Yi Tsai.
From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, © Addison-Wesley 2012 Slides for Chapter 10: Peer-to-Peer.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Security Michael Foukarakis – 13/12/2004 A Survey of Peer-to-Peer Security Issues Dan S. Wallach Rice University,
Exercises for Chapter 10: Peer-to-Peer Systems Peer-to-Peer Systems
A Scalable Content-Addressable Network (CAN) Seminar “Peer-to-peer Information Systems” Speaker Vladimir Eske Advisor Dr. Ralf Schenkel November 2003.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Scalable Content- Addressable Networks Prepared by Kuhan Paramsothy March 5, 2007.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
Peer-to-peer systems Chapter Outline Introduction Napster and its legacy Peer-to-peer middleware Routing overlay Pastry 2.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel, Middleware 2001.
1 Distributed Hash Table CS780-3 Lecture Notes In courtesy of Heng Yin.
Pastry Antony Rowstron and Peter Druschel Presented By David Deschenes.
Slides for Chapter 10: Peer-to-Peer Systems
Peer to Peer Network Design Discovery and Routing algorithms
Peer-to-Peer Systems: An Overview Hongyu Li. Outline  Introduction  Characteristics of P2P  Algorithms  P2P Applications  Conclusion.
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Bruce Hammer, Steve Wallis, Raymond Ho
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
1 Tuesday, February 03, 2009 “Work expands to fill the time available for its completion.” - Parkinson’s 1st Law.
Peer-to-peer systems ”Sharing is caring”. Why P2P? Client-server systems limited by management and bandwidth P2P uses network resources at the edges.
Peer-to-Peer Networks 05 Pastry Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg.
Fabián E. Bustamante, Fall 2005 A brief introduction to Pastry Based on: A. Rowstron and P. Druschel, Pastry: Scalable, decentralized object location and.
Chapter 29 Peer-to-Peer Paradigm Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Distributed Web Systems Peer-to-Peer Systems Lecturer Department University.
CS 268: Lecture 22 (Peer-to-Peer Networks)
Pastry Scalable, decentralized object locations and routing for large p2p systems.
Peer-to-Peer Data Management
CHAPTER 3 Architectures for Distributed Systems
EE 122: Peer-to-Peer (P2P) Networks
Applications (2) Outline Overlay Networks Peer-to-Peer Networks.
Presentation transcript:

Peer-to-peer systems Chapter 10 1

Outline Introduction Napster and its legacy Peer-to-peer middleware Routing overlay Pastry 2

Introduction Peer-to-peer systems represent a paradigm for the construction of distributed systems and applications in which data and computational resources are contributed by many hosts on the Internet, all of which participate in the provision of a uniform service. Their emergence is a consequence of the very rapid growth of the Internet, embracing many millions of computers and similar numbers of users requiring access to shared resources. The goal of peer-to-peer systems is to enable the sharing of data and resources on a very large scale by eliminating any requirement for separately-managed servers and their associated infrastructure. 3

peer-to peer can be defined as ‘applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence’. Each type of resource sharing mentioned in that definition is already represented by distributed applications available for most types of personal computer. The purpose of this chapter is to describe some general techniques that simplify the construction of peer-to-peer applications and enhance their scalability, reliability and security. 4

What is a peer? Contrasted with Client- Server model Servers are centrally maintained and administered Client has fewer resources than a server

What is a peer? A peer’s resources are similar to the resources of the other participants P2P – peers communicating directly with other peers and sharing resources Often administered by different entities Compare with DNS

client-server vs. Peer-to-peer systems Traditional client-server systems manage and provide access to resources such as files, web pages or other information objects located on a single server computer or a small cluster of tightly coupled servers. With such centralized designs, few decisions are required about the placement of the resources or the management of server hardware resources, but the scale of the service is limited by the server hardware capacity and network connectivity. Peer-to-peer systems provide access to information resources located on computers throughout a network (whether it be the Internet or a corporate network). 7

8 jasonrundell.com

Peer-to-peer systems share these characteristics Their design ensures that each user contributes resources to the system All the nodes in a P2P system have the same functional capabilities and responsibilities. Their operation does not depend on the existence of any centrally administrated systems. They can be designed to offer a limited degree of anonymity to the providers and users of resources. A key issue for their efficient operation is the choice of an algorithm for the placement of data across many hosts and subsequent access to it to balance the load and provide availability. 9

P2P generations 1. The first generation was launched by the Napster music exchange service 2. A second generation of file-sharing application offering grater scalability, anonymity and fault tolerance quickly followed including Freenet, Gnutella, Kazaa and BitTorrent. 3. The third generation is characterized by the emergence of middleware layers for application-independent management of distributed resources on a global scale Middleware platforms:( Pastry, Tapestry, CAN, Chord and Kademlia) 10

Middleware platforms These platforms are designed to place resources (data objects, files) on a set of computers that are widely distributed throughout the Internet and to route messages to them on behalf of clients. 11

Advantages of third generation: Middleware platforms relieve clients of decisions about placing resources and holding information about the whereabouts of resources they require. They provide guarantees of delivery for requests in a bounded number of network hops. They replace replicas of resources on available host computers in a structured manner. 12

Globally unique identifiers GUIDs Resources are identified by globally unique identifiers (GUIDs. They are usually derived as a secure hash ( تجزئة امنة ) from some or all of the resource’s state. Advantages: The use of a secure hash makes a resource ‘self certifying’ which protect it against tampering by un-trusted nodes on which it may be stored. Disadvantages: This technique requires that the states of resources are immutable. 13

Peer-to-peer challenges Peer-to-peer storage systems use for objects with changing values is more challenging but can be addressed by the addition of trusted servers to manage a sequence of versions and identify the current version. The use of peer-to-peer systems for applications that demand a high level of availability for objects stored requires careful application design to avoid situations in which all of the replicas of an object are simultaneously unavailable. Solution: randomly-distributed GUID If the underlying network spans many organizations across the globe, then the risk of simultaneous unavailability is much reduced. 14

Overlay routing versus IP routing Routing overlays share many characteristics with the IP packet routing infrastructure that constitutes the primary communication mechanism of the Internet. There are several distinctions between them that are identified in Figure

IPv6 – Caen, 11 juin Overlay networks IP Overlay

Figure 10.1: Distinctions between IP and overlay routing for peer-to-peer applications 17

P2P Application Taxonomy P2P Systems Distributed Computing File Sharing Gnutella Collaboration Jabber Platforms JXTA

Distributed computation Example : The computation It is unusual, it does not involve any communication or coordination between computers while they are processing the work units. the results are communicated to a central server in a single short message that may be delivered whenever the client and server are available. 19

Distributed Computing

Napster and its legacy The first large scale peer-to-peer network was Napster, set up in 1999 to share digital music files over the Internet. While Napster maintained centralized (and replicated) indices, the music files were created and made available by individuals, usually with music copied from CDs to computer files. Napster’s architecture included centralized indexes, but users supplied the files, which were stored and accessed on their personal computers. Napster’s method of operation is illustrated by the sequence of steps shown in Figure

22 Note that in step 5 clients are expected to add their own music files to the pool of shared resources by transmitting a link to the Napster indexing service for each available file Thus the motivation for Napster and the key to its success was the making available of a large, widely distributed set of files to users throughout the Internet, by providing access to ‘shared resources at the edges of the Internet’.

Figure 10.2: Napster: peer-to-peer file sharing with a centralized, replicated index 23

IPv6 – Caen, 11 juin Centralized model (Napster) File-sharing system Almost distributed system The location of a document is centralized The "transfer" is peer-to-peer Problems Robustness Scalability (?)

Lessons Learned from Napster Napster created a network of millions of people, with thousands of files being transferred at the same time. There were quality issues. While Napster displayed link speeds to allow users to choose faster downloads, the fidelity of recordings varied widely. Since Napster users were parasites of the recording companies, there was some central control over selection of music. One benefit was that music files did not need updates. There was no guarantee of availability for a particular item of music. 25

Peer-to-peer middleware 26 Functional requirements A key problem in Peer-to-Peer applications is to provide a way for clients to access data resources quickly and dependably. Similar needs in client/server technology led to solutions like NFS. However, NFS relies on pre-configuration and is not scalable enough for peer-to-peer. Peer clients need to locate and communicate with any available resource, even though resources may be widely distributed and configuration may be dynamic, constantly adding and removing resources and connections.

Non-functional requirement 27 Global Scalability Peer-to-peer middleware must be designed to support applications that access millions of objects on hundred of thousands of hosts. Load Balancing The performance of any system designed to exploit a large number of computers depends upon the balanced distribution of workload across them. For the systems we are considering, this will be achieved by a random placement of resources together with the use of replicas of heavily used resources. Optimization for local interactions between neighbouring peers: The middleware should aim to place resources close to the nodes that access them the most.

Accommodating to highly dynamic host availability As hosts join the system, they must integrated into the system and the load must be re-distributed to exploit their new resources. When they leave the system, the system must detect their departure and re-distribute their load and resources. Security of data Trust must be built up by the use of authentication and encryption mechanisms to ensure integrity and privacy of information. 28 Non-functional requirement (con.)

Anonymity, deniability, and resistance to censorship (in some applications) Host that hold data should be able to deny responsibility for holding or supplying it. 29 Non-functional requirement (con.)

Routing overlay 30 A routing overlay is a distributed algorithm for a middleware layer responsible for routing requests from any client to a host that holds the object to which the request is addressed. Any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of theme to locate the destination object. The routing overlay ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object.

Peer-to-peer systems usually store multiple replicas of objects to ensure availability. In that case, the routing overlay maintains knowledge of the location of all the available replicas and delivers requests to the nearest ‘live’ node (i.e. one that has not failed) that has a copy of the relevant object. The GUIDs used to identify nodes and objects are an example of the ‘pure’ names.These are also known as opaque identifiers, since they reveal nothing about the locations of the objects to which they refer. 31

Figure 10.3: Distribution of information in a routing overlay 32

The main task of a routing overlay is the following: Routing of requests to objects: A client wishing to invoke an operation on an object submits a request including the object’s GUID to the routing overlay, which routes the request to a node at which a replica of the object resides. 33

The routing overlay must also perform some other tasks: 1- Insertion of objects: A node wishing to make a new object available to a peer-to-peer service computes a GUID for the object and announces it to the routing overlay, which then ensures that the object is reachable by all other clients. 2- Deletion of objects: When clients request the removal of objects from the service the routing overlay must make them unavailable. 3- Node addition and removal: Nodes (i.e., computers) may join and leave the service. When a node joins the service, the routing overlay arranges for it to assume some of the responsibilities of other nodes. When a node leaves (either voluntarily( ( طوعا or as a result of a system or network fault), its responsibilities are distributed amongst the other nodes. 34

35 Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. remove(GUID) Deletes all references to GUID and the associated data. value = get(GUID) The data associated with GUID is retrieved from one of the nodes responsible it. The DHT layer take responsibility for choosing a location for data item, storing it (with replicas to ensure availability) and providing access to it via get() operation. Routing overlay

Overlay case studies: Pastry, Pastry is the message routing infrastructure deployed in several applications including PAST [Druschel and Rowstron 2001], an archival (immutable) file storage system implemented as a distributed hash table Pastry has a straightforward but effective design that makes it a good first example for us to study in detail. 36

Pastry Pastry is a routing overlay. All the nodes and objects that can be accessed through Pastry are assigned 128-bit GUIDs. In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(logN) steps. If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise, the message is delivered to the active node whose GUID is numerically closest to it (the closeness referred to here is in an entirely artificial space- the space of GUIDs) 37

Pastry In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps. If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise, the message is delivered to the active node whose GUID is numerically closest to it. Active nodes take responsibility for processing requests addressed to all objects in their numerical neighbourhood. Routing steps involve the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is ‘closer’ to its destination. The GUID space is treated as circular: GUID 0’s lower neighbor is

The real transport of a message across the Internet between two Pastry nodes may require a substantial number of IP hops. To minimize the risk of unnecessarily extended transport paths, Pastry uses a locality metric based on network distance in the underlying network (such as a hop counts or round- trip latency measurements) to select appropriate neighbours when setting up the routing tables used at each node. 39

Pastry- Routing algorithm The full routing algorithm involves the use of a routing table at each node to route messages efficiently, but for the purposes of explanation, we describe the routing algorithm in two stages: The first stage decribes a simplified form of the algorithm which routes messages correctly but inefficiently without a routing table The second stage describes the full routing algorithm which routes a request to any node in O(logN) messages. 40

Pastry- Routing algorithm Stage 1: Any node A that recieves a message M with destination address D routes the message by comparing D with its own GUID A and with each of the GUIDs in its leaf set and forwarding M to the node amongst them that is numerically closet to D At each step M is forwarded to node that is closer to D than the current node and that this process will eventually deliver M to the active node closer to D Very inefficient, requiring ~N/2l hops to deliver a message in a network with N nodes 41

Pastry- Routing algorithm The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l=4) 42

Pastry- Routing algorithm Stage 2: Each Pastry node maintains a routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of possible GUID values The routing table is structured as follows: GUIDs are viewed as hexadecimal values and the table classifies GUIDs based on their hexadecimal prefixes The table has as many rows as there are hexadecimal digits in a GUID, so for the prototype Pastry system that we are describing, there are 128/4 = 32 rows Any row n contains 15 entries – one for each possible value of the n th hexadecimal digit excluding the value in the local node’s GUID. Each entry in the table points to one of the potentially many nodes whose GUIDs have the relevant prefix 43

Pastry- Routing algorithm Stage 2 (cont.): The routing table is located at the node whose GUID begins 65A1 44

Pastry- Routing algorithm Stage 2 (cont.): To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table) 1. If (L -l < D < L l ) { //the destination is within the leaf set or is the current node 2. Forward M to the element L i of the leaf set with GUID closest to D or the current node A 3. } else { // use the routing table to despatch M to a node with the closer GUID 4. Find p (the length of the longest common prefix of D and A), and i (the (p+1) th hexadecimal digit of D) 5. If (R[p,i]  null) forward M to R[p,i] //route M to a node with a longer common prefix 6. else { //there is no entry in the routing table 7. Forward M to any node in L and R with a common prefix of length i, but a GUID that is numerically closer. 8. } 9. } 45