1 Peer-to-Peer Systems. Students: Lê Thành Nguyên (00707174), Võ Lê Quy Nhơn (00707176).



2 Peer-to-Peer  An alternative to the client/server model of distributed computing is the peer-to-peer model.  Client/server is inherently hierarchical, with resources centralized on a limited number of servers.  In peer-to-peer networks, both resources and control are widely distributed among nodes that are theoretically equals. (A node with more information, better information, or more power may be “more equal,” but that is a function of the node, not the network controllers.)

3 Decentralization  A key feature of peer-to-peer networks is decentralization. This has many implications. Robustness, availability of information, and fault tolerance tend to come from redundancy and shared responsibility rather than from the planning, organization, and investment of a controlling authority.  On the Web, both content providers and gateways try to profit by controlling information access. Access control is more difficult in peer-to-peer networks, although Napster depended on a central index.

4 Technology Transition The Client/Server Model The Peer-to-Peer Model

5 Classification
  Pure P2P vs. hybrid (servers keep info)
  Centralized: Napster
  Decentralized: KaZaA
  Structured: CAN
  Unstructured: Gnutella
  Hybrid: JXTA

6 Applications outside Computer Science  Bioinformatics  Education and academic  Military  Business  Television  Telecommunication

7 Why Peer-to-Peer Networking?  The Internet has three valuable fundamental assets (information, bandwidth, and computing resources), all of which are vastly underutilized, partly due to the traditional client-server computing model.  Information: hard to find, impossible to catalog and index  Bandwidth: hot links get hotter, cold ones stay cold  Computing resources: heavily loaded nodes get overloaded, idle nodes remain idle

8 Information Gathering  The world produces two exabytes of information (2×10^18 bytes) every year, out of which  The world publishes 300 terabytes of information (3×10^14 bytes) every year  Google searches 1.3×10^9 pages of data  Data beyond web servers  Transient information  Hence, finding useful information in real time is increasingly difficult.

9 Bandwidth Utilization  A single fiber’s bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months  Traffic is still congested  More devices and people on the net  More volume of data to move around the same destinations (eBay, Yahoo, etc.)
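As a quick sanity check on the fiber figures above, the two numbers on the slide (a factor of 10^6 and a 16-month doubling period) can be related with a few lines of arithmetic. This is only an illustrative calculation; the variable names are ours, not the presenters'.

```python
import math

GROWTH_FACTOR = 10**6          # total growth quoted on the slide
DOUBLING_PERIOD_MONTHS = 16    # doubling period quoted on the slide

# Number of doublings needed to reach the growth factor: log2(10^6)
doublings = math.log2(GROWTH_FACTOR)

# Time those doublings take, converted from months to years
years = doublings * DOUBLING_PERIOD_MONTHS / 12

print(f"{doublings:.1f} doublings, about {years:.1f} years")
```

Roughly twenty doublings at sixteen months each come to a little under twenty-seven years, which is consistent with growth counted from 1975 up to the early 2000s.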

10 Computing Resources  Moore’s Law: processor speed doubles every 18 months  Computing devices (server, PC, PDA, cellphone) are more powerful than ever  Storage capacity has increased dramatically  Computation still accumulates around data centers

11 Benefits from P2P
  Theory
    Dynamic discovery of information
    Better utilization of bandwidth, processor, storage, and other resources
    Each user contributes resources to the network
  Practice examples
    Sharing browser cache over 100 Mbps lines
    Disk mirroring using spare capacity
    Deep search beyond the web

12 Figure 10.1: IP and overlay routing for peer-to-peer

13 Distributed Computation  Only a small portion of the CPU cycles of most computers is utilized. Most computers are idle for the greatest portion of the day, and many of the ones in use spend the majority of their time waiting for input or a response.  A number of projects have attempted to use these idle CPU cycles. The best known is the SETI@home project, but other projects, including code breaking, have used idle CPU cycles on distributed machines.

14 Discussion Question: Computer or Infomachine?  The first computers were used primarily for computations. One early use was calculating ballistic tables for the U.S. Navy during World War II.  Today, computers are used more for sharing information than computations—perhaps infomachine may be a more accurate name than computer?  Distributed computation may be better suited to peer-to-peer systems while information tends to be hierarchical and may be better suited to client/server.  NJIT has both Computer Science and Information Systems departments.

15 Current Peer-to-Peer Concerns  Topics listed in the IEEE 7th annual conference:

16 Dangers and Attacks on P2P  Poisoning (files whose contents differ from their description)  Polluting (inserting bad packets into the files)  Defection (users use the service without sharing)  Insertion of viruses (attached to other files)  Malware (originally attached to the files)  Denial of Service (slowing down or stopping the network traffic)  Filtering (some networks don’t allow P2P traffic)  Identity attacks (tracking down users and disturbing them)  Spam (sending unsolicited information)

17 The SETI@home project  The SETI (Search for Extraterrestrial Intelligence) project looks for patterns in radio frequency emissions received from radio telescopes that suggest intelligence. This is done by partitioning the data received into chunks and sending each chunk to several different computers owned by SETI volunteers for analysis.  Link:

Children of SETI@home  In 2002, David Anderson, the director of SETI@home, launched the Berkeley Open Infrastructure for Network Computing (BOINC).  There are currently over 40 BOINC projects running to share spare computation on idle CPUs. You can see some of the projects at

 As of September 2007, the most powerful distributed computing network on Earth is a project to simulate protein folding which can run on Sony PlayStation 3 game consoles. At that time, the network reached a capacity of one petaflop (one quadrillion floating-point operations per second) on a network of 40,000 game consoles. See

20 Napster  The first large scale peer-to-peer network was Napster, set up in 1999 to share digital music files over the Internet. While Napster maintained centralized (and replicated) indices, the music files were created and made available by individuals, usually with music copied from CDs to computer files. Music content owners sued Napster for copyright violations and succeeded in shutting down the service. Figure 10.2 documents the process of requesting a music file from Napster.

21 Figure 10.2: Napster: peer-to-peer file sharing

22 Napster: Lessons Learned  Napster created a network of millions of people, with thousands of files being transferred at the same time.  There were quality issues. While Napster displayed link speeds to allow users to choose faster downloads, the fidelity of recordings varied widely.  Since Napster users were parasites of the recording companies, there was some central control over selection of music. One benefit was that music files did not need updates.  There was no guarantee of availability for a particular item of music.

23 Middleware for Peer-to-Peer  A key problem in Peer-to-Peer applications is to provide a way for clients to access data resources efficiently. Similar needs in client/server technology led to solutions like NFS. However, NFS relies on pre-configuration and is not scalable enough for peer-to-peer.  Peer clients need to locate and communicate with any available resource, even though resources may be widely distributed and configuration may be dynamic, constantly adding and removing resources and connections.

24 Non-Functional Requirements for Peer-to-Peer Middleware  Global Scalability  Load Balancing  Local Optimization  Adjusting to dynamic host availability  Security of data  Anonymity, deniability, and resistance to censorship (in some applications)

25 Routing Overlays  A routing overlay is a distributed algorithm for a middleware layer responsible for routing requests from any client to a host that holds the object to which the request is addressed.  Any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object.  Globally unique identifiers (GUIDs), also known as opaque identifiers, are used as names but do not contain location information.  A client wishing to invoke an operation on an object submits a request including the object’s GUID to the routing overlay, which routes the request to a node at which a replica of the object resides.

26 Figure 10.3: Distribution of information in a routing overlay

Routing Overlays
Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry:
  put(GUID, data)
    The data is stored in replicas at all nodes responsible for the object identified by GUID.
  remove(GUID)
    Deletes all references to GUID and the associated data.
  value = get(GUID)
    The data associated with GUID is retrieved from one of the nodes responsible for it.
The DHT layer takes responsibility for choosing a location for a data item, storing it (with replicas to ensure availability), and providing access to it via the get() operation.
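To make the put/get/remove interface above concrete, here is a minimal in-memory sketch. It is not the PAST implementation: the class name ToyDHT, the helper guid_of, the replica count, the use of a truncated SHA-1 digest as a 128-bit GUID, and the rule "the responsible nodes are the ones numerically closest to the GUID" are all assumptions made for illustration, and every node is simply a dictionary in one process.

```python
import hashlib


def guid_of(data: bytes) -> int:
    """Hypothetical GUID: the first 128 bits of the SHA-1 digest of the data."""
    return int.from_bytes(hashlib.sha1(data).digest()[:16], "big")


class ToyDHT:
    """Single-process stand-in for a DHT; each node is just a local dict."""

    def __init__(self, node_ids, replicas=2):
        self.nodes = {nid: {} for nid in node_ids}  # node GUID -> local store
        self.replicas = replicas

    def _responsible(self, guid):
        # Assumption: the responsible nodes are the `replicas` nodes whose
        # GUIDs are numerically closest to the object GUID (no wrap-around).
        return sorted(self.nodes, key=lambda nid: abs(nid - guid))[: self.replicas]

    def put(self, guid, data):
        # Store a replica at every responsible node.
        for nid in self._responsible(guid):
            self.nodes[nid][guid] = data

    def get(self, guid):
        # Retrieve the data from any one responsible node that holds it.
        for nid in self._responsible(guid):
            if guid in self.nodes[nid]:
                return self.nodes[nid][guid]
        return None

    def remove(self, guid):
        # Delete every reference to the GUID and its data.
        for store in self.nodes.values():
            store.pop(guid, None)


dht = ToyDHT([10, 200, 3000, 40000])
key = guid_of(b"hello")
dht.put(key, b"hello")
print(dht.get(key))
dht.remove(key)
print(dht.get(key))
```

The caller never names a node: it only supplies a GUID, and the layer decides where the replicas live, which is the point of the DHT abstraction.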

Routing Overlays
Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry:
  publish(GUID)
    GUID can be computed from the object. This function makes the node performing a publish operation the host for the object corresponding to GUID.
  unpublish(GUID)
    Makes the object corresponding to GUID inaccessible.
  sendToObj(msg, GUID, [n])
    Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object’s state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
Objects can be stored anywhere, and the DOLR layer is responsible for maintaining a mapping between GUIDs and the addresses of the nodes at which replicas of the objects are located.
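The contrast with the DHT interface is that a DOLR layer stores only the mapping from GUIDs to hosts, while the objects stay wherever their hosts put them. The sketch below illustrates that idea in a single process. It is not Tapestry's API: the class name ToyDOLR is hypothetical, the host is passed explicitly (in the interface above the publishing node is implicit), and unpublish here withdraws one host's replica, which is a simplifying assumption.

```python
class ToyDOLR:
    """Single-process illustration of a GUID -> replica-host mapping."""

    def __init__(self):
        self.locations = {}  # GUID -> set of host addresses holding a replica
        self.inboxes = {}    # host address -> list of (GUID, message) delivered

    def publish(self, guid, host):
        # The publishing host becomes a host for the object with this GUID.
        self.locations.setdefault(guid, set()).add(host)
        self.inboxes.setdefault(host, [])

    def unpublish(self, guid, host):
        # Assumption: withdraws this host's replica only.
        self.locations.get(guid, set()).discard(host)

    def send_to_obj(self, msg, guid, n=1):
        # Deliver the message to up to n replicas; return the hosts reached.
        hosts = sorted(self.locations.get(guid, ()))[:n]
        for host in hosts:
            self.inboxes[host].append((guid, msg))
        return hosts


dolr = ToyDOLR()
dolr.publish(0x4378, "4228")
dolr.publish(0x4378, "AA93")
print(dolr.send_to_obj("open", 0x4378, n=2))
```

A real DOLR spreads the mapping across the overlay instead of keeping it in one dictionary; the sketch only shows what the three operations mean to a caller.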

Pastry  All the nodes and objects that can be accessed through Pastry are assigned 128-bit GUIDs.  In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps.  If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise, the message is delivered to the active node whose GUID is numerically closest to it. (The closeness referred to here is in an entirely artificial space: the space of GUIDs.)
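The delivery rule above can be written down directly. The sketch below is only an illustration of the rule, not of Pastry's routing: it scans a complete set of active nodes, which no real node has. The function names are hypothetical, distance is measured around a circular 128-bit space (the treatment described for the leaf set), and ties between two equally close nodes are not handled specially.

```python
GUID_BITS = 128
GUID_SPACE = 1 << GUID_BITS  # number of distinct 128-bit GUIDs


def circular_distance(a, b):
    """Shortest distance between two GUIDs on the circular GUID space."""
    d = (a - b) % GUID_SPACE
    return min(d, GUID_SPACE - d)


def delivery_node(dest, active_nodes):
    """Node that should finally receive a message addressed to `dest`."""
    if dest in active_nodes:
        return dest  # the GUID names an active node
    # Otherwise pick the active node numerically closest to dest.
    return min(active_nodes, key=lambda n: circular_distance(n, dest))


nodes = {5, 100, GUID_SPACE - 3}
print(delivery_node(100, nodes))
print(delivery_node(60, nodes))
```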

Pastry  When new nodes join the overlay, they obtain the data needed to construct a routing table and other required state from existing members in O(log N) messages, where N is the number of hosts participating in the overlay.  In the event of a node failure or departure, the remaining nodes can detect its absence and cooperatively reconfigure to reflect the required changes in the routing structure in a similar number of messages.  Each active node stores a leaf set: a vector L (of size 2l) containing the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own (l above and l below).  The GUID space is treated as circular: GUID 0’s lower neighbor is

Pastry- Routing algorithm  The full routing algorithm involves the use of a routing table at each node to route messages efficiently, but for the purposes of explanation, we describe the routing algorithm in two stages:  The first stage describes a simplified form of the algorithm, which routes messages correctly but inefficiently without a routing table.  The second stage describes the full routing algorithm, which routes a request to any node in O(log N) messages.

Pastry- Routing algorithm Stage 1:  Any node A that receives a message M with destination address D routes the message by comparing D with its own GUID A and with each of the GUIDs in its leaf set, then forwarding M to the node among them that is numerically closest to D.  At each step, M is forwarded to a node that is closer to D than the current node, so the process eventually delivers M to the active node closest to D.  Very inefficient, requiring ~N/2l hops to deliver a message in a network with N nodes

Pastry- Routing algorithm The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l=4)
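Stage 1 is simple enough to simulate. The sketch below builds a ring of nodes with random 128-bit GUIDs, gives each node a leaf set of its l neighbours on either side, and forwards a message greedily until no known node is closer to the destination. The function names and the random test network are assumptions made for illustration; only the forwarding rule comes from the slides.

```python
import random

GUID_SPACE = 1 << 128


def circular_distance(a, b):
    """Shortest distance between two GUIDs on the circular GUID space."""
    d = (a - b) % GUID_SPACE
    return min(d, GUID_SPACE - d)


def leaf_set(nodes, index, l):
    """GUIDs of the l nodes on either side of nodes[index] on the sorted ring."""
    n = len(nodes)
    return [nodes[(index + k) % n] for k in range(-l, l + 1) if k != 0]


def route_stage1(nodes, start, dest, l):
    """Hops taken from `start` towards `dest` using leaf sets only."""
    position = {guid: i for i, guid in enumerate(nodes)}
    current, path = start, [start]
    while True:
        # The current node is listed first so that it wins ties and the
        # message never bounces between two equally close nodes.
        candidates = [current] + leaf_set(nodes, position[current], l)
        best = min(candidates, key=lambda g: circular_distance(g, dest))
        if best == current:
            return path  # no known node is closer: deliver here
        current = best
        path.append(current)


random.seed(7)
nodes = sorted({random.getrandbits(128) for _ in range(1000)})
dest = random.getrandbits(128)
path = route_stage1(nodes, nodes[0], dest, l=4)
print(len(path) - 1, "hops; N/2l =", len(nodes) // (2 * 4))
```

Each hop can advance at most l positions around the ring, which is why the hop count grows linearly with N and why the routing table of Stage 2 is needed.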

Pastry- Routing algorithm Stage 2:  Each Pastry node maintains a routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of possible GUID values.  The routing table is structured as follows: GUIDs are viewed as hexadecimal values, and the table classifies GUIDs based on their hexadecimal prefixes.  The table has as many rows as there are hexadecimal digits in a GUID, so for the prototype Pastry system that we are describing, there are 128/4 = 32 rows.  Any row n contains 15 entries, one for each possible value of the nth hexadecimal digit excluding the value in the local node’s GUID. Each entry in the table points to one of the potentially many nodes whose GUIDs have the relevant prefix.

Pastry- Routing algorithm Stage 2 (cont.): The routing table is located at the node whose GUID begins 65A1
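The prefix classification can be illustrated by computing where another node's GUID would be filed in the local routing table. This is a sketch under stated assumptions: GUIDs are written as hexadecimal strings (shortened to six digits in the example, as in the slides' diagrams), rows are numbered from 0 by the length of the shared prefix, and the function names are hypothetical.

```python
HEX_DIGITS = 128 // 4      # 32 rows for 128-bit GUIDs
ENTRIES_PER_ROW = 16 - 1   # every hex digit value except the local node's own


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hexadecimal digits that two GUID strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def table_slot(local: str, other: str):
    """(row, column) at which `other` is filed in `local`'s routing table."""
    p = shared_prefix_len(local, other)
    if p == len(local):
        return None  # identical GUID: the local node itself has no slot
    return p, int(other[p], 16)  # row = shared prefix length, column = next digit


print(table_slot("65A1FC", "D46A1C"))  # no shared prefix: row 0
print(table_slot("65A1FC", "65A2FF"))  # shares "65A": row 3
```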

Pastry- Routing algorithm Stage 2 (cont.): To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table):
  If (L_{-l} < D < L_{l}) { // the destination is within the leaf set or is the current node
    Forward M to the element L_i of the leaf set with GUID closest to D, or to the current node A
  } else { // use the routing table to despatch M to a node with a closer GUID
    Find p (the length of the longest common prefix of D and A) and i (the (p+1)th hexadecimal digit of D)
    If (R[p,i] ≠ null) forward M to R[p,i] // route M to a node with a longer common prefix
    else { // there is no entry in the routing table
      Forward M to any node in L or R with a common prefix of length p, but a GUID that is numerically closer
    }
  }
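One way to read the algorithm above is as a function that picks the next hop for a single message. The sketch below follows the three cases in order. It makes several simplifying assumptions that are not in the slides: GUIDs are equal-length hexadecimal strings, the routing table is a dictionary keyed by (p, i), the leaf-set bounds are treated as inclusive, wrap-around of the GUID space is ignored, and the function and variable names are hypothetical.

```python
def value(guid: str) -> int:
    return int(guid, 16)


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(local, dest, leaf_set, table):
    """Choose where the node `local` forwards a message addressed to `dest`."""
    def dist(guid):
        return abs(value(guid) - value(dest))  # simplification: no wrap-around

    if dest == local:
        return local
    lowest = min(leaf_set, key=value)
    highest = max(leaf_set, key=value)
    if value(lowest) <= value(dest) <= value(highest):
        # Case 1: the destination lies within the leaf set's range.
        return min([local] + leaf_set, key=dist)
    # Case 2: use the routing table entry for the next differing digit.
    p = shared_prefix_len(local, dest)
    i = int(dest[p], 16)
    entry = table.get((p, i))
    if entry is not None:
        return entry
    # Case 3: no entry; fall back to any known node that shares at least
    # the same prefix and is numerically closer than the local node.
    known = leaf_set + list(table.values())
    closer = [g for g in known
              if shared_prefix_len(g, dest) >= p and dist(g) < dist(local)]
    return min(closer, key=dist) if closer else local


leaves = ["65A0FF", "65A1F0", "65A200", "65A2FF"]
table = {(0, 13): "D13DA3", (3, 2): "65A212"}
print(next_hop("65A1FC", "D46A1C", leaves, table))  # routing table, row 0
print(next_hop("65A1FC", "65A1F1", leaves, table))  # within the leaf set
```

Each use of the routing table lengthens the shared prefix by at least one digit, which is where the O(log N) bound on the number of hops comes from.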

Tapestry  Tapestry is another peer-to-peer model similar to Pastry. It hides a distributed hash table from applications behind a distributed object location and routing (DOLR) interface to make replicated copies of objects more accessible by allowing multiple entries in the routing structure.  Identifiers are either NodeIds, which refer to computers that perform routing actions, or GUIDs, which refer to the objects.  For any resource with GUID G, there is a unique root node with GUID R_G that is numerically closest to G.  Hosts H holding replicas of G periodically invoke publish(G) to ensure that newly arrived hosts become aware of the existence of G. On each invocation, a publish message is routed from the invoker towards node R_G.

Tapestry [Figure: Tapestry routing structure, showing two copies of Phil’s Books (4378), the root for 4378, and a legend: publish path; Tapestry routings for 4377; location mapping for 4378; routes actually taken by send(4378).] Replicas of the file Phil’s Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show routes followed by the publish messages laying down cached location mappings for object 4378. The location mappings are subsequently used to route messages sent to 4378.
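The effect of the publish paths can be sketched as follows: each publish message leaves a location mapping at every node it passes on its way to the root, and a later message for the object is redirected as soon as it meets one of those mappings. This is an illustration only. The function names and the intermediate hops "hopA" and "hopB" are made up, and the paths are supplied by hand where Tapestry would compute them by routing.

```python
def publish(path_to_root, guid, host, mappings):
    """Record guid -> host at every node the publish message passes through."""
    for node in path_to_root:
        mappings.setdefault(node, {}).setdefault(guid, set()).add(host)


def locate(path_to_root, guid, mappings):
    """Follow a message towards the root; stop at the first cached mapping."""
    for node in path_to_root:
        hosts = mappings.get(node, {}).get(guid)
        if hosts:
            return node, hosts
    return None


mappings = {}
# Hosts 4228 and AA93 publish object 4378 towards its root node 4377.
publish(["4228", "hopA", "4377"], "4378", "4228", mappings)
publish(["AA93", "hopB", "4377"], "4378", "AA93", mappings)

# A message whose route to the root crosses hopB is redirected there.
print(locate(["client", "hopB", "4377"], "4378", mappings))
```

A message whose route shares no hop with a publish path still finds the object, because the root holds a mapping for every replica.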

Squirrel web cache  The node whose GUID is numerically closest to the GUID of an object becomes that object’s home node, responsible for holding any cached copy of the object.  If a fresh copy of a required object is not in the local cache, Squirrel routes a Get request via Pastry to the home node.  If the home node has a fresh copy, it responds directly to the client, either with a not-modified message or with a fresh copy of the object, as appropriate.  If the home node has a stale copy or no copy of the object, it issues a Get to the origin server. The origin server may respond with a not-modified message or a copy of the object.
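The lookup order described above (local cache, then home node, then origin server) can be sketched in a few lines. This is a simplified model, not Squirrel's code: caches are plain dictionaries mapping a GUID to a (body, expiry time) pair, freshness is an expiry check, not-modified responses are left out, and the home node's cache is passed in directly where Squirrel would locate it through Pastry. All names are hypothetical.

```python
def is_fresh(entry, now):
    """An entry is a (body, expires_at) pair; None means no copy is held."""
    return entry is not None and entry[1] > now


def squirrel_get(guid, local_cache, home_cache, origin_fetch, now):
    """Return (body, source) for one request from a client node."""
    entry = local_cache.get(guid)
    if is_fresh(entry, now):
        return entry[0], "local cache"

    # No fresh local copy: ask the object's home node.
    entry = home_cache.get(guid)
    source = "home node"
    if not is_fresh(entry, now):
        # The home node has a stale copy or none: it asks the origin server.
        entry = origin_fetch(guid)
        home_cache[guid] = entry
        source = "origin server"

    local_cache[guid] = entry
    return entry[0], source


def origin(guid):
    return ("page body", 100)  # hypothetical response that expires at t=100


client_a, client_b, home = {}, {}, {}
print(squirrel_get(1, client_a, home, origin, now=0))
print(squirrel_get(1, client_b, home, origin, now=0))
print(squirrel_get(1, client_a, home, origin, now=1))
```

The second client is served by the home node without touching the origin server, which is the source of the external bandwidth saving.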

Squirrel web cache Origin server Home node

Squirrel web cache Evaluation  The reduction in total external bandwidth used: with each client contributing 100 MB of disk storage, the hit ratio was 28% (36,000 active clients in Redmond) and 37% (105 active clients in Cambridge).  The latency perceived by users for access to web objects: local transfers take only a few milliseconds, whereas transfers across the Internet require ms, so the latency for access to objects found in the cache is swamped by the much greater latency of access to objects not found in the cache.  The computational and storage load imposed on client nodes: the average number of cache requests served for other nodes by each node over the whole period was low, at only 0.31 per minute.