Peer to Peer Networks By Cathy Chen CMSC 621, Fall 2007.

Slides:



Advertisements
Similar presentations
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Advertisements

Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan Presented.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion StoicaRobert Morris David Liben-NowellDavid R. Karger M. Frans KaashoekFrank.
Denial-of-Service Resilience in Peer-to-Peer Systems D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica and W. Zwaenepoel Presenter: Yan Gao.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Peer-to-Peer Networks as a Distribution and Publishing Model Jorn De Boever (june 14, 2007)
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
1 Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Robert Morris Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan.
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
CS522: Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Structured P2P Network Group14: Qiwei Zhang; Shi Yan; Dawei Ouyang; Boyu Sun.
CSE 461 University of Washington1 Topic Peer-to-peer content delivery – Runs without dedicated infrastructure – BitTorrent as an example Peer.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Information-Centric Networks05b-1 Week 5 / Paper 2 A survey of peer-to-peer content distribution technologies –Stephanos Androutsellis-Theotokis, Diomidis.
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Introduction of P2P systems
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Jonathan Walpole CSE515 - Distributed Computing Systems 1 Teaching Assistant for CSE515 Rahul Dubey.
GNUTELLA PEER-TO-PEER NETWORKING. GNUTELLA n What is Gnutella n Relation to the World Wide Web n How it Works n Sites / Links / Information.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
Vincent Matossian September 21st 2001 ECE 579 An Overview of Decentralized Discovery mechanisms.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
The Start Shawn Fanning (19-yr-old student nicknamed Napster) developed the original Napster application and service in January 1999 while a freshman.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
Peer-to-Peer Network Tzu-Wei Kuo. Outline What is Peer-to-Peer(P2P)? P2P Architecture Applications Advantages and Weaknesses Security Controversy.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
FastTrack Network & Applications (KaZaA & Morpheus)
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Peer to Peer Network Design Discovery and Routing algorithms
Information-Centric Networks Section # 5.2: Content Distribution Instructor: George Xylomenos Department: Informatics.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
CS 347Notes081 CS 347: Parallel and Distributed Data Management Notes 08: P2P Systems.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
Malugo – a scalable peer-to-peer storage system..
Peer-to-Peer File Sharing Systems Group Meeting Speaker: Dr. Xiaowen Chu April 2, 2004 Centre for E-transformation Research Department of Computer Science.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Composing Web Services and P2P Infrastructure. PRESENTATION FLOW Related Works Paper Idea Our Project Infrastructure.
Peer-to-Peer Information Systems Week 12: Naming
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M
EE 122: Peer-to-Peer (P2P) Networks
MIT LCS Proceedings of the 2001 ACM SIGCOMM Conference
Peer-to-Peer Information Systems Week 12: Naming
#02 Peer to Peer Networking
Presentation transcript:

Peer to Peer Networks By Cathy Chen CMSC 621, Fall 2007

Outline What is P2P Network? Overlay Network centralization Purely Decentralized Architectures Partially Centralized Architectures Hybrid Decentralized Architectures Network Structure Unstructured Structured Unstructured Architectures Hybrid Decentralized Purely Decentralized Partially Centralized Structured Architectures Freenet - A Loosely Structured System Chord Can (Content Addressable Network) Tapestry

CMSC 621, Fall 2007 What is P2P Network? A distributed network composed of a large number of distributed, heterogeneous, autonomous, and highly dynamic peers in which participants share a part of their own resources such as processing power, storage capacity, software, and file contents. The participants in the P2P network can act as a server and a client at the same time. They are accessible by other nodes directly, without passing intermediary entities.

CMSC 621, Fall 2007 P2P Network

CMSC 621, Fall 2007 Peer-to-peer Architectures Although in purest form peer-to-peer overlay networks are supposed to be totally decentralized, in practice this is not always true, and systems with various degrees of centralization are encountered. They are identified into the following three categories: Purely Decentralized architectures Hybrid decentralized architectures Partially Centralized architectures

CMSC 621, Fall 2007 Purely Decentralized Architectures All nodes in the networks perform exactly the same tasks, acting both as servers and clients No central coordination of their activities Ex. Gnutella

CMSC 621, Fall 2007 Purely Decentralized Architectures S1F1.dat S1F2.dat S1F3.dat S2f1.dat S5F1.dat S5F2.txt S5F3.dat S4F1.dat S4F2.dat S4F3.dat S3F1.dat S3F2.dat S3F3.dat S1F1.dat S1F2.dat S1F3.dat S2F1.dat S2F2.dat S2F3.dat S3F1.dat S3F2.dat S3F3.dat S4F1.dat S4F2.dat S4F3.dat S5F1.dat S5F2.dat S5F3.dat Server5 S1F1.dat S1F2.dat S1F3.dat S2F1.dat S2F2.dat S2F3.dat S3F1.dat S3F2.dat S3F3.dat S4F1.dat S4F2.dat S4F3.dat Files available before Server5 joined Files available after Server5 joined joining the network

CMSC 621, Fall 2007 Gnutella : Pure Decentralized System peer-to-peer networking: applications connect to peer applications focus: decentralized method of searching for files each application instance serves to: store selected files route queries (file searches) from and to its neighboring peers respond to queries (serve file) if file stored locally How it works: Searching by flooding: If you don’t have the file you want, query 7 of your partners. If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10. Requests are flooded, but there is no tree structure. No looping but packets may be received twice.

CMSC 621, Fall 2007 Advantages and Disadvantages of Purely Decentralized Systems Advantages Inherently fault-tolerant, since there is no central point of failure and the loss of a peer or even a number of peers can easily be compensated. Have a greater degree of autonomous control over their data and resources Disadvantages Slow discovery time No guarantee about quality of services

CMSC 621, Fall 2007 Hybrid Decentralized Architectures A central directory server maintains an index of the metadata for all files in the network and a table of registered user connection information (IP address, connection bandwidth, et.) Each client who wish to join the network has to contact the central server and report the files it maintains

CMSC 621, Fall 2007 Hybrid Decentralized Architectures Client computers send requests for files to the server. The server searches for matches in its index, returning a list of users that hold the matching file. The user then opens direct connections with one or more of the peers that hold the requested file, and downloads it Ex. Napster

CMSC 621, Fall 2007 Napster Application-level, client-server protocol over point to- point TCP, centralized system Retrieval: four steps Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list with. Select “best” of correct answers. (pings) centralized server: single logical point of failure, can load balance among servers using DNS rotation, potential for congestion no security: passwords in plain text, no authentication, no anonymity

CMSC 621, Fall 2007 Napster: How it works?(1) napster.com users File list is uploaded 1.

CMSC 621, Fall 2007 Napster: How it works?(2) napster.com user Request and results User requests search at server. 2.

CMSC 621, Fall 2007 Napster: How it works?(3) napster.com user pings User pings hosts that apparently have data. Looks for best transfer rate. 3.

CMSC 621, Fall 2007 Napster: How it works?(4) napster.com user Retrieves file User retrieves file 4.

CMSC 621, Fall 2007 Advantages and Disadvantages of Hybrid Decentralized Systems Advantages Easy to implement Locate files quickly and efficiently Disadvantages Vulnerable to censorship, legal action, surveillance, malicious attack, and technical failure Considered inherently unscalable as they are bound to be limitations to the size of the server database and its capacity to respond to queries.

CMSC 621, Fall 2007 Partially Centralized Architectures use the concept of supernodes: nodes that are dynamically assigned the task of servicing a small subpart of the peer network by indexing and caching files contained therein. Peers are automatically elected to become supernodes if they have sufficient bandwidth and processing power. Supernodes index the files shared by peers connected to them, and proxy search requests on behalf of these peers. All queries are therefore initially directed to supernodes.

CMSC 621, Fall 2007 Partially Centralized Systems Kazaa and Morpheus are two similar partially centralized systems which use the concept of “SuperNodes”. In Morpheus a central server provides new peers with a list of one or more Supernodes with which they can connect. SuperNodes index the files shared by peers connected to them, and proxy search requestes on behalf of these peers. Queries are therefore sent to SuperNodes, not to other peers. Recent version of the Gnutella protocol. A mechanism for dynamically selecting supernodes organizes the Gnutella network into an interconnection of superpeers (as they are referred to) and client nodes.When a node with enough CPU power joins the network, it immediately becomes a superpeer and establishes connections with other superpeers, forming a flat unstructured network of superpeers. If it establishes a minimum required number of connections to client nodes within a specified time, it remains a superpeer. Otherwise it turns into a regular client node.

CMSC 621, Fall 2007 Advantages of Partially Decentralized Systems Advantages In comparison with purely decentralized systems, discovery time and the traffic on messages exchanging between nodes are reduced In comparison with Hybrid Decentralized Systems, workload on central server are reduced, but with slower information discovery The inherent heterogeneity of peer-to-peer networks is taken advantage of and exploited no unique point of failure If one or more supernodes go down, the nodes connected to them can open new connections with other supernodes, and the network will continue to operate. Peers can become supernodes themselves even when all supernodes go down

CMSC 621, Fall 2007 Discovery Mechanisms for P2P systems Distributed peer-to-peer systems often require a discovery mechanism to locate specific data within the system. P2P systems have evolved from first generation centralized structures to second generation flooding-based and then third generation systems based on distributed hash tables: Centralized indexes and repositories Flooding broadcast of queries Routing Model

CMSC 621, Fall 2007 Centralized Indexes and Repositories Used in hybrid system peers of the community connect to a centralized directory servers, which store all information regarding location and usage of resources A central directory server maintains: an index with meta data (file name, time of creation etc.) of all files in the network a table of registered user connection information (IP addresses, connection speeds etc.) a table listing the files that each user holds and shares in the network

CMSC 621, Fall 2007 Flooding broadcast of queries Simple and robust No state maintenance needed High tolerance to node failures Effective and of low latency Always find the shortest/fastest routing paths Fundamental Operation for Broadcasting in distributed systems P2P communications

CMSC 621, Fall 2007 Problems of Flooding Loops in Gnutella networks Caused by redundant links Result in endless message routing Current solution by Gnutella Detect and discard redundant messages Limit TTL(time-to-live) of messages Unnecessary traffic is till too much The redundant links are still there

CMSC 621, Fall 2007 Routing Model Adds structure to the way information about resources are stored using distributed hash table Provides a mapping between the resource identifier and location Reduces the number of p2p hops that must be taken to locate a resource

CMSC 621, Fall 2007 Chord Provides peer-to-peer hash lookup service: Lookup(key)  IP address Chord does not store the data Efficient: O(Log N) messages per lookup N is the total number of servers Scalable: O(Log N) state per node Robust: survives massive changes in membership

CMSC 621, Fall 2007 Chord: Lookup Mechanism N32 N10 N5 N20 N110 N99 N80 N60 Lookup(K19) K19

CMSC 621, Fall 2007 Freenet The query is forwarded from node to node using the routing table, until it reaches the node which has the requested data. The replay is passed back to the original node following the reverse path.

CMSC 621, Fall 2007 Pastry Completely decentralized, scalable, and self- organizing; it automatically adapts to the arrival, departure and failure of nodes. Seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. In a Pastry network, Each node has a unique id, nodeId. Presented with a message & a key, Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key.

CMSC 621, Fall 2007 CAN (Content Addressable Network) The network is created in a tree-like form. Each node is associated to one in the upper level an to a group in the lower level. A query travels from the uppermost level down through the network until a match is found or until it reaches the lowermost level. For its query model, scalability is an issue.

CMSC 621, Fall 2007 Tapestry Self-administered, self-organized, location independent, scalable, fault-tolerant Each node has a neighbor map table with neighbor information.

CMSC 621, Fall 2007 Tapestry Cont. The system is able to adapt to network changes because it algorithms are dynamic. This also provides for Fault-handling

CMSC 621, Fall 2007 When to consider unstructured P2P systems The placement of content (files) is completely unrelated to the overlay topology. In an unstructured network, content typically needs to be located. Searching mechanisms range from brute force methods such as flooding the network with propagating queries in a breadth-first or depth-first manner until the desired content is located, to more sophisticated and resource-preserving strategies that include the use of random walks and routing indices. The searching mechanisms employed in unstructured networks have obvious implications, particularly in regards to matters of availability, scalability, and persistence. Unstructured systems are generally more appropriate for accommodating highly transient node populations. Some representative examples of unstructured systems are Napster, Publius, Gnutella, Kazaa and Freehaven, et.

CMSC 621, Fall 2007 When to consider Structured P2Psystems In structured networks the overlay topology is tightly controlled and files (or pointers to them) are placed at precisely specified locations. These systems essentially provide a mapping between content (e.g. file identifier) and location (e.g. node address), in the form of a distributed routing table, so that queries can be efficiently routed to the node with the desired content. Structured systems offer a scalable solution for exact-match queries, i.e. queries in which the exact identifier of the requested data object is known (as compared to keyword queries). Using exact-match queries as a substrate for keyword queries remains an open research problem for distributed environments A disadvantage of structured systems is that it is hard to maintain the structure required for efficiently routing messages in the face of a very transient node population, in which nodes are joining and leaving at a high rate. Typical examples of structured systems include Chord, CAN, Tapestry et cetera.

CMSC 621, Fall 2007 Conclusion This table shows a classification of peer-to-peer content distribution systems and location and routing infrastructures in terms of their Network structure, with some typical examples.

CMSC 621, Fall 2007 References B. Pourebrahimi, K. bertels and S. Vassiliadis. A survey of Peer-toPeer Networks, Ion Stoica, Robert morris, david Karger, M. Frans Kaashoek and Hari Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, Ismail Guvenc and Juan Jose Urdaneta. Peer to Peer File Sharing: A Survey, Nelson Mina. Distributed Systems Topologies: Part 1, Peer-to-Peer, peer_networkshttp://en.wikipedia.org/wiki/Peer-to-peer#Applications_of_peer-to- peer_networks Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to- peer content distribution technologies. ACM Computing Surveys, 36(4):335– 371, December 2004.

CMSC 621, Fall 2007 Questions?