A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.

Presentation transcript:

A Survey of Peer-to-Peer Content Distribution Technologies. Stephanos Androutsellis-Theotokis and Diomidis Spinellis, ACM Computing Surveys, December 2004. Presenters: Seung-hwan Baek, Ja-eun Choi

Outline: Overview of P2P – P2P Motivation – P2P Characteristics & Benefits – P2P Application Types. P2P Classification – Unstructured: Gnutella, Kazaa, Napster – Structured: Freenet, Chord, CAN, Tapestry. Other Aspects. Conclusions.

P2P Motivation Client/Server Architecture: A well-known, powerful, reliable server is the data source. Clients request data from the server. Very successful model: WWW (HTTP), FTP, Web services, etc.

P2P Motivation (Cont’d) Client/Server Limitations: Scalability is hard to achieve. The server is a single point of failure. Administration is required. Resources at the network edge go unused. P2P systems try to address these limitations.

P2P Characteristics P2P Computing: P2P computing is the sharing of computer resources and services by direct exchange between systems. These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files. P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power to the ‘benefit’ of all.

P2P Characteristics (Cont’d) P2P Characteristics: All nodes are both clients and servers – Provide and consume data – Any node can initiate a connection. No centralized data source. Nodes collaborate directly with each other (not through well-known servers). Network is dynamic: nodes enter and leave the network “frequently”.

P2P Benefits Ease of administration – Nodes self-organize adaptively – No need to deploy servers to satisfy demand (cf. scalability) – Built-in fault tolerance, replication, and load balancing. Scalability – Consumers of resources also donate resources – Aggregate resources grow naturally with utilization. Reliability – Geographic distribution – No single point of failure.

P2P Application Types Direct real-time communication: instant messaging. Combining the processing power of multiple distributed machines to perform complex computations: analysis of SETI data, prime computation. Distributed database systems. Storing and distributing digital content: mp3 file sharing (Content Distribution).

P2P Classification Architecture Types: Unstructured, Structured, Loosely structured. Here, “structure” refers to whether the overlay network is created non-deterministically or according to specific rules.

P2P Classification (Cont’d) Centralization (rows) versus data organization (columns):
Centralization | Unstructured | Loosely Structured | Highly Structured
Hybrid | Napster, IM | |
Partial | Kazaa, Gia | |
None | Gnutella | Freenet | Chord, CAN

Unstructured Architectures Placement of content is unrelated to the overlay topology. A search mechanism is required. Appropriate for highly transient node populations. Degrees of centralization: Purely Decentralized, Partially Centralized, Hybrid Decentralized.

Purely Decentralized – No central coordination – Users (servents) connect to each other directly. Gnutella architecture – Query: flooding (send messages to all neighbors) – Response: routed back. Scalability issues – With TTL: virtual horizon – Without TTL: unlimited flooding. E.g., Gnutella, FreeHaven. (Figure: registration, query, reply, request, and download messages exchanged between peers.)
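The flooding search and its TTL horizon can be illustrated with a minimal sketch. It is not Gnutella's wire protocol: the peer graph, the `holdings` map, and the function name `flood_query` are all assumptions made for illustration, and, as a simplification, a peer may send the query back to the neighbor it came from.

```python
from collections import deque

def flood_query(graph, start, wanted, holdings, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    seen = {start}                      # peers that already processed this query
    frontier = deque([(start, ttl)])
    hits = []
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        if wanted in holdings.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue                    # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            messages += 1               # every forward costs a message, even duplicates
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages

# Hypothetical five-peer overlay; only peer E holds the file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holdings = {"E": {"song.mp3"}}

print(flood_query(graph, "A", "song.mp3", holdings, ttl=2)[0])  # []
print(flood_query(graph, "A", "song.mp3", holdings, ttl=3)[0])  # ['E']
```

With a TTL of 2 the query never reaches E, which is the virtual horizon named on the slide; raising the TTL finds the file at the cost of more messages.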

Partially Centralized – Supernodes: index and cache the files of a small subpart of the peer network – Peers are automatically elected to become supernodes. Advantages – Reduced discovery time – Normal nodes are lightly loaded. E.g., Kazaa, Edutella, Gnutella (later version). (Figure: registration, query, reply, request, and download messages between peers and supernodes.)

Hybrid Decentralized – Central directory server: user connection info, file and metadata info. Advantages – Simple to implement – Locates files quickly and efficiently. Disadvantages – Vulnerable to technical failure – Inherently unscalable. E.g., Napster, Publius. (Figure: registration, query, reply, request, and download messages between peers and the directory server.)
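A central directory of this kind can be sketched in a few lines. The class and method names below are hypothetical and do not reflect Napster's actual protocol; the point is only that lookup goes through one index while the download itself happens directly between peers.

```python
class DirectoryServer:
    """Toy central index: file name -> set of peer addresses (illustrative only)."""

    def __init__(self):
        self.index = {}

    def register(self, peer, files):
        # A peer announces the files it is willing to share.
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        # Drop a departing peer from every entry.
        for peers in self.index.values():
            peers.discard(peer)

    def query(self, name):
        # The server only answers *where* the file is; the transfer is peer to peer.
        return sorted(self.index.get(name, set()))


server = DirectoryServer()
server.register("10.0.0.1:6699", ["a.mp3", "b.mp3"])
server.register("10.0.0.2:6699", ["b.mp3"])
print(server.query("b.mp3"))   # ['10.0.0.1:6699', '10.0.0.2:6699']
```

Every query touches the same object, which makes the design simple and fast but also shows why the slide calls it a single point of failure.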


Structured Architectures Features – Mapping of content and location – Scalable solution for exact-match queries Examples – Freenet – Chord – CAN – Tapestry

Freenet Loosely Structured System – Chain mode propagation Each node – Local data store – Dynamic routing table ( node address, file key ) Each file – Unique binary key

Freenet (Cont’d) Messages – Node ID, Timeout, Src ID, Dst ID. Message types – Data insert: key, data – Data request: key – Data reply: file – Data failed: failure location, reason

Freenet (Cont’d) Data Insert – The inserting node calculates a binary key for the file – It sends a data insert message to itself. Receiving a Data Insert message – If the key is not taken: store the data and forward the message to the owner of the closest key – If the key is taken: return the preexisting file

Freenet (Cont’d) Data Request – Chain mode propagation. Receiving a Data Request – If the data is stored locally: the search stops and the data is forwarded back – If not: the request is forwarded to the owner of the closest key

Freenet (Cont’d) Data Failed – Timeout (hops-to-live). Receiving a Data Failed message – Forward the request to the next-best node – After the request has failed through all neighbors, send a data failed message back to the request sender

Freenet (Cont’d) Data Reply – Includes the actual data – Passed back through the chain – The data is cached in all intermediate nodes A subsequent request w/ the same key → served immediately A request for a similar key → forwarded to the node that previously provided the data
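The request, failure, and reply behavior described on the last three slides can be sketched as a recursive chain. This is a simplified model and not Freenet's implementation: keys are plain integers, "closest" means smallest absolute difference, the `visited` set is a stand-in for loop detection, and the name `FreenetNode` is hypothetical.

```python
class FreenetNode:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data held locally
        self.routes = {}   # known key -> neighbor believed to hold similar keys

    def request(self, key, hops_to_live, visited=None):
        """Return the data for `key`, or None if the search fails."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key]          # search stops, data travels back
        if hops_to_live == 0:
            return None                     # timeout: reported upstream as a failure
        # Try the neighbor with the closest known key first, then the next best.
        candidates = sorted(self.routes.items(), key=lambda kv: abs(kv[0] - key))
        for _known_key, neighbor in candidates:
            if neighbor.name in visited:
                continue
            data = neighbor.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data      # cache on the way back along the chain
                self.routes[key] = neighbor # remember where this key was served from
                return data
        return None                         # every neighbor failed
```

After one successful request, every node on the chain holds a copy, so a later request for the same key is answered by the first node it reaches.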

Freenet (Cont’d) Indirect Files – A special class of lightweight files – Named according to search keywords – Contain pointers to the real file – Multiple files w/ the same key

Freenet (Cont’d) Indirect Files

Freenet (Cont’d) Properties – Nodes specialize in searching for similar keys – Nodes store similar keys – Similarity of keys does not reflect similarity of files – Routing does not reflect the underlying network topology

Chord Nodes and Files are identified by keys – m-bit identifiers – a deterministic hash function Mapping File ID onto Node ID – Nodes store (key, data item) pairs
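As a sketch of the mapping, the snippet below hashes names to m-bit identifiers and assigns each key to the first node whose identifier is equal to or follows it on the circle. Chord uses SHA-1 for this purpose; the value of `M`, the node identifiers, and the helper names are assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_left

M = 6  # identifier length in bits (illustrative; real deployments use far more)

def chord_id(name, m=M):
    """Hash a node address or file name to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(node_ids, key_id):
    """Return the node responsible for key_id: the first node id >= key_id, wrapping around."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(successor(nodes, 10))   # 14
print(successor(nodes, 54))   # 56
print(successor(nodes, 58))   # 1 (wraps around the circle)
```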

Chord (Cont’d) A Chord Identifier Circle

Chord (Cont’d) Simple Key Location

Chord (Cont’d) Scalable Key Location

Chord (Cont’d) Simple Key Location – Routing information: successor pointer – O(n) hops. Scalable Key Location – Routing information: finger table – O(log n) hops
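The difference between the two lookup schemes is easiest to see in code. The sketch below builds finger tables with global knowledge of the ring, which a real node does not have, and then resolves a key by repeatedly jumping to the closest preceding finger. The ring contents, `M`, and all function names are assumptions for illustration.

```python
M = 6
RING = 2 ** M

def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] when inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b          # interval wraps past zero

def successor_of(ring, ident):
    for n in ring:                 # ring is sorted ascending
        if n >= ident:
            return n
    return ring[0]

def build_fingers(ring):
    """finger[i] of node n is the successor of (n + 2**i) mod 2**M."""
    return {n: [successor_of(ring, (n + 2 ** i) % RING) for i in range(M)]
            for n in ring}

def find_successor(fingers, start, key):
    """Return (node responsible for key, number of hops taken)."""
    node, hops = start, 0
    while not in_interval(key, node, fingers[node][0], inclusive_right=True):
        nxt = node
        for f in reversed(fingers[node]):      # farthest finger first
            if in_interval(f, node, key):
                nxt = f
                break
        if nxt == node:
            break
        node, hops = nxt, hops + 1
    return fingers[node][0], hops

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
fingers = build_fingers(ring)
print(find_successor(fingers, 8, 54))   # (56, 2)
```

Each jump roughly halves the remaining distance to the key, which is where the logarithmic hop count comes from; following successor pointers alone from node 8 to key 54 would pass through every node in between.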

Chord (Cont’d) Node Joining – Certain keys assigned to the new node’s successor are reassigned to the new node. Node Departing – The departing node’s keys are reassigned to its successor
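Key reassignment on join and departure can be sketched as follows. The ring is modeled as a dictionary from node identifier to the set of keys it stores; this representation, the example identifiers, and the function names are assumptions, and the sketch ignores the pointer and finger-table updates a real join also requires.

```python
def in_half_open(k, a, b):
    """True if k lies in the ring interval (a, b]."""
    if a < b:
        return a < k <= b
    return k > a or k <= b             # interval wraps past zero

def join(ring, new_node):
    """Add new_node and move the keys it is now responsible for. Returns the moved keys."""
    nodes = sorted(ring)
    succ = next((n for n in nodes if n > new_node), nodes[0])
    pred = max((n for n in nodes if n < new_node), default=nodes[-1])
    moved = {k for k in ring[succ] if in_half_open(k, pred, new_node)}
    ring[succ] -= moved
    ring[new_node] = moved
    return moved

def leave(ring, node):
    """Remove node and hand all of its keys to its successor. Assumes another node remains."""
    keys = ring.pop(node)
    nodes = sorted(ring)
    succ = next((n for n in nodes if n > node), nodes[0])
    ring[succ] |= keys
    return succ

ring = {21: {15}, 32: {24, 30}}
print(join(ring, 26))    # {24}
print(ring[32])          # {30}
```

Only keys in the interval between the new node's predecessor and the new node move, so a join disturbs a single neighbor rather than the whole ring.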

Chord (Cont’d) Node Joining – N26 joins the network

CAN: Content-Addressable Network. Hash Table – Maps file names to their locations – Stores (key K, value V) pairs – Each node stores a part of the hash table, called a “zone”

CAN (Cont’d) Virtual coordinate space – A zone corresponds to a segment of the space – Key K is mapped onto a point P by a deterministic function – (K, V) is stored at the node responsible for P

CAN (Cont’d) Virtual coordinate space

CAN (Cont’d) Retrieve – Map K to P – Retrieve the value from the node covering P. Routing – The request is routed to the node covering P – Each node maintains a routing table with the addresses of the nodes holding adjoining zones – Requests follow the straight-line path through the space
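Storage and greedy routing in a coordinate space can be sketched on a tiny two-dimensional example. Several simplifications are assumed here: the space is a unit square without wrap-around, there are four fixed zones, the hash is SHA-1 split into two coordinates, and the names `Zone`, `route`, `put`, and `get` are hypothetical.

```python
import hashlib

def key_to_point(key):
    """Deterministically map a key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2 ** 32
    y = int.from_bytes(digest[4:8], "big") / 2 ** 32
    return (x, y)

class Zone:
    def __init__(self, name, x0, y0, x1, y1):
        self.name = name
        self.box = (x0, y0, x1, y1)
        self.neighbors = []   # routing table: zones that adjoin this one
        self.table = {}       # this node's part of the hash table

    def contains(self, p):
        x0, y0, x1, y1 = self.box
        return x0 <= p[0] < x1 and y0 <= p[1] < y1

    def distance_to(self, p):
        """Euclidean distance from p to the nearest point of this zone."""
        x0, y0, x1, y1 = self.box
        dx = max(x0 - p[0], 0.0, p[0] - x1)
        dy = max(y0 - p[1], 0.0, p[1] - y1)
        return (dx * dx + dy * dy) ** 0.5

def route(start, p, max_hops=64):
    """Greedily forward toward p; return (zone covering p, names of zones visited)."""
    zone, path = start, [start.name]
    while not zone.contains(p):
        if len(path) > max_hops:
            raise RuntimeError("routing did not converge")
        # Prefer a neighbor that covers p, otherwise the one closest to p.
        zone = min(zone.neighbors,
                   key=lambda z: (not z.contains(p), z.distance_to(p)))
        path.append(zone.name)
    return zone, path

def put(start, key, value):
    zone, _ = route(start, key_to_point(key))
    zone.table[key] = value

def get(start, key):
    zone, _ = route(start, key_to_point(key))
    return zone.table.get(key)

def make_grid():
    a = Zone("A", 0.0, 0.0, 0.5, 0.5)
    b = Zone("B", 0.5, 0.0, 1.0, 0.5)
    c = Zone("C", 0.0, 0.5, 0.5, 1.0)
    d = Zone("D", 0.5, 0.5, 1.0, 1.0)
    a.neighbors, b.neighbors = [b, c], [a, d]
    c.neighbors, d.neighbors = [a, d], [b, c]
    return a, b, c, d

a, b, c, d = make_grid()
put(a, "song.mp3", "peer-17")
print(get(d, "song.mp3"))   # peer-17
```

Because the key-to-point function is deterministic, any node can recompute P and reach the same zone, no matter where the request starts.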

CAN (Cont’d) Routing

CAN (Cont’d) Node Joining – The new node is allocated its own portion of the space by splitting the zone of an existing node. Node Departing – The departing node hands over its hash table entries to one of its neighbors

Tapestry Location and Routing Infrastructure – Self-administration – Fault tolerance – Stability by bypassing failed routes and nodes. Plaxton Mesh – Routing mechanism – Location mechanism

Tapestry (Cont’d) Routing Mechanism – Neighbor maps: local routing maps used to route messages incrementally – Multiple levels: level l holds nodes whose IDs match in l digits – Multiple entries per level: their number equals the base of the ID – Each entry points to the closest matching node in the network

Tapestry (Cont’d) Neighbor Map of Node w/ ID 67493

Tapestry (Cont’d) Routing path to node 34567, resolving one more suffix digit per hop – xxxx7 → xxx67 → xx567 → x4567 → 34567
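Suffix-based routing can be sketched with a global node list standing in for the per-node neighbor maps. The node IDs other than 67493 and 34567 are made up, and the tie-breaking rule (pick the candidate that matches the fewest extra digits) is an assumption chosen to show one digit resolved per hop; a real node would pick the closest candidate in network distance.

```python
def shared_suffix_len(a, b):
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current, dest, nodes):
    """Pick a node that matches dest in at least one more suffix digit than current."""
    level = shared_suffix_len(current, dest)
    wanted = dest[-(level + 1):]
    candidates = [n for n in nodes if n.endswith(wanted)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))

def route(source, dest, nodes):
    path = [source]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest, nodes)
        if hop is None:
            break                     # no node with the required suffix exists
        path.append(hop)
    return path

nodes = ["67493", "98747", "64267", "45567", "64567", "34567"]
print(route("67493", "34567", nodes))
# ['67493', '98747', '64267', '45567', '64567', '34567']
```

Each hop lengthens the matched suffix by at least one digit, so a route needs at most as many hops as the ID has digits.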

Tapestry (Cont’d) Location Mechanism – Root node: provides a guaranteed node from which the object can be located – Assigned when an object is inserted, by a globally consistent deterministic algorithm – On insertion (server node Ns, object O, root node Nr): a message is routed from Ns to Nr, and the mapping (O, Ns) is stored along the routing path

Tapestry (Cont’d) Location Mechanism – Location query: messages destined for O are initially routed toward Nr, until they meet a node containing the (O, Ns) mapping
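Publishing and locating an object can be sketched on top of the same suffix routing. The root node is passed in as a parameter here, whereas Tapestry derives it with a deterministic algorithm; the `pointers` dictionary, the node IDs, and the function names are all assumptions for illustration.

```python
def shared_suffix_len(a, b):
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(source, dest, nodes):
    """Suffix routing with a global node list standing in for neighbor maps."""
    path = [source]
    while path[-1] != dest:
        level = shared_suffix_len(path[-1], dest)
        candidates = [n for n in nodes if n.endswith(dest[-(level + 1):])]
        if not candidates:
            break
        path.append(min(candidates, key=lambda n: (shared_suffix_len(n, dest), n)))
    return path

def publish(obj, server, root, nodes, pointers):
    """Store the mapping (obj, server) at every node on the path from server to root."""
    for node in route(server, root, nodes):
        pointers.setdefault(node, {})[obj] = server

def locate(obj, client, root, nodes, pointers):
    """Route toward the root and stop at the first node that knows obj."""
    for hops, node in enumerate(route(client, root, nodes)):
        server = pointers.get(node, {}).get(obj)
        if server is not None:
            return server, hops
    return None, None

nodes = ["67493", "98747", "64267", "45567", "64567", "34567", "99997"]
pointers = {}
publish("O", server="67493", root="34567", nodes=nodes, pointers=pointers)
print(locate("O", client="99997", root="34567", nodes=nodes, pointers=pointers))
# ('67493', 1)
```

The query from 99997 meets the publish path after one hop and never has to reach the root, which is how the mappings stored along the path shorten lookups.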

Tapestry (Cont’d) Advantages of the Plaxton Mesh – Simple fault handling: route by choosing a node with a similar suffix – Scalability, with root nodes as the only bottleneck. Limitations – The need for global knowledge when assigning and identifying root nodes – The vulnerability of the root nodes

Tapestry (Cont’d) Extending the Plaxton mesh design – The Plaxton mesh assumes a static node population – Tapestry adapts it to a transient population: adaptability, fault tolerance, optimizations

Tapestry (Cont’d) Optimizations – Back-pointers for dynamic node insertion – Flexible concept of distance between nodes – Maintain cached content for failures – Multiple roots to each object – Adapt to environment changes

Other Aspects Content Caching, Replication and Migration Security Provisions for Anonymity Provisions for Deniability Incentive Mechanisms and Accountability Resource Management Capability Semantic Grouping of Information

Conclusions Study of P2P Content Distribution Systems – Properties – Design features: location and routing algorithms – Two categories: unstructured systems and structured systems – Open research problems remain