An Introduction to Peer-to-Peer Networks Presentation for MIE456 - Information Systems Infrastructure II Vinod Muthusamy October 30, 2003.

Slides:



Advertisements
Similar presentations
P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Advertisements

Peer to Peer and Distributed Hash Tables
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Chord: A scalable peer-to- peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashock, Hari Balakrishnan.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Peer-to-Peer Networks as a Distribution and Publishing Model Jorn De Boever (june 14, 2007)
Peer-to-Peer Networks
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Peer-to-Peer Content Sharing. P2P File Sharing Benefits Why use a P2P model for a file sharing application?
Topics in Reliable Distributed Systems Lecture 2, Fall Dr. Idit Keidar.
Introduction to Peer-to-Peer (P2P) Systems Gabi Kliot - Computer Science Department, Technion Concurrent and Distributed Computing Course 28/06/2006 The.
P2P Network is good or bad? Sang-Hyun Park. P2P Network is good or bad? - Definition of P2P - History of P2P - Economic Impact - Benefits of P2P - Legal.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 2: Peer-to-Peer.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari alakrishnan.
1 Client-Server versus P2P  Client-server Computing  Purpose, definition, characteristics  Relationship to the GRID  Research issues  P2P Computing.
Object Naming & Content based Object Search 2/3/2003.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 CS 194: Distributed Systems Distributed Hash Tables Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering and Computer.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
Peer-to-Peer Networks Slides largely adopted from Ion Stoica’s lecture at UCB.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
CSE 461 University of Washington1 Topic Peer-to-peer content delivery – Runs without dedicated infrastructure – BitTorrent as an example Peer.
Storage management and caching in PAST PRESENTED BY BASKAR RETHINASABAPATHI 1.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
Peer-to-Peer Computing CS587x Lecture Department of Computer Science Iowa State University.
Introduction Widespread unstructured P2P network
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Popularity-Awareness in Temporal DHT for P2P-based Media Streaming Applications Abhishek Bhattacharya, Zhenyu Yang & Deng Pan IEEE International Symposium.
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
Introduction of P2P systems
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Peer-to-Peer Networks University of Jordan. Server/Client Model What?
Jonathan Walpole CSE515 - Distributed Computing Systems 1 Teaching Assistant for CSE515 Rahul Dubey.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
1 Distributed Hash Tables (DHTs) Lars Jørgen Lillehovde Jo Grimstad Bang Distributed Hash Tables (DHTs)
Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources and services by direct exchange of information”
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Models and Algorithms for Complex Networks Searching in P2P Networks.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Chord Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google,
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
Bruce Hammer, Steve Wallis, Raymond Ho
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
Malugo – a scalable peer-to-peer storage system..
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Distributed Hash Tables (DHT) Jukka K. Nurminen *Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Distributed Web Systems Peer-to-Peer Systems Lecturer Department University.
A Survey of Peer-to-Peer Content Distribution Technologies Stephanos Androutsellis-Theotokis and Diomidis Spinellis ACM Computing Surveys, December 2004.
Peer-to-Peer Data Management
EE 122: Peer-to-Peer (P2P) Networks
Presentation transcript:

An Introduction to Peer-to-Peer Networks Presentation for MIE456 - Information Systems Infrastructure II Vinod Muthusamy October 30, 2003

Agenda Overview of P2P Characteristics Benefits Unstructured P2P systems Napster (Centralized) Gnutella (Distributed) Kazaa/Fasttrack (Super-peers) Structured P2P systems (DHTs) Chord Pastry CAN Conclusions

Client/Server Architecture Well known, powerful, reliable server is a data source Clients request data from server Very successful model WWW (HTTP), FTP, Web services, etc. Server Client Internet * Figure from

Client/Server Limitations Scalability is hard to achieve Presents a single point of failure Requires administration Unused resources at the network edge P2P systems try to address these limitations

P2P Computing* P2P computing is the sharing of computer resources and services by direct exchange between systems. These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files. P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power to the ‘benefit’ of all. * From Publications/Peer-to-Peer_Introduction_Feb.ppt

P2P Architecture All nodes are both clients and servers Provide and consume data Any node can initiate a connection No centralized data source “The ultimate form of democracy on the Internet” “The ultimate threat to copy-right protection on the Internet” * Content from Node Internet

P2P Network Characteristics Clients are also servers and routers Nodes contribute content, storage, memory, CPU Nodes are autonomous (no administrative authority) Network is dynamic: nodes enter and leave the network “frequently” Nodes collaborate directly with each other (not through well-known servers) Nodes have widely varying capabilities

P2P Benefits Efficient use of resources Unused bandwidth, storage, processing power at the edge of the network Scalability Consumers of resources also donate resources Aggregate resources grow naturally with utilization Reliability Replicas Geographic distribution No single point of failure Ease of administration Nodes self organize No need to deploy servers to satisfy demand (c.f. scalability) Built-in fault tolerance, replication, and load balancing

P2P Applications Are these P2P systems? File sharing (Napster, Gnutella, Kazaa) Multiplayer games (Unreal Tournament, DOOM) Collaborative applications (ICQ, shared whiteboard) Distributed computation Ad-hoc networks

Popular P2P Systems Napster, Gnutella, Kazaa, Freenet Large scale sharing of files. User A makes files (music, video, etc.) on their computer available to others User B connects to the network, searches for files and downloads files directly from user A Issues of copyright infringement

Napster A way to share music files with others Users upload their list of files to Napster server You send queries to Napster server for files of interest Keyword search (artist, song, album, bitrate, etc.) Napster server replies with IP address of users with matching files You connect directly to user A to download file * Figure from

Napster Central Napster server Can ensure correct results Bottleneck for scalability Single point of failure Susceptible to denial of service Malicious users Lawsuits, legislation Search is centralized File transfer is direct (peer-to-peer)

Gnutella Share any type of files (not just music) Decentralized search unlike Napster You ask your neighbours for files of interest Neighbours ask their neighbours, and so on TTL field quenches messages after a number of hops Users with matching files reply to you * Figure from

Gnutella Decentralized No single point of failure Not as susceptible to denial of service Cannot ensure correct results Flooding queries Search is now distributed but still not scalable

Kazaa (Fasttrack network) Hybrid of centralized Napster and decentralized Gnutella Super-peers act as local search hubs Each super-peer is similar to a Napster server for a small portion of the network Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time) Users upload their list of files to a super-peer Super-peers periodically exchange file lists You send queries to a super-peer for files of interest

Free riding* File sharing networks rely on users sharing data Two types of free riding Downloading but not sharing any data Not sharing any interesting data On Gnutella 15% of users contribute 94% of content 63% of users never responded to a query Didn’t have “interesting” data * Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”

Anonymity Napster, Gnutella, Kazaa don’t provide anonymity Users know who they are downloading from Others know who sent a query Freenet Designed to provide anonymity among other features

Freenet Data flows in reverse path of query Impossible to know if a user is initiating or forwarding a query Impossible to know if a user is consuming or forwarding data “Smart” queries Requests get routed to correct peer by incremental discovery

Structured P2P Second generation P2P overlay networks Self-organizing Load balanced Fault-tolerant Scalable guarantees on numbers of hops to answer a query Major difference with unstructured P2P systems Based on a distributed hash table interface

Distributed Hash Tables (DHT) Distributed version of a hash table data structure Stores (key, value) pairs The key is like a filename The value can be file contents Goal: Efficiently insert/lookup/delete (key, value) pairs Each peer stores a subset of (key, value) pairs in the system Core operation: Find node responsible for a key Map key to node Efficiently route insert/lookup/delete request to this node

DHT Generic Interface Node id:m-bit identifier (similar to an IP address) Key: sequence of bytes Value:sequence of bytes put(key, value) Store (key,value) at the node responsible for the key value = get(key) Retrieve value associated with key (from the appropriate node)

DHT Applications Many services can be built on top of a DHT interface File sharing Archival storage Databases Naming, service discovery Chat service Rendezvous-based communication Publish/Subscribe

DHT Desirable Properties Keys mapped evenly to all nodes in the network Each node maintains information about only a few other nodes Messages can be routed to a node efficiently Node arrival/departures only affect a few nodes

DHT Routing Protocols DHT is a generic interface There are several implementations of this interface Chord [MIT] Pastry [Microsoft Research UK, Rice University] Tapestry [UC Berkeley] Content Addressable Network (CAN) [UC Berkeley] SkipNet [Microsoft Research US, Univ. of Washington] Kademlia [New York University] Viceroy [Israel, UC Berkeley] P-Grid [EPFL Switzerland] Freenet [Ian Clarke] These systems are often referred to as P2P routing substrates or P2P overlay networks

Chord API Node id:unique m-bit identifier (hash of IP address or other unique ID) Key: m-bit identifier (hash of a sequence of bytes) Value:sequence of bytes API insert(key, value)  store key/value at r nodes lookup(key) update(key, newval) join(n) leave()

Chord Identifier Circle Nodes organized in an identifier circle based on node identifiers Keys assigned to their successor node in the identifier circle Hash function ensures even distribution of nodes and keys on the circle

Chord Finger Table O(logN) table size i th finger points to first node that succeeds n by at least 2 i-1

Chord Key Location Lookup in finger table the furthest node that precedes key Query homes in on target in O(logN) hops

Chord Properties In a system with N nodes and K keys, with high probability… each node receives at most K/N keys each node maintains info. about O(logN) other nodes lookups resolved with O(logN) hops No delivery guarantees No consistency among replicas Hops have poor network locality

Network locality Nodes close on ring can be far in the network. N20 N41 N80 N40 * Figure from

Pastry Similar interface to Chord Considers network locality to minimize hops messages travel New node needs to know a nearby node to achieve locality Each routing hop matches the destination identifier by one more digit Many choices in each hop (locality possible)

CAN Based on a “d-dimensional Cartesian coordinate space on a d-torus” Each node owns a distinct zone in the space Each key hashes to a point in the space

CAN Routing and Node Arrival

P2P Review Two key functions of P2P systems Sharing content Finding content Sharing content Direct transfer between peers All systems do this Structured vs. unstructured placement of data Automatic replication of data Finding content Centralized (Napster) Decentralized (Gnutella) Probabilistic guarantees (DHTs)

Conclusions P2P connects devices at the edge of the Internet Popular in “industry” Napster, Kazaa, etc. allow users to share data Legal issues still to be resolved Exciting research in academia DHTs (Chord, Pastry, etc.) Improve properties/performance of overlays Applications other than file sharing are being developed