Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources and services by direct exchange of information”

Slides:



Advertisements
Similar presentations
Peer-to-Peer Systems Chapter 25. What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.
Advertisements

Clayton Sullivan PEER-TO-PEER NETWORKS. INTRODUCTION What is a Peer-To-Peer Network A Peer Application Overlay Network Network Architecture and System.
Chapter 2 Application Layer Computer Networking: A Top Down Approach, 5 th edition. Jim Kurose, Keith Ross Addison-Wesley, April A note on the use.
Application Layer 2-1 Chapter 2 Application Layer Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley March 2012.
An Overview of Peer-to-Peer Networking CPSC 441 (with thanks to Sami Rollins, UCSB)
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
No Class on Friday There will be NO class on: FRIDAY 1/30/15.
An Overview of Peer-to-Peer. What is Peer-to-Peer (P2P)?
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
Peer-to-Peer Content Sharing. P2P File Sharing Benefits Why use a P2P model for a file sharing application?
2: Application Layer P2P applications and Sockets.
Beyond Napster: An Overview of Peer-to-Peer Systems and Applications Sami Rollins.
1 Client-Server versus P2P  Client-server Computing  Purpose, definition, characteristics  Relationship to the GRID  Research issues  P2P Computing.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
CSE 124 Networked Services Fall 2009 B. S. Manoj, Ph.D 11/03/2009CSE 124 Network Services FA 2009 Some of these.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Peer-peer and Application-level Networking CS 218 Fall 2003 Multicast Overlays P2P applications Napster, Gnutella, Robust Overlay Networks Distributed.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
Introduction to Peer-to-Peer Networks. What is a P2P network Uses the vast resource of the machines at the edge of the Internet to build a network that.
P2P File Sharing Systems
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
Peer-to-Peer Computing CS587x Lecture Department of Computer Science Iowa State University.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
Data Communications and Computer Networks Chapter 2 CS 3830 Lecture 10 Omar Meqdadi Department of Computer Science and Software Engineering University.
By Shobana Padmanabhan Sep 12, 2007 CSE 473 Class #4: P2P Section 2.6 of textbook (some pictures here are from the book)
Application Layer – Peer-to-peer UIUC CS438: Communication Networks Summer 2014 Fred Douglas Slides: Fred, Kurose&Ross (sometimes edited)
Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
1 Telematica di Base Applicazioni P2P. 2 The Peer-to-Peer System Architecture  peer-to-peer is a network architecture where computer resources and services.
Content Distribution March 6, : Application Layer1.
1 P2P Computing. 2 What is P2P? Server-Client model.
Introduction to Peer-to-Peer Networks. What is a P2P network A P2P network is a large distributed system. It uses the vast resource of PCs distributed.
Peer-to-Peer Networking. Presentation Introduction Characteristics and Challenges of Peer-to-Peer Peer-to-Peer Applications Classification of Peer-to-Peer.
Introduction of P2P systems
Peer-to-Peer Networks University of Jordan. Server/Client Model What?
Chapter 2: Application layer
2: Application Layer1 Chapter 2: Application layer r 2.1 Principles of network applications r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail  SMTP,
P2P Networking and Content Distribution
Jonathan Walpole CSE515 - Distributed Computing Systems 1 Teaching Assistant for CSE515 Rahul Dubey.
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
Peer-to-Pee Computing HP Technical Report Chin-Yi Tsai.
An Introduction to Peer-to-Peer Networks Presentation for MIE456 - Information Systems Infrastructure II Vinod Muthusamy October 30, 2003.
Application Layer 2-1 Chapter 2 Application Layer Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley March 2012.
1 Peer-to-Peer Systems r Application-layer architectures r Case study: BitTorrent r P2P Search and Distributed Hash Table (DHT)
2: Application Layer1 Chapter 2: Application layer r 2.1 Principles of network applications  app architectures  app requirements r 2.2 Web and HTTP r.
Peer-to-Peer File Sharing Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
Peer-to-Peer Networks Hongli Luo CEIT, IPFW. r Topics m Application architecture m P2P file sharing m P2P networks: Napster Gnutella KaAzA Bittorrent.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
2: Application Layer1 Chapter 2 Application Layer Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley March 2012.
2: Application Layer 1 Chapter 2 Application Layer Computer Networking: A Top Down Approach, 5 th edition. Jim Kurose, Keith Ross Addison-Wesley, April.
Computer Networks CSE 434 Fall 2009 Sandeep K. S. Gupta Arizona State University Research Experience.
Peer-to-Peer File Sharing
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
1 A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu P. Krishna Gummadi Steven D. Gribble Presentation by Nanda Kishore Lella
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
CS Spring 2012 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
PEAR TO PEAR PROTOCOL. Pure P2P architecture no always-on server arbitrary end systems directly communicate peers are intermittently connected and change.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
05 - P2P applications and Sockets
An example of peer-to-peer application
Part 4: Peer to Peer - P2P Applications
An Overview of Peer-to-Peer
Pure P2P architecture no always-on server
Presentation transcript:

Peer-to-Peer (P2P) networks and applications

What is P2P? r “the sharing of computer resources and services by direct exchange of information”

What is P2P? r “P2P is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable and unpredictable IP addresses P2P nodes must operate outside the DNS system and have significant, or total autonomy from central servers”

What is P2P? r “A distributed network architecture may be called a P2P network if the participants share a part of their own resources. These shared resources are necessary to provide the service offered by the network. The participants of such a network are both resource providers and resource consumers”

What is P2P? r Various definitions seem to agree on  sharing of resources  direct communication between equals (peers)  no centralized control

What is a peer? r “…an entity with capabilities similar to other entities in the system.”

Client/Server Architecture r Well known, powerful, reliable server is a data source r Clients request data from server r Very successful model  WWW (HTTP), FTP, Web services, etc. Server Client Internet * Figure from

Client/Server Limitations r Scalability is hard to achieve r Presents a single point of failure r Requires administration r Unused resources at the network edge r P2P systems try to address these limitations

P2P Architecture r All nodes are both clients and servers  Provide and consume data  Any node can initiate a connection r No centralized data source  “The ultimate form of democracy on the Internet”  “The ultimate threat to copy-right protection on the Internet” * Content from Node Internet

P2P Network Characteristics r Clients are also servers and routers  Nodes contribute content, storage, memory, CPU r Nodes are autonomous (no administrative authority) r Network is dynamic: nodes enter and leave the network “frequently” r Nodes collaborate directly with each other (not through well-known servers) r Nodes have widely varying capabilities

P2P Goals and Benefits r Efficient use of resources  Unused bandwidth, storage, processing power at the “edge of the network” r Scalability  No central information, communication and computation bottleneck  Aggregate resources grow naturally with utilization r Reliability  Replicas  Geographic distribution  No single point of failure r Ease of administration  Nodes self-organize  Built-in fault tolerance, replication, and load balancing  Increased autonomy r Anonymity – Privacy  not easy in a centralized system r Dynamism  highly dynamic environment  ad-hoc communication and collaboration

What is Peer-to-Peer (P2P)?

P2P Applications r File sharing (Napster, Gnutella, Kazaa) r Multiplayer games (Unreal Tournament, DOOM) r Collaborative applications (ICQ, shared whiteboard) r Distributed computation r Ad-hoc networks

P2P System Taxonomy r Historic r Data-centric r Computation-centric r User-centric r Network-centric r Platforms

Computation-centric

User-centric sendMessagereceiveMessagesendMessagereceiveMessage

User-centric (Common implementation) sendMessagereceiveMessagesendMessagereceiveMessage

Network-centric

Platforms Find Peers…Send Messages GnutellaInstant Messaging

P2P Goals/Benefits r Cost sharing r Resource aggregation r Improved scalability/reliability r Increased autonomy r Anonymity/privacy r Dynamism r Ad-hoc communication

P2P Challenges r Decentralization r Scalability and Performance r Anonymity r Fairness r Dynamism r Security r Transparency r Fault Resilience and Robustness

Peer-to-Peer Content Sharing

Popular file sharing P2P Systems r Napster, Gnutella, Kazaa, Freenet r Large scale sharing of files.  User A makes files (music, video, etc.) on their computer available to others  User B connects to the network, searches for files and downloads files directly from user A r Issues of copyright infringement

Research Areas r Peer discovery and group management r Data placement and searching r Reliable and efficient file exchange r Security/privacy/anonymity/trust

Design Concerns r Group Management  Per-node state  Load balancing  Fault tolerance/resiliency r Search  Bandwidth usage  Time to locate item  Success rate  Fault tolerance/resiliency

Approaches r Centralized r Unstructured r Structured (Distributed Hash Tables)

Centralized index original “Napster” design 1) when peer connects, it informs central server:  IP address  content 2) Alice queries for “Hey Jude” 3) Alice requests file from Bob centralized directory server peers Alice Bob

Centralized model BobAlice JaneJudy file transfer is decentralized, but locating content is highly centralized

Centralized r Benefits:  Low per-node state  Limited bandwidth usage  Short location time  High success rate  Fault tolerant r Drawbacks:  Single point of failure  Limited scale  Possibly unbalanced load r copyright infringement BobAlice JaneJudy

Napster r program for sharing files over the Internet r a “disruptive” application/technology? r history:  5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service  12/99: first lawsuit  3/00: 25% UWisc traffic Napster  2000: est. 60M users  2/01: US Circuit Court of Appeals: Napster knew users violating copyright laws  7/01: # simultaneous online users: Napster 160K, Gnutella: 40K, Morpheus: 300K

Napster: how does it work Application-level, client-server protocol over point-to- point TCP Four steps: r Connect to Napster server r Upload your list of files (push) to server. r Give server keywords to search the full list with. r Select “best” of correct answers. (pings)

Napster napster.com users File list is uploaded 1.

Napster napster.com user Request and results User requests search at server. 2.

Napster napster.com user pings User pings hosts that apparently have data. Looks for best transfer rate. 3.

Napster napster.com user Retrieves file User retrieves file 4.

Napster r Central Napster server  Can ensure correct results  Fast search  Bottleneck for scalability  Single point of failure  Susceptible to denial of service Malicious users Lawsuits, legislation r Hybrid P2P system – “all peers are equal but some are more equal than others”  Search is centralized  File transfer is direct (peer-to-peer)

Unstructured r fully distributed  no central server r used by Gnutella r Each peer indexes the files it makes available for sharing (and no other files) overlay network: graph r edge between peer X and Y if there’s a TCP connection r all active peers and edges form overlay net r edge: virtual (not physical) link r given peer typically connected with < 10 overlay neighbors

Gnutella: Query flooding Query QueryHit Query QueryHit Query QueryHit File transfer: HTTP r Query message sent over existing TCP connections r peers forward Query message r QueryHit sent over reverse path

Gnutella: Peer joining 1. joining peer Alice must find another peer in Gnutella network: use list of candidate peers 2. Alice sequentially attempts TCP connections with candidate peers until connection setup with Bob 3. Flooding: Alice sends Ping message to Bob; Bob forwards Ping message to his overlay neighbors (who then forward to their neighbors….) r peers receiving Ping message respond to Alice with Pong message 4. Alice receives many Pong messages, and can then setup additional TCP connections

Unstructured Bob Alice Jane Judy Carl Scalability: limited scope flooding

Unstructured r Gnutella model r Benefits:  Limited per-node state  Fault tolerant r Drawbacks:  High bandwidth usage  Long time to locate item  No guarantee on success rate  Possibly unbalanced load Bob Alice Jane Judy Carl

Gnutella Searching by flooding: r If you don’t have the file you want, query 7 of your neighbors. r If they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10. r Requests are flooded, but there is no tree structure. r No looping but packets may be received twice. r Reverse path forwarding * Figure from

Gnutella fool.* ? TTL = 2

Gnutella TTL = 1 IP X :fool.her fool.her X TTL = 1

Gnutella fool.you fool.me Y IP Y :fool.me fool.you

Gnutella IP Y :fool.me fool.you

Gnutella: strengths and weaknesses r pros:  flexibility in query processing  complete decentralization  simplicity  fault tolerance/self-organization r cons:  severe scalability problems  susceptible to attacks r Pure P2P system

Gnutella: initial problems and fixes r 2000: avg size of reachable network only hosts. Why so small?  modem users: not enough bandwidth to provide search routing capabilities: routing black holes r Fix: create peer hierarchy based on capabilities  previously: all peers identical, most modem black holes  preferential connection: favors routing to well-connected peers favors reply to clients that themselves serve large number of files: prevent freeloading

Structured r FreeNet, Chord, CAN, Tapestry, Pastry model ?

Structured r FreeNet, Chord, CAN, Tapestry, Pastry model r Benefits:  Manageable per-node state  Manageable bandwidth usage and time to locate item  Guaranteed success r Drawbacks:  Possibly unbalanced load  Harder to support fault tolerance ?

Unstructured vs Structured P2P r The systems we described do not offer any guarantees about their performance (or even correctness) r Structured P2P  Scalable guarantees on numbers of hops to answer a query  Maintain all other P2P properties (load balance, self-organization, dynamic nature) r Approach: Distributed Hash Tables (DHT)

Distributed Hash Tables (DHT) r Distributed version of a hash table data structure r Stores (key, value) pairs  The key is like a filename  The value can be file contents, or pointer to location r Goal: Efficiently insert/lookup/delete (key, value) pairs r Each peer stores a subset of (key, value) pairs in the system r Core operation: Find node responsible for a key  Map key to node  Efficiently route insert/lookup/delete request to this node r Allow for frequent node arrivals/departures

DHT Desirable Properties r Keys should mapped evenly to all nodes in the network (load balance) r Each node should maintain information about only a few other nodes (scalability, low update cost) r Messages should be routed to a node efficiently (small number of hops) r Node arrival/departures should only affect a few nodes

DHT Routing Protocols r DHT is a generic interface r There are several implementations of this interface  Chord [MIT]  Pastry [Microsoft Research UK, Rice University]  Tapestry [UC Berkeley]  Content Addressable Network (CAN) [UC Berkeley]  SkipNet [Microsoft Research US, Univ. of Washington]  Kademlia [New York University]  Viceroy [Israel, UC Berkeley]  P-Grid [EPFL Switzerland]  Freenet [Ian Clarke]

Basic Approach In all approaches: r keys are associated with globally unique IDs  integers of size m (for large m) r key ID space (search space) is uniformly populated - mapping of keys to IDs using (consistent) hashing r a node is responsible for indexing all the keys in a certain subspace (zone) of the ID space r nodes have only partial knowledge of other node’s responsibilities

Improvements: SuperPeers r KaZaA model r Hybrid centralized and unstructured r Advantages and disadvantages?

Hierarchical Overlay r between centralized index, query flooding approaches r each peer is either a super node or assigned to a super node  TCP connection between peer and its super node.  TCP connections between some pairs of super nodes. r Super node tracks content in its children

Kazaa (Fasttrack network) r Hybrid of centralized Napster and decentralized Gnutella  hybrid P2P system r Super-peers act as local search hubs  Each super-peer is similar to a Napster server for a small portion of the network  Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time) r Users upload their list of files to a super-peer r Super-peers periodically exchange file lists r You send queries to a super-peer for files of interest

File Distribution: Server-Client vs P2P Question : How much time to distribute file from one server to N peers? usus u2u2 d1d1 d2d2 u1u1 uNuN dNdN Server Network (with abundant bandwidth) File, size F u s : server upload bandwidth u i : peer i upload bandwidth d i : peer i download bandwidth

File distribution time: server-client usus u2u2 d1d1 d2d2 u1u1 uNuN dNdN Server Network (with abundant bandwidth) F r server sequentially sends N copies:  NF/u s time r client i takes F/d i time to download increases linearly in N (for large N) = d cs = max { NF/u s, F/min(d i ) } i Time to distribute F to N clients using client/server approach

File distribution time: P2P usus u2u2 d1d1 d2d2 u1u1 uNuN dNdN Server Network (with abundant bandwidth) F r server must send one copy: F/u s time r client i takes F/d i time to download r NF bits must be downloaded (aggregate)  fastest possible upload rate: u s +  u i d P2P = max { F/u s, F/min(d i ), NF/(u s +  u i ) } i

Server-client vs. P2P: example Client upload rate = u, F/u = 1 hour, u s = 10u, d min ≥ u s

File distribution: BitTorrent tracker: tracks peers participating in torrent torrent: group of peers exchanging chunks of a file obtain list of peers trading chunks peer r P2P file distribution

BitTorrent (1) r file divided into 256KB chunks. r peer joining torrent:  has no chunks, but will accumulate them over time  registers with tracker to get list of peers, connects to subset of peers (“neighbors”) r while downloading, peer uploads chunks to other peers. r peers may come and go r once peer has entire file, it may (selfishly) leave or (altruistically) remain

BitTorrent (2) Pulling Chunks r at any given time, different peers have different subsets of file chunks r periodically, a peer (Alice) asks each neighbor for list of chunks that they have. r Alice sends requests for her missing chunks  rarest first Sending Chunks: tit-for-tat r Alice sends chunks to four neighbors currently sending her chunks at the highest rate  re-evaluate top 4 every 10 secs r every 30 secs: randomly select another peer, starts sending chunks  newly chosen peer may join top 4  “optimistically unchoke”

BitTorrent: Tit-for-tat (1) Alice “optimistically unchokes” Bob (2) Alice becomes one of Bob’s top-four providers; Bob reciprocates (3) Bob becomes one of Alice’s top-four providers With higher upload rate, can find better trading partners & get file faster!

P2P: searching for information File sharing (eg e-mule) r Index dynamically tracks the locations of files that peers share. r Peers need to tell index what they have. r Peers search index to determine where files can be found Instant messaging r Index maps user names to locations. r When user starts IM application, it needs to inform index of its location r Peers search index to determine IP address of user. Index in P2P system: maps information to peer location (location = IP address & port number).

P2P Case study: Skype r inherently P2P: pairs of users communicate. r proprietary application-layer protocol (inferred via reverse engineering) r hierarchical overlay with SNs r Index maps usernames to IP addresses; distributed over SNs Skype clients (SC) Supernode (SN) Skype login server

Anonymity r Napster, Gnutella, Kazaa don’t provide anonymity  Users know who they are downloading from  Others know who sent a query r Freenet  Designed to provide anonymity among other features

Freenet r Keys are mapped to IDs r Each node stores a cache of keys, and a routing table for some keys  routing table defines the overlay network r Queries are routed to the node with the most similar key r Data flows in reverse path of query  Impossible to know if a user is initiating or forwarding a query  Impossible to know if a user is consuming or forwarding data  Keys replicated in (some) of the nodes along the path

Freenet routing

N01 N02 N03 N04 N07 N06 N05 N08 K117 N08 K124 N02 K317 N08 K613 N05 K514 N01 K117 N08 K100 N07 K222 N06 K617 N Inserts are performed similarly – they are unsuccessful queries

Freenet strengths and weaknesses r pros:  complete decentralization  fault tolerance/self-organization  anonymity  scalability (to some degree) r cons:  questionable efficiency & performance  rare keys disappear from the system  improvement of performance requires high overhead (maintenance of additional information for routing)

P2P Review r Two key functions of P2P systems  Sharing content  Finding content r Sharing content  Direct transfer between peers All systems do this  Structured vs. unstructured placement of data  Automatic replication of data r Finding content  Centralized (Napster)  Decentralized (Gnutella)  Probabilistic guarantees (DHTs)

Issues with P2P r Free Riding (Free Loading)  Two types of free riding Downloading but not sharing any data Not sharing any interesting data  On Gnutella 15% of users contribute 94% of content 63% of users never responded to a query –Didn’t have “interesting” data r No ranking: what is a trusted source?  “spoofing”