Peer-to-Peer Networks

Peer-to-Peer Networks
Thanks to:
- Vinod Muthusamy, U. of Toronto
- John Kubiatowicz, UC Berkeley
- Don Towsley, U. Mass at Amherst
- Mema Roussopoulos, Harvard University

What is P2P?
- P2P is a communications model in which each party has the same capabilities and either party can initiate a communication session. (Whatis.com)
- P2P is a class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. (Clay Shirky, openp2p.com)
- A type of network in which each workstation has equivalent capabilities and responsibilities. (Webopedia.com)
- A P2P computer network refers to any network that does not have fixed clients and servers, but a number of peer nodes that function as both clients and servers to other nodes on the network. (Wikipedia.org)

P2P is not new!
- Usenet news groups: first truly decentralized system
- DNS: handles a huge number of clients
- IP routing: vastly decentralized, many equivalent routers

When is an application P2P?
- We will do an analysis based on a decision tree developed by researchers at Harvard, Stanford, Berkeley, and HP Labs: "2 P2P or Not 2 P2P" (IPTPS 2004)
- Exercise for end of class: critique and modify the decision tree

Recent Explosion of New Large Scale Applications
- Applications requiring immense resources
  - CPU: Grid and grid computing
  - Files and information: music sharing, semantic web
  - Bandwidth: video streaming, content distribution
  - Communication: IP telephony, group collaboration
  - Storage: data archives, massive storage
- Thousands to millions of nodes on the edge of the Internet can participate as sources/donors and as receivers/users.

Large scale storage applications: Web indexing (Google)
- Goal: index the entire Web
- Estimate: Google has a 250,000-node cluster!
- Massively distributed
[Figure: Crawler, Indexer, and Client on top of a Distributed File System; operations shown: Store(url, page), Index(page, keywords), Find(keywords)]
* Partial content from http://project-iris.net/talks/dht-toronto-03.ppt

Large scale storage applications: Web archives
- Goal: make and archive a daily checkpoint of the Web
- Estimates:
  - Web is about 57 Tbyte (compressed HTML + images)
  - New data per day: 580 Gbyte
  - 128 Tbyte per year with 5 replicas
- Design: 12,810 nodes with a 100 Gbyte disk each
[Figure: Crawler and Client on top of a Distributed File System; operations shown: Store(url, page, date), Get(url, date)]
* Partial content from http://project-iris.net/talks/dht-toronto-03.ppt

Large scale storage applications: File storage
OceanStore (UC Berkeley)
- Untrusted infrastructure:
  - The OceanStore is composed of untrusted components
  - Individual hardware has a finite lifetime
  - All data is encrypted within the infrastructure
- Responsible party:
  - Some organization (i.e. service provider) guarantees that your data is consistent and durable
  - Not trusted with the content of data, merely its integrity
- Mostly well-connected:
  - Data producers and consumers are connected to a high-bandwidth network most of the time
  - Exploit multicast for quicker consistency when possible
- Promiscuous caching:
  - Data may be cached anywhere, anytime

Utility-based Infrastructure
- Data service provided by a storage federation
- Cross-administrative domain
- Pay for service
[Figure labels: Pac Bell, Sprint, IBM, AT&T, Canadian OceanStore]

Large scale scientific applications SETI@Home and other projects at 500,000 BOINC volunteers Featured volunteer: Work done per day 6,514 (65 GigaFLOPS)

GRIDs and desktop grids
- GRID computing connects supercomputing labs (large parallel machines and databases), primarily for scientific computing
- Desktop grids use PCs for cycle sharing (dedicated or on the edge of the Internet)
- CCOF: Cluster Computing on the Fly

WaveGrid, Riding the wave of idle cycles 

Peer-to-peer for large systems
- Limitations of client/server architecture
- Benefits of P2P
- History of P2P systems
- P2P architectures
- P2P issues

Client/server architecture
- Well known, powerful, reliable server is a data source
- Clients request data from the server
- Very successful model: WWW (HTTP), FTP, Web services, etc.
[Figure: clients connected to a server through the Internet]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt

Client/server limitations
- Scalability is expensive
- Presents a single point of failure
- Requires administration
- Unused resources at the network edge
P2P systems try to address these limitations.

P2P vocabulary
- P2P application
- P2P architecture
- P2P computing
- P2P network
- Peer-based vs. P2P
Terms are used interchangeably, sometimes sloppily, but there are subtle differences in meaning.

P2P computing
- P2P computing is the sharing of computer resources and services by direct exchange between systems.
- These resources and services include the exchange of information, processing cycles, cache storage, and disk storage for files.
- P2P computing takes advantage of existing computing power, computer storage and networking connectivity, allowing users to leverage their collective power to the ‘benefit’ of all.
* From http://www-sop.inria.fr/mistral/personnel/Robin.Groenevelt/Publications/Peer-to-Peer_Introduction_Feb.ppt

P2P architecture
- All nodes are both clients and servers
  - Provide and consume data
  - Any node can initiate a connection
- No centralized data source
- “The ultimate form of democracy on the Internet”
- “The ultimate threat to copyright protection on the Internet”
[Figure: nodes connected to each other through the Internet]
* Content from http://project-iris.net/talks/dht-toronto-03.ppt

P2P benefits
- Efficient use of resources
  - Unused bandwidth, storage, processing power at the edge of the network
- Scalability
  - Consumers of resources also donate resources
  - Aggregate resources grow naturally with utilization
  - Organic scaling
  - Infrastructure-less scaling
- Reliability (in aggregate)
  - Replicas
  - Geographic distribution
  - No single point of failure
- Ease of administration
  - Nodes self-organize
  - No need to deploy servers to satisfy demand (cf. scalability)
  - Built-in fault tolerance, replication, and load balancing

P2P Challenges
- Efficient and fair use of resources
  - How to allocate? How to deliver?
  - How to prevent selfish behavior? How to provide incentives?
- Scalability
  - How to locate resources in such a large system?
  - How to avoid overuse of the network?
  - How to deal with heterogeneity?
- Reliability and trustworthiness in open systems (fault tolerance and security)
  - How to prevent or recover from malicious behavior?
  - Do we need or want authentication?
  - How to deal with churn?
  - Do we want to guarantee anonymity?
- Ease of administration and security
  - What about commercial P2P?
  - How to deal with policy issues?

P2P uses Overlay Networks
[Figure: four peers forming an overlay on top of the IP network; panel labels: "Traditional Communication" (over the IP network) and "Tunneling Communication" (over the overlay)]

P2P Architectures (Fig. 5.1)
- Overlay network: Unstructured, Structured
- Architecture model: Centralized, Pure P2P, Hybrid (hierarchical), DHT

P2P Architectures: Tradeoffs
- Centralized P2P architecture: can suffer from a bottleneck at the central server; simpler design
- Pure P2P: sometimes the content cannot be found; can potentially generate a lot of network traffic
- Hybrid P2P: how to select and locate the supernodes?
- DHT: fast routing and content discovery; more complex infrastructure and higher maintenance cost

Unstructured P2P Systems: File Sharing
- Napster, Gnutella, Kazaa, Freenet
- Large scale sharing of files
  - User A makes files (music, video, etc.) on their computer available to others
  - User B connects to the network, searches for files and downloads files directly from user A
- Issues of copyright infringement

P2P File Sharing Traffic
- Globally, P2P traffic now represents 55%-80% of Internet traffic [CacheLogic]
* Figure from http://www.cachelogic.com/research/slide12.php

Napster
- A way to share music files with others
- Users upload their list of files to the Napster server
- You send queries to the Napster server for files of interest
  - Keyword search (artist, song, album, bitrate, etc.)
- The Napster server replies with the IP addresses of users with matching files
- You connect directly to user A to download the file
* Figure from http://computer.howstuffworks.com/file-sharing.htm
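The lookup steps on this slide can be sketched as a small in-memory keyword index. This is only an illustration, not Napster's actual protocol or data model: the class name `CentralIndex`, its methods, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical stand-in for the central server: keyword -> sharing peers."""

    def __init__(self):
        # keyword -> set of (peer_address, filename) pairs
        self._index = defaultdict(set)

    def register(self, peer_address, filenames):
        """A peer uploads its file list when it joins."""
        for name in filenames:
            for word in name.lower().replace(".", " ").split():
                self._index[word].add((peer_address, name))

    def unregister(self, peer_address):
        """Drop every entry of a peer that left the network."""
        for entries in self._index.values():
            entries.difference_update({e for e in entries if e[0] == peer_address})

    def search(self, query):
        """Return (peer_address, filename) pairs that match every query word."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(set(self._index.get(w, ())) for w in words))
        return sorted(matches)
```

The server only answers "who has it"; the download itself would then go directly between the two peers. Keeping the whole index in one place is what makes the answers complete, and also what makes the server a bottleneck and a single point of failure.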

Napster
- Central Napster server
  - Can ensure correct results
  - Bottleneck for scalability
  - Single point of failure
  - Susceptible to denial of service
    - Malicious users
    - Lawsuits, legislation
- Search is centralized
- File transfer is direct (peer-to-peer)

Gnutella
- Share any type of file (not just music)
- Decentralized search, unlike Napster
  - You ask your neighbours for files of interest
  - Neighbours ask their neighbours, and so on
  - TTL field quenches messages after a number of hops
- Users with matching files reply to you
* Figure from http://computer.howstuffworks.com/file-sharing.htm
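A minimal sketch of TTL-limited flooding, assuming the overlay is given as a plain adjacency dictionary. The function name `flood_query` and its parameters are hypothetical, and real Gnutella messages carry more state (message IDs, reply routing) than is modelled here.

```python
from collections import deque


def flood_query(neighbours, files, origin, keyword, ttl):
    """Flood a query breadth-first; return (peers_with_hits, messages_sent).

    neighbours: dict mapping peer -> list of neighbouring peers (assumed input)
    files:      dict mapping peer -> collection of keywords it can serve
    ttl:        number of hops the query may travel from the origin
    """
    seen = {origin}                      # peers that already handled this query
    frontier = deque([(origin, ttl)])
    hits, messages = [], 0
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != origin and keyword in files.get(peer, ()):
            hits.append(peer)            # this peer would reply to the origin
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for nxt in neighbours.get(peer, ()):
            messages += 1                # every forwarded copy costs one message
            if nxt not in seen:          # duplicate copies are dropped on arrival
                seen.add(nxt)
                frontier.append((nxt, remaining - 1))
    return hits, messages
```

Raising `ttl` reaches more peers but the message count grows with every extra hop, which is the traffic cost of flooding; a small `ttl` keeps traffic down but can miss content that exists further away.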

Gnutella
- Decentralized
  - No single point of failure
  - Not as susceptible to denial of service
  - Cannot ensure correct results
- Flooding queries
  - Search is now distributed but still not scalable
  - Good at finding popular content
  - Bad at finding rare content
- Two-level hierarchy reduces traffic
  - Ultrapeers
  - Leaf peers

Freenet
- Data flows in the reverse path of the query
  - Impossible to know if a user is initiating or forwarding a query
  - Impossible to know if a user is consuming or forwarding data
- “Smart” queries
  - Requests get routed to the correct peer by incremental discovery
* Figure from “Protecting Freedom of Information Online with Freenet”, Ian Clarke and Scott Miller. IEEE Internet Computing, Jan/Feb 2002

Comparison of file sharing networks
- Napster (centralized)
  - Bottleneck (scalability, failure, denial of service)
  - Correct search results (centralized search)
- Gnutella (distributed)
  - No bottleneck
  - No guarantee on search results
- Freenet
  - Anonymity
  - Less efficient data transfer

Anonymity
- Napster, Gnutella, Kazaa don’t provide anonymity
  - Users know who they are downloading from
  - Others know who sent a query
- Freenet
  - Designed to provide anonymity, among other features

Unstructured P2P Systems: File download (bandwidth sharing)
- BitTorrent (35% of Internet traffic), Limewire, eDonkey
- A large file is divided into fixed-size blocks.
- Peers download missing blocks from other peers while uploading blocks they have to requesting peers (tit-for-tat).
- Blocks arrive out of order, so the peer must reassemble the file.
(More details later this term)
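The block handling described on this slide can be sketched as follows. The names `split_into_blocks`, `block_digests`, and `Download` are hypothetical, and the per-block SHA-1 check is an addition for the example (the slide itself does not discuss verification); this is a sketch of the idea, not of the BitTorrent wire protocol.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int):
    """Cut data into fixed-size blocks; only the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_digests(blocks):
    """Digest per block, published ahead of time so receivers can verify blocks."""
    return [hashlib.sha1(b).hexdigest() for b in blocks]


class Download:
    """Tracks which blocks have arrived and reassembles them in index order."""

    def __init__(self, digests):
        self.digests = digests
        self.blocks = [None] * len(digests)

    def missing(self):
        """Indices still to be requested from other peers."""
        return [i for i, b in enumerate(self.blocks) if b is None]

    def receive(self, index, block):
        """Store a block that arrived in any order; reject it if the digest differs."""
        if hashlib.sha1(block).hexdigest() != self.digests[index]:
            return False
        self.blocks[index] = block
        return True

    def assemble(self):
        """Join the blocks by index once every one of them has arrived."""
        if self.missing():
            raise ValueError("download incomplete")
        return b"".join(self.blocks)
```

Because each block is stored by its index, arrival order does not matter, and a peer can already serve the blocks it holds while `missing()` is still non-empty.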

BitTorrent
- “Offline” search
  - No search built into the protocol
- Bartered “Tit for Tat” download bandwidth
  - Download one (random) chunk from a storage peer, slowly
  - Subsequent chunks bartered with concurrent downloaders
    - As tracked by the tracker for the file
  - The more chunks you can upload, the more you can download
  - Download speed starts slow, then goes fast
- Great for large files
* Content from Hellerstein’s VLDB2004 P2P Tutorial

Structured P2P
- Second generation P2P overlay networks
- Self-organizing
- Load balanced
- Fault-tolerant
- Scalable guarantees on the number of hops to answer a query
  - Major difference from unstructured P2P systems
- Based on a distributed hash table interface

Distributed hash tables (DHT)
- Distributed version of a hash table data structure
- Store and retrieve (key, value) pairs
  - The key is like a filename
  - The value can be file contents
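A toy version of the put/get interface, assuming all nodes live in one process and every caller knows the full node list. `ToyDHT` and its modulo placement rule are illustrative assumptions only; real DHTs route requests across the network and use placement rules that cope with nodes joining and leaving.

```python
import hashlib


class ToyDHT:
    """Hash table whose (key, value) pairs are spread over several named nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}   # node -> local store
        self._names = sorted(node_names)

    def _owner(self, key):
        # Hash the key and map the hash onto one node (simplistic modulo rule).
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self._names[int.from_bytes(digest, "big") % len(self._names)]

    def put(self, key, value):
        """Store the pair on whichever node the key hashes to."""
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        """Ask the responsible node; returns None when the key is unknown."""
        return self.nodes[self._owner(key)].get(key)
```

Callers see an ordinary hash table (`put` and `get`), while each pair is stored on exactly one node chosen by the hash of its key. The modulo rule is the weak point of this sketch: changing the number of nodes would move most keys.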

DHT applications
- Many services can be built on top of a DHT interface
  - File sharing
  - Archival storage
  - Databases
  - Naming, service discovery
  - Chat service
  - Rendezvous-based communication
  - Publish/Subscribe

DHT desirable properties
- Keys mapped evenly to all nodes in the network
- Each node maintains information about only a few other nodes
- Messages can be routed to a node efficiently
- Node arrivals/departures only affect a few nodes
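The last property, that an arrival affects only a few nodes, can be illustrated with consistent hashing on a ring, the placement idea used by Chord-style designs. This is a sketch under assumptions: `HashRing` and `ring_position` are hypothetical names, the ring is held in one process, and routing between nodes is not modelled.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Place a node name or a key on the ring by hashing it."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class HashRing:
    """Each key belongs to the first node at or after its position on the ring."""

    def __init__(self, nodes=()):
        self._points = []    # sorted ring positions of the nodes
        self._owner = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """A node joins: it takes over only the keys just before its position."""
        point = ring_position(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def lookup(self, key):
        """Successor rule, wrapping around at the end of the ring."""
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]
```

When a node is added, the only keys whose owner changes are those that now fall to the newcomer; every other key stays where it was. With a single position per node the key ranges can be quite uneven, so the "mapped evenly" property usually needs extra measures that this sketch leaves out.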

DHT routing protocols
- DHT is a generic interface
- There are several implementations of this interface
  - Chord [MIT]
  - Pastry [Microsoft Research UK, Rice University]
  - Tapestry [UC Berkeley]
  - Content Addressable Network (CAN) [UC Berkeley]
  - SkipNet [Microsoft Research US, Univ. of Washington]
  - Kademlia [New York University]
  - Viceroy [Israel, UC Berkeley]
  - P-Grid [EPFL Switzerland]
  - Freenet [Ian Clarke]
- Freenet is more concerned with privacy/security, not as much with delivery guarantees.
- Others: Farsite, SALAD (Douceur) are about file systems

P2P Challenges: Resource Discovery in Unstructured Networks
- How to find desired resources in a large scale, open and dynamic P2P network?
- Think of it as a graph traversal problem
  - Flooding
  - Random walk
  - Expanding ring
  - Advertisement-based
  - Rendezvous-point
  - History-based
  - Many more
(Resource discovery in structured P2P networks is very different and will be covered later)
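As one example from this list, a random walk can be sketched over an adjacency dictionary. The function `random_walk_search` and its parameters are hypothetical; real systems typically run several walkers in parallel and check back with the requester, which is not modelled here.

```python
import random


def random_walk_search(neighbours, files, origin, keyword, max_steps, rng=None):
    """Forward the query to one randomly chosen neighbour per step.

    Returns (peer, steps_taken) for the first peer holding the keyword,
    or None if the walk dead-ends or runs out of steps.
    """
    rng = rng or random.Random()
    current = origin
    for step in range(1, max_steps + 1):
        options = neighbours.get(current, [])
        if not options:
            return None                  # dead end: nowhere to forward
        current = rng.choice(options)    # exactly one message per step
        if keyword in files.get(current, ()):
            return current, step
    return None
```

The walk costs at most `max_steps` messages regardless of how many neighbours each peer has, which is the appeal compared with flooding; the price is that the result depends on luck, so rare content may be missed or take many steps to find.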

P2P Challenges: Incentives & Fairness
- How to provide incentives for peers to participate
  - Goodness of their hearts (seems to work)
  - Fame, competitive spirit
  - Credit schemes
  - Game theory
- How to ensure fairness
  - Prevent freeriders
  - Accounting mechanisms
  - Game theory
  - !! Don’t even try to??

P2P Challenges: Security Again
- Malicious behavior
  - Failure to forward messages/files
  - Faked computational results in cycle sharing
  - Corrupted files/data/code
  - Deliberate delay of messages to gain an advantage (e.g. games)
  - Inconsistent information relayed to different peers
  - Faking work to get credit
  - Denial of service attacks
  - Collusion among several peers to do harm
  - Sybil attack (forging multiple identities from one peer to gain advantage)

P2P Challenges: Routing
- Peers that are close in the overlay network can be far apart in the physical network.
[Figure: overlay nodes N20, N40, N41, N80 placed on the physical network]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt

For class discussion:
1. Is a sensor network a P2P network?
2. If stores gave away free CDs and DVDs 24-7, what would happen to P2P computing and traffic?
3. Do this in pairs: Redesign the "2 P2P or Not 2 P2P" decision tree to (a) include more issues such as those covered in today’s lecture, and (b) rearrange the decisions in a more natural order. Optional: make it more suitable for a specific application such as Gnutella.