
Introduction
Gnutella is a widespread unstructured P2P network, currently with between 200,000 and 300,000 hosts. This makes it ideal as a research test bed: the large-scale network demonstrates the need for scalable P2P protocols. A Gnutella client keeps 4-10 TCP connections to other peers. UDP is used for signaling traffic, and an "ultra-peer" state was created to gain some of the benefits of server-based networks.

Introduction (Cont.)
"Ultra-peer" status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes. Many freely available Gnutella clients exist. Some of the most popular are:
- Limewire
- Bearshare
- Morpheus
- Shareaza, which has the fastest-growing number of users, a very pleasant GUI, and also connects to eDonkey and BitTorrent.

Its Main Features
This protocol underlies much of the current file-sharing activity on the Internet. It is based on TCP/IP and HTTP. A file sharing network (fsn) is a set of machines that exchange files using Gnutella. To connect to a Gnutella network, you need the IP address of just one machine that is already part of the network.
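Once the TCP connection to that one known machine is open, the two servents exchange a short text handshake. A minimal sketch, assuming the Gnutella v0.4 wording: the connecting servent sends `GNUTELLA CONNECT/0.4` followed by a blank line, and the accepting servent answers `GNUTELLA OK` followed by a blank line. The helper names are hypothetical. Later protocol versions (0.6) replace this with an HTTP-style header exchange.

```python
# Hypothetical helpers for the Gnutella v0.4 text handshake (sketch only).
OK_REPLY = b"GNUTELLA OK\n\n"


def connect_request(version: str = "0.4") -> bytes:
    """Greeting sent right after the TCP connection to a known servent is opened."""
    return f"GNUTELLA CONNECT/{version}\n\n".encode("ascii")


def accepted(reply: bytes) -> bool:
    """True if the remote servent answered with the v0.4 acceptance line."""
    return reply == OK_REPLY


if __name__ == "__main__":
    print(connect_request())            # b'GNUTELLA CONNECT/0.4\n\n'
    print(accepted(b"GNUTELLA OK\n\n"))  # True
```

In a real client these bytes would be written to and read from a socket opened to the known host. In this sketch, any other reply is treated as a refused connection.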

Gnutella
- Peer-to-peer indexing and searching service.
- Peer-to-peer, point-to-point file downloading using HTTP.
- A Gnutella node needs a server (or a set of servers) to "start up". gnutellahosts.com provides a service with reliable initial connection points, but it introduces a new single point of failure.

Gnutella vs. Napster
Like Napster, Gnutella distributes file storage and transmission. It adds the ability to distribute file discovery as well:
- Ask your direct peers who else they know.
- Query those machines directly.

Concepts of Unstructured Services
Many interesting ideas are being explored:
- Breaking shared files into many parts increases bandwidth (parallel I/O). It also increases the security of content, because no single site can access files without the cooperation of its peers. This type of technology makes censorship very hard.
- MojoNation has a load-balancing and scheduling algorithm in the form of micropayments that reward those who contribute most to the community of peers.
- Gnutella, which is a family of related products, is usually described as a P2P search engine, because its interface is closer to that of a search engine than to a Web file system.

Characteristics
Gnutella is a distributed system for file sharing:
- it provides means for network discovery;
- it provides means for file searching and sharing.
It defines a network at the application level and employs the concept of peer-to-peer:
- all hosts are equal (symmetry);
- there is no central point;
- searches are anonymous, but IP addresses are revealed when downloading.

Connection
Once you establish a connection to the first servent, you announce your presence. The first servent passes that message on to all the servents it is connected to, and so on. These servents all reply with data about themselves:
- how many files they are sharing;
- how many kilobytes those files take up.
This alone already adds up to a lot of traffic.

Gnutella File Sharing Model
- Users register files with their network neighbors.
- Users search across the network to find files to copy.
- No centralized broker is required (unlike Napster).
[Figure: peers Alice, Bob, Carol and Ted. The query "Where is Final Fantasy 4?" is passed between peers until the answer "Carol has Final Fantasy 4" comes back. The file is then copied directly from Carol.]

Decentralized File-sharing Model
- Peers have the same capabilities and responsibilities.
- Communication between peers is symmetric.
- There is no central directory server.
- The index on the metadata of shared files is distributed: each peer stores its own part locally.
Examples: Gnutella, FreeServe, MojoNation.
Resource Discovery

Decentralized (Cont.)
Every user acts as a client, a server, or both (a "servent"). A user connects to the framework and becomes a member of the community, which allows others to connect through him or her. Users speak directly to other users with no intermediary or central authority. No single entity controls the information that passes through the community.

Advantages and Disadvantages
Advantages:
- inherent scalability;
- avoidance of the "single point of litigation" problem;
- fault tolerance.
Disadvantages:
- slow information discovery;
- more query traffic on the network.

Unstructured Decentralized Services
Some 200 Napster clones are available in this area. Currently the most popular is Imesh, which has some 2 million users and can share any type of file. Some of the best-known file sharing systems are MojoNation, Freenet and Gnutella. These three are not server-based like Napster. Instead, they rely on waves of software agents that express resource availability and interest and propagate through informal, dynamic networks of peers.

DFS Variations (DFS: Distributed File Sharing)
| System (creator) | Purpose | Moderated? | Access control? | Search | File transfer | File transfer protocol |
| FTP | Remote file sharing | Yes | Yes | - | Client/server | ftp |
| NFS | Local file sharing | Yes | Yes | - | Client/server | nfs |
| Web | Remote file sharing (portal) | Yes | No | - | Client/server | http, caching |
| Napster (Shawn Fanning) | File-sharing community (portal) | Yes | No | Server-based | P2P | Proprietary |
| Gnutella (Gene, AOL) | Decentralized file-sharing community | No | No | P2P | P2P | http |
| Freenet (Ian Clarke) | Decentralized anonymous file sharing | No | No | P2P | P2P | Proprietary, encrypted, caching |

P2P File Sharing Benefits
- Cost sharing
- Resource aggregation
- Improved scalability/reliability
- Anonymity/privacy
- Dynamism

Management/Placement Challenges
- Per-node state
- Bandwidth usage
- Search time
- Fault tolerance/resiliency

Gnutella in Details
- Shares any type of file (not just music).
- Decentralized search, unlike Napster. You ask your neighbors for files of interest, the neighbors ask their neighbors, and so on.
- A TTL field quenches messages after a number of hops.
- Users with matching files reply to you.

The Gnutella protocol (v0.4)
- PING: notify a peer of your existence.
- PONG: reply to a PING request.
- QUERY: find a file in the network.
- RESPONSE (QUERYHIT): give the location of a file.
- PUSH request: ask a server behind a firewall to push a file out to a client.
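Every one of these messages travels behind the same fixed header. The sketch below packs and unpacks that header with Python's `struct` module. It assumes the layout in the v0.4 specification: a 16-byte descriptor ID, a 1-byte payload type (0x00 PING, 0x01 PONG, 0x40 PUSH, 0x80 QUERY, 0x81 QUERYHIT), a 1-byte TTL, a 1-byte hop count, and a 4-byte little-endian payload length. The function names and the TTL of 7 in the usage lines are illustrative choices.

```python
import os
import struct

# Payload descriptor codes, assuming the Gnutella v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# Descriptor ID (16 bytes), payload descriptor, TTL, hops (1 byte each),
# payload length (4 bytes, little-endian): 23 bytes with no padding.
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, kind, ttl, hops, payload_len):
    return HEADER.pack(descriptor_id, kind, ttl, hops, payload_len)


def unpack_header(data):
    return HEADER.unpack(data[:HEADER.size])


if __name__ == "__main__":
    ping = pack_header(os.urandom(16), PING, ttl=7, hops=0, payload_len=0)
    print(len(ping))                          # 23
    _, kind, ttl, hops, length = unpack_header(ping)
    print(kind == PING, ttl, hops, length)    # True 7 0 0
```

Before forwarding a message, a servent that relays it decrements the TTL and increments the hop count in this header. Once the TTL reaches zero, the message is dropped.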

Joining Gnutella Network
The new node connects to a well-known "anchor" node and then sends a PING message to discover other nodes. Hosts offering connections to the new node reply with PONG messages. The new node then makes direct connections to the newly discovered nodes.
[Figure: a new node joining a Gnutella network by exchanging PING and PONG messages.]
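These joining steps can be simulated on a toy overlay. The sketch makes several assumptions:
- the overlay is a dict mapping each host to the set of hosts it is connected to;
- the PING is forwarded breadth-first until its TTL runs out;
- every host reached answers with a PONG.

The names `discover_by_ping` and `join`, the TTL of 3, and the limit of 4 connections (the low end of the 4-10 mentioned in the introduction) are hypothetical choices.

```python
from collections import deque


def discover_by_ping(overlay, anchor, ttl):
    """Hosts that answer a PING the new node sends via `anchor` (anchor included)."""
    seen = {anchor}
    pongs = [anchor]                       # the anchor itself replies with a PONG
    frontier = deque([(anchor, ttl - 1)])  # the anchor decrements the TTL on receipt
    while frontier:
        host, remaining = frontier.popleft()
        if remaining <= 0:
            continue                       # TTL exhausted: stop forwarding here
        for peer in sorted(overlay[host]):
            if peer not in seen:           # duplicate PINGs are dropped
                seen.add(peer)
                pongs.append(peer)
                frontier.append((peer, remaining - 1))
    return pongs


def join(overlay, new_node, anchor, ttl=3, max_connections=4):
    """Connect `new_node` directly to the first hosts that answered with a PONG."""
    chosen = discover_by_ping(overlay, anchor, ttl)[:max_connections]
    overlay[new_node] = set(chosen)
    for peer in chosen:
        overlay[peer].add(new_node)        # connections are symmetric
    return chosen


if __name__ == "__main__":
    overlay = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
    print(join(overlay, "N", anchor="A"))  # ['A', 'B', 'C', 'D']
```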

Properties of the Flooding
Searching by flooding works as follows. If you don't have the file you want, you query 7 of your partners. If they don't have it, they each contact 7 of their partners, up to a maximum hop count of 10. Requests are flooded, but there is no tree structure. Messages do not loop, but a peer may receive the same packet more than once.
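The cost of this flooding rule grows geometrically with the hop count. The worked bound below uses this slide's numbers: 7 partners per peer and at most 10 hops. It assumes every forwarded copy reaches a peer that has not seen the query yet. Real overlays have cycles and duplicate suppression, so this is only an upper bound. The function name is hypothetical.

```python
def max_query_messages(fanout: int, max_hops: int) -> int:
    """Upper bound on QUERY messages: fanout**1 + fanout**2 + ... + fanout**max_hops.

    Assumes every peer forwards to `fanout` partners and no peer is reached twice.
    """
    return sum(fanout ** hop for hop in range(1, max_hops + 1))


if __name__ == "__main__":
    print(max_query_messages(7, 10))   # 329554456
    print(max_query_messages(7, 4))    # 2800
```

The tenth hop alone accounts for about 86% of those messages (7^10 = 282,475,249). This is why the TTL is the main lever for limiting flood traffic.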

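Query Flooding
Gnutella has no hierarchy:
- a new peer uses a bootstrap node to learn about others (join message);
- a peer sends its query to its neighbors;
- neighbors forward the query to all their attached neighbors (flooding);
- if a queried peer has the object, it sends a message back to the querying peer.
[Figure: join and query messages flooding through the overlay.]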

More on Query Flooding
Pros:
- peers have similar responsibilities: there are no group leaders;
- highly decentralized;
- no peer maintains directory information.
Cons:
- excessive query traffic;
- limited query radius: a query may not find content even when it is present;
- a bootstrap node is still required;
- the overlay network must be maintained.

About the Flooding
Nothing stops a servent from flooding its network region with messages. Flooding has two costs: maintaining the network and searching for files.

Breadth-First Search (BFS)
[Figure: breadth-first search from a source. Legend: source, forward query, processed query, found result, forward response.]

Pros and Cons
Benefits:
- Peers speak directly, with no central authority.
- Nobody owns the Gnutella network and nobody can shut it down.
- No central point of failure.
- Limited per-node state.
- An isolated node failure can be worked around quickly and automatically.
Drawbacks:
- Free loading.
- Scalability.
- Searches are less effective and can be slow.
- Bandwidth intensive.
The Gnutella network is evolving to include "controlled decentralization" (LimeWire, BearShare, ToadNode).

Searching for a File
A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers. Nodes route QUERYHITs, which contain file location details, back along the QUERY path to the sender. To download a file, the node makes a direct connection to the host named in the QUERYHIT message.
[Figure: a QUERY flooding through the Gnutella network and a HIT returning along the query path.]
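The QUERY/QUERYHIT exchange can be simulated in a few lines. The sketch assumes a toy overlay given as a dict of neighbor sets, a dict of each peer's shared file names, and substring matching on file names. The name `flood_search` and these data structures are hypothetical. Each peer remembers which neighbor it first received the query from, and a hit is relayed back hop by hop along those pointers. Real servents key this table by the message's descriptor ID.

```python
from collections import deque


def flood_search(overlay, shared, source, keyword, ttl):
    """Flood a QUERY from `source`; return [(peer_with_hit, path_back_to_source)].

    overlay: dict peer -> set of neighbor peers (toy overlay, hypothetical)
    shared:  dict peer -> list of file names the peer shares
    """
    came_from = {source: None}   # reverse-path pointers for this one query
    hits = []
    queue = deque([(source, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer != source and any(keyword in name for name in shared.get(peer, [])):
            # The QUERYHIT retraces the pointers hop by hop back to the source.
            path, hop = [], peer
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits.append((peer, path))
        if remaining == 0:
            continue             # TTL used up: do not forward any further
        for neighbor in sorted(overlay[peer]):
            if neighbor not in came_from:   # peer already saw this query: copy dropped
                came_from[neighbor] = peer
                queue.append((neighbor, remaining - 1))
    return hits


if __name__ == "__main__":
    overlay = {"Alice": {"Bob", "Ted"}, "Bob": {"Alice", "Carol"},
               "Ted": {"Alice", "Carol"}, "Carol": {"Bob", "Ted"}}
    shared = {"Carol": ["Final Fantasy 4.mp3"]}
    print(flood_search(overlay, shared, "Alice", "Final Fantasy 4", ttl=2))
    # [('Carol', ['Carol', 'Bob', 'Alice'])]
```

With `ttl=1` the query never gets past Bob and Ted, so the same search returns an empty list. This is the limited query radius mentioned under query flooding.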

The Cooperation Spectrum

Free Riding
File sharing networks rely on users sharing data. There are two types of free riding:
- downloading but not sharing any data;
- not sharing any interesting data.
On Gnutella, 15% of users contribute 94% of the content. 63% of users never responded to a query, because they didn't have "interesting" data.
Data from E. Adar and B.A. Huberman (2000), "Free Riding on Gnutella".

Example: GNUTELLA

Summary of Gnutella's Features
Decentralized:
- no single point of failure;
- not as susceptible to denial of service;
- cannot ensure correct results.
Flooding queries:
- search is now distributed, but still not scalable.

Initial Problems and Fixes
- Freeloading: WWW sites offered search/retrieval from the Gnutella network without providing file sharing or query routing. Fix: block file-serving to browser-based, non-file-sharing users.
- Prematurely terminated downloads: caused by software bugs and long download times over modems. Modem users run a Gnutella peer only briefly (a Napster problem too!), or any user may become overloaded. Fix: a peer can reply "I have it, but I am busy. Try again later."

Initial Problems and Fixes (Cont.)
The average size of the reachable network was only … hosts. Why so small? Modem users do not have enough bandwidth to provide search routing capabilities, so they become routing black holes.
Fix: create a peer hierarchy based on capabilities. Previously all peers were identical, and most modem users were black holes.
- Connection preferencing: favor routing to well-connected peers.
- Favor replies to clients that themselves serve a large number of files, to prevent freeloading.
- The Limewire gateway functions as a Napster-like central server on behalf of other peers for searching purposes.

Gnutella Enhancements
PINGs and PONGs can consume up to 50% of the bandwidth. Solutions:
- Pong limiting
- Pong caching
- Ping multiplexing
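Of the three fixes, pong caching is the easiest to illustrate. Instead of re-flooding every PING, a servent answers from PONGs it has seen recently. The sketch assumes a fixed freshness window and a cap on how many cached PONGs are returned. The class name `PongCache`, the 3-second window and the limit of 10 entries are hypothetical values, not taken from any client. Pong limiting and ping multiplexing are not shown.

```python
import time


class PongCache:
    """Hypothetical pong cache: answer PINGs from recent PONGs instead of re-flooding."""

    def __init__(self, max_age=3.0, max_entries=10, clock=time.monotonic):
        self.max_age = max_age          # seconds a cached PONG stays usable
        self.max_entries = max_entries  # most PONGs returned per PING
        self.clock = clock
        self.entries = {}               # host address -> time its PONG was last seen

    def record_pong(self, address):
        self.entries[address] = self.clock()

    def fresh_pongs(self):
        now = self.clock()
        # Discard stale entries, then return the newest ones first.
        self.entries = {a: t for a, t in self.entries.items() if now - t <= self.max_age}
        newest_first = sorted(self.entries, key=self.entries.get, reverse=True)
        return newest_first[: self.max_entries]

    def handle_ping(self, forward_ping):
        """Answer from the cache; call `forward_ping` only when nothing fresh is cached."""
        cached = self.fresh_pongs()
        if cached:
            return cached
        forward_ping()
        return []
```

The injectable `clock` is there so the freshness logic can be exercised without waiting. The addresses used below are made-up examples: `cache = PongCache(); cache.record_pong("10.0.0.1:6346")` followed by `cache.handle_ping(send_ping_to_neighbors)` returns the cached address without forwarding anything.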

Gnutella Enhancements (Cont.)
- Cache query responses.
Results:
- an evolving protocol;
- the Gnutella Developer Forum;
- UltraPeers;
- alternative query routing algorithms.

Can Heterogeneity Make Gnutella Scale?
Ideas:
- Replace query flooding with multiple random walks.
- Proactive replication: the number of replicas of an object is proportional to the square root of its request rate.
Result: a two-orders-of-magnitude improvement in query time, per-node load and message traffic.
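The square-root rule for proactive replication can be written down directly. Given request rates q_i and a total budget of R replicas, object i gets R * sqrt(q_i) / sum_j sqrt(q_j) replicas. The sketch assumes fractional replica counts are acceptable; a real system would round them and keep at least one copy of each object. The function name is hypothetical.

```python
import math


def square_root_allocation(request_rates, total_replicas):
    """Split `total_replicas` across objects in proportion to sqrt(request rate)."""
    weights = [math.sqrt(q) for q in request_rates]
    total = sum(weights)
    return [total_replicas * w / total for w in weights]


if __name__ == "__main__":
    print(square_root_allocation([100, 25, 1], 16))   # [10.0, 5.0, 1.0]
```

Compare this with replicating in direct proportion to request rate, which would give the most popular object about 12.7 of the 16 copies here. The square-root rule shifts replicas toward less popular objects, so random walks for them do not become excessively long.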

Can Heterogeneity Make Gnutella Scale? (Cont.)
Gnutella assumes that all peers are equal. This is not true: P2P peers are heterogeneous (e.g., dial-up users vs. college users). The proposed approach is to:
- evolve the topology to match node capacities;
- use random walks over this topology.
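A random-walk search replaces the flood with a few "walkers". Each walker forwards the query to one randomly chosen neighbor per step. The sketch assumes uniform choice among neighbors, a fixed number of walkers, and a step limit. The name `random_walk_search`, the 4 walkers and the 64-step limit are hypothetical choices. The capacity-aware version described on this slide would bias each step toward high-capacity neighbors instead of choosing uniformly.

```python
import random


def random_walk_search(overlay, shared, source, keyword,
                       walkers=4, max_steps=64, rng=None):
    """k-walker random walk; returns (peer_with_hit, steps) or (None, max_steps).

    Assumes every peer in `overlay` has at least one neighbor.
    """
    rng = rng or random.Random()
    positions = [source] * walkers
    for step in range(1, max_steps + 1):
        for w in range(walkers):
            # Each walker forwards the query to one uniformly chosen neighbor.
            positions[w] = rng.choice(sorted(overlay[positions[w]]))
            if any(keyword in name for name in shared.get(positions[w], [])):
                return positions[w], step
    return None, max_steps
```

Each step costs at most one message per walker. The total traffic is therefore bounded by walkers × max_steps (256 here), regardless of network size, which is the point of replacing the flood.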

Can Heterogeneity Make Gnutella Scale? (Cont.)
Solution outline. Notation: c_i is the capacity of node i; in[j,i] counts the messages from j to i; out[i,j] counts the messages from i to j; d_i is the number of neighbors of i.
- Initialize in[i,j] = out[i,j] = 0 and OutMax[i,j] = c_i / d_i.
- Update the counters according to the messages received and sent.
- Check whether the node is overloaded. If so, redirect the neighbor sending the most input to a neighbor with a high OutMax (spare capacity). Intuitively, the node takes itself out of the loop. If no such neighbor can be found, ask the sending neighbor to throttle back.
Result: the average query length drops from 70 hops to 2-9 hops, depending on the topology.
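The overload check in this outline can be sketched as a single decision function. The sketch assumes each node already knows how much spare capacity its neighbors advertise. It treats "overloaded" as receiving more messages per interval than c_i. The names `out_max` and `rebalance` and the tuple return values are hypothetical.

```python
def out_max(capacity, degree):
    """Initial per-neighbor output budget: OutMax[i, j] = c_i / d_i."""
    return capacity / degree


def rebalance(capacity, inbound, neighbor_spare):
    """Decide what node i does at the end of an interval.

    capacity:       c_i, messages per interval node i can handle
    inbound:        {j: in[j, i]}, messages received from each neighbor j
    neighbor_spare: {k: spare capacity neighbor k advertises} (assumed known)
    """
    if sum(inbound.values()) <= capacity:
        return ("ok",)
    heavy = max(inbound, key=inbound.get)        # neighbor sending the most input
    spare = {k: s for k, s in neighbor_spare.items() if k != heavy and s > 0}
    if spare:
        # Point the heavy sender at the neighbor with the most spare capacity,
        # taking node i out of the loop.
        return ("redirect", heavy, max(spare, key=spare.get))
    return ("throttle", heavy)                   # nobody has room: ask it to slow down
```

For example, `rebalance(10, {"a": 8, "b": 4}, {"b": 2, "c": 5})` returns `("redirect", "a", "c")`: the node receives 12 messages against a capacity of 10, so it asks the heaviest sender `a` to connect to `c` instead.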

Measurement Results
Who is sharing what? (August 2000)
| The top | Files shared | As percent of whole |
| 333 hosts (1%) | 1,142,645 | 37% |
| 1,667 hosts (5%) | 2,182,087 | 70% |
| 3,334 hosts (10%) | 2,692,082 | 87% |
| 5,000 hosts (15%) | 2,928,905 | 94% |
| 6,667 hosts (20%) | 3,037,232 | 98% |
| 8,333 hosts (25%) | 3,082,572 | 99% |

Problems With Gnutella
Protocol scalability:
- The message broadcast technique imposes limitations on the network size: packets per message = $\sum_{i=0}^{TTL} \mathit{noPeers}^{i}$.
- In November 2000, the dial-up bandwidth barrier was reached.
Overlay network efficiency:
- Random selection of peers results in inefficient use of the underlying network.
- Redundant traffic is generated on the Internet.

Heterogeneous Connection Qualities of Gnutella Peers
- 35% have an upstream bottleneck bandwidth of at least 100 Kbps.
- Only 8% have at least 10 Mbps of bandwidth.
- 22% have a bandwidth of 100 Kbps or less.

Number of Shared Files

Why Look at Gnutella
- Widespread unstructured P2P network, currently with between 200,000 and 300,000 hosts. In 2006 it was still heavily used, by about 2 million users.
- Gnutella clients (among others): LimeWire, Morpheus, BearShare, OpenCola, Shareaza. Shareaza has the fastest-growing number of users, a very pleasant GUI, and also connects to eDonkey and BitTorrent.
- Ideal as a research test bed: the large-scale network demonstrates the need for scalable P2P protocols.

Limewire: Improvement on Gnutella
- Creation of a peer hierarchy based on capabilities. Previously all peers were identical, and most modem users were black holes.
- Connection preferencing: favors routing to well-connected peers, and favors replies to clients that themselves serve a large number of files, to prevent freeloading.
- The Limewire gateway functions as a Napster-like central server on behalf of other peers for searching purposes.

Limewire
The Limewire P2P file sharing program connects to the Gnutella P2P network. The Limewire client software is widely recognized for its clean user interface, which contains no adware. Limewire is sometimes billed as the "fastest file sharing program" and claims to offer relatively good search and download performance. Free Limewire downloads are available for Windows, Linux and Macintosh operating systems, and paid Limewire Pro clients also exist.

BearShare
The BearShare P2P file sharing program is a popular free client for the Gnutella P2P network. Both free and paid versions of BearShare are available.

Shareaza
Shareaza is an up-and-coming P2P file sharing program. This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks, including eDonkey, BitTorrent and Gnutella. Shareaza includes intelligence for detecting fake and/or corrupted files, and the free Shareaza download contains no ads or spyware. As the installed base of Shareaza users grows, expect Shareaza to become an even better P2P file sharing program.

Anonymous?
The person you are getting the file from knows who you are. That's not anonymous. Other protocols exist in which the owner of a file doesn't know the requester, so peer-to-peer anonymity does exist.

Summary
Peer-to-peer networking: applications connect to peer applications. Focus: a decentralized method of searching for files. Each application instance serves to:
- store selected files;
- route queries (file searches) from and to its neighboring peers;
- respond to queries (serve the file) if the file is stored locally.
Gnutella history:
- 3/14/00: released by AOL, almost immediately withdrawn;
- too late: 23K users were on Gnutella at 8 am;
- many iterations were needed to fix a poor initial design (the poor design turned many people off).
What we care about:
- How much traffic does one query generate?
- How many hosts can it support at once?
- What is the latency associated with querying?
- Is there a bottleneck?