Introduction
Gnutella is a widespread unstructured P2P network.
A Gnutella client keeps 4-10 TCP connections open to other peers.
UDP is used for signalling traffic.
To gain some of the benefits of server-based networks, an “ultra-peer” state was created.
“Ultra-peer” status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes.
Many freely available Gnutella clients exist. Some of the most popular are:
  Limewire
  Bearshare
  Morpheus
  Shareaza: has the fastest-growing number of users and a very pleasant GUI, and also connects to eDonkey and BitTorrent
Main Features
This protocol underlies much of the current file-sharing activity on the Internet.
It is based on TCP/IP and HTTP.
A file-sharing network is a set of machines that exchange files using Gnutella.
To connect to a Gnutella network, you need the IP address of just one machine that is already part of the network.
Gnutella
Peer-to-peer indexing and searching service.
Peer-to-peer, point-to-point file downloading using HTTP.
A Gnutella node needs a server (or a set of servers) to “start up”.
  gnutellahosts.com provides a service with reliable initial connection points.
  But this introduces a new single point of failure!
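The HTTP download step can be sketched as follows. This is a minimal illustration, not a definitive client: the request layout (`/get/<index>/<name>/`, HTTP/1.0, the `Range` and `User-Agent` headers) follows the early Gnutella 0.4 protocol description as commonly documented, and should be treated as an assumption here. The function name `build_download_request` is hypothetical.

```python
def build_download_request(file_index: int, file_name: str, start: int = 0) -> bytes:
    """Build the HTTP request a peer sends directly to the host holding a file.

    file_index and file_name are the values a search reply advertised
    (assumed layout; check the protocol version your peers speak).
    """
    lines = [
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={start}-",   # start offset allows resuming a transfer
        "User-Agent: Gnutella",
        "",                          # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_download_request(42, "song.mp3").decode("ascii"))
```

The request goes straight to the peer that holds the file, not through the overlay, which is why the downloader's IP address becomes visible at this point.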
Gnutella vs. Napster
Like Napster, Gnutella distributes file storage and transmission.
It adds the ability to distribute file discovery:
  Ask your direct peers who else they know.
  Query those machines directly.
Characteristics
Gnutella is a distributed system for file sharing
  provides means for network discovery
  provides means for file searching and sharing
Defines a network at the application level
Employs the concept of peer-to-peer
  all hosts are equal (symmetry)
  there is no central point
  search is anonymous, but IP addresses are revealed when downloading
Advantages and Disadvantages: Resource Discovery
Advantages:
  Inherent scalability
  Avoidance of the “single point of litigation” problem
  Fault tolerance
Disadvantages:
  Slow information discovery
  More query traffic on the network
Gnutella in Detail
Share any type of file (not just music)
Decentralized search, unlike Napster
  You ask your neighbors for files of interest
  Neighbors ask their neighbors, and so on
  A TTL field quenches messages after a number of hops
  Users with matching files reply to you
Figure from http://computer.howstuffworks.com/file-sharing.htm
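How the TTL field quenches a message can be sketched as follows. The 23-byte header layout used here (16-byte message ID, 1-byte type, 1-byte TTL, 1-byte hop count, 4-byte little-endian payload length) and the type code 0x80 for a query follow the Gnutella 0.4 description as commonly documented; treat them as assumptions. The helper names `pack_header` and `forward` are hypothetical.

```python
import struct
import uuid

# Assumed header layout: id (16 bytes), type, TTL, hops, payload length.
HEADER = struct.Struct("<16sBBBI")
QUERY = 0x80  # assumed type code for a search request


def pack_header(message_id: bytes, msg_type: int, ttl: int, hops: int, length: int) -> bytes:
    """Serialise one message header."""
    return HEADER.pack(message_id, msg_type, ttl, hops, length)


def forward(header: bytes):
    """Return the header to relay to neighbours, or None if the TTL has run out."""
    message_id, msg_type, ttl, hops, length = HEADER.unpack(header)
    if ttl <= 1:
        return None  # decrementing would reach 0, so the message is dropped here
    return HEADER.pack(message_id, msg_type, ttl - 1, hops + 1, length)


if __name__ == "__main__":
    header = pack_header(uuid.uuid4().bytes, QUERY, ttl=3, hops=0, length=0)
    while header is not None:
        _, _, ttl, hops, _ = HEADER.unpack(header)
        print(f"relaying with ttl={ttl} hops={hops}")
        header = forward(header)
```

Each relay decrements the TTL and increments the hop count, so a query can travel at most as many hops as its initial TTL.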
Joining the Gnutella Network
The new node connects to a well-known “anchor” node.
It then sends a PING message to discover other nodes.
Hosts offering new connections reply to the new node with PONG messages.
Direct connections are then made to the newly discovered nodes.
[Figure: the new node sends a PING to anchor node A; PINGs propagate through the network and PONGs are returned.]
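The join procedure above can be sketched as a small simulation. This is an illustration under stated assumptions, not the protocol itself: the overlay is a hypothetical adjacency map, every host that receives the PING is assumed to answer with a PONG, and `discover_peers` is an invented helper name.

```python
from collections import deque


def discover_peers(overlay: dict, anchor: str, ttl: int) -> set:
    """Return the hosts that would answer a PING sent to `anchor` with this TTL."""
    answered = {anchor}                  # the anchor itself replies first
    frontier = deque([(anchor, ttl)])    # (host, TTL the PING arrived with)
    while frontier:
        host, remaining = frontier.popleft()
        if remaining <= 1:
            continue                     # TTL exhausted: host replies but does not relay
        for neighbour in overlay[host]:
            if neighbour not in answered:
                answered.add(neighbour)  # neighbour sends a PONG back
                frontier.append((neighbour, remaining - 1))
    return answered


if __name__ == "__main__":
    # Hypothetical overlay: A is the well-known anchor node.
    overlay = {
        "A": ["B", "C"],
        "B": ["A", "D"],
        "C": ["A"],
        "D": ["B", "E"],
        "E": ["D"],
    }
    for ttl in (1, 2, 3, 4):
        print(ttl, sorted(discover_peers(overlay, "A", ttl)))
```

The new node would then open direct connections to some of the returned hosts, after which it no longer depends on the anchor.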
Query flooding
Gnutella has no hierarchy.
A new peer uses a bootstrap node to learn about others and sends a join message.
To search, a peer sends a query to its neighbors.
Neighbors forward the query to all of their attached neighbors (flooding).
If a queried peer has the object, it sends a message back to the querying peer.
[Figure: join and query messages]
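A minimal sketch of query flooding, counting how many messages one search costs. The overlay, the file lists, and the function name `flood_query` are hypothetical. The sketch assumes peers drop queries they have already seen and do not send a query back to the neighbor it came from.

```python
def flood_query(overlay: dict, files: dict, origin: str, keyword: str, ttl: int):
    """Flood a query from `origin`; return (peers with a match, messages sent)."""
    seen = {origin}
    hits = []
    messages = 0
    frontier = [(origin, None)]          # (peer, neighbour the query came from)
    while frontier and ttl > 0:
        next_frontier = []
        for peer, came_from in frontier:
            for neighbour in overlay[peer]:
                if neighbour == came_from:
                    continue             # never echo the query back to its sender
                messages += 1            # every forwarded copy costs one message
                if neighbour in seen:
                    continue             # duplicate copy: received, then dropped
                seen.add(neighbour)
                if keyword in files.get(neighbour, ()):
                    hits.append(neighbour)   # this peer would send a reply back
                next_frontier.append((neighbour, peer))
        frontier = next_frontier
        ttl -= 1                         # one hop of the TTL budget consumed
    return hits, messages


if __name__ == "__main__":
    overlay = {
        "A": ["B", "C"],
        "B": ["A", "C", "D"],
        "C": ["A", "B"],
        "D": ["B", "E"],
        "E": ["D"],
    }
    files = {"D": {"song"}, "E": {"song"}}
    print(flood_query(overlay, files, "A", "song", ttl=2))
    print(flood_query(overlay, files, "A", "song", ttl=3))
```

Even in this five-node overlay, some of the messages are duplicates that are received and thrown away, which hints at why flooding is expensive.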
About the Flooding - DoS
Nothing stops a servent from flooding its network region with messages.
Cost of maintaining the network
Cost of searching for a file
The Cooperation Spectrum
Free Riding
File-sharing networks rely on users sharing data
Two types of free riding
  Downloading but not sharing any data
  Not sharing any interesting data
On Gnutella
  15% of users contribute 94% of content
  63% of users never responded to a query
    Didn’t have “interesting” data
Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”
Number of Shared Files
Summary
Peer-to-peer networking: applications connect to peer applications
Focus: decentralized method of searching for files
Each application instance serves to:
  store selected files
  route queries (file searches) from and to its neighboring peers
  respond to queries (serve file) if the file is stored locally
Gnutella history:
  3/14/00: released by AOL, almost immediately withdrawn
  too late: 23K users on Gnutella as of 8 am this morning
  many iterations to fix poor initial design (poor design turned many people off)
What we care about:
  How much traffic does one query generate?
  How many hosts can it support at once?
  What is the latency associated with querying?
  Is there a bottleneck?
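A rough way to approach the first question: if every peer has the same number of connections and the overlay had no cycles, each hop would multiply the number of query copies by (connections - 1). The sketch below computes that upper bound. The tree-like overlay is a simplifying assumption, the TTL of 7 is an assumed typical value, and `max_query_messages` is a hypothetical helper name.

```python
def max_query_messages(degree: int, ttl: int) -> int:
    """Upper bound on messages one query generates in a cycle-free overlay.

    The origin sends `degree` copies; each receiver relays to its
    other `degree - 1` neighbours until the TTL is used up.
    """
    total = 0
    copies = degree
    for _ in range(ttl):
        total += copies
        copies *= degree - 1
    return total


if __name__ == "__main__":
    # 4 connections is the low end of the 4-10 range quoted earlier;
    # the TTL of 7 is an assumption.
    print(max_query_messages(degree=4, ttl=7))  # 4372
```

Real overlays contain cycles, so many of these copies are duplicates, but the exponential growth with TTL is what drives the traffic and bottleneck questions above.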
Screenshots – Gnutella Logging ….
Screenshots – Gnutella Searching & Downloading ….
Image of the Gnutella network