Peer to Peer Networking
Network Models => Mainframe
  Ex: Terminal
  User needs direct connection to mainframe
  Secure
  Account-driven, administrator-controlled
  Batch-process oriented
  Data storage on the server only
Network Models => Client/Server
  Ex: WWW
  User interface, business rules
  Backend database
  Data storage on the server and the client
  N-tier architecture
  Hierarchical
Network Models => Distributed Architecture
  Ex: BitTorrent
  Tasks may be parallel or autonomous
  Computation is done at the edges
  Geographically distributed through a single interface
  Data storage is distributed
Generations of P2P
  1st Generation: Centralized file list
    Napster
    Whoever controls the central file list is legally responsible
  2nd Generation: Decentralized file lists
    Gnutella, FastTrack
    Improvements: optimizations of decentralized search
  3rd Generation: No file lists
    Freenet, WASTE, Entropy, MUTE
    Anonymity built in
The Good, Bad, and Ugly of P2P
  The Good
    Security based on social contract
    Free exchange of ideas
    Everyone’s computer can contribute to the greater good
  The Bad
    Avoids most security: can be used for piracy
  The Ugly
    Often targeted by the RIAA and others for piracy
Peer-to-Peer Concepts
  Bootstrapping
    Finding peers to connect to
  Peer Discovery
    Finding other peers in the system
  Content Location
    Finding a peer with the desired content
  Content Delivery
    Downloading from selected peer or peers
Napster
  Bootstrapping & Peer Discovery
    Centralized server
  Content Location
    Tell the server your IP address & filenames
    Send query to the server; it returns a list of peers
  Content Delivery
    Download from a single peer
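A minimal sketch of the centralized content-location step described above, assuming an in-memory index. The class name CentralIndex, its methods, and the addresses are hypothetical and do not reflect Napster's actual protocol or server software.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a Napster-style central server (hypothetical API)."""

    def __init__(self):
        # filename -> set of (ip, port) peers that announced it
        self._files = defaultdict(set)

    def register(self, peer, filenames):
        """A peer tells the server its address and the files it shares."""
        for name in filenames:
            self._files[name.lower()].add(peer)

    def unregister(self, peer):
        """Drop a peer from every file entry, e.g. when it disconnects."""
        for peers in self._files.values():
            peers.discard(peer)

    def query(self, keyword):
        """Return {filename: sorted peer list} for names containing keyword."""
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._files.items()
            if keyword in name and peers
        }


# Illustrative peers and filenames (made up for this example).
index = CentralIndex()
index.register(("10.0.0.1", 6699), ["song_a.mp3", "song_b.mp3"])
index.register(("10.0.0.2", 6699), ["song_b.mp3"])
hits = index.query("song_b")
```

The server only answers the query; the download itself then happens directly between the requesting peer and one peer from the returned list.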
Napster in court
  Napster claimed it was not infringing copyright because it was not storing any songs
  Shut down by court injunction because the case against it was likely to succeed
  Napster users likely guilty of direct copyright infringement: copying of a work by another
  Napster likely guilty of contributory infringement because it learned of the infringement and failed to purge the materials from its system
  Napster likely guilty of vicarious infringement because it supervised or controlled the party engaging in the infringing activity and had a financial interest in the activities
Gnutella
  Peer-to-peer networking: applications connect to peer applications
  Focus: decentralized method of searching for files
  Each application instance serves to:
    store selected files
    route queries (file searches) from and to its neighboring peers
    respond to queries (serve file) if file stored locally
  Gnutella history:
    3/14/00: release by AOL, almost immediately withdrawn
    Too late: 10K users managed to download
Gnutella
  Bootstrapping
    First time: connect to a peer that you heard about outside of Gnutella
    Keep a cache of peers discovered for later use
  Peer Discovery
    Try to always be connected to a fixed number of peers
    Send ping message, which is flooded to neighbors
    Respond to ping with pong
      Contains IP address, port, # files, # KB
Gnutella: Content Location
  Searching by flooding:
    If you don’t have the file you want, query 7 of your partners.
    If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10.
    Requests are flooded, but there is no tree structure.
    No looping, but packets may be received twice.
  No prioritization mechanism
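The flooding search can be sketched as a hop-limited breadth-first traversal. This is a simplified model, not the Gnutella wire protocol: the function name flood_query, the dictionary-based graph, and the five-peer topology are assumptions made for illustration. The "seen" set plays the role of duplicate detection, which is why there is no looping even though a peer may receive the same query twice.

```python
from collections import deque


def flood_query(graph, files, origin, wanted, ttl):
    """Flood a query through the overlay, limited by a hop count (TTL).

    graph:  dict mapping peer -> list of neighbouring peers
    files:  dict mapping peer -> set of filenames stored locally
    Returns (peers that have the file, number of query messages sent).
    """
    seen = {origin}          # peers that have already handled this query
    hits = []
    messages = 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue         # hop limit reached: do not forward further
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue     # never send the query back where it came from
            messages += 1    # a duplicate still costs a message
            if neighbour in seen:
                continue     # received twice: dropped, not forwarded again
            seen.add(neighbour)
            if wanted in files.get(neighbour, set()):
                hits.append(neighbour)
            queue.append((neighbour, peer, hops_left - 1))
    return hits, messages


# Hypothetical overlay: A-B, A-C, B-D, C-D, D-E; only E stores the file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"E": {"x.mp3"}}
found, sent = flood_query(graph, files, "A", "x.mp3", ttl=3)
```

In this toy overlay peer D receives the query from both B and C, which illustrates the "received twice" cost of flooding without a tree structure.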
Gnutella
  Content Delivery
    Direct download from peer
    If the peer is behind a firewall, ask it to connect to you
    If you are both behind a firewall: too bad
  Problems
    No explicit rate limiting on ping or query frequency, which can overload the network
    Slow peers can hinder faster peers
Free Riding
  We want to move from the client-server architecture: [figure]
  Towards a robust, decentralized P2P architecture: [figure]
  But due to free riding, we end up with: [figure]
Free Riding
  Characteristics
    Exhibits a Pareto distribution of sharers (many people have small hard disks, small bandwidth and small hearts; few have large)
    Hurts overall resiliency and network throughput
    The move away from the traditional star(s) topology is less than one would wish
    Equilibrium is far from the global optimum
Free riding statistics on Gnutella
  66% of hosts share no files
  73% of hosts share ten or fewer files
  Top 1% share 40% of the files in the network and answer 50% of the queries
  Top 20% share 98% of the files
  61% never answered a query (no one wants their files)
Gnutella
  Group Leaders
    Ultrapeers
    Low bandwidth peers connect to group leader
    Queries through group leaders
    Cached hash tables
    Hits include estimate of upload speed
  Protocol extensions
    Parallel download
    Persistent, location independent filenames (URNs)
    LAN multicast
Along came BitTorrent
  Author: Bram Cohen
  Based on tit-for-tat
  Incentive: uploading while downloading
  Pieces of files
BitTorrent
  Bootstrapping
    Download a .torrent file from a web server
    Contact listed tracker for list of peers
  Peer Discovery
    Periodically contact tracker
  Content Location
    Check with each peer to determine which blocks they have
    Download rarest blocks first
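Rarest-first selection can be sketched as counting, for every block we still need, how many connected peers hold it, then requesting one of the least available. The function name pick_rarest_piece, the set-based representation of what each peer holds, and the random tie-break are assumptions made for this example, not the behaviour of any particular client.

```python
import random
from collections import Counter


def pick_rarest_piece(have, peer_pieces, rng=random):
    """Choose the next block to request: the one held by the fewest peers.

    have:        set of block indices we already hold
    peer_pieces: dict mapping peer -> set of block indices that peer holds
    Returns (block index, sorted list of peers holding it), or None if no
    connected peer offers anything we still need.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces - have)   # count only blocks we lack
    if not availability:
        return None
    rarest = min(availability.values())
    candidates = sorted(p for p, n in availability.items() if n == rarest)
    piece = rng.choice(candidates)           # break ties at random
    owners = sorted(peer for peer, pieces in peer_pieces.items() if piece in pieces)
    return piece, owners


# Hypothetical swarm: block 1 is common, blocks 2 and 3 are each held once.
peers = {"B": {0, 1, 2}, "C": {0, 1}, "D": {1, 3}}
choice = pick_rarest_piece({0, 3}, peers)
```

Requesting scarce blocks first spreads them through the swarm early, so the download is less likely to stall if the only holder of a block leaves.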
BitTorrent: Content Delivery
  Seed
    A server which has the entire file
    Other peers may also act as a seed if they linger after downloading the file
  Parallel Download
  Incentives
    Serve content to k connections at a time
    Serve to connections that give you the most
    Periodically serve to a random connection to see if it can do better than current connections
Overall Architecture [diagram, shown in successive steps]
  Components: web page with link to .torrent, web server, tracker, and peers A, B, C (the downloader “US”, a leech; a seed; another leech)
  Steps, in order:
    The .torrent file holds: URL of the tracker, pieces, piece length, name, length, files
    Get-announce (HTTP) to the tracker (diagram labels: peer cache, state information)
    Response from the tracker: peer list (random)
    Shake-hand (TCP) between peers
    Pieces are exchanged between peers
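To make the .torrent fields concrete, here is a sketch for a single file, assuming the "pieces" field is the concatenation of one 20-byte SHA-1 digest per piece. The helper names build_info and piece_is_valid, the tracker URL, and the file contents are hypothetical; the real file is also serialized in an encoding not shown here.

```python
import hashlib


def build_info(name, data, piece_length):
    """Assemble the name, length, piece length and pieces fields for one file."""
    digests = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]
        digests.append(hashlib.sha1(piece).digest())   # 20 bytes per piece
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": b"".join(digests),
    }


def piece_is_valid(info, index, piece):
    """Check a downloaded piece against the digest stored for that index."""
    expected = info["pieces"][20 * index:20 * index + 20]
    return hashlib.sha1(piece).digest() == expected


# Made-up 2560-byte payload split into 1024-byte pieces (the last is shorter).
data = bytes(range(256)) * 10
info = build_info("example.bin", data, piece_length=1024)
metainfo = {"announce": "http://tracker.example/announce", "info": info}
last_ok = piece_is_valid(info, 2, data[2048:])
```

Because every piece is checked against a digest before it is accepted or passed on, a peer cannot usefully inject corrupted data into the swarm.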
Peer Selection (tit for tat)
  Incentive Mechanism
  Choking Algorithm
    Temporary refusal to upload, re-evaluated every 10s
    Based solely on download rate: tit for tat
  Optimistic Unchoking
    Rotating peer to optimistically unchoke
    Rediscover unused connections and changes
  Anti-snubbing
    When a peer receives no data from another peer for 60s, it assumes it has been snubbed by that peer
    Refuse to upload to it except for optimistic unchoking
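One round of the choking decision might look like the sketch below. The function names, the default of four upload slots, and the rate figures are assumptions made for illustration; only the overall rule (upload to the peers that give you the most, keep one slot for a rotating optimistic unchoke, skip snubbing peers except in that slot) comes from the slide.

```python
import random


def choose_unchoked(download_rates, interested, slots=4, optimistic=None,
                    snubbed=frozenset()):
    """Pick which peers to upload to for the next round.

    download_rates: dict peer -> bytes/s recently received from that peer
    interested:     set of peers that want data from us
    slots:          total upload slots; one is reserved for the optimistic peer
    optimistic:     peer currently holding the optimistic slot, if any
    snubbed:        peers that sent us nothing for 60s; they lose regular slots
    """
    ranked = sorted(
        (p for p in interested if p not in snubbed),
        key=lambda p: (-download_rates.get(p, 0), p),   # fastest first
    )
    unchoked = set(ranked[:slots - 1])                  # tit for tat
    if optimistic in interested:
        unchoked.add(optimistic)                        # may be slow or snubbed
    return unchoked


def rotate_optimistic(interested, unchoked, rng=random):
    """Pick a random currently choked peer for the optimistic slot."""
    choked = sorted(set(interested) - set(unchoked))
    return rng.choice(choked) if choked else None


# Made-up measurements for five interested peers.
rates = {"A": 50, "B": 10, "C": 80, "D": 0, "E": 30}
interested = {"A", "B", "C", "D", "E"}
round_1 = choose_unchoked(rates, interested, slots=4, optimistic="D")
```

Peer D has sent nothing yet but still gets served through the optimistic slot, which is how a newcomer with no pieces can obtain its first data and how a better connection can be discovered.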
Strengths
  Better bandwidth utilization
    Up to 7 MB/s from the Internet
  Limit free riding: tit-for-tat
  Coupled upload and download
  Spurious files not propagated
  Ability to resume a download
Weaknesses and Open Issues
  In practice, the seed does a disproportionate amount of work
  Peer selection strategy
    Can we do better than random?
  Block selection strategy
    Rarest first?
  How well do incentives work?