Peer-to-Peer Information Systems Week 2: File Sharing Old Dominion University Department of Computer Science CS 495/595 Fall 2003 Michael L. Nelson <mln@cs.odu.edu> 9/2/03
File Sharing Milieu: we make a slightly artificial distinction between file sharing and file storage; sharing has fewer temporal guarantees than storage; file storage projects are covered in a separate lecture. [Figure: systems positioned against three properties (Popularity, Privacy, Persistence): KaZaA, Napster, Gnutella, Freenet, Free Haven, OceanStore, Publius, Intermemory]
Where is Napster in Our Book? Who will write for Napster? Shawn Fanning? incidentally, a Bacon # of 2: http://oracleofbacon.org/cgi-bin/oracle/movielinks?firstname=Kevin+Bacon&game=1&secondname=shawn+fanning Kasaras, “Music in the Age of Free Distribution: MP3 and Society” http://www.firstmonday.dk/issues/issue7_1/kasaras/ should have been a chapter in our book
What is an MP3? Moving Picture Experts Group; 3 layers of audio, from: http://www.cs.auckland.ac.nz/compsci708s1c/lectures/jpeg_mpeg/mpeg_audio.html Layer 1: 192 Kbps per channel, simple implementation, no temporal masking. Layer 2: 128 Kbps per channel, medium complexity, some temporal masking. Layer 3: 64 Kbps per channel, complex implementation, temporal masking. Roughly 1/12 the size of WAV files. 1 CD: ~700 MB or 80 minutes, i.e. 8.75 MB / minute. MP3: ~960 minutes in the same 700 MB, i.e. ~0.72 MB / minute. Instead of 16 five-minute songs on a CD, you now have 192: an order of magnitude improvement.
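The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the 700 MB, 80 minute, and 1/12 figures are the slide's own round numbers, and the bit-rate cross-check assumes standard CD audio (44.1 kHz, 16-bit, stereo), which the slide does not state.

```python
# Round figures taken from the slide.
CD_MB = 700
CD_MINUTES = 80
MP3_RATIO = 12            # a WAV file is roughly 12x the size of the MP3
SONG_MINUTES = 5

wav_mb_per_min = CD_MB / CD_MINUTES            # 8.75 MB per minute
mp3_mb_per_min = wav_mb_per_min / MP3_RATIO    # a little over 0.7 MB per minute
mp3_minutes = CD_MINUTES * MP3_RATIO           # 960 minutes in the same 700 MB
songs_wav = CD_MINUTES // SONG_MINUTES         # 16 songs
songs_mp3 = mp3_minutes // SONG_MINUTES        # 192 songs

# Cross-check against bit rates.
# Assumption (not on the slide): CD audio is 44.1 kHz, 16-bit, 2 channels.
cd_kbps = 44_100 * 16 * 2 / 1000               # 1411.2 kbps
layer3_kbps = 64 * 2                           # 64 kbps per channel, 2 channels
bitrate_ratio = cd_kbps / layer3_kbps          # about 11, close to the 1/12 rule

print(f"WAV: {wav_mb_per_min:.2f} MB/min, MP3: {mp3_mb_per_min:.2f} MB/min")
print(f"Songs per 700 MB: {songs_wav} (WAV) vs {songs_mp3} (MP3)")
print(f"Bit-rate ratio CD : Layer 3 = {bitrate_ratio:.1f}")
```

The bit-rate ratio comes out near 11 rather than exactly 12, which is consistent with the slide's "roughly".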
Disruptive Technologies Computers: steady (or falling) price points; increasing computational capabilities; increasing storage capabilities. Networking: high speed at home, school, work. File: an order of magnitude improvement
Innovator’s Dilemma Clayton M. Christensen order of magnitude changes are disruptive technologies disruptive technologies change the community
OpenNap http://opennap.sourceforge.net/ Protocol http://opennap.sourceforge.net/napster.txt format of client<->server communication: <length><type><data> 100+ types!!! client<->client communication upload, download, firewalled upload, firewalled download firewalled clients use the server as a proxy session walkthrough at: http://david.weekly.org/code/napster.php3
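The <length><type><data> framing can be sketched in Python. Assumptions, to be checked against napster.txt before relying on them: <length> and <type> are each 2 bytes, little-endian, and <length> counts only the <data> bytes. The type number 200 used below is purely illustrative; the function names are hypothetical.

```python
import struct

HEADER_FMT = "<HH"   # assumed: 2-byte length, 2-byte type, little-endian
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 4 bytes


def pack_message(msg_type: int, data: bytes) -> bytes:
    """Frame one message as <length><type><data>."""
    return struct.pack(HEADER_FMT, len(data), msg_type) + data


def unpack_messages(buf: bytes):
    """Split a receive buffer into complete (type, data) messages.

    Returns the parsed messages and any trailing bytes that do not yet
    form a complete message (a TCP read can end mid-message).
    """
    messages = []
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        length, msg_type = struct.unpack_from(HEADER_FMT, buf, offset)
        end = offset + HEADER_SIZE + length
        if end > len(buf):
            break                      # incomplete message: wait for more bytes
        messages.append((msg_type, buf[offset + HEADER_SIZE:end]))
        offset = end
    return messages, buf[offset:]


frame = pack_message(200, b"hello")    # 200 is an illustrative type number
parsed, leftover = unpack_messages(frame + frame[:3])
print(parsed, leftover)
```

Because the protocol runs over a byte stream, the parser keeps the leftover bytes so the caller can prepend them to the next read.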
Gnutella as described in chapter 8 super nodes to be discussed in “Performance” lecture really a protocol that is supported by many applications http://www.gnutella.com/ http://gnutella.wego.com/ servent == client and server
Gnutella Protocol (Descriptor: Description)
Ping: used to discover servents on the network
Pong: response to a Ping; describes the answering servent
Query: keyword search of a servent’s holdings
QueryHit: response to a Query; provides information about how to retrieve a file
Push: allows firewalled servents to share files
Get: get a file from a servent
adapted from: http://www.dcs.gla.ac.uk/~iraklis/fyp_report/node11.html
Client-Server Cocktail Party (pp. 98-99) Napster
Party: Enter the party of 35M via the foyer. Napster: Connect and upload your list of files; the list is indexed at the central server.
Party: Your only friend is the party host. Napster: Napster tells you your list of files was received.
Party: You ask the host where the snacks are. Napster: You send your keyword search to Napster’s central server.
Party: The host points you to where the snacks are. Napster: The Napster server tells you where the files that match your query are.
Party: You go to the snacks. Napster: You select a file and the Napster server brokers a download from the machine that holds it.
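The central-server steps in the Napster party (register your file list, keyword search, get pointed at a peer) can be sketched as a toy in-memory index. Everything here is hypothetical illustration: the class name, the peer address strings, and the tokenizing rule are assumptions, not Napster's actual implementation.

```python
import re
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a central server: it indexes file names only.
    The files themselves stay on the peers."""

    def __init__(self):
        self._by_keyword = defaultdict(set)   # keyword -> {(peer, filename)}

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]

    def register(self, peer: str, filenames: list[str]) -> int:
        """A peer connects and uploads its list; returns the count received."""
        for name in filenames:
            for word in self._tokenize(name):
                self._by_keyword[word].add((peer, name))
        return len(filenames)

    def search(self, query: str) -> list[tuple[str, str]]:
        """Return (peer, filename) pairs matching every keyword in the query."""
        words = self._tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self._by_keyword.get(w, set()) for w in words))
        return sorted(hits)


index = CentralIndex()
index.register("peerA:6699", ["Miles Davis - So What.mp3"])
index.register("peerB:6699", ["So What (live).mp3", "Blue in Green.mp3"])
for peer, name in index.search("so what"):
    print(f"ask {peer} for {name!r}")   # the download itself is peer-to-peer
```

The design point is the one the slide makes: search is centralized, but the server only brokers the transfer.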
Gnutella Cocktail Party (p. 98)
Party: Enter the party, say hello to the first person you meet. Gnutella: Connect to any Gnutella host and issue a PING.
Party: Shortly, your friends see you and say hello. Gnutella: Your PING is broadcast to hosts “nearby”, who eventually respond with a PONG.
Party: You ask your friends where the snacks are. Gnutella: You query the hosts you “know” (via PONGs) for a file.
Party: Nobody knows, so they ask people next to them, and so on until everyone in the room has been asked. Gnutella: If a host has the file, it responds to you. But the query is passed on to other hosts as well until the whole network is canvassed.
Party: The few folks near the snacks let you know where they are. Gnutella: Several hits are routed back to you.
Party: You walk over to the snacks. Gnutella: You select from the hits that are presented to you.
Gnutella Messages Every message has a 128-bit universally unique identifier (UUID) Leach & Salz, “UUIDs and GUIDs” http://www.opengroup.org/dce/info/draft-leach-uuids-guids-01.txt hash of (among others) clock + MAC address more on this in the “Naming” lecture default time-to-live (TTL) of 7
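A sketch of how the UUID and TTL travel with each message. The 23-byte header layout and the descriptor codes below are assumptions taken from the Gnutella 0.4 protocol description, not from these slides; the function names are hypothetical. `uuid.uuid1()` is used because it derives the identifier from the clock and the host's MAC address, matching the slide.

```python
import struct
import uuid

# Assumed Gnutella 0.4 header: 16-byte message ID, 1-byte descriptor,
# 1-byte TTL, 1-byte hop count, 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")

# Assumed descriptor codes (Gnutella 0.4).
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

DEFAULT_TTL = 7


def pack_header(descriptor, payload_len, ttl=DEFAULT_TTL, hops=0, message_id=None):
    """Build a header; a fresh 128-bit ID is generated unless one is given."""
    if message_id is None:
        message_id = uuid.uuid1().bytes        # clock + MAC address based
    return HEADER.pack(message_id, descriptor, ttl, hops, payload_len)


def unpack_header(raw):
    """Parse the first HEADER.size bytes of a received message."""
    message_id, descriptor, ttl, hops, payload_len = HEADER.unpack(raw[:HEADER.size])
    return {"id": message_id, "descriptor": descriptor,
            "ttl": ttl, "hops": hops, "payload_len": payload_len}


header = pack_header(QUERY, payload_len=10)
print(HEADER.size, unpack_header(header))
```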
Gnutella Messages when a message is received, (temporarily) store its UUID, then forward it to peers; do not forward messages with UUIDs that you have seen before; every servent decrements the TTL each time it passes the message on; when TTL==0, don’t forward the message; Pongs & QueryHits are routed back to the originator along the path the Ping or Query arrived on, and are not broadcast
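The forwarding rules above can be simulated for a single Query. This is a minimal sketch, not a protocol implementation: the seven-node topology and the file placement are hypothetical, and because only one Query (one UUID) is in flight, "have I seen this UUID?" reduces to "has this node seen the query?". Each node remembers which neighbor the Query arrived from, and the QueryHit retraces that path.

```python
from collections import deque


def flood_query(graph, holdings, origin, filename, ttl=7):
    """Flood one Query from `origin`; return {holder: QueryHit return path}."""
    came_from = {origin: None}        # node -> neighbor the Query arrived from
    queue = deque((origin, nbr, ttl) for nbr in graph[origin])
    hits = {}
    while queue:
        sender, node, ttl_left = queue.popleft()
        if node in came_from:         # UUID already seen here: drop it
            continue
        came_from[node] = sender      # remember where it came from
        if filename in holdings.get(node, ()):
            path = [node]             # route the hit back hop by hop
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits[node] = path
        ttl_left -= 1                 # decrement before passing it on
        if ttl_left > 0:              # TTL == 0: do not forward
            for nbr in graph[node]:
                if nbr != sender:
                    queue.append((node, nbr, ttl_left))
    return hits


# Hypothetical topology and holdings, for illustration only.
graph = {1: [2, 7], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6, 7],
         5: [3, 6], 6: [4, 5], 7: [1, 4]}
holdings = {5: {"A"}, 7: {"A"}}

print(flood_query(graph, holdings, origin=2, filename="A"))
print(flood_query(graph, holdings, origin=2, filename="A", ttl=1))
```

With the default TTL both holders answer; with TTL=1 the Query dies at node 2's immediate neighbors and nothing is found, which is the "horizon" effect in miniature.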
Gnutella search mechanism Steps: (1) Node 2 initiates search for file A; (2) sends message to all neighbors; (3) neighbors forward message; (4) nodes that have file A initiate a reply message; (5) query reply message is back-propagated; (6) file download. [Figure: seven-node network (nodes 1-7), built up one step per slide; replies labeled A:5 and A:7 propagate back to node 2, followed by the download of A.] Slide from Matei Ripeanu; http://www.comp.lancs.ac.uk/computing/users/blundeln/DeptSite/public/PGNet2002Presentation.ppt
Gnutella Maps from: http://www.cybergeography.org/atlas/more_topology.html; cf. figure 8-3 (p. 109)
A More Centralized Gnutella? Reflectors (p. 112): a reflector maintains an index of its neighbors’ files; it does not re-transmit the query, but answers from its own index; a “mini-napster”; prelude to “super-nodes” (lecture on “performance”). Host caches (p. 113): bootstrapping your connection (cf. our list serve); more convenient for users, but it doesn’t produce a nice random graph; everyone ends up in the tightly connected cell
Kan’s Take on Scaling (Chapter 8) Horizon: number of nodes that can be “seen” within your TTL; only your neighbors really matter. Cellular telephony analogy: coverage radius. Ethernet: it works!
Ritter’s Take on Scaling Jordan Ritter, “Why Gnutella Can’t Scale. No, Really” http://www.darkridge.com/~jpr5/doc/gnutella.html description of Gnutella ca. early 2001 (go through the figures; not pasted in the slides)
Ripeanu, Iamnitchi & Foster’s Take on Scaling “Mapping the Gnutella Network”, IEEE Internet Computing, 6(1), 2002 http://people.cs.uchicago.edu/~matei/PAPERS/ic.pdf Gnutella continues to grow… figure 1 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Sampling of Messaging figure 2 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Gnutella - Small World? figures 3&4 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Gnutella - Power Law? figures 5&6 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Gnutella Topology vs. Network Topology figure 7 from Ripeanu, Iamnitchi & Foster, IEEE IC, 6(1), 2002
Gnutella: 4-Cayley Trees Gunther observes that Gnutella’s architecture is a 4-Cayley Tree Gunther, “Hypernets - Good (G)news for Gnutella” http://www.perfdynamics.com/Papers/Gnews.html http://xxx.lanl.gov/abs/cs.PF/0202019 definition: “A tree in which each non-leaf graph vertex has a constant number of branches n is called an n-Cayley tree.” from: http://mathworld.wolfram.com/CayleyTree.html from Rains & Sloane, we find that Cayley worked on this ca. 1875. Rains & Sloane, “On Cayley's Enumeration of Alkanes (or 4-Valent Trees)”, Journal of Integer Sequences, Vol. 2 (1999), Article 99.1.1 http://www.math.uwaterloo.ca/JIS/cayley.html
Let’s Build a Gnutella Network! N=2, TTL=2 N=2, TTL=5 N=5, TTL=5 N=5, TTL=2
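The four cases can be worked out numerically. This is an idealized sketch: it assumes N is the number of connections every servent keeps, that the network is a perfect tree with no cycles, and that a Query with a given TTL travels exactly TTL hops. Real networks have cycles and uneven degrees, so treat the result as an upper bound. The function name is hypothetical.

```python
def reachable(n: int, ttl: int) -> int:
    """Servents a Query can reach in an idealized n-Cayley tree.

    Hop 1 reaches the originator's n neighbors; each of those forwards to
    its other n - 1 neighbors, and so on for `ttl` hops.
    """
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, ttl + 1))


for n, ttl in [(2, 2), (2, 5), (5, 5), (5, 2)]:
    print(f"N={n}, TTL={ttl}: {reachable(n, ttl)} reachable servents")
# N=2, TTL=2: 4 reachable servents
# N=2, TTL=5: 10 reachable servents
# N=5, TTL=5: 1705 reachable servents
# N=5, TTL=2: 25 reachable servents
```

With N=2 the network is just a line, so reach grows linearly with TTL; with N=5 it grows geometrically, which is where the scaling arguments in the earlier slides come from.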