Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Peer-peer and Application-level Networking Presented by Richard Gold Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class.

Similar presentations


Presentation on theme: "1 Peer-peer and Application-level Networking Presented by Richard Gold Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class."— Presentation transcript:

1 1 Peer-peer and Application-level Networking Presented by Richard Gold Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class of 2001 for the Umass Comp Sci 791N course.. Presented originally by Jon Crowcroft

2 2 0.Introduction r Background r Motivation r outline of the tutorial

3 3 Peer-peer networking

4 4 Focus at the application level

5 5 Peer-peer networking Peer-peer applications r Napster, Gnutella, Freenet: file sharing

6 6 Peer-peer networking r Q: What are the new technical challenges? r Q: What new services/applications enabled? r Q: Is it just “networking at the application-level”? m “There is nothing new under the Sun” (William Shakespeare)

7 7 Tutorial Contents r Introduction r Client-Server v. P2P r Case Studies m Napster m Gnutella m Freenet r More another time?

8 8 Client Server v. Peer to Peer(1) r RPC/RMI r Synchronous r Assymmetric r Emphasis on language integration and binding models (stub IDL/XDR compilers etc) r Kerberos style security – access control, crypto r Messages r Asynchronous r Symmetric r Emphasis on service location, content addressing, application layer routing. r Anonymity, high availability, integrity. r Harder to get right

9 9 Peer to peer systems actually old r IP routers are peer to peer. r Routers discover topology, and maintain it r Routers are neither client nor server r Routers continually chatter to each other r Routers are fault tolerant, inherently r Routers are autonomous

10 10 Peer to peer systems r Have no distinguished role r So no single point of bottleneck or failure. r However, this means they need distributed algorithms for m Service discovery (name, address, route, metric, etc) m Neighbour status tracking m Application layer routing (based possibly on content, interest, etc) m Resilience, handing link and node failures m Etc etc etc

11 11 Next we look at some case studies r Piracy^H^H^H^H^H^content sharing r Napster r Gnutella r Freenet r Etc. (lessons here apply to other p2p systems too)

12 12 1. NAPSTER r The most (in)famous r Not the first (c.f. probably Eternity, from Ross Anderson in Cambridge) r But instructive for what it gets right, and r Also wrong… r Also has a political message…and economic and legal…etc etc etc

13 13 Napster r program for sharing files over the Internet r a “disruptive” application/technology? r history: m 5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service m 12/99: first lawsuit m 3/00: 25% UWisc traffic Napster m 2000: est. 60M users m 2/01: US Circuit Court of Appeals: Napster knew users violating copyright laws m 7/01: # simultaneous online users: Napster 160K, Gnutella: 40K, Morpheus: 300K

14 14 Napster: how does it work Application-level, client-server protocol over point- to-point TCP Four steps: r Connect to Napster server r Upload your list of files (push) to server. r Give server keywords to search the full list with. r Select “best” of correct answers. (pings)

15 15 Napster napster.com users File list is uploaded 1.

16 16 Napster napster.com user Request and results User requests search at server. 2.

17 17 Napster napster.com user pings User pings hosts that apparently have data. Looks for best transfer rate. 3.

18 18 Napster napster.com user Retrieves file User retrieves file 4.

19 19 Napster: architecture notes r centralized server: m single logical point of failure m can load balance among servers using DNS rotation m potential for congestion m Napster “in control” (freedom is an illusion) r no security: m passwords in plain text m no authentication m no anonymity

20 20 2 Gnutella r Napster fixed r Open Source r Distributed r Still very political…

21 21 Gnutella r peer-to-peer networking: applications connect to peer applications r focus: decentralized method of searching for files r each application instance serves to: m store selected files m route queries (file searches) from and to its neighboring peers m respond to queries (serve file) if file stored locally r Gnutella history: m 3/14/00: release by AOL, almost immediately withdrawn m many iterations to fix poor initial design (poor design turned many people off)

22 22 Gnutella: how it works Searching by flooding: r If you don’t have the file you want, query 7 of your partners. r If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10. r Requests are flooded, but there is no tree structure. r No looping but packets may be received twice. r Reverse path forwarding(?) Note: Play gnutella animation at: http://www.limewire.com/index.jsp/p2p

23 23 Flooding in Gnutella: loop prevention Seen already list: “A”

24 24 Gnutella message format r Message ID: 16 bytes (yes bytes) r FunctionID: 1 byte indicating m 00 ping: used to probe gnutella network for hosts m 01 pong: used to reply to ping, return # files shared m 80 query: search string, and desired minimum bandwidth m 81: query hit: indicating matches to 80:query, my IP address/port, available bandwidth r RemainingTTL: decremented at each peer to prevent TTL-scoped flooding r HopsTaken: number of peer visited so far by this message r DataLength: length of data field

25 25 Gnutella: initial problems and fixes r Freeloading: WWW sites offering search/retrieval from Gnutella network without providing file sharing or query routing. m Block file-serving to browser-based non-file-sharing users r Prematurely terminated downloads: m long download times over modems m modem users run gnutella peer only briefly (Napster problem also!) or any users becomes overloaded m fix: peer can reply “I have it, but I am busy. Try again later” m late 2000: only 10% of downloads succeed m 2001: more than 25% downloads successful (is this success or failure?) www.limewire.com/index.jsp/net_improvements

26 26 Gnutella: initial problems and fixes (more) r 2000: avg size of reachable network ony 400-800 hosts. Why so smalll? m modem users: not enough bandwidth to provide search routing capabilities: routing black holes r Fix: create peer hierarchy based on capabilities m previously: all peers identical, most modem blackholes m connection preferencing: favors routing to well-connected peers favors reply to clients that themselves serve large number of files: prevent freeloading m Limewire gateway functions as Napster-like central server on behalf of other peers (for searching purposes) www.limewire.com/index.jsp/net_improvements

27 27 Anonymous? r Not anymore than it’s scalable. r The person you are getting the file from knows who you are. That’s not anonymous. r Other protocols exist where the owner of the files doesn’t know the requester. r Peer-to-peer anonymity exists. r See Eternity and, next, Freenet!

28 28 Gnutella Discussion: r Architectural lessons learned? r anonymity and security? r Other? r Good source for technical info/open questions: http://www.limewire.com/index.jsp/tech_papers r Also this year’s SIGCOMM: http://www.acm.org/sigcomm/sigcomm2002/

29 29 Freenet history r Final Year project Ian Clarke, Edinburgh University, Scotland, June, 1999Ian ClarkeEdinburgh University r Sourceforge Project, most active r V.0.1 (released March 2000) r Latest version(Sept, 2001): 0.4

30 30 What is Freenet and Why? r Distributed, Peer to Peer, file sharing system r Completely anonymous, for producers or consumers of information r Resistance to attempts by third parties to deny access to information

31 31 Freenet: How it works r Data structure r Key Management r Problems m How can one node know about others m How can it get data from remote nodes m How to add new nodes to Freenet m How does freenet manage its data

32 32 Data structure r Routing Table m Pair: node address: ip, port; corresponding key value r Data Store m Requirement: rapidly find the document given a certain key rapidly find the closest key to a given key keep track the popularity of documents and know which document to delete when under pressure

33 33 Key Management(1) r A way to locate a document anywhere r Keys are used to form a URI r Two similar keys don’t mean the subjects of the file are similar! r Keyword-signed Key(KSK) m Based on a short descriptive string, usually a set of keywords that can describe the document m Example: Bush/Oil/Iraq/Excuse m Uniquely identify a document m Potential problem – global namespace

34 34 Key Management (2) r Signed-subspace Key(SSK) m Add sender information (not personal information!) to avoid namespace conflict m Private key to sign/ public key to verify r Content-hash Key(CHK) m Message digest algorithm, Basically a hash of the document

35 Darling, Tell me the truth! Believe me, I don’t have it. Freenet: Routing Algorithm: search or insert

36 But I know Joe may have it since I borrowed similar stuff from him last time. Freenet: Routing Algorithm: search or insert A B C D I

37 A, Help me! Sorry, No Freenet: Routing Algorithm: search or insert A B C D I

38 38 Strength of routing algorithm(1) r Replication of Data Clustering (1) (Note: Not subject-clustering but key- clustering!) r Reasonable Redundancy: improve data availability.

39 39 Strength of routing algorithm(2) r New Entry in the Routing Table: the graph will be more and more connected. --- Node discovery

40 40 Network convergence r X-axis: time r Y-axis: # of pathlength r 1000 Nodes, 50 items datastore, 250 entries routing table r the routing tables were initialized to ring-lattice topology r Pathlength: the number of hops actually taken before finding the data.

41 41 Scalability r X-axis: # of nodes r Y-axis: # of pathlength r The relation between network size and average pathlenth. r Initially, 20 nodes. Add nodes regularly.

42 42 Fault Tolerance r X-axis: # of nodes failing r Y-axis: # of pathlength r The median pathlength remains below 20 even when up to 30% nodes fails.

43 43 Small world Model r X-axis: # of nodes failing r Y-axis: # of pathlength r Most of nodes have only few connections while a small number of news have large set of connections. r The authors claim it follows power law.

44 44 So far, it’s really a good model r Keep anonymity r Distributed model; data available r Converge fast r Adaptive

45 45 Is it Perfect? r How long will it take to search or insert? m Trade off between anonymity and searching efforts: How do I know what key to use?? m Can we come up a better algorithm? A good try: “Search in Power-Law Networks” r Have no idea about if search fails due to no such document or just didn’t find it. r File lifetime. Freenet doesn’t guarantee a document you submit today will exist tomorrow!!

46 46 Question?? r Anonymity? Security? r Better search algorithm? Power law? r …

47 47 r Centralized model m e.g. Napster m global index held by central authority m direct contact between requestors and providers r Decentralized model m e.g. Freenet, Gnutella m no global index – local knowledge only (approximate answers) m contact mediated by chain of intermediaries P2P Characteristics

48 48 Wrapup discussion questions (1): r What is a peer-peer network (what is not a peer-to-peer network?). Necessary: m every node is designed to (but may not by user choice) provide some service that helps other nodes in the network get service m no 1-N service providing m each node potentially has the same responsibility, functionality (maybe nodes can be polymorphic) corollary: by design, nothing (functionally) prevents two nodes from communicating directly m some applications (e.g., Napster) are a mix of peer-peer and centralized (lookup is centralized, file service is peer-peer) [recursive def. of peer-peer] m (logical connectivity rather than physical connectivity) routing will depend on service and data

49 49 Wrapup discussion questions (2): r Why peer-peer over client-server? m A well-deigned p2p provides better “scaability” r Why is client-server different from peer-peer? m peer-peer is harder to make reliable m availability different from client-server (p2p is more often at least partially “up”) m more trust is required r If all music were free in the future (and organized), would we have peer-peer. m Is there another app: ad hoc networking, any copyrighted data, peer-peer sensor data gathering and retrieval, simulation r Evolution #101 – what can we learn about systems?


Download ppt "1 Peer-peer and Application-level Networking Presented by Richard Gold Based strongly on material by Jim Kurose,Brian Levine,Don Towsley, and the class."

Similar presentations


Ads by Google