1 1/30/2008 Network Applications: P2P Applications

2 Recap: FTP, HTTP
- FTP: file transfer
  - ASCII (human-readable format) requests and responses
  - stateful server
  - one data channel and one control channel
- HTTP
  - ASCII requests and responses (request line, header lines, entity body)
  - stateless server (each request must contain the full information)
  - one data channel

3 Domain Name Service (DNS)
- Hierarchical delegation avoids central control, improving manageability and scalability
- Redundant servers improve robustness
  - see http://www.internetnews.com/dev-news/article.php/1486981 for the DDoS attack on the root servers in Oct. 2002 (9 of the 13 root servers were crippled, but the attack only slowed the network)
  - see http://www.cymru.com/DNS/index.html for performance monitoring
- Caching reduces workload and improves robustness

4 Problems of DNS
- Domain names may not be the best way to name other resources, e.g., files
- Separation of the DNS query from the application query does not work well in mobile, dynamic environments
  - e.g., locating the nearest printer
- The simple query model makes it hard to implement advanced queries
- Records are relatively static
  - although you can in theory update the values of the records, dynamic update is rarely enabled

5 Summary of Traditional C-S Network Applications
- How does a client locate a server? (DNS)
- Is the application extensible, robust, scalable? (the Slashdot effect; CNN on 9/11)
[Figure: a single application server with uplink capacity C0 serving clients 1 to n -- what is the download speed to the clients?]

6 An Upper Bound on Scalability
- Assume
  - we need to achieve the same speed at all clients
  - only uplinks can be bottlenecks
[Figure: server with uplink capacity C0; clients 1 to n with uplink capacities C1, C2, ..., Cn]

7 An Upper Bound on Scalability
- Maximum throughput: R = min{ C0, (C0 + Σ_{i=1..n} Ci) / n }
- R is clearly the upper bound
- The bound is theoretically approachable
  - why not in practice?
[Figure: server with uplink capacity C0; clients 1 to n with uplink capacities C1, ..., Cn]
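To make the bound concrete, here is a minimal sketch in Python (the function name and the example capacities are illustrative, not from the slides):

    def max_throughput(c0, client_uplinks):
        # Upper bound R = min{C0, (C0 + sum(Ci)) / n} on the common
        # download rate, assuming only uplinks can be bottlenecks.
        n = len(client_uplinks)
        return min(c0, (c0 + sum(client_uplinks)) / n)

    # Example: a 100 Mbps server uplink and four clients with 10 Mbps uplinks.
    print(max_throughput(100, [10, 10, 10, 10]))  # 35.0 Mbps per client

Note that R grows with the clients' aggregate uplink capacity; that extra capacity is exactly what a P2P design tries to harvest.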

8 Outline
- Recap
- P2P

9 Objectives of P2P
- Bypass DNS to access resources!
  - examples: instant messaging, Skype
- Share the storage and bandwidth of individual clients to improve scalability
  - examples: file sharing and streaming

10 Peer-to-Peer Computing
- Has quickly grown in popularity:
  - dozens or hundreds of file-sharing applications
  - 50-80% of Internet traffic is P2P (from the ipoque web site, Nov. 2007)
  - has upset the music industry and drawn college students, web developers, recording artists, and universities into court

11 What is P2P?
- P2P is not new and is probably here to stay
- The original Internet was a P2P system:
  - the original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
  - no DNS or routing infrastructure, just hosts connected by phone lines
  - computers also served as routers
- P2P is simply an iteration of scalable distributed systems

12 P2P Systems
- File sharing: BitTorrent, LimeWire
- Streaming: PPLive, PPStream, ...
- Research systems
  - distributed hash tables
  - content distribution networks
- Collaborative computing:
  - SETI@Home project
  - human genome mapping
  - Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations; saved about $500 million

13 Outline
- Recap
- P2P
  - the lookup problem

14 The Lookup Problem
[Figure: nodes N1 to N6 on the Internet; a publisher stores key="title", value=MP3 data; a client issues lookup("title") -- where is the file stored?]
- find where a particular file is stored
- pay particular attention to its equivalence to DNS

15 Outline
- Recap
- P2P
  - the lookup problem
  - Napster

16 Centralized Database: Napster
- Program for sharing music over the Internet (we refer to the Napster before its closure)
- History:
  - 5/99: Shawn Fanning (freshman, Northeastern U.) founded the Napster online music service; wrote the program in 60 hours
  - 12/99: first lawsuit
  - 3/00: 25% of UWisc traffic was Napster
  - 2000: est. 60M users
  - 2/01: US Circuit Court of Appeals ruled that Napster knew its users were violating copyright laws
  - 7/01: # of simultaneous online users: Napster 160K
  - 9/02: bankruptcy

17 Napster: How Does it Work?
- Application-level, client-server protocol over TCP
- A centralized index system maps files (songs) to the machines that are currently alive and hold them
- Steps:
  - connect to the Napster server
  - upload your list of files (push) to the server
  - give the server keywords to search the full list
  - select the "best" of the hosts that answer
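To make the centralized-index idea concrete, here is a minimal in-memory sketch in Python (the class and method names are illustrative; the real protocol is the binary message format on slide 23):

    class CentralIndex:
        # Toy Napster-style index: maps filenames to the hosts that hold them.
        def __init__(self):
            self.files = {}                # filename -> set of host addresses

        def publish(self, host, filenames):
            # A client "pushes" its file list when it connects.
            for name in filenames:
                self.files.setdefault(name, set()).add(host)

        def search(self, keyword):
            # Return every filename (and its hosts) matching the keyword.
            return {name: hosts for name, hosts in self.files.items()
                    if keyword.lower() in name.lower()}

    index = CentralIndex()
    index.publish("123.2.21.23", ["X.mp3", "Y.mp3", "Z.mp3"])
    index.publish("124.1.0.1", ["A.mp3"])
    print(index.search("a"))               # {'A.mp3': {'124.1.0.1'}}

The data transfer itself never touches the server: once the client has the host list, it pings the hosts and fetches the file directly from a peer (slides 21-22).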

18 Napster Architecture
[Figure: a central index server connected to many peers; peers transfer files directly with each other]

19 Napster: Publish
[Figure: a peer at 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), ...]

20 Napster: Search
[Figure: a client asks the central server "Where is file A?"; the query search(A) returns the reply 123.2.0.18, 124.1.0.1]

21 Napster: Ping
[Figure: the client pings the candidate hosts 123.2.0.18 and 124.1.0.1 to pick the best one]

22 Napster: Fetch
[Figure: the client fetches the file directly from the chosen peer, e.g., 123.2.0.18]

23 Napster Messages
General packet format: [chunksize] [chunkinfo] [data...]
- CHUNKSIZE: little-endian ("Intel-endian") 16-bit integer; size of [data...] in bytes
- CHUNKINFO: little-endian 16-bit integer message type (hex):
  - 00 - login rejected
  - 02 - login requested
  - 03 - login accepted
  - 0D - challenge? (nuprin1715)
  - 2D - added to hotlist
  - 2E - browse error (user isn't online!)
  - 2F - user offline
  - 5B - whois query
  - 5C - whois result
  - 5D - whois: user is offline!
  - 69 - list all channels
  - 6A - channel info
  - 90 - join channel
  - 91 - leave channel
  - ...
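A quick sketch of packing and unpacking this framing with Python's struct module ("<H" is a little-endian unsigned 16-bit integer; the message-type value comes from the table above):

    import struct

    HEADER = struct.Struct("<HH")   # [chunksize][chunkinfo], both uint16 LE

    def pack_message(msg_type, payload):
        # chunksize counts only the payload bytes, not the 4-byte header.
        return HEADER.pack(len(payload), msg_type) + payload

    def unpack_message(buf):
        size, msg_type = HEADER.unpack_from(buf)
        return msg_type, buf[4:4 + size]

    # Example: a whois query (type 0x5B) for user "alice".
    wire = pack_message(0x5B, b"alice")
    print(wire.hex())             # 05005b00616c696365
    print(unpack_message(wire))   # (91, b'alice')  -- 91 == 0x5B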

24 Centralized Database: Napster
- Summary of features: a hybrid design
  - control: client-server (like a specialized DNS) for locating files
  - data: peer-to-peer
- Advantages
  - simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages
  - application specific (compared with DNS)
  - lack of robustness and scalability: the central search server is a single point of bottleneck/failure
  - easy to sue!

25 Variation: BitTorrent
- The global central index server is replaced by one tracker per file (the set of peers sharing a file is called a swarm)
  - reduces centralization, but needs other means to locate trackers
- The bandwidth scalability management technique is more interesting
  - more later

26 Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella

27 Gnutella
- On March 14th, 2000, J. Frankel and T. Pepper from AOL's Nullsoft division (also the developers of the popular Winamp MP3 player) released Gnutella
- Within hours, AOL pulled the plug on it
- It was quickly reverse-engineered, and soon many other clients became available: BearShare, Morpheus, LimeWire, etc.

28 Decentralized Flooding: Gnutella
- On startup, a client contacts other servents (server + client) in the network to form interconnection/peering relationships
  - servent interconnections are used to forward control messages (queries, hits, etc.)
- How to find a resource record: decentralized flooding
  - send requests to neighbors
  - neighbors recursively forward the requests

29 Decentralized Flooding
[Figure: an overlay of nodes A through N plus the source S, connected by peering links]

30 Decentralized Flooding
[Figure: S sends the query to its neighbors, and the flood spreads across the overlay]
- Each node forwards the query to its neighbors other than the one that forwarded it the query

31 Background: Decentralized Flooding
[Figure: the same overlay; without duplicate detection, a query can loop]
- Each node should keep track of forwarded queries to avoid loops!
  - nodes keep state (which will time out -- soft state), or
  - the query carries the state, i.e., a list of visited nodes

32 Decentralized Flooding: Gnutella
- Basic message header
  - unique ID, TTL, hops
- Message types
  - Ping: probes the network for other servents
  - Pong: response to a Ping; contains IP address, # of shared files, etc.
  - Query: search criteria + speed requirement for the responding servent
  - QueryHit: successful response to a Query; contains the address + port to transfer from, the speed of the servent, etc.
- Pings and Queries are flooded; QueryHits and Pongs travel the reverse path of the corresponding message

33 Advantages and Disadvantages of Gnutella
- Advantages:
  - totally decentralized, highly robust
- Disadvantages:
  - not scalable; the entire network can be swamped with flooded requests
    - especially hard on slow clients; at some point broadcast traffic on Gnutella exceeded 56 kbps
  - to alleviate this problem, each request carries a TTL to limit its scope
    - each query has an initial TTL, and each node forwarding the query decrements it by one; if the TTL reaches 0, the query is dropped (consequence?)
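A toy flood with duplicate suppression and a TTL, sketched in Python (the overlay, node names, and TTL value are illustrative; in real Gnutella the QueryHits walk back along the reverse path rather than being collected recursively):

    import uuid

    overlay = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C", "D"],
               "C": ["A", "B", "D"], "D": ["B", "C"]}
    files = {"D": {"title.mp3"}}     # which node shares which files
    seen = set()                     # (node, query_id) pairs already handled

    def flood(node, query_id, keyword, ttl, from_node=None):
        if (node, query_id) in seen or ttl == 0:
            return []                # duplicate or expired: drop the query
        seen.add((node, query_id))
        hits = [node] if keyword in files.get(node, ()) else []
        for nbr in overlay[node]:
            if nbr != from_node:     # don't send it back where it came from
                hits += flood(nbr, query_id, keyword, ttl - 1, node)
        return hits

    print(flood("S", uuid.uuid4().hex, "title.mp3", ttl=4))  # ['D'], once

Even in this tiny overlay every node handles the query; the TTL only bounds how far the flood spreads, which is why the approach does not scale.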

34 Aside: Search Time?

35 Aside: All Peers Equal?
[Figure: peers with heterogeneous access links: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LAN]

36 Aside: Network Resilience
[Figure: three panels -- partial topology, random 30% of nodes die, targeted 4% of nodes die; from Saroiu et al., MMCN 2002]

37 Flooding: FastTrack (aka Kazaa)
- Modifies the Gnutella protocol into a two-level hierarchy
- Supernodes
  - nodes that have better connections to the Internet
  - act as temporary indexing servers for other nodes
  - help improve the stability of the network
- Standard nodes
  - connect to supernodes and report their lists of files
- Search
  - broadcast (Gnutella-style) search across the supernodes
- Disadvantages
  - kept centralized registration -> prone to lawsuits
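A sketch of the two-level search in Python (class names and addresses are illustrative): standard nodes publish only to their supernode, and queries flood only across the supernode tier:

    class Supernode:
        def __init__(self):
            self.index = {}             # filename -> set of standard nodes
            self.peers = []             # neighboring supernodes

        def publish(self, node, filenames):
            # A standard node reports its file list to its supernode.
            for f in filenames:
                self.index.setdefault(f, set()).add(node)

        def search(self, keyword, visited=None):
            visited = set() if visited is None else visited
            if id(self) in visited:
                return set()
            visited.add(id(self))
            hits = {n for f, ns in self.index.items()
                    if keyword in f for n in ns}
            for sn in self.peers:       # Gnutella-style flood, but only
                hits |= sn.search(keyword, visited)  # among supernodes
            return hits

    s1, s2 = Supernode(), Supernode()
    s1.peers, s2.peers = [s2], [s1]
    s2.publish("10.0.0.7", ["song.mp3"])
    print(s1.search("song"))            # {'10.0.0.7'}

The flood now touches only supernodes, so the broadcast cost is amortized over the many standard nodes hanging off each one.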

38 Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella (decentralized, flooding)
  - Freenet

39 Freenet
- History
  - final-year project of Ian Clarke, Edinburgh University, Scotland, June 1999
- Goals:
  - totally distributed system without using a centralized index or broadcast (flooding)
  - respond adaptively to usage patterns, transparently moving and replicating files as necessary to provide efficient service
  - provide publisher anonymity and security
  - free speech: resistant to attacks -- a third party shouldn't be able to deny access to a particular file (data item, object), e.g., by deleting it

40 Basic Structure of Freenet
- Each machine stores a set of files; each file is identified by a unique identifier (called its key or id)
- Each node maintains a "routing table" whose entries have three fields:
  - id: a file id (key)
  - next_hop: a node where a file corresponding to the id might be available
  - file: the local copy, if one exists

41 Query
- API: file = query(id);
- Upon receiving a query for file id:
  - check whether the queried file is stored locally
    - if yes, return it
    - if not, forward the query message
      - key step: search for the "closest" id in the routing table, and forward the message to the corresponding next_hop

42 Query Example
[Figure: query(10) enters at n1 and is forwarded hop by hop (steps 1-5, with one backtrack at step 4') until it reaches n5, which holds f10; each node's routing table holds (id, next_hop, file) entries such as (4, n1, f4) and (10, n5, f10)]
- Besides the routing table, each node also maintains a query table containing the state of all outstanding queries that have traversed it, in order to backtrack

43 Query: the Complete Process
- API: file = query(id);
- Upon receiving a query for file id:
  - check whether the queried file is stored locally
    - if yes, return it; otherwise
  - check the TTL to limit the search scope
    - each query carries a TTL that is decremented each time the query message is forwarded
    - when TTL = 1, the query is forwarded only with some probability
    - the TTL can be initialized to a random value within some bounds to obscure the distance to the originator
  - look for the "closest" id in the table with an unvisited next_hop node
    - if one is found, forward the query to the corresponding next_hop
    - otherwise, backtrack
      - the query ends up performing a Depth-First-Search (DFS)-like traversal
      - the search direction is ordered by closeness to the target
- When the file is returned, it is cached along the reverse path (any advantage?)
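A minimal sketch of the whole lookup in Python: closest-key routing with backtracking (a DFS ordered by key closeness) and caching along the reverse path. The node names, the closeness metric (absolute numeric distance), and the tables are illustrative, and the probabilistic forwarding at TTL = 1 is omitted for brevity:

    class Node:
        def __init__(self, name):
            self.name = name
            self.files = {}        # id -> file contents stored locally
            self.table = []        # routing table: (id, next_hop) pairs

        def query(self, key, ttl, visited=None):
            visited = set() if visited is None else visited
            visited.add(self.name)
            if key in self.files:                  # stored locally?
                return self.files[key]
            if ttl <= 1:                           # scope limit reached
                return None
            # Try next hops ordered by closeness of their id to the target.
            for _, nbr in sorted(self.table, key=lambda e: abs(e[0] - key)):
                if nbr.name not in visited:
                    result = nbr.query(key, ttl - 1, visited)
                    if result is not None:
                        self.files[key] = result   # cache on reverse path
                        return result
            return None                            # all hops tried: backtrack

    n1, n2, n5 = Node("n1"), Node("n2"), Node("n5")
    n5.files[10] = "f10"
    n1.table = [(12, n2), (9, n2)]
    n2.table = [(10, n5), (3, n1)]
    print(n1.query(10, ttl=5))   # 'f10'; n1 and n2 now also cache key 10

The reverse-path caching suggests one answer to the slide's closing question: popular files get replicated toward the nodes that request them, so later queries terminate sooner.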

