1 Seminar: Information Management in the Web
Gnutella, Freenet and more: an overview of file sharing architectures
Thomas Zahn


2 Peer-to-Peer - Introduction
"opposite" of Client/Server
no central servers => information highly distributed
every peer acts as a client AND a server -> it can query, reply to queries and route messages at the same time
every peer can directly "talk" to any other peer

3 Popular Peer-to-Peer Networks
Napster
Gnutella
Freenet
FastTrack (Kazaa)
CHORD, CAN, PASTRY, TAPESTRY

4 Napster
was used primarily for file sharing
NOT a pure peer-to-peer network => hybrid system
peer turns to central DB for querying (client/server)
peer downloads directly from other peer(s) (peer-to-peer)

5 Napster
[Diagram: a central DB surrounded by peers 1 to 6. Message sequence: 1. Query, 2. Response, 3. Download Request, 4. File]
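
The hybrid flow described for Napster (lookup at the central DB, transfer between peers) can be sketched in Python. This is a minimal in-memory model, not Napster's actual protocol: the names `CentralIndex`, `Peer`, `register`, `query`, `share` and `download`, as well as the file name, are hypothetical and chosen only for illustration.

```python
class CentralIndex:
    """Hypothetical stand-in for the central DB: maps file names to peers."""

    def __init__(self):
        self._owners = {}

    def register(self, peer, filename):
        self._owners.setdefault(filename, []).append(peer)

    def query(self, filename):
        # Steps 1 and 2: a peer asks the central DB and gets a list of owners.
        return list(self._owners.get(filename, []))


class Peer:
    def __init__(self, name, index):
        self.name = name
        self.index = index
        self.files = {}

    def share(self, filename, data):
        self.files[filename] = data
        self.index.register(self, filename)

    def download(self, filename):
        owners = [p for p in self.index.query(filename) if p is not self]
        if not owners:
            return None
        # Steps 3 and 4: the transfer goes peer to peer, bypassing the central DB.
        data = owners[0].files[filename]
        self.files[filename] = data
        return data


index = CentralIndex()
peer1, peer2 = Peer("peer1", index), Peer("peer2", index)
peer2.share("song.mp3", b"abc")
result = peer1.download("song.mp3")
missing = peer1.download("unknown.mp3")
```

Only the lookup touches the central component; if the index is unavailable, no peer can find anything, which is what makes the system a hybrid rather than pure peer-to-peer.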

6 Gnutella - overview
pure peer-to-peer
used for file sharing
very popular => practically proven?
very simple protocol
no routing "intelligence"
messages are always broadcast

7 Gnutella - PING/PONG
[Diagram: peers 1 to 8. Peer 1 broadcasts Ping 1. Pong 2, Pong 3, Pong 4, Pong 5, Pong 6, Pong 7 and Pong 8 are relayed back towards peer 1, bundled on the way as Pong 3,4,5 and Pong 6,7,8. Known Hosts at peer 1: 2; 3,4,5; 6,7,8]
Query/Response analogous
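
The ping broadcast can be sketched as a breadth-first flood with a hop limit. This is a simplified model under stated assumptions: the function name `flood_ping`, the `ttl` parameter and the edges of `graph` are illustrative (the transcript does not preserve the figure's links), and the real protocol details such as message IDs and descriptor headers are left out.

```python
from collections import deque


def flood_ping(graph, origin, ttl):
    """Broadcast a ping from origin; return the hosts whose pongs come back.

    graph maps each peer to the list of its direct neighbours.
    """
    seen = {origin}                  # peers that already handled this ping
    queue = deque([(origin, ttl)])
    known_hosts = []
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue                 # hop limit used up: do not relay further
        for neighbour in graph[peer]:
            if neighbour in seen:
                continue             # duplicate ping, dropped
            seen.add(neighbour)
            known_hosts.append(neighbour)  # its pong is relayed back to origin
            queue.append((neighbour, remaining - 1))
    return known_hosts


# Assumed topology for peers 1-8 (hypothetical edges, see lead-in).
graph = {
    1: [2],
    2: [1, 3, 4, 5],
    3: [2, 6], 4: [2, 7], 5: [2, 8],
    6: [3], 7: [4], 8: [5],
}
print(flood_ping(graph, 1, ttl=3))  # [2, 3, 4, 5, 6, 7, 8]
```

Every peer relays the ping to all of its neighbours, so the number of messages grows with the number of links reached, not with the number of useful answers.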

8 Gnutella - Pro & Con
Pro:
VERY simple protocol => easy to implement
very little overhead
practically proven functionality (?)
Con:
message broadcasts flood network => heavy network traffic => bad, bad scalability

9 Gnutella – Reachable Peers
     T=1  T=2  T=3    T=4     T=5      T=6        T=7        T=8
N=2    2    4    6      8      10       12         14         16
N=3    3    9   21     45      93      189        381        765
N=4    4   16   52    160     484    1,456      4,372     13,120
N=5    5   25  105    425   1,705    6,825     27,305    109,225
N=6    6   36  186    936   4,686   23,436    117,186    585,936
N=7    7   49  301  1,813  10,885   65,317    391,909  2,351,461
N=8    8   64  456  3,200  22,408  156,864  1,098,056  7,686,400
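
The figures in this table follow a simple pattern, sketched below. The slide does not define its symbols, so this assumes that N is the number of connections each peer keeps open and T is the number of hops a message may travel, and that the network is an idealised tree in which no peer is reached twice. The function name `reachable_peers` is made up for this example.

```python
def reachable_peers(n, t):
    """Peers reached within t hops when every peer has n connections.

    Assumes an idealised tree: hop 1 reaches n peers, and each further hop
    multiplies the newly reached peers by (n - 1), because one of the n
    connections points back towards the sender.
    """
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, t + 1))


# Reprint the table row by row for N = 2..8 and T = 1..8.
for n in range(2, 9):
    print(f"N={n}", [reachable_peers(n, t) for t in range(1, 9)])
```

For example, N=4 and T=3 gives 4 + 12 + 36 = 52, which matches the corresponding table cell.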

10 Gnutella – Generated Traffic in Bytes (1)
     T=1    T=2     T=3      T=4        T=5         T=6         T=7          T=8
N=2  166    332     498      664        830         996       1,162        1,328
N=3  249    747   1,743    3,735      7,719      15,687      31,623       63,495
N=4  332  1,328   4,316   13,280     40,172     120,848     362,876    1,088,960
N=5  415  2,075   8,715   35,275    141,515     566,475   2,266,315    9,065,675
N=6  498  2,988  15,438   77,688    388,938   1,945,188   9,726,438   48,632,688
N=7  581  4,067  24,983  150,479    903,455   5,421,311  32,528,447  195,171,263
N=8  664  5,312  37,848  265,600  1,859,864  13,019,712  91,138,648  637,971,200
query message length: 83 bytes
simple query relaying (no responses)
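
Each cell of this table is the 83-byte query length multiplied by the number of peers reached. The sketch below reproduces that arithmetic; it assumes the same idealised tree model as the reachable-peers slide (N connections per peer, T hops), and the names `reachable_peers` and `query_traffic_bytes` are illustrative.

```python
QUERY_BYTES = 83  # query message length stated on the slide


def reachable_peers(n, t):
    """Peers reached within t hops in an idealised tree of degree n."""
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, t + 1))


def query_traffic_bytes(n, t, query_bytes=QUERY_BYTES):
    """Bytes generated by relaying one query to every reachable peer.

    Responses are not counted, matching the 'simple query relaying' case.
    """
    return query_bytes * reachable_peers(n, t)


for n in range(2, 9):
    print(f"N={n}", [query_traffic_bytes(n, t) for t in range(1, 9)])
```

For N=4 and T=4 this gives 83 × 160 = 13,280 bytes.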

11 Gnutella – Generated Traffic in Bytes (2)
        T=1       T=2       T=3        T=4         T=5         T=6          T=7            T=8
N=3  283.68   1,418.4  4,822.56   13,900.3    36,594.7    91,061.3      218,150        508,638
N=4  378.24  2,647.68  12,860.2   53,710.1     206,897     758,371    2,688,530      9,306,220
N=5   472.8   4,255.2  26,949.6    147,986     753,170   3,658,050   17,214,200     79,185,000
N=6  567.36  6,240.96    48,793    332,473   2,105,470  12,743,500   74,798,500    429,398,000
N=7  661.92  8,604.96  80,092.3    651,991   4,941,123  35,823,800  252,002,000  1,734,360,000
N=8  756.48  11,347.2   122,550  1,160,440  10,242,000  86,526,900  709,521,000  5,693,470,000
Mean percentage of users who typically share content: 30%
Mean percentage of users who typically have responses to search queries: 40%
Mean number of search responses the typical respondent offers: 10
Mean length of search responses the typical respondent offers: 60
=> "Standard client settings yield a whopping 17MB generated in response to […] search query"

12 Freenet - Concepts
peer-to-peer file storage & retrieval system
every document has a globally unique ID
efficient (?) retrieval algorithm
  – documents are retrieved with sublinear effort
routing based on the likelihood that a peer can answer
focus on security

13 Freenet – Query Routing (1)
every peer maintains a routing table
the table contains known peers along with the IDs of the documents they are storing
a request is routed to the peer most likely to have an answer (closest matching ID)
responses are sent back upstream
intermediate peers also store the document and augment their routing tables

14 Freenet – Query Routing (2)
[Diagram: peers A, B, C and D, each with a routing table and a doc cache]
Initial state:
A: Routing Table B: 14, 20; X: 47, 60. Doc Cache 5, 89
B: Routing Table C: 19, 30; D: 45, 51. Doc Cache 14, 20
C: Routing Table B: 14, 20. Doc Cache 19, 30
D: Routing Table B: 14, 20; Z: 105, 110. Doc Cache 17, 45, 51, 102, 205
Steps:
1. Query for doc 17
2. Forward to best match
3. C has no match -> backtrack
4. Forward query to 2nd best match
5. Send back doc 17
6. Route back response
State after the response:
B: Routing Table C: 19, 30; D: 17, 45, 51. Doc Cache 14, 17, 20
A: Routing Table B: 14, 17, 20; X: 47, 60. Doc Cache 5, 17, 89
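
The walk-through above can be replayed with a small Python model. This is a sketch under stated assumptions, not Freenet's implementation: "closest matching ID" is taken to mean smallest numeric distance, the assignment of routing tables to peers A to D is inferred from the slide's numbers, peers X and Z exist only as routing-table entries, and the `Peer` class with its `query` method and `ttl` default is made up for this example.

```python
class Peer:
    def __init__(self, name, routing, cache):
        self.name = name
        self.routing = routing   # neighbour name -> doc IDs it is known to store
        self.cache = set(cache)  # doc IDs stored locally

    def query(self, doc_id, network, visited=None, ttl=10):
        """Depth-first search with backtracking; True if the doc was found."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if doc_id in self.cache:
            return True
        if ttl == 0:
            return False
        # Rank neighbours by the distance of their closest known ID.
        ranked = sorted(
            self.routing,
            key=lambda nb: min(abs(i - doc_id) for i in self.routing[nb]),
        )
        for nb in ranked:
            if nb in visited or nb not in network:
                continue  # already asked, or not part of this toy network
            if network[nb].query(doc_id, network, visited, ttl - 1):
                # On the way back: store the doc and augment the routing table.
                self.cache.add(doc_id)
                self.routing[nb] = sorted(set(self.routing[nb]) | {doc_id})
                return True
        return False  # no neighbour could help: the caller backtracks


network = {
    "A": Peer("A", {"B": [14, 20], "X": [47, 60]}, [5, 89]),
    "B": Peer("B", {"C": [19, 30], "D": [45, 51]}, [14, 20]),
    "C": Peer("C", {"B": [14, 20]}, [19, 30]),
    "D": Peer("D", {"B": [14, 20], "Z": [105, 110]}, [17, 45, 51, 102, 205]),
}
found = network["A"].query(17, network)
```

After the call, A and B hold the same routing tables and doc caches as the final state on the slide, while C is unchanged because the query backtracked from it.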

15 Freenet – Document Insert
analogous to query routing
insert is routed to the peer most likely to be interested in the new doc (closest matching ID)
intermediate peers cache the document and augment their routing tables until the TTL is reached

16 Freenet - Discussion
efficient routing algorithm (compared to Gnutella)
adequate security features/heuristics (the more popular a document, the more frequently it gets cached)
no metasearch
no updates or deletes possible
worst case query routing = DFS

17 FUtella – Concepts
peer-to-peer platform for general knowledge sharing
tries to model learning style of humans
content-based routing
combines and extends approaches from:
  – Gnutella (message format)
  – JXTA (peer groups)
  – JXTA Search (queryspaces and registrations)
  – Freenet (routing of registration discoveries)

18 FUtella - Knowledge Groups
[Diagram: a knowledge group attached to the FUtella net. Group head: peer E; members M1 to Mi. Queryspace of the knowledge group: "Computer Architecture". Arrow into the FUtella net labelled "Inserts Registration"]

19 FUtella - Knowledge Group Discovery 1
[Diagram: peers A, B, C and D, each with a routing table and a registration cache]
A: Routing Table "computer" -> B; "data base" -> X. Registration Cache "computer": B; "data base": X
B: Routing Table "computer analysis" -> C; "computer systems" -> D; "data base" -> A. Registration Cache "computer analysis": Y; "computer systems": Z; "data base": X
C: Routing Table "computer" -> B; "computer analysis" -> Y. Registration Cache "computer": B; "computer analysis": Y
D: Routing Table "computer" -> B; "computer systems" -> Z; "computer architecture" -> E. Registration Cache "computer systems": Z; "computer": B; "computer architecture": E
Steps:
1. Discovery request "computer architecture"
2. Forward discovery request
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match

20 FUtella - Knowledge Group Discovery 2
[Diagram: peers A, B and D after the discovery response]
B: Routing Table "computer analysis" -> C; "computer architecture" -> D; "computer systems" -> D; "data base" -> A. Registration Cache "computer analysis": Y; "computer architecture": E; "computer systems": Z; "data base": X
A: Routing Table "computer" -> B; "computer architecture" -> D; "data base" -> X. Registration Cache "computer": B; "computer architecture": E; "data base": X
D: Routing Table "computer" -> B; "computer systems" -> Z; "computer architecture" -> E. Registration Cache "computer systems": Z; "computer": B; "computer architecture": E
Steps:
5. Discovery response containing registration "computer architecture": E
6. Forward discovery response
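
The discovery example can be replayed in Python as a rough sketch. Several things here are assumptions rather than FUtella's actual design: the slides do not say how the "best match" between queryspaces is computed, so a hypothetical `similarity` score (number of shared words, ties broken by table order) stands in for it; the assignment of tables to peers A to D is inferred from the slide contents; and the `Peer` class and `discover` method are invented names.

```python
def similarity(a, b):
    """Hypothetical match score: number of words two queryspace names share."""
    return len(set(a.split()) & set(b.split()))


class Peer:
    def __init__(self, name, routing, registrations):
        self.name = name
        self.routing = dict(routing)              # queryspace -> next peer
        self.registrations = dict(registrations)  # queryspace -> group head

    def discover(self, space, network, visited=None):
        """Return (group_head, answering_peer), or None after backtracking."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if space in self.registrations:
            return self.registrations[space], self.name
        # Best match first; sorted() is stable, so ties keep table order.
        ranked = sorted(self.routing, key=lambda s: -similarity(s, space))
        for known_space in ranked:
            nxt = self.routing[known_space]
            if nxt in visited or nxt not in network:
                continue
            found = network[nxt].discover(space, network, visited)
            if found:
                head, source = found
                self.registrations[space] = head  # cache the registration
                self.routing[space] = source      # remember who answered
                return found
        return None


network = {
    "A": Peer("A", {"computer": "B", "data base": "X"},
              {"computer": "B", "data base": "X"}),
    "B": Peer("B", {"computer analysis": "C", "computer systems": "D",
                    "data base": "A"},
              {"computer analysis": "Y", "computer systems": "Z",
               "data base": "X"}),
    "C": Peer("C", {"computer": "B", "computer analysis": "Y"},
              {"computer": "B", "computer analysis": "Y"}),
    "D": Peer("D", {"computer": "B", "computer systems": "Z",
                    "computer architecture": "E"},
              {"computer systems": "Z", "computer": "B",
               "computer architecture": "E"}),
}
result = network["A"].discover("computer architecture", network)
```

With these assumptions the request travels A, B, C (backtrack), D, and on the way back A and B both cache the registration for group head E and add a route to D, as in the slide's final tables.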

21 FUtella - Query Processing
[Diagram: peers A, B, C and D, plus the knowledge group "computer architecture" with group head E and members M1 to Mi]
1. Discovery request "computer architecture"
2. Forward discovery request
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match
5. Discovery response containing cached registration
6. Forward discovery response
7. Send query
8. Forward query to member
9. Query response

22 FUtella - Test Results (1)
[Chart: Total Number of Messages (# msg, axis from 0 to 250000) for static peers, semi-dynamic peers and dynamic peers; series: threshold 2, no threshold, Gnutella]

23 FUtella - Test Results (2)

24 Conclusion
first and second generation P2P systems
still most widely used
practically proven
very flexible in terms of topology
bad scalability (Gnutella)
no guaranteed upper bound on query effort (Freenet)
(scientifically) far better approach: DHTs (see next presentation)

25 Questions?

