Improving Gnutella Willy Henrique Säuberli Seminar in Distributed Computing, 16. November 2005 Papers: I.Making Gnutella-like P2P Systems Scalable; SIGCOMM.

Presentation transcript:

Improving Gnutella
Willy Henrique Säuberli
Seminar in Distributed Computing, 16. November 2005
Papers:
I. Making Gnutella-like P2P Systems Scalable; SIGCOMM 2003
II. Peer-to-Peer Overlays: Structured, Unstructured, or Both? MSR-TR
III. Should We Build Gnutella on a Structured Overlay? HotNets-II 2004

2 Motivation
In the spring of 2000, when Gnutella was a hot topic on everyone's mind, a concerned few of us in the open-source community just sat back and shook our heads. Something just wasn't right. Any competent network engineer that observed a running gnutella application would tell you, through simple empirical observation alone, that the application was an incredible burden on modern networks and would probably never scale. I myself was just stupefied at the gross abuse of my limited bandwidth,
Jordan Ritter - Why Gnutella Can't Scale. No, Really.

3 Overview
Systems
- Gnutella 0.4
- Gnutella 0.6
- Pastry/DHT (Distributed Hash Table)
Gia
- Topology adaptation
- Flow Control
- One-hop Replication
- Search Protocol
- Evaluation
Structural Gnutella
- Overhead of maintaining structured/unstructured overlay
- Overhead of queries in structured/unstructured overlay
Conclusions

4 Gnutella 0.4
Original Gnutella specification:
- Acquisition of addresses is not part of the protocol -> host cache services are the predominant way to obtain them
- A TCP/IP connection is opened to a servant and an ASCII string is sent: GNUTELLA CONNECT/ \n\n
- The servant responds with GNUTELLA OK\n\n (anything else is interpreted as rejection)
- After that, any of the Gnutella protocol descriptors can be sent -> file requests are done over HTTP requests
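The handshake on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the `version` parameter is an assumption, since the transcript does not preserve the version string that follows `CONNECT/`.

```python
def build_connect_request(version: str) -> bytes:
    """Build the ASCII connect string; the caller supplies the version text."""
    return f"GNUTELLA CONNECT/{version}\n\n".encode("ascii")


def is_accepted(response: bytes) -> bool:
    """Only the exact OK string is an acceptance; anything else is a rejection."""
    return response == b"GNUTELLA OK\n\n"
```

A client would send the result of `build_connect_request(...)` over the TCP connection and start exchanging descriptors only if `is_accepted(...)` holds for the reply.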

5 Gnutella 0.4
Gnutella protocol descriptors
Descriptor header: Descriptor ID | Payload Descriptor | TTL | Hops | Payload Length
Possible descriptors:
- PING: empty payload (probe for servants)
- PONG: port, IP, #files, #KB (response to PING)
- QUERY: minimum speed, search criteria
- QUERYHIT: #hits, port, IP, speed, result set, servant identifier
- PUSH: servant identifier, file index, port, IP (if firewalled)
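To make the header layout concrete, here is a minimal sketch that packs and unpacks it with Python's `struct` module. The field widths (16-byte ID, one byte each for payload descriptor, TTL and hops, four-byte little-endian payload length) and the descriptor codes are taken from the Gnutella 0.4 specification as commonly documented; the slide itself does not list them, so treat them as assumptions. The function names are hypothetical.

```python
import struct
import uuid

# Payload descriptor codes (assumed from the 0.4 specification, not from the slide).
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# Descriptor ID (16 bytes), payload descriptor, TTL, hops (1 byte each),
# payload length (4 bytes, little-endian): 23 bytes in total.
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, kind, ttl, hops, payload_length):
    return HEADER.pack(descriptor_id, kind, ttl, hops, payload_length)


def unpack_header(data):
    descriptor_id, kind, ttl, hops, payload_length = HEADER.unpack(data[:HEADER.size])
    return {
        "descriptor_id": descriptor_id,
        "kind": kind,
        "ttl": ttl,
        "hops": hops,
        "payload_length": payload_length,
    }


def forwarded(header):
    """Return the header as a servant would pass it on: TTL - 1, hops + 1."""
    f = unpack_header(header)
    return pack_header(f["descriptor_id"], f["kind"], f["ttl"] - 1,
                       f["hops"] + 1, f["payload_length"])


if __name__ == "__main__":
    # A PING has an empty payload; the TTL of 7 is just an example value.
    ping = pack_header(uuid.uuid4().bytes, PING, ttl=7, hops=0, payload_length=0)
    print(unpack_header(forwarded(ping)))
```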

6 Gnutella 0.4
Descriptor routing
- PONG is carried back along the same path as the PING
- QueryHit is carried back along the same path as the Query
- PUSH is carried back along the same path as the QueryHit
- PING and Query are forwarded to all connected servants except the one that sent them
- A servant decrements the TTL and increments the Hops field
- Servants avoid forwarding descriptors whose ID they have already seen
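The routing rules above can be simulated on a small graph. The sketch below is a simplified model, not the protocol implementation: the graph is an adjacency dictionary, one call stands for a single descriptor ID, and the names `flood` and `reverse_path` are hypothetical. It counts how many messages a flooded Query generates and shows how a QueryHit can retrace the path the Query took.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood one Query from `origin`.

    Returns (came_from, messages): `came_from` maps every reached node to the
    node it first received the Query from, `messages` counts all transmissions.
    """
    came_from = {origin: None}       # doubles as the "ID already seen" set
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        node, ttl_left = queue.popleft()
        if ttl_left == 0:
            continue                 # TTL exhausted: do not forward further
        for neighbour in graph[node]:
            if neighbour == came_from[node]:
                continue             # never send back to the servant it came from
            messages += 1
            if neighbour in came_from:
                continue             # duplicate: received, but not forwarded again
            came_from[neighbour] = node
            queue.append((neighbour, ttl_left - 1))
    return came_from, messages


def reverse_path(came_from, responder):
    """Path a QueryHit takes from `responder` back to the query origin."""
    path = [responder]
    while came_from[path[-1]] is not None:
        path.append(came_from[path[-1]])
    return path
```

On a four-node graph with a triangle A-B-C and a leaf D attached to C, a Query from A with TTL 2 reaches all nodes but costs five transmissions, two of which are duplicates. This is the redundancy the next slide lists as the flooding problem.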

7 Gnutella 0.4 [Figure: example network with links labelled Q (Query) and H (QueryHit)]

8 Gnutella 0.4
Problems
1. Flooding -> queries are received several times
2. Churn -> high rate of joining and leaving
3. Node overloading -> too many connections
4. No bootstrapping in the protocol (mostly done centrally)
5. No load balancing -> queries, downloads

10 Gnutella 0.6 The Ultra peer system has been found effective for this purpose. It is a scheme to have a hierarchical Gnutella network by categorizing the nodes on the network as leaves and ultra peers. A leaf keeps only a small number of connections open, and that is to ultra peers. An ultra peer acts as a proxy to the Gnutella network for the leaves connected to it. This has an effect of making the Gnutella network scale, by reducing the number of nodes on the network involved in message handling and routing, as well as reducing the actual traffic among them. RFC-Gnutella Chapter 2.3, Leaf Mode and Ultrapeer Mode

11 Gnutella 0.6
Improvements:
- GWebCache for addresses
- X-Try header (for rejected connections)
- host addresses stored in pong messages
- addresses from QueryHits stored in a local cache
- nodes classified as ultrapeers and leaves

12 Gnutella 0.6
Requirements for ultrapeers:
- no firewall
- suitable operating system
- sufficient bandwidth
- sufficient uptime
- sufficient RAM and CPU

14 Pastry/DHT
- peers are distributed on a ring structure
- a peer's id is computed with a hash function of its IP
- successor: next peer in id space
- predecessor: previous peer in id space
- files are matched to nodes with the hash function
- Chord: id space of size 2^b, e.g. b=128
- additional pointers to the peers at address id+2^i, i=0..b-1
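The ring and the additional pointers can be illustrated with a short sketch. It follows the Chord-style description on this slide (the slide title says Pastry, but the bullets describe a Chord ring). All function names are hypothetical, SHA-1 is only an assumed choice of hash function, and the example uses b=4 so that ids run from 0 to 15.

```python
import hashlib
from bisect import bisect_left


def key_for(name, b):
    """Map a file name (or IP string) into the id space 0 .. 2**b - 1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** b)


def successor(nodes, key, b):
    """First node id at or after `key` on the ring, wrapping around at 2**b."""
    ring = sorted(nodes)
    i = bisect_left(ring, key % (2 ** b))
    return ring[i % len(ring)]


def finger_table(nodes, node_id, b):
    """Pointers to the peers responsible for node_id + 2**i, i = 0 .. b-1."""
    return [successor(nodes, node_id + 2 ** i, b) for i in range(b)]


if __name__ == "__main__":
    peers = [1, 4, 9, 11, 14]
    print(finger_table(peers, 1, 4))
    print(successor(peers, key_for("smooth criminal.mp3", 4), 4))
```

Because the pointers double in distance, a lookup can halve the remaining distance to the target in each step, which is what gives this kind of overlay its logarithmic routing cost.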

15 Pastry/DHT [Figure: ring with b=4, id=0..15, showing predecessor (pred) and successor (succ) pointers]

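16 Pastry/DHT
Pastry routing table [Figure: routing table rows R0, R1, ... for an example id]
Joining of node n:
- n joins via an existing node s
- n copies the routing table of s
- n sends a copy of the i-th row of its table in a message to the nodes in row i
Leaving: failure detection, the failed entry is replaced by a value copied from a neighbour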

17 Pastry/DHT
Problems of DHTs:
- a failure causes loss of items and disconnection in the ring
  -> each peer keeps a list of the log_2(N) next nodes
  -> files are replicated in successors
- not designed for heterogeneous networks
  -> file distribution is independent of capacity
- designed for exact-word queries
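As a small illustration of the successor-list remedy, the sketch below computes the list of next nodes a peer would keep and where replicas of a file would go. The function names are hypothetical, and rounding log_2(N) up to a whole number is an assumption made for the example.

```python
import math


def successor_list(nodes, node_id, length=None):
    """The `length` next nodes after `node_id` on the ring (default: ceil(log2 N))."""
    ring = sorted(nodes)
    if length is None:
        length = max(1, math.ceil(math.log2(len(ring))))
    i = ring.index(node_id)
    return [ring[(i + k) % len(ring)] for k in range(1, length + 1)]


def replica_holders(nodes, owner_id, replicas):
    """The owner of a file plus the successors that hold copies of it."""
    return [owner_id] + successor_list(nodes, owner_id, replicas)
```

With five peers the default list has three entries, so a peer can lose up to two consecutive successors and still find the next live node on the ring.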

19 Gia Design
Design:
- dynamic topology adaptation: most nodes are within short range of a high-capacity node
- active flow control: avoid overloaded hot-spots
- one-hop replication: all nodes maintain pointers to the content of their neighbours
- search protocol: biased random walks directed to high-capacity nodes

20 Gia – Topology Adaptation
Topology adaptation
- high capacity -> high degree (~supernodes)
- level of satisfaction: minimum/maximum number of connections
- prefer neighbours with higher capacity and lower degree
- drop neighbours with highest degree
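A rough sketch of these rules follows. It is loosely modelled on the satisfaction level described in the Gia paper, but the exact formula, the default bounds `min_nbrs=3` and `max_nbrs=8`, and all names are assumptions made for illustration, not the authors' code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    name: str
    capacity: float
    neighbours: list = field(default_factory=list, repr=False)

    @property
    def degree(self):
        return len(self.neighbours)


def connect(a, b):
    a.neighbours.append(b)
    b.neighbours.append(a)


def satisfaction(peer, min_nbrs=3, max_nbrs=8):
    """0.0 = wants more neighbours, 1.0 = fully satisfied (assumed formula)."""
    if peer.degree < min_nbrs:
        return 0.0
    if peer.degree >= max_nbrs:
        return 1.0
    # Each neighbour contributes its capacity shared among its own neighbours.
    total = sum(n.capacity / max(n.degree, 1) for n in peer.neighbours)
    return min(1.0, total / peer.capacity)


def prefer(a, b):
    """Better neighbour candidate: higher capacity first, then lower degree."""
    return max((a, b), key=lambda n: (n.capacity, -n.degree))


def neighbour_to_drop(peer):
    """Drop the neighbour with the highest degree, as stated on the slide."""
    return max(peer.neighbours, key=lambda n: n.degree)
```

A peer whose satisfaction is below 1.0 keeps looking for better neighbours, so high-capacity peers end up with many connections and behave like supernodes without being configured as such.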

21 Gia – Topology Adaptation

22 Gia - Flow control
Flow control
- peers periodically assign tokens to neighbours
- queries are only forwarded if a token has been received -> overloaded nodes stop receiving queries
- tokens are assigned in proportion to capacity -> more capacity, more queries can be sent -> more queries from nodes with high capacity
- peers not using their tokens are marked as inactive -> they get fewer tokens
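The token scheme can be sketched as follows. This is a simplified model under stated assumptions: a fixed token `budget` per round, integer rounding down, and a half-weight penalty for inactive neighbours are all choices made for the example, and the names are hypothetical.

```python
def allocate_tokens(neighbour_capacity, budget, inactive=frozenset()):
    """Split `budget` tokens among neighbours in proportion to their capacity.

    Neighbours in `inactive` count with half their weight (assumed penalty).
    """
    weights = {
        name: capacity * (0.5 if name in inactive else 1.0)
        for name, capacity in neighbour_capacity.items()
    }
    total = sum(weights.values())
    return {name: int(budget * w / total) for name, w in weights.items()}


class TokenPurse:
    """Tokens a peer has received from each neighbour."""

    def __init__(self):
        self.tokens = {}

    def receive(self, neighbour, count):
        self.tokens[neighbour] = self.tokens.get(neighbour, 0) + count

    def try_forward(self, neighbour):
        """A query may be sent to `neighbour` only by spending one of its tokens."""
        if self.tokens.get(neighbour, 0) == 0:
            return False
        self.tokens[neighbour] -= 1
        return True
```

An overloaded node simply stops handing out tokens, so its neighbours cannot send it further queries until it recovers.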

23 Gia – One-hop Replication
One-hop replication
- peers keep an index of the files at their neighbours -> the response to a query includes files at neighbours
- peers keep copies of the files at their neighbours -> the paper tried to improve network structure and network querying; a copy of the file would improve availability
[Figure: the query "smooth criminal?" answered without and with one-hop replication]
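The index part of one-hop replication can be sketched like this. The class and method names are hypothetical, matching is a plain substring test chosen for simplicity, and the index is copied once at connection time rather than kept up to date.

```python
class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbour_index = {}    # neighbour name -> names of its files

    def connect(self, other):
        """Exchange file indexes (pointers only, not the files themselves)."""
        self.neighbour_index[other.name] = set(other.files)
        other.neighbour_index[self.name] = set(self.files)

    def answer(self, keyword):
        """Hits from the local files and from the one-hop index."""
        hits = [(self.name, f) for f in sorted(self.files) if keyword in f]
        for owner in sorted(self.neighbour_index):
            hits += [(owner, f) for f in sorted(self.neighbour_index[owner])
                     if keyword in f]
        return hits
```

A query that reaches a well-connected node is thereby answered for that node and all of its neighbours at once, which is why routing queries towards high-degree nodes pays off.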

24 Gia Search Protocol
Search protocol
- random walk instead of flooding
- the query is forwarded to the neighbour with the highest capacity
- book-keeping of queries to avoid redundant paths
  - a node remembers the paths used
  - the query is only forwarded if MAX_RESPONSES has not been reached
  - addresses of nodes already mentioned in a QueryHit are attached to the query
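A minimal sketch of such a biased walk is shown below. It is a deterministic simplification: the walk always takes the highest-capacity neighbour not yet visited, flow-control tokens are ignored, and the `ttl` default and all names are assumptions for the example.

```python
def biased_walk(graph, capacity, files, start, keyword, max_responses=1, ttl=16):
    """Walk from `start` towards high-capacity nodes until enough hits are found.

    Returns (path, hits), where hits are (node, file name) pairs.
    """
    path = [start]                   # book-keeping: nodes already visited
    hits = []
    node = start
    while True:
        hits += [(node, f) for f in files.get(node, ()) if keyword in f]
        if len(hits) >= max_responses or ttl == 0:
            break                    # enough responses, or the walk has expired
        candidates = [n for n in graph[node] if n not in path]
        if not candidates:
            break                    # dead end: every neighbour was already visited
        node = max(candidates, key=lambda n: capacity[n])
        path.append(node)
        ttl -= 1
    return path, hits
```

Compared with flooding, the query occupies one node at a time, so the message cost grows with the length of the walk rather than with the size of the neighbourhood.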

25 Evaluation Gia
Reference systems:
- FLOOD: search by flooding the network
- RWRT: random walk over random topology
- SUPER: nodes classified as normal node or supernode

26 Evaluation Gia Gia Super Flood RWRT

27 Evaluation Gia Gia Super Flood RWRT

28 Evaluation Gia
- RWRT performs better than FLOOD, especially at high replication factors
- Extremely low hop counts at higher replication rates
- Performance of FLOOD decreases with system size

29 Evaluation Gia
How to handle churn
- A failure in the network may lead to the loss of a query
- Keep-alive messages
- The query is reissued if no keep-alive messages are received
- To avoid losing queries due to adaptation, paths are kept for a while so that QueryHits can still be rerouted

30 Gia
The network is unstructured. Why not DHTs? Why keep the network unstructured?
1. P2P clients are extremely transient (ø 60 min.)
2. Keyword searches are more common than exact-match queries
3. DHTs are designed to improve query performance, but most queries are for hay, not needles
4. A DHT maps files to users (not a user decision)
5. DHTs don't support complex queries
6. DHTs don't cope with churn (high overhead for leaving)

32 Structured overlay
Gnutella 0.4 improved with Pastry network structure
- up to 32 peers in network table
- bootstrapping as in Pastry
- "I'm alive" messages for failure detection
Results
- Pastry maintains more neighbours
- overhead lies between that of 0.4(4) and 0.4(8)
- overhead grows with network size, but slowly
- overhead is negligible for all systems

33 Structella - Maintenance

34 Structured overlay
Gnutella 0.6 improved with Pastry network structure
- supernodes implemented in the network
- supernodes organized in a Pastry network
- normal nodes attached randomly to supernodes
Gia improved with Pastry network structure
- builds a network with Pastry structure based on Gia neighbour selection principles (satisfaction)

35 SuperPastry - Maintenance

36 HeteroPastry - Maintenance

38 Structured overlay
- The results presented so far only considered the overhead of maintaining the structure
- Explore the advantages of structured overlays for querying, using the advantages of Gia:
  - network structure helps to avoid queries visiting nodes several times
  - queries are routed to nodes with higher capacity

39 Pastry – Query overhead

40 Pastry – Success rate

41 Structured overlay

42 HP/SP - success rate

43 HP/SP – Query delay

44 HP/SP – Query overhead

45 Conclusions
- Most work is experimental
  - Gia introduces several techniques that help efficiency
- Problems to deal with:
  - high rate of churn
  - high heterogeneity of nodes in bandwidth, query rate, CPU, RAM, availability
  - different configurations lead to different solutions
- Structures
  - not a solution, but may help improve efficiency
- Implementation for results on a real network:
  - legal issues
  - highly distributed system
  - no control of single peers in a real environment

46 Sources
- Original Gnutella 0.4 specification: RFC-Gnutella
- Pastry/DHT: Jie Wu; Handbook on Theoretical and Algorithmic Aspects of Sensor, Ad Hoc Wireless, and Peer-to-Peer Networks, Chapter 39
Papers:
- Making Gnutella-like P2P Systems Scalable. Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker
- Peer-to-Peer Overlays: Structured, Unstructured, or Both? M. Castro, M. Costa, A. Rowstron
- Should We Build Gnutella on a Structured Overlay? M. Castro, M. Costa, A. Rowstron
- Why Gnutella Can't Scale. No, Really. Jordan Ritter