Mapping the Gnutella Network Presented By: Tony Young M.Math Candidate October 7th, 2004.



Outline Introduction Gnutella in Depth The Crawler Analysis of Network Summary and Improvements Paper Review


Introduction Peer to peer systems have recently exploded onto the internet scene Two main contributing factors: Low cost and high availability of resources (computing and storage) Increased network connectivity (proliferation of “always on” connections)

Introduction Peer systems build a virtual topology (overlay) with its own routing mechanisms The topology of the overlay and the routing protocols directly affect Performance: Number of physical hops to send a message through the virtual overlay Reliability: Will a message actually reach the other end Scalability: Can other nodes be added while keeping performance good Anonymity: Can we protect the identity of nodes in the network

Introduction Gnutella is studied in depth and analysis is performed to determine how the overlay affects the four characteristics previously mentioned Started by capturing the network topology and behaviour Performed a macroscopic analysis of the network to evaluate costs and benefits Investigated possible improvements

Introduction Two questions drive analysis What is the connectivity structure of Gnutella? How well does the Gnutella overlay map to the actual network topology?

Introduction Connectivity Structure Many diverse networks, natural ones included, have a few well connected nodes and many poorly connected nodes, i.e. they are power law networks We will see Gnutella is not a pure power law network, but still has good fault tolerance and is less vulnerable to DoS attacks

Introduction Overlay Topology Important for ISPs: overlays that don’t map closely to the physical topology add stress on the infrastructure and cost ISPs more money Scalability is directly linked to efficient use of network resources

Outline Introduction Gnutella in Depth The Crawler Analysis of Network Summary and Improvements

Gnutella in Depth Gnutella is an open protocol It is decentralized and unstructured Allows group membership and searching of available files for download Gnutella should operate in a dynamic environment where hosts can join/leave at any time Gnutella should experience good performance and scalability External attacks should not cause data loss or performance degradation Users seeking or providing unpopular material should stay anonymous

Gnutella in Depth Gnutella nodes are called “servents” (SERVer-cliENTS) Provide a client-side interface to allow searching of file base Provide server-side storage, routing and response to network messages and requests

Gnutella in Depth To connect, a node contacts an “always on” host (e.g. gnutella.com) and sends a PING That host replies with a PONG and forwards the PING on to other nodes in the network, which reply with PONG messages and forward the PING on PING stops after TTL hops

Gnutella in Depth To find files, users submit QUERY messages to other nodes Messages are broadcast to all neighbours who forward them on to other neighbours, etc. for TTL hops QUERY RESPONSE messages are returned to the querying node

Gnutella in Depth To download a file, nodes send GET and PUSH messages to individual hosts holding a file, i.e. transfer requests and transfers are routed directly between communicating hosts, and not back-propagated

Gnutella in Depth Messaging protocol has three important features TTL and “hops passed” fields are attached to each message Randomly generated message ID is attached to each message Each node keeps track of recently routed messages to prevent re-broadcasting and to implement back-propagation
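The three features above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Gnutella wire format: the `Message` and `RoutingTable` names, the field names and the use of a UUID as the random message ID are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message header: type, TTL, hops passed, random ID."""
    kind: str                      # e.g. "PING", "PONG", "QUERY"
    ttl: int                       # maximum number of hops allowed
    hops: int = 0                  # "hops passed" so far
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class RoutingTable:
    """Remembers recently routed message IDs and where they came from."""

    def __init__(self):
        self._came_from = {}       # msg_id -> neighbour that delivered it

    def first_sighting(self, msg, from_neighbour):
        """Record the message; return False if it was already routed."""
        if msg.msg_id in self._came_from:
            return False           # duplicate: do not re-broadcast
        self._came_from[msg.msg_id] = from_neighbour
        return True

    def reverse_hop(self, msg_id):
        """Neighbour to which a reply should be back-propagated, or None."""
        return self._came_from.get(msg_id)
```

A real servent would also expire old entries from the table; that is left out here to keep the sketch short.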

Gnutella in Depth PING message contains the host address and name, number of files and size of data store PONG message contains the same information from the host that received the PING

Gnutella in Depth PING messages propagate until TTL has expired Hop count incremented at each servent receiving the PING Message propagates until hop count = TTL PONG messages are back-propagated (i.e. sent on the reverse path that the original message followed) to the host initiating the PING
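As a rough illustration of TTL-limited flooding and back-propagation, the sketch below simulates a PING on a small overlay given as an adjacency dictionary. It assumes every hop takes the same time (a breadth-first traversal), which a real asynchronous network does not guarantee; the function names and the overlay format are made up for the example.

```python
from collections import deque


def flood_ping(overlay, origin, ttl):
    """Return {node: previous_hop} for every node the PING reaches."""
    came_from = {origin: None}
    queue = deque([(origin, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == ttl:
            continue                          # hop count = TTL: stop here
        for neighbour in overlay[node]:
            if neighbour not in came_from:    # drop duplicates
                came_from[neighbour] = node
                queue.append((neighbour, hops + 1))
    return came_from


def pong_path(came_from, responder):
    """Reverse path a PONG follows from the responder back to the origin."""
    path = [responder]
    while came_from[path[-1]] is not None:
        path.append(came_from[path[-1]])
    return path


# A five-node chain A-B-C-D-E: with TTL 2 the PING from A stops at C.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D"]}
reached = flood_ping(chain, "A", ttl=2)
```

Here `pong_path(reached, "C")` walks the stored previous hops, so the PONG from C travels C, B, A.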

Gnutella in Depth QUERY messages are sent the same way as a PING message Nodes check the search string requested against the names of their locally stored files QUERY RESPONSE messages are back-propagated to the querying node and include information necessary to download the file
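The local check a node performs on a QUERY might look like the sketch below. The slides do not say how the search string is compared with file names, so the rule used here (every whitespace-separated keyword must appear, case-insensitively, in the name) is an assumption, as is the function name.

```python
def matching_files(search_string, local_files):
    """Return local file names containing every keyword of the query."""
    keywords = search_string.lower().split()
    if not keywords:
        return []                  # an empty query matches nothing
    return [name for name in local_files
            if all(k in name.lower() for k in keywords)]
```

A node with a non-empty result would back-propagate a QUERY RESPONSE; a node with an empty result only forwards the QUERY.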

Outline Introduction Gnutella in Depth The Crawler Analysis of Network Summary and Improvements Paper Review

The Crawler In order to conduct the network tests, a crawler was developed to gather information about the virtual topology Crawler starts with a list of active nodes and sends a PING message to each of them PONG messages are received and the IP, port, number of stored files and size of archive are stored in a table PING propagates to other nodes and PONG back propagates to crawler

The Crawler A sequential version of the crawler was initially developed, i.e. send a PING with an empirically determined optimal TTL to a set of nodes; resend to the nodes where the PING stops, etc. Proved to be very slow: 50 hours to collect data from a node network Slowness means two things: Not scalable: Will get slower as we add more nodes Does not give an accurate network snapshot: network changes drastically over 50 hours!

The Crawler A distributed crawler was developed next Client-Server architecture Server maintains node list and creates a network graph Clients receive a list of nodes to contact and discover neighbours for Decided to use only 50 clients at once Reduces invasiveness of search and consumption of network resources Reduced crawling time to a couple of hours for a large initial list and a network of nodes

The Crawler Network membership is defined as follows A node is a member of the network if the crawler is able to connect to it A node might be excluded from network membership if it was reported as active by a server or other node, but the crawler could not contact it This might happen if nodes go offline before the crawler can contact them
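The distributed crawl and the membership rule can be sketched as follows. This is not the crawler from the paper: it runs the "clients" as threads in one process, and `discover_neighbours` is a hypothetical callable standing in for the PING/PONG exchange. It is assumed to return a list of neighbours, or `None` when the node cannot be contacted.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(seed_nodes, discover_neighbours, max_clients=50):
    """Crawl the overlay level by level; return (members, edges)."""
    members = set()                # nodes the crawler could connect to
    edges = set()                  # undirected links, stored as frozensets
    seen = set(seed_nodes)
    frontier = list(seen)
    with ThreadPoolExecutor(max_workers=max_clients) as pool:
        while frontier:
            results = pool.map(discover_neighbours, frontier)
            next_frontier = []
            for node, neighbours in zip(frontier, results):
                if neighbours is None:
                    continue       # reported but unreachable: not a member
                members.add(node)
                for n in neighbours:
                    edges.add(frozenset((node, n)))
                    if n not in seen:
                        seen.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
    return members, edges
```

The `max_clients` default mirrors the 50 clients mentioned above; capping it limits how many nodes are probed at once.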

Outline Introduction Gnutella in Depth The Crawler Analysis of Network Growth Trends Traffic Estimates Connectivity and Reliability Overlay vs. Topology Summary and Improvements Paper Review

Analysis of Network Data was collected over a 6 month period Data shows: Overhead traffic is reducing Traffic volume is a significant barrier to growth

Growth Trends Size of network is growing rapidly Largest connected component in November 2000 had neighbours Largest connected component in May 2001 had neighbours! Number of neighbours for the largest connected component has grown 25 times!

Growth Trends Despite the explosive growth, most nodes are not connected long Successive crawls of the network found: 40% of nodes leave the network in less than 4 hours 25% of nodes are alive for more than 24 hours

Traffic Estimates A modified version of the crawler recorded traffic generated across one randomly chosen link 36% of total traffic (in bytes) is user generated QUERY messages 55% is group membership (PING/PONG) messages 9% is non-standard or malformed messages N.B. File transfer traffic is excluded

Traffic Estimates After June 2001 (when a new Gnutella implementation was released) 92% of total traffic (in bytes) was QUERY messages 8% was group membership (PING/PONG) messages N.B. File transfer traffic is excluded

Traffic Estimates 95% of all nodes are reachable within 7 hops. Thus, each message typically uses a TTL = 7 Most links are expected to support similar amounts of traffic for these reasons As verified empirically, the total Gnutella generated traffic is proportional to the number of connections in the network However, the average number of connections per node stays relatively constant as the network grows

Traffic Estimates The total traffic estimate for the Gnutella network is 1 Gbps, i.e. connections for a node network times 6 kbps per connection This is approximately 330 TB/month! Excluding file transfers!
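The conversion from 1 Gbps to a monthly volume can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes) and an average month of 30.5 days; neither assumption is stated in the slides.

```python
def monthly_volume_tb(rate_bps, days=30.5):
    """Convert a sustained bit rate into terabytes transferred per month."""
    seconds = days * 24 * 3600     # seconds in the assumed month
    total_bytes = rate_bps * seconds / 8
    return total_bytes / 1e12      # decimal terabytes


volume = monthly_volume_tb(1e9)    # 1 Gbps sustained for a month
```

Under these assumptions the result is about 329 TB, consistent with the roughly 330 TB/month quoted above; a 30-day month gives 324 TB.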

Traffic Estimates This total is 1.7% of the total internet traffic in US backbones in December 2000 This volume of traffic is believed to be an obstacle to further growth The underlying network topology must be used more efficiently to allow scaling and wider deployment

Connectivity and Reliability Note: Nodes decide locally: How many connections to support When to add or drop a connection Recent research shows that many natural systems organize themselves into “power law networks”, i.e. networks where a few nodes are well connected and most nodes have very few connections

Connectivity and Reliability Power law networks: Number of nodes with L links (connections) is proportional to $L^{-k}$ where k is system-dependent Resilient to losing many poorly connected nodes Falls apart quickly if only a few well connected nodes are lost Extremely robust to random failures, but vulnerable to targeted attacks

Connectivity and Reliability Power law networks appear as a linear system on a log-log plot Data for December 2000 shows that early Gnutella networks were power law Data for March 2001 shows that later Gnutella networks are a mixture There are a constant number of nodes with fewer than 10 links Above 10 links, nodes follow a power law structure
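Because a power law is a straight line on a log-log plot, the exponent k can be estimated from the slope of that line. The sketch below uses an ordinary least-squares fit on the logarithms of degree and node count. This is a crude estimator chosen for brevity, not necessarily the method used in the paper, and the function name is illustrative.

```python
import math
from collections import Counter


def fit_power_law_exponent(degrees):
    """Estimate k in count(L) ~ L**(-k) from a list of node degrees."""
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log(d) for d in counts]            # log(number of links)
    ys = [math.log(c) for c in counts.values()]   # log(number of nodes)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx                             # slope is -k
```

Applied to a mixed distribution like the March 2001 data, the fit would only be meaningful for nodes with more than 10 links, where the power law holds.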

Connectivity and Reliability Why did the distribution change? Two possible reasons: About 20% of Gnutella users have modem connections - DSL and up can support more connections Gnutella users run as many connections as their network can support - perception is that more connections = better query results

Connectivity and Reliability Does the change in distribution affect reliability? Yes! Preserves resilience to random failures Makes network less dependent on well connected nodes and hence less prone to DoS attacks
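The difference between random failures and targeted attacks can be explored by removing nodes from an overlay and measuring the largest connected component that survives. The sketch below is a generic graph routine written for this illustration; the adjacency-dictionary format and the star-shaped example are assumptions, not data from the study.

```python
def largest_component_size(overlay, removed=()):
    """Size of the largest connected component after removing nodes."""
    unvisited = set(overlay) - set(removed)
    best = 0
    while unvisited:
        stack = [unvisited.pop()]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for n in overlay[node]:
                if n in unvisited:         # skips removed and visited nodes
                    unvisited.remove(n)
                    stack.append(n)
        best = max(best, size)
    return best


# Extreme case of dependence on one well connected node: a hub with 10 leaves.
leaves = [f"leaf{i}" for i in range(10)]
star = {"hub": leaves, **{leaf: ["hub"] for leaf in leaves}}
```

Losing a leaf barely changes the star, while losing the hub shatters it into isolated nodes. An overlay that depends less on its best connected nodes degrades more gracefully under the same attack.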

Overlay vs. Topology Peer systems change the way bandwidth is used on the internet Servers are at the edge of the network now, and peers are constantly downloading Most ISPs use flat-rate billing Peer systems may break this model!

Overlay vs. Topology Due to the amount of traffic peer systems generate, efficient use of resources is important The greater the mismatch between the overlay and the physical network topology, the more messages need to be transmitted to route information from A to B This means more stress on the network resources

Overlay vs. Topology Communication from A to all other nodes requires one message over the D - E link

Overlay vs. Topology Communication from A to all other nodes requires six messages over the D - E link

Overlay vs. Topology How well does Gnutella map to the topology? Assume that domain names roughly reflect the hierarchy of the internet Check how well generated traffic maps to the clusters of domain names found by the crawler

Overlay vs. Topology After analysis of 10 overlays, it was found that Gnutella nodes often connect to peers outside of their respective domains Thus, it appears that Gnutella does not make efficient use of the underlying topology
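One simple way to quantify the mismatch is the fraction of overlay links whose two endpoints share a domain. The sketch below is a simplified stand-in for the analysis in the paper, which is not reproduced in these slides; the function name, the edge list and the node-to-domain mapping are hypothetical inputs.

```python
def intra_domain_fraction(edges, domain_of):
    """Fraction of overlay links that stay inside a single domain."""
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if domain_of[a] == domain_of[b])
    return same / len(edges)
```

A value near 1 would mean the overlay follows the domain clustering closely; a low value means most links leave the domain.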

Outline Introduction Gnutella in Depth The Crawler Analysis of Network Summary and Improvements Paper Review

Summary and Improvements Gnutella has a multimodal connectivity distribution that is partially constant and partially power law Network is resilient to random failures Network is harder to attack by malicious parties, but not immune to DoS attacks Gnutella makes little effort to ward off attackers E.g. topology, connectivity and traffic information is easy to obtain and can be used to plan attacks

Summary and Improvements Gnutella’s traffic volume is a significant fraction of all internet traffic Makes the future growth of the network reliant on efficient use of the topology Gnutella’s overlay does not match the network topology very well This increases quite substantially the number of messages and the amount of network traffic generated

Summary and Improvements Necessary improvements Make efforts to hide overlay and connectivity information (encryption?) Match overlay more closely with topology Limits to growth must be addressed first, and quickly, given the rate at which Gnutella is growing

Summary and Improvements Suggested Improvements Exploit locality of files and query distribution (i.e. caching and localized queries) Replace query flooding strategy with something more efficient (e.g. superpeer routing and group communication)

Outline Introduction Gnutella in Depth The Crawler Analysis of Network Summary and Improvements Paper Review

Organization Some discussions of the Gnutella architecture and protocols were scattered throughout the paper Should have combined everything into a more logical order inside the protocol section Writing Style Generally very good. Some missing words and poor grammar

Paper Review Novel Ideas Presented a qualitative and quantitative analysis of the Gnutella network, and some important points for P2P as a whole Content Some backing information was missing Some claims were made without supporting evidence, or just referring the reader to another paper

Questions?