Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble.

Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble University of Washington

Peer-to-Peer Frenzy
Both research and industrial excitement
– CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy
Basic premise
– wide-area, distributed system
– voluntary, ad-hoc, dynamic home-user peers exchange information (mostly large files)
Many proposals, yet nobody knows the participating peers’ characteristics and behavior

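Napster & Gnutella
[Figure: Napster vs. Gnutella. Legend: P = peer, S = server (napster.com), Q = query, R = response, D = file download. In Napster, peers send queries to the napster.com servers and download files directly from other peers; in Gnutella, queries travel from peer to peer.]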

Methodology
Two stages:
1. Periodically crawl Gnutella/Napster
   – discover peers and their metadata
2. Feed the output from the crawl into measurement tools:
   – bottleneck bandwidth: SProbe
   – latency: SProbe
   – peer availability: LF
   – degree of content sharing: Napster crawler
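A minimal sketch of the two-stage methodology, assuming the overlay can be modeled as a dictionary mapping each peer address to the neighbors it reports. `crawl` and `measure_all` are hypothetical helpers, not the authors' crawler; a real Gnutella crawler would exchange ping/pong messages over TCP instead of reading a dictionary, and the tool callables stand in for SProbe, LF, and the Napster crawler.

```python
from collections import deque


def crawl(overlay, seeds):
    """Stage 1: breadth-first crawl of a simulated P2P overlay.

    `overlay` maps each peer address to the neighbor addresses that peer
    reports when contacted (hypothetical stand-in for real protocol traffic).
    Returns the discovered peers in discovery order.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen = set(frontier)
    discovered = []
    while frontier:
        peer = frontier.popleft()
        discovered.append(peer)
        for neighbor in overlay.get(peer, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return discovered


def measure_all(peers, tools):
    """Stage 2: run every measurement tool on every discovered peer.

    `tools` maps a metric name to a callable that takes a peer address.
    """
    return {peer: {name: tool(peer) for name, tool in tools.items()} for peer in peers}
```

For example, crawling `{"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}` from seed `"a"` discovers `["a", "b", "c", "d"]` in breadth-first order.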

Network Bandwidth
Scenarios:
– network measurements
– dynamic server/peer selection
– P2P overlay formation (or application-level multicast)
– placement of content replicas

Network Bandwidth
1. Throughput: the number of bytes transferred during a fixed interval of time
2. Available bandwidth: the maximum attainable throughput of a newly started flow
3. Bottleneck bandwidth: the maximum throughput ideally obtainable across the slowest link
Hard to measure: throughput, available bandwidth
Easier to measure: bottleneck bandwidth

One-Packet Model
[Figure: traversal time of a single probing packet plotted against packet size]
$$\text{slope} = \frac{1}{\text{bandwidth}_{\text{bottleneck}}}$$
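The one-packet model becomes an estimator by fitting a line to (packet size, traversal time) samples and inverting the slope. This is a sketch under assumptions: keeping the minimum time per packet size (to discard queuing delay) and using ordinary least squares are our choices, and `one_packet_bandwidth` is a hypothetical name.

```python
def one_packet_bandwidth(samples):
    """Estimate bottleneck bandwidth in bytes/s from (size_bytes, time_s) samples.

    Keeps the fastest sample per packet size (least inflated by queuing),
    fits time = a + size / bandwidth by least squares, and returns 1 / slope.
    """
    fastest = {}
    for size, t in samples:
        fastest[size] = min(t, fastest.get(size, float("inf")))
    sizes = sorted(fastest)
    if len(sizes) < 2:
        raise ValueError("need at least two distinct packet sizes")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(fastest[s] for s in sizes) / n
    sxx = sum((s - mean_x) ** 2 for s in sizes)
    sxy = sum((s - mean_x) * (fastest[s] - mean_y) for s in sizes)
    slope = sxy / sxx  # seconds per byte
    if slope <= 0:
        raise ValueError("traversal time must grow with packet size")
    return 1.0 / slope
```

With samples generated as `0.01 + size / 1e6` seconds, plus one extra queued sample, the estimate comes back as 1e6 bytes/s (8 Mbps); multiply by 8 to report bits per second.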

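Packet-Pair Model
[Figure: two back-to-back packets spread apart by the bottleneck link]
The time dispersion Δt between the two packets is inversely proportional to the bottleneck bandwidth:
$$\Delta t = \frac{\text{size}_{\text{packet}}}{\text{bandwidth}_{\text{bottleneck}}}$$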
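Rearranged, the packet-pair model gives bandwidth = size / Δt. The sketch below (hypothetical `packet_pair_bandwidth`) applies it to several pairs and takes the median, an assumed choice that damps individual pairs stretched by cross traffic.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes, arrival_pairs):
    """Estimate bottleneck bandwidth in bits/s from packet-pair arrival times.

    `arrival_pairs` holds (t_first, t_second) arrival timestamps in seconds
    for pairs of equal-size packets sent back to back.
    """
    estimates = []
    for t_first, t_second in arrival_pairs:
        dispersion = t_second - t_first
        if dispersion > 0:  # reordered or simultaneous arrivals carry no information
            estimates.append(packet_size_bytes * 8 / dispersion)
    if not estimates:
        raise ValueError("no usable packet pairs")
    return median(estimates)
```

For 1500-byte packets arriving 1.2 ms apart, each pair yields 10 Mbps; a third pair stretched to 3 ms by cross traffic yields 4 Mbps, and the median still reports about 10 Mbps.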

Vital Properties of an Ideal Tool
– Accurate
– Fast: 1 min/measurement too slow
– Scalable: flooding the network will not work
– Works in uncooperative environments: can’t deploy software at both endpoints

Properties of an Ideal Tool
– Active: existing traffic might not be suitable
– TCP/UDP based: ICMP is heavily filtered
– Cross-traffic resilient: should detect and give up in the face of cross traffic
– Works on asymmetric paths
– Flexible to bandwidth changes
– Controlled evaluations

Current Tools
[Table: desired properties (accurate; fast; uncooperative environments*; scalable; TCP/UDP; active; cross-traffic*; asymmetric; bandwidth changes; controlled evaluations) compared across pathchar, pchar, clink, bprobe, pathrate, Nettimer, and SProbe]

SProbe Uses TCP Tricks
From local host to remote host
– no cooperation needed
[Figure: Local sends SYN packets to Remote; Remote replies with RST packets]
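The SYN/RST trick yields packet-pair timing without software on the remote host. The sketch assumes the SYNs are sent back to back to a port on which nothing is listening, so each elicits an RST, and that the small RSTs are not re-spread on the reverse path, so their spacing reflects the dispersion the SYNs picked up on the way out. The "detect and give up" property is modeled as an agreement test across several pairs; the 10% tolerance and the name `syn_rst_estimate` are assumptions, not SProbe's exact rules.

```python
from statistics import median


def syn_rst_estimate(syn_size_bytes, rst_arrivals, tolerance=0.1):
    """Estimate local-to-remote bottleneck bandwidth in bits/s.

    `rst_arrivals` holds (t_rst1, t_rst2) local arrival times of the two RSTs
    answering each back-to-back SYN pair. Returns None (give up) when the
    per-pair estimates disagree, which is taken as a sign of cross traffic.
    """
    estimates = [syn_size_bytes * 8 / (t2 - t1) for t1, t2 in rst_arrivals if t2 > t1]
    if len(estimates) < 2:
        return None  # too few pairs to check consistency
    center = median(estimates)
    if all(abs(e - center) <= tolerance * center for e in estimates):
        return center
    return None
```

Three pairs of 1500-byte SYNs whose RSTs arrive 1.2 ms apart agree on about 10 Mbps; if one pair is stretched to 3 ms, the estimates disagree and the function returns None instead of a misleading number.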

SProbe Uses TCP Tricks
From remote host to local host
– involuntary cooperation of the application layer
[Figure: exchange between Local and a Remote (Web) server: HTTP GET request, data packets, ACK (last data packet)]

SProbe’s Accuracy

More SProbe
– Bottleneck bandwidth
– Latency
– Availability (LF):
  – send a SYN packet
  – receive:
    SYN/ACK: host active
    RST: host inactive, but online
    nothing: host offline
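The LF availability probe reduces to a three-way classification of the reply to a single SYN. A minimal sketch, assuming the reply has already been captured and labeled as a string; the labels, `classify_probe`, and `availability` are hypothetical, and sending raw SYN packets (which needs raw sockets) is not shown.

```python
from enum import Enum
from typing import Optional


class HostState(Enum):
    ACTIVE = "host active"
    INACTIVE_ONLINE = "host inactive, but online"
    OFFLINE = "host offline"


def classify_probe(reply: Optional[str]) -> HostState:
    """Map the reply to one SYN probe onto a host state.

    `reply` is "SYN/ACK", "RST", or None if nothing arrived before the timeout.
    """
    if reply == "SYN/ACK":
        return HostState.ACTIVE  # something accepts connections on the probed port
    if reply == "RST":
        return HostState.INACTIVE_ONLINE  # machine answers, but nothing is listening
    if reply is None:
        return HostState.OFFLINE
    raise ValueError(f"unexpected reply: {reply!r}")


def availability(replies):
    """Fraction of periodic probes that found the host active."""
    states = [classify_probe(r) for r in replies]
    return sum(s is HostState.ACTIVE for s in states) / len(states)
```

One caveat of this mapping: a firewall that silently drops SYNs is indistinguishable from an offline host, since both produce no reply.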

P2P Characteristics
– How many peers are “server-like”?
– Who are the free-riders?
– Do peers tend to lie?
– How robust is the Gnutella overlay?

Higher Downstream Bandwidths

Most Peers have Cable Modem-like Bandwidths

Yes, Lots of Cable Modems

The closest 20% of peers are 4X closer than the furthest 20%

Two horizontal bands – East Coast and Transoceanic Links

Availability
Periodic probes yield data like: [Figure: probe timeline from start to end, with 12 hours marked]
– divide the trace into two periods
– keep segments that:
  – start in the 1st period
  – end in the 1st or 2nd period
– draw conclusions only on segments no longer than the 2nd period
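The segment-selection rule avoids a bias against long sessions: a session that starts in the 1st period and is no longer than the 2nd period is guaranteed to end before the trace does, so every such session is observed in full. A sketch, assuming segments are (start, end) pairs in hours; `usable_sessions` is a hypothetical helper.

```python
from statistics import median


def usable_sessions(segments, trace_start, split, trace_end):
    """Durations of uptime segments that pass the two-period filter.

    Keeps a segment only if it starts in the 1st period [trace_start, split),
    ends by trace_end, and is no longer than the 2nd period.
    """
    second_period = trace_end - split
    durations = []
    for start, end in segments:
        length = end - start
        starts_in_first = trace_start <= start < split
        if starts_in_first and end <= trace_end and length <= second_period:
            durations.append(length)
    return durations
```

With a 24-hour trace split at 12 hours, a segment from hour 5 to hour 23 is dropped (18 hours is longer than the 2nd period), as is one starting at hour 13; `median` over the kept durations then gives the median session length.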

Median Session is about one hour (same for both systems)

Gnutella/Napster Uptime

Who Has the Files?

Correlation of Free-Riding with B/W

It’s all about incentive!

Lack of Knowledge is Universal

Power-Law Networks are here to Stay
Barabási and Albert showed that networks which…
– grow by continuous addition of new nodes
– exhibit preferential attachment (the likelihood of connecting to a node depends on the node’s degree)
…have a power-law distribution of vertex degree
Examples: Internet, WWW, Gnutella
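The Barabási–Albert growth process is short enough to sketch directly. This is the textbook model, not a model of Gnutella's actual join procedure; the parameters `n`, `m`, and `seed` are illustrative.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=0):
    """Grow an undirected graph to n nodes by preferential attachment.

    Starts from m unconnected seed nodes; every new node links to m distinct
    existing nodes chosen with probability proportional to their degree.
    Returns the edge list.
    """
    if m < 1 or n <= m:
        raise ValueError("need n > m >= 1")
    rng = random.Random(seed)
    edges = []
    endpoints = []  # each node appears here once per unit of its degree
    for new in range(m, n):
        if not endpoints:
            chosen = set(range(m))  # the first new node links to every seed node
        else:
            chosen = set()
            while len(chosen) < m:
                chosen.add(rng.choice(endpoints))  # degree-proportional pick
        for target in chosen:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Vertex degree of every node that appears in the edge list."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts
```

With `n=2000, m=2` the mean degree is about 4, yet a few early nodes end up with degrees many times larger; that heavy tail is the power-law signature.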

Resilience to Failures
Power-law networks (Cohen et al.):
– very resilient in the face of random node failures: a giant spanning cluster still exists
– fairly resilient in the face of cascading failures
– very vulnerable in the face of orchestrated attacks (targeting high-degree nodes)

Gnutella
[Figure: Gnutella topology snapshot, Fri Feb 16 05:21:52-05:23:22 PST, 1771 hosts. Popular sites: adams a.Stanford.EDU]

30% random failures
[Figure: Gnutella topology after 30% random failures, Fri Feb 16 05:21:52-05:23:22 PST; 1771 – 471 – 294 hosts]

4% orchestrated failures
[Figure: Gnutella topology after 4% orchestrated failures, Fri Feb 16 05:21:52-05:23:22 PST]
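The random and orchestrated failure experiments can be reproduced in miniature by deleting nodes and measuring the largest surviving connected component. A sketch assuming an undirected edge list; `largest_component`, `random_failures`, and `orchestrated_failures` are hypothetical helpers, and "orchestrated" is modeled as removing the highest-degree nodes first.

```python
import random
from collections import Counter, defaultdict


def largest_component(nodes, edges, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    adjacency = defaultdict(list)
    for u, v in edges:
        if u not in removed and v not in removed:
            adjacency[u].append(v)
            adjacency[v].append(u)
    seen, best = set(), 0
    for start in nodes:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative DFS over surviving nodes
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def random_failures(nodes, fraction, seed=0):
    """Pick a uniformly random fraction of nodes to fail."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(nodes), int(fraction * len(nodes))))


def orchestrated_failures(nodes, edges, fraction):
    """Pick the highest-degree fraction of nodes to fail (a targeted attack)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(nodes, key=lambda node: degree[node], reverse=True)
    return set(ranked[: int(fraction * len(nodes))])
```

On a star of 100 nodes, removing the single hub (1% orchestrated failures) leaves only isolated nodes, while removing 30% of nodes at random leaves the rest connected through the hub unless the hub itself happens to be picked.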

Discussion
Heterogeneity:
– 3 orders of magnitude of bandwidth: 50Kbps to 100Mbps
– 6 orders of magnitude of latency: 10µs to 10s
– >4 orders of magnitude in availability: 1% to 99.99%
Peers should not be treated as equals

Cooperating, Well-Behaved Peers
Incentive:
– game-theoretic approaches for enforcing local behavior for global benefit
System enforcement:
– peers can measure each other’s characteristics (SProbe) and enforce the reported ones
– a peer that reports 56Kbps should not download content at a higher speed
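The slides do not say how a reported rate would be enforced; a token bucket is one standard rate limiter and is swapped in here as an assumption. A minimal sketch that caps a peer at its reported 56 Kbps; the class and its parameters are hypothetical.

```python
class TokenBucket:
    """Limits a peer to `rate_bps` bits/s on average, with bursts up to `burst_bits`."""

    def __init__(self, rate_bps, burst_bits, now=0.0):
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = now

    def allow(self, nbits, now):
        """Return True (and spend tokens) if `nbits` may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill at the reported rate, never beyond the burst allowance.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbits <= self.tokens:
            self.tokens -= nbits
            return True
        return False
```

A serving peer would keep one bucket per downloader, sized from the rate that downloader reported (or that SProbe measured); requests beyond the allowance are delayed rather than served at a higher speed.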

Feedback to Current Proposals
CAN, Chord, Past:
– great memory and lookup algorithms: log(N) time and space (CAN instead uses O(d·N^(1/d)) hops with O(d) state per node)
– at the price of maintaining a rigid network structure: hypercubes, butterflies, Plaxton trees
– unclear how the network structure is maintained given the heterogeneity and dynamics of peers
Conjecture: these networks will have a hard time stabilizing
– they will need lots of routine maintenance traffic
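The log(N) lookup cost can be seen in a simplified Chord-style ring in which every one of the 2^m identifiers is occupied and node n keeps fingers at n + 2^i. Greedy routing then takes one hop per 1-bit in the clockwise distance, so at most m = log2(N) hops. The fully populated ring and the helper `chord_hops` are simplifying assumptions; real Chord rings are sparse and also rely on successor pointers, which is exactly the structure that must be maintained as peers come and go.

```python
def chord_hops(start, key, m):
    """Hops for greedy clockwise finger routing on a fully populated 2**m ring."""
    size = 1 << m
    node, hops = start, 0
    while node != key:
        distance = (key - node) % size
        step = 1 << (distance.bit_length() - 1)  # largest finger that does not overshoot
        node = (node + step) % size
        hops += 1
    return hops
```

On a 1024-node ring (m = 10), the worst case is a clockwise distance of 1023, which takes exactly 10 hops.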

Instead Gnutella…
– Easy join procedure: this simplicity gave Gnutella its power-law shape
– Easy-to-implement protocol (broadcast)
– Lots of maintenance traffic already, although the protocol has become smarter in subsequent versions
– Searching is a nightmare
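Gnutella's broadcast search can be sketched as a TTL-limited flood in which each peer forwards a query to all neighbors except the sender and drops duplicates it has already seen. The overlay dictionary and the helper `flood` are hypothetical; counting messages shows why broadcast search generates so much traffic.

```python
from collections import deque


def flood(overlay, origin, ttl):
    """Return (peers reached, messages sent) for one TTL-limited query flood.

    `overlay` maps each peer to its neighbor list. A peer forwards the query
    once, to every neighbor except the one it came from; later copies are
    still sent (and counted) but dropped as duplicates by the receiver.
    """
    reached = {origin}
    messages = 0
    frontier = deque([(origin, None, ttl)])
    while frontier:
        peer, came_from, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: do not forward
        for neighbor in overlay[peer]:
            if neighbor == came_from:
                continue
            messages += 1
            if neighbor not in reached:  # first copy: accept and forward later
                reached.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return reached, messages
```

In the five-peer overlay `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}`, a query from peer 0 with TTL 3 reaches all five peers but costs six messages, two of them duplicates.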

Document Popularity
– Follows a Zipf distribution: long-tailed
– Popular documents become more popular with Napster/Gnutella
– Currently, users need to resubmit queries in the hope that someone will answer
– Wish-list based system
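To put "long-tailed" into numbers, the sketch below computes the share of requests captured by the top-k documents when popularity is proportional to 1/rank^alpha. The exponent alpha = 1 and the name `zipf_share` are assumptions; the slides give no exponent.

```python
def zipf_share(num_docs, top_k, alpha=1.0):
    """Fraction of requests going to the top_k most popular of num_docs documents."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_docs + 1)]
    return sum(weights[:top_k]) / sum(weights)
```

With 1000 documents and alpha = 1, the top 100 documents draw about 69% of requests, leaving roughly 31% spread thinly over the remaining 900, which is why rare documents are hard to find by resubmitting queries.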

Wide-area Network Measurements
– Sending even a few packets can be identified as hostile behavior
– Even a few SYN packets are sufficient to trigger software firewalls: a dialogue box pops up (“possible scan from washington.edu”, click OK or Cancel)
– Many confused, angry, threatening e-mails were sent to many people (security, root, Ed):
  – active Internet measurements are not simple to perform

Excerpt from an e-mail: “Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned.”

Current Work
– Quantify and show that current proposals are too rigid for Napster/Gnutella-like peer dynamics
– Wish-list, delayed exchange system: a big distributed scheduling problem
– SGet: a downloading tool with automatic server selection; no bandwidth is wasted

Questions?
[Photo: Beautiful Sieg Hall, “Pride of UW”]