Characterizing Files in the Modern Gnutella Network: A Measurement Study. Shanyu Zhao, Daniel Stutzbach, Reza Rejaie, University of Oregon. SPIE Multimedia Computing and Networking (MMCN'06).

Presentation transcript:

1 Characterizing Files in the Modern Gnutella Network: A Measurement Study. Shanyu Zhao, Daniel Stutzbach, Reza Rejaie, University of Oregon. SPIE Multimedia Computing and Networking 2006 (MMCN'06), 18-19 January 2006, San Jose, California, USA

2 Outline: A measurement study of the modern Gnutella system. We conduct static, topological, and dynamic analyses of the shared files to help improve the design and evaluation of P2P file-sharing applications.

3 Previous studies: They focus on a small population of peers, are more than three years old, and do not examine the dynamics of file characteristics over time or the correlation between the overlay topology and the distribution of files.

4 Why Gnutella? It is one of the top three P2P file-sharing systems (eDonkey2K, FastTrack, Gnutella). Gnutella's Browse-Host extension makes it possible to extract the list of files shared by each peer. It is also one of the most studied P2P systems, so our results can be compared and contrasted with previous studies.

5 Original Gnutella: A new node (Node A) joins the system. Node A connects to some existing node (Node B), found through a pre-existing list, a particular website, IRC, etc. Node B sends Node A a list of working nodes, and Node A connects to the provided nodes until it reaches a certain threshold. During a search, Node A sends requests to its connected nodes, which in turn forward the requests to their own neighbors.
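The search step above is a flood: each node forwards the query to its neighbors, bounded by a hop limit. Gnutella queries carry a TTL (time-to-live) hop counter, and nodes drop duplicate copies of a query they have already seen. The Python sketch below models only that behavior. It assumes the overlay is given as an adjacency dict, and the names `flood_query`, `adj`, and `has_file` are hypothetical and used only for illustration.

```python
from collections import deque

def flood_query(adj, origin, has_file, ttl):
    """Return the nodes holding the file that a query flooded from `origin` reaches.

    adj: dict mapping node -> list of neighbor nodes (hypothetical overlay model)
    has_file: predicate telling whether a node shares the requested file
    ttl: maximum number of hops the query may travel
    """
    seen = {origin}                      # duplicate suppression: forward each query once
    frontier = deque([(origin, ttl)])
    hits = set()
    while frontier:
        node, remaining = frontier.popleft()
        if node != origin and has_file(node):
            hits.add(node)               # this node would send back a positive reply
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, remaining - 1))
    return hits
```

Because the flood proceeds breadth-first, every node is reached along a shortest path, so the TTL check matches the node's hop distance from the querying node.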

6 Original Gnutella: Nodes reply to the request directly or indirectly, depending on whether a firewall is present. Node A then downloads file pieces from one or more nodes that replied positively. Unlike Napster, Gnutella is decentralized and uses flood-based search.

7 Modern Gnutella: In contrast to the original flat overlay topology, most modern Gnutella clients adopt a two-tier overlay structure made up of ultrapeers and leaf peers (the majority). Legacy peers do not implement the ultrapeer feature.
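Here is a rough illustration of how the two-tier structure changes search. Only ultrapeers flood queries among themselves, and each ultrapeer answers on behalf of its leaves. The Python sketch below assumes that every ultrapeer has a complete index of its leaves' files. This is a deliberate simplification of how real clients route queries to leaves, and it ignores legacy peers. All names (`two_tier_query`, `up_adj`, `leaves_of`, `files_of`) are hypothetical.

```python
from collections import deque

def two_tier_query(up_adj, leaves_of, files_of, origin_up, target, ttl):
    """Flood a query over the ultrapeer tier only; leaves never forward queries.

    up_adj: ultrapeer -> list of neighboring ultrapeers (hypothetical model)
    leaves_of: ultrapeer -> list of its leaf peers
    files_of: peer -> set of file names it shares
    origin_up: ultrapeer where the query enters (e.g. the querying leaf's ultrapeer)
    """
    seen = {origin_up}
    frontier = deque([(origin_up, ttl)])
    hits = set()
    while frontier:
        up, remaining = frontier.popleft()
        # Assumption: the ultrapeer checks its own files and, on their behalf, its leaves' files.
        for peer in [up, *leaves_of.get(up, [])]:
            if target in files_of.get(peer, set()):
                hits.add(peer)
        if remaining == 0:
            continue
        for nbr in up_adj[up]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, remaining - 1))
    return hits
```

The design point it shows is that leaves consume no forwarding bandwidth, so a query with the same TTL touches far fewer nodes than in the flat overlay while still covering every leaf attached to the ultrapeers it reaches.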

8 Measurement methodology: General crawlers are slow, so their snapshots are distorted and inflate the population. Previous studies captured partial snapshots or periodically probed a fixed group of peers, so the significance of their results is doubtful. The goal of this work is to capture the entire population (?) within a short period.
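Slow crawls inflate the population because of churn. The longer the crawl lasts, the more peers join and leave while it is running, and all of them end up in the snapshot. The Python sketch below compares two counts: the peers online at the instant the crawl starts, and the peers a crawl spanning a window of a given length could encounter. It assumes an idealized crawler that sees every peer online at any moment of the window, so the second count is an upper bound. The session data and the name `observed_vs_actual` are hypothetical.

```python
def observed_vs_actual(sessions, start, duration):
    """Compare the instantaneous population with what a crawl of `duration` could observe.

    sessions: list of (join_time, leave_time) intervals, one per peer (hypothetical data)
    Returns (peers online at `start`, peers online at any time in [start, start + duration)).
    """
    end = start + duration
    # Peers actually present when the crawl begins.
    at_start = sum(1 for join, leave in sessions if join <= start < leave)
    # Peers whose session overlaps the crawl window; an idealized crawler records them all.
    seen = sum(1 for join, leave in sessions if join < end and leave > start)
    return at_start, seen
```

With the same sessions, a short crawl reports close to the true instantaneous population, while a long crawl also counts peers that left early or joined late.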

9 Measurement methodology: A topology crawl collects the list of neighboring nodes of each peer. A content crawl collects the list of files available at each node. Need more.

10 Cruiser: A parallel P2P crawler, orders of magnitude faster than previous crawlers (?). It uses a master-slave architecture: each slave crawls hundreds of peers concurrently, and the master coordinates multiple slaves, which increases the degree of concurrency.
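The coordination pattern can be pictured with a single-process Python sketch. A coordinating loop plays the role of the master and hands the current frontier of discovered peers to a thread pool, which plays the role of the slaves. The workers query those peers in parallel, and their results seed the next frontier. This is not Cruiser's actual design, which distributes slaves across machines. The round-by-round structure is a simplification, and `parallel_crawl` and `query_neighbors` (a caller-supplied function returning a peer's neighbor list, or `None` if the peer is unreachable) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_crawl(seeds, query_neighbors, max_workers=8):
    """Breadth-first topology crawl with concurrent peer queries.

    query_neighbors(peer) -> list of neighbor peers, or None if unreachable (hypothetical).
    Returns (topology dict peer -> neighbors, set of unreachable peers).
    """
    discovered = set(seeds)
    frontier = list(seeds)
    topology, unreachable = {}, set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # "Master": dispatch the whole frontier to the worker pool ("slaves").
            results = pool.map(query_neighbors, frontier)
            next_frontier = []
            for peer, nbrs in zip(frontier, results):
                if nbrs is None:
                    unreachable.add(peer)        # e.g. firewalled or departed
                    continue
                topology[peer] = list(nbrs)
                for n in nbrs:
                    if n not in discovered:      # schedule each new peer exactly once
                        discovered.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
    return topology, unreachable
```

Raising `max_workers` is the knob that corresponds to the "degree of concurrency" on the slide. Because most of the time is spent waiting on slow or unresponsive peers, many outstanding queries per process pay off even on modest hardware.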

11 Cruiser: Using six off-the-shelf 1 GHz GNU/Linux boxes, a crawl takes 15 min + 5.5 hr + 15 min ≈ 6 hours. Each content crawl produces a 10 GB log file containing file names and content hashes.

12 Dataset: Three measurement periods (6/8/2005-6/18/2005, 8/23/2005-9/9/2005, and 10/11/ /21/2005); within each period, a snapshot is taken every day. This makes it possible to examine both short and long timescales.

13 Dataset

14 Sources of unreachable nodes: firewalls, severe network congestion, departed peers, and lack of support for the Browse Host protocol. Ultrapeers are unreachable mainly because they have departed; leaf peers, because they have departed or are firewalled. Contact 20% of peers (~half a million).

15 Problems: Low-bandwidth TCP connections: some crawls do not complete within the timeout threshold because the data are sent at an extremely low rate. File identity: a file name is not a reliable file identifier, so this work uses the content hash. Post-processing: there are more than 100 million distinct files, so the data are randomly divided into 7 segments, files with fewer than 10 copies in a segment are trimmed, and the trimmed segments are combined back into one.
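The post-processing step can be sketched in Python as follows. The slide does not say exactly what is assigned to segments. This sketch assumes each (peer, content hash) record, one per shared copy, is assigned to one of 7 segments uniformly at random. Within each segment, copies are counted per hash and hashes with fewer than 10 copies are dropped. The surviving counts are then merged. The function name `trim_by_segments` and the record layout are hypothetical.

```python
import random
from collections import Counter

def trim_by_segments(records, num_segments=7, min_copies=10, seed=0):
    """Randomly split copy records into segments, trim rare files per segment, recombine.

    records: iterable of (peer_id, content_hash) pairs, one per shared copy (assumed layout)
    Returns a Counter mapping content_hash -> number of copies kept after trimming.
    """
    rng = random.Random(seed)
    segments = [[] for _ in range(num_segments)]
    for rec in records:
        segments[rng.randrange(num_segments)].append(rec)   # random segment assignment
    combined = Counter()
    for seg in segments:
        counts = Counter(content_hash for _, content_hash in seg)
        for content_hash, copies in counts.items():
            if copies >= min_copies:                          # trim rare files in this segment
                combined[content_hash] += copies
    return combined
```

Each segment only needs a counter over its own records, which keeps memory bounded. The trade-off is that a file with many copies overall but fewer than `min_copies` in some segment loses those copies, so very rare files are discarded while popular ones survive intact.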

16 Static analysis: ratio of free riders; degree of resource sharing among cooperative peers; file popularity distribution; file type analysis.

17 Ratio of free riders: The ratio of free riders has dropped. It is lower among ultrapeers and slightly higher among long-lived peers, and the number of shared files is not strongly correlated.
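Computing the free-rider ratio is simple once the content crawl has produced a file count per peer. The Python sketch below assumes a free rider is a reachable peer that shares zero files, and it breaks the ratio down by peer type. The input format and the name `free_rider_ratio` are hypothetical.

```python
from collections import defaultdict

def free_rider_ratio(peers):
    """Fraction of peers sharing zero files, overall and per peer type.

    peers: list of (peer_type, num_shared_files) tuples, e.g. ("leaf", 0) (assumed format)
    Returns a dict such as {"leaf": ..., "ultrapeer": ..., "all": ...}.
    """
    totals = defaultdict(int)
    free = defaultdict(int)
    for peer_type, num_files in peers:
        for key in (peer_type, "all"):
            totals[key] += 1
            if num_files == 0:          # assumption: free rider = shares no files
                free[key] += 1
    return {key: free[key] / totals[key] for key in totals}
```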

18 Degree of resource sharing among cooperative peers: The number of peers sharing x files follows a power-law distribution.

19 Degree of resource sharing among cooperative peers: The disk space contributed by peers follows a power-law distribution.

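20 Degree of resource sharing among cooperative peers: The correlation between the number of shared files and the contributed disk space is not as strong as in previous studies. There is a discernible line with a slope of 3.7 MB/file, which is the typical size of an MP3 audio file.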
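A slope of 3.7 MB/file means that a peer sharing 1,000 such files contributes roughly 3.7 GB. Assuming the line passes through the origin, a peer lies on it exactly when its contributed space divided by its file count is about 3.7 MB. So the line's strength can be gauged by the fraction of sharing peers whose average file size falls near that value. The Python sketch below does this. The 10% tolerance and the name `peers_near_slope` are hypothetical choices, not taken from the study.

```python
def peers_near_slope(num_files, space_mb, slope=3.7, tol=0.1):
    """Fraction of sharing peers whose average file size is within `tol` (relative) of `slope`.

    num_files[i], space_mb[i]: files shared and disk space (MB) contributed by peer i.
    Peers sharing no files (free riders) are skipped.
    """
    ratios = [space / files for files, space in zip(num_files, space_mb) if files > 0]
    near = sum(1 for r in ratios if abs(r - slope) <= tol * slope)
    return near / len(ratios) if ratios else 0.0
```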

21 File popularity distribution

22 File type analysis

23 File type analysis
Type    Previous studies                  Current study
Music   67.2% of files, 79.2% of bytes    67% of files, 40% of bytes
Video   2.1% of files, 19.1% of bytes     6% of files, 52.5% of bytes

24 Topological analysis: per-file perspective (figures a and b); per-peer perspective (figure c).

25 Topological analysis: Churn (the dynamics of peer participation) is the dominant factor: peers depart, new peers join, and leaf peers become ultrapeers. This rapid change in the overlay topology prevents the formation of topological clustering of files.
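One way to test for topological clustering is to measure how similar the shared-file sets of two peers are, grouped by the number of overlay hops between them. The Python sketch below uses mean Jaccard similarity (shared files divided by the union of files) for peer pairs one, two, and three hops apart. Jaccard similarity is an illustrative choice and not necessarily the metric used in the study. The names `hop_distances`, `jaccard`, and `mean_similarity_by_hops` are hypothetical. If files are randomly placed, the averages should not decrease noticeably with hop distance.

```python
from collections import deque

def hop_distances(adj, src, max_hops):
    """Breadth-first search: hop distance from src to every node within max_hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_similarity_by_hops(adj, files, max_hops=3):
    """Mean Jaccard similarity of shared-file sets for peer pairs at each hop distance.

    adj: peer -> list of neighbors; files: peer -> set of content hashes (assumed inputs).
    Peer ids must be orderable so that each unordered pair is counted once.
    """
    sums = {h: 0.0 for h in range(1, max_hops + 1)}
    counts = {h: 0 for h in range(1, max_hops + 1)}
    for src in adj:
        for dst, h in hop_distances(adj, src, max_hops).items():
            if h > 0 and src < dst:
                sums[h] += jaccard(files[src], files[dst])
                counts[h] += 1
    return {h: sums[h] / counts[h] for h in sums if counts[h]}
```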

26 Dynamic analysis: variations in the files shared by individual peers; variations in the popularity of individual files; trends in popularity variations.

27 Variations in shared files by individual peers

28 Variations in the popularity of individual files: focus on the top 100 and top 1000 files.
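Given daily snapshots that map each content hash to its number of copies, two simple measures of day-to-day variation for the top-N files are sketched below in Python. The first is how much of the top-N set survives into the next day. The second is the relative change in copy count of each of the first day's top-N files. These metrics and the names `top_n`, `top_n_overlap`, and `relative_changes` are illustrative assumptions, not the study's definitions.

```python
def top_n(counts, n):
    """The n most popular content hashes (ties broken by hash for determinism)."""
    return sorted(counts, key=lambda h: (-counts[h], h))[:n]

def top_n_overlap(day1, day2, n):
    """Fraction of day1's top-n files that are still in the top n on day2."""
    return len(set(top_n(day1, n)) & set(top_n(day2, n))) / n

def relative_changes(day1, day2, n):
    """Relative change in copy count for each of day1's top-n files."""
    return {h: (day2.get(h, 0) - day1[h]) / day1[h] for h in top_n(day1, n)}
```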

29 Trends in popularity variations: track the top 10 files across several days (figures a and b) and over several months (figure c).

30 Conclusion: We use a parallel crawler to obtain snapshots of peer connectivity and available files, and conduct three types of analysis to understand the distribution, correlation, and dynamics of the available files.

31 Summary of findings: Free riding has dropped significantly. The number of shared files and the storage space contributed by individual peers follow power-law distributions: most peers contribute little disk space (<100 MB), while a small number of peers contribute a very large amount (50-100 GB). The popularity of individual files follows a Zipf distribution: a small number of files are extremely popular, while the majority of files are very unpopular.
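Under a Zipf distribution, the popularity of the r-th most popular file falls off roughly as r^(-α). This sketch assumes popularity is measured as the number of copies of a file across peers. One rough way to estimate α, shown below in Python, is a least-squares fit of log(copies) against log(rank). This is not claimed to be the fitting method used in the study. Least squares on a log-log plot is known to be biased, so a maximum-likelihood estimator is generally preferred for careful work. The name `zipf_exponent` is hypothetical.

```python
import math

def zipf_exponent(copy_counts):
    """Estimate the Zipf exponent alpha from per-file copy counts (needs >= 2 files).

    Fits log(count) = c - alpha * log(rank) by ordinary least squares.
    """
    ranked = sorted((c for c in copy_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx   # the slope is -alpha
```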

32 Summary of findings: The most popular file type is MP3 (2/3 of all files, 1/3 of all bytes). The popularity of video files, and the space they occupy, has tripled over the past few years. There are fewer than 1/10 as many video files as audio files, but video files occupy 25% more bytes. Multimedia files account for 93% of bytes and 73% of files.

33 Summary of findings: Files are randomly distributed; there is no strong correlation between the files available at peers that are one, two, or three hops apart in the overlay topology. The files shared by individual peers change slowly over a timescale of days, and more popular files experience larger variations in popularity.