1 Characterizing Files in the Modern Gnutella Network: A Measurement Study
Shanyu Zhao, Daniel Stutzbach, Reza Rejaie
University of Oregon
SPIE Multimedia Computing and Networking 2006 (MMCN’06), 18-19 January 2006, San Jose, California, USA
2 Outline
- Measurement study of the modern Gnutella system
- Static, topological, and dynamic analyses of shared files
- Goal: help improve the design and evaluation of P2P file-sharing applications
3 Previous studies
- Focused on a small population of peers
- Are more than three years old
- Did not examine how file characteristics change over time, or the correlation between the overlay topology and the file distribution
4 Why Gnutella
- One of the top three P2P file-sharing systems (eDonkey2K, FastTrack, Gnutella)
- Gnutella's Browse-Host extension lets a crawler retrieve the list of files shared by each peer
- One of the most studied P2P systems, so results can be compared and contrasted with previous studies
5 Original Gnutella
- A new node (Node A) joins the system
- Node A connects to an existing node (Node B) found through a pre-existing list, a particular website, IRC, etc.
- Node B sends Node A a list of working nodes
- Node A connects to the provided nodes until it reaches a certain threshold
- During a search, Node A sends queries to its connected nodes, which in turn forward them
6 Original Gnutella (cont.)
- Nodes reply to a query directly or indirectly, depending on whether a firewall is present
- Node A downloads file pieces from one or more nodes that replied positively
- Unlike Napster, Gnutella is decentralized and uses flood-based search
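The flood-based search described on slides 5 and 6 can be sketched as a TTL-limited breadth-first flood. This is a minimal illustration, assuming an in-memory overlay graph; `flood_search`, its parameters, and the data layout are hypothetical, not Gnutella's wire protocol. Real query hits travel back along the reverse query path instead of being collected centrally.

```python
from collections import deque


def flood_search(overlay, source, shared, keyword, ttl=7):
    """Return peers (other than `source`) sharing a file whose name contains `keyword`.

    overlay: dict mapping each peer to a list of neighbors (hypothetical in-memory graph)
    shared:  dict mapping each peer to a set of shared file names
    ttl:     hop limit; the query reaches peers at most `ttl` hops from `source`
    """
    hits = set()
    seen = {source}  # drop duplicate copies of the query, as Gnutella does by message ID
    frontier = deque([(source, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != source and any(keyword in name for name in shared.get(peer, ())):
            hits.add(peer)  # a real query hit would be routed back hop by hop
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in overlay.get(peer, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits
```

Because the frontier is processed in hop order, each peer is first reached along a shortest path and therefore with the largest remaining TTL.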
7 Modern Gnutella
- In contrast to the original flat overlay topology, most modern Gnutella clients adopt a two-tier overlay structure
- Ultrapeers and leaf peers (leaves are the majority)
- Legacy peers (do not implement the ultrapeer feature)
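A rough sketch of how a query moves through the two-tier overlay: a leaf hands its query to its ultrapeers, only ultrapeers flood among themselves, and each ultrapeer answers for its own leaves. It assumes each ultrapeer knows its leaves' file names exactly, a simplification of the hashed query-routing tables leaves send to their ultrapeers in real clients. All names here are hypothetical.

```python
from collections import deque


def two_tier_search(ultra_overlay, leaves_of, shared, source_leaf, parents, keyword, ttl=2):
    """Return peers sharing a file whose name contains `keyword`.

    ultra_overlay: dict ultrapeer -> list of neighboring ultrapeers (top tier)
    leaves_of:     dict ultrapeer -> list of leaf peers attached to it
    shared:        dict peer -> set of shared file names
    source_leaf:   the querying leaf; parents: the ultrapeers it is connected to
    """
    def matches(peer):
        return any(keyword in name for name in shared.get(peer, ()))

    hits = set()
    seen = set(parents)
    frontier = deque((up, ttl) for up in parents)
    while frontier:
        up, remaining = frontier.popleft()
        if matches(up):
            hits.add(up)
        # Leaves never relay queries; the ultrapeer answers for leaves whose files match.
        hits.update(leaf for leaf in leaves_of.get(up, ()) if leaf != source_leaf and matches(leaf))
        if remaining == 0:
            continue
        for neighbor in ultra_overlay.get(up, ()):  # flooding happens only among ultrapeers
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits
```

The design point is that leaves, the majority of peers, never carry flooded query traffic.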
8 Measurement methodology
- Problems of general-purpose crawlers: slow, produce distorted snapshots, inflate the population
- Previous studies: partial snapshots, or periodic probes of a fixed group of peers; their significance is doubtful
- Goal of this work: capture the entire population (?) within a short period
9 Measurement methodology (cont.)
- Topology crawl: the list of neighboring nodes of each peer
- Content crawl: the list of files available at each peer
- Need more
10 Cruiser
- Parallel P2P crawler
- Orders of magnitude faster than previous crawlers (?)
- Master-slave architecture: each slave crawls hundreds of peers concurrently, and the master coordinates multiple slaves
- Increases the degree of concurrency
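The master-slave idea can be illustrated with thread pools: the master splits the set of not-yet-crawled peers among slaves, each slave contacts its whole batch concurrently, and newly discovered neighbors feed the next round. This is a sketch under stated assumptions, not Cruiser's implementation; `fetch_neighbors` is a hypothetical stand-in for the handshake that returns a peer's neighbor list.

```python
from concurrent.futures import ThreadPoolExecutor


def slave_crawl(batch, fetch_neighbors, max_conns=100):
    """One slave: contact every peer in `batch` concurrently; return {peer: neighbors}."""
    with ThreadPoolExecutor(max_workers=max_conns) as pool:
        return dict(zip(batch, pool.map(fetch_neighbors, batch)))


def master_crawl(bootstrap, fetch_neighbors, num_slaves=6, max_conns=100):
    """Master: hand uncrawled peers to slaves, round by round, until nothing new appears."""
    topology = {}
    frontier = sorted(set(bootstrap))
    with ThreadPoolExecutor(max_workers=num_slaves) as slaves:
        while frontier:
            # Round-robin split of the frontier across the slaves.
            batches = [frontier[i::num_slaves] for i in range(num_slaves)]
            futures = [slaves.submit(slave_crawl, b, fetch_neighbors, max_conns)
                       for b in batches if b]
            discovered = set()
            for future in futures:
                for peer, neighbors in future.result().items():
                    topology[peer] = neighbors
                    discovered.update(neighbors)
            frontier = sorted(discovered - set(topology))  # only peers not yet crawled
    return topology
```

Usage: `master_crawl(["seed-peer"], my_fetch)`, where `my_fetch` returns a list of neighbor addresses (or an empty list for an unreachable peer).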
11 Cruiser (cont.)
- Using 6 off-the-shelf 1 GHz GNU/Linux boxes, a crawl takes 15 min + 5.5 h + 15 min, about 6 hours
- Each content crawl produces a 10 GB log file containing file names and content hashes
12 Dataset
- Three measurement periods; within each period, a snapshot is taken every day: 6/8/2005-6/18/2005, 8/23/2005-9/9/2005, and 10/11/2005-10/21/2005
- Allows examining both short and long timescales
13 Dataset
14 Sources of unreachable nodes
- Firewall
- Severe network congestion
- Peer has departed
- Peer does not support the Browse-Host protocol
- Ultrapeers: mainly departure; leaf peers: departure and firewalls
- Contact 20% of peers (~half a million)
15 Problems
- Low-bandwidth TCP connections: some crawls do not complete before the timeout threshold because data is sent at an extremely low rate
- File identity: the file name is not a reliable file identifier, so this work uses the content hash
- Post-processing: there are more than 100 million distinct files; the data is randomly divided into 7 segments, files with fewer than 10 copies in a segment are trimmed, and the trimmed segments are combined back into one
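One plausible reading of the post-processing step, sketched in Python: records are identified by content hash, partitioned into 7 segments, counted per segment, and files with fewer than 10 copies are dropped before merging. The slide says the division is random; here the segment is chosen by hashing the content hash, an assumption that keeps every copy of a file in the same segment so per-segment counts equal global counts. `trim_rare_files` is a hypothetical name.

```python
import hashlib
from collections import Counter


def trim_rare_files(records, num_segments=7, min_copies=10):
    """records: iterable of (peer_id, content_hash) pairs, one per shared copy.

    Returns {content_hash: copies} for files with at least `min_copies` copies.
    """
    segments = [[] for _ in range(num_segments)]
    for peer, content_hash in records:
        # Pseudo-random but deterministic segment choice: all copies of a file land together.
        idx = int(hashlib.sha1(content_hash.encode()).hexdigest(), 16) % num_segments
        segments[idx].append((peer, content_hash))

    kept = {}
    for segment in segments:  # each segment is counted on its own, bounding memory
        counts = Counter(h for _, h in segment)
        kept.update({h: n for h, n in counts.items() if n >= min_copies})
    return kept  # merged result of all trimmed segments
```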
16 Static analysis
- Ratio of free riders
- Degree of resource sharing among cooperative peers
- File popularity distribution
- File type analysis
17 Ratio of free riders
- The fraction of free riders has dropped
- The free-rider ratio is lower among ultrapeers and slightly higher among long-lived peers
- Number of shared files: no strong correlation
18 Degree of resource sharing among cooperative peers
- The number of peers sharing x files follows a power-law distribution
19 Degree of resource sharing among cooperative peers (cont.)
- Contributed disk space follows a power-law distribution
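A quick way to check the power-law claims on slides 18 and 19 is to build the histogram and fit a line on log-log axes; a power law shows up as a straight line with negative slope. This is a crude least-squares sketch with hypothetical function names, not the study's fitting procedure; maximum-likelihood estimators are more robust for real data. The same `loglog_slope` applies to binned disk-space values.

```python
import math
from collections import Counter


def peers_sharing_histogram(files_per_peer):
    """Map x -> number of peers sharing exactly x files; free riders (x = 0) are excluded."""
    return dict(sorted(Counter(n for n in files_per_peer if n > 0).items()))


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x), using only positive pairs."""
    pts = [(math.log10(x), math.log10(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    return sxy / sxx
```

Usage: `hist = peers_sharing_histogram(counts)` followed by `loglog_slope(list(hist), list(hist.values()))`.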
20 Degree of resource sharing among cooperative peers (cont.)
- The correlation between the number of shared files and contributed disk space is not as strong as in previous studies
- A discernible line with slope 3.7 MB/file, the typical size of an MP3 audio file
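The two quantities on this slide, correlation strength and a slope in MB per file, can be computed as below. This is an illustrative sketch with hypothetical names: Pearson's r measures how tightly disk space tracks file count, and a least-squares fit through the origin gives the average MB per file. It is not necessarily how the study identified the 3.7 MB/file line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope_through_origin(xs, ys):
    """Least-squares b for y ~ b * x; with x = files and y = MB, b is MB per file."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```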
21 File popularity distribution
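The Zipf claim (summarized on slide 31) can be checked by ranking files by popularity and fitting log(popularity) against log(rank). The sketch assumes popularity means the number of distinct peers sharing a file, identified by content hash; the function names are hypothetical, and the least-squares fit is a rough estimator.

```python
import math
from collections import Counter


def rank_popularity(records):
    """records: (peer_id, content_hash) pairs. Return per-file popularity, most popular first."""
    peers_per_file = Counter(h for _, h in set(records))  # distinct peers per file
    return sorted(peers_per_file.values(), reverse=True)


def zipf_exponent(popularity):
    """Estimate alpha in popularity(rank) ~ C * rank**(-alpha) by a log-log least-squares fit."""
    pts = [(math.log(r), math.log(p)) for r, p in enumerate(popularity, start=1) if p > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx
```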
22 File type analysis
23 File type analysis
          Previous studies            Current study
Music     67.2% files, 79.2% bytes    67% files, 40% bytes
Video     2.1% files, 19.1% bytes     6% files, 52.5% bytes
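The table's "% of files" and "% of bytes" columns come from tallying files and their sizes per type. A sketch of that tally, assuming classification by file extension; `TYPE_BY_EXT` is a hypothetical mapping, since the study's classification rules are not given on the slide.

```python
import os
from collections import defaultdict

# Hypothetical extension-to-type mapping; anything unlisted counts as "other".
TYPE_BY_EXT = {".mp3": "music", ".wma": "music", ".avi": "video", ".mpg": "video", ".wmv": "video"}


def type_shares(files):
    """files: (file_name, size_bytes) pairs. Return {type: (pct_of_files, pct_of_bytes)}."""
    count = defaultdict(int)
    size = defaultdict(int)
    for name, nbytes in files:
        kind = TYPE_BY_EXT.get(os.path.splitext(name)[1].lower(), "other")
        count[kind] += 1
        size[kind] += nbytes
    total_files = sum(count.values())
    total_bytes = sum(size.values())
    return {k: (100 * count[k] / total_files, 100 * size[k] / total_bytes) for k in count}
```

A type with a small share of files but a large share of bytes, like video in the current study, has a much larger average file size.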
24 Topological analysis
- Per-file perspective (figures a and b)
- Per-peer perspective (figure c)
25 Topological analysis (cont.)
- Churn (the dynamics of peer participation) is the dominant factor: peers depart, peers join, and leaf peers become ultrapeers
- Rapid change in the overlay topology prevents topological clustering of files from forming
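One way to test for topological clustering is to compare the file sets of peers that are exactly h hops apart. The sketch below uses average Jaccard similarity per hop distance. This metric and all names are illustrative assumptions, not the study's method. Under random placement, similarity should not fall off with h.

```python
from collections import deque


def hop_distances(overlay, src, max_hops):
    """BFS hop distance from `src` to every peer within `max_hops`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in overlay.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity_by_hops(overlay, shared, max_hops=3):
    """Average Jaccard similarity of shared-file sets for peer pairs exactly h hops apart."""
    totals = {h: [0.0, 0] for h in range(1, max_hops + 1)}
    for src in overlay:
        for dst, h in hop_distances(overlay, src, max_hops).items():
            if h >= 1 and src < dst:  # undirected overlay: count each pair once
                totals[h][0] += jaccard(shared.get(src, set()), shared.get(dst, set()))
                totals[h][1] += 1
    return {h: s / n for h, (s, n) in totals.items() if n}
```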
26 Dynamics analysis
- Variations in the files shared by individual peers
- Variations in the popularity of individual files
- Trends in popularity variations
27 Variations in the files shared by individual peers
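Per-peer variation can be quantified by comparing a peer's file set across two daily snapshots. A minimal sketch, assuming each snapshot maps peer IDs to sets of content hashes; the added/removed counts and Jaccard similarity are illustrative metrics, and `daily_changes` is a hypothetical name.

```python
def daily_changes(snapshot_a, snapshot_b):
    """Compare two daily snapshots {peer: set_of_content_hashes} for peers seen in both.

    Returns {peer: (added, removed, similarity)}; similarity is the Jaccard index of the
    two file sets, so 1.0 means the peer's shared files did not change.
    """
    changes = {}
    for peer in snapshot_a.keys() & snapshot_b.keys():
        old, new = snapshot_a[peer], snapshot_b[peer]
        union = old | new
        similarity = len(old & new) / len(union) if union else 1.0
        changes[peer] = (len(new - old), len(old - new), similarity)
    return changes
```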
28 Variations in the popularity of individual files
- Focus on the top 100 and top 1000 files
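Popularity variation among the top-k files can be summarized by how many of one day's top-k remain in the next day's top-k, and by each file's relative change in copies. A sketch with hypothetical names, assuming per-day popularity dictionaries keyed by content hash; ties are broken by hash so the result is deterministic.

```python
def top_k(popularity, k):
    """popularity: {content_hash: copies}. Return the set of the k most popular hashes."""
    return set(sorted(popularity, key=lambda h: (-popularity[h], h))[:k])


def top_k_overlap(day1, day2, k):
    """Fraction of day1's top-k files that are still in day2's top-k."""
    return len(top_k(day1, k) & top_k(day2, k)) / k


def relative_change(day1, day2, files):
    """(copies on day2 - copies on day1) / copies on day1, for files present on day1."""
    return {h: (day2.get(h, 0) - day1[h]) / day1[h] for h in files if day1.get(h)}
```

Usage: `relative_change(d1, d2, top_k(d1, 100))` gives the day-over-day change for the top 100 files.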
29 Trends in popularity variations
- Track the top 10 files across several days (figures a and b)
- Track them over several months (figure c)
30 Conclusion
- Used a parallel crawler to obtain snapshots of peer connectivity and available files
- Conducted three types of analysis (static, topological, and dynamic)
- Characterized the distribution, correlation, and dynamics of available files
31 Summary of findings
- Free riding has dropped significantly
- The number of shared files and the storage space contributed by individual peers follow a power-law distribution: most peers contribute little disk space (<100 MB), while a small number of peers contribute very large amounts (50-100 GB)
- The popularity of individual files follows a Zipf distribution: a small number of files are extremely popular, while the majority of files are very unpopular
32 Summary of findings (cont.)
- The most popular file type is MP3 (2/3 of all files, 1/3 of all bytes)
- The popularity of video files and the space they occupy have roughly tripled over the past few years
- There are fewer than 1/10 as many video files as audio files, yet video files occupy 25% more bytes
- Multimedia files account for 93% of bytes and 73% of files
33 Summary of findings (cont.)
- Files are randomly distributed: there is no strong correlation between the files available at peers that are one, two, or three hops apart in the overlay topology
- The files shared by individual peers change slowly over timescales of days; more popular files experience larger variations in popularity