Correlating Topology and Path Characteristics of Overlay Networks and the Internet
  GP2PC’06, in conjunction with IEEE CCGrid2006, June 3
  A. Iosup, P. Garbacki, J. Pouwelse, D.H.J. Epema
  PDS Group, ST/EWI, TU Delft
Outline
  Motivation, Goals, and Statistics
  Background: The BitTorrent File-Sharing Network
  The MultiProbe Framework
  The Measurements Setup
  The Results
  Using Our Results
  Conclusion
P2P File-Sharing is Growing at a Fast Pace…
  85M daily P2P file-sharing users [Pouwelse, ICT Kenniscongres’06]
  From 10% to 70% of Internet traffic in 5 years: P2P file-sharing is the largest Internet application today [Parker, CacheLogic’04 & ’06]
  [Chart: P2P file-sharing, ~70% of Internet traffic]
… we Need to Understand Behavior and Performance …
  Measuring Underlay/Overlay Networks
    How to build a large-scale infrastructure for measuring P2P and Internet characteristics at the same time?
    How to measure a representative part of a P2P network?
  Characterizing Overlay Networks and Their Users
    Where are the overlay network users located?
    What is the geographical distribution of traffic?
    What is the connectivity amongst users?
    What is the application throughput?
  Correlating Underlay/Overlay Measurements
    How do P2P file-sharing networks map to their Internet underlay?
… through Measurements of (Arguably) the Largest P2P File-Sharing Network: BitTorrent
  “BitTorrent traffic amounts to 20% of Tier 1 and 2 ISPs traffic!”
  “BitTorrent traffic amounts to 50%+ P2P File-Sharing traffic” [Parker, IEEE WCW2005]
Related experimental work
  Measurements
    Gnutella [Saroiu et al., MMCN’02]
    Kazaa [Ripeanu et al., WIAPP’03; Gummadi et al., SOSP’03]
    BitTorrent: flashcrowds, infrastructure availability, content integrity, peer download bandwidth [Izal, PAM’04; Pouwelse et al., IPTPS’05]
    eDonkey [Le Fessant et al., IPTPS’04]
    General [Sen & Wang, IMW’02]
  Injection and moderation
    Movies [Byers et al., ACM DRM WS’03]
    Kazaa pollution [Liang et al., Infocom’05]
  Download
    Freeriding [Adar & Huberman, XEROX TR ’00]
The BitTorrent P2P File-Sharing Network
  Data as torrents (file, chunks, .torrent)
  Peer, Tracker, and Web-site levels
  Tit-for-tat: use all available bandwidth
  Mostly fresh files (excellent support for spikes in interest, i.e., flashcrowds)
The MultiProbe Framework
  1. SiteStats
    Select the largest BitTorrent web site (sort by no. of torrents/users)
    Select the largest torrents (sort by no. of users)
  Active-start measurements: 2. GetPeers, 3. PeerPing
    Probes initiate contact with other peers
    Get bandwidth information
  Passive-start measurements: 4. ListenPeers, 5. TrackPeers
    Probes wait to be contacted by other peers
    Multi-source traceroute
  6. Post-processing
    Automated tools to process tens of GB of data
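The torrent-selection part of the SiteStats step (sort by number of users, keep the largest) can be sketched as below. This is a minimal illustration, not the MultiProbe code: the `Torrent` record, its fields, the `select_largest` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Torrent:
    name: str   # hypothetical identifier for a torrent listed on the web site
    users: int  # number of users the web site reports for this torrent


def select_largest(torrents, top_n=None, min_users=0):
    """Rank torrents by user count, largest first.

    min_users drops small torrents before ranking; top_n (if given)
    keeps only the first top_n entries of the ranking.
    """
    eligible = [t for t in torrents if t.users >= min_users]
    ranked = sorted(eligible, key=lambda t: t.users, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up sample listing, for illustration only.
sample = [Torrent("a", 40), Torrent("b", 1200), Torrent("c", 310)]
top2 = select_largest(sample, top_n=2)
big_only = select_largest(sample, min_users=101)
```

A threshold such as `min_users` selects every torrent above a given size, while `top_n` caps the number of torrents a probe set has to follow.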
The Measurements Setup
  Largest site: The Pirate Bay
  Active-start vs. passive-start torrents
The Results: Geographical Distribution
  BitTorrent is now represented globally, with Europe dominant
The Results: Application-Level Bandwidth
  Average bandwidth is 500 Kbps, double the value observed 2 years ago [Pouwelse et al., IPTPS’05]
  Two groups of continents with similar bandwidth characteristics:
    Europe, North America, and Asia
    South America, Oceania, and Africa
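A per-continent average like the one reported here could be computed from per-peer bandwidth samples roughly as follows. This is a sketch under assumptions: the `(continent, kbps)` record layout and the function name are hypothetical, and the sample values are made up, not measurements.

```python
from collections import defaultdict


def mean_bandwidth_by_continent(samples):
    """Average the bandwidth samples (in Kbps) observed for each continent.

    samples: iterable of (continent, kbps) pairs (hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # continent -> [sum, count]
    for continent, kbps in samples:
        acc = totals[continent]
        acc[0] += kbps
        acc[1] += 1
    return {c: total / count for c, (total, count) in totals.items()}


# Made-up values, for illustration only.
samples = [("Europe", 600.0), ("Europe", 400.0), ("Africa", 100.0)]
means = mean_bandwidth_by_continent(samples)
```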
The Results: Summary
  100 nodes DAS (the Dutch Grid), 50/300 nodes PlanetLab
  Shared data (files) traffic: 50 GB/day
  450,000 unique peers, 20M IP addresses
  2000/2000 torrents active-start
  695/750 torrents passive-start (Top150, +95% Top700)
  ~40M recorded events, ~10 GB uncompressed data/day
  Correlated Internet and overlay network characteristics
    Geographical distribution of BitTorrent users
    Average bandwidth 500 Kbps (doubled in 2 years)
    Over 75% of BitTorrent traffic is hidden (full range of TCP ports), while 50% of users / 25% of traffic are still on standard ports
    Distribution of inter-peer IP path hop count, AS traversals, intra-AS hop count, latency, …
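The split between standard and non-standard ports can be illustrated with a small classifier. The slides do not say which ports count as standard; the sketch below assumes the conventional BitTorrent default range 6881-6889. The `(port, bytes)` event layout and the function name are hypothetical, and the toy input is chosen only to mirror the 50% users / 25% traffic proportions, not taken from the traces.

```python
# Assumption: "standard ports" means the conventional default range 6881-6889.
STANDARD_PORTS = range(6881, 6890)


def share_on_standard_ports(events):
    """Return (peer_share, traffic_share) for peers on standard ports.

    events: iterable of (port, bytes_transferred), one entry per peer
    (hypothetical record layout).
    """
    events = list(events)
    if not events:
        return 0.0, 0.0
    standard = [(port, size) for port, size in events if port in STANDARD_PORTS]
    total_bytes = sum(size for _, size in events)
    peer_share = len(standard) / len(events)
    traffic_share = (
        sum(size for _, size in standard) / total_bytes if total_bytes else 0.0
    )
    return peer_share, traffic_share


# Toy input: half of the peers use standard ports but carry a quarter of the bytes.
events = [(6881, 100), (6885, 100), (51413, 300), (80, 300)]
shares = share_on_standard_ports(events)
```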
Using these results (1): Compare Previous (IPTPS’05) and Current Measurements
  Previously: cover the most recent 100 files
    Problem: potentially biased geographical distribution of peers
    Conclusion: Internet provider caching has potential
  Solution: cover ALL files with more than 100 users
    Confirmed continental distribution (Europe dominant)
    Strong location bias in previous measurements (Germany is not dominant)
    For some countries, ISP traversals = 0, so local caching can and should be used to improve user experience
    New insights regarding average bandwidth, TCP port mapping, AS/ISP coverage
Using these results (2): From BW to Collaborative Downloads
  Majority of users are connected through asymmetric links
  Asymmetric links are already at a disadvantage in BitTorrent (upload bandwidth limit, because of tit-for-tat)
  Avg. bandwidth of 500 Kbps > asymmetric links (upload) capacity ( Kbps), so peers with asymmetric links are even more at a disadvantage (BitTorrent tit-for-tat favors better connections)
  Exploit: Collaborative Downloading
    With download helpers, download speed-up of up to 6x
  P. Garbacki, A. Iosup, D.H.J. Epema, M. van Steen, 2Fast: Collaborative Downloads in Peer-To-Peer Networks (submitted).
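The intuition behind download helpers can be shown with a deliberately simplified toy model; it is not the 2Fast protocol. It assumes that strict tit-for-tat limits a peer's download rate to what it uploads, and that each helper lends its full upload capacity to the collecting peer. The function names and all capacity values are hypothetical.

```python
def tit_for_tat_rate(up_kbps, down_kbps):
    """Toy model: download rate is bounded by upload rate and by link capacity."""
    return min(up_kbps, down_kbps)


def collaborative_rate(up_kbps, down_kbps, helper_ups_kbps):
    """Toy model: helpers add their upload capacity to the collector's own,
    still capped by the collector's download capacity."""
    return min(down_kbps, up_kbps + sum(helper_ups_kbps))


# Hypothetical asymmetric link: 128 Kbps up, 1024 Kbps down, five equal helpers.
alone = tit_for_tat_rate(128, 1024)
helped = collaborative_rate(128, 1024, [128] * 5)
speedup = helped / alone
```

In this model the benefit stops growing once the collector's download capacity is saturated, so adding further helpers beyond that point changes nothing.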
Using these results (3): BitTorrent Now Works at Global Scale
  Observed global distribution of users
    World-wide geographical location
    Greatly increased community size, favoring social interaction
    Increased technical knowledge for heavy users (port numbers)
  Exploit: Tribler, a new social paradigm in P2P
    4 months, 4500 users, downloads
  J. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D.H.J. Epema, M. Reinders, M. van Steen, H. Sips, Tribler: A Social-Based Peer-to-Peer System, in IPTPS’06, February 2006, Santa Barbara, CA, USA
Using these results (4): How can you leverage these results?
  (Anonymized) data is available for download
    50 files, 40 GB uncompressed
  Test/improve your P2P algorithms with realistic workloads
    Realistic file sizes
    Realistic community sizes
    Realistic user geographical location
    Realistic user Internet location (hops, latency, bandwidth)
Conclusions and ongoing work
  MultiProbe framework for large-scale P2P file-sharing measurements
  Experience: BitTorrent, 450,000+ unique peers, observed 50+ TB/day
  Correlated Internet and overlay network characteristics
  Currently building a P2P Traces Archive for the benefit of the whole community!
Thank you! Questions? Remarks? Observations?
  Help build our community’s P2P Traces Archive!
  MultiProbe [google: “iosup”]
  Many thanks to Neil Spring (Scriptroute), Paulo Anita (website).