1 Where in the World is Carmen BitDiego? And who is she, anyways… Alexandru IOSUP The 12th annual ASCI Computing Workshop
2 Introduction (1 of 3): Peer-2-Peer file sharing. Everybody has the same rights. Who is the average P2P user? Who? Where? When? How? Why? Tons of studies over the past 5 years: Saroiu’02, Yazti’02, Izal’04, Pouwelse’04. We go for something else! (tbs)
3 Introduction (2 of 3): BitTorrent. Most used P2P network today (53% of traffic). Attributes: 2nd-generation P2P network: no centralized servers; optimizes transfer speed; favors high-bandwidth users; files are split into chunks. Peers, trackers, Web sites. Tit-for-tat sharing mechanism: everybody gives some, except when they don’t… No search at peer level. Owners are called seeds, we are called leeches. So much to know: I want my BitTorrent today!
4 Enters Carmen… Carmen SanDiego: famous spy; location: unknown; likes: to hide; clues to where she is: history, complicated hints; never caught. Carmen BitDiego: famous P2P network; location: unknown; likes: who knows?; clues to where she is: some history, lightweight hints; caught (?): NO multi-file studies, NO country-per-file, NO organizations, NO, NO, NO… Who is this Carmen, anyways…
5 Introduction (3 of 3): We track Carmen BitDiego. Tracked data attributes: users got 204,454,719,497,935 B (ok, about 204.5 TB); 40,000,000 contacts; 200,000 unique users (*); 120 files; 9 specific media types; the first aliased media view; 7 unique views. We got her now! Or is it…
6 Mission statement: We want to know about Carmen BitDiego. Where she goes: continent, country, city, organization. When she goes: time-patterns per country; time-patterns in seeds/leeches ratio; how many file chunks at any time? With whom she hangs out: special users? Super-peers, collector peers. Is she a good companion? How many users get what they want? We’re getting to this info in no time…
7 Outline of the presentation: Intro; Enters Carmen…; Mission statement; Our data looks like this…; Methods, or how to catch her; Results, or how we caught her; Conclusions. (done) (we are here) (coming up next)
8 Our data looks like this… We track 120 files: 120 trace files. One record = one observation = time stamp, IP, port, # of chunks. 12 big traces (500,000+ observations per trace), December 2003 – January; small traces, March. Global categories: All, Big, Small. 9 special categories: Movies, Games, Music, Applications. Alias media: same contents, different names (same language, different language).
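A minimal sketch of how one observation could be read from a trace file. The slide only names the four fields (time stamp, IP, port, number of chunks); the whitespace-separated layout, the Unix-seconds time stamp, and the names `Observation`, `parse_observation`, and `parse_trace` are assumptions made for illustration, not the study's actual format or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    timestamp: int  # assumed: Unix time in seconds
    ip: str         # peer IP address as text
    port: int       # peer port
    chunks: int     # number of file chunks the peer reports at this time


def parse_observation(line: str) -> Observation:
    """Parse one record, assuming four whitespace-separated fields."""
    ts, ip, port, chunks = line.split()
    return Observation(int(ts), ip, int(port), int(chunks))


def parse_trace(lines):
    """Parse every non-empty line of a trace into observations."""
    return [parse_observation(line) for line in lines if line.strip()]
```

With this layout, `parse_trace(open(path))` would yield one `Observation` per record of a trace file.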
10 Methods, or how to catch her. We want to know about Carmen BitDiego. Where she goes: Un-DNS (*): continent (1), country (2), city (3), organization (4). When she goes (5): time-patterns per country (parse and correlate); time-patterns in seeds/leeches ratio (parse and correlate); how many file chunks at any time? (parse and correlate). With whom she hangs out (6): super-peers = nodes that own more than one complete file; collector peers = nodes that try to get more than one file. Is she a good companion? (7): how many users get what they want? (*) Thanks to MaxMind (GeoIP lib, database) and WebLog Expert (databases).
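The two peer definitions above can be sketched as a small classifier. This is an illustration under stated assumptions, not the study's code: a peer is identified by its IP alone, "owns a complete file" is read as "was observed holding all chunks of that file", and "tries to get a file" is read as "appears in that file's trace". The function name `classify_peers` and its input shapes are hypothetical.

```python
from collections import defaultdict


def classify_peers(observations, total_chunks):
    """Return (super_peers, collectors) as sets of IPs.

    observations: iterable of (file_id, ip, chunks) tuples (assumed shape)
    total_chunks: dict mapping file_id to the number of chunks in that file
    """
    seen = defaultdict(set)      # ip -> files the peer was observed in
    complete = defaultdict(set)  # ip -> files the peer held completely
    for file_id, ip, chunks in observations:
        seen[ip].add(file_id)
        if chunks >= total_chunks[file_id]:
            complete[ip].add(file_id)
    # Super-peer: owns more than one complete file.
    super_peers = {ip for ip, files in complete.items() if len(files) > 1}
    # Collector: tries to get more than one file.
    collectors = {ip for ip, files in seen.items() if len(files) > 1}
    return super_peers, collectors
```

Under these readings every super-peer is also counted as a collector; excluding peers that only seed would need an extra filter.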
11 WARNING! We show only a selection of our results!
12 Results, or how we caught her. Where she goes: continent. Europe is now the biggest BitTorrent consumer (not North America). Tit-for-tat discourages low-bandwidth users!
13 Results, or how we caught her. Where she goes: continent. Not the same distribution for different sets of files! Europe is now the biggest BitTorrent consumer (not North America). Tit-for-tat discourages low-bandwidth users! Coarse media locality property: Asia > North America (themed game).
14 Results, or how we caught her. Where she goes: country. The US is still the biggest overall BitTorrent consumer: the continent view can be misleading! NL is only 6th!
15 Results, or how we caught her. Where she goes: country. The US is still the biggest overall BitTorrent consumer: the continent view can be misleading! Not the same distribution for different sets of files! Localized versions of the files attract local users! Themed files attract very specific audiences! Fine media locality! Countries have habits! Hong Kong, Chile: soccer management sim; Israel: action movie; Japan: anime. The Netherlands: 6th; Romania: ~50th. What about a marketing study based on BitTorrent file ranks?
16 Results, or how we caught her. Where she goes: city. Dispersed locations: Oldenburg, Eschborn, Herndon… Internet nodes are placed outside major cities, so this cannot be used to track real users! 30% unknown: not reliable!
17 Results, or how we caught her. Where she goes: organization. Not the same distribution for different sets of files! 1 ISP covers 60%+ of users; 10 ISPs cover <50% of users. ISP caching policy is different for different files and communities! Academic institutions: <10% of users! We’d like to thank The Walt Disney Company, Sony Corporation, SANYO Electric Software Co. Ltd., and Merrill Lynch for actively supporting BitTorrent!
18 Results, or how we caught her. When she goes: time-patterns per country. 8:30 AM, 1 PM, 6-9 PM, 12-1 AM: mostly at work, during slow hours? Europe guides the time-patterns!
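A minimal sketch of the "parse and correlate" step for time-patterns per country: count observations per hour of day, grouped by country. The `country_of` callable stands in for whatever IP-to-country lookup is used (the study credits MaxMind's GeoIP, whose API is not reproduced here). Hours are taken in UTC, and time stamps are assumed to be Unix seconds; both are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def hourly_activity(observations, country_of):
    """Count observations per (country, UTC hour of day).

    observations: iterable of (timestamp, ip) pairs; timestamp is assumed
                  to be Unix seconds
    country_of:   hypothetical callable mapping an IP string to a country code
    """
    histogram = defaultdict(Counter)
    for timestamp, ip in observations:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
        histogram[country_of(ip)][hour] += 1
    return histogram
```

Converting each hour to the country's local time would be a further step before reading peaks such as "1 PM" off the histogram.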
19 Results, or how we caught her. When she goes: how many file chunks at any time? The network is not robust all the time: attacks at these precise moments could be fatal! Causes: trackers down; users’ interest down; others.
20 Results, or how we caught her. When she goes: time-patterns per no. of chunks/seeders/leeches ratio. users:seeds ~ 10:1; leeches:seeds ~ 9:1; chunks:seeds ~ 1000:1.
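The three ratios can be sketched for a single snapshot of one file's swarm. The sketch assumes a seed is a peer holding every chunk, a leech is any other peer, and users = seeds + leeches, which is consistent with the 10:1 and 9:1 figures above. The name `snapshot_ratios` and the input shape are hypothetical.

```python
def snapshot_ratios(peer_chunks, total_chunks):
    """Compute users:seeds, leeches:seeds and chunks:seeds for one snapshot.

    peer_chunks:  dict mapping peer id to the chunks it holds at this time
    total_chunks: number of chunks in the complete file
    Returns None when there is no seed, since the ratios are undefined.
    """
    users = len(peer_chunks)
    seeds = sum(1 for c in peer_chunks.values() if c >= total_chunks)
    if seeds == 0:
        return None
    leeches = users - seeds
    chunks = sum(peer_chunks.values())  # all chunks present in the swarm
    return {
        "users": users / seeds,
        "leeches": leeches / seeds,
        "chunks": chunks / seeds,
    }
```

Snapshots where the function returns None (no seed present) are worth counting separately, since they correspond to moments when the swarm may not hold a complete copy at a single peer.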
21 Results, or how we caught her. With whom she hangs out. Super-peers = nodes that own more than one complete file; collector peers = nodes that try to get more than one file. # users / # files decreases exponentially! Group Small: collectors (n files) ~ 2x super-peers (n files).
22 Results, or how we caught her. Is she a good companion? YES! 1 point = 1% of any file. Group Small: avg. (any) ~ 81 points; avg. (1 file) ~ 113 points. Group Small users download whole files! Aliased Media: avg. (any) ~ 52 points; avg. (1 file) ~ 109 points. Aliased Media results in an exponential drop: people drop out after getting 1 of many. Users download 1 file then disconnect!
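One way to read "1 point = 1% of any file" is sketched below: each user's score is the sum, over the files they touched, of the percentage of that file they were seen holding. This is an assumed interpretation for illustration only; it is not claimed to reproduce the averages on the slide, and `user_points` and its input shapes are hypothetical.

```python
from collections import defaultdict


def user_points(observations, total_chunks):
    """Return a dict mapping each peer to its points (1 point = 1% of a file).

    observations: iterable of (file_id, ip, chunks) tuples (assumed shape)
    total_chunks: dict mapping file_id to the number of chunks in that file
    """
    # Keep the largest chunk count ever observed per (peer, file).
    best = defaultdict(int)
    for file_id, ip, chunks in observations:
        best[(ip, file_id)] = max(best[(ip, file_id)], chunks)
    # Convert to percentage points and sum across files per peer.
    points = defaultdict(float)
    for (ip, file_id), chunks in best.items():
        points[ip] += 100.0 * chunks / total_chunks[file_id]
    return dict(points)
```

Averaging the returned values over a file group gives a per-group score in the same unit as the slide's figures.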
24 Conclusions. Carmen BitDiego: famous P2P network; location: known; likes: established (study per specific file groups); clues to where she is: complete hints. Multi-file study: continents, countries, cities, organizations, global and per-file; time-patterns in the users/seeds/leeches behavior (also per country); super-nodes / collector nodes analysis. Carmen BitDiego almost caught! Trivial and non-trivial locality properties; alias media hints. Need a full study w/ these methods to catch her!
25 Thank you… Questions? Remarks? Observations? All welcome! Alexandru IOSUP, TU Delft. I would like to thank Johan Pouwelse and Pawel Garbacki for all their help in creating this study. Thank you, Johan! Thank you, Pawel! Their previous work: