1 YouTube Traffic Characterization: A View From the Edge Phillipa Gill¹, Martin Arlitt²,¹, Zongpeng Li¹, Anirban Mahanti³ ¹ Dept. of Computer Science, University of Calgary, Canada ² Enterprise Systems & Software Lab, HP Labs, USA ³ Dept. of Computer Science and Engineering, IIT Delhi, India
2 Introduction The way people use the Web is changing. Creation and sharing of media: Fast, easy, cheap! Volume of data associated with extremely popular online media.
3 What is Web 2.0? User generated content Text: Wordpress, Blogspot Photos: Flickr, Facebook Video: YouTube, MySpace Social Networking Facebook, MySpace Tagging Flickr, YouTube
4 YouTube: Facts and Figures Founded in February 2005 Enabled users to easily share movies by converting them to Flash Largest video sharing Website on the Internet [Alexa2007] Sold to Google for $1.65 billion in November 2006
5 How YouTube Works (1/2) GET: /watch?v=wQVEPFzkhaM OK (text/html) GET: /vi/fNaYQ4kM4FE/2.jpg OK (image/jpeg)
6 How YouTube Works (2/2) GET: swfobject.js OK (application/x-javascript) GET: /p.swf OK (application/x-shockwave-flash) GET: /get_video?video_id=wQVEPFzkhaM OK (video/flv)
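The request sequence on these two slides can be sketched as a small classifier that labels each request path with a content category. This is only an illustration: the function names `classify_request` and `video_id`, the category labels, and the matching rules are assumptions made for this example, not the rules used in the study.

```python
from urllib.parse import parse_qs, urlparse


def classify_request(url_path: str) -> str:
    """Label a request path with a coarse category (illustrative rules only)."""
    path = urlparse(url_path).path
    if path == "/watch":
        return "page"        # HTML page that embeds the player
    if path == "/get_video":
        return "video"       # the Flash video itself
    if path.endswith(".jpg"):
        return "thumbnail"   # preview image
    if path.endswith(".swf"):
        return "player"      # Flash player object
    if path.endswith(".js"):
        return "script"      # JavaScript helper
    return "other"


def video_id(url_path: str):
    """Return the video identifier carried in the query string, if any."""
    query = parse_qs(urlparse(url_path).query)
    for key in ("v", "video_id"):
        if key in query:
            return query[key][0]
    return None


if __name__ == "__main__":
    for request in ("/watch?v=wQVEPFzkhaM", "/p.swf",
                    "/get_video?video_id=wQVEPFzkhaM"):
        print(request, classify_request(request), video_id(request))
```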
7 Our Contributions Efficient measurement framework One of the first extensive characterizations of Web 2.0 traffic File properties File access patterns Transfer properties Implications for network and content providers
8 Outline Introduction & Background Contributions Methodology Results Implications Conclusions
9 Our View Points Edge (University Campus) 28,000 students 5,300 faculty & staff /16 address space 300Mb/s full-duplex network link Global Most popular videos
10 Campus Data Collection Goals: Collect data on all campus YouTube usage Gather data for an extended period of time Protect user privacy Challenges: YouTube’s popularity Monitor limitations Volume of campus Internet usage
11 Our Methodology Identify servers providing YouTube content Use bro to summarize each HTTP transaction in real time Restart bro daily and compress the daily log Map visitor identifier to a unique ID
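The last step, mapping each visitor identifier to a unique ID, might look like the sketch below. The slides do not say how the mapping was done; a keyed hash (HMAC-SHA256) is swapped in here as one common way to get stable pseudonyms, and `make_pseudonymizer`, the 16-character truncation, and the sample identifier are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(key: bytes):
    """Build a function that maps a visitor identifier to a stable opaque ID.

    Assumption: a keyed hash is used so the same visitor always gets the
    same ID, while the original identifier cannot be read back from the log.
    """
    def pseudonymize(visitor_id: str) -> str:
        digest = hmac.new(key, visitor_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]  # truncated for compact log entries
    return pseudonymize


if __name__ == "__main__":
    # A fresh random key; discarding it later makes the mapping irreversible.
    pseudonymize = make_pseudonymizer(secrets.token_bytes(32))
    print(pseudonymize("visitor-0001"))
```

Keeping one key for the whole measurement period lets requests from the same visitor be linked across the daily logs, which per-day keys would prevent.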
12 Categories of Transactions Complete – the entire transaction was parsed successfully Interrupted – TCP connection was reset Gap – monitor missed a packet Failure – transaction could not be parsed
13 Categories of Transactions (2)
Status        % of Total   % of Video
Complete
Interrupted
Gap
Failure       5.75         -
14 Our Traces Start Date: Jan. 14, 2007 End Date: Apr. 8, 2007 Total Valid Transactions: 23,250,438 Total Bytes: 6.54 TB Total Video Requests: 625,593 Total Video Bytes: 6.45 TB Unique Video Requests: 323,677 Unique Video Bytes: 3.26 TB
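A few ratios follow directly from these totals. The sketch below derives them, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the slides do not state which convention was used, and the variable names are chosen for this example.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes

# Figures taken from the trace summary.
total_bytes = 6.54 * TB
total_video_requests = 625_593
total_video_bytes = 6.45 * TB
unique_video_requests = 323_677
unique_video_bytes = 3.26 * TB

# Share of all transferred bytes that were video.
video_share_of_bytes = total_video_bytes / total_bytes

# Share of video requests and video bytes that were for distinct videos.
unique_request_share = unique_video_requests / total_video_requests
unique_byte_share = unique_video_bytes / total_video_bytes

# Mean bytes transferred per video request, in MB.
mean_video_transfer_mb = total_video_bytes / total_video_requests / MB

if __name__ == "__main__":
    print(f"video share of bytes:  {video_share_of_bytes:.1%}")
    print(f"unique request share:  {unique_request_share:.1%}")
    print(f"unique byte share:     {unique_byte_share:.1%}")
    print(f"mean video transfer:   {mean_video_transfer_mb:.1f} MB")
```

Roughly half of the video requests and video bytes are for distinct videos, so the other half is repeated content.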
15 HTTP Response Codes
Code                    % of Responses   % of Bytes
200 (OK)
206 (Partial Content)
302 (Found)
303 (See Other)
304 (Not Modified)
4xx (Client Error)
5xx (Server Error)
16 Global Data Collection Crawling all videos is infeasible Focus on top 100 most popular videos Four time frames: daily, weekly, monthly and all time Two-step data collection: Retrieve pages of most popular videos Use YouTube API to get details on these videos
17 Outline Introduction & Background Contributions Methodology Results Implications Conclusions
18 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties
19 Campus Usage Patterns Reading Break
20 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties
21 Unique File Sizes Video data is significantly larger than the other content types
22 Time Since Modification Videos and images rarely modified Text and application data modified more frequently
23 Video Durations Spike around 3 minutes, likely music videos Campus videos are relatively short: μ = 3.3 min
24 Summary of File Properties Video content is much larger than other content types Image and video content is more static than application and text content Video durations are relatively short Videos viewed on campus tend to be more than 1 month old
25 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties
26 Relative Popularity of Videos Video popularity follows a weak Zipf distribution Possibly due to edge network point of view β = 0.56
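A Zipf exponent such as β can be estimated from per-video request counts by fitting a straight line to log(count) against log(rank). The slides do not say how β = 0.56 was obtained; the least-squares fit below is a common, simple estimator swapped in for illustration, and `zipf_exponent` is a hypothetical name.

```python
import math


def zipf_exponent(request_counts):
    """Estimate beta in count ~ C * rank**(-beta) by log-log least squares."""
    counts = sorted((c for c in request_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two videos with requests")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -beta


if __name__ == "__main__":
    # Synthetic counts generated from an exact power law, for demonstration.
    synthetic = [1000.0 * rank ** -0.56 for rank in range(1, 201)]
    print(round(zipf_exponent(synthetic), 2))
```

On real traces this fit is sensitive to the many videos requested only once, so the estimate should be read as approximate.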
27 Commonality of Videos ~10% commonality between consecutive days during the week ~5% commonality between consecutive days on the weekend
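Commonality between consecutive days can be computed as a set overlap. The exact definition is not given on the slide; the sketch assumes it means the fraction of a day's distinct videos that were also requested on the previous day, and the function names are hypothetical.

```python
def commonality(previous_day, current_day):
    """Fraction of the current day's distinct videos also seen the day before."""
    previous, current = set(previous_day), set(current_day)
    if not current:
        return 0.0
    return len(previous & current) / len(current)


def daily_commonality(days):
    """Commonality for each consecutive pair in a list of per-day video IDs."""
    return [commonality(prev, cur) for prev, cur in zip(days, days[1:])]


if __name__ == "__main__":
    # Made-up video IDs for three days.
    days = [{"a", "b", "c"}, {"c", "d"}, {"e"}]
    print(daily_commonality(days))
```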
28 Summary of File Referencing Zipf distribution is weak when observed from the edge of the network There is some overlap between videos viewed on consecutive days Significant amount of content viewed on campus is non-unique
29 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties
30 Transfer Sizes Flash player (p.swf, player2.swf) JavaScript files
31 Transfer Durations Video transfers have significantly longer durations than other content types
32 Summary of Transfer Properties JavaScript and Flash objects have an impact on the size of files transferred Video transfers have significantly larger sizes and durations
33 Outline Introduction & Background Contributions Methodology Results Implications Conclusions
34 Implications for Network Providers Web 2.0 poses challenges to caching Larger multimedia files More diversity in content Meta data may be used to improve caching efficiency
35 Implications for Content Providers Multimedia content is large! 65,000 videos/day x 10 MB/video x 30 days = 19.5 TB/month Long tail effect -> much of the content will be unpopular Cheap storage solutions Longer transfer durations for video files -> more CPU cycles required for transfers
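The storage estimate on this slide works out as follows, assuming a 30-day month and decimal units (1 TB = 1,000,000 MB); both assumptions are inferred from the arithmetic rather than stated on the slide.

```python
videos_per_day = 65_000
mb_per_video = 10
days_per_month = 30      # assumption: 30-day month
MB_PER_TB = 1_000_000    # assumption: decimal units

# New content uploaded per day and per month.
mb_per_day = videos_per_day * mb_per_video
tb_per_month = mb_per_day * days_per_month / MB_PER_TB

if __name__ == "__main__":
    print(f"{mb_per_day / 1000:.0f} GB/day, {tb_per_month} TB/month")
```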
36 Conclusions Multimedia content has much larger transfer sizes and durations than other content types From the edge of the network, video popularity follows a weak Zipf distribution Web 2.0 facilitates diversity in content which poses challenges to caching New approaches are needed to efficiently handle the resource demands of Web 2.0 sites
37 Questions? Contact
38 Ignore the slides after this one
39 Download to Bitrate-Ratio
40 Time of Day and Day of Week Traffic Patterns
41 Video Ages 73% of campus videos are older than 1 month 5% of campus videos are older than 1 year
42 Absolute Growth in Working Set Half the video content transferred is non-unique
43 What is different about Web 2.0? Web 1.0:
44 What is different about Web 2.0? Web 2.0: