YouTube Traffic Characterization: A View From the Edge. Phillipa Gill, Martin Arlitt, Zongpeng Li, Anirban Mahanti

Presentation transcript:

1 YouTube Traffic Characterization: A View From the Edge Phillipa Gill¹, Martin Arlitt²¹, Zongpeng Li¹, Anirban Mahanti³ ¹ Dept. of Computer Science, University of Calgary, Canada ² Enterprise Systems & Software Lab, HP Labs, USA ³ Dept. of Computer Science and Engineering, IIT Delhi, India

2 Introduction The way people use the Web is changing. Creation and sharing of media: Fast, easy, cheap! Volume of data associated with extremely popular online media.

3 What is Web 2.0? User generated content Text: Wordpress, Blogspot Photos: Flickr, Facebook Video: YouTube, MySpace Social Networking Facebook, MySpace Tagging Flickr, YouTube

4 YouTube: Facts and Figures Founded in February 2005 Enabled users to easily share movies by converting them to Flash Largest video sharing Website on the Internet [Alexa2007] Sold to Google for $1.65 billion in November 2006

5 How YouTube Works (1/2) GET: /watch?v=wQVEPFzkhaM OK (text/html) GET: /vi/fNaYQ4kM4FE/2.jpg OK (image/jpeg)

6 How YouTube Works (2/2) GET: swfobject.js OK (application/x-javascript) GET: /p.swf OK (application/x-shockwave-flash) GET: /get_video?video_id=wQVEPFzkhaM OK (video/flv)
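
The request sequence on these two slides can be sketched as a small classifier that labels each request path by its role in a page load. This is only an illustration: the function name `classify_request` and the label strings are hypothetical, and the matching rules are inferred from the example paths shown on the slides, not from the authors' tooling.

```python
from urllib.parse import urlparse, parse_qs


def classify_request(url_path: str) -> str:
    """Label a request path using the example paths from the slides.

    The rules below are assumptions based on those examples only.
    """
    parsed = urlparse(url_path)
    path = parsed.path
    query = parse_qs(parsed.query)

    if path == "/watch" and "v" in query:
        return "watch page"        # HTML page for one video
    if path == "/get_video" and "video_id" in query:
        return "video file"        # the Flash video itself
    if path.startswith("/vi/") and path.endswith(".jpg"):
        return "thumbnail"         # preview image
    if path.endswith(".swf"):
        return "flash player"      # e.g. /p.swf
    if path.endswith(".js"):
        return "javascript"        # e.g. swfobject.js
    return "other"


for p in ["/watch?v=wQVEPFzkhaM", "/vi/fNaYQ4kM4FE/2.jpg",
          "swfobject.js", "/p.swf", "/get_video?video_id=wQVEPFzkhaM"]:
    print(p, "->", classify_request(p))
```

Matching on the path rather than on the response content type keeps the sketch independent of how a server labels its responses.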

7 Our Contributions Efficient measurement framework One of the first extensive characterizations of Web 2.0 traffic File properties File access patterns Transfer properties Implications for network and content providers

8 Outline Introduction & Background Contributions Methodology Results Implications Conclusions

9 Our Viewpoints Edge (University Campus) 28,000 students 5,300 faculty & staff /16 address space 300 Mb/s full-duplex network link Global Most popular videos

10 Campus Data Collection Goals: Collect data on all campus YouTube usage Gather data for an extended period of time Protect user privacy Challenges: YouTube’s popularity Monitor limitations Volume of campus Internet usage

11 Our Methodology Identify servers providing YouTube content Use bro to summarize each HTTP transaction in real time Restart bro daily and compress the daily log Map visitor identifier to a unique ID
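
The last step, mapping a visitor identifier to a unique ID, could look roughly like the sketch below. The slides do not say how the mapping was implemented, so the class name `VisitorAnonymizer`, the use of sequential integers, and the in-memory dictionary are all assumptions made for illustration. The sample addresses come from the documentation range 192.0.2.0/24.

```python
import itertools


class VisitorAnonymizer:
    """Replace each distinct visitor identifier with an opaque integer.

    Hypothetical sketch: the same visitor always gets the same ID, so
    per-visitor statistics still work, but the original identifier is
    never written to the log.
    """

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next_id = itertools.count(1)

    def anonymize(self, visitor: str) -> int:
        # Assign the next unused integer the first time a visitor is seen.
        if visitor not in self._ids:
            self._ids[visitor] = next(self._next_id)
        return self._ids[visitor]


anonymizer = VisitorAnonymizer()
for address in ["192.0.2.10", "192.0.2.77", "192.0.2.10"]:
    print(anonymizer.anonymize(address))
```

A mapping like this only protects privacy if the dictionary itself is discarded or kept secret; a keyed hash would be an alternative that avoids storing the table.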

12 Categories of Transactions Complete – the entire transaction was parsed successfully Interrupted – TCP connection was reset Gap – monitor missed a packet Failure – transaction could not be parsed
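
A minimal sketch of how the four categories might be assigned and tallied follows. The three boolean flags and the precedence among them (gap, then interrupted, then failure) are assumptions; the slides define the categories but not how overlapping conditions were resolved.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    COMPLETE = "Complete"
    INTERRUPTED = "Interrupted"
    GAP = "Gap"
    FAILURE = "Failure"


def categorize(parsed_ok: bool, tcp_reset: bool, packet_missed: bool) -> Category:
    """Assign one category per transaction (precedence is an assumption)."""
    if packet_missed:
        return Category.GAP          # monitor missed a packet
    if tcp_reset:
        return Category.INTERRUPTED  # TCP connection was reset
    if not parsed_ok:
        return Category.FAILURE      # transaction could not be parsed
    return Category.COMPLETE         # parsed successfully end to end


def category_shares(categories: list[Category]) -> dict[Category, float]:
    """Return each category's share of all transactions, in percent."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


sample = [categorize(True, False, False)] * 3 + [categorize(True, False, True)]
print(category_shares(sample))
```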

13 Categories of Transactions (2) Table columns: Status, % of Total, % of Video. Rows: Complete, Interrupted, Gap, Failure (5.75, -)

14 Our Traces Start Date: Jan. 14, 2007 End Date: Apr. 8, 2007 Total Valid Transactions: 23,250,438 Total Bytes: 6.54 TB Total Video Requests: 625,593 Total Video Bytes: 6.45 TB Unique Video Requests: 323,677 Unique Video Bytes: 3.26 TB
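
A few ratios follow directly from the trace totals above. The sketch below only does the arithmetic on the figures from the slide; the variable names are illustrative.

```python
# Totals copied from the trace summary slide.
total_transactions = 23_250_438
total_tb = 6.54
video_requests = 625_593
video_tb = 6.45
unique_video_requests = 323_677
unique_video_tb = 3.26

# Video requests are a small share of transactions but most of the bytes.
video_request_share = video_requests / total_transactions
video_byte_share = video_tb / total_tb

# How often a video was requested on average, and how much of the
# video traffic was for distinct videos.
requests_per_unique_video = video_requests / unique_video_requests
unique_byte_share = unique_video_tb / video_tb

print(f"video share of transactions: {video_request_share:.1%}")
print(f"video share of bytes:        {video_byte_share:.1%}")
print(f"requests per unique video:   {requests_per_unique_video:.2f}")
print(f"unique share of video bytes: {unique_byte_share:.1%}")
```

Video requests come to under 3% of transactions yet account for nearly all bytes, and roughly half of the video bytes are for distinct videos.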

15 HTTP Response Codes Table columns: Code, % of Responses, % of Bytes. Rows: 200 (OK), 206 (Partial Content), 302 (Found), 303 (See Other), 304 (Not Modified), 4xx (Client Error), 5xx (Server Error)

16 Global Data Collection Crawling all videos is infeasible Focus on top 100 most popular videos Four time frames: daily, weekly, monthly and all time. Two-step data collection: Retrieve pages of most popular videos Use YouTube API to get details on these videos

17 Outline Introduction & Background Contributions Methodology Results Implications Conclusions

18 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties

19 Campus Usage Patterns Reading Break

20 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties

21 Unique File Sizes Video data is significantly larger than the other content types

22 Time Since Modification Videos and images rarely modified Text and application data modified more frequently

23 Video Durations Spike around 3 minutes, likely music videos Campus videos are relatively short: μ=3.3 min

24 Summary of File Properties Video content is much larger than other content types Image and video content is more static than application and text content Video durations are relatively short Videos viewed on campus tend to be more than 1 month old

25 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties

26 Relative Popularity of Videos Video popularity follows a weak Zipf distribution Possibly due to edge network point of view β = 0.56
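
One common way to estimate a Zipf exponent β is a least-squares line fit on the log-log rank-frequency plot, where request count is modeled as proportional to rank^(-β). The sketch below shows that technique on synthetic data; it is not necessarily the fitting method the authors used, and `fit_zipf_exponent` is a hypothetical helper name.

```python
import math


def fit_zipf_exponent(request_counts: list[float]) -> float:
    """Estimate beta in count ~ rank**(-beta) by log-log least squares."""
    # Rank videos from most to least requested; rank 1 is the most popular.
    freqs = sorted((c for c in request_counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two videos with requests")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Slope of the fitted line; beta is its negation.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic popularity curve generated with beta = 0.56 for illustration.
synthetic = [1000.0 * rank ** -0.56 for rank in range(1, 201)]
print(round(fit_zipf_exponent(synthetic), 2))
```

A smaller β means a flatter curve, so requests are spread more evenly across videos, which is what "weak" Zipf behaviour refers to here.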

27 Commonality of Videos ~10% commonality between consecutive days during the week ~5% commonality between consecutive days on the weekend
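
The slide does not define commonality precisely. The sketch below assumes it means the fraction of videos requested on one day that were also requested on the previous day; both that definition and the function name `commonality` are assumptions.

```python
def commonality(previous_day: list[str], current_day: list[str]) -> float:
    """Fraction of the current day's distinct videos also seen the day before."""
    prev, cur = set(previous_day), set(current_day)
    if not cur:
        return 0.0
    return len(prev & cur) / len(cur)


# Ten distinct videos per day with exactly one in common.
monday = [f"video{i}" for i in range(10)]
tuesday = ["video0"] + [f"video{i}" for i in range(100, 109)]
print(commonality(monday, tuesday))
```

Video IDs are compared as sets, so repeated requests for the same video on one day do not inflate the overlap.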

28 Summary of File Referencing Zipf distribution is weak when observed from the edge of the network There is some overlap between videos viewed on consecutive days Significant amount of content viewed on campus is non-unique

29 Results Campus Usage Patterns File Properties File Access Patterns Transfer Properties

30 Transfer Sizes Flash player (p.swf, player2.swf) Javascripts

31 Transfer Durations Video transfers have significantly longer durations than other content types

32 Summary of Transfer Properties JavaScript and Flash objects have an impact on the size of files transferred Video transfers have significantly larger sizes and durations

33 Outline Introduction & Background Contributions Methodology Results Implications Conclusions

34 Implications for Network Providers Web 2.0 poses challenges to caching Larger multimedia files More diversity in content Meta data may be used to improve caching efficiency

35 Implications for Content Providers Multimedia content is large! 65,000 videos/day × 10 MB/video = 19.5 TB/month Long tail effect -> much of the content will be unpopular Cheap storage solutions Longer transfer durations for video files -> more CPU cycles required for transfers
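
The storage estimate on this slide can be checked with a few lines of arithmetic. The sketch assumes a 30-day month and decimal units (1 TB = 1,000,000 MB), which is what makes the slide's figure come out exactly; the function name is illustrative.

```python
def monthly_upload_tb(videos_per_day: int, mb_per_video: float,
                      days_per_month: int = 30) -> float:
    """Monthly upload volume in TB, using decimal units (assumption)."""
    mb_per_month = videos_per_day * mb_per_video * days_per_month
    return mb_per_month / 1_000_000  # 1 TB = 1,000,000 MB


# 65,000 videos/day at 10 MB each is 650 GB/day, or 19.5 TB over 30 days.
print(monthly_upload_tb(65_000, 10))
```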

36 Conclusions Multimedia content has much larger transfer sizes and durations than other content types From the edge of the network, video popularity follows a weak Zipf distribution Web 2.0 facilitates diversity in content which poses challenges to caching New approaches are needed to efficiently handle the resource demands of Web 2.0 sites

37 Questions? Contact

38 Ignore the slides after this one

39 Download to Bitrate-Ratio

40 Time of Day and Day of Week Traffic Patterns

41 Video Ages 73% of campus videos are older than 1 month 5% of campus videos are older than 1 year

42 Absolute Growth in Working Set Half the video content transferred is non-unique

43 What is different about Web 2.0? Web 1.0:

44 What is different about Web 2.0? Web 2.0: