The Stretched Exponential Distribution of Internet Media Access Patterns

1 The Stretched Exponential Distribution of Internet Media Access Patterns
Lei Guo (Yahoo! Inc.), Enhua Tan (Ohio State University), Songqing Chen (George Mason University), Zhen Xiao (Peking University), Xiaodong Zhang (Ohio State University)
Presented at PODC 2008

2 Media content on the Internet
Video traffic is doubling every 3 to 4 months
YouTube is the No. 3 site (1. Yahoo, 2. Google, 3. YouTube)
Video applications are mainstream

3 Existing media delivery systems
[Figure: media delivery over P2P network, CDN, and wireless]
High CPU and bandwidth consumption, unsatisfactory QoS
Caching for performance improvement
– A lot of research studies
– Commercial products
Access patterns play an important role
– Exploit temporal locality
– High performance with cost effective hardware
Seldom used in reality

4 Zipf model of Internet traffic patterns
Zipf distribution (power law)
– Characterizes the property of scale invariance
– Heavy tailed, scale free rule
– Income distribution: 80% of social wealth is owned by 20% of people (Pareto law)
– Web traffic: 80% of Web requests access 20% of pages (Breslau, INFOCOM’99)
System implications
– Objectively caching the working set in a proxy
– Significantly reduces network traffic
Reference rank distribution
– i: rank of objects
– y_i: number of references
– Heavy tail in linear scale; straight line with slope -α in log-log scale (log y versus log i)
– α: 0.6~0.8
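The straight-line property of the Zipf model in log-log scale can be checked with a small sketch. It assumes idealized counts of the form y_i = C / i^α; the function names, the number of objects, and the top count C are illustrative choices, not values from the slides.

```python
import math

def zipf_counts(n_objects, alpha, top_count):
    """Idealized reference counts y_i = top_count / i**alpha for ranks 1..n_objects."""
    return [top_count / i ** alpha for i in range(1, n_objects + 1)]

def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical workload: 1000 objects, alpha = 0.8 (within the 0.6~0.8 range above).
counts = zipf_counts(n_objects=1000, alpha=0.8, top_count=10000.0)
log_rank = [math.log10(i) for i in range(1, len(counts) + 1)]
log_count = [math.log10(y) for y in counts]
slope, intercept = fit_line(log_rank, log_count)
print(f"fitted log-log slope: {slope:.3f}")  # close to -alpha
```

On real traces the counts are noisy integers, so the fitted slope is only an estimate of -α.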

5 Does Internet media traffic follow Zipf’s law?
Web media systems and VoD media systems
– Chesire, USITS’01: Zipf-like
– Cherkasova, NOSSDAV’02: non-Zipf
– Acharya, MMCN’00: non-Zipf
– Yu, EUROSYS’06: Zipf-like
Live streaming and IPTV systems
– Veloso, IMW’02: Zipf-like
– Sripanidkulchai, IMC’04: non-Zipf
P2P media systems
– Gummadi, SOSP’03: non-Zipf
– Iamnitchi, INFOCOM’04: Zipf-like

6 Inconsistent media access pattern models
Still based on the Zipf model
– Zipf with exponential cutoff
– Zipf-Mandelbrot distribution
– Generalized Zipf-like distribution
– Two-mode Zipf distribution
– Fetch-at-most-once effect
– Parabolic fractal distribution
– …
All case studies
– Based on one or two workloads
– Different from or even in conflict with each other
An insightful understanding is essential to
– Content delivery system design
– Internet resource provisioning
– Performance optimization
Callout: heuristic assumptions

7 Outline
Motivation and objectives
Stretched exponential of Internet media traffic
Dynamics of access patterns in media systems
Caching implications
Concluding remarks

8 Workload summary
16 workloads in different media systems (nearly all workloads available on the Internet)
– Web, VoD, P2P, and live streaming
– Both client side and server side
Different delivery techniques (all major delivery techniques)
– Downloading, streaming, pseudo streaming
– Overlay multicast, P2P exchange, P2P swarming
Data set characteristics (data sets of different scales)
– Workload duration: 5 days to two years
– Number of users:
– Number of requests:
– Number of objects:

9 Stretched exponential distribution
Media reference rank follows the stretched exponential distribution (verified by Chi-square test)
Reference rank distribution: y^c = -a log i + b
– i: rank of media objects (N objects)
– y: number of references
– c: stretch factor
– Fat head and thin tail in log-log scale
– Straight line with slope -a in log x - y^c scale (SE scale)
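A minimal sketch of how the SE-scale straight line can be used to recover a and b when the stretch factor c is known. Several things here are assumptions for illustration and are not stated on the slide: the logarithm is taken as the natural log, the intercept b is chosen so that the least popular object has exactly one reference, and the data are synthetic.

```python
import math

def se_counts(n_objects, a, c):
    """Synthetic counts obeying y_i**c = -a*ln(i) + b, with b set so that y_N = 1 (assumption)."""
    b = 1.0 + a * math.log(n_objects)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n_objects + 1)]

def fit_se(counts, c):
    """Least-squares fit of y**c against ln(rank); returns the estimates (a, b)."""
    xs = [math.log(i) for i in range(1, len(counts) + 1)]
    ys = [y ** c for y in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x

# Hypothetical parameters; counts must already be sorted by decreasing popularity.
counts = se_counts(n_objects=1000, a=0.25, c=0.2)
a_hat, b_hat = fit_se(counts, c=0.2)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

With a real trace, the counts would come from sorting per-object request totals in descending order before calling `fit_se`.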

10 Set 1: Web media systems (server logs)
HPC-98: enterprise streaming media server logs of HP corporation (29 months)
HPLabs-99: logs of a video streaming server for employees in HP Labs (21 months)
ST-SVR-01: an enterprise streaming media server log workload like HPC-98 (4 months)
[Figure: reference rank distributions of ST-SVR-01 (15 MB), *HPC-98 (14 MB), and *HPLabs-99 (120 MB). x: rank of media object, log scale. y: number of references to the object; left axis in powered scale y^c, right axis in log scale (fat head, thin tail). Annotations: c = 0.22, R² ~ 1]
R²: coefficient of determination (1 means a perfect fit)
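The coefficient of determination R² quoted in the figure can be computed for any candidate stretch factor, which suggests one simple way to pick c: scan a grid and keep the value whose SE-scale plot is closest to a straight line. This grid scan is an illustrative assumption, not necessarily the procedure used by the authors, and the data below are synthetic (generated with c = 0.22 and natural logs).

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def best_stretch_factor(counts, candidates):
    """Return (c, R^2) for the candidate c that makes y**c most linear in ln(rank)."""
    xs = [math.log(i) for i in range(1, len(counts) + 1)]
    scored = [(r_squared(xs, [y ** c for y in counts]), c) for c in candidates]
    best_r2, best_c = max(scored)
    return best_c, best_r2

# Synthetic rank-ordered counts with a known stretch factor (hypothetical values).
n_objects, a_true, c_true = 2000, 0.25, 0.22
b_true = 1.0 + a_true * math.log(n_objects)
counts = [(b_true - a_true * math.log(i)) ** (1.0 / c_true)
          for i in range(1, n_objects + 1)]

candidates = [k / 100 for k in range(5, 51)]  # 0.05, 0.06, ..., 0.50
best_c, best_r2 = best_stretch_factor(counts, candidates)
print(f"best c = {best_c:.2f}, R^2 = {best_r2:.6f}")
```

On measured workloads R² will fall short of 1, and nearby values of c can score almost equally well, so the selected c should be read as an estimate.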

11 Set 2: Web media systems (request packets)
PS-CLT-04: HTTP downloading/pseudo streaming requests, 9 days
ST-CLT-04: RTSP/MMS on-demand streaming requests, 9 days
ST-CLT-05: RTSP/MMS on-demand streaming requests, 11 days
All collected from a big cable network hosted by a large ISP
[Figure: reference rank distributions of ST-CLT-05 (4.5 MB), PS-CLT-04 (1.5 MB), and ST-CLT-04 (2 MB). x: rank of media object, log scale. y: number of references to the object, in powered scale y^c and in log scale (fat head, thin tail)]

12 Set 3: VoD media systems
mMoD-98: logs of a multicast Media-on-Demand video server, 194 days
CTVoD-04: streaming server logs of a large VoD system by China Telecom, 219 days, reported as Zipf in EUROSYS’06
IFILM-06: number of web page clicks on video clips on the IFILM site, 16 weeks (one week for the figure)
YouTube-06: cumulative number of requests to YouTube video clips, obtained by crawling the web pages that publish the data
[Figure: reference rank distributions of *mMoD-98 (125 MB), *CTVoD-04 (300 MB), IFILM-06 (2.25 MB), and YouTube-06 (3.4 MB). x axis in log scale; y axis in powered scale y^c and in log scale (fat head, thin tail)]

13 Set 4: P2P media systems
KaZaa-02: large video file transfers in the KaZaa network (> 100 MB; files smaller than 100 MB are intentionally removed), collected in a campus network, 203 days
KaZaa-03: downloads of music files, movie clips, and movie files in the KaZaa network, 5 days, reported as Zipf in INFOCOM’04
BT-03: 48 days of BitTorrent file downloading (large video and DVD images) recorded by two tracker sites
[Figure: reference rank distributions of BT-03 (636 MB), *KaZaa-02 (300 MB), and *KaZaa-03 (5 MB)]

14 Set 5: Live streaming and movie pictures
Akamai-03: server logs of live streaming media collected from the Akamai CDN, 3 months, reported as two-mode Zipf in IMC’04
Movie-02: US movie box office ticket sales of year
IMDB-06: cumulative number of votes for the top 250 movies on the Internet Movie Database web site
[Figure: reference rank distributions of IMDB-06, Akamai-03, and Movie-02]

15 Why was Zipf observed before?
Media traffic is driven by user requests
Intermediate systems may affect the traffic pattern
– Effect of extraneous traffic (ad video insertion)
– Filtering effect due to caching (through a proxy)
Biased measurements may cause Zipf observations
[Figure: cache proxy, ad server, media server]

16 Extraneous media traffic
[Figure: web server with meta file link, streaming media server, and ads server; client sessions consist of an ads clip, a flag clip, and the video program]
Ad and flag videos are pushed to clients mandatorily

17 Effects of extraneous traffic on reference rank distributions
Extraneous objects do not represent user access patterns
– High request rate (high popularity)
– High total number of requests
– Small object sizes, small traffic volume
Not necessarily Zipf with extraneous traffic
– Extraneous traffic changes: 2 objects in the earlier workload, merged into 1 object in 2005
– Always SE without extraneous traffic
[Figure: reference rates of prog, ads, and flag objects. Reference rank distributions with extraneous traffic: non-Zipf and Zipf. Without extraneous traffic: SE in 2004 and SE in 2005]

18 Caching effect
Web workload: caching can cause a “flattened head” in log-log scale
Stretched exponential is not caused by the caching effect
Local replay events can be traced by WM/RM streaming media protocols
– Play summary
– Cache validation
[Figure: log y versus log i for Zipf filtered by a Web cache and for stretched exponential; data sources: server logs and packet sniffer]

19 Why media access pattern is not Zipf
“Rich-get-richer” and Zipf / power law
– Pareto law: income distribution
Web access is Zipf
– Popular pages can attract more users
– Pages update to stay popular
– Yahoo has ranked No. 1 for more than six years
– Zipf-like for long duration
Media access is not
– Popularity decreases with time exponentially
– Media objects are immutable
– Rich-get-richer not present
– Non-Zipf in long duration
[Figure: CCDF of requests (log scale) versus time after object birth (days) for a BitTorrent media file, raw data and linear fit]
[Figure: number of distinct weekly top N popular objects in 16 weeks, Web versus video. The top 1 Web object never changes; the top 1 video object changes every week]
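The "number of distinct weekly top N popular objects" statistic from the second figure is straightforward to compute from per-week request logs. The sketch below uses made-up toy logs purely to show the calculation; the object names and counts are hypothetical and do not come from the measured workloads.

```python
from collections import Counter

def distinct_top_n(weekly_requests, n):
    """Count the distinct objects that appear in any week's top-n list."""
    seen = set()
    for requests in weekly_requests:
        top = [obj for obj, _ in Counter(requests).most_common(n)]
        seen.update(top)
    return len(seen)

# Toy Web log: the same page is the most requested object every week.
web_weeks = [["home"] * 5 + ["news"] * 2 for _ in range(4)]
# Toy video log: each week a new clip is the most requested object.
video_weeks = [[f"clip{w}"] * 5 + [f"clip{w - 1}"] * 2 for w in range(1, 5)]

print(distinct_top_n(web_weeks, 1))    # 1: the top Web object never changes
print(distinct_top_n(video_weeks, 1))  # 4: a different top video each week
```

A value close to N indicates a stable popular set, while a value close to N times the number of weeks indicates that popular objects are replaced every week.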

20 Outline
Motivation and objectives
Stretched exponential of Internet media traffic
Dynamics of access patterns in media systems
Caching implications
Concluding remarks

21 Dynamics of access patterns in media systems
Media reference rank distribution in log-log scale
– Different systems have different access patterns
– The distribution changes over time in a system (NOSSDAV’02)
All follow stretched exponential distribution
– Stretch factor c
– Minus of slope a
Physical meanings
– Media file sizes
– Aging effects of media objects
– Deviation from the Zipf model
[Figure: number of references (y^c) versus rank (log i); slope: -a; c: stretch factor]

22 Stretch factors of different systems

23 Stretch factor and media file size
File size vs. stretch factor c
– 0 – 5 MB: c <=
– – 100 MB: 0.2 ~ 0.3
– > 100 MB: c >= 0.3
c increases with file size
Stretch factor c reflects the view time of a video object
Other issues
– Different encoding rates (file length in time is a better metric)
– Different content types: entertainment, educational (EDU), business (BIZ)

24 Stretched exponential parameters over time
In a media system over time t
– Constant request rate req
– Constant object birth rate obj
– Constant median file size
Stretch factor c is a time invariant constant
Parameter a increases with time t
[Figure: number of requested objects over time. Objects created in [0, t): slope = obj. Objects created in (-∞, 0]: ~ O(log t). Popularity decreases exponentially]
[Figure: number of references (y^c) versus rank (log i); slope: -a; c: stretch factor]

25 Evolution of media reference rank distribution
[Figure: reference rank distributions (slope = -a) over time, Web media and P2P media]
Parameter a increases with time and converges to a constant
Caused by objects created before workload collection

26 Deviation from the Zipf model
a increases with c (c < 2)
a increases with req / obj
a increases with t
Big media files have large deviation
Deviation increases with time
[Figure: reference # (log) versus rank (log) for the SE and Zipf curves, with points O, E, and F and segments |OE| and |EF| marked; big files]

27 Example: YouTube Video Measurements in IMC’07
80 days in a campus network: Zipf
– Small a, small deviation
– Short time, old objects dominant: N’(t) >> obj t
– Campus users: small request rate req
All downloads for all time: non-Zipf
– Large a, large deviation
– No old objects: N’(t) = 0
– Global users: large request rate req

28 Outline
Motivation and objectives
Stretched exponential of Internet media traffic
Dynamics of access patterns in media systems
Caching implications
Conclusion

29 Caching analysis methodology
Analyze media caching for short term workload
– Distribution is stationary (evolution takes time)
– Requests are independent
– Object has unit size
Intuition from the distribution shape (log y versus log i)
– Zipf-like: highly concentrated requests
– Stretched exponential: less concentrated requests

30 Modeling caching performance
Parameter selection
– Zipf: typical Web workload (α = 0.8)
– SE: typical streaming media workload (c = 0.2, a = 0.25, same as ST-CLT-05)
Asymptotic analysis for small cache size k (k << N)
Media caching is far less efficient than Web caching
[Figure: caching performance, Zipf (Web) versus SE (media)]
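The claim can be illustrated numerically under the assumptions of the caching methodology (stationary distribution, independent requests, unit-size objects): an ideal cache holding the k most popular objects serves the share of requests that those objects receive. The sketch below is a rough numerical comparison, not the asymptotic analysis from the talk. The object count N, the use of natural logs, and the choice of b so that the least popular object has one reference are all assumptions made for illustration.

```python
import math

def zipf_counts(n_objects, alpha):
    """Relative popularity under Zipf: y_i proportional to i**(-alpha)."""
    return [i ** -alpha for i in range(1, n_objects + 1)]

def se_counts(n_objects, a, c):
    """Stretched exponential: y_i**c = -a*ln(i) + b, with b set so that y_N = 1 (assumption)."""
    b = 1.0 + a * math.log(n_objects)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n_objects + 1)]

def top_k_share(counts, k):
    """Fraction of all requests that go to the k most popular objects."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

N = 10000  # hypothetical number of objects
zipf = zipf_counts(N, alpha=0.8)       # typical Web workload
se = se_counts(N, a=0.25, c=0.2)       # typical streaming media workload

for k in (10, 100, 1000):
    print(k, round(top_k_share(zipf, k), 3), round(top_k_share(se, k), 3))
```

For small k the Zipf share is noticeably larger than the SE share in this setup, which matches the intuition that media requests are less concentrated on the most popular objects.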

31 Potential of long term media caching
Short term: requests dominated by old objects, which dilutes concentration
Long term: requests dominated by new objects
– Request concentration: significantly increased
– Request correlation: objects can be purged when unpopular
To achieve maximal concentration
– Very long time (months to years) and huge amount of storage
– Peer-to-peer systems are scalable for this purpose
[Figure: CDF of object ages and CDF of requests in 11 days, old objects versus new objects; object popularity decreases exponentially]

32 Concluding Remarks
Media access patterns do not fit the Zipf model
Media access patterns are stretched exponential
Our findings imply that
– Client-server based proxy systems are not effective for delivering media content
– P2P systems are most suitable for this purpose
We provide an analytical basis for the effectiveness of P2P media content delivery infrastructure

33 Thank you!