TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)



2 IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD
- Internet access is a scarce commodity in the developing world: expensive / slow
- Our focus: improving performance of connected network access
- Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)
Sunghwan Ihm, Princeton University

3 POSSIBLE OPTIONS
- Web proxy caching
  - Whole objects
  - Single endpoint (local)
  - Designated cacheable traffic only
- WAN acceleration
  - Packet-level caching
  - Mostly for enterprise
  - Two (or more) endpoints, coordinated
  - Effective in first world

4 DEVELOPING WORLD QUESTIONS
- How effective are these approaches?
  - Systems designed for first-world use
  - Most traffic studies small, first-world focused
  - How similar is developing region traffic?
- Any new opportunities to exploit?
  - Differences in traffic
  - Differences in cost/tradeoffs
  - System design issues

5 UNDERSTANDING DEVELOPING WORLD TRAFFIC
- Goal: shape system design by better understanding the traffic optimization opportunities
- Requirements: large-scale, content-focused analysis

6 PRIOR TRAFFIC ANALYSIS WORK
- Large-scale traffic analysis
  - Internet Study 2007, 2008/2009 by ipoque
  - One million users
  - High-level characteristics via DPI
  - First-world focus
- Developing world traffic analysis
  - Du et al. WWW'06, Johnson et al. NSDR'10
  - Proxy-level analysis from kiosks, Internet cafes, and community centers

7 OUR APPROACH
- Combine best features
  - Large-scale and content-focused
  - First world and developing world
- Use traffic from the CoDeeN content distribution network (CDN)
  - Global proxy (500+ PlanetLab nodes)
  - Running since [...]
  - [...] million requests per day

8 WHAT TO ANALYZE?
1. Traffic profile
2. Caching opportunities
3. User behavior

9 DATA COLLECTION
[Diagram labels: User, Browser Cache, Local Proxy Cache, CoDeeN Cache, WAN, Origin Web Server]
- Assume local proxy caches
- Focus on cache misses only
- Capture full content

10 DATA SET
- Duration: 1 week (March 25-31, 2010)
- # Requests: 157 million
- Volume: 3 terabytes
- # Clients (unique IPs): 348 K
- # Countries/Regions: 190
- /8 networks coverage: 61.3%
- /16 networks coverage: 24.1%
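The slide does not define "coverage". One plausible reading is the share of all possible IPv4 prefixes of a given length that contain at least one client address. A minimal sketch under that assumption follows; the function name `prefix_coverage` and the sample addresses are hypothetical and not taken from the study.

```python
import ipaddress

def prefix_coverage(client_ips, prefix_len):
    """Fraction of all possible IPv4 prefixes of length prefix_len
    that contain at least one of the given client addresses.
    (Assumed definition; the slides do not spell one out.)"""
    seen = set()
    for ip in client_ips:
        addr = int(ipaddress.IPv4Address(ip))
        # Keep only the top prefix_len bits to identify the network.
        seen.add(addr >> (32 - prefix_len))
    return len(seen) / (2 ** prefix_len)

# Hypothetical sample: two addresses share the 10.0.0.0/8 network.
clients = ["10.1.2.3", "10.200.0.1", "192.0.2.7"]
coverage_8 = prefix_coverage(clients, 8)    # 2 distinct /8s out of 256
coverage_16 = prefix_coverage(clients, 16)  # 3 distinct /16s out of 65536
print(coverage_8, coverage_16)
```

Under this reading, 61.3% of /8 networks would correspond to roughly 157 of the 256 possible /8 prefixes.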

11 TOP COUNTRIES
[Figure: three pie charts showing each country's share of requests, bytes, and clients]
Legend: DE (Germany), US (United States), RU (Russian Federation), AE (United Arab Emirates), PL (Poland), CN (China), SA (Saudi Arabia), Etc. (185 countries)

12 OECD VS. DEVREG
- OECD: the first world
  - 27 high-income economies from OECD member countries
  - 25% of total traffic
- DevReg: the developing world
  - The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey
  - 75% of total traffic

13 ANALYSIS #1: TRAFFIC PROFILE
- Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)
- We often hear a variant of “Offline Wikipedia content suffices for developing world users”

14 OBJECT SIZE
- Small: median 3KB vs. 5KB
- Large: similar demand/profile
[Figure annotation: 16KB]

15 TEXT AND IMAGES
- DevReg has a higher fraction of images
- Exact opposite of bandwidth conjecture

16 VIDEO AND AUDIO
- DevReg: higher fraction of video & audio
- Music videos and MP3 songs

17 APPLICATION (FLASH)
- DevReg has a higher fraction of application traffic
- Median near 7%

18 ANALYSIS #1 SUMMARY
- Some evidence that DevReg-visited sites have smaller objects, but
- DevReg users visit large pages as well, and
- DevReg users seek a higher fraction of rich content than OECD users

19 ANALYSIS #2: CACHING OPPORTUNITY
- Conjecture: little gain from larger caches
  - Some analysis suggests 1GB sufficient
  - Typical cache size < 20GB
  - Object-based caching

20 CONTENT-BASED CHUNK CACHING
- Split content into chunks
- Name chunks by content (SHA-1 hash)
- Cache chunks instead of objects
- Fetch content, send only modified chunks
- Two endpoints needed
- Applies to “uncacheable” content
[Figure: content split into chunks A, B, C, D, E]
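The steps on this slide can be sketched as a toy two-endpoint exchange. This is a sketch under stated assumptions, not the presenters' implementation: the boundary detector is a simple rolling byte sum standing in for the fingerprinting schemes real systems use, and `WINDOW`, `MASK`, `split_chunks`, `encode`, and `decode` are hypothetical names and parameters chosen for illustration.

```python
import hashlib
import random

WINDOW = 16          # bytes in the rolling window (assumed value)
MASK = (1 << 6) - 1  # boundary when low 6 bits are zero, ~64-byte spacing (assumed)

def split_chunks(data: bytes):
    """Content-defined chunking: cut wherever the rolling sum of the last
    WINDOW bytes matches the mask, so boundaries follow the content itself."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if i + 1 - start >= WINDOW and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def encode(data: bytes, receiver_has: set):
    """Sender endpoint: name each chunk by SHA-1 and attach the payload
    only when the receiver is not known to hold that chunk."""
    message = []
    for chunk in split_chunks(data):
        name = hashlib.sha1(chunk).hexdigest()
        if name in receiver_has:
            message.append((name, None))
        else:
            message.append((name, chunk))
            receiver_has.add(name)
    return message

def decode(message, cache: dict) -> bytes:
    """Receiver endpoint: store new payloads, then rebuild the content by name."""
    parts = []
    for name, payload in message:
        if payload is not None:
            cache[name] = payload
        parts.append(cache[name])
    return b"".join(parts)

# Hypothetical content: a page, then the same page with a small edit.
rng = random.Random(0)
page_v1 = bytes(rng.randrange(256) for _ in range(4096))
page_v2 = page_v1[:2000] + b"EDITED" + page_v1[2000:]

sender_view, receiver_cache, results = set(), {}, []
for page in (page_v1, page_v2):
    msg = encode(page, sender_view)
    sent = sum(len(p) for _, p in msg if p is not None)
    results.append((decode(msg, receiver_cache) == page, len(page), sent))
    print(len(page), "content bytes,", sent, "payload bytes sent")
```

The payload count ignores the cost of sending the chunk names themselves. Because boundaries depend on content rather than byte offsets, chunks before and well after the edit keep their names, so only chunks near the edit need to be resent.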

21 OVERALL REDUNDANCY
- 64 KB: objects or parts of large object
- 1 KB: parts of text pages
- 128 bytes: paragraphs or sentences

22 CACHE BEHAVIOR SIMULATION
- Simulate one week’s traffic
  - Cache misses only
  - LRU cache replacement policy
- Determine size for near-ideal hit rate
  - Calculate byte hit ratio (BHR)
  - Vary storage size (from 10MB to max)
- Results for US, China, and Brazil
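A simulation of this kind can be sketched as a replay of (item, size) requests through an LRU cache at several storage sizes. The sketch below is a minimal illustration, assuming items are identified by an opaque key (an object URL or a chunk hash) and that items larger than the cache are never stored; the function name `simulate_lru` and the tiny trace are hypothetical.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity_bytes):
    """Replay (item_id, size) requests through an LRU cache and
    return the byte hit ratio: bytes served from cache / bytes requested."""
    cache = OrderedDict()  # item_id -> size, least recently used first
    used = hit_bytes = total_bytes = 0
    for item, size in requests:
        total_bytes += size
        if item in cache:
            hit_bytes += size
            cache.move_to_end(item)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # assumed policy: never store items larger than the cache
        cache[item] = size
        used += size
        while used > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU item
            used -= evicted_size
    return hit_bytes / total_bytes if total_bytes else 0.0

# Hypothetical trace; a real run would replay the week of cache-miss traffic.
trace = [("a", 100), ("b", 200), ("a", 100), ("c", 300), ("b", 200), ("a", 100)]
bhr_small = simulate_lru(trace, 300)  # 0.1: only one request for "a" hits
bhr_large = simulate_lru(trace, 600)  # 0.4: everything fits, all repeats hit
print(bhr_small, bhr_large)
```

Sweeping `capacity_bytes` upward until the byte hit ratio stops improving gives the storage size needed for a near-ideal hit rate.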

23 US – 213 GB

24 CHINA – 559 GB

25 BRAZIL – 44 GB

26 ANALYSIS #2 SUMMARY
- Chunk caching useful
  - Reduces WAN (cache miss) traffic
  - Complements existing Web proxies
- Larger caches useful
  - Useful reduction in miss rate
  - Cheap compared to bandwidth costs

27 ANALYSIS #3: USER BEHAVIOR
- Conjecture: as first-world Web pages get larger, DevReg users suffer delays
- Mechanism: observe aborted transfers
  - Intentional termination
  - Automatic when browsing away
- Abort = users bored or downloads slow

28 CANCELLED OBJECT SIZE C-CDF
- Cancelled objects larger than normal (red)
- Complete objects (green) much larger than actual download (blue)
- Most downloads less than 10MB
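A C-CDF (complementary cumulative distribution function) gives, for each size x, the fraction of objects larger than x. A minimal sketch of how such a curve could be computed from a list of sizes follows; the function name `ccdf` and the sample values are hypothetical.

```python
def ccdf(sizes):
    """Return (x, P[X > x]) points for the empirical distribution of sizes,
    one point per distinct value, in increasing order of x."""
    ordered = sorted(sizes)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered):
        if i + 1 < n and ordered[i + 1] == x:
            continue  # emit one point per distinct value, at its last occurrence
        points.append((x, (n - (i + 1)) / n))
    return points

# Hypothetical object sizes in KB.
sample_points = ccdf([1, 2, 2, 4])
print(sample_points)  # [(1, 0.75), (2, 0.25), (4, 0.0)]
```

Plotting such points on log-log axes makes the tail of large objects visible, which is the part of the distribution this slide compares.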

29 CANCELLED TRANSFER VOLUME
- 17% of transfers are terminated early
- Because they stop early, these transfers account for 25% of actual traffic
- If fully downloaded, they would have been 80% of all bytes
- Overall traffic would then be 375% of the actual volume
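The percentages on this slide are mutually consistent, and a short calculation shows how the 375% figure follows from the other two. The sketch assumes that the 25% is the cancelled transfers' share of bytes actually transferred and that the 80% is their share of the hypothetical total if every transfer had completed; the function name is hypothetical.

```python
from fractions import Fraction

def full_download_ratio(cancelled_share_actual, cancelled_share_full):
    """Hypothetical total traffic, as a multiple of actual traffic,
    if every cancelled transfer had run to completion."""
    # Bytes from transfers that completed, with actual traffic normalised to 1.
    completed = 1 - cancelled_share_actual
    # Completed bytes are unchanged and make up the remaining share
    # (1 - cancelled_share_full) of the hypothetical total.
    return completed / (1 - cancelled_share_full)

ratio = full_download_ratio(Fraction(25, 100), Fraction(80, 100))
print(float(ratio))  # 3.75
```

With actual traffic normalised to 100 units, completed transfers contribute 75 units. If those 75 units are 20% of the hypothetical total, that total is 375 units, of which the fully downloaded cancelled transfers would be 300.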

30 CANCELLED CONTENT TYPES
- Most cancelled responses were text
- Most bytes from video/audio/application

31 % CANCELLED REQUESTS CDF
- OECD users cancel more often than DevReg users
- Median almost double

32 ANALYSIS #3 SUMMARY
- Many transactions aborted
  - Previewing video files
  - Content-based caching is effective
- OECD users less patient than DevReg
  - Cheap bandwidth = more sampling?

33 CONCLUSIONS
- First glimpse at CoDeeN traffic
  - Large-scale, content-focused analysis
  - OECD and developing world
- Many DevReg assumptions are false
  - In fact, strong desire for rich content, and
  - Patient despite slow connections
- Systems implications
  - Chunk caching worth more exploration
  - Larger caches very useful