Published by Leon Gordon. Modified over 9 years ago.
1
TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC
Sunghwan Ihm (Princeton), KyoungSoo Park (KAIST), Vivek S. Pai (Princeton)
2
IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD
- Internet access is a scarce commodity in the developing world: expensive and slow
- Our focus: improving the performance of connected network access
- Non-focus: providing or extending connectivity (e.g., DTN, WiLDNet)
Sunghwan Ihm, Princeton University
3
POSSIBLE OPTIONS
- Web proxy caching
  - Whole objects
  - Single endpoint (local)
  - Designated cacheable traffic only
- WAN acceleration
  - Packet-level caching
  - Mostly for enterprise
  - Two (or more) endpoints, coordinated
- Effective in first world
4
DEVELOPING WORLD QUESTIONS
- How effective are these approaches?
  - Systems designed for first-world use
  - Most traffic studies small, first-world focused
  - How similar is developing region traffic?
- Any new opportunities to exploit?
  - Differences in traffic
  - Differences in cost/tradeoffs
  - System design issues
5
UNDERSTANDING DEVELOPING WORLD TRAFFIC
- Goal
  - Shape system design by better understanding the traffic optimization opportunities
- Requirements
  - Large-scale, content-focused analysis
6
PRIOR TRAFFIC ANALYSIS WORK
- Large-scale traffic analysis
  - Internet Study 2007, 2008/2009 by ipoque
  - One million users
  - High-level characteristics via DPI
  - First-world focus
- Developing world traffic analysis
  - Du et al. WWW’06, Johnson et al. NSDR’10
  - Proxy-level analysis from kiosks, Internet cafes, and community centers
7
OUR APPROACH
- Combine best features
  - Large-scale and content-focused
  - First world and developing world
- Use traffic from CoDeeN content distribution network (CDN)
  - Global proxy (500+ PlanetLab nodes)
  - Running since 2003
  - 30+ million requests per day
8
WHAT TO ANALYZE?
1. Traffic profile
2. Caching opportunities
3. User behavior
9
DATA COLLECTION
[Figure: data collection path. Labels: User, Browser Cache, Local Proxy Cache, CoDeeN Cache, WAN, Origin Web Server]
- Assume local proxy caches
- Focus on cache misses only
- Capture full content
10
DATA SET
- Duration: 1 week (March 25-31, 2010)
- # Requests: 157 million
- Volume: 3 terabytes
- # Clients (unique IPs): 348 K
- # Countries/Regions: 190
- /8 networks coverage: 61.3%
- /16 networks coverage: 24.1%
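The slide does not define how the /8 and /16 coverage figures were computed. One plausible reading, sketched below as an assumption, is the share of all possible IPv4 prefixes of a given length that contain at least one client address. The function name `prefix_coverage` is hypothetical.

```python
import ipaddress
from typing import Iterable


def prefix_coverage(client_ips: Iterable[str], prefix_len: int) -> float:
    """Share of all IPv4 prefixes of length `prefix_len` that contain at least one client.

    Assumption: the denominator is every possible prefix (2**prefix_len),
    not only allocated or routed ones; the slides do not say which was used.
    """
    seen = set()
    for ip in client_ips:
        addr = int(ipaddress.IPv4Address(ip))
        seen.add(addr >> (32 - prefix_len))  # keep only the leading prefix bits
    return len(seen) / (1 << prefix_len)


if __name__ == "__main__":
    sample = ["10.0.0.1", "10.1.2.3", "192.168.1.1"]  # made-up addresses
    print(prefix_coverage(sample, 8), prefix_coverage(sample, 16))
```

With a different denominator (for example, only allocated blocks) the percentages would be higher, so the definition matters when comparing against the slide.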
11
TOP COUNTRIES
[Figure: top countries by share of requests, bytes, and clients]
Legend: DE (Germany), US (United States), RU (Russian Federation), AE (United Arab Emirates), PL (Poland), CN (China), SA (Saudi Arabia), Etc. (185 countries)
12
OECD VS. DEVREG
- OECD: the first world
  - 27 high-income economies from OECD member countries
  - 25% of total traffic
- DevReg: the developing world
  - The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey
  - 75% of total traffic
13
ANALYSIS #1: TRAFFIC PROFILE
- Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)
- We often hear a variant of “Offline Wikipedia content suffices for developing world users”
14
OBJECT SIZE
- Small objects: median 3KB vs. 5KB
- Large objects: similar demand/profile
[Figure annotation: 16KB]
15
TEXT AND IMAGES
- DevReg has a higher fraction of images
- Exact opposite of the low-bandwidth conjecture
16
VIDEO AND AUDIO
- DevReg: higher fraction of video and audio
- Music videos and MP3 songs
17
APPLICATION (FLASH)
- DevReg has a higher fraction of application traffic
- Median near 7%
18
ANALYSIS #1 SUMMARY
- Some evidence that DevReg-visited sites have smaller objects, but
  - DevReg users visit large pages as well, and
  - DevReg users seek a higher fraction of rich content than OECD users
19
ANALYSIS #2: CACHING OPPORTUNITY
- Conjecture: little gain from larger caches
  - Some analysis suggests 1GB sufficient
  - Typical cache size < 20GB
  - Object-based caching
20
CONTENT-BASED CHUNK CACHING
- Split content into chunks
- Name chunks by content (SHA-1 hash)
- Cache chunks instead of objects
- Fetch content, send only modified chunks
- Two endpoints needed
- Applies to “uncacheable” content
[Figure: content split into chunks A, B, C, D, E]
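The steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' system: it assumes fixed-size chunks (the slides do not say how chunk boundaries are chosen), and `Receiver`, `send`, and `CHUNK_SIZE` are hypothetical names standing in for the two coordinated endpoints.

```python
import hashlib
from typing import Dict, List, Set, Tuple

CHUNK_SIZE = 1024  # assumed fixed chunk size in bytes; not taken from the slides


def split_chunks(content: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split content into fixed-size chunks (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def chunk_name(chunk: bytes) -> str:
    """Name a chunk by the SHA-1 hash of its content."""
    return hashlib.sha1(chunk).hexdigest()


class Receiver:
    """Hypothetical downstream endpoint that caches chunks by name."""

    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}

    def missing(self, names: List[str]) -> Set[str]:
        """Report which chunk names are not yet cached."""
        return {n for n in names if n not in self.cache}

    def assemble(self, names: List[str], new_chunks: Dict[str, bytes]) -> bytes:
        """Store newly received chunks and rebuild the content in order."""
        self.cache.update(new_chunks)
        return b"".join(self.cache[n] for n in names)


def send(content: bytes, receiver: Receiver) -> Tuple[bytes, int]:
    """Hypothetical upstream endpoint: ship only chunks the receiver lacks.

    Returns the content as rebuilt by the receiver and the payload bytes sent.
    """
    chunks = split_chunks(content)
    names = [chunk_name(c) for c in chunks]
    needed = receiver.missing(names)
    payload = {n: c for n, c in zip(names, chunks) if n in needed}
    sent = sum(len(c) for c in payload.values())
    return receiver.assemble(names, payload), sent
```

Because chunks are named by content rather than by URL, a second version of a page that changes one chunk only costs that chunk on the wire, even if the response itself was marked uncacheable.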
21
OVERALL REDUNDANCY
- 40% @ 64 KB: objects or parts of large objects
- 60% @ 1 KB: parts of text pages
- 65% @ 128 bytes: paragraphs or sentences
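One way to measure redundancy at a given chunk size is to count the bytes that fall in chunks already seen earlier in the trace. The sketch below assumes fixed-size chunking and an in-memory set of SHA-1 digests; the function name `redundancy` and the toy inputs are illustrative and not from the study.

```python
import hashlib
from typing import Iterable


def redundancy(objects: Iterable[bytes], chunk_size: int) -> float:
    """Fraction of bytes that fall in chunks already seen earlier in the stream."""
    seen = set()
    total = 0
    redundant = 0
    for body in objects:
        for i in range(0, len(body), chunk_size):
            chunk = body[i:i + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if digest in seen:
                redundant += len(chunk)  # repeat of an earlier chunk
            else:
                seen.add(digest)
    return redundant / total if total else 0.0


if __name__ == "__main__":
    # Two toy responses that share their first 128 bytes.
    trace = [b"a" * 128 + b"b" * 128, b"a" * 128 + b"c" * 128]
    for size in (256, 128):
        print(size, redundancy(trace, size))
```

In the toy trace the shared prefix is only detected at the smaller chunk size, which mirrors the slide's trend of more redundancy being found as chunks shrink.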
22
CACHE BEHAVIOR SIMULATION
- Simulate one week’s traffic
  - Cache misses only
  - LRU cache replacement policy
- Determine size for near-ideal hit rate
  - Calculate byte hit ratio (BHR)
  - Vary storage size (from 10MB to max)
- Results for US, China, and Brazil
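A simulation of this kind can be sketched as a replay of (key, size) requests through an LRU cache with a byte capacity. The code below is a simplified stand-in, not the authors' simulator: it assumes each key always has the same size, and the function name `byte_hit_ratio` and the toy trace are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def byte_hit_ratio(requests: Iterable[Tuple[str, int]], capacity: int) -> float:
    """Replay (key, size) requests through an LRU cache of `capacity` bytes.

    Returns the byte hit ratio: bytes served from cache / total bytes requested.
    """
    cache: "OrderedDict[str, int]" = OrderedDict()
    used = 0
    hit_bytes = 0
    total_bytes = 0
    for key, size in requests:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity:
            continue  # can never fit; counted as a miss
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    trace = [("a", 10), ("b", 10), ("a", 10), ("c", 10), ("b", 10)]  # toy trace
    for capacity in (5, 20, 30):
        print(capacity, byte_hit_ratio(trace, capacity))
```

Sweeping `capacity` from a small value up to the size of the whole trace gives the curve from which a near-ideal cache size can be read off.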
23
US – 213 GB
24
CHINA – 559 GB
25
BRAZIL – 44 GB
26
ANALYSIS #2 SUMMARY
- Chunk caching useful
  - Reduces WAN (cache miss) traffic
  - Complements existing Web proxies
- Larger caches useful
  - Useful reduction in miss rate
  - Cheap compared to bandwidth costs
27
ANALYSIS #3: USER BEHAVIOR
- Conjecture: as first-world Web pages get larger, DevReg users suffer delays
- Mechanism: observe aborted transfers
  - Intentional termination
  - Automatic when browsing away
- Abort = users bored or downloads slow
28
CANCELLED OBJECT SIZE C-CDF
- Cancelled objects larger than normal (red)
- Complete objects (green) much larger than actual download (blue)
- Most downloads less than 10MB
29
CANCELLED TRANSFER VOLUME
- 17% of transfers are terminated early
- Because they end early, they account for 25% of actual traffic
- If fully downloaded, they would have been 80% of all bytes
- Overall traffic would grow to 375% of its actual volume
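The 25%, 80%, and 375% figures are linked by simple arithmetic, worked through below. This reads the 375% as the ratio of hypothetical total traffic to actual traffic, which is an interpretation of the slide rather than something it states outright; the function name is hypothetical.

```python
def full_download_factor(cancelled_share_actual: float, cancelled_share_full: float) -> float:
    """Ratio of hypothetical total bytes (every transfer completed) to actual total bytes.

    cancelled_share_actual: share of actual traffic from cancelled transfers (0.25 on the slide)
    cancelled_share_full:   share those transfers would have had if completed (0.80 on the slide)
    """
    # Bytes from transfers that finished normally, as a share of actual traffic.
    completed = 1.0 - cancelled_share_actual
    # Those same bytes would make up the remaining share of the hypothetical total,
    # so hypothetical_total = completed / (1 - cancelled_share_full).
    return completed / (1.0 - cancelled_share_full)


if __name__ == "__main__":
    factor = full_download_factor(0.25, 0.80)
    print(f"traffic would be about {factor:.2f}x the actual volume")
```

With the slide's numbers, completed transfers are 75% of actual traffic and would be only 20% of the hypothetical total, giving a factor of about 3.75.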
30
CANCELLED CONTENT TYPES
- Most cancelled responses were text
- Most bytes from video/audio/application
31
% CANCELLED REQUESTS CDF
- OECD users cancel more often than DevReg users
- Median almost double
32
ANALYSIS #3 SUMMARY
- Many transactions aborted
  - Previewing video files
  - Content-based caching is effective
- OECD users less patient than DevReg users
  - Cheap bandwidth = more sampling?
33
CONCLUSIONS
- First glimpse at CoDeeN traffic
  - Large-scale, content-focused analysis
  - OECD and developing world
- Many DevReg assumptions are false
  - In fact, strong desire for rich content, and
  - Patient despite slow connections
- Systems implications
  - Chunk caching worth more exploration
  - Larger caches very useful
34
sihm@cs.princeton.edu http://www.cs.princeton.edu/~sihm/