Wide-area Network Acceleration for the Developing World Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)

Slides:

Advertisements

Similar presentations

Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol Li Fan, Pei Cao and Jussara Almeida University of Wisconsin-Madison Andrei Broder Compaq/DEC.

Advertisements

Cache Storage For the Next Billion Students: Anirudh Badam, Sunghwan Ihm Research Scientist: KyoungSoo Park Presenter: Vivek Pai Collaborator: Larry Peterson.

Kernel memory allocation

Difference Engine: Harnessing Memory Redundancy in Virtual Machines by Diwaker Gupta et al. presented by Jonathan Berkhahn.

Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Wei Zhang, Tao Yang, Gautham Narayanasamy University of California at Santa Barbara.

ICDCS’07, Toronto, Canada1 SCAP: Smart Caching in Wireless Access Points to Improve P2P Streaming Enhua Tan 1, Lei Guo 1, Songqing Chen 2, Xiaodong Zhang.

Resilient Peer-to-Peer Streaming Paper by: Venkata N. Padmanabhan Helen J. Wang Philip A. Chou Discussion Leader: Manfred Georg Presented by: Christoph.

1 Web Server Performance in a WAN Environment Vincent W. Freeh Computer Science North Carolina State Vsevolod V. Panteleenko Computer Science & Engineering.

Scalable Flow-Based Networking with DIFANE 1 Minlan Yu Princeton University Joint work with Mike Freedman, Jennifer Rexford and Jia Wang.

Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.

Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.

Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.

Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,

Exploiting SCI in the MultiOS management system Ronan Cunniffe Brian Coghlan SCIEurope’ AUG-2000.

presented by Hasan SÖZER1 Scalable P2P Search Daniel A. Menascé George Mason University.

Object Naming & Content based Object Search 2/3/2003.

Squirrel: A decentralized peer- to-peer web cache Paul Burstein 10/27/2003.

1 stdchk : A Checkpoint Storage System for Desktop Grid Computing Matei Ripeanu – UBC Sudharshan S. Vazhkudai – ORNL Abdullah Gharaibeh – UBC The University.

A Hybrid Caching Strategy for Streaming Media Files Jussara M. Almeida Derek L. Eager Mary K. Vernon University of Wisconsin-Madison University of Saskatchewan.

1 The Mystery of Cooperative Web Caching 2 b b Web caching : is a process implemented by a caching proxy to improve the efficiency of the web. It reduces.

CS252/Patterson Lec /28/01 CS 213 Lecture 10: Multiprocessor 3: Directory Organization.

Locality-Aware Request Distribution in Cluster-based Network Servers Presented by: Kevin Boos Authors: Vivek S. Pai, Mohit Aron, et al. Rice University.

Towards Understanding Modern Web Traffic

Slingshot: Deploying Stateful Services in Wireless Hotspots Ya-Yunn Su Jason Flinn University of Michigan.

Performance Tradeoffs for Static Allocation of Zero-Copy Buffers Pål Halvorsen, Espen Jorde, Karl-André Skevik, Vera Goebel, and Thomas Plagemann Institute.

Advanced Network Architecture Research Group 2001/11/149 th International Conference on Network Protocols Scalable Socket Buffer Tuning for High-Performance.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

Global NetWatch Copyright © 2003 Global NetWatch, Inc. Factors Affecting Web Performance Getting Maximum Performance Out Of Your Web Server.

RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel.

Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA.

1 Towards Cinematic Internet Video-on-Demand Bin Cheng, Lex Stein, Hai Jin and Zheng Zhang HUST and MSRA Huazhong University of Science & Technology Microsoft.

Slingshot: Deploying Stateful Services in Wireless Hotspots Ya-Yunn Su Jason Flinn University of Michigan Presenter: Youngki, Lee.

DBI313. MetricOLTPDWLog Read/Write mixMostly reads, smaller # of rows at a time Scan intensive, large portions of data at a time, bulk loading Mostly.

RAMCloud: Low-latency DRAM-based storage Jonathan Ellithorpe, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro,

TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)

Advanced Network Architecture Research Group 2001/11/74 th Asia-Pacific Symposium on Information and Telecommunication Technologies Design and Implementation.

Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.

CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.

Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers A DISC’04 paper Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang.

A Low-bandwidth Network File System Athicha Muthitacharoen et al. Presented by Matt Miller September 12, 2002.

Empirical Quantification of Opportunities for Content Adaptation in Web Servers Michael Gopshtein and Dror Feitelson School of Engineering and Computer.

1 MSRBot Web Crawler Dennis Fetterly Microsoft Research Silicon Valley Lab © Microsoft Corporation.

Data Indexing in Peer- to-Peer DHT Networks Garces-Erice, P.A.Felber, E.W.Biersack, G.Urvoy-Keller, K.W.Ross ICDCS 2004.

MiddleMan: A Video Caching Proxy Server NOSSDAV 2000 Brian Smith Department of Computer Science Cornell University Ithaca, NY Soam Acharya Inktomi Corporation.

09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.

Overview on Web Caching COSC 513 Class Presentation Instructor: Prof. M. Anvari Student name: Wei Wei ID:

Using Deduplicating Storage for Efficient Disk Image Deployment Xing Lin, Mike Hibler, Eric Eide, Robert Ricci University of Utah.

Cost-Effective Video Streaming Techniques Kien A. Hua School of EE & Computer Science University of Central Florida Orlando, FL U.S.A.

Presenter: Yue Zhu, Linghan Zhang A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: a Case Study by PowerPoint.

NetFlow Analyzer Best Practices, Tips, Tricks. Agenda Professional vs Enterprise Edition System Requirements Storage Settings Performance Tuning Configure.

1 Design and Implementation of a High-Performance Distributed Web Crawler Polytechnic University Vladislav Shkapenyuk, Torsten Suel 06/13/2006 석사 2 학기.

Wide-area Network Acceleration for the Developing World

Accelerating Peer-to-Peer Networks for Video Streaming

Coral: A Peer-to-peer Content Distribution Network

Scale and Performance in the CoBlitz Large-File Distribution Service

Optimizing the Migration of Virtual Computers

The Impact of Replacement Granularity on Video Caching

CSC 4250 Computer Architectures

Web Caching? Web Caching:.

Cache Memory Presentation I

Memory Management for Scalable Web Data Servers

VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

CMSC 611: Advanced Computer Architecture

COS 518: Advanced Computer Systems Lecture 9 Michael Freedman

Be Fast, Cheap and in Control

Kalyan Boggavarapu Lehigh University

Filesystems 2 Adapted from slides of Hank Levy

Content Distribution Networks + P2P File Sharing

Content Distribution Networks + P2P File Sharing

Presentation transcript:

Wide-area Network Acceleration for the Developing World Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)

2 Sunghwan Ihm, Princeton University POOR INTERNET ACCESS IN THE DEVELOPING WORLD Internet access is a scarce commodity in the developing regions Expensive / slow Zambia example [Johnson et al. NSDR’10] 300 people share 1Mbps satellite link $1200 per month

3 Sunghwan Ihm, Princeton University WEB PROXY CACHING IS NOT ENOUGH Local Proxy Cache Users Slow Link Origin Web Servers Whole objects, designated cacheable traffic only Zambia: 43% cache hit rate, 20% byte hit rate

4 Sunghwan Ihm, Princeton University WAN ACCELERATION: FOCUS ON CACHE MISSES Packet-level (chunks) caching Mostly for enterprise Two (or more) endpoints, coordinated Users Slow Link Origin Web Servers WAN Accelerator

5 Sunghwan Ihm, Princeton University CONTENT FINGERPRINTING Split content into chunks Rabin’s fingerprinting over a sliding window Match n low-order bits of a global constant K Average chunk size: 2 n bytes Name chunks by content (SHA-1 hash) Cache chunks and pass references ABCDE

6 Sunghwan Ihm, Princeton University WHERE TO STORE CHUNKS Chunk data Usually stored on disk Can be cached in memory to reduce disk access Chunk metadata index Name, offset, length, etc. Partially or completely kept in memory to minimize disk accesses

7 Sunghwan Ihm, Princeton University HOW IT WORKS Origin Web Server WAN Accelerator Users 1.Users send requests 2.Get content from Web server 3.Generate chunks 4.Send cached chunk names + uncached raw content (compression) WAN Accelerator 1.Fetch chunk contents from local disk (cache hits) 2.Request any cache misses from sender-side node 3.As we assemble, send it to users (cut-through) Sender-Side Receiver-Side

8 Sunghwan Ihm, Princeton University WAN ACCELERATOR PERFORMACE Effective bandwidth (throughput) Original data size / total time Transfer: send data + refs Compression rate Reconstruction: hits from cache Disk performance Memory pressure High Low

9 Sunghwan Ihm, Princeton University DEVELOPING WORLD CHALLENGES/OPPORTUNITES Dedicated machine with ample RAM High-RPM SCSI disk Inter-office content only Star topology Shared machine with limited RAM Slow disk All content Mesh topology VS. Poor Performance! EnterpriseDeveloping World

10 Sunghwan Ihm, Princeton University WANAX: HIGH-PERFORMANCE WAN ACCELERATOR Design Goals Maximize compression rate Minimize memory pressure Maximize disk performance Exploit local resources Contributions Multi-Resolution Chunking (MRC) Peering Intelligent Load Shedding (ILS)

11 Sunghwan Ihm, Princeton University SINGLE RESOLUTION CHUNKING (SRC) TRADEOFFS 75% saving 3 disk reads 3 index entries 93.75% saving 15 disk reads 15 index entries High compression rate High memory pressure Low disk performance Low compression rate Low memory pressure High disk performance QW RT IO XC PZ VN EA YU QW RT IO XC PZ VN EB YU

12 Sunghwan Ihm, Princeton University MULTI-RESOLUTION CHUNKING (MRC) Use multiple chunk sizes simultaneously Large chunks for low memory pressure and disk seeks Small chunks for high compression rate 93.75% saving 6 disk reads 6 index entries

13 Sunghwan Ihm, Princeton University GENERATING MRC CHUNKS Original Data HIT Detect smallest chunk boundaries first Larger chunks are generated by matching more bits of the detected boundaries Clean chunk alignment + less CPU

14 Sunghwan Ihm, Princeton University STORING MRC CHUNKS Store every chunk regardless of content overlaps No association among chunks One index entry load + one disk seek Reduce memory pressure and disk seeks Disk space is cheap, disk seeks are limited BC A BCA *Alternative storage options in the paper

15 Sunghwan Ihm, Princeton University PEERING Cheap or free local networks (ex: wireless mesh, WiLDNet) Multiple caches in the same region Extra memory and disk Use Highest Random Weight (HRW) Robust to node churn Scalable: no broadcast or digests exchange Trade CPU cycles for low memory footprint

16 Sunghwan Ihm, Princeton University INTELLIGENT LOAD SHEDDING (ILS) Two sources of fetching chunks Cache hits from disk Cache misses from network Fetching chunks from disks is not always desirable Disk heavily loaded (shared/slow) Network underutilized Solution: adjust network and disk usage dynamically to maximize throughput

17 Sunghwan Ihm, Princeton University SHEDDING SMALLEST CHUNK Move the smallest chunks from disk to network, until network becomes the bottleneck Disk  one seek regardless of chunk size Network  proportional to chunk size Total latency  max(disk, network) Disk Network Peer Time

18 Sunghwan Ihm, Princeton University SIMULATION ANALYSIS News Sites Web browsing of dynamically generated Web sites (1GB) Refresh the front pages of 9 popular Web sites every five minutes CNN, Google News, NYTimes, Slashdot, etc. Linux Kernel Large file downloads (276MB) Two different versions of Linux kernel source tar file Bandwidth savings, and # disk reads

19 Sunghwan Ihm, Princeton University BANDWITH SAVINGS SRC: high metadata overhead MRC: close to ideal bandwidth savings

20 Sunghwan Ihm, Princeton University DISK FETCH COST MRC greatly reduces # of disk seeks 22.7x at 32 bytes

21 Sunghwan Ihm, Princeton University CHUNK SIZE BREAKDOWN SRC: high metadata overhead MRC: much less # of disk seeks and index entries (40% handled by large chunks)

22 Sunghwan Ihm, Princeton University EVALUTION Implementation Single-process event-driven architecture LOC in C HashCache (NSDI’09) as chunk storage Intra-region bandwidths 100Mbps Disable in-memory cache for content Large working sets do not fit in memory Machines AMD Athlon 1GHz / 1GB RAM / SATA Emulab pc850 / P3-850 / 512MB RAM / ATA

23 Sunghwan Ihm, Princeton University MICROBENCHMARK MRC ILS PEER Better 90% similar 1 MB files 512kbps / 200ms RTT No tuning required

24 Sunghwan Ihm, Princeton University REALISTIC TRAFFIC: YOUTUBE Classroom scenario 100 clients download 18MB clip 1Mbps / 1000ms RTT Better High Quality (490) Low Quality (320)

25 Sunghwan Ihm, Princeton University IN THE PAPER Enterprise test MRC scales to high performance servers Other storage options MRC has the lowest memory pressure Saving disk space increases memory pressure More evaluations Overhead measurement Alexa traffic

26 Sunghwan Ihm, Princeton University CONCLUSIONS Wanax : scalable / flexible WAN accelerator for the developing regions MRC : high compression / high disk performance / low memory pressure ILS : adjust disk and network usage dynamically for better performance Peering : aggregation of disk space, parallel disk access, and efficient use of local bandwidth