Cache Storage for the Next Billion
Students: Anirudh Badam, Sunghwan Ihm
Research Scientist: KyoungSoo Park
Presenter: Vivek Pai
Collaborator: Larry Peterson
The Next Billion
  Developing regions are not all alike
  Many people have stable food, clean water, reasonable power
  Connectivity, however, is bad
  A growing middle class wants education & technology
  These people are the next billion
Bad Networking & Options
  Africa is often backhauled through Europe
  Satellite latency is painful
  Ghana: 2Mbps costs $6000/month!
  Emerging option: disk
  A 1TB disk now costs $200
  Even disk latency beats satellite latency
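A quick back-of-the-envelope check of why shipping disks wins: the sketch below computes how long filling a 1TB disk over the 2Mbps Ghana link would take, and what that link time would cost at $6000/month. It assumes decimal units (1TB = 10^12 bytes), a link running flat out with no protocol overhead, and 30-day months; the function name is illustrative.

```python
# Back-of-the-envelope: time and cost to move 1TB over the Ghana link.
# Assumptions: decimal units, 100% link utilization, 30-day billing months.
DISK_BYTES = 10**12          # 1TB
LINK_BPS = 2 * 10**6         # 2Mbps
MONTHLY_COST_USD = 6000      # link price from the slide
DISK_COST_USD = 200          # disk price from the slide

def transfer_days(n_bytes: int, bits_per_sec: int) -> float:
    """Days needed to push n_bytes through a link of bits_per_sec."""
    seconds = n_bytes * 8 / bits_per_sec
    return seconds / 86_400

days = transfer_days(DISK_BYTES, LINK_BPS)
link_cost = days / 30 * MONTHLY_COST_USD
print(f"{days:.1f} days, ~${link_cost:,.0f} of link time vs ${DISK_COST_USD} for the disk")
```

Under those assumptions the transfer takes about 46 days and over $9,000 of link time, against $200 for the disk.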
Enter the Tiny Laptops
  Problem: memory in the 256MB range
Making Storage Work
  Populate disk with content: preloaded HTTP cache, preloaded WAN accelerator cache, preloaded Web sites (Wikipedia, etc.)
  Ship disk to schools
  Update as needed: pull updates into caches on demand during peak hours; push updates off-peak, overnight
Deployment Scenarios
  Special servers per school, 2 for redundancy
  Average school size: /laptop, $10K/school
  Problems: two $5K servers double the per-school cost; servers don't ride the laptop commodity cost curve
  Solution: no servers, just laptops
Goal: 1TB Cache Store on a 256MB Laptop
  Why caching? It improves Web access and WAN access
  Problem: large disks are really slow, and disk storage requires an index
  In-memory indices optimize disk access
Memory Index Sizing
  Squid (popular HTTP cache): 72 bytes/object
  Web objects average 8KB each
  1TB = 125M objects
  125M objects × 72 bytes = 9GB RAM just for the index
  Commercial caches use RAM better: 32 bytes/object, so a 1TB disk needs 4GB RAM
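The sizing arithmetic on this slide, as a small function. Decimal units are assumed (1TB = 10^12 bytes, 8KB = 8,000 bytes), which is what makes 1TB come out to exactly 125M objects; the function name is illustrative.

```python
# Index RAM needed to cover a full disk, given average object size and
# per-object index overhead. Decimal units assumed throughout.
TB, KB, GB = 10**12, 10**3, 10**9

def index_ram_bytes(disk_bytes: int, avg_object_bytes: int, index_bytes_per_object: int) -> int:
    """Bytes of RAM needed to index every object on a full disk."""
    n_objects = disk_bytes // avg_object_bytes
    return n_objects * index_bytes_per_object

objects = TB // (8 * KB)
squid = index_ram_bytes(TB, 8 * KB, 72)        # Squid: 72 bytes/object
commercial = index_ram_bytes(TB, 8 * KB, 32)   # commercial caches: 32 bytes/object
print(objects, squid / GB, commercial / GB)    # 125000000 9.0 4.0
```

Either result is far beyond the 256MB of RAM on the target laptops.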
Revisiting Cache Indexing
  Seek reduction is important: most objects are small, access is largely random
  High insert rate: assume a 50% hit rate and a 50% cacheable rate, so insert rate = 50% misses × 50% cacheable = 25% of the request rate
  High delete rate: caches are largely full, so if the insert rate is 25%, the delete rate is also 25%
  Deletion using LRU, etc.
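The insert rate follows from two multiplications, and the claim that a full cache deletes one object per insert can be checked with a plain in-memory LRU. This is a minimal sketch for the arithmetic only: the LRU uses Python's `OrderedDict`, not an on-disk structure, and the class and function names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional

def insert_rate(hit_rate: float, cacheable_rate: float) -> float:
    """Fraction of requests that add a new object: the misses that are cacheable."""
    return (1.0 - hit_rate) * cacheable_rate

class LRUCache:
    """Fixed-capacity cache; once full, every insert evicts the least recently used object."""
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> object, least recently used first
        self.inserts = 0
        self.deletes = 0

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # now the most recently used
        return self.entries[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self.entries:
            self.entries[key] = value
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used object
            self.deletes += 1
        self.entries[key] = value
        self.inserts += 1

print(insert_rate(0.5, 0.5))           # 0.25

cache = LRUCache(capacity=100)
for i in range(1000):
    cache.put(f"http://example.com/{i}", b"...")
print(cache.inserts, cache.deletes)    # 1000 900
```

Once the cache holds 100 objects, each of the remaining 900 inserts forces exactly one eviction, which is why a full cache's delete rate tracks its insert rate.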
Restarting the Design
  Eliminate the in-memory index
  Treat disk like memory: optimize data structures for locality, use location-sensitive algorithms
  Measure performance
  Then consider what to add, measuring performance after each addition
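A minimal sketch of "treat disk like memory": the on-disk file itself is the hash table, so a lookup hashes the URL straight to a block offset and needs no in-memory index. The block size, file layout, and class name are assumptions for illustration; the slides don't specify HashCache's on-disk format.

```python
import hashlib
import os
import tempfile
from typing import Optional

BLOCK_SIZE = 8192                 # hypothetical: one fixed-size block per object
DIGEST_SIZE = 16                  # key hash stored in the block to confirm a hit
HEADER_SIZE = DIGEST_SIZE + 4     # digest + 4-byte payload length

class DiskHashTable:
    """Cache store with no in-memory index: hashing the key yields the disk offset.

    A newer object whose key hashes to an occupied block simply overwrites it,
    which is acceptable for a cache. Objects larger than one block are refused.
    """
    def __init__(self, path: str, n_blocks: int) -> None:
        self.n_blocks = n_blocks
        self.f = open(path, "w+b")
        self.f.truncate(n_blocks * BLOCK_SIZE)   # preallocate the table on disk

    def _locate(self, key: str) -> tuple[bytes, int]:
        digest = hashlib.blake2b(key.encode(), digest_size=DIGEST_SIZE).digest()
        block = int.from_bytes(digest[:8], "little") % self.n_blocks
        return digest, block * BLOCK_SIZE

    def put(self, key: str, value: bytes) -> bool:
        if len(value) > BLOCK_SIZE - HEADER_SIZE:
            return False
        digest, offset = self._locate(key)
        self.f.seek(offset)
        self.f.write(digest + len(value).to_bytes(4, "little") + value)
        return True

    def get(self, key: str) -> Optional[bytes]:
        digest, offset = self._locate(key)
        self.f.seek(offset)                      # one seek, no RAM lookup
        header = self.f.read(HEADER_SIZE)
        if header[:DIGEST_SIZE] != digest:       # empty block, or another key lives here
            return None
        length = int.from_bytes(header[DIGEST_SIZE:], "little")
        return self.f.read(length)

    def close(self) -> None:
        self.f.close()

path = os.path.join(tempfile.mkdtemp(), "hashcache.bin")
store = DiskHashTable(path, n_blocks=1024)
store.put("http://example.com/", b"<html>hello</html>")
print(store.get("http://example.com/"))   # b'<html>hello</html>'
print(store.get("http://example.org/"))   # None
store.close()
```

The price of zero RAM is that two URLs hashing to the same block evict each other; the later design steps decide what to add back to reduce that.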
What This Yields
  HashCache family: one basic storage engine with pluggable algorithms & indexing
  HashCache proxy: a Web proxy using the HashCache engine
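The slides don't show the engine's interface. As a hypothetical sketch (every class and method name here is invented), one engine can take different index plug-ins: `DirectIndex` keeps nothing in RAM and allows one slot per key, while `SetIndex` spends a tag byte and an LRU counter per slot so a key can live in any of several adjacent slots and lookups read only slots whose tag matches. It illustrates the general RAM-versus-seeks tradeoff behind "index bits per object"; it is not claimed to match HashCache's actual variants.

```python
import hashlib
from typing import Optional

def key_hash(key: str) -> int:
    """64-bit hash of a cache key such as a URL."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "little")

class Disk:
    """Simulated disk of object slots; counts reads as a stand-in for seeks."""
    def __init__(self, n_slots: int) -> None:
        self.slots: list[Optional[tuple[str, bytes]]] = [None] * n_slots
        self.reads = 0

    def read(self, slot: int) -> Optional[tuple[str, bytes]]:
        self.reads += 1
        return self.slots[slot]

    def write(self, slot: int, item: tuple[str, bytes]) -> None:
        self.slots[slot] = item

class DirectIndex:
    """Plug-in with zero RAM: each key has exactly one possible slot."""
    def __init__(self, n_slots: int) -> None:
        self.n_slots = n_slots

    def probe(self, key: str) -> list[int]:
        return [key_hash(key) % self.n_slots]

    def place(self, key: str) -> int:
        return key_hash(key) % self.n_slots

    def record(self, key: str, slot: int) -> None:
        pass  # nothing is kept in memory

class SetIndex:
    """Plug-in that lets a key live in any of `ways` adjacent slots.
    Keeps a 1-byte tag and an LRU clock per slot in RAM, so lookups read only
    slots whose tag matches and inserts replace the least recently used slot."""
    def __init__(self, n_slots: int, ways: int = 4) -> None:
        self.n_sets, self.ways = n_slots // ways, ways
        self.tags = bytearray(n_slots)      # 0 means the slot is empty
        self.last_used = [0] * n_slots
        self.clock = 0

    def _set(self, key: str) -> range:
        start = (key_hash(key) % self.n_sets) * self.ways
        return range(start, start + self.ways)

    def _tag(self, key: str) -> int:
        return (key_hash(key) >> 56) | 1    # top 8 bits, forced odd so never 0

    def probe(self, key: str) -> list[int]:
        tag = self._tag(key)
        return [s for s in self._set(key) if self.tags[s] == tag]

    def place(self, key: str) -> int:
        return min(self._set(key), key=lambda s: self.last_used[s])  # empty or LRU

    def record(self, key: str, slot: int) -> None:
        self.clock += 1
        self.tags[slot] = self._tag(key)
        self.last_used[slot] = self.clock

class Engine:
    """One storage engine; the index plug-in decides where a key may live."""
    def __init__(self, disk: Disk, index: "DirectIndex | SetIndex") -> None:
        self.disk, self.index = disk, index

    def get(self, key: str) -> Optional[bytes]:
        for slot in self.index.probe(key):
            item = self.disk.read(slot)
            if item is not None and item[0] == key:  # tags can collide; check the key
                self.index.record(key, slot)
                return item[1]
        return None

    def put(self, key: str, value: bytes) -> None:
        """Insert after a miss, overwriting whatever held the chosen slot."""
        slot = self.index.place(key)
        self.disk.write(slot, (key, value))
        self.index.record(key, slot)

for name, index in [("direct", DirectIndex(1024)), ("4-way set", SetIndex(1024, ways=4))]:
    engine = Engine(Disk(1024), index)
    urls = [f"http://example.com/{i}" for i in range(600)]
    for url in urls:
        engine.put(url, url.encode())
    hits = sum(engine.get(url) is not None for url in urls)
    print(f"{name}: {hits}/600 still cached, {engine.disk.reads} disk reads")
```

With the direct index, keys that collide on a slot evict each other; the set-associative plug-in should absorb most of those collisions, at the cost of a little RAM per slot.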
Performance Comparison
Index Bits Per Object
Index Bits Per Object
HashCache Memory
Storage Limits w/2GB Index
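The chart for this slide isn't in the text, but the relationship behind it is simple: a fixed index budget divided by the index cost per object gives the number of objects, and that times the average object size gives the largest disk that can be indexed. This sketch uses only the Squid and commercial figures from the index sizing slide (8KB average objects, decimal units); the per-scheme HashCache values would come from the missing chart, and the function name is illustrative.

```python
GB, TB = 10**9, 10**12
AVG_OBJECT_BYTES = 8_000   # 8KB average Web object, decimal units

def max_disk_bytes(index_ram_bytes: float, index_bits_per_object: float,
                   avg_object_bytes: float = AVG_OBJECT_BYTES) -> float:
    """Largest disk whose objects all fit in the given index RAM budget."""
    n_objects = index_ram_bytes * 8 / index_bits_per_object
    return n_objects * avg_object_bytes

for name, bytes_per_object in [("Squid", 72), ("commercial", 32)]:
    tb = max_disk_bytes(2 * GB, bytes_per_object * 8) / TB
    print(f"{name}: {tb:.3f} TB with a 2GB index")
```

This gives about 0.22TB for Squid and 0.5TB for the commercial figure, both short of the 1TB goal even with 2GB of index RAM.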
Beyond Diminishing Returns
  HTTP cacheability has an upper limit
  Beyond that, revalidating cached items helps, on demand or in the background
  Uncacheable content is still cacheable by wide-area accelerators, which must still contact servers, though
Why WAN Acceleration?
  Lots of slowly changing data: Wikipedia, news sites, customized sites
  WAN acceleration middleboxes: custom protocol between boxes, standard protocols to the rest of the net
  Less desirable than caches for the Web
WAN Acceleration Dilemma
  WAN accelerators use chunks: the transit stream is broken into chunks
  Small chunks = high compression, but also lots of small objects
  Large chunks = high performance, but worse compression
  Memory & disk both important
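The slides don't say how the transit stream is split. A common technique is content-defined chunking, where a rolling hash over recent bytes picks the boundaries so an edit only disturbs nearby chunks. Below is a minimal sketch using a gear-style rolling hash; the random table, size parameters, and function names are assumptions, and SHA-1 digests stand in for whatever fingerprints a real accelerator uses. It compares roughly 256B and 8KB target chunk sizes on a page that receives a small edit.

```python
import hashlib
import random

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]   # hypothetical random per-byte table
MASK64 = (1 << 64) - 1

def chunk(data: bytes, avg_bits: int, min_size: int = 64, max_size: int = 65536) -> list[bytes]:
    """Content-defined chunking with a gear rolling hash.

    A boundary is cut when the top avg_bits bits of the hash are zero, giving chunks
    of roughly 2**avg_bits bytes beyond min_size. The hash depends only on the last
    64 bytes, so an edit shifts boundaries only near the edit.
    """
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h >> (64 - avg_bits)) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def resend_cost(old: bytes, new: bytes, avg_bits: int) -> tuple[int, int]:
    """(bytes of `new` in chunks the receiver has not seen, number of chunks in `new`)."""
    seen = {hashlib.sha1(c).digest() for c in chunk(old, avg_bits)}
    new_chunks = chunk(new, avg_bits)
    missing = sum(len(c) for c in new_chunks if hashlib.sha1(c).digest() not in seen)
    return missing, len(new_chunks)

page = random.Random(7).randbytes(100_000)              # stand-in for a cached page
edited = page[:50_000] + b"small edit" + page[50_000:]
for bits in (8, 13):                                     # ~256B vs ~8KB target chunks
    missing, n = resend_cost(page, edited, bits)
    print(f"~{2 ** bits}B chunks: {n} chunks to index, {missing} bytes resent")
```

Expect the small setting to produce many more chunks to index but resend far fewer bytes than the large one, which is the dilemma on the slide.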
Merging WAN Acceleration & HashCache
  Easily index a huge number of chunks
  Small chunks OK, large chunks better
  Store chunks redundantly; optimize for performance & compression
  Communicate tradeoffs to the cache layer
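One way to read "store chunks redundantly": keep chunks at two sizes, with large-chunk boundaries chosen as a subset of small-chunk boundaries, and let the encoder send one reference per large chunk when it can while falling back to small chunks near changes. This is a hypothetical sketch of that idea (sometimes called multi-resolution chunking); the slides don't give the actual design, the gear table, sizes, and function names are assumptions, and a Python set of SHA-1 digests stands in for a HashCache-backed chunk index.

```python
import hashlib
import random

_rng = random.Random(1)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # hypothetical random per-byte table
MASK64 = (1 << 64) - 1

def small_chunks(data: bytes, avg_bits: int = 8, min_size: int = 64) -> list[bytes]:
    """Content-defined chunks of roughly min_size + 2**avg_bits bytes (gear rolling hash)."""
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        if i - start + 1 >= min_size and (h >> (64 - avg_bits)) == 0:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

def digest(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()

def large_chunks(smalls: list[bytes], group_bits: int = 5) -> list[list[bytes]]:
    """Group small chunks into large ones. A large boundary follows any small chunk whose
    digest's last byte has group_bits low zero bits, so large boundaries are a subset of
    small ones."""
    groups, current = [], []
    for c in smalls:
        current.append(c)
        if (digest(c)[-1] & ((1 << group_bits) - 1)) == 0:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def encode(data: bytes, store: set[bytes]) -> dict[str, int]:
    """Encode a stream against a store holding both chunk sizes; returns what was sent."""
    sent = {"large_refs": 0, "small_refs": 0, "raw_bytes": 0}
    for group in large_chunks(small_chunks(data)):
        big = digest(b"".join(group))
        if big in store:
            sent["large_refs"] += 1            # one reference covers the whole region
            continue
        for c in group:                        # fall back to small chunks near changes
            d = digest(c)
            if d in store:
                sent["small_refs"] += 1
            else:
                sent["raw_bytes"] += len(c)
            store.add(d)                       # redundant storage: the small chunk...
        store.add(big)                         # ...and the large chunk containing it
    return sent

page = random.Random(2).randbytes(100_000)     # stand-in for a cached Web page
edited = page[:50_000] + b"small edit" + page[50_000:]
store: set[bytes] = set()
print("first visit: ", encode(page, store))
print("edited visit:", encode(edited, store))
```

On the edited visit, most of the page should come back as a handful of large-chunk references, with small-chunk references and raw bytes confined to the region around the edit.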
Deployments
  Two cache instances deployed, both in Africa
  Shared machines, multiple services
  Working with OLPC on deployment
  Working on licensing, hopefully resolved this year
  Goal: all-in-one server for schools
Longer-Term Goals
  Effort started around server consolidation
  Virtualization is nice, except for memory
  Many apps are very page-fault sensitive
  Extracting & sharing components is desirable
  More work in developing regions
  Even within the US: poor, rural, etc.
  Customization for school-like workloads
  More work on peak/off-peak behavior