Squirrel: A peer-to-peer web cache. Sitaram Iyer. Joint work with Ant Rowstron (MSRC) and Peter Druschel.
Peer-to-peer Computing: Decentralize a distributed protocol so that it is scalable, self-organizing, fault tolerant, and load balanced. None of these properties comes automatically!
Web Caching: Web caches reduce 1. latency, 2. external bandwidth, and 3. server load. They are deployed at ISPs, corporate network boundaries, etc. Cooperative web caching: a group of web caches tied together and acting as one web cache.
Web Cache [Figure: browsers, each with its own browser cache, share a centralized web cache on the LAN, which fetches from the web server over the Internet. Sharing!]
Decentralized Web Cache [Figure: browser caches on the LAN and a web server across the Internet, with no centralized web cache.] Why? How?
Why peer-to-peer? 1. Cost of a dedicated web cache: no additional hardware is needed. 2. Administrative costs: the system is self-organizing. 3. Scaling needs upgrading: resources grow with the number of clients. 4. Single point of failure: fault-tolerant by design.
Setting: a corporate LAN of ,000 desktop machines at a single physical location. Each node runs an instance of Squirrel and sets it as the browser's proxy.
Pastry: a peer-to-peer object location and routing substrate. Distributed hash table: reliably maps an object key to a live node. Routes in $\log_{2^b} N$ steps (e.g., 3-4 steps for 100,000 nodes, with $2^b = 16$).
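The $\log_{2^b} N$ bound comes from prefix routing: each hop forwards to a node whose id shares at least one more base-$2^b$ digit with the key. The toy below is a sketch of that idea only, not Pastry itself. Assumptions: 32-bit ids written as 8 hex digits ($b = 4$), routing tables filled from global knowledge, and a final hop to the numerically closest node standing in for Pastry's leaf set. Proximity-aware tables and failures are not modelled, and all names are hypothetical.

```python
import random

# Toy Pastry-style prefix routing (illustrative sketch, not Pastry itself).
# Assumptions: 32-bit ids shown as 8 hex digits (b = 4, so 2**b = 16);
# routing tables built from global knowledge; no leaf sets or failures.
B = 4
DIGITS = 8
ID_SPACE = (2 ** B) ** DIGITS


def digits_of(x):
    """Fixed-width hex string: one character per base-16 digit (b = 4 only)."""
    return format(x, f"0{DIGITS}x")


def shared_prefix_len(a, b):
    """Number of leading digits that ids a and b have in common."""
    da, db = digits_of(a), digits_of(b)
    n = 0
    while n < DIGITS and da[n] == db[n]:
        n += 1
    return n


def ring_distance(a, b):
    """Distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def build_tables(nodes, rng):
    """table[n][(row, digit)] = one node sharing exactly `row` digits with n
    whose digit at position `row` is `digit` (picked at random)."""
    tables = {}
    for n in nodes:
        slots = {}
        for m in nodes:
            if m != n:
                row = shared_prefix_len(n, m)
                slots.setdefault((row, digits_of(m)[row]), []).append(m)
        tables[n] = {slot: rng.choice(cands) for slot, cands in slots.items()}
    return tables


def route(target, start, nodes, tables):
    """Return the list of nodes visited while routing toward `target`.

    Each table hop extends the prefix shared with `target` by at least one
    digit. When the needed table slot is empty, take one final hop to the
    numerically closest node (the role Pastry's leaf set plays)."""
    dest = min(nodes, key=lambda n: (ring_distance(n, target), n))
    path = [start]
    cur = start
    while cur != dest:
        row = shared_prefix_len(cur, target)
        nxt = tables[cur].get((row, digits_of(target)[row])) if row < DIGITS else None
        cur = nxt if nxt is not None else dest
        path.append(cur)
    return path
```

For example, with `rng = random.Random(1)`, `nodes = rng.sample(range(ID_SPACE), 300)` and `tables = build_tables(nodes, rng)`, calling `route(rng.randrange(ID_SPACE), nodes[0], nodes, tables)` returns the hop sequence; averaged over many keys, the hop count should grow roughly like $\log_{16} N$.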
Home-store model [Figure: client, home node, LAN, and Internet; the client locates the object's home node by hashing the URL.]
Home-store model [Figure: client and home node.] … that's how it works!
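A minimal sketch of the home-store request path, under stated assumptions: URLs and node names are hashed with SHA-1 into one circular id space, and the home is the node with the numerically closest id. Scanning a node list here is a hypothetical stand-in for Pastry routing, and expiry and validation are ignored. The class and method names are illustrative only.

```python
import hashlib

ID_SPACE = 2 ** 160   # assumption: the full SHA-1 digest serves as the id


def sha1_id(text):
    """Map a string (a URL or a node name) to an integer id."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def ring_distance(a, b):
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local = {}    # the node's own client-side cache
        self.home = {}     # objects kept here because this node is their home
        self.served = 0    # requests this node answered from its home store


class HomeStoreSquirrel:
    def __init__(self, nodes, fetch_origin):
        self.nodes = nodes
        self.fetch_origin = fetch_origin   # callable: url -> object

    def home_of(self, url):
        obj_id = sha1_id(url)
        # Stand-in for Pastry routing: the numerically closest node id wins.
        return min(self.nodes,
                   key=lambda n: (ring_distance(n.node_id, obj_id), n.node_id))

    def get(self, client, url):
        if url in client.local:             # hit in the client's own cache
            return client.local[url]
        home = self.home_of(url)
        if url in home.home:                # hit at the home node
            home.served += 1
            obj = home.home[url]
        else:                               # miss: the home fetches from origin
            obj = self.fetch_origin(url)
            home.home[url] = obj            # ...and stores it ("home-store")
        client.local[url] = obj
        return obj
```

Because the home is a deterministic function of the URL, every client asking for the same URL reaches the same home, so the origin is contacted only once per object in this sketch.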
Directory model: Client nodes always store objects in their local caches. The main difference between the two schemes is whether the home node also stores the object. In the directory model, the home node only stores pointers to recent clients and forwards requests to them.
Directory model [Figure: client, home node, LAN, and Internet.]
Directory model [Figure: the client's request reaches the home node, which forwards it to a delegate chosen as a random entry from its directory.]
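The directory model can be sketched in the same spirit. Assumptions: the home keeps a bounded list of pointers to clients that recently fetched the URL (the bound `MAX_POINTERS = 4` is hypothetical), forwards a request to a delegate picked at random from that list, and stores no object itself. Locating the home, freshness checks, cache eviction, and delegate failure are all omitted, and the names are illustrative.

```python
import random

MAX_POINTERS = 4   # hypothetical bound on pointers kept per object


class DirectorySquirrel:
    def __init__(self, client_ids, fetch_origin, seed=0):
        self.local = {c: {} for c in client_ids}   # each client's local cache
        self.directory = {}                         # url -> recent holders
        self.served = {c: 0 for c in client_ids}    # objects served as delegate
        self.fetch_origin = fetch_origin
        self.rng = random.Random(seed)

    def get(self, client, url):
        cache = self.local[client]
        if url in cache:                            # local hit
            return cache[url]
        # The directory for `url` lives at its home node; finding the home
        # (hash + Pastry routing) is omitted in this sketch.
        holders = self.directory.setdefault(url, [])
        if holders:
            delegate = self.rng.choice(holders)     # random directory entry
            obj = self.local[delegate][url]         # delegate serves its copy
            self.served[delegate] += 1
        else:
            obj = self.fetch_origin(url)            # no directory: go to origin
        cache[url] = obj                            # clients always keep a copy
        holders.append(client)                      # home now points here too
        if len(holders) > MAX_POINTERS:
            holders.pop(0)                          # forget the oldest pointer
        return obj
```

Since local caches never evict here, every pointer stays valid; a real implementation would have to cope with delegates that have dropped or never kept the object.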
Recap: The two models are the endpoints of the design space, distinguished by where objects are stored. At first sight, both seem to perform about equally well (e.g., in hit ratio and latency).
Quirk: Consider a web page with many images, or a heavily browsing node. In the directory scheme, many home nodes end up pointing to one delegate. Home-store: natural load balancing. Next: evaluation on trace-based workloads.
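A toy experiment can make the quirk concrete. Assumed scenario (hypothetical numbers): one busy node loads a page with 50 images, then 20 other clients load the same page at about the same time, so no directory is updated during the burst. Homes are chosen by SHA-1 of the URL modulo the number of nodes, a simple stand-in for Pastry's closest-id rule, and caches never expire.

```python
import hashlib
import random
from collections import Counter


def home_of(url, n_nodes):
    """Stand-in home mapping: SHA-1 of the URL modulo the node count."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest(), "big") % n_nodes


def burst_load(n_nodes=100, n_images=50, n_readers=20, seed=0):
    """Objects served per node while readers fetch a page the busy node saw first."""
    rng = random.Random(seed)
    urls = [f"http://intranet.example/page/img{i}.gif" for i in range(n_images)]
    busy = 0

    # Phase 1: the busy node fetches every image from the origin.
    # Home-store: each image is now stored at its own home node.
    # Directory: each image's home holds a single pointer, to `busy`.
    pointers = {u: [busy] for u in urls}

    # Phase 2: the readers' burst; directories are not updated mid-burst.
    home_store_load, directory_load = Counter(), Counter()
    for _ in range(n_readers):
        for u in urls:
            home_store_load[home_of(u, n_nodes)] += 1
            directory_load[rng.choice(pointers[u])] += 1   # random entry
    return home_store_load, directory_load
```

With the defaults, `burst_load()` sends all 1000 directory-model requests to node 0, while home-store spreads the same 1000 requests over the homes of the 50 images. The real system adds pointers as readers finish, which softens but does not remove this skew.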
Trace characteristics
                                 Redmond          Cambridge
Total duration                   1 day            31 days
Number of clients                36,
Number of HTTP requests          16.41 million    0.971 million
Peak request rate                606 req/sec      186 req/sec
Number of objects                5.13 million     0.469 million
Number of cacheable objects      2.56 million     0.226 million
Mean cacheable object reuse      5.4 times        3.22 times
Total external bandwidth (Redmond) [Figure: total external bandwidth in GB (lower is better) vs. per-node cache size in MB, for no web cache, a centralized cache, home-store, and directory.]
Total external bandwidth (Cambridge) [Figure: total external bandwidth in GB (lower is better) vs. per-node cache size in MB, for no web cache, a centralized cache, home-store, and directory.]
LAN Hops (Redmond) [Figure]
LAN Hops (Cambridge) [Figure: fraction of cacheable requests (0% to 100%) vs. total hops within the LAN, for centralized, home-store, and directory.]
Load in requests per second (Redmond) [Figure: number of such seconds vs. maximum objects served per node per second, for home-store and directory.]
Load in requests per second (Cambridge) [Figure: number of such seconds vs. maximum objects served per node per second, for home-store and directory.]
Load in requests per minute (Redmond) [Figure: number of such minutes vs. maximum objects served per node per minute, for home-store and directory.]
Load in requests per minute (Cambridge) [Figure: number of such minutes vs. maximum objects served per node per minute, for home-store and directory.]
Conclusion: It is possible to decentralize web caching. Performance is comparable to a centralized cache, and the decentralized approach is better in terms of cost, administration, scalability, and fault tolerance.
(backup) Storage utilization, Redmond (home-store vs. directory): Total: 61652 MB. Mean per node: 2.6 MB (home-store), 1.6 MB (directory). Max per node: 1664 MB.
(backup) Fault tolerance
               Home-store                        Directory
Equations      Mean $H/O$, Max $H_{\max}/O$      Mean $(H+S)/O$, Max $\max(H_{\max}, S_{\max})/O$
Redmond        Mean %, Max %                     Mean 0.198%, Max 1.5%
Cambridge      Mean 0.95%, Max 3.34%             Mean 1.68%, Max 12.4%
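The fault-tolerance formulas are simple arithmetic over per-node counts. The slide does not define its symbols, so the sketch below assumes a reading: $H$ is the number of objects whose home is a given node, $S$ the number a node holds as a delegate, and $O$ the total number of cached objects; the mean is taken over nodes. Function names and the example counts are hypothetical.

```python
from statistics import mean

# Assumed reading of the slide's notation (not defined on the slide):
#   homed[i]  = H for node i: objects whose home is node i
#   stored[i] = S for node i: objects node i holds as a delegate
#   total     = O: total number of cached objects


def home_store_loss(homed, total):
    """Fraction of objects lost when one node fails: (mean H/O, H_max/O)."""
    return mean(homed) / total, max(homed) / total


def directory_loss(homed, stored, total):
    """Directory model: (mean (H+S)/O, max(H_max, S_max)/O)."""
    return ((mean(homed) + mean(stored)) / total,
            max(max(homed), max(stored)) / total)
```

For three hypothetical nodes with `homed = [2, 4, 6]`, `stored = [1, 2, 3]` and 20 objects in total, home-store gives a mean loss of 0.2 and a max of 0.3, while directory gives 0.3 for both.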
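(backup) Full home-store protocol [Figure: the client sends its request to the home node over the LAN (1); the home either answers directly or forwards the request to the origin server over the WAN (2, 3). a: object or not-modified from the home. b: request forwarded to the origin, then object or not-modified from the origin.]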
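Cases a and b can be sketched as a home node that answers from a fresh copy or revalidates with the origin using a conditional request. Assumptions: integer versions stand in for HTTP validators such as Last-Modified or ETag, a fixed `TTL` (hypothetical, 60 time units) decides freshness, and the `Origin` class is a hypothetical in-memory stand-in for a web server.

```python
from dataclasses import dataclass

TTL = 60.0   # hypothetical freshness lifetime for cached copies


@dataclass
class Entry:
    version: int        # stand-in for a validator (Last-Modified / ETag)
    body: bytes
    fresh_until: float


class Origin:
    def __init__(self, docs):
        self.docs = docs        # url -> (version, body)
        self.requests = 0

    def get(self, url, have_version=None):
        """Conditional GET: reply not-modified if the caller's version is current."""
        self.requests += 1
        version, body = self.docs[url]
        if have_version == version:
            return ("not-modified", version, None)
        return ("object", version, body)


class HomeNode:
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}         # url -> Entry

    def handle(self, url, client_version, now):
        entry = self.cache.get(url)
        if entry is None or now >= entry.fresh_until:
            # Case b: missing or stale at the home, so ask the origin (WAN),
            # revalidating the home's own copy if it has one.
            status, version, body = self.origin.get(
                url, entry.version if entry else None)
            if status == "not-modified":
                entry.fresh_until = now + TTL
            else:
                entry = Entry(version, body, now + TTL)
                self.cache[url] = entry
        # Case a (or after b): answer the client from the home's fresh copy.
        if client_version == entry.version:
            return ("not-modified", None)
        return ("object", entry.body)
```

A client that already holds the current version gets a short not-modified reply over the LAN, and the origin sees traffic only when the home's copy is missing or stale.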
(backup) Full directory protocol [Figure: message diagram among the client, the home node, a delegate, and the origin server, with numbered steps 1-4. Labels: a: no directory, go to the origin (also d); a, d: request; b: not-modified; c, e: request; c, e: object; e: cGET request to the origin; d: not-modified; object or delegate.]