Hot Systems, Volkmar Uhlig
On the scale and performance of cooperative Web proxy caching. Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, and Henry M. Levy, University of Washington (SOSP ’99, Kiawah Island, SC)
Outline: Concepts of cooperative web caches; Cache simulation; Request analysis; UW + Microsoft; Conclusion
Web Proxy Caches [Diagram: proxy cache in front of the Internet; labels: Miss, Hit]
Reasons for Caching: Reduce download time; Improve responsiveness; Reduce Internet bandwidth usage; Save money
Idea: Cooperative Caches. Overall hit rate?
Hierarchical Caching
Neighborhood Caches
Hash-based Caching
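The slide gives only the title, so the following is a minimal sketch of one common reading of hash-based caching: every URL is hashed, and the hash picks the single proxy responsible for that URL. The function name `proxy_for` and the proxy names are hypothetical and do not come from the talk.

```python
import hashlib


def proxy_for(url: str, proxies: list[str]) -> str:
    """Pick the proxy responsible for a URL (illustrative assumption).

    A stable hash is used so that every client maps the same URL to the
    same proxy, which avoids storing one object in several caches.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]


if __name__ == "__main__":
    proxies = ["proxy-a", "proxy-b", "proxy-c"]  # hypothetical names
    print(proxy_for("http://i30www.ira.uka.de/", proxies))
```

A plain modulo mapping like this reshuffles most URLs when a proxy is added or removed; schemes such as consistent hashing exist to limit that, but they are outside what the slide states.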
Related Work – Proxies
V. Almeida, A. Bestavros, M. Crovella, and A. de-Oliveira. Characterizing reference locality in the WWW. Technical Report, Boston University, June.
L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proc. of IEEE INFOCOM ’99, pages 126–134, March.
R. Caceres, F. Douglis, A. Feldmann, G. Glass, and M. Rabinovich. Web proxy caching: The devil is in the details. In Workshop on Internet Server Performance, pages 111–118, June.
P. Cao. Characterization of Web proxy traffic and Wisconsin proxy benchmark, Nov.
M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: Evidence and possible causes. In Proc. of the ACM SIGMETRICS ’96 Conf., pages 160–169, May.
F. Douglis, A. Feldmann, B. Krishnamurthy, and J. Mogul. Rate of change and other metrics: a live study of the World Wide Web. In Proc. of the 1st USENIX Symp. on Internet Technologies and Systems, pages 147–158, Dec.
B. Duska, D. Marwood, and M. J. Feeley. The measured access characteristics of World Wide Web client proxy caches. In Proc. of the 1st USENIX Symp. on Internet Technologies and Systems, pages 23–36, Dec.
A. Feldmann, R. Caceres, F. Douglis, G. Glass, and M. Rabinovich. Performance of web proxy caching in heterogeneous bandwidth environments. In Proc. of IEEE INFOCOM ’99, March.
S. D. Gribble and E. A. Brewer. System design issues for Internet middleware services: Deductions from a large client trace. In Proc. of the 1st USENIX Symp. on Internet Technologies and Systems, pages 207–218, Dec.
T. M. Kroeger, D. D. E. Long, and J. C. Mogul. Exploring the bounds of Web latency reduction from caching and prefetching. In Proc. of the 1st USENIX Symp. on Internet Technologies and Systems, pages 13–22, Dec.
M. Rabinovich, J. Chase, and S. Gadde. Not all hits are created equal: Cooperative proxy caching over a wide area network. In Proc. of the 3rd Int. WWW Caching Workshop, June 1998.
Related Work – Locality
V. Almeida, A. Bestavros, M. Crovella, and A. de-Oliveira. Characterizing reference locality in the WWW. Technical Report, Boston University, June.
L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In Proc. of IEEE INFOCOM ’99, pages 126–134, March.
P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In Proc. of the 1st USENIX Symp. on Internet Technologies and Systems, pages 193–206, Dec.
C. R. Cunha, A. Bestavros, and M. E. Crovella. Characteristics of WWW client-based traces. Technical Report BU-CS, Boston University, July.
S. Glassman. A caching relay for the World Wide Web. In Proc. First Int. World Wide Web Conf., pages 60–76, May.
T. M. Kroeger, J. C. Mogul, and C. Maltzahn. Digital’s Web proxy traces. ftp://ftp.digital.com/pub/DEC/traces/proxy/webtraces.html, August 1996.
Scope of the paper: What is the best performance one could achieve with “perfect” caching? For what range of client populations can cooperative caching work effectively? Does the way in which clients are assigned to caches matter? What cache hit rates are necessary to achieve worthwhile decreases in document access latency?
Cache Simulations – How? Collect traces (e.g. with a packet sniffer); Model cache behavior; Play traces against the cache model; Analyze the results
Cache Traces
TCP_MISS/ GET - DIRECT/i30www.ira.uka.de text/html
TCP_MISS/ GET - DIRECT/i30www.ira.uka.de text/html
TCP_MISS/ GET - DIRECT/i30www.ira.uka.de text/html
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de text/css
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de text/css
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de image/jpeg
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de image/jpeg
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de image/gif
TCP_REFRESH_HIT/ GET - DIRECT/i30www.ira.uka.de image/jpeg
TCP_CLIENT_REFRESH_MISS/ GET - DIRECT/ image/gif
TCP_CLIENT_REFRESH_MISS/ GET - DIRECT/ image/gif
TCP_CLIENT_REFRESH_MISS/ GET - DIRECT/aftenposten.no image/gif
TCP_MEM_HIT/ GET - NONE/- image/gif
Annotated fields: sec | TCP_MISS | 1465 | GET | DIRECT/i30www.ira.uka.de | text/html
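The result codes on this slide (TCP_MISS, TCP_REFRESH_HIT, TCP_MEM_HIT) match those of Squid's native access log, so a trace in that layout could be read roughly as sketched below. The sample line is an assumption: only the result code, the size 1465, the method, the hierarchy field and the content type come from the slide; the timestamp, elapsed time, client address, status code and URL are invented for illustration. Counting every `*_HIT` code as a hit is also a simplification.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: float    # seconds since the epoch
    elapsed_ms: int     # time the proxy spent on the request
    client: str
    result: str         # e.g. TCP_MISS, TCP_REFRESH_HIT
    status: int         # HTTP status code
    size: int           # bytes delivered to the client
    method: str
    url: str
    hierarchy: str      # e.g. DIRECT/i30www.ira.uka.de
    content_type: str


def parse_line(line: str) -> LogEntry:
    """Parse one line in Squid's native access.log layout."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    result, status = f[3].split("/", 1)
    # f[7] is the ident field, usually "-", and is ignored here.
    return LogEntry(float(f[0]), int(f[1]), f[2], result, int(status),
                    int(f[4]), f[5], f[6], f[8], f[9])


def is_hit(entry: LogEntry) -> bool:
    """Simplification: treat every result code containing HIT as a hit."""
    return "HIT" in entry.result


# Hypothetical sample line; see the note above on which fields are invented.
SAMPLE = ("1066036250.603 24 192.168.0.1 TCP_MISS/200 1465 GET "
          "http://i30www.ira.uka.de/ - DIRECT/i30www.ira.uka.de text/html")

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry.result, entry.size, is_hit(entry))
```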
Simulation Methodology: Infinite-sized caches; No expiration for objects; No compulsory misses (cold start); Ideal vs. practical cache (cacheability)
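The methodology on this slide can be illustrated with a small trace-driven simulator. This is a sketch under stated assumptions, not the authors' simulator: the `Request` record, the `cacheable` flag and the `warmup` parameter are hypothetical. In particular, skipping a warm-up prefix is only one possible reading of "no compulsory misses (cold start)". The cache is infinite and never expires objects; the ideal cache stores everything, the practical cache stores only requests marked cacheable.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    size: int               # bytes
    cacheable: bool = True  # assumed per-request cacheability flag


def simulate(trace, warmup=0, ideal=False):
    """Replay a trace against an infinite cache without expiration.

    Returns (object_hit_rate, byte_hit_rate). The first `warmup` requests
    fill the cache but are not counted in the statistics.
    """
    cached = set()
    hits = hit_bytes = total = total_bytes = 0
    for i, req in enumerate(trace):
        was_cached = req.url in cached
        if i >= warmup:
            total += 1
            total_bytes += req.size
            if was_cached:
                hits += 1
                hit_bytes += req.size
        # The ideal cache ignores cacheability; the practical one honors it.
        if ideal or req.cacheable:
            cached.add(req.url)
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes


if __name__ == "__main__":
    trace = [
        Request("a", 100),
        Request("b", 200, cacheable=False),
        Request("a", 100),
        Request("b", 200, cacheable=False),
        Request("a", 100),
    ]
    print("practical:", simulate(trace))
    print("ideal:    ", simulate(trace, ideal=True))
```

On this toy trace the practical cache hits on 2 of 5 requests and the ideal cache on 3 of 5, which shows how cacheability alone can cap the achievable hit rate.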
Simulation of Cooperative Caching. Optimistic simulation model: Working set of all combined caches; No inter-proxy communication latency; One HUGE cache server
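The optimistic model can be sketched by comparing isolated proxies against one shared cache that holds the combined working set. This is an illustration under assumptions: the trace is a list of hypothetical `(client, url)` pairs, `proxy_of` is an assumed client-to-proxy assignment, every object is treated as cacheable, and inter-proxy latency is ignored as on the slide.

```python
from collections import defaultdict


def count_hits(urls):
    """Count hits in an infinite cache: every repeated URL is a hit."""
    seen = set()
    hits = 0
    for url in urls:
        if url in seen:
            hits += 1
        seen.add(url)
    return hits


def isolated_vs_cooperative(trace, proxy_of):
    """Return (isolated_hit_rate, cooperative_hit_rate) for a trace.

    Isolated: each proxy only sees the requests of its own clients.
    Cooperative: all requests go to a single shared cache.
    """
    if not trace:
        return 0.0, 0.0
    per_proxy = defaultdict(list)
    for client, url in trace:
        per_proxy[proxy_of[client]].append(url)
    isolated = sum(count_hits(urls) for urls in per_proxy.values())
    cooperative = count_hits([url for _, url in trace])
    return isolated / len(trace), cooperative / len(trace)


if __name__ == "__main__":
    trace = [("c1", "x"), ("c2", "x"), ("c1", "x"), ("c2", "y"), ("c1", "y")]
    proxy_of = {"c1": "A", "c2": "B"}
    print(isolated_vs_cooperative(trace, proxy_of))  # (0.2, 0.6)
```

Because the shared cache pays no communication cost, its hit rate is an upper bound on what any real cooperation scheme among the same proxies could reach.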
Collect Traces: Microsoft; University of Washington; Both traces cover the same period of time
University of Washington: 82.8 million HTTP requests; 18.4 million HTTP objects; 677 GB total requested bytes; 137 requests/second; 22,984 clients; 244,211 servers; 7 days
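The figures on this slide can be cross-checked with a little arithmetic. The sketch below recomputes the request rate from the request count and trace length, and derives a simple bound that the slide does not state explicitly: since every distinct object must miss at least once, the object hit rate over the whole trace cannot exceed 1 minus objects/requests. Variable names are illustrative.

```python
# Figures taken from the University of Washington trace slide.
requests = 82.8e6          # HTTP requests
objects = 18.4e6           # distinct HTTP objects
trace_seconds = 7 * 24 * 3600

# Average request rate over the 7-day trace.
rate = requests / trace_seconds
print(f"{rate:.1f} requests/second")   # 136.9, i.e. about 137 as on the slide

# Each distinct object costs at least one miss, so this is an upper bound
# on the hit rate of any cache fed with the whole trace from a cold start.
max_hit_rate = 1 - objects / requests
print(f"{max_hit_rate:.1%} upper bound on object hit rate")  # 77.8%
```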
Microsoft Corporation: million HTTP requests; 15.3 million HTTP objects; total requested bytes not available; 199 requests/second; 60,233 clients; 306,586 servers; 6 days 6 hours
Experiment Analysis: Hit rate (object, byte); Request latency; Bandwidth; Locality
Request Hit-Rate / # Clients: Growing a cache's population beyond about 2,500 clients does not increase the hit rate significantly!
Byte Hit-Rate / # Clients (UW)
Object Request Latency: More clients do not reduce object latency significantly.
Bandwidth / # Clients There is no relation between number of clients and bandwidth utilization!
Locality: Proxies and Organizations. University of Washington organizations (Museum of Art and Natural History; Music Department; Schools of Nursing and Dentistry; Scandinavian Languages; Computer Science) are comparable to cooperating businesses
Local and Global Proxy Hit rates
Randomly populated vs. UW organizations: Locality is minimal (about 4%)
Impact of larger populations
Large-scale Experiment: Microsoft (60K clients); University of Washington (23K clients)
Cooperative Caching Microsoft + UW
Further Aspects: Analytic model of Web accesses; Popularity; Expiration of documents; Rate of change
Summary and Conclusions: Cooperative caching is effective for small populations (< 2500 clients); Such a population can be handled by a single server; Locality is not significant; Benefits are limited by cacheability; Further research should focus on improving cacheability!