Computer Science Department
Studying the Impact of More Complete Server Information on Web Caching
Craig E. Wills and Mikhail Mikhailov
Worcester Polytechnic Institute
Presented by Mikhail Mikhailov
May 23, 2000
Outline of Talk
  Observations
  Proposed approach
  Experiments
    –Methodology
    –Test sets
    –Results
  Conclusions
  Future work
Observations
  Heterogeneous dynamic content
  Monolithic pages, loss of information
  Changes are predictable, can be localized
  Heuristic approaches to caching (many validations)
Proposed Approach
  Object classification by type and change characteristics
  Preserve object identities
  Object Composition (vs. monolithic approach)
  Object Relationships
  Piggybacking
Exp1: Methodology (content reuse)
  Popular sites (100hot.com) and popular URLs (NLANR proxy logs)
  Unconditionally GET HTML and embedded images each day at the same time for 11 days
  Catalogue resources, compute MD5
  Analyze changes with Chunking Tool
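The "compute MD5" step of the methodology above can be illustrated with a minimal Python sketch. It assumes the daily response bodies for one resource are already in memory as bytes; the names `md5_digest` and `count_changes` are hypothetical, and the Chunking Tool itself is not reproduced here.

```python
import hashlib


def md5_digest(body: bytes) -> str:
    """Return the hex MD5 digest of one retrieved response body."""
    return hashlib.md5(body).hexdigest()


def count_changes(daily_bodies: list[bytes]) -> int:
    """Count day-to-day transitions on which the digest differs."""
    digests = [md5_digest(body) for body in daily_bodies]
    return sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)


# Three daily snapshots of one hypothetical resource: it changes once.
snapshots = [b"<html>a</html>", b"<html>a</html>", b"<html>b</html>"]
changes = count_changes(snapshots)
```

A digest comparison only says that a resource changed, not where; localizing the change within a page is the part the slide assigns to the Chunking Tool.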
Exp1: Test Sets (content reuse)
  Cnt300 (7 NLANR logs)
  Top50 (50 most popular sites, 100hot.com)
  ECom (50 largest B2C shopping sites, 100hot.com)
  Srcheng (11 top search engines)
  EComQ (2 queries, top 10 from the ECom set)
  SrchengQ (2 queries, Srcheng set)
Exp1: Results (content reuse)
Exp2: Methodology (eliminating validation requests)
  NLANR proxy logs
  For each 304 response, look for a 200 response from the same server within a given window (10 sec on each side)
  Focus on 304 responses for images
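The window search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the `Entry` record and the function `covered_304s` are hypothetical, log parsing is omitted, and the restriction to image responses is left out.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    time: float   # seconds since the start of the log
    server: str   # server that produced the response
    status: int   # HTTP status code


def covered_304s(entries: list[Entry], window: float = 10.0) -> list[Entry]:
    """Return the 304 entries that have a 200 response from the same
    server within `window` seconds on either side."""
    ok_times: dict[str, list[float]] = defaultdict(list)
    for e in entries:
        if e.status == 200:
            ok_times[e.server].append(e.time)
    for times in ok_times.values():
        times.sort()

    covered = []
    for e in entries:
        if e.status != 304:
            continue
        times = ok_times.get(e.server, [])
        # First 200 response at or after the start of the window.
        i = bisect_left(times, e.time - window)
        if i < len(times) and times[i] <= e.time + window:
            covered.append(e)
    return covered


log = [
    Entry(0.0, "a", 200),
    Entry(5.0, "a", 304),    # 200 from "a" five seconds earlier
    Entry(100.0, "a", 304),  # no 200 from "a" nearby
    Entry(5.0, "b", 304),    # no 200 from "b" at all
]
hits = covered_304s(log)
```

Each 304 found this way is a validation that could, in principle, have been answered by information piggybacked on the nearby 200 response.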
Exp2: Results (eliminating validation requests)
Exp3.1: Methodology / Results (object change characteristics)
  Dynamic, Access Dependent objects (Top50, R,R,15min,R)
  Most short-term changes occur immediately
Exp3.2: Methodology / Results (object change characteristics)
  Dependency-based objects (SrchengQ, EComQ, same query, retrieved daily)
  Some changes may be attributed to dynamic/access dependent objects; further study needed
Exp3.3.1: Methodology / Results (object change characteristics)
  Input Dependent objects (SrchengQ, EComQ, different queries, retrieved daily)
Exp3.3.2: Methodology / Results (object change characteristics)
  Input Dependent objects (objects with cookies from Cnt300, Top50, ECom; obtain 2 cookies for each object; R-cookie1, R-cookie2)
Conclusions
  Proposed techniques have potential to:
    –increase content reuse
    –reduce number of validation requests
Future Work
  Combine object types and change characteristics with object relationships
  Extend web server and proxy caching software to support proposed techniques
Object classification by change characteristics
  Periodic (changes at regular intervals: hour, day, etc.)
  Dependency-based (depends on a file or DB changing)
  Dynamic (different on every access, can’t be prefetched)
  Access Dependent (different on every access, can be prefetched)
  Input Dependent (query, cookies)
  Relatively Dynamic (changes frequently)
  Static (never changes)
  Relatively Static (changes infrequently)
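As a rough illustration, the classes listed above could be carried as a small enumeration that a server or proxy attaches to each object. The sketch below is an assumption for illustration only: the names `ChangeClass`, `differs_on_every_access`, and `can_prefetch` are hypothetical, and the helpers encode only what the slide states, returning `None` where the slide is silent.

```python
from enum import Enum
from typing import Optional


class ChangeClass(Enum):
    """Change-characteristic classes as listed on the slide."""
    PERIODIC = "changes at regular intervals"
    DEPENDENCY_BASED = "depends on a file or DB changing"
    DYNAMIC = "different on every access, can't be prefetched"
    ACCESS_DEPENDENT = "different on every access, can be prefetched"
    INPUT_DEPENDENT = "depends on query or cookies"
    RELATIVELY_DYNAMIC = "changes frequently"
    STATIC = "never changes"
    RELATIVELY_STATIC = "changes infrequently"


def differs_on_every_access(cls: ChangeClass) -> bool:
    """True for the two classes described as different on every access."""
    return cls in (ChangeClass.DYNAMIC, ChangeClass.ACCESS_DEPENDENT)


def can_prefetch(cls: ChangeClass) -> Optional[bool]:
    """Prefetchability where the slide states it; None where it does not."""
    if cls is ChangeClass.DYNAMIC:
        return False
    if cls is ChangeClass.ACCESS_DEPENDENT:
        return True
    return None
```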
Figure 1. Current News Composite Object