October 14, 2002, MASCOTS
Workload Characterization in Web Caching Hierarchies
Guangwei Bai and Carey Williamson
Department of Computer Science, University of Calgary
Talk Outline
1. Problem Statement
2. Experimental Methodology
3. Simulation Results
4. Modeling Results
5. Summary and Conclusions
Introduction
World Wide Web: one of the most popular applications on today's Internet
Web proxy caching: a technique for improving the performance and scalability of the Internet
[Figure: Illustration of the Web proxy cache filtering effect. Web clients issue the original request stream to the Web proxy caching system; the filtered request stream continues across the Internet to the Web server.]
Example of Web cache filter effect
Arriving request stream (in time order): A B C A D B B A E
Filtered request stream leaving the Web proxy cache: A B C D B E
Frequency-domain effect: the cache changes how often each object is referenced; A appears three times in the arriving stream but only once in the filtered stream.
Time-domain effect: the cache changes when requests arrive; the nine arriving requests are thinned to six, which alters the gaps between successive requests.
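The example can be reproduced with a small simulation. The slides do not name the cache policy or size; this sketch assumes an LRU cache holding three equal-sized objects, which happens to yield exactly the filtered stream A B C D B E from the arriving stream A B C A D B B A E. The helper name `lru_filter` is hypothetical.

```python
from collections import OrderedDict

def lru_filter(requests, capacity):
    """Return the requests that miss in an LRU cache holding `capacity` equal-sized objects."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = []
    for obj in requests:
        if obj in cache:
            cache.move_to_end(obj)  # hit: refresh recency; the request is absorbed
        else:
            misses.append(obj)      # miss: the request passes through to the next level
            cache[obj] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used object
    return misses

arriving = list("ABCADBBAE")
print("".join(lru_filter(arriving, capacity=3)))  # prints ABCDBE
```

With a capacity of 10 objects nothing is ever evicted, and only the five first references (A B C D E) pass through.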
Goal of this Work
Time-domain analysis of cache filter effects in Web caching hierarchies:
o Study the impact of a cache on the structural characteristics of the Web request workload (mean, peak, variance, self-similarity)
o Study the sensitivity of the filter effect to cache configuration (cache size and cache replacement policy)
o Characterize aggregate Web request streams in a multi-level Web proxy caching hierarchy
Multi-Level Web Proxy Caching System
[Figure: two child-level caches (Web Proxy Cache 1 and Web Proxy Cache 2) forward their filtered request streams to a parent-level Web proxy cache.]
Experimental Methodology
Trace-driven simulation
Web proxy cache simulator
Synthetic Web proxy workloads
  o Controllable characteristics
  o Trace length: about 1M requests
  o Zipf slope: -0.75, -0.8
  o Request arrival process: deterministic, Poisson, self-similar
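A minimal sketch of how such a synthetic workload might be generated, assuming that a Zipf slope of -0.75 means the object of popularity rank i is referenced with probability proportional to i^(-0.75). The function names, the 1,000-object population, and the 50 requests/sec rate are illustrative choices, not values from the study. The self-similar arrival process is omitted because it needs a dedicated generator.

```python
import random

def zipf_references(n_requests, n_objects, alpha, rng):
    """Draw object IDs (ranks 1..n_objects) with P(rank i) proportional to i**(-alpha)."""
    weights = [i ** (-alpha) for i in range(1, n_objects + 1)]
    return rng.choices(range(1, n_objects + 1), weights=weights, k=n_requests)

def arrival_times(n_requests, rate, process, rng):
    """Timestamps (seconds) for a deterministic or Poisson arrival process."""
    t, times = 0.0, []
    for _ in range(n_requests):
        if process == "deterministic":
            t += 1.0 / rate             # fixed inter-arrival gap
        elif process == "poisson":
            t += rng.expovariate(rate)  # exponential gaps give a Poisson process
        else:
            raise ValueError("unsupported arrival process: " + process)
        times.append(t)
    return times

rng = random.Random(42)
refs = zipf_references(10_000, n_objects=1_000, alpha=0.75, rng=rng)
times = arrival_times(10_000, rate=50.0, process="poisson", rng=rng)
workload = list(zip(times, refs))  # (timestamp, object ID) pairs fed to the cache simulator
```

Keeping the reference sequence and the arrival timestamps independent makes it possible to vary the arrival process while holding the object popularity fixed.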
General Observations: Filter Effects
[Figure: arrival counts (requests per 5-minute interval) and cache hit ratio, each plotted against time from about 12:00 to 16:00.]
Effect of Cache Configuration
Experimental factors:
o Cache size: the maximum number of bytes of Web content that the cache can hold at one time
o Cache replacement policy: which object(s) to remove from the cache when more space is needed to store an incoming object (e.g., RAND, FIFO, LRU, LFU, GDS)
(Assumption: the arrival process is Poisson)
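Of the policies listed, GDS is the least familiar. Assuming it refers to GreedyDual-Size, the sketch below manages a byte-limited cache with a uniform cost of 1 per object, a variant that favors keeping small objects. The class name and the linear-scan eviction are illustrative simplifications; a production simulator would use a priority queue.

```python
class GreedyDualSizeCache:
    """Byte-limited cache managed by GreedyDual-Size with cost = 1 for every object."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # L: set to the H value of the most recent victim
        self.h = {}           # object ID -> priority H = L + cost/size
        self.size = {}        # object ID -> size in bytes

    def request(self, obj, size):
        """Process one request; return True on a hit, False on a miss."""
        if obj in self.h:
            self.h[obj] = self.inflation + 1.0 / self.size[obj]  # refresh priority on a hit
            return True
        if size > self.capacity:
            return False  # can never fit, so it is not cached
        while self.used + size > self.capacity:
            victim = min(self.h, key=self.h.get)  # evict the object with the lowest H
            self.inflation = self.h.pop(victim)
            self.used -= self.size.pop(victim)
        self.h[obj] = self.inflation + 1.0 / size
        self.size[obj] = size
        self.used += size
        return False

cache = GreedyDualSizeCache(capacity_bytes=100)
results = [cache.request(o, s) for o, s in [("a", 60), ("b", 30), ("a", 60), ("c", 30)]]
print(results)  # [False, False, True, False]
```

When "c" arrives, "a" is evicted even though it was just hit: being larger, it has the lower priority 1/60 versus 1/30 for "b".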
Effect of Cache Size on Traffic Structure
[Figure (a): Effect of cache size. Marginal distribution plot (pdf): frequency in percent vs. requests per 1-minute interval.]
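A marginal distribution plot like this one is built by counting requests in fixed intervals and tabulating how often each count occurs. A minimal sketch, assuming the filtered stream is available as a list of miss timestamps in seconds; the helper names are hypothetical.

```python
from collections import Counter

def interval_counts(timestamps, interval_sec, duration_sec):
    """Number of requests falling in each consecutive interval of length interval_sec."""
    n_bins = int(duration_sec // interval_sec)
    counts = [0] * n_bins
    for t in timestamps:
        b = int(t // interval_sec)
        if 0 <= b < n_bins:  # drop arrivals beyond the last full interval
            counts[b] += 1
    return counts

def marginal_pdf_percent(counts):
    """Empirical pdf: percentage of intervals that contain exactly k requests."""
    total = len(counts)
    return {k: 100.0 * v / total for k, v in sorted(Counter(counts).items())}

# Hypothetical miss timestamps (seconds) over three 1-minute intervals
counts = interval_counts([5, 30, 61, 65, 70, 130], interval_sec=60, duration_sec=180)
print(counts)                                # [2, 3, 1]
print(marginal_pdf_percent([2, 2, 3, 1]))    # {1: 25.0, 2: 50.0, 3: 25.0}
```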
Effect of Cache Replacement Policy
[Figure (b): Effect of cache policy (8 KB). Frequency vs. requests per 1-minute interval.]
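Input: Deterministic Arrival Process
Main observations:
o The cache reduces the mean arrival rate of the filtered request stream.
o The cache increases the variance of the filtered request stream: the input delivers a (nearly) constant number of requests per interval, so the variation in the filtered stream comes from the pattern of cache hits and misses.
[Table: statistics before the cache and for five cache sizes (MB). Rows: mean, standard deviation, hit ratio. Only the hit ratios are recoverable: 47.8%, 52.7%, 55.5%, 59.1%, 62.7%.]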
Input: Poisson Arrival Process
Main observations:
o Large impact on the mean; little impact on the variance
o The variance-to-mean ratio increases with cache size
o For small cache sizes, the filtered stream is well characterized as a Poisson process
[Table: statistics before the cache and for five cache sizes (MB). Rows: mean, standard deviation, hit ratio. Only the hit ratios are recoverable: 47.8%, 52.7%, 55.5%, 59.1%, 62.7%.]
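The variance-to-mean ratio (index of dispersion) of the per-interval counts equals 1 for a Poisson process. The sketch below computes it and checks an idealized case: if each request independently missed with a fixed probability, the filtered stream would be a thinned Poisson process and would stay Poisson. That independence is an assumption made for illustration; a real cache's hits and misses depend on the reference history, so this sketch does not model the growth of the ratio with cache size. All function names are hypothetical.

```python
import random
import statistics

def variance_to_mean(counts):
    """Index of dispersion of per-interval counts; 1 for a Poisson process."""
    return statistics.pvariance(counts) / statistics.mean(counts)

def poisson_counts(rate_per_interval, n_intervals, rng):
    """Counts per unit interval of a Poisson process built from exponential gaps."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate_per_interval)
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate_per_interval)
    return counts

def thin(counts, keep_prob, rng):
    """Keep each request independently with probability keep_prob (idealized misses)."""
    return [sum(rng.random() < keep_prob for _ in range(c)) for c in counts]

rng = random.Random(1)
before = poisson_counts(100.0, 2000, rng)
after = thin(before, keep_prob=0.5, rng=rng)  # roughly a 50% hit ratio
print(round(variance_to_mean(before), 2), round(variance_to_mean(after), 2))
```

Both ratios come out close to 1, while the mean of `after` is about half that of `before`, matching the "large impact on mean" observation.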
Input: Self-Similar Arrival Process
Main observations:
o Large impact on the mean; little impact on the variance
o The variance-to-mean ratio increases with cache size
o The filtered request stream retains its self-similar structure
[Table: statistics before the cache and for five cache sizes (MB). Rows: mean, standard deviation, hit ratio. Only the hit ratios are recoverable: 47.8%, 52.7%, 55.5%, 59.1%, 62.7%.]
Background: Self-Similar Traffic
Network traffic self-similarity: the statistical characteristics of the traffic are essentially invariant across time scales.
Main measure: the Hurst parameter H; self-similar traffic has 0.5 < H < 1.
Examination methods:
o autocorrelation (long-range dependence)
o variance-time plot
o rescaled adjusted range (R/S) statistic
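The variance-time method can be sketched in a few lines. The count series is averaged over non-overlapping blocks of size m; for self-similar traffic the variance of these block means decays as m^(-β) with 0 < β < 1, and H = 1 - β/2, while for independent counts β = 1 and H = 0.5. The function name and the aggregation levels are illustrative.

```python
import math
import random
import statistics

def variance_time_hurst(counts, levels):
    """Estimate H from the slope of log10 Var(X^(m)) versus log10 m (variance-time plot)."""
    xs, ys = [], []
    for m in levels:
        n_blocks = len(counts) // m
        if n_blocks < 2:
            continue  # at least two aggregated blocks are needed for a variance
        # X^(m): the series averaged over non-overlapping blocks of size m
        means = [sum(counts[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        xs.append(math.log10(m))
        ys.append(math.log10(statistics.pvariance(means)))
    # Least-squares slope of the variance-time plot (an estimate of -beta)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 + slope / 2.0  # Var(X^(m)) ~ m^(-beta) and H = 1 - beta/2

# Sanity check on independent counts, for which H should come out near 0.5
rng = random.Random(7)
iid_counts = [rng.gauss(100.0, 10.0) for _ in range(2 ** 14)]
h_iid = variance_time_hurst(iid_counts, [2 ** k for k in range(9)])
print(round(h_iid, 2))
```

On a self-similar trace the same function would return a value between 0.5 and 1, because the block-mean variance decays more slowly than 1/m.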
Traffic Characterization in a Web Proxy Caching Hierarchy
o Filter effects of the first-level cache on the Web workload
o Statistical multiplexing of the filtered Web request streams after the first-level cache
o Modeling the aggregate request stream offered to the second-level cache
Synthetic Self-Similar Workload
Traces offered to the first-level cache:
o Trace 1 (H = 0.70, Zipf slope = -0.75)
o Trace 2 (H = 0.80, Zipf slope = -0.80)
[Figure: requests per interval vs. time (sec.) for each trace.]
Evidence of Self-Similar Request Arrival Process for Filtered Web Proxy Workload
[Figure: (a) time series: count of arrivals per interval vs. time interval; (b) autocorrelation vs. lag; (c) variance-time plot: log10(variance) vs. log10(aggregation level); (d) R/S pox plot: log10(R/S) vs. log10(sample size), annotated with the estimated H (value not recoverable).]
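Panel (b) is a sample autocorrelation function. For long-range-dependent traffic the autocorrelations stay positive and decay slowly (roughly as a power of the lag), whereas the counts of a Poisson process in disjoint intervals are independent, so their autocorrelations are near zero beyond lag 0. A minimal sketch of the standard estimator; the function name is hypothetical.

```python
def autocorrelation(counts, max_lag):
    """Sample autocorrelation r(k), k = 0..max_lag, of a per-interval count series."""
    n = len(counts)
    mean = sum(counts) / n
    dev = [c - mean for c in counts]
    denom = sum(d * d for d in dev)  # n times the (biased) sample variance
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
            for k in range(max_lag + 1)]

# An alternating series is strongly negatively correlated at lag 1
print(autocorrelation([1, 3, 1, 3], 2))  # [1.0, -0.75, 0.5]
```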
Superposition of Web Workloads in the Time Domain
[Figure: characteristics of the aggregate request arrival process; request arrival frequency (%).]
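Superposition in the time domain amounts to merging the filtered (miss) streams of the child caches in timestamp order; per interval, the aggregate count is simply the sum of the children's counts. A minimal sketch with hypothetical names, assuming each stream is a time-sorted list of (timestamp, object ID) pairs.

```python
import heapq

def superpose(*streams):
    """Merge time-sorted (timestamp, object ID) streams into one time-sorted stream."""
    return list(heapq.merge(*streams, key=lambda req: req[0]))

def aggregate_counts(*count_series):
    """Per-interval counts of the aggregate stream: element-wise sum (truncated to the shortest input)."""
    return [sum(vals) for vals in zip(*count_series)]

# Hypothetical miss streams leaving two child caches
child1 = [(0.5, "A"), (2.0, "B")]
child2 = [(1.0, "C"), (1.5, "A")]
print(superpose(child1, child2))
# [(0.5, 'A'), (1.0, 'C'), (1.5, 'A'), (2.0, 'B')]
print(aggregate_counts([3, 0, 2], [1, 4, 2]))  # [4, 4, 4]
```

`heapq.merge` consumes its inputs lazily, so the same approach scales to long traces if the final `list` is replaced by iteration.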
Evidence of Self-Similarity for Aggregate Request Arrival Process
[Figure: (a) time series: requests per interval vs. time (sec.); (b) autocorrelation function vs. lag; (c) variance-time plot: log10(variance) vs. log10(aggregation level); (d) R/S pox plot: log10(R/S) vs. log10(sample size), with H = 0.76.]
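Panel (d) estimates H with the rescaled adjusted range. A minimal sketch: for each block size d, compute R/S on non-overlapping blocks and fit the slope of log10(R/S) against log10(d), which approximates H. Unlike a full pox plot, which shows every block, this sketch averages R/S per block size before fitting. On short independent blocks R/S tends to read somewhat above 0.5, so small block sizes bias the estimate upward. The function names are hypothetical.

```python
import math
import random

def rescaled_range(block):
    """R/S statistic of one block: range of cumulative deviations over the std deviation."""
    mean = sum(block) / len(block)
    cum = lo = hi = 0.0
    for x in block:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    s = math.sqrt(sum((x - mean) ** 2 for x in block) / len(block))
    return (hi - lo) / s if s > 0 else float("nan")

def rs_hurst(counts, sizes):
    """Estimate H as the slope of log10(average R/S) versus log10(block size)."""
    xs, ys = [], []
    for d in sizes:
        vals = [rescaled_range(counts[i:i + d])
                for i in range(0, len(counts) - d + 1, d)]
        vals = [v for v in vals if not math.isnan(v)]  # skip constant blocks
        if vals:
            xs.append(math.log10(d))
            ys.append(math.log10(sum(vals) / len(vals)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Independent counts should give an estimate a little above 0.5
rng = random.Random(11)
iid_counts = [rng.gauss(100.0, 10.0) for _ in range(2 ** 14)]
h_rs = rs_hurst(iid_counts, [2 ** k for k in range(4, 11)])
print(round(h_rs, 2))
```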
Modeling of Aggregate Workload
Gamma distribution:
$$ f(x) = \frac{\left(\frac{x-\mu}{\beta}\right)^{\gamma-1} e^{-(x-\mu)/\beta}}{\beta\,\Gamma(\gamma)}, \qquad x \ge \mu;\ \gamma, \beta > 0 $$
γ: shape parameter
β: scale parameter
μ: location parameter
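The slides do not say how the gamma parameters were estimated. As a hedged illustration, the sketch below evaluates the three-parameter gamma density (shape γ, scale β, location μ) and fits shape and scale by the method of moments with the location held fixed, using shape = m²/v and scale = v/m for the mean m (measured from the location) and variance v of the per-interval counts. The method of moments is a substitute chosen for simplicity; maximum-likelihood fitting is a common alternative. Function names are hypothetical.

```python
import math
import statistics

def gamma_pdf(x, shape, scale, loc=0.0):
    """Three-parameter gamma density with shape gamma, scale beta, location mu."""
    if x < loc:
        return 0.0
    z = (x - loc) / scale
    if z == 0.0:
        # Boundary x = mu: the density is finite only when shape >= 1
        if shape == 1.0:
            return 1.0 / scale
        return 0.0 if shape > 1.0 else math.inf
    return z ** (shape - 1.0) * math.exp(-z) / (scale * math.gamma(shape))

def fit_gamma_moments(counts, loc=0.0):
    """Method-of-moments fit with a fixed location: shape = m^2/v, scale = v/m."""
    m = statistics.mean(counts) - loc  # mean measured from the location
    v = statistics.pvariance(counts)
    return m * m / v, v / m

shape, scale = fit_gamma_moments([2, 4, 6, 8])
print(shape, scale)  # 5.0 1.0
```

A large fitted shape gives a nearly symmetric, bell-like count distribution, while a small shape gives a skewed one; this is what makes the family flexible across the stages of a caching hierarchy.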
Modeling of Aggregate Workload
Summary and Conclusions
Recap: trace-driven simulation of a Web proxy caching hierarchy, with synthetic Web workloads
o The cache reduces the peak and mean request arrival rates
o The cache filter effect does not remove self-similarity
o Superposition of Web request streams results in a bursty aggregate request stream
o Gamma distribution: a flexible and robust means to characterize the request arrival count distribution at different stages in a Web caching hierarchy
Future Work
o Bigger traces, more general workloads
o Study the mathematical relationships between the gamma (shape) and beta (scale) parameters and the cache size and hit ratio
For more information: – –