Web Cache Behavior The Laboratory of Computer Communication and Networking Submitted by: Lena Vardit Liraz



Outline
 Introduction – Web Caching Motivation
 Project flow design
 Project Modules
 Prowgen – producing the requests
 Network topology
 WebCache Tool
 NS simulation part
 Statistics and graphs of the simulation results
 Evaluation of the cache behavior and the different algorithms

Motivation  The World-Wide Web has grown tremendously in the past few years to become the most prevalent source of traffic on the Internet today, causing network congestion and overloaded web servers.  One solution that could help relieve these problems of congestion and overloaded web servers is web caching.

Motivation (2)  A web proxy cache sits between Web servers and clients, and stores frequently accessed web objects.  Having received a request from a client, the proxy attempts to fulfill the request from among the files stored in the proxy’s cache.  If the requested file is found (a cache hit), the proxy can immediately respond to the client’s request. If the requested file is not found (a cache miss), the proxy then attempts to retrieve the file from its original location.  Once the proxy gets the file from the original server, it can satisfy the request made by the client.

Web Caching Illustration: clients connect to the proxy server, which connects through the Internet to Servers A–E.

Motivation (3)  When the cache is full, a replacement decision must be made regarding which file to evict from the proxy.  The pruning algorithm is mainly cache-management dependent, and plays a major role in reducing both latency and network traffic on the internet.

Motivation (4)  The cache concept helps the end user, the service provider and the content provider by reducing the server load, alleviating network congestion, reducing bandwidth consumption and reducing network latency.

What is a Web proxy cache?  Intermediary between Web clients (browsers) and Web servers.  Store local copies of popular documents  Forward requests to servers only if needed

The project purpose: Simulate the web cache behavior of a proxy, and measure the hit-rate and the cost of different cache-pruning algorithms. Simulate a network, and run the simulator to estimate the time it takes to serve the misses.

Project Flow:
1. Prowgen – generates the requests
2. Prowgen parsing – creates a database of requests
3. WebCache Tool – simulates the cache behavior (LRU/LFU/HYB/FIFO)
4. NS simulator – runs the miss requests on the network (10, 50, 100 servers)
5. Statistics and conclusions from the results

Prowgen:

Prowgen Part  ProWGen uses mathematical models to capture the salient characteristics of web proxy workloads, as defined in previous studies of web proxy servers.  The main purpose of ProWGen is to generate synthetic workloads for evaluating proxy caching techniques. This approach reduces the complexity of the models, as well as the time and space required for the generation and storage of synthetic workloads.  The following parameters can be changed in ProWGen:
1. One-time referencing – set to 50% of the files
2. File popularity – medium distribution
3. File size distribution – 1.4 (lighter tail index)
4. Correlation – correlation between file popularity and file size; we used normal correlation.
5. Temporal locality – used the static configuration, which seems to have more temporal locality

Network Topology
PROXY
 10% of the network – closest servers, 3-4 hops to the proxy
 20% of the network – medium servers, 5-7 hops to the proxy
 70% of the network – “the rest of the world”, hops to the proxy

Division of files to servers:  10,000 file requests are related to servers from group 1 (10% – the closest servers)  20,000 file requests are related to servers from group 2 (20% – the medium distance servers)  70,000 file requests are related to group 3 (70% – “the rest of the internet”)  Division is done with the help of a hash function, so the division won’t be influenced by the order of the files in the ProWGen input.

ProWGen Output:  The output of Prowgen is a list of requests:  File_name file_size

WebCache TOOL:

Data bases
 Requests – File: size, name, server, …
 Servers – Server: latency, bandwidth
 WebCache – list of files in cache, Cache_size
 Algorithms: LRU, FIFO, LFU, HYB

Data base We have 3 classes:  class File  class Server  class Cache

class File
This class contains the file information:
 double name: name of the file
 double size: size of the file
 int server: any value between 0 and num_serv is valid
 double prio: the priority of the file
 int nref: number of references to the document since it last entered the cache
We use this DB for the list we receive from ProWGen, and for the list of files that are in the cache.

class Server
This class contains the server information:
 double lat: the latency to this server
 int band: the bandwidth to this server
We use this DB to contain the list of servers.

class Cache
This class contains the cache information:
 list<File> FileList: a list of the files that are in the cache
 int CacheFreeSpace: the remaining space in the cache
We use this class to simulate the cache itself.

Evaluating the cache behavior
We used the following modules:
 Prowgen
 cacheLRU
 cacheLFU
 cacheHYB
 cacheFIFO

cacheLRU
An implementation of the LRU algorithm using an STL list. The main idea is:
If the file is in the cache – HIT:
1. Move the file to the beginning of the list.
Otherwise – MISS:
1. “Make room” for the requested file (by deleting the last files in the list),
2. print a request to the ns file for the requested file, and then
3. insert the file into the cache DB (at the beginning of the list).

cacheLRU  Replaces the least recently used page, with the assumption that a page that has not been referenced for the longest time is unlikely to be referenced in the future.  Each newly fetched page is put at the head of the list.  The tail page is deleted when storage is exceeded.  Performs better than LFU in practice.  Used in today’s caches (e.g., Squid Web Proxy Cache).

cacheLFU
An implementation of the LFU algorithm using an STL list. The main idea is:
If the file is in the cache – HIT:
1. Update the file priority (increment by 1).
2. Update the file’s place in the list according to its priority.
Otherwise – MISS:
1. “Make room” for the requested file (by deleting the last files in the list),
2. print a request to the ns file for the requested file, and then
3. initiate the file priority to 1,
4. insert the file into the cache DB according to its priority.

cacheLFU  Replaces the least frequently used page, with the assumption that the page that has been used least often is unlikely to be referenced again in the future.  Optimal replacement policy if all pages have the same size and page popularity does not change.  In practice it has disadvantages:
slow to react to popularity changes
needs to keep statistics (a counter) for every page
does not consider page size

cacheHYB
An implementation of the HYB algorithm using an STL list. The main idea is:
If the file is in the cache – HIT:
1. Update the file priority according to the algorithm.
2. Update the file’s place in the list according to its priority.
Otherwise – MISS:
1. “Make room” for the requested file (by deleting the last files in the list),
2. print a request to the ns file for the requested file, and then
3. update the file priority according to the algorithm,
4. insert the file into the cache DB according to its priority.

cacheHYB  The three factors which Hybrid takes into account are size, transfer time, and number of references.  The Hybrid algorithm offers the best combination of guaranteed performance for frequently used objects and overall cache size.  Drawback: needs to keep statistics (a counter and other values) for every page.

cacheHYB  WB = 8Kb and WN = 0.9 for the HYB algorithm (100 servers)  WB = 1 and WN = 1 for the HYB algorithm (50 servers)  HYB selects for replacement the document with the lowest value of the following expression:  Weight = ((Ref ** WN) * (latency + WB/bandwidth)) / FileSize  Therefore, a file is not likely to be removed if the expression above is large, which would occur if the file has been referenced frequently, and if the document size is small.

cacheHYB  The constant WB, whose units are bytes, is used for setting the relative importance of the connection time versus the connection bandwidth.  The constant WN, which is dimensionless, is used for setting the relative importance of nref versus size. As WN → 0, the emphasis is placed upon size.  If WN = 0, nref is not taken into account at all.  If WN > 1, then the emphasis is placed more greatly upon nref than size.

cacheFIFO
An implementation of the FIFO algorithm using an STL list. The main idea is:
If the file is in the cache – HIT: do nothing.
Otherwise – MISS:
1. “Make room” for the requested file (by deleting the last files in the list),
2. print a request to the ns file for the requested file, and then
3. insert the file into the cache DB (at the beginning of the list).

cacheFIFO  Replaces the page that has been cached for the longest time, with the assumption that old pages will not be referenced again in the future.  It evicts regardless of how frequently the page is requested, the size of the page, and the cost to bring it back.  Because it does not take the frequency of the page into consideration, this policy can result in the same popular page being brought into the cache over and over again.

NS implementation:

Network Topology
PROXY
 10% of the network – closest servers, 3-4 hops to the proxy, 10 MB
 20% of the network – medium servers, 5-7 hops to the proxy, 2 MB
 70% of the network – “the rest of the world”, hops to the proxy, 2 MB

Network Simulation:  Latency between hops – 10 ms  Latency to each server is decided: The group it belongs to – 10%, 20%, 70% The group it belongs to – 10%, 20%, 70% Inside the group – it is distributed uniformly Inside the group – it is distributed uniformly Group 1 – 3-4 hops * 10 msGroup 1 – 3-4 hops * 10 ms Group hops * 10 msGroup hops * 10 ms Group 3 – hops * 10 msGroup 3 – hops * 10 ms The algorithm responsible for the distribution uses counter, and modulo calculation. The algorithm responsible for the distribution uses counter, and modulo calculation.

Network Simulation:  Bandwidth is also decided, depeding on the group it belongs: Group 1 – 10 MB (closest servers) Group 1 – 10 MB (closest servers) Group 2 & 3 – 2MB Group 2 & 3 – 2MB

Connection Implementation:  TCP agents are created:
Agent/TCP/Newreno for the server – implements the TCP New Reno protocol
Agent/TCPSink for the proxy (the receiver)
 On top of the agents: an FTP Application was attached to the TCP agent.

Requests:  When a miss has occurred in the CacheTool Part, it will write to the NS input file, a fetch request from the relevant server.  Those requests will be issued in varying times. At the particular time, the server will start sending the file.

Requests Times:  Request times are distributed exponentially, using the random generator implemented in ns:
Average – 0.5
Seed – 0 (default)
 Each file request is treated within this time, counted from the previous request.

Requests:  When a miss is indicated by the cache – managing algorithm it will place a request to the simulator to fetch the file: it will place a request to the simulator to fetch the file:  NS will decide at which time to fetch this file (at the request time decided randomly)  When this file request will be completed ( all the acks will be received ) – done procedure will be called.

done procedure:  The done procedure:
Is called every time a send command is finished.
Updates the timer for this request – counting how long the request took. The duration of a request is counted as the difference between the beginning time and the end time (when done is called).
Writes this time to the statistics file.

Screen shots:

Statistics and Evaluation:

Statistics:  3 types of network: 10, 50, 100 servers  LFU, LRU, HYB, FIFO algorithms, measuring:
Hit count
Byte hit count
 4000 requests in the middle are run over the simulator; the total time for the misses among those requests is counted.

Conclusions:  Cache sizes from 1MB to 256MB are tested for the different algorithms.

Performance metrics:

Hit Ratio  The cache hit ratio is the number of requests satisfied by the cache divided by the total number of requests from users.  The higher the hit ratio, the better the replacement policy is, because this means that fewer requests are forwarded to the web server, thus reducing the network traffic.

 LRU, FIFO and HYB seem to get close results.  HYB seems to be a little lower, maybe because it takes the number of references into account, which doesn’t seem to be an efficient idea.  LFU is the worst algorithm.  At 256MB, all the algorithms seem to reach the same results, since the cache is big enough to contain a reasonable amount of files for a near-optimal number of misses. Conclusions

Byte-Hit Ratio  The total bytes satisfied by the cache divided by the total bytes transferred to the user.  A higher byte hit ratio means a lower volume of data flowing between the proxy and the web server, thus reducing the network traffic.

Conclusions  LRU and FIFO seem to get good results again  HYB seems to get lower results – since it prefers to evict bigger files, and obviously will achieve lower byte-hit rate. It “pays” more for each miss.  LFU seems to be worse then LRU and FIFO  Again at 256MB, all the algorithms achieve similar results.

NS Latency  The simulated time that it takes to fetch the files from the internet.  The lower the latency, the better the algorithm is at lowering network traffic, thus taking load off the internet.  Here we are reducing both latency and traffic.

 LFU, FIFO and LRU seem to get very close results on the files that ran on the simulator.  HYB seems to get worse results:
This might be because of parameters not suited to the workload generated,
or perhaps because of HYB’s preference for small files – which causes more time to bring the larger files.
Conclusions

 LFU does not achieve good results in either Hit Ratio or Byte Hit Ratio. This implies that the assumption that users will request the same frequently requested document over and over again is not a very good one.  As for FIFO’s performance, the results were surprisingly good, taking into account its simple, unsophisticated implementation.

Conclusions  HYB achieves good hit-rate, but does not achieve neither good byte-hit rate, or low latency time.  Since HYB prefers files which have high reference, and are relatively small, the byte-hit ratio is not expected to be high.  As for network latency, it should be dependent on the network, and more parameters should be tested.