Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol By Abuzafor Rasal and Vinoth Rayappan.



Web caching [Diagram: Client1, Client2, and Client3, a cache, and a server, with the HTTP request (1) and HTTP response (2) marked.]

Web Cache Sharing [Diagram: users, proxy caches, the regional network, and the rest of the Internet, with the bottleneck marked.]

Web Cache Sharing: Internet Cache Protocol (ICP)
- The Internet Cache Protocol is the technique currently implemented for web cache sharing.
- Under ICP, a proxy multicasts a query message to all other proxies whenever a cache miss occurs.

Internet Cache Protocol [Diagram: a client, four proxy caches, and the Internet.]

Internet Cache Protocol [Diagram: clients 1 to n, proxies 1 to N, and the Internet.] First request: the document is available in the local proxy (HTTP hit).

Internet Cache Protocol [Diagram: clients 1 to n, proxies 1 to N, and the Internet, with HTTP and ICP messages.] Second request: the document is not available in the local proxy.

Problem of ICP
As the number of collaborating proxies increases, the overhead increases dramatically, so ICP does not scale.
- A proxy multicasts a query message to all other proxies whenever a cache miss occurs.
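The growth in overhead can be made concrete with a small sketch. It assumes every peer answers every query with one reply and that every proxy sees the same number of misses; the function names and the figure of 1000 misses are illustrative, not taken from the slides.

```python
def icp_messages_per_miss(num_proxies: int) -> int:
    """Queries plus replies triggered by one local cache miss under ICP."""
    peers = num_proxies - 1
    # Assumption: one query goes to each peer and each peer sends one reply.
    return 2 * peers


def icp_total_messages(num_proxies: int, misses_per_proxy: int) -> int:
    """Total ICP messages when every proxy sees the same number of misses."""
    return num_proxies * misses_per_proxy * icp_messages_per_miss(num_proxies)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        # The total grows roughly with the square of the number of proxies.
        print(n, icp_total_messages(n, misses_per_proxy=1000))
```

Under these assumptions, doubling the number of proxies roughly quadruples the total query traffic.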

Problem of ICP
- UDP = ICP query and reply messages
- TCP = HTTP traffic between proxies, servers, and clients
- Total packets (IP) = UDP + TCP


Summary Cache
- Each proxy maintains a Bloom filter, a compressed representation of its local cache contents.
- It also holds the Bloom filters that represent the caches of the other proxies.
- Bloom filter updates are exchanged periodically, or after a certain percentage of the documents in the cache has been replaced.
- A request is sent only to the proxies most likely to hold the requested document.
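A minimal sketch of this lookup flow follows. Plain Python sets stand in for the Bloom filter summaries, and `route_request` and the proxy names are hypothetical; the point is only that peers are contacted when their summary suggests a hit.

```python
from typing import Dict, List, Set, Tuple


def route_request(
    url: str,
    local_cache: Set[str],
    peer_summaries: Dict[str, Set[str]],
) -> Tuple[str, List[str]]:
    """Decide where a request goes, using locally stored peer summaries."""
    if url in local_cache:
        return ("local", [])
    # Only peers whose summary claims the URL receive a query.
    candidates = sorted(
        name for name, summary in peer_summaries.items() if url in summary
    )
    if candidates:
        return ("peers", candidates)
    # No summary matches: go to the origin server without any query messages.
    return ("origin", [])


if __name__ == "__main__":
    summaries = {"proxy-b": {"cnn.com/index.html"}, "proxy-c": set()}
    print(route_request("cnn.com/index.html", {"wayne.edu/"}, summaries))
```

A queried peer may still answer with a miss if its summary was wrong or out of date, in which case the request falls back to the origin server.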

Summary Cache [Diagram: a client, four proxy caches, and the Internet.] First request: the document is in another proxy.

Summary Cache [Diagram: a client, four proxy caches, and the Internet.] Second request: the document is not in any proxy.

Summary Cache [Diagram: a client, four proxy caches, and the Internet.] Third request: the summary gives a false hit.

Summary Cache
Two parameters in the design of the Summary Cache protocol:
- The frequency of summary updates (inter-proxy traffic, overhead).
- The representation of the summary (memory).
Solutions:
- Delay summary updates until a fixed percentage (e.g., 1%) of the cached documents are new. Positive: reduces overhead (traffic). Negative: introduces "false miss" errors.
- Store summaries as a Bloom filter, an efficient hash-based probabilistic scheme that represents the URLs of cached documents. Positive: reduces the memory requirement. Negative: introduces "false hit" errors.
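The delayed-update rule can be sketched as follows. `DelayedSummary` is a hypothetical class, sets stand in for the real summary, and evictions are ignored for brevity; it only illustrates publishing a new summary once the share of unpublished documents reaches the threshold.

```python
from typing import Iterable, Set


class DelayedSummary:
    """Publishes a new summary only when enough cached documents are new."""

    def __init__(self, initial_docs: Iterable[str] = (), threshold: float = 0.01):
        self.threshold = threshold
        self.cache: Set[str] = set(initial_docs)
        # What the other proxies currently believe this cache contains.
        self.published: Set[str] = set(self.cache)

    def insert(self, url: str) -> bool:
        """Cache a document; return True if a summary update was published."""
        self.cache.add(url)
        new_docs = self.cache - self.published
        if len(new_docs) / len(self.cache) >= self.threshold:
            self.published = set(self.cache)
            return True
        return False


if __name__ == "__main__":
    proxy = DelayedSummary((f"doc{i}" for i in range(100)), threshold=0.01)
    print(proxy.insert("new-1"))  # below the threshold, so no update yet
    print("new-1" in proxy.published)
```

Between updates, a document that is cached but not yet published is exactly the case that produces a false miss at the other proxies.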

Summary Cache
False misses:
- Definition: the requested document is cached at some other proxy, but that proxy's summary does not reflect the fact.
- Effect: a remote cache hit is lost, and the total hit ratio within the collection of caches is reduced.
- Improvement: can be reduced by updating summaries more frequently.
False hits:
- Definition: the requested document is not cached at some other proxy, but that proxy's summary indicates that it is. The proxy sends a query message to the other proxy, only to be told that the document is not cached there.
- Effect: a query message is wasted.
- Improvement: can be reduced by increasing the size of the Bloom filter's bit vector, that is, by giving the representation more memory.
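The two error types can be told apart by comparing what a peer actually caches with what its summary claims. The sketch below uses plain sets for both and a hypothetical `classify_lookup` function; it restates the definitions above rather than any particular implementation.

```python
from typing import Set


def classify_lookup(url: str, peer_cache: Set[str], peer_summary: Set[str]) -> str:
    """Label the outcome of consulting one peer's summary for a URL."""
    in_cache = url in peer_cache
    in_summary = url in peer_summary
    if in_cache and in_summary:
        return "remote hit"
    if in_cache and not in_summary:
        return "false miss"  # a remote hit is lost
    if not in_cache and in_summary:
        return "false hit"   # a query message is wasted
    return "miss"


if __name__ == "__main__":
    cache = {"a.html", "b.html"}
    summary = {"a.html", "c.html"}  # stale: lacks b.html, still lists c.html
    for url in ("a.html", "b.html", "c.html", "d.html"):
        print(url, classify_lookup(url, cache, summary))
```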

Summary Cache
Remote stale hits: the document is cached at another proxy, but the cached copy is stale. (This is not caused by update delay.)
- Delta compression can be used to transfer the new document. It transfers only the difference between the old and the new document instead of downloading the whole document.

Summary Cache
Two factors limit scalability:
- The network overhead of inter-proxy communication, determined by the update frequency, false hits, and remote hits.
- The memory required to store the summaries, determined by the size of an individual summary and the number of proxies.

Impact of Update Delay: Explanation of the Graph
- ICP = hit ratio when no update delay is introduced
- exact_dir = hit ratio with update delay introduced
- false_hit = no delay minus delay = ICP - exact_dir
- stale-hit = remote stale hit: the document is stale (outdated), but this is not reflected in the summary

Impact of Update Delay: Observation of the Graph
- exact_dir: the hit ratio decreases linearly as the threshold increases.
- stale-hit: not affected by the threshold, because stale-hit errors exist for both ICP and Summary Cache.
- false-hit: increases as the threshold increases, because a document deleted from the cache may still be shown as present in the summary.

Summary Representations
Summary representation = how the summaries are stored in the proxies.
Summaries need to be stored in DRAM (main memory):
- Disk arms become bottlenecks in a proxy cache.
- DRAM prices continue to drop.
- DRAM is faster.

Summary Representations: Naïve approach
Exact-directory = the summary is essentially the list of URLs of the cached documents, with each URL represented by its 16-byte MD5 signature.
- Positive: fewer errors.
- Negative: consumes too much memory.
Server-name = the summary is the list of web server names in the URLs of the cached documents.
- Positive: cuts the memory requirement by a factor of 10.
- Negative: generates too many false hits and thus increases network traffic.
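A rough calculation shows why the exact-directory summary is considered too large. Only the 16 bytes per URL comes from the slide; the 8 GiB cache size, the 8 KiB average document size, and the 16-proxy group below are hypothetical inputs chosen for illustration.

```python
MD5_SIGNATURE_BYTES = 16  # per cached URL, as stated for exact-directory


def exact_dir_summary_bytes(cache_bytes: int, avg_doc_bytes: int) -> int:
    """Size of one proxy's exact-directory summary."""
    num_docs = cache_bytes // avg_doc_bytes
    return num_docs * MD5_SIGNATURE_BYTES


def peer_summaries_bytes(cache_bytes: int, avg_doc_bytes: int, num_proxies: int) -> int:
    """Memory one proxy needs to hold the summaries of all its peers."""
    return (num_proxies - 1) * exact_dir_summary_bytes(cache_bytes, avg_doc_bytes)


if __name__ == "__main__":
    GIB, KIB, MIB = 2**30, 2**10, 2**20
    # Hypothetical inputs: 8 GiB cache, 8 KiB average document, 16 proxies.
    one = exact_dir_summary_bytes(8 * GIB, 8 * KIB)
    total = peer_summaries_bytes(8 * GIB, 8 * KIB, 16)
    print(one // MIB, "MiB per summary,", total // MIB, "MiB for all peers")
```

The total scales with both the size of each summary and the number of proxies, which are the two memory factors named on the scalability slide.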

Summary Representations: Bloom Filters
Process:
- Step 1: Feed each URL to four different hash functions.
- Step 2: Map each hash function's output (32 bits) to one bit position in a bit vector.
- Step 3: Set the four bits selected by the four hash functions in the vector.
Positive: consumes much less memory.
Negative: introduces errors, though at an insignificant rate.

Summary Representations
Server name produces too much network traffic, because a request is sent to every proxy whose summary contains the server name.

Bloom filter
- A Bloom filter is a technique for compressing a set into a small amount of memory while keeping false hits low.
- Summary Cache uses Bloom filters to compress its summaries.
- It is a method of representing a set A of n elements so as to support membership queries.
- It is a mechanism for identifying which pages have associated comments stored within a common knowledge server.

Problem? [Diagram: place A, holding URLs such as cnn.com/index.html and wayne.edu/, sends a compact representation (Bloom filter) to place B, which asks about an arbitrary URI.]

How does the Bloom filter work?
- Pick a large bit array with all bits set to 0.
- Pick a number of independent hash functions; in this case there are four (4).
- For every URL in the bag (the proxy's cache), apply the four hash functions to obtain four integers.
- Use the four integers as indexes into the bit array.
- Set the bits at those positions to 1.
- Repeat this for every URL in the proxy's cache.
The above is the encoding process. It cannot be run in reverse to recover the URLs; to test whether a URL is present, apply the same hash functions and check that all the selected bits are 1.
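The steps above can be sketched as a small class. This is an illustrative implementation, not the presenters' code: the hash functions are simulated by salting SHA-256 with the function index, which is an assumption made only to get several independent-looking hashes from the standard library.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array plus k hash functions; supports add and membership test."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: salting SHA-256 with the index stands in for
        # k independent hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means "definitely absent"; all 1s means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    summary = BloomFilter(num_bits=1024)
    for url in ("cnn.com/index.html", "wayne.edu/"):
        summary.add(url)
    print(summary.might_contain("cnn.com/index.html"))
```

Membership tests never miss an inserted URL, but they can report a URL that was never inserted, which is the false hit discussed on the earlier slides.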

How does a hash work? A hash function turns data into a relatively small number that may serve as a digital "fingerprint" of the data.

Bloom filter
- A hashing technique
- m bits
- k independent hash functions
- Many-to-one mapping, hence "false positives"

Bloom filter: False positives
- Given a query for b, we check the bits at positions h1(b), h2(b), ..., hk(b). If any of them is 0, b is not in the set A.
- Otherwise we assume b is in the set A, although there is a certain probability that we are wrong.
If false positives increase, the number of wasted queries goes up; if false negatives increase, the probability of missing a document that another proxy holds goes up. The salient feature of the Bloom filter is the trade-off between memory size (the bit array) and the false positive rate.
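The trade-off can be written down using the standard Bloom filter analysis, which assumes perfectly random hash functions. Here m is the number of bits, n the number of stored URLs, and k the number of hash functions; the worked value at the end is an illustration for m/n = 8 and k = 4.

```latex
% Probability that a given bit is still 0 after inserting n elements
\Pr[\text{bit} = 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

% A false positive needs all k probed bits to be 1
p_{\text{false positive}} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
                          \approx \left(1 - e^{-kn/m}\right)^{k}

% The approximation is minimized at
k_{\text{opt}} = \frac{m}{n} \ln 2

% Worked example: m/n = 8, k = 4
\left(1 - e^{-4/8}\right)^{4} \approx (0.3935)^{4} \approx 0.024
```

Increasing m for a fixed n lowers the false positive rate, which is the memory side of the trade-off.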

Probability of false positive
- Upper graph: 4 hash functions
- Lower graph: the optimal integral number of hash functions (5 hash functions)

Bloom filters as summaries
- Bloom filters provide a straightforward mechanism for building summaries.
- A proxy builds its Bloom filter from the URLs of its cached documents.
- Increasing the memory decreases false positives, and vice versa, so there is a clear trade-off between the two.
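Curves of this kind can be reproduced numerically from the standard approximation (1 - e^(-k n/m))^k, which assumes ideal hash functions. The function names below are hypothetical, and `bits_per_item` stands for m/n.

```python
import math


def false_positive_rate(bits_per_item: float, num_hashes: int) -> float:
    """Approximate false positive probability for m/n bits per stored item."""
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes


def best_integral_hashes(bits_per_item: float, max_hashes: int = 32) -> int:
    """Integer number of hash functions with the lowest approximate rate."""
    return min(
        range(1, max_hashes + 1),
        key=lambda k: false_positive_rate(bits_per_item, k),
    )


if __name__ == "__main__":
    for ratio in (8, 16, 32):
        k = best_integral_hashes(ratio)
        print(
            ratio,
            false_positive_rate(ratio, 4),  # fixed at 4 hash functions
            k,
            false_positive_rate(ratio, k),  # best integral choice
        )
```

Around the optimum the curve is flat, so neighbouring integer values of k give nearly the same rate, and using 4 hash functions costs little compared with the best choice.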

How are the hash functions built? [Diagram: the 128-bit MD5 signature and the 32-bit hashes.]
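One plausible reading of this slide is that the 128-bit MD5 signature of a URL is cut into four 32-bit words, each reduced modulo the bit-array size to give a bit position. The sketch below follows that reading as an assumption; `md5_bit_positions` is a hypothetical name.

```python
import hashlib
import struct
from typing import List


def md5_bit_positions(url: str, num_bits: int) -> List[int]:
    """Derive four bit positions from one MD5 signature of the URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    words = struct.unpack(">4I", digest)  # four unsigned 32-bit integers
    # Assumption: each 32-bit word is reduced modulo the array size.
    return [word % num_bits for word in words]


if __name__ == "__main__":
    print(md5_bit_positions("cnn.com/index.html", num_bits=1024))
```

Computing MD5 once and slicing it is cheaper than running four separate hash functions, and the same signature is already used by the exact-directory representation.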

Hit ratio

Observations on the cache hit ratio
- exact_dir and bloom_filter_8, _16, and _32 have virtually the same hit ratio, compared to server name.
- exact_dir gives the same hit ratio as the Bloom filters, but it consumes more memory to store the information for every URL.
- bloom_filter_8, _16, and _32 consume less memory than exact_dir, because the URLs are hashed into a bit vector.

False hit ratio under different summary representations

Observations on the false hit (miss) ratio
- Server name has a much higher false hit (miss) ratio. Why? Because the summary records only the server name, not the specific address of the requested URL. The request is therefore sent to every other proxy that lists the server name, even though at most some of them hold the document, so the false hit ratio is high.
- exact_dir has the lowest false hit ratio of all, but it needs a large amount of memory.

Message per request

Observations on messages per request
- ICP is included for comparison. With ICP (without Summary Cache), a request is sent to every proxy to find the requested URL, so the number of messages per client request is high compared with the others.
- At the other extreme, bloom_filter_8, _16, and _32 and exact_dir need far fewer messages per client request to find the URL, which makes them the economical choice.
- Server name falls between the two, because it produces more false hits (misses) and therefore more messages per client request.

Bytes of Msg size per request

Observations on the size of inter-proxy messages in bytes
We consider this issue because update messages are larger than query messages. Summary Cache replaces many small query messages with an occasional burst of large update messages, which significantly reduces CPU overhead and network interface packets (results are in Tables 2 and 4).

For query messages (sizes in bytes):
                                Header size   Average URL
ICP and others                  20            50

For summary updates (sizes in bytes):
                                Header size   Bytes/change
Exact directory                 20            16
Server name                     20            16
Bloom-filter-based summaries    32            4
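The header and per-change sizes listed on this slide are enough to estimate message sizes. The sketch below uses only those figures; the function names and the example of 100 changes are illustrative assumptions.

```python
QUERY_HEADER_BYTES = 20
AVERAGE_URL_BYTES = 50

# (header bytes, bytes per change) for each summary representation
UPDATE_COSTS = {
    "exact_dir": (20, 16),
    "server_name": (20, 16),
    "bloom_filter": (32, 4),
}


def query_message_bytes() -> int:
    """Size of one query message carrying an average-length URL."""
    return QUERY_HEADER_BYTES + AVERAGE_URL_BYTES


def update_message_bytes(representation: str, num_changes: int) -> int:
    """Size of one summary update that carries num_changes changes."""
    header, per_change = UPDATE_COSTS[representation]
    return header + per_change * num_changes


if __name__ == "__main__":
    print("query:", query_message_bytes())
    for name in UPDATE_COSTS:
        # Hypothetical update batching 100 changes.
        print(name, update_message_bytes(name, 100))
```

Batching many changes into one update amortizes the header, which is why a few large updates can cost less than many small queries.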

Memory requirements in terms of % of proxy cache: NLANR, 4 proxies

Memory requirements in terms of % of proxy cache: DEC, 16 proxies

Summary
- Web caching is an active research area.
- Directory server: this approach uses a central server to keep track of the cache directories of all the proxies; each proxy queries the server for cache hits in the other proxies.
- That approach fails because the centralized server has to serve every request, so its network overhead is high.
- To overcome this, we have the Summary Cache enhanced ICP web cache sharing protocol.
- Our inspection of the Questnet traces showed that child-to-parent ICP queries can be a significant portion of the messages that the parent proxy has to process. Applying Summary Cache in this case will significantly reduce the number of queries and the overhead.

Future work
- Investigate the impact of the protocol on parent-child proxy cooperation, and the optimal hierarchy configuration for a given workload.
- Investigate the application of Summary Cache to various web cache consistency protocols.
- Design a new method for implementing Summary Cache in a proxy to speed up lookups.

Conclusion
- We proposed Summary Cache enhanced ICP, a scalable wide-area web cache sharing protocol, and showed that it compares favorably with the other techniques.
- Our study addresses two key issues: the effect of delayed summary updates, and the representation of the summary.
- For the first, updates can be delayed until 1% to 10% of the cached documents are new (based on trace-driven simulation); this causes errors, but at a tolerable level.
- For the second, we introduced Bloom filters as the representation of the summary.
- We achieve over 50% reduction in bandwidth and reduce the number of inter-proxy communication messages by a factor of 25 to 60.