Web Caching Dr. Yingwu Zhu. What is Web Caching Introducing proxy servers at certain points in the network that serve in caching Web documents for faster.

Presentation transcript:

Web Caching Dr. Yingwu Zhu

What is Web Caching? Proxy servers are introduced at certain points in the network to cache Web documents for faster client access. Comparable to the cache memory in a computer system.

Proxy Cache [Diagram labels: clients, proxy, servers, Req., Reply]

How? Clients send requests to the proxy. If the requested document is in its cache, the proxy serves the request from its cache. Otherwise, the proxy forwards the request to the server. The server replies to the request through the proxy (the proxy keeps a copy of the requested document).
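The request flow on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the class name ProxyCache and the fetch_from_origin callback are assumptions made for the example, and the sketch ignores expiration and cache size limits, which later slides cover.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy forward proxy: serve from the cache on a hit, otherwise fetch and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin        # hypothetical function that contacts the server
        self._store: Dict[str, bytes] = {}     # URL -> cached document body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:                 # document is in the cache: serve it locally
            self.hits += 1
            return self._store[url]
        self.misses += 1
        body = self._fetch(url)                # forward the request to the server
        self._store[url] = body                # keep a copy for later requests
        return body
```

The hits and misses counters are there only so the hit rate discussed on later slides can be observed.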

Why Web Caching? HTTP traffic has grown rapidly to form the largest part of Internet traffic, which causes more network congestion and server unavailability. The number of static Web pages almost doubles every year. Some old data: –Number of unique pages: 800M < X < 2.2B –Number of unique web sites: 8,500,000 –Static pages: 30% - 40% –Pages revisited: 80% –Expected hit rate: 24% - 32%

Why Web Caching? Bandwidth Latency Performance = Response Time Server Load Failure Redundancy

Expected Gains Bandwidth saving Improving content availability. Improving web server availability. Server load balancing. Reducing user-perceived latency

What: Content and Protocols HTTP 1.0 Basic protocol –Send a request based on a fixed number of verbs: GET, HEAD, POST –Receive response, meta-data, content

What: Content and Protocols HTTP Request
Request = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Full-Request = Request-Line
               *( General-Header
                | Request-Header
                | Entity-Header )
               CRLF
               [ Entity-Body ]

What: Content and Protocols Example: GET /pub/www/index.html HTTP/1.0 Response: HTTP/ OK Server: Microsoft-IIS/5.0 Date: Sat, 19 Oct :46:53 GMT Expires: Sun, 20 Oct :00:00 GMT Content-Length: 2291 Content-Type: text/html Cache-control: private

What: Content and Protocols Example “if-modified-since”: GET /pub/www/index.html HTTP/1.0 If-Modified-Since: Sat, 19 Oct :43:31 GMT Response: HTTP/ OK Server: Microsoft-IIS/5.0 Date: Thu, 13 Jul :46:53 GMT Expires: Sun, 20 Oct :00:00 GMT Content-Length: 2291 Content-Type: text/html Cache-control: private

What: Content and Protocols Example “if-modified-since”: GET /pub/www/index.html HTTP/1.0 If-Modified-Since: Sat, 19 Oct :43:31 GMT Response: HTTP/ Not Modified
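The server-side decision behind the two "if-modified-since" examples can be sketched as follows. This is a simplified illustration, not how any particular server implements it: the function name conditional_get_status is an assumption, and it returns only the status code (200 or 304) rather than a full response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the client's date, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200                              # unconditional request: send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                              # unparsable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304                              # Not Modified: no message body is sent
    return 200                                  # modified since then: send the full document
```

A cache that receives 304 keeps serving its stored copy; a 200 response replaces it.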

HTTP support for caching Conditional requests (IMS) Servers can set expires and max-age Request indirection: application level routing Range requests, entity tag Cache-control header –Requests: min-fresh, max-stale, no-transform –Responses: must-revalidate, public, private, no-cache

Where [Diagram labels: Browser, Browser cache, Intranet, Local ISP cache, L4 Switch, ISP, CDN cache, Data Center, Reverse Proxy, Content Server]

Cache Types Proxy Caching Reverse Proxy Caching Transparent Caching Adaptive Caching Push Caching Active Caching

Proxy Caching Harvest/Squid. Provides web content for a fixed user base. Deployed at the network edges (company or institutional gateway or firewall hosts). Standalone operation. Manual configuration in web browsers. Commodity product/technology. Single point of failure.

Reverse Proxy Caching Designed to offload duties from one or more specific servers Data size is limited to size of static content on the server Challenge is fast, disk-less operation Cache consistency is easy

Transparent Caching Intercepts HTTP requests and redirects them to web cache servers or cache clusters. No client configuration. Violates the end-to-end paradigm –Client thinks it is talking directly to server –Server thinks it is talking to cache Implemented with an L4 switch –A Layer 4 switch makes switching decisions based on the TCP or UDP port number, e.g., port 80 for HTTP

Transparent Caching

Adaptive Caching ISP Level caching, global data placement optimization Cooperating multiple distributed caches Operate as a cache-mesh based on content demand Cache Group Management Protocol –How meshes are formed –How individual caches join/leave the meshes Content Routing Protocol sends request to the appropriate cache within the meshes Uses distributed cache meshes to solve the hot spot problem Caches dynamically join and leave the groups based on content demand Administrative boundaries must be relaxed

Push Caching Keep data close to the clients requesting it. Send the data out proactively. Assumption: we are able to launch caches that may cross administrative boundaries. Incurs cost (storage and transmission).

Active Caching Applies caching to dynamic documents. 30% of client HTTP requests contain cookies. The server provides the cache with the objects and any associated cache applets –An applet inside the cache customizes dynamic pages on the fly

Cache Placement/Deployment Close to clients/content consumers –Proxy caching –Transparent proxy caching Close to servers/content providers –Improve access to logical sets of data –Delay-sensitive data: video, audio –Reverse proxy caching –Push caching Network choke points: strategic deployment –Adaptive caching –Problem with administrative control

Zipf's Law vs. Web Access [Diagram labels: Zipf's Law, Web Access, Caching?]

Zipf’s Law Zipf’s law: The frequency of an event P as a function of rank i is a power-law function: $P_i = \Omega / i^{\alpha}$, where $\alpha \le 1$

Zipf’s Law Observed to be true for –Frequency of written words in English texts –Population of cities –Income of a company as a function of rank

Zipf’s Law vs. Web Access For a given server, page access by rank follows Zipf’s law. Web requests from a fixed population of users follow Zipf’s law with 0.64 < α < 0.83.

Observations The top 1% of all documents accounts for 20% - 35% of proxy requests. The top 10% accounts for 45% - 55% of requests. It takes 25% to 40% of all documents to account for 70% of requests. It takes 70% to 80% of all documents to account for 90% of requests.
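To get a feel for how skewed a Zipf-like popularity distribution is, the share of requests that falls on the most popular documents can be computed directly from the power law. The sketch below is an idealized model: the function name zipf_request_share, the document count of 100,000, and the choice α = 0.75 (picked from inside the range quoted earlier) are assumptions, and the model is not expected to reproduce the measured percentages exactly.

```python
def zipf_request_share(num_docs: int, alpha: float, top_fraction: float) -> float:
    """Fraction of requests that go to the most popular `top_fraction` of documents.

    Popularity of the document at rank i is taken to be proportional to 1 / i**alpha.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    top_n = max(1, round(num_docs * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Hypothetical parameters, for illustration only.
for fraction in (0.01, 0.10, 0.40, 0.80):
    share = zipf_request_share(num_docs=100_000, alpha=0.75, top_fraction=fraction)
    print(f"top {fraction:.0%} of documents -> {share:.1%} of requests")
```

The design point is that a cache holding only a small fraction of all documents can still absorb a disproportionate share of requests, while the long tail limits the achievable hit rate.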

Zipf’s Law and Caching Discussion How does this help in cache design?

Basic caching algorithm Pages may be Fresh: up-to-date Expired: current date > expiration date Stale: “old”

Basic caching algorithm - #2
If (page is in the cache)
    If (page is expired or stale)
        Get from server with if-modified-since
        If not modified, get from cache
        Else get from server
    Else get from cache
Else get from server
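The lookup logic of this slide can be written out as a small Python function. This is a sketch under stated assumptions: CacheEntry, Origin, and fetch are hypothetical names, times are plain floats in seconds, and the origin callback returns None to stand for "304 Not Modified". A real cache would also refresh the expiration time after a successful revalidation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float       # absolute expiration time, in seconds
    last_modified: float    # sent back to the server as If-Modified-Since


# Hypothetical origin interface: given a URL and an optional If-Modified-Since
# time, return None for "Not Modified" or a new CacheEntry for a full response.
Origin = Callable[[str, Optional[float]], Optional[CacheEntry]]


def fetch(cache: Dict[str, CacheEntry], url: str, origin: Origin,
          now: Optional[float] = None) -> Tuple[bytes, str]:
    """Return (body, outcome); outcome is 'miss', 'hit', 'revalidated' or 'refetched'."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is None:                           # not in the cache: get from server
        fresh = origin(url, None)
        if fresh is None:
            raise RuntimeError("origin must return a document for an unconditional request")
        cache[url] = fresh
        return fresh.body, "miss"
    if now < entry.expires_at:                  # still fresh: get from cache
        return entry.body, "hit"
    updated = origin(url, entry.last_modified)  # expired: conditional request
    if updated is None:                         # not modified: get from cache
        return entry.body, "revalidated"
    cache[url] = updated                        # modified: replace the cached copy
    return updated.body, "refetched"
```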

Basic caching algorithm - #3
If cache has space
    Store the file
Else
    1. Delete expired from cache
    2. Delete stale from cache
    3. Delete LRU from cache
    4. Delete largest/smallest from cache?

Cache Replacement Cache size is limited, need replacement policy LRU LFU Greedy-dual size Many others
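As an illustration of the simplest policy listed here, LRU (least recently used) can be sketched with Python's OrderedDict. The class name LRUCache is an assumption, and capacity is counted in objects rather than bytes to keep the example short; size-aware policies such as Greedy-Dual-Size need more bookkeeping than shown.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used object."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)            # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)     # evict the least recently used object
```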

Cache Consistency Multiple copies of objects are created – How and when should the copies be renewed? Goals –Avoid stale copies –Keep unnecessary traffic as low as possible

Cache Consistency: Polling Solution 1: poll every time. Implemented in HTTP using the optional "if-modified-since" request header field. Benefit: strong consistency. Drawback: very slow cache hits.

Cache Consistency: Polling Solution 2: poll when the TTL expires (widely used) –Associate a TTL (12 hours or 2 days) with each cached object. Implemented in HTTP using the optional "expires" header field. Benefit: fast cache hits. Drawback: weak cache consistency (5% stale), because the TTL is an a priori estimate of an object's lifetime.

Cache Consistency Solution 3: Invalidation Protocols The server helps the proxy maintain consistency. Invalidation protocols –When the proxy makes a request: Piggyback cache validation (PCV): the proxy lists some other potentially stale copies for the server to validate. Piggyback cache invalidation (PCI): the server lists the copies that have been updated since the last access. –Use of volumes. Volume lease: –The client receives a lease from the server –While the lease is valid, the client can retrieve copies from the proxy –When the lease expires, the client has to renew it Problems: scalability; servers need to keep cache state
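The piggyback cache invalidation (PCI) idea can be sketched as below. This is a toy model, not the published protocol: the class names Server and PciProxy, the logical clock, and the single-proxy setup are all assumptions chosen to keep the example short. On each cache miss the server piggybacks the URLs modified since the proxy's last contact, and the proxy drops those copies.

```python
from typing import Dict, List, Tuple


class Server:
    """Toy origin server that records a logical modification time per document."""

    def __init__(self) -> None:
        self.docs: Dict[str, bytes] = {}
        self.modified_at: Dict[str, int] = {}
        self.clock = 0                           # logical clock, advanced on every update

    def update(self, url: str, body: bytes) -> None:
        self.clock += 1
        self.docs[url] = body
        self.modified_at[url] = self.clock

    def get(self, url: str, last_contact: int) -> Tuple[bytes, List[str], int]:
        # Piggyback the other URLs modified since the proxy last contacted us.
        changed = [u for u, t in self.modified_at.items()
                   if t > last_contact and u != url]
        return self.docs[url], changed, self.clock


class PciProxy:
    """Toy proxy that applies piggybacked invalidations whenever it contacts the server."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, bytes] = {}
        self.last_contact = 0

    def get(self, url: str) -> bytes:
        if url in self.cache:
            return self.cache[url]               # served locally; may be stale until next contact
        body, invalidated, now = self.server.get(url, self.last_contact)
        for stale_url in invalidated:
            self.cache.pop(stale_url, None)      # drop copies the server reports as updated
        self.cache[url] = body
        self.last_contact = now
        return body
```

Note that a cached copy can still be served stale between contacts, so this gives weaker guarantees than polling every time.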

Cache Cooperation Hierarchical caching –Cache servers form a hierarchy of tree-like structures –Parent servers: at the top of the hierarchy, they receive requests from child servers. If they do not have the requested objects, they ask either their own parents or the original web servers –Sibling servers: if the local cache does not have the requested object, it asks its sibling caches. If the sibling caches do not have the object, the local cache asks the parent cache
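The sibling-then-parent lookup order described here can be sketched as follows. CacheNode and its methods are hypothetical names, and sibling queries are modeled as direct method calls rather than network messages.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    """Toy cache in a hierarchy: check local store, then siblings, then parent or origin."""

    def __init__(self, name: str, origin: Callable[[str], bytes],
                 parent: Optional["CacheNode"] = None) -> None:
        self.name = name
        self.origin = origin                     # hypothetical function that contacts the web server
        self.parent = parent
        self.siblings: List["CacheNode"] = []
        self.store: Dict[str, bytes] = {}

    def peek(self, url: str) -> Optional[bytes]:
        """Answer a sibling's query from the local store only, without fetching."""
        return self.store.get(url)

    def get(self, url: str) -> bytes:
        body = self.store.get(url)
        if body is None:                         # local miss: ask the sibling caches
            for sibling in self.siblings:
                body = sibling.peek(url)
                if body is not None:
                    break
        if body is None:                         # siblings missed: ask the parent or the web server
            body = self.parent.get(url) if self.parent else self.origin(url)
        self.store[url] = body                   # keep a local copy
        return body
```

Every cache on the path keeps its own copy, which is the capacity cost of this scheme.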

Cache Hierarchies Use hierarchy to scale a proxy –Why? Larger population = higher hit rate (fewer compulsory misses); larger effective cache size –Why is the population for a single proxy limited? Performance, administration, policy, etc. NLANR cache hierarchy –Most popular –9 top-level caches –Based on the Internet Cache Protocol (ICP) –Squid/Harvest proxy How to locate content?

ICP (Internet cache protocol) Simple protocol to query another cache for content Uses UDP – why? ICP message contents –Type – query, hit, hit_obj, miss –Other – identifier, URL, version, sender address –Special message types used with UDP echo port Used to probe server or “dumb cache” Query and then wait till time-out (2 sec) Transfers between caches still done using HTTP
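To make the message contents concrete, the sketch below packs an ICP query into bytes. The field layout (opcode, version, message length, request number, options, option data, sender host address, then a payload of requester address plus NUL-terminated URL) and the opcode values follow the ICP version 2 description in RFC 2186 as understood here; check them against the RFC before relying on them. The function names are assumptions, and the address fields are simply left as zeros.

```python
import struct

# Opcode and version values as given for ICP version 2 (RFC 2186).
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# Header: opcode, version, message length, request number,
# options, option data, sender host address (network byte order).
_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(request_number: int, url: str) -> bytes:
    """Build an ICP query datagram for `url`; address fields are left as zeros."""
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"   # requester address + NUL-terminated URL
    length = _HEADER.size + len(payload)
    header = _HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                          request_number, 0, 0, b"\x00" * 4)
    return header + payload


def parse_icp_header(message: bytes):
    """Return (opcode, version, length, request_number) from an ICP message."""
    opcode, version, length, request_number, _opts, _opt_data, _sender = \
        _HEADER.unpack_from(message)
    return opcode, version, length, request_number
```

A querying cache would send these bytes in a UDP datagram to each neighbor and match replies by request number; a reply that does not arrive before the time-out is treated as a miss.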

Squid [Diagram sequence over six slides; labels: Client, Child, Parent, Web page request, ICP Query, ICP MISS, ICP HIT]

Hierarchical caching Ideally, we want the cache mesh to behave as a single cache with equivalent capacity and processing capability. ICP: many copies of popular objects are created, so capacity is wasted. High latency: more than one hop may be needed to find an object. How to improve? Discuss!

Problems with caching Over 50% of all HTTP objects are uncacheable. Sources: –Dynamic data: stock prices, frequently updated content –CGI scripts: results based on passed parameters –SSL: encrypted data is not cacheable. Most web clients don’t handle mixed pages well, so many generic objects are transferred with SSL –Cookies: results may be based on passed data –Hit metering: the owner wants to measure the number of hits for revenue, etc., which leads to cache busting

Risks of Using Proxy Benefits: reduce latency, bandwidth saving, etc. Risks –Obsolete data –Violate client privacy: the proxy can keep a log file telling which objects the client has requested –Data integrity

Real Proxy Servers Squid: the most widely used; free and works well. Microsoft ISA Server 2004: developed by Microsoft to replace Microsoft Proxy Server; fully functional with Active Directory. Apache: the Apache web server has a module for reverse caching (experimental). Cisco Cache Engine: sits next to (mostly) Cisco routers and receives transparently redirected HTTP requests. CERN/W3C HTTPd: the original proxy server.