HTTP Caching: A Tutorial on Web Caching
Goals: an understanding of HTTP caching and of the HTTP headers related to caching.
Topics: common terms; the web environment without and with caching; benefits of caching; types of cache; the cache processing mechanism; HTTP headers related to caching; common HTTP caching scenarios; queries.
Understanding of HTTP Caching
Prerequisites: Before this tutorial, make sure you have completed the following tutorials: HTTP Tutorial, HTTP Headers Tutorial, and HTTP Cookies Tutorial.
Common Terms
Cache (dictionary meaning): A safe place for hiding and storing things.
Cache (computing): A high-speed buffering technique used in many places, such as disk caches and file caches, to provide fast access to data.
Web cache: A web cache stores copies of web documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.
Origin server: The server that owns the resources and can send a response for them straight back to the client. It is the last server to receive the request, and it originates the reply to the client.
Proxy server: A server (a computer system or an application program) that acts as an intermediary for requests from clients seeking resources from other servers.
Gateway: A gateway translates the identifier for the requested resource into another form and requests the translated resource from another, often non-HTTP, server.
Cache hit: A request arrives at the cache, and the cache can serve it from its local copy.
Cache miss: A request arrives at the cache, but the response is not available in the cache, so the cache has to get it from the server.
Revalidation: From time to time the cache needs to check whether its cached documents are still up to date with the server. This is known as the freshness check or revalidation.
Web Environment Without Cache
[Figure: browsers on an intranet reach the web servers and app servers in a data center through a local ISP/proxy and an ISP/proxy, with no caches on the path.]
Problems
Redundant Data Transfers
When multiple clients access a popular origin-server page, the server transmits the same document multiple times, once to each client. The same bytes travel across the network over and over again. These redundant data transfers eat up expensive network bandwidth, slow down transfers, and overload web servers.
Bandwidth Bottlenecks
Clients can access servers only as fast as the slowest network along the path.
Flash Crowds: many people access a web document at nearly the same time, causing a sudden spike in traffic.
Distance Delays: Every network router adds delay to Internet traffic.
Even if there are not many routers between client and server, the speed of light alone can cause a significant delay. The direct distance from Boston to San Francisco is about 2,700 miles. In the very best case, at the speed of light (186,000 miles/sec), a signal could travel from Boston to San Francisco in about 15 milliseconds and complete a round trip in about 30 milliseconds.
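The best-case arithmetic above can be checked with a few lines of Python. The 2,700-mile distance and the 186,000 miles/sec figure come from the text; the function name propagation_delay_ms is hypothetical. Because real signals travel more slowly than light in a vacuum, treat the result as a lower bound.

```python
# Best-case (speed-of-light) propagation delay, using the figures quoted in the text.
SPEED_OF_LIGHT_MILES_PER_SEC = 186_000
BOSTON_TO_SF_MILES = 2_700


def propagation_delay_ms(distance_miles: float, round_trip: bool = False) -> float:
    """Return the light-speed delay in milliseconds (a lower bound on real latency)."""
    one_way_ms = distance_miles / SPEED_OF_LIGHT_MILES_PER_SEC * 1000
    return 2 * one_way_ms if round_trip else one_way_ms


print(f"one way:    {propagation_delay_ms(BOSTON_TO_SF_MILES):.1f} ms")
print(f"round trip: {propagation_delay_ms(BOSTON_TO_SF_MILES, round_trip=True):.1f} ms")
```

This prints roughly 14.5 ms one way and 29.0 ms round trip. The text rounds these to 15 and 30 milliseconds.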
Web Caching Environment
[Figure: the same environment with caches added along the path: in each browser, in the intranet and ISP/proxy layers, and in the data center in front of the web servers and app servers.]
Benefits of Web Caching
Web caching reduces latency, network traffic, bandwidth usage, server load, and distance delays.
Types of Web Cache
Private cache: exclusive to a single user. Example: the browser cache.
Public cache: shared among many users. Examples: the proxy cache and the gateway cache.
Private Cache
Browser cache: IE, Mozilla, and other browsers have cache settings that let the user set cache policies and the disk space allotted to the cache. The browser keeps a folder in which certain downloaded items are stored for future access, which makes pages load faster. The cache can be emptied periodically to free up storage.
Public Cache
Proxy cache: A web proxy cache is a function of a proxy server that caches retrieved web pages on the server's hard disk so that a page can be retrieved quickly by the same or a different user the next time it is requested. It eases bandwidth requirements and reduces the delays inherent in a heavily trafficked, Internet-connected network. A proxy cache is also advantageous when browsing multiple pages of the same web site.
Public Cache
Gateway cache: Also known as "reverse proxy caches" or "surrogate caches," gateway caches are also intermediaries. Instead of being deployed by network administrators to save bandwidth, they are typically deployed by webmasters themselves to make their sites more scalable, reliable, and better performing. Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients. A reverse proxy is a proxy server installed on a server network or on network equipment, in front of the web servers.
Cache processing steps
Receiving: The cache reads the arriving request message from the network.
Parsing: The cache parses the message, extracting the URL and headers.
Lookup: The cache checks whether a local copy is available and, if not, fetches a copy (and stores it locally if it is cacheable).
Freshness check: The cache checks whether the cached copy is fresh enough and, if not, asks the server for any updates.
Response creation: The cache builds a response message with the new headers and the cached body.
Sending: The cache sends the response back to the client over the network.
Logging: Optionally, the cache creates a log file entry describing the transaction.
Cache Processing Flow
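Request arrives. Is it cached?
No: fetch from the server, store in the cache, and serve to the client.
Yes: is it fresh enough?
  Yes: serve to the client from the cache.
  No: revalidate with the server. Was it revalidated?
    Yes: update the freshness of the cached document and serve to the client.
    No: fetch from the server, store in the cache, and serve to the client.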
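The flowchart can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a real cache. Entry, handle_request, and the fetch and revalidate callbacks are hypothetical names that stand in for real network calls. Freshness is reduced to a boolean flag, and every fetched response is assumed to be cacheable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    validator: Optional[str]  # e.g. an ETag or Last-Modified value; None if absent
    fresh: bool               # placeholder for the real freshness computation


def handle_request(
    url: str,
    cache: Dict[str, Entry],
    fetch: Callable[[str], Entry],            # hypothetical: full GET to the origin
    revalidate: Callable[[str, str], bool],   # hypothetical: conditional GET, True on 304
) -> Tuple[str, bytes]:
    """Walk the flowchart and return (outcome, body)."""
    entry = cache.get(url)
    if entry is None:  # Cached? no -> fetch from server, store, serve
        entry = fetch(url)
        cache[url] = entry
        return "miss", entry.body
    if entry.fresh:  # Fresh enough? yes -> serve straight from the cache
        return "hit", entry.body
    # Stale: revalidation is only possible if a validator was stored with the entry.
    if entry.validator is not None and revalidate(url, entry.validator):
        entry.fresh = True  # update freshness of the cached document
        return "revalidated hit", entry.body
    entry = fetch(url)  # revalidation failed (or no validator): full fetch
    cache[url] = entry
    return "revalidated miss", entry.body
```

Passing fetch and revalidate in as callbacks keeps the decision logic separate from networking, so the flow can be exercised with stub functions.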
Determining Freshness
The most important step in cache processing is the freshness check.
We want to use caches, but we do not want to serve stale responses. So we need to answer two questions: How do we decide what is fresh and what is stale? Once we know an entry is stale, how do we revalidate it?
Time can be used as a freshness constraint.
We need a mechanism that makes a cached entry expirable by determining its lifespan in terms of time. The HTTP expiration model provides this mechanism.
Expiration Model
The expiration model is a way for the server to say how long a requested resource stays fresh. User agents can cache the response and reuse it until it is no longer fresh. The model performs three main functions: assigning an expiration time to a response, calculating the age of a response, and calculating the expiration of a response.
Age of a response: the time since the response was generated (or last revalidated) at the origin server. This includes the time it has spent in the cache.
Life of a response: the time for which the cache will consider it fresh.
Expiration occurs when age exceeds life. Once an entry expires, the cache must contact the origin server on the first subsequent request for that resource. Fetching a complete fresh copy from the origin server is called an end-to-end reload.
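The rule just stated (an entry expires once its age exceeds its life) can be written down directly. This is a minimal sketch. is_fresh and seconds_until_expiry are hypothetical helper names, and both arguments are in seconds.

```python
def is_fresh(age_secs: float, lifetime_secs: float) -> bool:
    """An entry stays fresh until its age exceeds its life (freshness lifetime)."""
    return age_secs <= lifetime_secs


def seconds_until_expiry(age_secs: float, lifetime_secs: float) -> float:
    """Remaining freshness in seconds; a negative value means the entry has expired."""
    return lifetime_secs - age_secs
```

For example, suppose max-age=60 sets the life to 60 seconds. An entry aged 45 seconds is then fresh, with 15 seconds left. An entry aged 75 seconds has expired.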
Expiration time determines the life of a response. It is assigned to cached entries by one of two methods:
Server-specified expiration: The origin server provides an explicit expiration time in the future. Servers specify explicit expiration times using the Expires header or the max-age directive of the Cache-Control header.
Heuristic expiration: The cache calculates the expiration time itself. It uses algorithms that rely on other header values (such as the Last-Modified time) to estimate a plausible expiration time.
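A common heuristic gives a response a lifetime of a fraction of the interval since its Last-Modified time. RFC 2616 mentions 10% as a typical setting for that fraction. The sketch below assumes the 10% factor, measures the interval up to the response's Date, and caps the result at 24 hours. The cap and the function name heuristic_lifetime are assumptions of this illustration, not requirements of the protocol.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

LM_FACTOR = 0.10                  # assumed fraction of the time since Last-Modified
MAX_HEURISTIC = timedelta(hours=24)  # assumed upper bound for this sketch


def heuristic_lifetime(date_header: str, last_modified_header: str) -> timedelta:
    """Estimate a freshness lifetime when the server sent no Expires or max-age."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # A Last-Modified later than Date makes no sense; treat it as "just modified".
    since_modified = max(date - last_modified, timedelta(0))
    return min(since_modified * LM_FACTOR, MAX_HEURISTIC)
```

A document last modified five days before the response's Date would be considered fresh for about 12 hours. A document untouched for a month would hit the 24-hour cap.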
The expiration mechanism applies only to responses taken from a cache and not to first-hand responses forwarded immediately to the requesting client.
Revalidation
Like the expiration model, HTTP provides a standard mechanism to handle revalidation.
It is known as the validation model, and it uses validators and conditional requests as its tools.
Validation Model
The validation model is the mechanism by which a client asks the server whether a cached version of a resource is still fresh. Validation is done when the cache wants to use a stale entry to respond to a client's request. A cache cannot do a conditional retrieval if it does not have a validator for the entity. Such an entry cannot be refreshed after it expires.
When an origin server generates a full response, it attaches some sort of validator to it, which is kept with the cache entry. When a client (user agent or proxy cache) makes a conditional GET request for a resource for which it has a cache entry, it includes the associated validator in the request. A conditional GET request is one that includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field.
The server then checks that validator against the current validator for the entity.
If they match, it responds with a special status code (usually 304 Not Modified) and no entity-body. Otherwise, it returns a full response, including the entity-body.
Benefits: We avoid the overhead of retransmitting the full response if the cached entry is still good. We also avoid an extra round trip if the cached entry is invalid, because the full response comes back in reply to the same request. A successful revalidation is faster than a cache miss.
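On the server side, the check described above might look like the following sketch. It assumes request headers arrive in a plain dict with canonical capitalization, and it splits entity-tag lists naively on commas. Following the later RFC 7232 rule, If-None-Match takes precedence over If-Modified-Since. conditional_get is a hypothetical name.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    """Drop a W/ prefix so entity tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(
    request_headers: Dict[str, str],
    current_etag: str,
    last_modified: str,
    body: bytes,
) -> Tuple[int, Optional[bytes]]:
    """Return (status, body): 304 with no body when the client's validator still matches."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Entity-tag check wins over the date check when both are present.
        tags = {_opaque(t) for t in if_none_match.split(",")}
        if "*" in tags or _opaque(current_etag) in tags:
            return 304, None
        return 200, body
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified since the client's copy: there is nothing new to send.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304, None
    return 200, body
```

The 304 branch returns no body at all, which is where the bandwidth saving comes from.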
Validators
A validator is a header whose value can be compared with a value of the same header in order to validate the response. Caching uses two validators: Last-Modified dates and entity tags. In some cases both are used.
Last-Modified: A cache entry is considered valid if the entity has not been modified since the Last-Modified value.
Entity tag: Entity tags are opaque validators associated with a resource that usually change when the resource changes.
A validator can be classified as weak or strong, depending on how changes to the entity affect it.
Strong validators: Both origin servers and caches compare two validators to decide whether they represent the same or different entities. One would therefore normally expect the associated validator to change whenever the entity changes in any way. Such a validator is called a "strong validator." Strong validators are usable in any context. A strong validator is part of an identifier for a specific entity.
Weak validators: There might be cases when a server prefers to change the validator only on semantically significant changes, and not when insignificant aspects of the entity change. A validator that does not always change when the resource changes is a "weak validator." Weak validators are usable only in contexts that do not depend on exact equality of an entity. A weak validator is part of an identifier for a set of semantically equivalent entities.
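The difference shows up when two entity tags are compared. The sketch below follows the strong and weak comparison functions defined in RFC 7232; RFC 2616 describes the same rules. The helper names are hypothetical.

```python
from typing import Tuple


def parse_etag(tag: str) -> Tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_tag), e.g. 'W/"xyzzy"' -> (True, '"xyzzy"')."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and their opaque parts identical."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque parts must match; a W/ prefix on either side is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under these rules, If-None-Match uses weak comparison, while If-Match and If-Range require strong comparison.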
Cache Headers
The following header fields are used in caching:
Last-Modified, If-Modified-Since, If-Unmodified-Since, ETag, Vary, Age, Pragma, Date, Expires, Cache-Control
Last-Modified (Response)
Indicates the date and time at which the origin server believes the variant was last modified. Last-Modified is implicitly a weak validator.
Last-Modified: Tue, 15 Nov :45:26 GMT
If-Modified-Since (Request)
If the requested variant has not been modified since the time specified in this field, the server does not return an entity. It returns a 304 (Not Modified) response without any message body instead.
If-Modified-Since: Sat, 12 Oct :43:31 GMT
If-Unmodified-Since (Request)
If the requested resource has not been modified since the time specified in this field, the server SHOULD perform the requested operation as if the If-Unmodified-Since header were not present. Otherwise, it MUST return 412 (Precondition Failed). This header is rarely used.
If-Unmodified-Since: Sat, 29 Oct :43:31 GMT
ETag (Response, HTTP/1.1)
An entity tag provides an "opaque" cache validator. It might allow more reliable validation in three situations: where it is inconvenient to store modification dates, where the one-second resolution of HTTP date values is not sufficient, or where the origin server wishes to avoid certain paradoxes that might arise from the use of modification dates. This field provides the value of the entity tag for the requested variant. A single conditional request header can carry many entity-tag values, as the If-Match and If-None-Match examples show.
ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""
If both ETag and Last-Modified values are present, subsequent conditional requests should use both validators.
The following request headers use entity tags to make a request conditional.
If-Match: The method is performed only if there is an entity that matches one of the entity tags in this field.
If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *
If-None-Match: A client that has one or more entities previously obtained from the resource can verify that none of those entities is current. It does this by including a list of their associated entity tags in the If-None-Match header field. If none of the entity tags match, the server MAY perform the requested method. If one of them matches, the server MUST NOT perform the method; for GET and HEAD requests it SHOULD respond with 304 (Not Modified) instead. This is the most commonly used of these headers.
If-None-Match: "xyzzy"
If-None-Match: W/"xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *
If-Range: If the entity tag given in the If-Range header matches the current entity tag for the entity, the server SHOULD provide the specified sub-range of the entity using a 206 (Partial Content) response. If the entity tag does not match, the server SHOULD return the entire entity using a 200 (OK) response. If the client has no entity tag for an entity but does have a Last-Modified date, it MAY use that date in an If-Range header. Both of these examples are valid:
If-Range: "df6b0-b4a-3be1b5e1"
If-Range: Sat, 29 Oct :43:31 GMT
Vary (Response)
The Vary response header lists the client request headers that the server considers when selecting the document or generating custom content. The server may also use information other than the request headers, such as the client's IP address, the time of day, or user personalization. In that case it sends a Vary header with the value "*". While a cached response is fresh, the fields listed in its Vary header determine whether the cache may use that response to reply to a subsequent request without validation. An HTTP/1.1 server SHOULD include a Vary header field with any cacheable response that is subject to server-driven negotiation.
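A cache might apply Vary when deciding whether a stored response can answer a new request, as sketched below. The function name vary_matches is hypothetical. Header values are compared exactly, although real caches may normalize whitespace. A False result means only that the stored response cannot be used without revalidation.

```python
from typing import Dict, Optional


def vary_matches(
    vary_header: Optional[str],
    stored_request: Dict[str, str],
    new_request: Dict[str, str],
) -> bool:
    """Can a response stored for stored_request answer new_request without validation?"""
    if vary_header is None:
        return True  # no Vary: the URL alone selects the response
    names = [n.strip().lower() for n in vary_header.split(",") if n.strip()]
    if "*" in names:
        return False  # the server used information outside the request headers
    # Header names are case-insensitive, so compare on lowercased keys.
    old = {k.lower(): v for k, v in stored_request.items()}
    new = {k.lower(): v for k, v in new_request.items()}
    # Every listed header must have the same value (or be absent in both requests).
    return all(old.get(n) == new.get(n) for n in names)
```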
Age (Response)
The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server. A cached response is "fresh" if its age does not exceed its freshness lifetime. Age values are non-negative decimal integers, representing time in seconds. An HTTP/1.1 server that includes a cache MUST include an Age header field in every response generated from its own cache.
Age: 3600
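The Age value must account for time spent in transit and in every cache along the path, so a cache cannot simply read the Age header. RFC 2616 (section 13.2.3) gives the calculation sketched below. The function name current_age is hypothetical. All arguments are assumed to be seconds: times are seconds since the epoch on the local clock, and date_value is the response's Date header converted to the same scale.

```python
def current_age(
    age_value: float,      # value of the Age header received (0 if absent)
    date_value: float,     # Date header of the response, in seconds since the epoch
    request_time: float,   # local clock when the request was sent
    response_time: float,  # local clock when the response arrived
    now: float,            # local clock now
) -> float:
    """Current age of a cached response, following RFC 2616 section 13.2.3."""
    apparent_age = max(0.0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time  # time spent in this cache
    return corrected_initial_age + resident_time
```

Consider a response with no Age header, dated at the moment the request was sent, received 1 second later, and then held for 60 seconds. Its current age is 62 seconds: 1 second of apparent age, plus 1 second of response delay, plus 60 seconds resident in the cache.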
Pragma Directive
The Pragma general-header field is used to include implementation-specific directives that might apply to any recipient along the request/response chain.
Pragma = "Pragma" ":" 1#pragma-directive
pragma-directive = "no-cache" | extension-pragma
extension-pragma = token [ "=" ( token | quoted-string ) ]
When the no-cache directive is present in a request message, an application SHOULD forward the request toward the origin server even if it has a cached copy of what is being requested. Pragma: no-cache is used only for backward compatibility with HTTP/1.0.
Date
The Date general-header field represents the date and time at which the message was originated. A received message that does not have a Date header field MUST be assigned one by the recipient if the message will be cached by that recipient or gatewayed via a protocol that requires a Date. Origin servers MUST include a Date header field in all responses, except in a few cases. A client without a clock MUST NOT send a Date header field in a request.
Date: Tue, 15 Nov :12:31 GMT
Expires
The Expires entity-header field gives the date/time after which the response is considered stale. To mark a response as "already expired," an origin server sends an Expires date that is equal to the Date header value. To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
Expires: Thu, 01 Dec :00:00 GMT
Cache-Control The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. The Cache-Control directives are either for the request, response or both. HTTP/1.0 caches might not implement Cache-Control.
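Before a cache can act on any of the directives below, it has to split the Cache-Control header into individual directives. This is a minimal parser built around a hypothetical parse_cache_control helper. It lowercases directive names, since they are case-insensitive, and strips surrounding quotes from values. It does not handle quoted values that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Parse e.g. 'max-age=60, private' into {'max-age': '60', 'private': None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directives without "=" (like private or no-store) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives
```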
Cache-Control Directives
public (response)
Indicates that the response may be cached by any cache. This marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
private (response)
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache (e.g., a proxy). It may be cached by a private cache (e.g., a browser).
no-cache (request/response)
In a response, the server sends this to prevent caches from serving the response without revalidation. In a request, it means the client wants the entry reloaded from the origin server. The client might do this because it thinks the expiration time has been overestimated by its own cache, by intermediate caches, or by the server. It might also do so because the cached copy is corrupted. Caches may still store a no-cache response, but they must revalidate it with the origin server before each reuse.
no-store (request/response)
Caches must not store any part of the request or response that contains this directive.
s-maxage (secs) (response)
For a shared cache, this overrides the maximum age specified by the max-age directive and the Expires header. It is ignored by a private cache.
max-age (secs) (request/response)
In a request, it is the maximum age (in seconds) of a response that the client is willing to accept. In a response, it specifies the maximum amount of time that a representation will be considered fresh.
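The precedence rules above can be combined into one lifetime calculation: s-maxage for shared caches, then max-age, then Expires relative to Date. In this sketch, freshness_lifetime is a hypothetical name, and directives is assumed to be a dict of lowercased Cache-Control directive names to values. A response with Expires but no Date is treated as having no explicit lifetime, which is a simplification. A None result means the cache would fall back to heuristic expiration.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(
    headers: Dict[str, str],
    directives: Dict[str, Optional[str]],
    shared_cache: bool,
) -> Optional[float]:
    """Explicit freshness lifetime in seconds, or None if a heuristic is needed."""
    s_maxage = directives.get("s-maxage")
    if shared_cache and s_maxage is not None:
        return float(s_maxage)  # only shared caches honor s-maxage
    max_age = directives.get("max-age")
    if max_age is not None:
        return float(max_age)  # max-age overrides Expires
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0.0, (expires - date).total_seconds())
        except (TypeError, ValueError):
            return 0.0  # unparseable dates: treat the response as already expired
    return None
```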
min-fresh (request)
The client wants a response that will still be fresh for at least min-fresh seconds.
max-stale (request)
The client is willing to accept a stale (expired) response that has exceeded its expiration time by up to max-stale seconds. The cache must attach a Warning header with warning code 110 (Response is stale) to the stale response.
no-transform (request/response)
A cache or proxy MUST NOT change any aspect of the entity-body that is specified by headers such as Content-Encoding, Content-Range, and Content-Type, including the value of the entity-body itself.
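These request directives can be combined with the current age and freshness lifetime of a stored response, giving a check like the one below. It is a sketch under stated assumptions. acceptable_to_client is a hypothetical name, and request_directives is assumed to hold lowercased directive names. The sketch does not model must-revalidate, which overrides max-stale. Nor does it add the Warning 110 header that a cache must attach when serving a stale response.

```python
from typing import Dict, Optional


def acceptable_to_client(
    current_age: float,
    lifetime: float,
    request_directives: Dict[str, Optional[str]],
) -> bool:
    """Does a cached response satisfy the request's max-age, min-fresh and max-stale?"""
    max_age = request_directives.get("max-age")
    if max_age is not None and current_age > float(max_age):
        return False  # the client will not accept a response this old
    min_fresh = request_directives.get("min-fresh")
    if min_fresh is not None and lifetime - current_age < float(min_fresh):
        return False  # must stay fresh for at least min-fresh more seconds
    staleness = current_age - lifetime
    if staleness <= 0:
        return True  # still fresh
    if "max-stale" in request_directives:
        limit = request_directives["max-stale"]
        # max-stale without a value accepts a response of any staleness.
        return limit is None or staleness <= float(limit)
    return False  # stale, and the client did not ask for stale responses
```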
must-revalidate (response)
When this directive is present in a response, the cache must not use the entry after it becomes stale (according to its Expires header or max-age value) without first doing an end-to-end revalidation with the origin server. It overrides settings such as max-stale, or anything else that would let the cache ignore the expiration time.
proxy-revalidate (response)
Same as must-revalidate, but applies only to shared (proxy) caches, not to non-shared user agent caches.
only-if-cached (request)
The client wants the cache to return only responses that it currently has stored, without reloading or revalidating with the origin server. If no suitable response is cached, the cache responds with a 504 (Gateway Timeout) status.
Cacheability
Cacheable Response Codes
Code  Description                    Explanation
200   OK                             Success.
203   Non-Authoritative Information  Same as 200, but the sender has reason to believe that the entity headers differ from those the origin server would send.
206   Partial Content                Similar to 200, but a response to a "range" request. Cacheable if the cache supports range requests.
300   Multiple Choices               The response includes choices from which the user can make a selection.
301   Moved Permanently              The new URL is in the response headers (Location).
410   Gone                           The requested resource is no longer available at the origin server, and no forwarding address is known.
Cacheable Request Methods
Method   Cacheable?
GET      Yes, by default.
POST     Not cacheable by default; cacheable if Cache-Control (or Expires) headers allow it.
HEAD     The response may be used to update a previously cached entry.
PUT      No.
DELETE   No.
OPTIONS  No.
TRACE    No.
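The two tables, together with the no-store and private directives, suggest a rough storability check. The sketch below is a simplification under several assumptions. may_store is a hypothetical name. 206 is treated as storable, which assumes the cache supports range requests. HEAD responses are never stored on their own, because they only update existing entries. Authorization and the public directive are not modeled.

```python
from typing import Dict, Optional

# Status codes from the table above; 206 assumes the cache supports range requests.
CACHEABLE_STATUS = {200, 203, 206, 300, 301, 410}


def may_store(
    method: str,
    status: int,
    directives: Dict[str, Optional[str]],  # lowercased Cache-Control directives
    shared_cache: bool,
    has_expires: bool = False,
) -> bool:
    """Rough check of whether a cache may store this response."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False  # private responses are for the user's own cache only
    method = method.upper()
    if method == "GET":
        return status in CACHEABLE_STATUS
    if method == "POST":
        # POST responses need explicit freshness information from the server.
        explicit = "max-age" in directives or "s-maxage" in directives or has_expires
        return status in CACHEABLE_STATUS and explicit
    return False  # HEAD, PUT, DELETE, OPTIONS, TRACE: not stored as new entries
```

The shared_cache flag is what makes Scenario 5 below work: the same private response is rejected by a proxy but accepted by the browser.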
Caching Scenarios
Scenario 1: Browser Caches a Response
Client Caches a Response
Client request:
GET /foo/index.html HTTP/1.1
Host:
Server response:
HTTP/1.1 200 OK
Cache-Control: max-age=60
Content-Length: 3688
Content-Type: text/html
The client stores a copy of the response on its hard disk.
Client Cache Hit
Client request:
GET /foo/index.html HTTP/1.1
Host:
The client receives the copy from its own hard disk. The request does not need to go over the Internet to the server.
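Cached Entry Expires
Client request:
GET /foo/index.html HTTP/1.1
Host:
Server response:
HTTP/1.1 200 OK
Cache-Control: max-age=60
Content-Length: 3688
Content-Type: text/html
Once the 60 seconds allowed by max-age have passed, the entry is stale. The original response carried no validator (no Last-Modified or ETag), so the cache cannot revalidate it. It does an end-to-end reload instead and stores the new copy on its hard disk.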
Scenario 2: No Expiry in the Response
The client uses the heuristic model to calculate the expiry time.
Client Caches a Response
Client request:
GET /foo/index.html HTTP/1.1
Host:
Server response:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul :52:37 GMT
Content-Length: 3688
Content-Type: text/html
The client stores a copy of the response on its hard disk and calculates its expiry using the heuristic model.
Revalidation on Expiry (Revalidate Hit)
Client request:
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul :52:37 GMT
Server response:
HTTP/1.1 304 Not Modified
The response is revalidated: the cache refreshes the cache entry and serves the response from the local cache.
Revalidation on Expiry (Revalidate Miss)
Client request:
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul :52:37 GMT
Server response:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul :55:37 GMT
Content-Length: 3688
Content-Type: text/html
The resource has changed since the cached copy was stored, so the server returns the full response. The cache replaces its entry with the new copy and serves it to the client.
Scenario 3: Revalidation with ETags
Client Caches a Response
Client request:
GET /foo/index.html HTTP/1.1
Host:
Date: Fri, 23 Jul :50:37 GMT
Server response:
HTTP/1.1 200 OK
Date: Fri, 23 Jul :50:57 GMT
Expires: Fri, 23 Jul :51:37 GMT
ETag: "CZSSOOOO1000"
Content-Length: 3688
Content-Type: text/html
The client stores a copy of the response on its hard disk.
Revalidation on Expiry (Revalidate Hit)
Client request:
GET /foo/index.html HTTP/1.1
Host:
Date: Fri, 23 Jul :52:37 GMT
If-None-Match: "CZSSOOOO1000"
Server response:
HTTP/1.1 304 Not Modified
Date: Fri, 23 Jul :52:40 GMT
The response is revalidated: the cache refreshes the cache entry and serves the response from the local cache.
Scenario 4: Caching a Response in a Proxy Cache
Proxy Cache Miss
Client to proxy:
GET /foo/index.html HTTP/1.1
Host:
Proxy to server (the proxy has no copy, so it forwards the request):
GET /foo/index.html HTTP/1.1
Host:
Server to proxy:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
Proxy to client (the proxy stores a copy and adds an Age header):
HTTP/1.1 200 OK
Age: 0
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
Proxy Cache Hit
Client to proxy:
GET /foo/index.html HTTP/1.1
Host:
Proxy to client (served from the proxy cache without contacting the server):
HTTP/1.1 200 OK
Age: 60
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
Scenario 5: Proxy Ignores a Private Response
Proxy Ignores a Private Response
Client to proxy:
GET /foo/index.html HTTP/1.1
Host:
Proxy to server:
GET /foo/index.html HTTP/1.1
Host:
Server to proxy:
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Cache-Control: private
Content-Length: 3688
Content-Type: text/html
Proxy to client:
HTTP/1.1 200 OK
Cache-Control: private
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
The proxy does not store a copy of the response, but the client can store a copy in its private cache.
Queries?
Thanks!