HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004.

Presentation transcript:

HTTP Caching & Cache-Busting for Content Publishers Michael J. Radwin O’Reilly Open Source Convention July 28, 2004

2 Publishers must think about caching
Publishers have a lot of web content
–HTML, images, Flash, movies
Speed is an important part of user experience
Bandwidth is expensive
–Use what you need, but avoid unnecessary extra
Personalization differentiates
–Show timely data (stock quotes, news stories)
–Get accurate advertising statistics
–Protect sensitive info (account balances)

3 HTTP Review
[Diagram: Client, Internet, Server]
(1) Client connects to port 80
(2) Client sends HTTP GET request

4 HTTP Review (cont’d)
[Diagram: Client, Internet, Server]
(3) Client reads HTTP response from server
(4) Client and Server close connection
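The four steps on slides 3 and 4 can be sketched with Python's standard socket module. This is only an illustration: the function names are hypothetical, and any host you pass in is your own choice rather than something taken from the slides.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request that asks the server to close afterwards."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Walk through the four steps: connect, send GET, read response, close."""
    # (1) Client connects to port 80
    with socket.create_connection((host, port), timeout=10) as sock:
        # (2) Client sends HTTP GET request
        sock.sendall(build_get_request(host, path))
        # (3) Client reads HTTP response until the server stops sending
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # (4) Leaving the with-block closes the client side of the connection
    return b"".join(chunks)
```

Calling fetch() needs network access; build_get_request() can be inspected on its own to see exactly what goes over the wire.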

5 HTTP Example
telnet 80
Trying
Connected to w6.example.com.
Escape character is '^]'.
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Date: Wed, 28 Jul :36:12 GMT
Last-Modified: Fri, 23 Jul :52:37 GMT
Content-Length: 3688
Connection: close
Content-Type: text/html
Hello World...

6 Browsers use private caches
[Diagram: Client, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul :52:37 GMT
Content-Length: 3688
Content-Type: text/html
Client stores a copy of the response on its hard disk with a timestamp.

7 Revalidation (Conditional GET)
[Diagram: Client, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul :52:37 GMT
HTTP/ Not Modified
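A minimal sketch of the server-side decision behind a conditional GET, assuming the server compares its Last-Modified timestamp with the client's If-Modified-Since header. The function name is hypothetical and the dates in the usage lines are illustrative values, not the ones from the slide.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 when the client's cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # plain GET, no validator supplied
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the full body
    server_time = parsedate_to_datetime(last_modified)
    return 304 if server_time <= client_time else 200


# Unchanged since the client's copy was stored: a short 304 is enough
print(conditional_get_status("Fri, 23 Jul 2004 01:52:37 GMT",
                             "Fri, 23 Jul 2004 01:52:37 GMT"))
# Modified after the client's copy was stored: send the full 200 response
print(conditional_get_status("Sat, 24 Jul 2004 09:00:00 GMT",
                             "Fri, 23 Jul 2004 01:52:37 GMT"))
```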

8 Non-Caching Proxy
[Diagram: Client, Proxy, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html

9 Proxy Cache Miss
[Diagram: Client, Proxy, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html

10 Proxy Cache Hit
[Diagram: Client, Proxy, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html

11 Proxy Cache Revalidation Hit
[Diagram: Client, Proxy, Internet, Server]
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul...
HTTP/ Not Modified
GET /foo/index.html HTTP/1.1
Host:
HTTP/ OK
Last-Modified: Fri, 23 Jul...
Content-Length: 3688
Content-Type: text/html
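The miss, hit, and revalidation-hit cases on slides 9 to 11 can be simulated with a toy proxy. Everything here is an assumption made for illustration: the class names, the single-resource origin, integer timestamps, and the fixed freshness lifetime (ttl) are not from the slides, and real proxies derive freshness from response headers.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: str
    last_modified: int  # origin timestamp of the cached copy
    fresh_until: int    # after this time the proxy must revalidate


class Origin:
    """Toy origin server that holds a single resource."""

    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified
        self.requests = []  # status codes sent, for inspection

    def get(self, if_modified_since=None):
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            self.requests.append("304")
            return 304, None, self.last_modified
        self.requests.append("200")
        return 200, self.body, self.last_modified


class ProxyCache:
    """Toy shared cache for one URL with a fixed freshness lifetime."""

    def __init__(self, origin, ttl):
        self.origin = origin
        self.ttl = ttl
        self.entry = None

    def get(self, now):
        if self.entry is None:
            # Cache miss: fetch from origin and store a copy
            _, body, modified = self.origin.get()
            self.entry = Entry(body, modified, now + self.ttl)
            return "miss", body
        if now < self.entry.fresh_until:
            # Cache hit: origin is not contacted at all
            return "hit", self.entry.body
        # Stale copy: ask origin with If-Modified-Since
        status, body, modified = self.origin.get(if_modified_since=self.entry.last_modified)
        if status == 304:
            self.entry.fresh_until = now + self.ttl
            return "revalidated", self.entry.body
        self.entry = Entry(body, modified, now + self.ttl)
        return "refetched", body
```

In this sketch only the "miss" and "refetched" outcomes transfer a full body from the origin; "revalidated" costs one short 304 exchange and "hit" costs nothing upstream.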

12 Assumptions about content types
[Chart: content types arranged by rate of change once published (Frequently, Occasionally, Rarely/Never) and by audience (Personalized, Same for everyone). Labels: HTML, CSS, JavaScript, Images, Flash, PDF, Dynamic Content, Static Content]

13 Top 5 techniques for publishers
1. Use “Cache-Control: private” for personalized content
2. Implement “Images Never Expire” policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for CSS & JavaScript
5. Use random strings in URL for accurate hit metering or very sensitive content

14 1. Use “Cache-Control: private” for personalized content
[Chart from slide 12, repeated]

15 Bad caching of personalized content
[Diagram: Client 1, Proxy, Internet, Webmail Server]
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane
Jane’s message
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane
Jane’s message

16 Bad caching of personalized content
[Diagram: Client 1, Proxy, Internet, Webmail Server; cached copy of msg3.html]
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=jane
Jane’s message

17 Bad caching of personalized content
[Diagram: Client 2, Proxy, Internet, Webmail Server; cached copy of msg3.html]
GET /msg3.html HTTP/1.1
Host: webmail.example.com
Cookie: user=mary
Jane’s message

18 What’s cacheable?
HTTP/1.1 allows caching anything by default
–Unless explicit Cache-Control header
In practice, most caches avoid anything with
–Cache-Control/Pragma header
–Cookie/Set-Cookie headers
–WWW-Authenticate/Authorization header
–POST/PUT method
–302/307 status code
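The "in practice" list above can be written down as a deliberately literal heuristic. This is a sketch of the slide's rule of thumb, not of any particular proxy: the function name is hypothetical, and a real cache would inspect the directive values (for example "public" or "max-age") rather than reject on the mere presence of a header.

```python
BLOCKING_HEADERS = {
    "cache-control", "pragma",
    "cookie", "set-cookie",
    "www-authenticate", "authorization",
}


def likely_cached_by_shared_proxy(method, status, request_headers, response_headers):
    """Conservative guess at whether a typical shared cache would store the response."""
    if method.upper() in {"POST", "PUT"}:
        return False
    if status in {302, 307}:
        return False
    # Header names are case-insensitive, so compare in lower case
    names = {name.lower() for name in request_headers}
    names |= {name.lower() for name in response_headers}
    return not (names & BLOCKING_HEADERS)


plain = likely_cached_by_shared_proxy(
    "GET", 200,
    {"Host": "static.example.net"},
    {"Last-Modified": "Fri, 23 Jul 2004 01:52:37 GMT", "Content-Length": "3688"},
)
with_cookie = likely_cached_by_shared_proxy(
    "GET", 200,
    {"Host": "webmail.example.com", "Cookie": "user=jane"},
    {"Content-Length": "3688"},
)
print(plain, with_cookie)
```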

19 Cache-Control: private
Shared caches bad for personalized content
–Mary shouldn’t be able to read Jane’s webmail
Private caches perfectly OK
–Speed up web browsing experience
Avoid personalization leakage with single line in httpd.conf or .htaccess
Header set Cache-Control private
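The Jane and Mary scenario from slides 15 to 17 can be replayed against a toy shared cache to show what the "private" directive changes. The cache is keyed by URL only, which is the assumption that causes the leak; the class and function names are hypothetical.

```python
class SharedCache:
    """Toy shared proxy cache keyed by URL only (cookies are not part of the key)."""

    def __init__(self):
        self.store = {}

    def fetch(self, url, user, origin):
        if url in self.store:
            return self.store[url]  # served to whoever asks, regardless of user
        body, cache_control = origin(url, user)
        directives = {d.strip().lower() for d in cache_control.split(",")}
        if "private" not in directives:
            self.store[url] = body  # only non-private responses are stored
        return body


def make_webmail(cache_control):
    """Hypothetical origin that personalizes the body and sends the given Cache-Control."""
    def origin(url, user):
        return f"{user}'s message", cache_control
    return origin


leaky = SharedCache()
print(leaky.fetch("/msg3.html", "jane", make_webmail("")))
print(leaky.fetch("/msg3.html", "mary", make_webmail("")))         # Mary sees Jane's copy

safe = SharedCache()
print(safe.fetch("/msg3.html", "jane", make_webmail("private")))
print(safe.fetch("/msg3.html", "mary", make_webmail("private")))   # Mary gets her own
```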

20 2. “Images Never Expire” policy
[Chart from slide 12, repeated]

21 The “Images Never Expire” Policy
Encourage caching of icons & logos
–Forever ≈ 10 years in Internet biz
Must change URL when you change image
Tradeoff
–More difficult for designers
–Bandwidth savings, faster user experience
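One way to satisfy "must change URL when you change image" is to derive part of the file name from the image bytes. This naming scheme is an assumption for illustration (the slide's own URL examples did not survive in the transcript), and the function name is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension, e.g. logo.<hash>.gif."""
    p = PurePosixPath(path)
    digest = hashlib.sha1(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_url("/i/logo.gif", b"old image bytes")
new = versioned_url("/i/logo.gif", b"new image bytes")
print(old)
print(new)  # a different URL, so caches holding the old image are never consulted
```

Because the URL changes only when the bytes change, the old URL can safely be cached "forever" while pages that reference the new URL pick up the new image immediately.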

22 Images Never Expire (mod_expires)
# Works with both HTTP/1.0 and HTTP/1.1
ExpiresActive On
ExpiresByType image/gif A
ExpiresByType image/jpeg A
ExpiresByType image/png A

23 Images Never Expire (mod_headers)
# Works with HTTP/1.1 only
Header set Cache-Control \
"max-age= "
# Works with both HTTP/1.0 and HTTP/1.1
Header set Expires \
"Mon, 28 Jul :30:00 GMT"
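The numeric values in the two slides above were lost in the transcript. As a worked example, and assuming "forever" means the ten years mentioned on slide 21 (approximated here as 3650 days), the header values could be computed like this. The function name and the start time are illustrative.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

TEN_YEARS = timedelta(days=3650)  # assumption: "forever" approximated as 3650 days


def never_expire_headers(now: datetime) -> dict:
    """Return far-future caching headers for both HTTP/1.1 and HTTP/1.0 caches."""
    max_age = int(TEN_YEARS.total_seconds())
    return {
        # HTTP/1.1: lifetime relative to the time of the response
        "Cache-Control": f"max-age={max_age}",
        # HTTP/1.0: absolute expiry date in HTTP date format
        "Expires": format_datetime(now + TEN_YEARS, usegmt=True),
    }


headers = never_expire_headers(datetime(2004, 7, 28, 20, 30, 0, tzinfo=timezone.utc))
print(headers["Cache-Control"])  # max-age=315360000
print(headers["Expires"])        # Sat, 26 Jul 2014 20:30:00 GMT
```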

24 mod_images_never_expire
/* Enforce policy with module that runs at URI translation hook */
static int translate_imgexpire(request_rec *r)
{
    const char *ext;

    /* Look at the extension after the last '.' in the URI */
    if ((ext = strrchr(r->uri, '.')) != NULL) {
        if (strcasecmp(ext, ".gif") == 0 || strcasecmp(ext, ".jpg") == 0 ||
            strcasecmp(ext, ".png") == 0 || strcasecmp(ext, ".jpeg") == 0) {
            /* Any conditional request for an image means the client has a copy */
            if (ap_table_get(r->headers_in, "If-Modified-Since") != NULL ||
                ap_table_get(r->headers_in, "If-None-Match") != NULL) {
                /* Don't bother checking filesystem, just hand back a 304 */
                return HTTP_NOT_MODIFIED;
            }
        }
    }
    /* Everything else goes through normal request processing */
    return DECLINED;
}

25 3. Cookie-free TLD for static content
[Chart from slide 12, repeated]

26 Cookie-free TLD for static content
For maximum efficiency use two domains
– for HTML
–static.example.net for images
Many proxies won’t cache Cookie reqs
–But: multimedia is never personalized
–Cookies would be ignored by server anyway

27 Typical GET request w/Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/ Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Cookie: U=mt=vtC1tp2MhYv9RL5BlpxYRFN_P8DpMJoamllEcA--&ux=IIr.AB&un=42vnticvufc8v; brandflash=1; B=amfco1503sgp8&b=2; F=a=NC184LcsvfX96G.JR27qSjCHu7bII3s. tXa44psMLliFtVoJB_m5wecWY_.7&b=K1It; LYC=l_v=2&l_lv=7&l_l=h03m8d50c8bo &l_s=3yu2qxz5zvwquwwuzv22wrwr5t3w1zsr&l_lid=14rsb76&l_r=a8&l_um=1_0_1_0_0; GTSessionID = ; Y=v=1&n=6eecgejj7012f &l=h03m8d50c8bo/o&p=m012o &jb=16|47|&r=a8&lg=us&intl=us&np=1; PROMO=SOURCE=fp5; YGCV=d=; T=z=iTu.ABiZD/AB6dPWoqXibIcTzc0BjY3TzI3NTY0MzQ- &a=YAE&sk=DAAwRz5HlDUN2T&d=c2wBT0RBekFURXdPRFV3TWpFek5ETS0BYQFZQUUBb2sBWlcwLQF 0aXABWUhaTVBBAXp6AWlUdS5BQmdXQQ--&af=QUFBQ0FDQURCOUFIQUJBQ0FEQUtBTE FNSDAmdHM9MTA5MDE4NDQxOCZwcz1lOG83MUVYcTYxOVouT2Ftc1ZFZUhBLS0-; LYS=l_fh=0&l_vo=myla; PA=p0=dg13DX4Ndgk-&p1=6L5qmg--&e=xMv.AB; YP.us=v=2&m=addr&d=1525+S+Robertson+Blvd%01Los+Angeles%01CA% %014480% % %019%01a%
Referer:
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

28 Same request, no Cookies
GET /i/foo/bar/quux.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/ Firefox/0.8
Accept: application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Referer:
Accept-Language: en-us,en;q=0.7,he;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO ,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Added bonus: much smaller GET request
–Dial-up MTU size 576 bytes, PPPoE 1492
–1450 bytes reduced to 550

29 4. Apache defaults for static, occasionally-changing content
[Chart from slide 12, repeated]

30 Revalidation works pretty well
Revalidation is default behavior for static content
–Browser sends If-Modified-Since request
–Server replies with short 304 Not Modified
–No fancy Apache config needed
Use if you can’t predict when content will change
–Page designers can change immediately
–No renaming necessary
Cost: extra HTTP transaction for 304
–Small with Keep-Alive, but large sites disable it

31 Techniques to encourage caching
Send explicit Cache-Control or Expires
Generate “static content” headers
–Last-Modified, ETag
–Content-Length
Avoid “cgi-bin”, “.cgi” or “?” in URLs
–Some proxies (e.g. Squid) won’t cache
–Use PATH_INFO instead

32 5. Random URL strings for accurate hit metering or very sensitive content
[Chart from slide 12, repeated]

33 Accurate advertising statistics
If you trust proxies
–Send Cache-Control: must-revalidate
–Count 304 Not Modified log entries as hits
If you don’t
–Ask client to fetch uncacheable image URL
–Return 307 to highly cacheable image file
–Count 307s as hits
–Don’t bother to look at cacheable server log

34 Hit-metering for advertisements (1)
var r = Math.random();
var t = new Date();
document.write(" ");

35 Hit-metering for advertisements (2)
GET /ad/foo/bar.gif?t= ;r= HTTP/1.1
Host: ads.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/ Firefox/0.8
Referer:
Cookie: uid=C50DF33E-E B1F3-946AEDF9308B
HTTP/ Temporary Redirect
Date: Wed, 28 Jul :45:06 GMT
Cache-Control: max-age=0,no-cache,no-store
Expires: Tue, 11 Oct 1977, 01:23:45 GMT
Pragma: no-cache
Location:
Content-Type: text/html
Moved

36 Hit-metering for advertisements (3)
GET /i/foo/bar.gif HTTP/1.1
Host: static.example.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/ Firefox/0.8
Referer:
HTTP/ OK
Date: Wed, 28 Jul :45:07 GMT
Last-Modified: Mon, 05 Oct :32:51 GMT
ETag: "69079e-ad cc8"
Cache-Control: public,max-age=
Expires: Mon, 28 Jul :45:07 GMT
Content-Length: 6096
Content-Type: image/gif
GIF89a...
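The three hit-metering slides describe a two-hop pattern: an uncacheable, unique URL that is counted, followed by a redirect to a highly cacheable image. A rough sketch of both halves follows. The function names are hypothetical, the "t" and "r" parameter names and the ";" separator follow the request line on slide 35, and the redirect target is assembled from the host and path on slide 36 with an assumed http scheme.

```python
import random
import time


def metered_ad_url(path, now_ms=None, rng=random):
    """Append a timestamp and a random number so no cache has seen this URL before."""
    t = int(time.time() * 1000) if now_ms is None else now_ms
    r = rng.random()
    return f"{path}?t={t};r={r}"


def redirect_to_cacheable(static_url):
    """Build an uncacheable 307 that points at the cacheable image."""
    headers = {
        "Cache-Control": "max-age=0,no-cache,no-store",
        "Pragma": "no-cache",  # for HTTP/1.0 caches
        "Location": static_url,
    }
    return 307, headers


print(metered_ad_url("/ad/foo/bar.gif"))
status, headers = redirect_to_cacheable("http://static.example.net/i/foo/bar.gif")
print(status, headers["Location"])
```

Each 307 in the ad server's log then corresponds to one impression, while the image bytes themselves can be served from any cache.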

37 Turning proxies into private caches
Use distinct tokens in URL
–No two users use same token
–Defeats shared proxy caches
–Works well with private caches
Doesn’t break the back button
May break visited-link highlighting
–e.g. JavaScript timestamps/random numbers
–Every link is blue, no purple
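A per-user token can be made stable across visits, so the browser's private cache still recognizes the URL while a shared proxy never sees two users with the same one. The sketch below derives the token with an HMAC; this derivation, the "u" parameter name, and the function name are assumptions for illustration, not the technique shown in the talk.

```python
import hashlib
import hmac


def user_scoped_url(path: str, user_id: str, secret: bytes) -> str:
    """Return a URL carrying a token that is unique per user but stable over time."""
    token = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    return f"{path}?u={token}"


secret = b"server-side secret"  # hypothetical key, kept on the server
jane_url = user_scoped_url("/msg3.html", "jane", secret)
mary_url = user_scoped_url("/msg3.html", "mary", secret)
print(jane_url)
print(mary_url)
```

A token derived this way stays the same on repeat visits, unlike timestamps or random numbers, which produce a new URL on every page view.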

38 Breaking the Back button
When users click browser Back button
–Expect to go back one page instantly
–Private cache enables this behavior
Aggressive cache-busting breaks Back button
–Server sends Pragma: no-cache or Expires in past
–Browser must re-visit server to re-fetch page
–Hitting network much slower than hitting disk
Use very sparingly
–Compromising user experience is A Bad Thing

39 Review: Top 5 techniques
1. Use “Cache-Control: private” for personalized content
2. Implement “Images Never Expire” policy
3. Use a cookie-free TLD for static content
4. Use Apache defaults for CSS & JavaScript
5. Use random strings in URL for accurate hit metering or very sensitive content