CS193H: High Performance Web Sites
Lecture 8: Rule 4 – Gzip Components
Steve Souders, Google
souders@cs.stanford.edu
Announcements
- The Web 100 Performance Profile (round 1) class project has been graded; contact Aravind if you want to know your grade.
Compression (encoding)

Request and response without compression:

GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 (…) Gecko/2008070208 Firefox/3.0.1

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 6230

function d(s) {...

Request and response with compression:

GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 (…) Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Mon, 22 Sep 2008 21:14:35 GMT
Content-Length: 2066
Content-Encoding: gzip

XmoÛHþ\ÿFÖvã*wØoq... (binary gzip data)

Gzip typically reduces size by about 70%; here, (6230 - 2066)/6230 ≈ 67%.
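A minimal Python sketch of the exchange above. The server function `serve` and the helpers `accepts_gzip` and `percent_saved` are hypothetical; the Accept-Encoding parsing is deliberately simplified (it ignores q-values). The server gzips the body only when the request lists gzip, sets Content-Encoding and Content-Length accordingly, and the last line recomputes the savings figure from the slide.

```python
import gzip


def percent_saved(original_size: int, compressed_size: int) -> float:
    """Percentage reduction, e.g. (6230 - 2066) / 6230 * 100."""
    return (original_size - compressed_size) / original_size * 100


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check: split on commas and ignore q-values such as ';q=0'."""
    codings = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    return "gzip" in codings


def serve(body: bytes, request_headers: dict) -> tuple[dict, bytes]:
    """Hypothetical origin server: gzip the body only if the client allows it."""
    headers = {"Content-Type": "application/x-javascript"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


script = b"function d(s) { return document.getElementById(s); }\n" * 100
plain_headers, plain_body = serve(script, {})
gz_headers, gz_body = serve(script, {"Accept-Encoding": "gzip,deflate"})
print(plain_headers["Content-Length"], gz_headers["Content-Length"])
print(f"{percent_saved(6230, 2066):.0f}%")  # 67%
```

The client undoes the encoding, so `gzip.decompress(gz_body)` gives back the original script bytes.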
Gzip vs. Deflate

                     Gzip            Deflate
             Size    Size  Savings   Size  Savings
Script       3.3K    1.1K    67%     1.1K    66%
Script      39.7K   14.5K    64%    16.6K    58%
Stylesheet   1.0K    0.4K    56%     0.5K    52%
Stylesheet  14.1K    3.7K    73%     4.7K

Gzip (with default settings) compresses more.
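Both encodings use the same DEFLATE algorithm. In HTTP/1.1 (RFC 2616), the `deflate` content-coding is DEFLATE data inside the zlib wrapper (RFC 1950), while `gzip` uses the gzip wrapper (RFC 1952). The Python sketch below compresses the same synthetic input both ways at the same level (level 6 is an arbitrary choice). With identical settings the gzip wrapper comes out a few bytes larger, which suggests the gaps in the table come from the servers' default settings rather than from the formats themselves.

```python
import gzip
import zlib

# Synthetic input standing in for a script; any repetitive text works.
data = b"".join(b"var item%d = document.getElementById('item%d');\n" % (i, i)
                for i in range(500))

level = 6  # same compression level for both encodings
gzip_body = gzip.compress(data, compresslevel=level)  # gzip wrapper
deflate_body = zlib.compress(data, level)             # zlib wrapper

for name, body in (("gzip", gzip_body), ("deflate", deflate_body)):
    saved = (len(data) - len(body)) / len(data) * 100
    print(f"{name:8} {len(body):6} bytes  {saved:.0f}% saved")
```

Both bodies decode back to the original bytes with `gzip.decompress` and `zlib.decompress` respectively.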
Pros and Cons
- Pro: smaller transfer size
- Con: CPU cycles on both the client and the server
- Don't compress resources smaller than 1K
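A rough illustration of why tiny resources are not worth compressing. Gzip adds a fixed header and trailer (at least 18 bytes), so a short input can come out larger than it went in, while the CPU cost is still paid. The helper `worth_gzipping` and its 1024-byte cutoff are hypothetical, mirroring the rule of thumb on the slide.

```python
import gzip

MIN_GZIP_SIZE = 1024  # rule of thumb from the slide: skip resources < 1K


def worth_gzipping(body: bytes) -> bool:
    """Hypothetical policy: only compress bodies of at least 1K."""
    return len(body) >= MIN_GZIP_SIZE


tiny = b"var a=1;"
print(len(tiny), len(gzip.compress(tiny)))      # gzipped copy is larger

larger = b"function d(s) { return document.getElementById(s); }\n" * 40
print(len(larger), len(gzip.compress(larger)))  # gzipped copy is much smaller
```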
Gzip configuration

Apache 1.3: mod_gzip
  mod_gzip_item_include file \.html$
  mod_gzip_item_include mime ^text/html$
  mod_gzip_item_include file \.js$
  mod_gzip_item_include mime ^application/x-javascript$
  mod_gzip_item_include file \.css$
  mod_gzip_item_include mime ^text/css$

Apache 2.x: mod_deflate
  AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
  Control the compression level with DeflateCompressionLevel.
  http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
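mod_deflate's DeflateCompressionLevel takes a value from 1 (fastest, least compression) to 9 (slowest, most compression). As a Python-only analogue (not Apache itself), the sketch below runs `zlib.compress` at each level on the same synthetic input to show the size-versus-CPU trade-off the directive controls. Sizes and timings will vary by machine and input.

```python
import time
import zlib

# Synthetic, moderately repetitive input standing in for a large script.
data = b"".join(b"function f%d(x) { return x * %d + %d; }\n" % (i, i % 7, i % 13)
                for i in range(5000))

results = {}  # level -> compressed size in bytes
for level in range(1, 10):
    start = time.perf_counter()
    size = len(zlib.compress(data, level))
    elapsed_ms = (time.perf_counter() - start) * 1000
    results[level] = size
    print(f"level {level}: {size:7} bytes, {elapsed_ms:.2f} ms")
```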
Gzip: not just for HTML
- Gzip scripts, stylesheets, XML, and JSON (not images, Flash, or PDF).
- Images and PDF files are already compressed. Gzipping them wastes CPU and can increase file sizes.

Survey of top sites gzipping HTML, scripts, and stylesheets:
- March 2007: amazon.com (x), aol.com (some), cnn.com, ebay.com, froogle.google.com, msn.com (deflate), myspace.com, wikipedia.org, yahoo.com, youtube.com
- October 2008: aol.com (x), ebay.com (some), facebook.com, google.com/search (na), search.live.com/results, msn.com, myspace.com, en.wikipedia.org/wiki, yahoo.com, youtube.com
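The guidance above can be written as a small Content-Type check. This is a sketch, not any server's actual logic. The function `should_gzip` is hypothetical, and the MIME names in the list are assumptions chosen to match the slide: text, scripts, stylesheets, XML, and JSON are compressed; images, Flash, and PDF are not.

```python
# Text-based types to compress; the exact MIME names are assumptions.
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/css",
    "text/xml",
    "application/xml",
    "application/json",
    "application/javascript",
    "application/x-javascript",
}


def should_gzip(content_type: str) -> bool:
    """True if a response with this Content-Type is worth gzipping."""
    mime = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    # Anything not listed (images, Flash, PDF, ...) is skipped: those
    # formats are already compressed, so gzip would only burn CPU.
    return mime in COMPRESSIBLE_TYPES


for ct in ("text/html; charset=UTF-8", "application/json",
           "image/png", "application/x-shockwave-flash", "application/pdf"):
    print(ct, should_gzip(ct))
```

An allowlist is used rather than a denylist so that unknown binary types are left alone by default.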
Edge Case: Proxies
1. Browser A → Proxy: GET main.js, Accept-Encoding: gzip
2. Proxy → Origin Server: GET main.js, Accept-Encoding: gzip
3. Origin Server → Proxy: main.js, Content-Encoding: gzip
4. Proxy caches main.js, Content-Encoding: gzip
5. Proxy → Browser A: main.js, Content-Encoding: gzip
6. Browser B → Proxy: GET main.js (no Accept-Encoding)
7. Proxy → Browser B: main.js, Content-Encoding: gzip

Proxies may serve gzipped content to browsers that don't support it, and vice versa.
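A toy simulation of this problem, assuming a shared cache keyed only by URL. The names `origin`, `Response`, and `NaiveProxy` are hypothetical. The first client's gzipped response is cached and then handed to a second client that never asked for gzip.

```python
import gzip
from typing import NamedTuple

BODY = b"function d(s) { return document.getElementById(s); }\n" * 50


class Response(NamedTuple):
    headers: dict
    body: bytes


def origin(url: str, request_headers: dict) -> Response:
    """Origin server that gzips when allowed but sends no Vary header."""
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        return Response({"Content-Encoding": "gzip"}, gzip.compress(BODY))
    return Response({}, BODY)


class NaiveProxy:
    """Shared cache that uses the URL as its only cache key."""

    def __init__(self) -> None:
        self.cache: dict[str, Response] = {}

    def get(self, url: str, request_headers: dict) -> Response:
        if url not in self.cache:  # miss: forward the request and store it
            self.cache[url] = origin(url, request_headers)
        return self.cache[url]     # hit: reused whatever the client accepts


proxy = NaiveProxy()
first = proxy.get("/main.js", {"Accept-Encoding": "gzip"})
second = proxy.get("/main.js", {})  # this browser does not support gzip
print(second.headers.get("Content-Encoding"))  # gzip
```

Swapping the order of the two requests produces the "vice versa" case: a gzip-capable browser receives the uncompressed copy.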
Edge Case: Proxies w/ Vary
1. Browser → Proxy: GET main.js, Accept-Encoding: gzip
2. Proxy → Origin Server: GET main.js, Accept-Encoding: gzip
3. Origin Server → Proxy: main.js, Content-Encoding: gzip, Vary: Accept-Encoding
4. Proxy caches main.js, Content-Encoding: gzip [Accept-Encoding: gzip]
5. Proxy → Browser: main.js, Content-Encoding: gzip
6. Browser → Proxy: GET main.js (no Accept-Encoding)
7. Proxy → Origin Server: GET main.js (no Accept-Encoding)
8. Origin Server → Proxy: main.js, Vary: Accept-Encoding
9. Proxy caches main.js [Accept-Encoding: ]
10. Proxy → Browser: main.js (no gzip)
11. Browser → Proxy: GET main.js, Accept-Encoding: gzip
12. Proxy → Browser: main.js, Content-Encoding: gzip
13. Browser → Proxy: GET main.js (no Accept-Encoding)
14. Proxy → Browser: main.js (no gzip)

Fix: add Vary: Accept-Encoding.
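With Vary: Accept-Encoding, a cache stores the request's Accept-Encoding value alongside each response (the bracketed values in the diagram) and reuses a response only for requests with a matching value. The sketch below models this with a hypothetical `cache_key` helper and a simplified `VaryAwareProxy` class (no expiry, no handling of "Vary: *"). It then replays the four browser requests from the diagram.

```python
import gzip

BODY = b"function d(s) { return document.getElementById(s); }\n" * 50


def origin(url: str, request_headers: dict) -> tuple[dict, bytes]:
    """Origin server that gzips when allowed and always sends Vary."""
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(BODY)
    return headers, BODY


def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """URL plus the request's value for each header named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


class VaryAwareProxy:
    """Simplified shared cache that honors Vary."""

    def __init__(self, fetch) -> None:
        self.fetch = fetch                 # callable(url, headers) -> (headers, body)
        self.vary: dict[str, str] = {}     # last Vary header seen per URL
        self.cache: dict[tuple, tuple[dict, bytes]] = {}
        self.origin_requests = 0

    def get(self, url: str, request_headers: dict) -> tuple[dict, bytes]:
        key = cache_key(url, request_headers, self.vary.get(url, ""))
        if key not in self.cache:
            self.origin_requests += 1
            response = self.fetch(url, request_headers)
            self.vary[url] = response[0].get("Vary", "")
            key = cache_key(url, request_headers, self.vary[url])
            self.cache[key] = response
        return self.cache[key]


proxy = VaryAwareProxy(origin)
seen = []
for accept in ("gzip", "", "gzip", ""):  # requests 1, 6, 11 and 13
    request = {"Accept-Encoding": accept} if accept else {}
    headers, _ = proxy.get("/main.js", request)
    seen.append(headers.get("Content-Encoding"))
print(seen)                   # ['gzip', None, 'gzip', None]
print(proxy.origin_requests)  # 2
```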
Edge Case: Bad Browsers
- Less than 1% of browsers have problems with gzip:
  IE 5.5: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q313712
  IE 6.0: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q31249
  Netscape 3.x, 4.x: http://www.schroepl.net/projekte/mod_gzip/browser.htm
- Use a User-Agent whitelist for gzip.
  Apache 1.3:
    mod_gzip_item_include reqheader "User-Agent: MSIE [6-9]"
    mod_gzip_item_include reqheader "User-Agent: Mozilla/[5-9]"
  Apache 2.0 (quote regexes that contain spaces; IE's User-Agent string starts with "Mozilla/4.0 (compatible; MSIE ...", so the MSIE pattern cannot be anchored with ^):
    BrowserMatch "MSIE [6-9]" gzip
    BrowserMatch "^Mozilla/[5-9]" gzip
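A sketch of the User-Agent whitelist idea in Python. The patterns are adapted from the slide. Because Internet Explorer's User-Agent string begins with "Mozilla/4.0 (compatible; MSIE ...", the MSIE pattern is searched anywhere in the string instead of being anchored at the start. The helper `gzip_allowed` and the sample User-Agent strings are illustrative assumptions.

```python
import re

# Whitelist adapted from the slide's patterns.
GZIP_WHITELIST = [
    re.compile(r"MSIE [6-9]"),      # IE 6 and later (appears mid-string)
    re.compile(r"^Mozilla/[5-9]"),  # Mozilla/5-era browsers such as Firefox
]


def gzip_allowed(user_agent: str) -> bool:
    """True if the User-Agent matches any whitelisted pattern."""
    return any(pattern.search(user_agent) for pattern in GZIP_WHITELIST)


# Illustrative User-Agent strings, not taken from real logs.
samples = {
    "ie6": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "ie55": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "firefox3": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0",
    "netscape4": "Mozilla/4.78 [en] (Windows NT 5.0; U)",
}
for name, ua in samples.items():
    print(name, gzip_allowed(ua))
```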
Edge Case: Bad Browsers (cont'd)
- Proxies could mix up responses, giving a cached response for one user agent to another.
- You could add Vary: User-Agent, but there are so many possible User-Agent values that it defeats proxy caching.
- It is better to add Cache-Control: private; the downside is that it disables all proxy caching.
- Is it a serious problem? It is hard to diagnose, and the problem is getting smaller.
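To see why Vary: User-Agent defeats proxy caching, the sketch below counts how many distinct cache entries a shared cache would need for one URL under each Vary choice. The request log is made up for illustration, and `entries_needed` and `hit_rate` are hypothetical helpers. The hit-rate estimate assumes every request after the first for a given key is a cache hit.

```python
# Hypothetical request log for one URL: (User-Agent, Accept-Encoding) pairs.
# Note the raw Accept-Encoding values differ slightly between browsers.
requests = [
    ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.0", "gzip,deflate"),
    ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-us) Safari/525.20", "gzip, deflate"),
    ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)", "gzip, deflate"),
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", "gzip, deflate"),
    ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", "gzip, deflate"),
    ("Mozilla/4.78 [en] (Windows NT 5.0; U)", ""),
]


def entries_needed(vary_on: str) -> int:
    """Distinct cache entries if the cache varies on the given header."""
    index = {"User-Agent": 0, "Accept-Encoding": 1}[vary_on]
    return len({request[index] for request in requests})


def hit_rate(vary_on: str) -> float:
    """Fraction of requests served from cache (first request per key misses)."""
    return 1 - entries_needed(vary_on) / len(requests)


for header in ("Accept-Encoding", "User-Agent"):
    print(header, entries_needed(header), f"{hit_rate(header):.0%}")
```

Every distinct User-Agent gets its own entry, so in this log nothing is ever served from cache when varying on User-Agent.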
Edge Case: ETags
What happens when a proxy makes Conditional GET requests?
- The Last-Modified date differs for the gzipped and ungzipped versions, so If-Modified-Since works fine.
- In Apache, the ETag is the same for the gzipped and ungzipped versions, so If-None-Match succeeds and the proxy could give the browser mismatched content.
- Remove ETags! (Rule 13)
http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
Edge Case: ETags present
1. Browser → Proxy: GET main.js, Accept-Encoding: gzip
2. Proxy → Origin Server: GET main.js, Accept-Encoding: gzip
3. Origin Server → Proxy: main.js, Content-Encoding: gzip, Cache-Control: max-age=0, ETag: "de158-e58-c7ee4140"
4. Proxy caches main.js, Content-Encoding: gzip, Cache-Control: max-age=0, ETag: "de158-e58-c7ee4140"
5. Proxy → Browser: main.js, Content-Encoding: gzip
6. Browser → Proxy: GET main.js (no Accept-Encoding)
7. Proxy → Origin Server: GET main.js, If-None-Match: "de158-e58-c7ee4140"
8. Origin Server → Proxy: 304 Not Modified
9. Proxy → Browser: main.js, Content-Encoding: gzip

The proxy gives the browser mismatched content.
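A toy replay of the ETag scenario. It assumes, as the slide describes for Apache, that the origin issues the same ETag for the gzipped and ungzipped variants and answers If-None-Match with 304 whenever the tag matches. The proxy behavior is a simplified, hypothetical model: max-age=0 forces revalidation, and on a 304 the proxy serves whatever it had cached. The names `origin` and `Response` are illustrative.

```python
import gzip
from typing import NamedTuple

BODY = b"function d(s) { return document.getElementById(s); }\n" * 50
ETAG = '"de158-e58-c7ee4140"'  # from the diagram; shared by both variants


class Response(NamedTuple):
    status: int
    headers: dict
    body: bytes


def origin(request_headers: dict) -> Response:
    """Origin modeled on the slide: same ETag whether or not it gzips."""
    if request_headers.get("If-None-Match") == ETAG:
        return Response(304, {"ETag": ETAG}, b"")
    headers = {"ETag": ETAG, "Cache-Control": "max-age=0"}
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers["Content-Encoding"] = "gzip"
        return Response(200, headers, gzip.compress(BODY))
    return Response(200, headers, BODY)


# Steps 1-4: the proxy fetches and caches the gzipped variant.
cached = origin({"Accept-Encoding": "gzip"})
# Steps 6-7: a browser without gzip support asks for main.js; max-age=0 makes
# the proxy revalidate with a conditional GET carrying the cached ETag.
revalidation = origin({"If-None-Match": cached.headers["ETag"]})
# Steps 8-9: on 304 the proxy serves its cached (gzipped) copy.
served = cached if revalidation.status == 304 else revalidation
print(revalidation.status, served.headers.get("Content-Encoding"))  # 304 gzip
```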
Edge Case: ETags removed
1. Browser → Proxy: GET main.js, Accept-Encoding: gzip
2. Proxy → Origin Server: GET main.js, Accept-Encoding: gzip
3. Origin Server → Proxy: main.js, Content-Encoding: gzip, Cache-Control: max-age=0, Last-Modified: Thu, 21 Aug 2008 23:53:57 GMT
4. Proxy caches main.js, Content-Encoding: gzip, Cache-Control: max-age=0, Last-Modified: Thu, 21 Aug 2008 23:53:57 GMT
5. Proxy → Browser: main.js, Content-Encoding: gzip
6. Browser → Proxy: GET main.js (no Accept-Encoding)
7. Proxy → Origin Server: GET main.js, If-Modified-Since: Thu, 21 Aug 2008 23:53:57 GMT
8. Origin Server → Proxy: main.js, Cache-Control: max-age=0, Last-Modified: Fri, 22 Aug 2008 09:43:15 GMT
9. Proxy caches main.js, Cache-Control: max-age=0, Last-Modified: Fri, 22 Aug 2008 09:43:15 GMT
10. Proxy → Browser: main.js (no gzip)

Removing ETags avoids the problem.
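The same replay without ETags. As in the diagram, it assumes the two variants carry different Last-Modified dates. The origin (a hypothetical model) returns 304 only if the variant it would send has not changed since the If-Modified-Since date. HTTP dates are compared with the standard library's `email.utils.parsedate_to_datetime`.

```python
import gzip
from email.utils import parsedate_to_datetime

BODY = b"function d(s) { return document.getElementById(s); }\n" * 50
# Last-Modified dates taken from the diagram; they differ per variant.
GZIP_LAST_MODIFIED = "Thu, 21 Aug 2008 23:53:57 GMT"
PLAIN_LAST_MODIFIED = "Fri, 22 Aug 2008 09:43:15 GMT"


def origin(request_headers: dict) -> tuple[int, dict, bytes]:
    """Origin without ETags; each variant has its own Last-Modified date."""
    gz = "gzip" in request_headers.get("Accept-Encoding", "")
    last_modified = GZIP_LAST_MODIFIED if gz else PLAIN_LAST_MODIFIED
    ims = request_headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims):
        return 304, {"Last-Modified": last_modified}, b""
    headers = {"Last-Modified": last_modified, "Cache-Control": "max-age=0"}
    if gz:
        headers["Content-Encoding"] = "gzip"
        return 200, headers, gzip.compress(BODY)
    return 200, headers, BODY


# Steps 1-4: the proxy fetches and caches the gzipped variant.
_, cached_headers, _ = origin({"Accept-Encoding": "gzip"})
# Step 7: conditional GET on behalf of a browser without gzip support.
status, headers, body = origin({"If-Modified-Since": cached_headers["Last-Modified"]})
print(status, headers.get("Content-Encoding"))  # 200 None
```

Because the uncompressed variant is newer than the cached date, the origin sends a full, uncompressed 200 response instead of a 304.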
Edge Case Fixes
Survey of Vary: Accept-Encoding, Cache-Control: private, and ETag usage (October 2008, March 2007):
- aol.com x
- ebay.com x (IIS)
- facebook.com
- google.com/search
- search.live.com/results
- msn.com
- myspace.com x (Apache)
- en.wikipedia.org/wiki
- yahoo.com
- youtube.com some
Vary: User-Agent is not used.
Homework
- "Improving Top Site" class project: add improvements for Rule 4, measure the improvements using Hammerhead, and record the results in your personal Web 100 sheet.
- Read Chapter 5 of HPWS for 10/17.
Questions
- How much are file sizes typically reduced by using gzip compression?
- What types of resources (images, scripts, etc.) should not be compressed?
- For the resource types that should be compressed, should they always be compressed?
- How do you prevent proxies from serving gzipped resources to browsers that don't support gzip?
- How can ETags cause proxies to serve mismatched content to browsers?