Whole Page Performance
Leeann Bent and Geoffrey M. Voelker
University of California, San Diego
August 14, 2002, WWCCD '02

Whole Page Performance?
- Extensive previous work on how specific techniques affect individual object download.
  - Caching, prefetching, CDNs, DNS caching.
- However, users download pages of objects.
  - Not clear how individual object performance maps onto whole page performance.
- Goal: study whole page performance.
  - Extent to which different optimizations are used.
  - Effect on downloading whole pages of objects.

Related Work
- [Krishnamurthy and Wills 99] look at:
  - Parallel (HTTP/1.0), persistent, and pipelined connections.
    - In addition to caching, range requests, and content placed on different servers.
  - Top-level pages of popular sites.
  - Focus on pages where all optimizations are used.
- Our study:
  - Follow-on, with a different perspective.
  - Use real user workloads.
    - All pages, not just top-level pages on popular servers.
    - Not all pages use optimizations.
  - Base page + embedded objects.
  - Connection optimizations + CDNs + DNS.

Overview
- Introduction
- Methodology
- Results
- Conclusion

Methodology Overview
- Use Medusa to:
  - Record everyday browsing from six users over four days.
  - Replay traces toggling performance options:
    - Parallel connections
    - Using CDNs
    - Complete DNS caching
    - Persistent connections
- Compute download costs for whole pages.

The Medusa Proxy
[Figure: user-driven behavior and trace-driven behavior]

Page Download Time
- Page download time:
  - Time required to download base page and all embedded objects.
  - Reflects user-perceived web performance.
- Calculated using object download time.
  - Determine object download time from just after DNS lookup to connection close or full object return (persistent).
  - Incorporate original recorded DNS times where appropriate.

Example
[Figure: individual object times for Obj1 to Obj4 (download time and DNS time, in ms) and the resulting page download times for serial and parallel (2 connections) replay. Labels on the figure: 854 ms, 580 ms, 259 ms, 580 ms.]
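
To make the serial versus parallel distinction concrete, here is a minimal sketch of how a page download time could be derived from per-object times. The scheduling rule (objects are handed, in trace order, to whichever connection frees up first) and the object times are assumptions for illustration; they are not taken from the slides or from Medusa.

```python
import heapq

def serial_page_time(objects):
    """Page time when objects are fetched one after another.

    objects: list of (dns_ms, download_ms) pairs in trace order.
    """
    return sum(dns + dl for dns, dl in objects)

def parallel_page_time(objects, conns=2):
    """Page time with `conns` connections (assumed greedy scheduling).

    Each object goes to the connection that becomes free earliest.
    """
    free_at = [0] * conns          # time at which each connection is idle
    heapq.heapify(free_at)
    for dns, dl in objects:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + dns + dl)
    return max(free_at)            # the page is done when the last object is

# Hypothetical object times (DNS ms, download ms), not the slide's values.
objects = [(20, 100), (0, 200), (0, 50), (30, 150)]
print(serial_page_time(objects))        # 550
print(parallel_page_time(objects, 2))   # 350
```

With a single connection the parallel model degenerates to the serial sum, which is a quick sanity check on the scheduling logic.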

Traces
- Six users: April (Sat. - Tues.).
- Originally 22,228 objects and 1,455 pages.
  - Remove error pages.
- Replay data gathered May 6-7 (Mon. - Tues.) and June (Sat. - Thurs.).
  - Minimize warming effects by taking median of 5 consecutive page downloads.
[Table: users, requests, pages, average requests per page]
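
A small sketch of why the median of five consecutive downloads is a reasonable guard against warming effects: one slow first run (for example, cold caches) shifts the mean but not the median. The run times below are hypothetical.

```python
from statistics import mean, median

def representative_time(run_times_ms):
    """Summarize repeated downloads of one page by their median."""
    return median(run_times_ms)

# Hypothetical: the first of five replays is slow, the rest are warm.
runs = [912, 640, 655, 631, 648]
print(representative_time(runs))  # 648
print(mean(runs))                 # 697.2, pulled up by the first run
```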

Optimization Combinations
- Parallel Connections (1)
  - Medusa tracks number of concurrent connections used during trace.
  - Used to replay parallel download.
- CDN Usage (2)
  - When no CDN usage, remove CDN references.
    - Replace with references to origin servers.
  - When CDN usage enabled, traces left intact.
- DNS Caching (3)
  - Simulate ideal DNS caching by excluding DNS time.
  - Normal DNS: add original DNS lookup times from trace.
- Persistent Connections (4)
  - Use whichever protocol (1.0/1.1) was recorded in the original trace.
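
The DNS toggle described above can be sketched as follows: ideal caching drops the recorded DNS time, normal DNS adds it back, and the improvement is the relative reduction in page time. The function names and the sample times are hypothetical, and the page time here is a plain serial sum for simplicity.

```python
def page_time(objects, ideal_dns_cache):
    """Serial page time; objects is a list of (dns_ms, download_ms)."""
    return sum(dl + (0 if ideal_dns_cache else dns) for dns, dl in objects)

def improvement_pct(baseline_ms, optimized_ms):
    """Percentage reduction of optimized_ms relative to baseline_ms."""
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms

# Hypothetical per-object times, not values from the trace.
objects = [(20, 100), (0, 200), (0, 50), (30, 150)]
normal = page_time(objects, ideal_dns_cache=False)   # 550
ideal = page_time(objects, ideal_dns_cache=True)     # 500
print(round(improvement_pct(normal, ideal), 1))      # 9.1
```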

Overview
- Introduction
- Methodology
- Results
- Conclusion

Whole Page Optimizations
- DNS improvement consistent.
  - 7.4%
  - 6.7%
- Parallel gives large improvement.
- CDN improvement small.
  - 2.5%
- Persistent connections not as helpful as expected.
  - 1.5%

Overall Trace Conclusions
- Parallelism has the greatest effect.
  - Parallelism used aggressively on all pages.
- All other options provide incremental benefits.
  - Does not mean other optimizations don’t work.
  - Some overheads may be relatively small.
  - Average over all pages.
    - Not all pages implement all optimizations.
    - We don’t simulate more aggressive use of options than found in original trace.
- A closer look…

Ideal DNS Caching
- Average DNS costs:
  - Per object: 7.1 ms
  - Per page: 529 ms
- DNS improvement moderate across the board.
  - 5-14% improvement across all pages.
- Provides moderate benefit to all pages.
  - Not all objects require full DNS lookups.
  - Already effective DNS caching in traces.

Objects Per Page
- We would expect some other optimizations to have a greater effect (e.g., persistent connections).
  - Looking at all pages in trace doesn’t tell the whole story.
- Less opportunity for connection optimizations on small pages.
  - Page with one object counts as much as a page with 152 objects.
  - Optimizations more effective on a page with 152 objects.
- Separate out effects of optimizations in pages with different numbers of objects:
  - Median number of objects per page is 5.
  - Average number of objects per page is 15.

Page Breakdown
[Charts: 1-5 objects (1: 21%, 2-5: 63%); 6+ objects improvements (6-15: 157%, 16+: 183%); Persistent: 1.95%, 18.5%]

Page Breakdown Conclusions
- Performance optimizations dependent on number of objects per page.
  - Optimizations more effective when more objects per page.
  - Especially connection optimizations.
- Single object pages see moderate improvement.
  - Can usually only benefit from DNS caching and CDNs.
  - Persistent benefit only if on same server as previous page.
  - And 26% of pages had one object.

Persistent Connections
- Still don’t see a whole lot of improvement for persistent connections.
  - Expected to see more benefit for 16+ objects.
- Not all pages use persistent connections.
  - 20% of pages in our trace use them (229 pages).
    - 2211 objects or 16.1%.
    - 9.65 objects per page.
- Look at only pages that contain persistent connections.

Persistent Connections
- Persistent connections useful if:
  - Many objects downloaded over persistent connections in the original trace.
  - Objects downloaded from few servers.
- For pages < 6 objects:
  - 2 out of 3 downloaded with persistent connections.
    - Average page size 3.
  - On average, 1.32 persistent objects per server.
- For pages >= 16 objects:
  - Average 18 objects with persistent connections.
  - On average, 3.92 persistent objects per server.

Mostly Persistent Pages
- Pages that can benefit, do:
- 6+ objects improve 33-50%.

  Objects per Page | Pages (% persistent pages) | Method     | Mean (ms) | Improvement (%)
                   | (56%)                      | serial     | 4000      |
                   |                            | persistent |           |
                   | (42%)                      | serial     | 6180      |
                   |                            | persistent |           |

- Know what it takes to see persistent optimization improvement: look at large pages where persistent connections are used extensively (>50% of objects).

CDN
- Previous study showed CDNs highly effective for individual objects [Koletsou01].
  - What is the effect on whole page performance?
- Few pages with explicit Akamai-hosted objects.
  - 48 pages or 5.2% of pages.
  - 216 objects or 1.6% of total downloaded objects.
  - Average of 4.5 CDN objects per page.
- Looked at improvements for CDN-only pages:
  - CDNs improve CDN-containing pages 6%-30%.

Conclusions
- Parallel connections have greatest impact.
  - Universally applicable and easy to implement.
- Other options give incremental performance across all pages.
  - Some optimizations provide consistent, but moderate, improvement across all pages.
  - Some optimizations are not implemented on all pages.
    - Provide benefit when used extensively.

Conclusions
- Can we draw a correlation between object and real-world whole page performance?
  - Depends.
  - Not all optimizations widely used.
  - When optimizations are used to full advantage, they are effective.

Medusa Available
The End

Medusa Proxy Functionality
- Trace and Replay
  - Record requests and replay.
    - Parallel connections.
    - Persistent connections.
- Transformation
  - CDN/no CDN replay.
- Performance Measurement
  - Request latency.
  - DNS overhead.
- Optimization options
  - Use parallel connections.
  - Use persistent connections.
    - HTTP 1.0 and HTTP 1.1.
    - Always attempt, never attempt, mirror trace attempt.

Page Delimitation
- Determining pages:
  - Necessary for:
    - Calculating total page costs.
    - Limiting optimizations to within one page.
      - Parallel Connections.
  - Can analyze page and draw object dependencies.
    - High overhead
    - May impact user
- Use inter-object times in the original trace data.
  - Use 2 second inter-object times.
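
The inter-object time heuristic can be sketched as below: a request that arrives 2 seconds or more after the previous request starts a new page. Measuring the gap between consecutive request timestamps (rather than, say, from the end of the previous response) is an assumption, as are the function name and the sample timestamps.

```python
def split_into_pages(request_times, gap_s=2.0):
    """Group request timestamps (seconds) into pages.

    A request starts a new page when it arrives at least `gap_s`
    seconds after the previous request.
    """
    pages = []
    for t in sorted(request_times):
        if pages and t - pages[-1][-1] < gap_s:
            pages[-1].append(t)      # same page as the previous request
        else:
            pages.append([t])        # gap is large enough: new page
    return pages

# Hypothetical request timestamps for one user.
times = [0.0, 0.3, 1.1, 5.0, 5.4, 9.0]
print(split_into_pages(times))
# [[0.0, 0.3, 1.1], [5.0, 5.4], [9.0]]
```

A timing threshold avoids parsing each page for object dependencies, which the slide notes has high overhead and may impact the user.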

Akamaized URLs
- Akamai accounts for 85%-98% of CDN hosted objects [ref].
- Will not account for sites completely hosted on Akamai hosts.
- Filter:

Interleaved Requests
- Requests may get interleaved when recorded in parallel mode and replayed in serial mode.
  - E.g.:
    - Connection 0 requests:
    - Connection 1 requests: ar.atwola.com.
  - Requests may be ordered in trace as:
    - ar.atwola.com,
  - Negates benefit of parallel connections.

Page Characterization: Objects per Page

Object Types
- Identified object type by clues in URL:
  - 80% of URLs are images (.gif, .jpg).
  - 5.6% HTML files (.htm, .html).
  - 3.8% CGI, Perl, or Java (?, .pl, .class).
  - 3.3% JavaScript (.js).
  - 3.6% unidentified (no suffix, pdf, txt, etc.).
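
A minimal sketch of classifying objects by URL clues, using the suffixes listed above. The category names, the order of the checks, and the example URLs are assumptions for illustration; the slides do not describe the exact procedure.

```python
import posixpath
from urllib.parse import urlsplit

# Suffix-to-category map built from the suffixes named on the slide.
SUFFIX_TYPES = {
    ".gif": "image", ".jpg": "image",
    ".htm": "html", ".html": "html",
    ".pl": "dynamic", ".class": "dynamic",
    ".js": "javascript",
}

def classify(url):
    """Guess an object's type from its URL alone."""
    if "?" in url:                       # query string: treat as dynamic
        return "dynamic"
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return SUFFIX_TYPES.get(ext, "unidentified")

print(classify("http://example.com/img/logo.GIF"))   # image
print(classify("http://example.com/search?q=x"))     # dynamic
print(classify("http://example.com/paper.pdf"))      # unidentified
```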

Persistent Connection/Browser
- Persistent connections appear correlated with browser:
  - IE: 12% pgs, 15.8% objs.
  - Netscape: % pgs, 10.0% objs.
  - Omniweb: % pgs, 72.4% objs.
  - Mozilla 5.0/Gecko: % pgs, 91.3% objs.

Persistent Connection Pages
- Still not as improved as expected:
  - Better than for only large pages:
    - Serial 7.28% vs. 1.98%
    - Parallel 24.03% vs. 18.5%
  - Medians don’t show improvements in all cases.

  Optimizations                       | Average Improvement (%) | Median Improvement (%)
  Serial                              | 7.28%                   | -3.5%
  Parallel Connections                | 24.03%                  | 7.5%
  Parallel Connections with DNS, CDN  | 12.5%                   | 0.6%

Mostly Persistent Pages

  Objects per Page | Pages (% persistent pages) | Method              | Mean (ms) | Improvement (%)
                   | (56%)                      | serial              | 4000      |
                   |                            | persistent          |           |
                   |                            | parallel            | 1567      |
                   |                            | persistent/parallel |           |
                   | (42%)                      | serial              | 6180      |
                   |                            | persistent          |           |
                   |                            | parallel            | 2524      |
                   |                            | persistent/parallel |           |

Persistent Connections per Page

Same as previous 16+

Ad-Servers
- Identified by finding hosts whose names contain the phrases “ads” or “adserver”.
  - YES:
  - NO:
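
One way such a hostname filter might look is sketched below. The matching rule (a dot-separated label that is exactly "ads" or "adserver", optionally followed by digits) is a hypothetical choice made here to avoid matching names such as "downloads"; the slides do not give the actual rule, and the hostnames are invented examples.

```python
import re

# Hypothetical rule: a hostname label of "ads" or "adserver" plus optional digits.
AD_LABEL = re.compile(r"(ads|adserver)\d*")

def is_ad_host(hostname):
    """Return True if any label of the hostname looks like an ad server name."""
    labels = hostname.lower().split(".")
    return any(AD_LABEL.fullmatch(label) for label in labels)

print(is_ad_host("ads.example.com"))         # True
print(is_ad_host("adserver1.example.net"))   # True
print(is_ad_host("downloads.example.com"))   # False
```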

Ad-Servers and DNS
- Number of pages with ad-servers:
  - 9.5% of pages, 1.53% of total objects.
  - Average of 2.4 ads per page.
- Objects not hosted on content server.
  - DNS lookup may be a large part of lookup cost.
- DNS caching doesn’t give great improvement:
  - DNS caching improves parallel case 10.9%.
    - Compared with 12.2% over all pages.
  - DNS caching improves parallel, persistent case 8%.
    - Compared with 6.3% over all pages.
  - DNS caching improves parallel, persistent w/ CDN 4.7%.
    - Compared to 6.3%.