Published by Crystal Stephens. Modified over 8 years ago.
1 Web Performance Modeling Issues. Daniel A. Menascé, Department of Computer Science, George Mason University. http://www.cs.gmu.edu/faculty/menasce.html Menascé, D. A. All Rights Reserved.
2 Outline: E-commerce facts. WWW Traffic Characterization. Improving Web Performance. Predicting Web Performance. An Example. Concluding Remarks.
3 Part I: E-commerce Facts
4 Electronic Commerce: online sales are soaring. “… IT and electronic commerce can be expected to drive economic growth for many years to come.” The Emerging Digital Economy, US Dept. of Commerce, 1998.
5 Caution Signs Along the Road: There will be jolts and delays along the way for electronic commerce: congestion is the most obvious challenge. (Gross & Sager, Business Week, June 22, 1998, p. 166.)
6 What people are saying about Web performance… “Tripod’s Web site is our business. If it’s not fast and reliable, there goes our business.” Don Zereski, Tripod’s vice-president of Technology (Internet World)
7 What people are saying about Web performance… “Sites have been concentrating on the right content. Now, more of them -- specially e-commerce sites -- realize that performance is crucial in attracting and retaining online customers.” Gene Shklar, Keynote, The New York Times, 8/8/98
8 What people are saying about Web performance… “Capacity is King.” Mike Krupit, Vice President of Technology, CDnow, 06/01/98. “Being able to manage hit storms on commerce sites requires more than just buying more plumbing.” Harry Fenik, vice president of technology, Zona Research, LANTimes, 6/22/98
9 E-commerce facts: Businesses will exchange $327 billion in goods and services by the year 2002. Cisco Systems sells $4 billion/yr on the Web at a cost savings of $363 million. General Electric estimates that e-commerce will save them $500 million over the next three years. Boeing booked $100 million in spare parts in the first seven months of activity of its Web site. Texas Instruments fills 60,000 orders a month through its Web site, meeting delivery deadlines 95% of the time.
10 Business in the Internet Age (Business Week, June 22, 1998)
11 Part II: WWW Traffic Characteristics
12 WWW Traffic Characteristics: Unpredictable in nature. Self-similar, i.e., bursty over several time scales. Load spikes can be many times higher than average traffic. Workload characterization studies have been done at the client side, the proxy cache, the server, and the Web; see http://www.parc.xerox.com/istl/projects/http-ng/web-characterization-reading.html
13 Workload Characterization at the Client Side: Cunha, Bestavros, and Crovella (1995). Half a million requests from instrumented Mosaic in an academic setting. The distribution of document sizes, popularity of documents as a function of size, distribution of user requests for documents, and number of references to documents as a function of overall rank in popularity can be modeled by power-law distributions.
14 Workload Characterization at the Client Side: Cunha, Bestavros, and Crovella (1995). 22% of the requests generated by the browser were cache misses. 96% of the total requests were for html files and only 1% for CGI bin requests. Current studies show dynamically generated pages ranging from 2 to 6% (Almeida98).
15 Workload Characterization at the Client Side: Cunha, Bestavros, and Crovella (1995). 79% of requests were for external servers. Less than 10% of requests were for unique URLs, i.e., URLs not previously referenced. 9.6% of accesses were to html files with an average size of 6.4 KB and 69% to images with an average size of 14 KB.
16 Workload Characterization at the Client Side: Tauscher and Greenberg (1997). Six weeks of WWW usage by 23 users. 58% of pages visited are revisits. Users tend to visit pages just visited more often than pages visited less recently.
17 Workload Characterization at the Proxy Server: Abrams, Standridge, Abdulla, Williams, and Fox (1995). Six months of data from 3 educational sites. Trace-driven simulation of a cache proxy server. The maximum cache hit rate was between 30 and 50% for infinite-size caches, regardless of cache design.
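The idea of an upper bound on hit rate for an infinite-size cache can be illustrated with a minimal trace-driven sketch. This is not the simulator used in the cited study: the function name and the sample trace are hypothetical, and the sketch ignores document expiry, modification, and uncacheable responses.

```python
def infinite_cache_hit_rate(requests):
    """Hit rate of a cache that never evicts: every repeat request is a hit."""
    seen = set()   # URLs already stored in the (unbounded) cache
    hits = 0
    for url in requests:
        if url in seen:
            hits += 1
        else:
            seen.add(url)
    return hits / len(requests) if requests else 0.0


# Hypothetical trace: 6 requests, 3 of them repeats.
trace = ["/a", "/b", "/a", "/c", "/a", "/b"]
print(infinite_cache_hit_rate(trace))
```

With no evictions, the only misses are first references, so the hit rate depends on the trace alone and not on the replacement policy.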
18 Workload Characterization at the Server: Arlitt and Williamson (1996). Six WWW servers: academic and commercial. Number of requests ranged from 188K to 3.5M per site. Search for invariants.
19 Workload Characterization at the Server: Arlitt and Williamson (1996). HTML and image files account for 90-100% of requests. The average size of a transferred document does not exceed 21 KB. Less than 3% of the requests are for distinct files. The file size distribution is Pareto with $0.40 < \alpha < 0.63$, i.e., this distribution is heavy-tailed.
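To make "heavy-tailed" concrete, the sketch below evaluates the Pareto tail probability P[X > x] = (k/x)^shape. The minimum size k and the sample values are assumptions chosen for illustration, not figures from the study.

```python
def pareto_tail(x, shape, k=1.0):
    """P[X > x] for a Pareto distribution with the given shape and minimum value k."""
    if x <= k:
        return 1.0
    return (k / x) ** shape


# With shape 0.5 (inside the reported 0.40-0.63 range), a file is more than
# 100 times the minimum size with probability (1/100) ** 0.5, i.e. about 0.1.
print(pareto_tail(100.0, 0.5))
```

The tail decays polynomially rather than exponentially, so very large files remain reasonably likely. For a shape value of 1 or less, as reported here, the Pareto distribution has no finite mean.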
20 Workload Characterization at the Server: Arlitt and Williamson (1996). Ten percent of the files accessed account for 90% of server requests and 90% of the bytes transferred. File inter-reference times are exponentially distributed and independent. At least 70% of the requests come from remote sites. These requests account for at least 60% of the bytes transferred.
21 Workload Characterization at the Server: Crovella and Bestavros (1996). Traces of users using Mosaic, reflecting requests to over half a million documents. Purpose: show the presence of self-similarity in Web traffic and explain it through the underlying characteristics of the WWW workload.
22 Workload Characterization at the Server: Crovella and Bestavros (1996). File sizes have a heavy-tailed distribution. This distribution may explain the fact that transmission time distributions are also heavy-tailed.
23 Workload Characterization at the Server: Almeida and Oliveira (1996). Used fractal models to study the document reference pattern at Web servers. Used an LRU stack model to study references to documents stored in two Web sites. Found strong evidence of self-similarity in the document reference pattern.
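An LRU stack model records, for each reference, how deep the document sits in a stack ordered by recency of use. The following is a generic sketch of that computation, not the authors' fractal analysis; the function name and the sample reference string are hypothetical.

```python
def lru_stack_distances(references):
    """Return the LRU stack depth of each reference (None for a first reference)."""
    stack = []       # most recently used document at index 0
    distances = []
    for doc in references:
        if doc in stack:
            depth = stack.index(doc) + 1   # 1 means "referenced again immediately"
            stack.remove(doc)
        else:
            depth = None                   # document not seen before
        distances.append(depth)
        stack.insert(0, doc)               # move (or push) the document to the top
    return distances


print(lru_stack_distances(["a", "b", "a", "c", "b", "b"]))
# → [None, None, 2, None, 3, 1]
```

Small depths dominate when the reference stream has strong temporal locality. The list-based stack is quadratic in the worst case, which is acceptable for a sketch but not for long traces.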
24 Web Traffic Workload Characterization: Bray (1996). Over 11 million Web pages were analyzed in 1995. The average page size was 6,518 bytes with a standard deviation of 31,678 bytes. About 50% of the pages were found to have at least one embedded image and 15% were found to have exactly one image.
25 Web Traffic Workload Characterization: Bray (1996). Over 80% of the sites are pointed to by only a few (between 1 and 10) other sites. Almost 80% of the sites contain no links to off-site URLs. Around 45% of the files had no extension and 37% were html files. Then .gif and .txt files were the next most popular, with 2.5% each.
26 Web Workload Characterization: File sizes and request sizes are heavy-tailed. Popularity: –Zipf’s Law: the number of references, P, to a file tends to be inversely proportional to its rank r: P = k/r. Temporal locality: –refers to the likelihood that once a document has been requested it will be requested again in the near future.
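The Zipf relation P = k/r can be sketched numerically as follows. The function name, the total number of references, and the number of files are hypothetical; the constant k is chosen so that the per-rank counts add up to the total.

```python
def zipf_reference_counts(total_references, num_files):
    """Expected references per file under Zipf's Law, P = k / r, for ranks 1..num_files."""
    # k is fixed by requiring the counts over all ranks to sum to total_references.
    harmonic = sum(1.0 / r for r in range(1, num_files + 1))
    k = total_references / harmonic
    return [k / r for r in range(1, num_files + 1)]


counts = zipf_reference_counts(total_references=1000, num_files=10)
for rank, count in enumerate(counts, start=1):
    print(rank, round(count, 1))
```

The most popular file receives twice as many references as the second and ten times as many as the tenth, which is why a small cache of popular files can absorb a large share of requests.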
27 Web Workload Characterization: SURGE (Barford and Crovella, ACM Sigmetrics 1998): workload generator that mimics real Web users. SURGE exercises Web servers quite differently from most commonly used benchmarks (e.g., SPECweb96): –maintains a higher number of open connections –results in much higher CPU load
28 Part III: Improving Web Performance
29 Improving Web Performance Through Caching and Prefetching: Prefetching and caching of inlines. (Dodge and Menascé, 1998) Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)
31 No Caching/Prefetching of Inlines [Figure: message exchange between browser and server. Labels: HTTP request, HTTP document, HTML document parsed by the browser, inline 1 request, inline 1 file, inline 2 request, inline 2 file, server disk.]
32 Caching/Prefetching of Inlines
33 [Figure: Web browsers connected through the network to a Web server with CPU, cache, and disk; branches labeled h and 1 - h.]
34 Response Time of Inline Files (in sec) vs. Cache Size (KB)
35 Improving Web Performance Through Caching and Prefetching: Prefetching and caching of inlines. (Dodge and Menascé, 1998) Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)
36 Probability of Access for Lycos Queries vs. URL Position
37 Hit Ratio of Query Results
38 Hit Ratio vs. Threshold for Lycos Queries
39 Part IV: Predicting Web Performance
40 The Impact of Burstiness: As shown by some measurements (Banga and Druschel 1997), the maximum throughput of a Web server decreases as burstiness increases. How can we represent the effects of burstiness in performance models? We know that the maximum throughput is equal to the inverse of the maximum service demand, i.e., the service demand of the bottleneck resource.
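The bound stated above, maximum throughput equals the inverse of the largest service demand, can be sketched as follows. The resource names and demand values are hypothetical, chosen only for illustration.

```python
def max_throughput(service_demands):
    """Return (bottleneck resource, throughput upper bound in requests/sec).

    service_demands maps a resource name to its service demand in seconds
    per request; the resource with the largest demand is the bottleneck.
    """
    bottleneck = max(service_demands, key=service_demands.get)
    return bottleneck, 1.0 / service_demands[bottleneck]


# Hypothetical service demands, in seconds per request.
demands = {"cpu": 0.02, "disk": 0.01, "network": 0.005}
resource, bound = max_throughput(demands)
print(resource, round(bound, 1))
```

Any increase in the bottleneck's service demand, such as one caused by burstiness, lowers this bound directly.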
41 WWW Traffic Burst [Figure: bytes (axis ticks at 10^6 and 10^7) vs. chronological time (slots of 1000 sec).]
42 Traffic Burstiness on the Web: a: ratio between the maximum observed request rate and the average request rate during an observation period. b: fraction of time during which the instantaneous arrival rate exceeds the average arrival rate.
43 Burstiness Modeling: Consider an HTTP LOG composed of L requests to a Web server. $\tau$: time interval during which the requests arrive. $\lambda$: average arrival rate, $\lambda = L / \tau$. The time interval $\tau$ is divided into n equal subintervals of duration $\tau / n$ called epochs. Arr(k): number of HTTP requests that arrive in epoch k. $\lambda_k$: arrival rate during epoch k.
44 Burstiness Modeling: $\mathrm{Arr}^+$: total number of HTTP requests that arrive in epochs in which $\lambda_k > \lambda$. $b$ = (number of epochs for which $\lambda_k > \lambda$) / n. Above-average arrival rate: $\lambda^+ = \mathrm{Arr}^+ / (b \, \tau)$. $a = \lambda^+ / \lambda = \mathrm{Arr}^+ / (b \, L)$.
45 Burstiness Modeling: an example. Consider that 19 requests are logged at a Web server at instants: 1, 3, 3.5, 3.8, 6, 6.3, 6.8, 7.0, 10, 12, 12.2, 12.3, 12.5, 12.8, 15, 20, 30, 30.2, 30.7. What are the burstiness parameters?
46 Burstiness Modeling: an example. Let the number of epochs be n = 21. Each epoch has a duration of $\tau / n = 31 / 21 = 1.48$ sec. The average arrival rate is $\lambda = 19 / 31 = 0.613$ req./sec. The numbers of arrivals in the 21 epochs are: 1, 0, 3, 0, 4, 0, 1, 0, 5, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3. Thus, $\lambda_1 = 1 / 1.48 = 0.676$, which exceeds the average $\lambda = 0.613$. In 8 of the 21 epochs, $\lambda_k$ exceeds $\lambda$, so $b = 8 / 21 = 0.381$ and $a = \mathrm{Arr}^+ / (b \, L) = 19 / (0.381 \times 19) = 2.625$.
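The epoch-based procedure can be sketched in a few lines. The function name and its handling of perfectly uniform traffic are assumptions; the timestamps, the 31 sec interval, and n = 21 come from the example. Binning the timestamps as listed places five arrivals in the ninth epoch and three in the last one; b and a are unaffected by how those arrivals split between epochs that are all above average.

```python
def burstiness_parameters(arrival_times, interval, num_epochs):
    """Return (per-epoch arrival counts, b, a) for a list of request timestamps."""
    epoch_length = interval / num_epochs
    counts = [0] * num_epochs
    for t in arrival_times:
        # Epoch index of this arrival; clamp so t == interval lands in the last epoch.
        k = min(int(t / epoch_length), num_epochs - 1)
        counts[k] += 1

    total = len(arrival_times)
    avg_rate = total / interval
    # Arrival counts of the epochs whose rate exceeds the average rate.
    above = [c for c in counts if c / epoch_length > avg_rate]
    b = len(above) / num_epochs
    arr_plus = sum(above)
    # Assumption: if no epoch exceeds the average (uniform traffic), report a = 1.
    a = arr_plus / (b * total) if b > 0 else 1.0
    return counts, b, a


times = [1, 3, 3.5, 3.8, 6, 6.3, 6.8, 7.0, 10, 12, 12.2, 12.3,
         12.5, 12.8, 15, 20, 30, 30.2, 30.7]
counts, b, a = burstiness_parameters(times, interval=31, num_epochs=21)
print(counts)
print(round(b, 3), round(a, 3))   # → 0.381 2.625
```

The result depends on the choice of n: fewer, longer epochs smooth out short bursts and push b toward larger values.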
47 The Impact of Burstiness (Menascé and Almeida, 1998): To account for burstiness, we write the service demand of the bottleneck resource as: –$D = D_f + \alpha \times b$ –$D_f$ is the portion of the service demand that does not depend on burstiness –$\alpha$ is a factor used to inflate the service demand according to burstiness factor b. It is given by: –$\alpha = (U_1 / X_0^1 - U_2 / X_0^2) / (b_1 - b_2)$ –The measurement interval is divided into 2 subintervals, 1 and 2, to obtain $U_i$, $X_0^i$, and $b_i$
48 The Impact of Burstiness: an example. Consider the HTTP LOG of the previous slides. During the 31 sec in which the 19 requests arrived, the CPU was found to be the bottleneck. What is the burstiness adjustment that should be applied to the CPU service demand to account for the burstiness effect on the performance of the Web server? The number of requests during each 15.5 sec subinterval is 14 and 5, respectively. The measured CPU utilization in each subinterval was 0.18 and 0.06, respectively.
49 The Impact of Burstiness: an example. The throughput in each subinterval is: –$X_0^1 = 14 / 15.5 = 0.903$ –$X_0^2 = 5 / 15.5 = 0.323$ Using the previous algorithm: –$b_1 = 0.273$, $b_2 = 0.182$ –$\alpha = (0.18 / 0.903 - 0.06 / 0.323) / (0.273 - 0.182) = 0.149$ –the adjustment factor is: $\alpha \times b = 0.149 \times 0.381 = 0.057$ Assuming $D_f = 0.02$ sec, we are able to calculate the maximum server throughput as a function of the burstiness factor (b).
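The arithmetic of this example can be checked with a short sketch. The function names and the variable name `alpha` for the inflation factor are assumptions; the inputs are the rounded values from the slide. Using the unrounded throughputs 14/15.5 and 5/15.5 gives a slightly lower factor, about 0.146.

```python
def burstiness_inflation(u1, x1, b1, u2, x2, b2):
    """Inflation factor from utilization, throughput, and b in two subintervals."""
    return (u1 / x1 - u2 / x2) / (b1 - b2)


def max_throughput_with_burstiness(d_f, alpha, b):
    """Upper bound on throughput when the bottleneck demand is d_f + alpha * b."""
    return 1.0 / (d_f + alpha * b)


alpha = burstiness_inflation(0.18, 0.903, 0.273, 0.06, 0.323, 0.182)
print(round(alpha, 3))            # → 0.149
print(round(alpha * 0.381, 3))    # → 0.057

# Maximum throughput for several burstiness factors, assuming D_f = 0.02 sec.
for b in (0.0, 0.1, 0.2, 0.3, 0.381):
    print(b, round(max_throughput_with_burstiness(0.02, alpha, b), 1))
```

With no burstiness (b = 0) the bound is 1/0.02 = 50 requests/sec; at b = 0.381 it falls to roughly 13 requests/sec, which shows how strongly the adjustment can reduce the predicted capacity.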
50 Effects of Burstiness on Performance [Figure: axis ticks at 0.0, 0.1, 0.2, 0.3.]
51 Part V: Predicting Web Performance: An Example
52 Upgrading the Capacity of Your Link to the ISP
53 Using QN models to predict Web Performance
54 Results of QN Model
55 Concluding Remarks: The Web is becoming an important element of the IPG. Understanding the nature of the Web workload is crucial to being able to predict its performance. New workload characterization studies for e-commerce sites are required (use of dynamic pages, XML, etc.). Need performance models for the Web that capture the effects of Web traffic characteristics on performance.
56 Capacity Planning for Web Performance: metrics, models and methods. Daniel Menascé and Virgilio Almeida. Prentice Hall, June 1998.