Slide 1: Web Workload Characterization
Web Protocols and Practice, Chapter 10 (WEB WORKLOAD CHARACTERIZATION)

Slide 2: Topics
- Web workload definition
- Workload characterization
- Statistics and probability distributions
- HTTP message characteristics
- Web resource characteristics
- User behavior characteristics
- Applying workload models
Slide 3: Web Workload Definition
- Important performance metrics, such as user-perceived latency and server throughput, depend on the interaction of numerous protocols and software components.
- A workload consists of the set of all inputs a system receives over a period of time.
- Web workload models are used to generate request traffic for comparing the performance of different proxy and server implementations.

Slide 4: Web Workload Definition
- Developing a workload model involves three main steps:
  » Identifying the important workload parameters
  » Analyzing measurement data to quantify these parameters
  » Validating the model against reality
- Constructing a workload model requires an understanding of statistical techniques for analyzing measurement data and representing the key properties of Web traffic.

Slide 5: Web Workload Definition
- Key properties of Web workloads:
  » HTTP message characteristics
  » Resource characteristics
  » User behavior
Slide 6: Workload Characterization
- A workload model consists of a collection of parameters that represent the key features of the workload that affect resource allocation and system performance.
- A workload model can be applied to a variety of performance-evaluation tasks, such as:
  » Identifying performance problems
  » Benchmarking Web components
  » Capacity planning

Slide 7: Workload Characterization
- There are several approaches to workload modeling. Trace-driven workload:
  » Constructs requests directly from an existing log or trace
  » Reproduces a known workload
  » Avoids the intermediate step of analyzing the traffic
  » Does not provide flexibility for experimenting with changes to the workload
  » Offers no clear separation between the load and the performance

Slide 8: Workload Characterization
- Stress testing:
  » Sends requests as fast as possible to evaluate a proxy or server under heavy load
  » May not reflect realistic traffic patterns

Slide 9: Workload Characterization
- Synthetic workload:
  » Derives from an explicit mathematical model that can be inspected, analyzed, and criticized
  » Represents the key properties of real Web traffic
  » Explores system performance in a controlled manner by changing the parameters associated with each probability distribution

Slide 10: Workload Characterization
- To ensure that a workload model is representative of real workloads, its parameters should have certain properties:
  » Decoupling from the underlying system
  » Proper level of detail
  » Independence from other parameters (Table 10.1)
Slide 11: Table 10.1: Examples of Web workload parameters

Category   Parameters
Message    Request method, response code, protocol, content type
Resource   Resource size, response size, popularity, modification frequency, temporal locality, number of embedded resources
Users      Session interarrival times, number of clicks per session, request interarrival times
Slide 12: Statistics and Probability Distributions
- Statistics such as the mean, median, and variance capture the basic properties of many workload parameters.
- The mean is the average value of a parameter.
- The median is the middle value: half of the values are smaller than the median and the other half are larger.
- The variance (or standard deviation) quantifies how much a parameter varies from its average value.

Slide 13: Statistics and Probability Distributions
- For the sequence 4100, 4700, 4200, 20,000, 4000 bytes:
  » mean size = 7400 bytes
  » median size = 4200 bytes
- For the sequence 4100, 4700, 4200, 4800, 4000 bytes:
  » mean size = 4360 bytes
  » median size = 4200 bytes
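As a quick check of the numbers on this slide, here is a minimal Python sketch (standard library only) that computes both statistics for the two example sequences; the single 20,000-byte outlier shifts the mean substantially while the median is unchanged.

```python
# Mean vs. median for the two example sequences of sizes (in bytes).
from statistics import mean, median

with_outlier = [4100, 4700, 4200, 20000, 4000]
without_outlier = [4100, 4700, 4200, 4800, 4000]

print(mean(with_outlier), median(with_outlier))        # 7400 4200
print(mean(without_outlier), median(without_outlier))  # 4360 4200
```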
Slide 14: Statistics and Probability Distributions
- Probability distributions capture how a parameter varies over a wide range of values.

Slide 15: Statistics and Probability Distributions
- For the sequence 4100, 4700, 4200, 20,000, 4000 bytes, the cumulative distribution function (CDF) is F(x) = P(X <= x).

Slide 16: Statistics and Probability Distributions
- For the same sequence, the complementary cumulative distribution function (CCDF) is Fc(x) = P(X > x) = 1 - F(x). (Figure 10.1)
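Both functions can be computed directly from their definitions. A minimal sketch over the five-value sample on these slides (the helper names `ecdf` and `eccdf` are my own, not from the chapter):

```python
# Empirical CDF F(x) = P(X <= x) and CCDF Fc(x) = 1 - F(x)
# over the five example sizes (in bytes).
sizes = [4100, 4700, 4200, 20000, 4000]

def ecdf(data, x):
    """Fraction of observations that are <= x."""
    return sum(1 for v in data if v <= x) / len(data)

def eccdf(data, x):
    """Fraction of observations that are > x."""
    return 1.0 - ecdf(data, x)

print(ecdf(sizes, 4200))   # 0.6 -- three of the five values are <= 4200
print(eccdf(sizes, 4200))  # 0.4
```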
Slide 17: Statistics and Probability Distributions
- Several probability distributions have been widely applied to workload characterization.
- One of the most popular is the exponential distribution, with CDF F(x) = 1 - e^(-x/μ), where μ is the mean.
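A sketch of drawing samples from this distribution by inverse-transform sampling: solving u = 1 - e^(-x/μ) for x gives x = -μ ln(1 - u) for uniform u. The function name and the mean of 3.0 are arbitrary choices for illustration.

```python
import math
import random

def sample_exponential(mu):
    """Inverse-transform sampling: solve u = 1 - exp(-x/mu) for x."""
    u = random.random()            # uniform in [0, 1)
    return -mu * math.log(1.0 - u)

random.seed(42)                    # fixed seed for reproducibility
samples = [sample_exponential(3.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean, close to mu = 3.0
```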
Slide 18: Statistics and Probability Distributions
- Relating a measured distribution to an equation requires justifying the hypothesis that the equation can accurately represent the measured data. Justifying this hypothesis consists of two key steps:
  » Fit the equation to the measured data to determine the value of each parameter.
  » Perform statistical tests to compare the resulting equation with the measured distribution.

Slide 19: Statistics and Probability Distributions
- In some cases, no single well-known distribution matches the measured data; it may be necessary to represent different parts of the measured distribution with different equations.
Slide 20: HTTP Message Characteristics
- HTTP request methods
- HTTP response codes

Slide 21: HTTP Request Methods
- Knowing which request methods arise in practice is useful for optimizing server implementations and developing realistic benchmarks for evaluating Web proxies and servers.
- Traffic characteristics:
  » The overwhelming majority of Web requests use the GET method to fetch resources and invoke scripts.
  » A small fraction of HTTP requests use the POST method to submit form data.

Slide 22: HTTP Request Methods
- Measurements show a small number of HEAD requests, used to test an operational Web server.
- Web Distributed Authoring and Versioning (WebDAV) makes frequent use of the PUT and DELETE methods.
- The emergence of tools for testing and debugging Web components may increase the use of the TRACE method.
- The exact distribution of request methods varies from site to site.
Slide 23: HTTP Response Codes
- Knowing how servers respond to client requests is an important part of constructing a realistic model of Web workloads.
- Traffic characteristics:
  » 200 OK: 75% to 90% of responses
  » 304 Not Modified: 10% to 30% of responses
  » The other redirection (3xx) and client error (4xx) codes account for most of the remaining responses.
  » 206 Partial Content may become more common as servers return ranges of bytes from requested resources.

Slide 24: HTTP Response Codes
- 302 Found is used for redirection responses; its frequency varies from site to site.
Slide 25: Web Resource Characteristics
- Content type
- Resource size
- Response size
- Resource popularity
- Modification frequency (resource changes)
- Temporal locality
- Number of embedded resources

Slides 26-27: Web Resource Characteristics
- Understanding the characteristics of Web resources is an important part of modeling Web workloads.
- Resources vary in terms of:
  » How big they are
  » How popular they are
  » How often they change
- The characteristics of Web resources are: content type, resource size, response size, resource popularity, modification frequency (resource changes), temporal locality, and number of embedded resources.
Slide 28: Content Type
- Content type has a direct relationship to other key workload parameters, such as resource size and modification frequency.
- Traffic characteristics:
  » The overwhelming majority of resources are text content (plain and HTML) and images (JPEG and GIF).
  » The remaining content types include documents such as PostScript and PDF, software such as JavaScript or Java applets, and audio and video data.

Slide 29: Content Type
- The emergence of new applications can influence the distribution of content types.
Slide 30: Resource Sizes
- The sizes of Web resources affect:
  » The storage requirements at the origin server
  » The overhead of caching resources at browsers and proxies
  » The load on the network
  » The latency in delivering the response message
- Traffic characteristics: the average resource size is relatively small.
  » Average size of an HTML file: 4 to 8 KB
  » Median size of an HTML file: 2 KB
  » Average size of an image: 14 KB

Slide 31: Resource Sizes
- Knowing the distribution of resource sizes at Web sites is useful for deciding how to allocate memory or disk space at a server or proxy.
- The high variability in resource sizes is captured by the Pareto distribution, with CCDF Fc(x) = (k/x)^α for x >= k, where α is a shape parameter and k is a scale parameter; the mean is αk/(α - 1) for α > 1.
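Sampling from a Pareto distribution also works by inverse transform: setting u = 1 - Fc(x) = 1 - (k/x)^α and solving for x gives x = k / (1 - u)^(1/α). A minimal sketch with illustrative parameters (k = 1000 bytes, α = 2.5, which are my choices, not values from the chapter):

```python
import random

def sample_pareto(k, alpha):
    """Inverse-transform sampling from the Pareto CCDF Fc(x) = (k/x)**alpha."""
    u = random.random()                    # uniform in [0, 1)
    return k / (1.0 - u) ** (1.0 / alpha)  # always >= k

random.seed(7)
samples = [sample_pareto(k=1000.0, alpha=2.5) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
theoretical_mean = 2.5 * 1000.0 / (2.5 - 1.0)  # alpha*k/(alpha - 1)
print(empirical_mean, theoretical_mean)
```

For alpha <= 1 the mean is infinite, which is one way heavy tails show up in practice: sample means keep growing as more data arrives.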
Slides 32-34: Statistics and Probability Distributions (figures)
- Figure 10.2: Exponential and Pareto distributions (with mean of 1)
- Figure 10.3: Exponential and Pareto distributions on a logarithmic scale
- Figure 10.4: Lognormal distribution
Slide 35: Response Sizes
- In analyzing server and network performance, the size of response messages is a more important factor than resource size.
- Traffic characteristics: response sizes may differ from resource sizes for a variety of reasons:
  » Some HTTP response messages do not have a message body.
  » Some Web resources are never requested, so they do not contribute to the set of response messages.
  » Some responses are aborted before they complete, resulting in shorter transfers.

Slide 36: Response Sizes
- These factors suggest that the distribution of response sizes is not the same as the distribution of resource sizes.
- The median of the response-size distribution is several hundred bytes smaller than the median resource size.
- Response sizes can be represented by a combination of the lognormal and Pareto distributions; the distribution has a heavy tail.
Slide 37: Resource Popularity
- The popularity of the various resources at a Web site has important performance implications. The most popular resources are likely to reside in main memory at the origin server, obviating the need to fetch the data from disk.
- Traffic characteristics:
  » Popularity is measured in terms of the proportion of requests that access a particular resource.
  » The probability mass function (pmf) P(r) captures the proportion of requests directed to each resource.

Slide 38: Resource Popularity
- The proportion of requests for a resource follows Zipf's law: P(r) = k/r, where r is the popularity rank of the resource and k is a constant that ensures that P(r) sums to 1. (Figure 10.5)
Slide 39: Resource Popularity
- Figure 10.5: Zipf's law

Slide 40: Resource Popularity
- More generally, a Zipf-like distribution has the form P(r) = k/r^c for some constant c.
  » The extreme case c = 0 corresponds to all resources having equal popularity.
  » Early studies of requests to Web servers found c values close to 1.
  » More recent studies show values for c in the range of 0.75 to 0.95.
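A minimal sketch of building this pmf, assuming a finite catalog of n resources ranked by popularity (the function name and the choice c = 0.85 from the reported 0.75-0.95 range are illustrative):

```python
def zipf_like_pmf(n, c):
    """P(r) = k / r**c for ranks r = 1..n, with the constant k chosen
    so that the probabilities sum to 1."""
    weights = [1.0 / r ** c for r in range(1, n + 1)]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]

pmf = zipf_like_pmf(1000, 0.85)  # c drawn from the reported 0.75-0.95 range
print(sum(pmf[:10]))             # share of requests for the top 10 resources

uniform = zipf_like_pmf(4, 0.0)  # c = 0: every resource equally popular
```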
Slide 41: Resource Changes
- Web resources change over time as a result of modifications at the origin server.
- Modifications to resources affect the performance of Web caching: resources that change less often may be given preference in caching or revalidated with the origin server less frequently.
- Traffic characteristics:
  » Images do not change very often.
  » Text and HTML files change more often than images.

Slide 42: Resource Changes
- Some resources, such as news stories, change in a periodic fashion.
- The Expires header can indicate the next time that a cached resource will change.
- Accurate timing information in the HTTP response message can reduce the load on the origin server as well as the user-perceived latency for accessing the resource.
- An accurate model of Web workloads needs to consider the frequency of resource changes.
Slide 43: Temporal Locality
- The time between successive requests for the same resource has a significant effect on Web traffic.
- Resource popularity indicates the frequency of requests without indicating the spacing between them.
- Temporal locality captures the likelihood that a requested resource will be requested again in the near future.

Slide 44: Temporal Locality
- Testing a server with a benchmark that has low temporal locality would underestimate the potential throughput.
- High temporal locality also increases the likelihood that a request is satisfied by a browser or proxy cache, and reduces the likelihood that a resource has changed since the previous access.

Slide 45: Temporal Locality
- Traffic characteristics:
  » Temporal locality can be measured by sequencing through the stream of requests, putting each requested resource at the top of a stack and noting the position in the stack (the stack distance) of the previous access to that resource.
  » A small stack distance suggests high temporal locality.
  » The stack distances of requests for a resource follow a lognormal distribution.
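The stack-distance measurement described above can be sketched directly (a simple list-based version, fine for illustration though quadratic in the trace length):

```python
def stack_distances(requests):
    """For each request, report the stack distance of the previous access to
    the same resource (1 = it was at the top of the stack), or None on a
    first access; then move the resource to the top (LRU order)."""
    stack, distances = [], []
    for resource in requests:
        if resource in stack:
            distances.append(stack.index(resource) + 1)
            stack.remove(resource)
        else:
            distances.append(None)
        stack.insert(0, resource)
    return distances

print(stack_distances(["a", "b", "a", "c", "b"]))  # [None, None, 2, None, 3]
```

Repeated back-to-back requests for the same resource yield distance 1, the signature of high temporal locality.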
Slide 46: Number of Embedded Resources
- Embedded resources include images, JavaScript programs, and other HTML files that appear as frames in the containing Web page.
- The number of embedded references in a Web page has a significant impact on the server and network load.
- Traffic characteristics:
  » Web pages have a median of 8 to 20 embedded resources.
  » The distribution has high variability, following the Pareto distribution.

Slide 47: Number of Embedded Resources
- The number of embedded images has tended to increase over time as more users have high-bandwidth connections to the Internet.
- A large number of embedded resources does not necessarily translate into a large number of requests to the Web server:
  » A cached copy of an embedded resource may be available.
  » Some embedded images do not reside at the same Web server as the containing Web page.
Slide 48: User Behavior Characteristics
- Web workload characteristics depend on the behavior of users as they download Web pages from various sites.
- The workload introduced by a single user can be modeled at three levels:
  » Session: the series of requests by a single user to a single Web site can be viewed as a logical session.
  » Click: a user performs one or more clicks to request Web pages.

Slide 49: User Behavior Characteristics
  » Request: each click triggers the browser to issue an HTTP request for a resource.
- Each session arrival brings a new user to the site. The client may establish a new TCP connection for a request or send the request on an existing TCP connection.
- Session arrivals can be studied by considering the time between the start of one user session and the start of the next.

Slide 50: User Behavior Characteristics
- Session interarrival times follow an exponential distribution. Exponential interarrival times correspond to a Poisson process, which arises when users arrive independently of one another.
- The exponential distribution is not an accurate model of the interarrival times of TCP connections and HTTP requests.
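A minimal sketch of generating Poisson session arrivals by summing exponential interarrival gaps (the rate and duration are illustrative values, not from the chapter):

```python
import math
import random

def session_arrival_times(rate_per_sec, duration_sec):
    """Poisson session arrivals: successive interarrival gaps are drawn
    from an exponential distribution with mean 1/rate_per_sec."""
    t, arrivals = 0.0, []
    while True:
        t += -math.log(1.0 - random.random()) / rate_per_sec
        if t >= duration_sec:
            return arrivals
        arrivals.append(t)

random.seed(1)
arrivals = session_arrival_times(rate_per_sec=2.0, duration_sec=3600.0)
print(len(arrivals))  # roughly rate * duration = 7200 sessions
```

Note this models only session starts; per the slides above, the connection- and request-level arrivals within a session should not be modeled as Poisson.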
Slide 51: User Behavior Characteristics
- A workload model that assumes that HTTP requests arrive as a Poisson process would underestimate the likelihood of heavy-load periods and overestimate the potential performance of the Web server.
- The number of clicks associated with user sessions has considerable influence on the load on a server.

Slide 52: User Behavior Characteristics
- The number of clicks follows a Pareto distribution, suggesting that some sessions involve a much larger number of clicks than others.
- The time between successive requests by each user (the request interarrival time) has important implications for the server and network load.
- The time between the downloading of one page (and its embedded images) and the user's next click is referred to as think time or quiet time.

Slide 53: User Behavior Characteristics
- The characteristics of user think times influence the effectiveness of policies for closing persistent connections. Most interrequest times are less than 60 seconds.
- Think times follow a Pareto distribution with a heavy tail, with α around 1.5.
- Heavy-tailed distributions apply to numerous properties of Web traffic:
  » Resource sizes
Slide 54: User Behavior Characteristics
  » Response sizes
  » The number of embedded references in a Web page
  » The number of clicks per session
  » The time between successive clicks
- A Web session can be modeled as a sequence of on/off periods, in which each on period corresponds to downloading a Web page and its embedded images, and each off period corresponds to the user's think time.

Slide 55: User Behavior Characteristics
- The durations of the on and off periods both follow heavy-tailed distributions.
- The load on Web servers and the network exhibits a phenomenon known as self-similarity, in which the traffic varies dramatically on a variety of time scales, from microseconds to several minutes.
Slide 56: Applying Workload Models
- A deeper understanding of Web workload characteristics can drive the creation of a workload model for evaluating Web protocols and software components.
- Generating synthetic traffic involves sampling the probability distribution associated with each workload parameter. (Table 10.2)

Slide 57: Table 10.2: Probability distributions in Web workload models
- Exponential: session interarrival times
- Pareto: response sizes (tail of distribution), resource sizes (tail of distribution), number of embedded images, request interarrival times
- Lognormal: response sizes (body of distribution), resource sizes (body of distribution), temporal locality
- Zipf-like: resource popularity
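A minimal sketch of a synthetic-session generator that samples the distribution families in Table 10.2. All numeric parameters here (mean think time, catalog size, lognormal body parameters, and so on) are illustrative assumptions, not values from the chapter:

```python
import math
import random

random.seed(0)

def exp_sample(mu):
    """Session interarrival time (exponential with mean mu)."""
    return -mu * math.log(1.0 - random.random())

def pareto_sample(k, alpha):
    """Heavy-tailed quantities such as clicks per session (Pareto)."""
    return k / (1.0 - random.random()) ** (1.0 / alpha)

def zipf_like_rank(n, c):
    """Resource popularity: pick a rank in 1..n with P(r) proportional to 1/r**c."""
    weights = [1.0 / r ** c for r in range(1, n + 1)]
    return random.choices(range(1, n + 1), weights=weights)[0]

def synthetic_session():
    """One session: an arrival gap, then for each click a (resource rank,
    response size) pair; response-size bodies use a lognormal distribution."""
    arrival_gap = exp_sample(mu=60.0)
    clicks = max(1, int(pareto_sample(k=1.0, alpha=1.5)))
    requests = [(zipf_like_rank(1000, 0.85), random.lognormvariate(8.0, 1.0))
                for _ in range(clicks)]
    return arrival_gap, requests

gap, requests = synthetic_session()
```

Feeding a stream of such sessions to a proxy or server is what lets the experimenter turn one knob (say, the Pareto shape of clicks per session) while holding the rest of the workload fixed.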
Slide 58: Applying Workload Models
- Generating synthetic traffic that accurately represents a real workload is very challenging.
- Validation of the synthetic workload is an important step in constructing and using a workload model. Validation is different from verification:
  » Verification involves testing that the synthetic traffic has the statistical properties embodied in the workload model.

Slide 59: Applying Workload Models
  » Validation requires demonstrating that the performance of a system subjected to the synthetic workload matches the performance of the same system under a real workload, according to some predefined performance metrics.
- Synthetic workload models are also used to test servers over a range of scenarios that might not have arisen in practice.
- Generating synthetic traffic provides an opportunity to evaluate a proxy or server in a controlled manner.

Slide 60: Applying Workload Models
- Web performance depends on the interaction between user behavior, resource characteristics, server load, and network dynamics.
- Synthetic workloads help address the need to evaluate and compare Web software components in a controlled manner.