2005/2/23 HUT T Characterizing Web Workload of Mobile Clients Chuang Yu Juha Raitio
Outline
- Web workload analyses
  - What
  - Why
  - How
- Characteristics of workload
  - Wireline
  - Wireless
  - Case study results
- Statistical characteristics of Web workload
  - Power laws
  - Self-similarity
- Examples of workload analysis tools
- Summary

What?
- Content analysis
- User behavior analysis
  - User load distribution
  - Session duration
  - Temporal stability
  - Spatial locality
- System load analysis
- How do users come to visit the Web site?
- Why do users leave the Web site?
- What content are users interested in?
- How does users' interest vary over time?
- How does users' interest vary across geographic regions?

Why?
- Characteristics of user load have significant implications for
  - Web site design
  - Content management
  - Protocol design
  - Capacity planning
- Content provider: enhance user experience through more effective design and content management
- Service provider: efficient resource allocation, capacity planning, and pricing
- System designer: shed light on performance bottlenecks and the effectiveness of protocols

How?
- Gather requirements: what are the goals of the analysis?
- Plan and design the data collection
  - What data to collect? Over how long a period of time?
  - From where? Web proxies, Web browsers, and Web servers
  - What is the scope? How large? How many?
  - What methods to use? What analysis is needed? How to analyze the data?
- Collect the data
- Analyze the traces with statistical and mathematical approaches
- Execute the different analyses
  - Content analysis
  - User behavior analysis
  - System load analysis

Wireline user workload characterization (1)
Content analysis
- Content type
  - Pure text
  - Graphics-rich multimedia
  - The majority is a mix of both
- Content size
  - Size of all content on a Web server
  - Size of content that is transferred by a Web server
  - A non-negligible fraction of files are very large
  - Median transfer size is about 2 kB; median content size is a few hundred bytes larger
- Content popularity
  - Depends heavily on where the traces are collected
- Content modification pattern
  - Large variation in modification pattern: much content is never modified, while some is modified at least once between two consecutive accesses
  - Depends on content type, e.g. news Web sites
  - Most file modifications are small
  - The past modification interval of a document gives a rough prediction of its future modification time

Wireline user workload characterization (2)
User behavior analysis
- User request arrival and duration
  - Occur at three levels: session, click, and request
  - User dependent
  - The number of clicks in a session, the number of embedded images in a Web page, think time, and active time can be modeled with heavy-tailed Pareto distributions
  - 8 second rule
- Temporal locality and stability
  - If a page is accessed now, how likely is it to be accessed again in the near future?
  - Stronger temporal locality implies that caching would be effective
  - Access ranking stability: stability is high on the scale of days
- Spatial locality
  - Captures how likely people in the same geographic location or in the same organization are to request a similar set of documents
  - Determines the effectiveness of proxy caching
  - Organization and domain membership is significant
  - "Hot" events dominate over membership

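The Pareto modeling mentioned on this slide can be sketched with Python's standard `random.paretovariate`. This is only a minimal illustration: the helper names, the shape parameters, and the minimum think time are hypothetical placeholders, not values taken from any trace.

```python
import random

def pareto_sample(rng, alpha, x_min):
    """Draw one Pareto(alpha, x_min) value; smaller alpha means a heavier tail."""
    # random.paretovariate(alpha) returns a value >= 1, so scale it by x_min.
    return x_min * rng.paretovariate(alpha)

def synthetic_session(rng, clicks_alpha=1.5, think_alpha=1.4, think_min_s=1.0):
    """Return a list of think times in seconds, one per click in a session.

    All default parameter values are assumptions chosen for illustration.
    """
    n_clicks = int(pareto_sample(rng, clicks_alpha, 1.0))  # always at least 1
    return [pareto_sample(rng, think_alpha, think_min_s) for _ in range(n_clicks)]
```

Calling `synthetic_session(random.Random(1))` yields one synthetic session; most sessions are short, but occasionally a very long one appears, which is the heavy-tail behavior.
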
Wireline user workload characterization (3)
System load analysis
- Load varies with time and with recent events, e.g. the World Cup, Sept 11
- Web traffic is self-similar

Wireless user workload characterization
- WAP traffic
  - Access rate is still low: 80,000 entries in 7 months (99)
  - Amount of data is less than voice
- Metropolitan wireless network
  - Usage behavior shows diurnal and weekly patterns
  - Users do not move frequently
- WLAN
  - Campus: session-oriented and chat-oriented; incoming traffic exceeds outgoing traffic; high degree of roaming within sessions; sessions are normally short
  - Conference: users are evenly distributed across APs; Web and SSH account for 64% of traffic; sessions are short, 60% last less than 10 min; bandwidth distribution is highly uneven across APs
  - Corporate: different users impose different loads

Case study
"A popular commercial Web site designed for mobile clients"
- Provides Web access for wireline, wireless, and offline use
- Provides notification services
Analyses
- Web access
- Notifications
- Comparison between Web access and notification use
- Comparison between wireline and wireless use
Motivation
- To give a general overview of the analysis process and data
- To show some more concrete results
- To illustrate the possibilities of the analyses
- To propose direct implications of the results

Case study - architecture

Case study - material
- Web access logs for 12 days (August 2000), recorded per user and per request
- Notification logs for 6 days, recorded per user and per notification
- Types of Web access

Case study - Web content
What content was available for wireless use?

Case study - Web content size
How did retrieved content vary in size?
- Replies are small:
  - 98% of replies for wireless are less than 3 kB
  - 98% of replies for offline are less than 6 kB
  - 80% of bytes are carried in replies of size 10 kB or more
- Implications: systems could be optimized for small replies

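The two kinds of percentages on this slide measure different things: one counts replies, the other weights them by bytes. A minimal sketch of both statistics follows; the function names and the sample sizes are made up for illustration and are not case study data.

```python
def count_share_below(sizes, limit):
    """Fraction of replies whose size is strictly below `limit` bytes."""
    return sum(1 for s in sizes if s < limit) / len(sizes)

def byte_share_at_least(sizes, limit):
    """Fraction of all transferred bytes carried by replies of `limit` bytes or more."""
    return sum(s for s in sizes if s >= limit) / sum(sizes)

# Made-up example: 98 small replies of 1 kB and 2 large replies of 50 kB.
sizes = [1_000] * 98 + [50_000] * 2
small_reply_share = count_share_below(sizes, 3_000)
large_reply_byte_share = byte_share_at_least(sizes, 10_000)
```

In this made-up sample almost all replies are small, yet the two large replies carry about half of the bytes, which is why a count-based and a byte-based statistic can both hold for the same trace.
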
Case study - Web content popularity
How did popularity vary across documents?
- Heavy-tailed distribution
- 0.1-0.5% of documents account for 90% of the requests
- Implications: caching could be very effective

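A concentration figure of this kind can be computed from a request log as sketched below. The function name and the input layout (a flat list of requested document identifiers) are assumptions made for the example.

```python
from collections import Counter

def docs_needed_for_share(requests, share=0.9):
    """Return the fraction of distinct documents that covers `share` of all requests."""
    counts = sorted(Counter(requests).values(), reverse=True)
    total = len(requests)
    covered = 0
    # Walk documents from most to least popular until the target share is reached.
    for rank, count in enumerate(counts, start=1):
        covered += count
        if covered / total >= share:
            return rank / len(counts)
    return 1.0
```

A small return value means that a small cache holding only the most popular documents could already serve most requests.
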
Case study - Web user load distribution
How did individual users contribute to the load?
- Heavy-tailed distribution
- A small group of users generates the majority of the load
- Implications: different pricing is needed for different user groups

Case study - stability of Web access
How did interest vary during weekdays?
- Interests are relatively stable
  - Of the top 100 popular requests, 80% remain popular during a week
  - Of the top 1000, 70%
- Implications: performance can be optimized over the stable set

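One simple way to quantify this kind of stability is the overlap between the top-N sets of two periods. The sketch below is an assumed formulation for illustration; the exact stability measure used in the case study is not stated on the slide.

```python
from collections import Counter

def top_n(requests, n):
    """Set of the n most requested documents in one period."""
    return {doc for doc, _ in Counter(requests).most_common(n)}

def top_n_overlap(period_a, period_b, n):
    """Fraction of period A's top-n documents that are also in period B's top-n."""
    a, b = top_n(period_a, n), top_n(period_b, n)
    return len(a & b) / len(a) if a else 0.0
```

With one request list per day, `top_n_overlap(day1, day2, 100)` gives the share of the first day's top 100 that is still in the top 100 on the second day.
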
Case study - locality of Web access
Did people in the same region issue similar requests?
- Randomly sampled user groups don't differ from local users
- Geographic locality in requests is insignificant
- Implications: geographic distribution of servers/content does not require localization

Case study - notification popularity
What type of content was available as notifications, and how popular was it?

Case study - notification size
How did notification messages vary in size?
- Notifications are small
- All messages contain less than 256 bytes
- Implications: if delivery is not optimized, the overhead caused by network protocols may be considerable

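The overhead concern can be made concrete with simple arithmetic. The sketch below uses the minimum IPv4, TCP, and UDP header sizes; the function name and the `control_packets` parameter are illustrative assumptions, and link-layer and application-level headers are ignored.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header without options
TCP_HEADER = 20   # bytes, minimum TCP header without options
UDP_HEADER = 8    # bytes

def overhead_fraction(payload, header, control_packets=0):
    """Share of the bytes on the wire that is protocol overhead.

    `control_packets` counts header-only packets, such as connection setup
    and teardown segments. Its value is an assumption, not a measurement.
    """
    overhead = header * (1 + control_packets)
    return overhead / (payload + overhead)

# A 256-byte notification sent in a single UDP datagram.
udp_case = overhead_fraction(256, IPV4_HEADER + UDP_HEADER)
# The same notification over a fresh TCP connection, assuming 6 header-only segments.
tcp_case = overhead_fraction(256, IPV4_HEADER + TCP_HEADER, control_packets=6)
```

Under these assumptions the overhead is roughly 10% for the single datagram and about 52% for the short-lived TCP connection, so the delivery mechanism matters much more for small messages than for large replies.
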
Case study - notification popularity
How did popularity vary across notifications?
- Heavy-tailed distribution
- The top 1% of notifications accounted for 60% of messages
- Implications: multicasting notifications would yield significant savings

Case study - notification load distribution
How did individual users contribute to the notification load?
- Heavy-tailed distribution
  - The top 5% of clients received 25% of notification messages
  - The top 10% received 40%
- Implications: different pricing is needed for different user groups

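Shares like these can be derived from per-client message counts as sketched below. The function name and the input layout (one count per client) are assumptions for the example.

```python
def top_share(per_client_counts, top_fraction):
    """Share of all messages received by the busiest `top_fraction` of clients."""
    counts = sorted(per_client_counts, reverse=True)
    # Always include at least one client, even for very small fractions.
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)
```

For example, `top_share(counts, 0.05)` returns the share of messages received by the top 5% of clients.
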
Case study - locality of notifications
Did people in the same region receive the same notifications?
- Randomly sampled user groups differ from local users
- Users in the same region share notification content
- Implications: regional differences may be utilized in planning the geographic distribution of servers/content

Correlation between browsing and notification
- Limited correlation between a client's notification and browsing usage
- People use the two services for different purposes; the two services deliver different types of content
- The result is useful for Web design and pricing plans
Figure: number of users who have overlap between their top N browsing categories and top N notification categories.

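The overlap count described in the figure caption can be sketched as follows. The function names and the data layout (a mapping from user id to a list of category labels) are hypothetical, chosen only for illustration.

```python
from collections import Counter

def top_categories(events, n):
    """Return the set of a user's n most frequent categories."""
    return {cat for cat, _ in Counter(events).most_common(n)}

def users_with_overlap(browsing, notifications, n):
    """Count users whose top-n browsing and top-n notification categories intersect.

    Both arguments map a user id to a list of category labels (assumed layout).
    Only users present in both mappings are considered.
    """
    return sum(
        1
        for user in browsing.keys() & notifications.keys()
        if top_categories(browsing[user], n) & top_categories(notifications[user], n)
    )
```

Evaluating `users_with_overlap` for increasing N shows how quickly the two usage profiles start to share categories.
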
Workload comparison between wireline and mobile Web
- Comparison in content
  - Wireline Web content is richer than wireless content
  - Content size is smaller in wireless, due to limited display and bandwidth
  - Wireless content shares the Zipf-like popularity distribution of wireline content
- Comparison in user behavior
  - Both are user dependent
  - Both exhibit temporal stability
  - Wireless users do not exhibit strong spatial locality (limited content)
- Comparison in system load
  - Both exhibit diurnal and weekly variation
  - Wireless server load is smaller than wireline server load
  - The Web site for mobile clients has a more heterogeneous population of users

Power laws
- A measure y follows a power law when it is proportional to the a-th power of another measure x, i.e. y = c * x^a for some constant c
- Power law distributions (a.k.a. heavy-tail distributions) include e.g. the Zipfian and Pareto distributions
- Why? Finding a suitable distribution for the observed data allows probabilistic inference about the underlying phenomenon in closed form

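A power law appears as a straight line on a log-log plot, so the exponent a can be estimated with a least-squares line fit in log space. The sketch below is one simple estimator, named `fit_power_law` here for illustration; more robust estimators exist, and it assumes all x and y values are positive.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + a * log x; returns (a, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                 # slope of the line in log-log space
    c = math.exp(my - a * mx)     # intercept transformed back to linear scale
    return a, c
```

Applied to rank and request-count pairs from a log, the fitted slope `a` plays the role of the exponent in a Zipf-like popularity distribution.
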
Power laws and the Web
- Several distributions derived from the topology of the Internet at router and domain level follow a power law
- Number of documents per Web site or file system
- Size of documents per Web site or file system
- Session durations
- Links between Web pages
- Example (a = -0.46):

Self-similarity
- Self-similar (a.k.a. fractal) data:
  - Maintains its bursty character even when aggregated over a wide range of time scales
  - Slowly decaying variance
  - Long-range dependence (not memoryless)
- Underlying phenomenon
  - Data generators are either ON or OFF
  - The distributions of ON and OFF times (or message sizes) are heavy tailed
  - Aggregation of these data leads to self-similarity
- Internet/WWW traffic is self-similar

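The ON/OFF construction and the "slowly decaying variance" property can be sketched as follows. The generator draws ON and OFF period lengths from a Pareto distribution, and the estimator uses the aggregated-variance method, where the variance of block means scales as m^(2H - 2) for block size m and Hurst parameter H. The function names, the number of sources, the shape parameter, and the block sizes are all illustrative assumptions.

```python
import math
import random

def onoff_traffic(n_sources, n_slots, alpha, rng):
    """Aggregate load per time slot from ON/OFF sources with Pareto period lengths."""
    load = [0] * n_slots
    for _ in range(n_sources):
        t, on = 0, rng.random() < 0.5
        while t < n_slots:
            length = max(1, int(rng.paretovariate(alpha)))  # heavy-tailed duration
            if on:
                for i in range(t, min(t + length, n_slots)):
                    load[i] += 1
            t += length
            on = not on
    return load

def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log(variance of block means) vs log(block size)."""
    points = []
    for m in block_sizes:
        k = len(series) // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        var = sum((x - mu) ** 2 for x in means) / (k - 1)
        points.append((math.log(m), math.log(var)))
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return 1 + slope / 2  # because variance ~ m^(2H - 2)
```

For independent noise the estimate should be close to 0.5, while values clearly above 0.5 indicate long-range dependence. The estimator is a rough diagnostic and needs a long series with non-constant values at every block size.
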
Self-similarity and the Web

WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis
- An extended and improved version of ProWGen (Proxy Workload Generator), including a GUI for a useful set of tools for Web traffic modelling and analysis
- Purpose: to facilitate the easy generation and analysis of controllable and representative workloads for Web caching simulations
- The WebTraff toolkit provides three main functions:
  - Web workload trace generation
  - Web workload trace analysis
  - Web proxy cache simulation
- Graphs are displayed in PostScript format

WebTraff GUI Interface

Web Workload Generation

Web Workload Analysis
- Two main categories of analysis functions:
  - Time series analysis (on the left)
  - Web workload analysis (on the right)
- Radio buttons, slide bars, and text boxes are available to control plotting characteristics

Requests per Interval (time series plot)

Popularity Distribution plot

Document Size Distribution (zoomed)

Web Proxy Cache Simulation
- Application-level caching simulation parameters
  - Cache size
  - Cache replacement policy
- Five replacement policies currently available
  - Random replacement (RAND)
  - First-In-First-Out (FIFO)
  - Least-Recently-Used (LRU) (default setting)
  - Least-Frequently-Used (LFU)
  - Greedy-Dual-Size (GDS)

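To illustrate what such a cache simulation does, here is a minimal sketch of replaying a trace through an LRU cache. It is not WebTraff's implementation: the function name is made up, every document is assumed to occupy one unit of cache space, and `capacity` is assumed to be at least 1.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Replay a request trace through an LRU cache holding `capacity` documents.

    Returns the hit ratio. Document sizes, which size-aware policies such as
    Greedy-Dual-Size take into account, are ignored in this simplification.
    """
    cache = OrderedDict()
    hits = 0
    for doc in trace:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[doc] = True
    return hits / len(trace)
```

Running the same trace with different capacities shows how the hit ratio grows with cache size; swapping the eviction rule is the only change needed to compare other policies.
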
For More Information about WebTraff
- WebTraff toolkit:
- "ProWGen: A Synthetic Workload Generation Tool for the Simulation Evaluation of Web Proxy Caches", Busari/Williamson, Computer Networks, Vol 38, No 6, June
- Contact information:

Summary
- Workload characterization provides information that is useful for making better decisions on
  - Web site/application design
  - Content management
  - Protocol design
  - Capacity planning
  - Service pricing
  - etc.
- Workload characterization can be obtained through
  - Gathering requirements for the analyses
  - Planning of data acquisition
  - Statistical analyses of the data
  - Mathematical modeling
- There are tools for workload characterization
- Power-law and self-similarity characteristics of load make the Web different from the good old telephony world
  - The same models and optimizations don't necessarily apply in these two worlds

References
- Adya A, Bahl B, Qiu L. "Characterizing Web Workload of Mobile Clients", in "Content Networking in the Mobile Internet", Ch. 5. Dixit S, Wu T (eds), 2004
- Adya A, Bahl B, Qiu L. "Characterizing Alert and Browse Services for Mobile Clients", 2002
- Kramer G. "Self-similar Network Traffic", 2001
- Fischer MJ, Fowler TB. "Fractals, Heavy-Tails, and the Internet", 2001
- Markatchev N, Williamson C. "WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis", Department of Computer Science, University of Calgary, 2002