1 CS533 Modeling and Performance Evaluation of Network and Computer Systems Workload Characterization Techniques (Chapter 6)

2 Workload Characterization Techniques
Want a repeatable workload so systems can be compared under identical conditions
Hard to do in a real-user environment
Instead:
–Study the real-user environment
–Observe key characteristics
–Develop a workload model → workload characterization
"Speed, quality, price. Pick any two." – James M. Wallace

3 Terminology
Assume the system provides services
Workload components – entities that make service requests
–Applications: mail, editing, programming, …
–Sites: workloads at different organizations
–User sessions: complete user sessions from login to logout
Workload parameters – used to model or characterize the workload
–Ex: instructions, packet sizes, source or destination of packets, page reference pattern, …

4 Choosing Parameters
Better to pick parameters that depend upon the workload, not upon the system
–Ex: response time is not good – depends upon the system
–Ex: size is good – depends upon the workload
Several characteristics are of interest
–Arrival time, duration, quantity of resources demanded
 Ex: network packet size
Parameters should have significant impact (exclude if little impact)
–Ex: type of Ethernet card

5 Techniques for Workload Characterization
Averaging
Specifying dispersion
Single-parameter histograms
Multi-parameter histograms
Principal-component analysis
Markov models
Clustering

6 Averaging
Characterize the workload with an average
–Ex: average number of network hops
Arithmetic mean may be inappropriate
–Ex: average hops may be a fraction
–Ex: data may be skewed
–Instead, specify the median or mode
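
As a quick illustration, here is a minimal Python sketch (the hop counts are made up) of how the mean of skewed, integer-valued data can be a fraction no real path has, while the median and mode stay meaningful:

```python
# Hypothetical hop counts; the single 12-hop path skews the data.
import statistics

hops = [1, 1, 2, 2, 2, 3, 3, 4, 12]

print(statistics.mean(hops))    # 3.33... - fractional, pulled up by the outlier
print(statistics.median(hops))  # 2 - a hop count that can actually occur
print(statistics.mode(hops))    # 2 - the most frequent value
```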

7 Specifying Dispersion
Variance and standard deviation
C.O.V. (coefficient of variation: standard deviation / mean)
Min-max range
Percentiles
SIQR (semi-interquartile range: (Q3 − Q1) / 2)
If C.O.V. is 0, all values equal the mean, so the mean alone suffices
If C.O.V. is high, may want a complete histogram (next)
–Or divide into sub-components and average only those that are similar
–Ex: average small and large packet sizes separately
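
The sketch below computes the measures just listed; the packet sizes are made up for illustration:

```python
import statistics

sizes = [64, 64, 64, 128, 576, 1500, 1500, 1500, 1500, 1500]

mean = statistics.mean(sizes)
sd = statistics.stdev(sizes)                   # sample standard deviation
cov = sd / mean                                # C.O.V. = std dev / mean
q1, q2, q3 = statistics.quantiles(sizes, n=4)  # quartiles
siqr = (q3 - q1) / 2                           # semi-interquartile range

print(f"C.O.V. = {cov:.2f}, min-max = [{min(sizes)}, {max(sizes)}], SIQR = {siqr}")
```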

8 Case Study (1 of 2)
Resource demands for programs at 6 sites
Average and C.O.V.:

Data              Average     C.O.V.
CPU time          2.19 sec    40.23
Number of writes  …           …
Number of reads   …           …

C.O.V. numbers are high!
–Indicates one class for all apps is not a good idea

9 Case Study (2 of 2)
Instead, divide into several classes
Editing sessions:

Data              Average     C.O.V.
CPU time          2.57 sec    3.54
Number of writes  …           …
Number of reads   …           …

C.O.V. numbers went down, so this looks better

10 Techniques for Workload Characterization
Averaging
Specifying dispersion
Single-parameter histograms
Multi-parameter histograms
Principal-component analysis
Markov models
Clustering

11 Single-Parameter Histograms
Shows the relative frequency of parameter values
Divide into buckets; bucket values can be used to generate workloads
Given n buckets, m parameters, k components → nmk values
–May be too much detail, so only use when variance is high
(Figure: histogram of frequency vs. CPU time)
Problem: may ignore correlation
–Ex: short jobs have low CPU and I/O, but a generator could pick low CPU and high I/O
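
A minimal sketch of the idea, assuming made-up CPU times and an arbitrary bucket count: build the histogram, then draw synthetic workload values from the bucket frequencies:

```python
import random

cpu_times = [0.2, 0.3, 0.3, 0.5, 0.8, 1.1, 1.2, 1.9, 2.4, 4.8]  # made up
n_buckets = 5
lo, hi = min(cpu_times), max(cpu_times)
width = (hi - lo) / n_buckets

# Count observations per bucket.
counts = [0] * n_buckets
for t in cpu_times:
    i = min(int((t - lo) / width), n_buckets - 1)  # clamp the max value
    counts[i] += 1

# Generate one synthetic value: pick a bucket by relative frequency,
# then a uniform value inside it.
b = random.choices(range(n_buckets), weights=counts)[0]
sample = lo + (b + random.random()) * width
print(counts, round(sample, 2))
```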

12 Multi-Parameter Histograms
If parameters are correlated, should characterize with a multi-parameter histogram
–An n-dimensional matrix; tough to graph for n > 2
–Often even more detailed than single-parameter histograms, so rarely used
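
For comparison, a minimal two-parameter sketch (made-up CPU/I-O pairs): the joint histogram keeps the correlation that two separate single-parameter histograms would lose:

```python
import numpy as np

cpu = [0.2, 0.3, 0.4, 1.8, 2.1, 2.5]  # short jobs first, long jobs last
io  = [3,   4,   2,   40,  55,  48]   # short jobs also do little I/O

counts, cpu_edges, io_edges = np.histogram2d(cpu, io, bins=3)
print(counts)  # mass sits on the diagonal: low/low and high/high cells
```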

13 Principal-Component Analysis
Goal is to reduce the number of factors
PCA transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components

14 PCA: Example (1 of 3)
Jain, p. 78: sending and receiving packets
1) Compute the mean and standard deviation
2) Normalize all points to N(0,1)
–Subtract the mean, divide by the standard deviation
3) Compute the correlation (essentially, a line through the data that minimizes the distance from the line)
–Principal component 1; rotate to be the x axis
4-7) Fit another line that also minimizes the distance from the line but is orthogonal (uncorrelated) to the first
–Using eigenvalues and eigenvectors
–Principal component 2; rotate to be the y axis

15 PCA: Example (2 of 3)
8) Compute the values of the principal components
9) Compute the sums of squares; each explains a percentage of the variation
–Ex: Factor 1 → λ1 / (λ1 + λ2), or 96%
–Ex: Factor 2 → λ2 / (λ1 + λ2), or 4%
10) Plot the values

16 PCA: Example (3 of 3)
–Use y1 to distinguish low, medium, and high load
–Not much gained from y2
(Figure: data plotted along principal components y1 and y2)
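
The steps above condense to a short sketch. The send/receive counts below are made up (not the numbers from Jain p. 78); the mechanics — normalize, eigendecompose the correlation matrix, project, report variation explained — follow the slides:

```python
import numpy as np

# Made-up (packets sent, packets received) observations.
xs = np.array([[10, 12], [7, 9], [22, 25], [14, 15],
               [3, 2], [18, 17], [9, 11], [25, 28]], dtype=float)

# Steps 1-2: normalize each parameter to zero mean, unit variance.
z = (xs - xs.mean(axis=0)) / xs.std(axis=0, ddof=1)

# Steps 3-7: eigenvectors of the correlation matrix are the principal
# components; eigenvalues carry each component's share of the variance.
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # returned in ascending order
order = eigvals.argsort()[::-1]           # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Steps 8-9: project the data and report the variation explained.
y = z @ eigvecs
print("variation explained:", eigvals / eigvals.sum())
```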

17 Techniques for Workload Characterization
Averaging
Specifying dispersion
Single-parameter histograms
Multi-parameter histograms
Principal-component analysis
Markov models
Clustering

18 Markov Models (1 of 2)
Sometimes it is important to capture not just the number of each type of request but also the order of requests
If the next request depends upon the previous request, can use a Markov model
–Actually more general: if the next state depends upon the current state
Ex: a process moving between CPU, disk, and terminal (draw diagram, Fig 6.4)

19 Markov Models (2 of 2)
Can use for application transitions
–Ex: users run editors, compilers, linkers → Markov model characterizes the probability of type j after type i
Can use for page reference locality
–Ex: probability of referencing page (or procedure) i after page (or procedure) j
But not just probability → really refers to the order of requests
–Several Markov models may have the same relative frequencies (example of this next)

20 Markov Model Example
A computer network shows packets large (20%) or small (80%); three ways to generate that mix:
1) ssssbssssbssssb
2) ssssssssbbssssssssbb
3) Generate a random number between 0 and 1; if less than 0.8, small, else large
–In option 3, the next packet does not depend upon the current one
If performance is affected by order, then measurements are needed to build a Markov model
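
A minimal sketch of the contrast: option 3 draws sizes independently, while a Markov chain makes the next size depend on the current one. The transition probabilities below are illustrative; this particular chain still yields 80% small / 20% big in the long run:

```python
import random

def independent():
    # Option 3 above: no memory, 80% small / 20% big.
    return 's' if random.random() < 0.8 else 'b'

# P(next | current); a big packet is always followed by a small one,
# so big packets arrive singly, roughly every fifth packet.
transitions = {'s': {'s': 0.75, 'b': 0.25},
               'b': {'s': 1.0,  'b': 0.0}}

def markov_next(state):
    t = transitions[state]
    return random.choices(['s', 'b'], weights=[t['s'], t['b']])[0]

print(''.join(independent() for _ in range(20)))  # order-free sequence

state, seq = 's', []
for _ in range(20):
    state = markov_next(state)
    seq.append(state)
print(''.join(seq))  # e.g. 'ssssbsssbsssssbsssbs' - no back-to-back b's
```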

21 Techniques for Workload Characterization
Averaging
Specifying dispersion
Single-parameter histograms
Multi-parameter histograms
Principal-component analysis
Markov models
Clustering

22 Clustering (1 of 2)
May have a large number of components
–Cluster so that components within a cluster are similar to each other
–Then can study one member to represent its component class
Ex: 30 jobs measured on CPU time and disk I/O, grouped into five clusters
(Figure: scatter plot of disk I/O vs. CPU time)

23 Clustering (2 of 2)
1. Take sample
2. Select parameters
3. Transform, if necessary
4. Remove outliers
5. Scale observations
6. Select distance metric
7. Perform clustering
8. Interpret
9. Change and repeat
10. Select representative components
(Each step described next)

24 Clustering: Sampling
Usually too many components to run a clustering analysis on all of them
–That is why we are doing clustering in the first place!
Select a small subset
–If chosen carefully, it will show behavior similar to the rest
May choose randomly
–However, if interested in a specific aspect, may cluster only those components
 Ex: if interested in a disk, only cluster components with high I/O

25 Clustering: Parameter Selection
Many components have a large number of parameters (resource demands)
–Some important, some not
–Remove the ones that do not matter
Two key criteria: impact on performance, and variance
–If no impact, omit. Ex: lines of output
–If little variance, omit. Ex: processes created
Method: redo the clustering with one less parameter and count the fraction of components that change cluster membership; if not many change, remove the parameter

26 Clustering: Transformation
If the distribution is skewed, may want to transform the parameter's measure
Ex: one study measured CPU time
–Two programs taking 1 and 2 seconds are as different as two programs taking 10 and 20 milliseconds
→ Take the ratio of CPU times, not the difference
(More in Chapter 15)

27 Clustering Methodology
1. Take sample
2. Select parameters
3. Transform, if necessary
4. Remove outliers
5. Scale observations
6. Select distance metric
7. Perform clustering
8. Interpret
9. Change and repeat
10. Select representative components

28 Clustering: Outliers
Data points with extreme parameter values
Can significantly affect the max or min (or mean or variance)
For normalization (scaling, next), their inclusion/exclusion may significantly affect the outcome
Only exclude those that do not consume a significant portion of resources
–Ex: extremely high-RTT flows → exclude
–Ex: extremely long (heavy-tailed) flow → include

29 Clustering: Data Scaling (1 of 3)
Final results depend upon relative ranges
–Typically scale so relative ranges are equal
–Different ways of doing this
Normalize to zero mean and unit variance
–With mean x̄_k and standard deviation s_k of the kth parameter:
 x'_ik = (x_ik − x̄_k) / s_k
–Do this for each of the k parameters

30 Clustering: Data Scaling (2 of 3)
Weights
–Assign based on relative importance
Range normalization
–Change from [x_min,k, x_max,k] to [0,1]:
 x'_ik = (x_ik − x_min,k) / (x_max,k − x_min,k)
–Ex: x_i1 ∈ {1, 6, 5, 11}: 1 → 0, 11 → 1, 6 → 0.5, 5 → 0.4
–But sensitive to outliers (say the 11 above were 101)

31 Clustering: Data Scaling (3 of 3)
Percentile normalization
–Scale so that 95% of the values fall between 0 and 1 (e.g., use the 2.5- and 97.5-percentiles in place of the min and max)
–Less sensitive to outliers
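
A minimal sketch of the three scaling options from the last three slides, applied to one parameter's (made-up) values; the percentile endpoints are an assumption consistent with "95% of values in [0, 1]":

```python
import statistics

x = [1, 6, 5, 11]

# Zero mean, unit variance: x'_ik = (x_ik - mean_k) / s_k
m, s = statistics.mean(x), statistics.stdev(x)
zscored = [(v - m) / s for v in x]

# Range normalization: x'_ik = (x_ik - min_k) / (max_k - min_k)
lo, hi = min(x), max(x)
ranged = [(v - lo) / (hi - lo) for v in x]  # 1->0, 11->1, 6->0.5, 5->0.4

# Percentile normalization: use the 2.5- and 97.5-percentiles as the
# endpoints so roughly 95% of values land in [0, 1].
qs = statistics.quantiles(x, n=40)          # qs[0]=2.5th, qs[-1]=97.5th pct.
p_lo, p_hi = qs[0], qs[-1]
percentiled = [(v - p_lo) / (p_hi - p_lo) for v in x]

print(zscored, ranged, percentiled, sep="\n")
```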

32 Clustering Methodology
1. Take sample
2. Select parameters
3. Transform, if necessary
4. Remove outliers
5. Scale observations
6. Select distance metric
7. Perform clustering
8. Interpret
9. Change and repeat
10. Select representative components

33 Clustering: Distance Metric (1 of 2)
Map each component into n-dimensional space and see which are close to each other
Euclidean distance between components {x_i1, x_i2, …, x_in} and {x_j1, x_j2, …, x_jn}:
 d(i,j) = sqrt( Σk (x_ik − x_jk)² )
Weighted Euclidean distance – assign weights a_k to the n parameters:
 d(i,j) = sqrt( Σk a_k (x_ik − x_jk)² )
–Used if values are not scaled or differ significantly in importance

34 Clustering: Distance Metric (2 of 2)
Chi-square distance
–Used in distribution fitting
–Values must be normalized, or the relative sizes will dominate the chi-square distance measure
Overall, Euclidean distance is the most commonly used
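
The three metrics in a minimal sketch (the weights and the symmetric chi-square form are illustrative choices, not prescribed by the slides):

```python
import math

def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def weighted_euclidean(xi, xj, w):
    # w[k] plays the role of the weight a_k above.
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, xi, xj)))

def chi_square(xi, xj):
    # One common symmetric form; expects normalized (distribution-like) values.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(xi, xj) if a + b > 0)

xi, xj = [2.0, 4.0], [3.0, 5.0]
print(euclidean(xi, xj))                       # sqrt(2) ~ 1.41
print(weighted_euclidean(xi, xj, [1.0, 0.5]))
print(chi_square([0.2, 0.8], [0.4, 0.6]))
```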

35 Clustering Methodology
1. Take sample
2. Select parameters
3. Transform, if necessary
4. Remove outliers
5. Scale observations
6. Select distance metric
7. Perform clustering
8. Interpret
9. Change and repeat
10. Select representative components

36 Clustering: Clustering Techniques
Partition into groups so that members of a group are as similar as possible and different groups are as dissimilar as possible
–Minimize intra-group variance, or
–Maximize inter-group variance
Two classes:
–Non-hierarchical: start with k clusters, move components around until intra-group variance is minimized
–Hierarchical:
 Start with 1 cluster, divide until k clusters
 Start with n clusters, combine until k clusters
 Ex: minimum spanning tree (shown next)

37 Clustering Techniques: Minimum Spanning Tree
Agglomerative procedure (the steps used in the example that follows):
1) Start with k = n clusters, one per component
2) Compute the centroid of each cluster
3) Compute the inter-cluster (centroid-to-centroid) distances
4) Merge the pair with the minimum distance; repeat from step 2 until the desired number of clusters remains
(Example next)

38 Minimum Spanning Tree Example (1 of 5)
Workload with 5 components (programs) and 2 parameters (CPU time, disk I/O)
–Measure CPU and I/O for each of the 5 programs:
 a {2,4}, b {3,5}, c {1,6}, d {4,3}, e {5,2}

39 Minimum Spanning Tree Example (2 of 5)
Step 1) Consider 5 clusters, with the ith cluster containing only the ith program
Step 2) The centroids are {2,4}, {3,5}, {1,6}, {4,3}, and {5,2}
(Figure: scatter plot of programs a–e, disk I/O vs. CPU time)

40 Minimum Spanning Tree Example (3 of 5)
Step 3) Compute the pairwise Euclidean distances between centroids
(Figure: scatter plot of programs a–e with pairwise distances)
Step 4) Minimum → merge: a,b and d,e are the nearest pairs (distance √2 each), so merge them into AB and DE

41 Minimum Spanning Tree Example (4 of 5)
The centroid of AB is {(2+3)/2, (4+5)/2} = {2.5, 4.5}; the centroid of DE is {4.5, 2.5}
(Figure: scatter plot with the new centroids marked ×)
Minimum → merge: AB and c are now the nearest pair, forming ABC

42 Minimum Spanning Tree Example (5 of 5)
Centroid of ABC: {(2+3+1)/3, (4+5+6)/3} = {2, 5}
Minimum → merge ABC with DE → stop (one cluster remains)
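
The whole walk-through condenses to a short sketch: start with one cluster per program, then repeatedly merge the pair of clusters whose centroids are nearest. With the data above it merges ab, then de, then abc, then everything:

```python
import math

points = {'a': (2, 4), 'b': (3, 5), 'c': (1, 6), 'd': (4, 3), 'e': (5, 2)}
clusters = {name: [name] for name in points}  # start: one cluster per program

def centroid(members):
    return (sum(points[m][0] for m in members) / len(members),
            sum(points[m][1] for m in members) / len(members))

while len(clusters) > 1:
    names = list(clusters)
    # Find the pair of clusters with the smallest centroid distance.
    i, j = min(((p, q) for p in names for q in names if p < q),
               key=lambda pair: math.dist(centroid(clusters[pair[0]]),
                                          centroid(clusters[pair[1]])))
    clusters[i + j] = clusters.pop(i) + clusters.pop(j)
    print(sorted(clusters))  # ..., then ['ab','c','de'], ['abc','de'], ['abcde']
```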

43 Representing Clustering
The spanning tree is drawn as a dendrogram
–Each branch is a cluster; the height shows where clusters merge
(Figure: dendrogram over leaves a, b, c, d, e)
Can obtain clusters for any allowable distance
–Ex: at distance 3, get {a,b,c} and {d,e}

44 Interpreting Clusters
Clusters with small populations may be discarded
–If they use few resources
–But if a cluster with 1 component uses 50% of the resources, it cannot be discarded!
Name clusters, often by their resource demands
–Ex: "CPU bound" or "I/O bound"
Select 1 or more components from each cluster as a test workload
–Can make the number selected proportional to cluster size, total resource demands, or other criteria