Lecture 2 Page 1 CS 239, Spring 2007 Metrics and Review of Basic Statistics CS 239 Experimental Methodologies for System Software Peter Reiher April 5,

Slides:

Advertisements

Similar presentations

CS433: Modeling and Simulation

Advertisements

Probability Distributions CSLU 2850.Lo1 Spring 2008 Cameron McInally Fordham University May contain work from the Creative Commons.

1 Methods of Experimental Particle Physics Alexei Safonov Lecture #21.

Copyright 2004 David J. Lilja1 What Do All of These Means Mean? Indices of central tendency Sample mean Median Mode Other means Arithmetic Harmonic Geometric.

QUANTITATIVE DATA ANALYSIS

Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.

Discrete Event Simulation How to generate RV according to a specified distribution? geometric Poisson etc. Example of a DEVS: repair problem.

OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.

Summarizing Measured Data Part I Visualization (Chap 10) Part II Data Summary (Chap 12)

Performance Evaluation

Probability and Statistics Review

CHAPTER 6 Statistical Analysis of Experimental Data

3-1 Introduction Experiment Random Random experiment.

Summarizing Measured Data

 The Law of Large Numbers – Read the preface to Chapter 7 on page 388 and be prepared to summarize the Law of Large Numbers.

Lecture 6: Descriptive Statistics: Probability, Distribution, Univariate Data.

Lecture II-2: Probability Review

Chapter 21 Random Variables Discrete: Bernoulli, Binomial, Geometric, Poisson Continuous: Uniform, Exponential, Gamma, Normal Expectation & Variance, Joint.

Elec471 Embedded Computer Systems Chapter 4, Probability and Statistics By Prof. Tim Johnson, PE Wentworth Institute of Technology Boston, MA Theory and.

Summarizing Measured Data

Measures of Central Tendency

Today: Central Tendency & Dispersion

Hydrologic Statistics

AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.

Objective To understand measures of central tendency and use them to analyze data.

Summarizing Data Chapter 12.

Statistics and Research methods Wiskunde voor HMI Betsy van Dijk.

ICOM 6115: COMPUTER SYSTEMS PERFORMANCE MEASUREMENT AND EVALUATION Nayda G. Santiago August 18, 2006.

Random Sampling, Point Estimation and Maximum Likelihood.

Summarizing Measured Data Andy Wang CIS Computer Systems Performance Analysis.

Lecture 2b: Performance Metrics. Performance Metrics Measurable characteristics of a computer system: Count of an event Duration of a time interval Size.

Worked examples and exercises are in the text STROUD (Prog. 28 in 7 th Ed) PROGRAMME 27 STATISTICS.

© 1998, Geoff Kuenning Introduction to Statistics Concentration on applied statistics Statistics appropriate for measurement Today’s lecture will cover.

Selecting Evaluation Techniques Andy Wang CIS 5930 Computer Systems Performance Analysis.

Performance Evaluation of Computer Systems Introduction

1 Performance Evaluation of Computer Systems and Networks Introduction, Outlines, Class Policy Instructor: A. Ghasemi Many thanks to Dr. Behzad Akbari.

Test Loads Andy Wang CIS Computer Systems Performance Analysis.

1 Statistical Distribution Fitting Dr. Jason Merrick.

Chapter 3 System Performance and Models. 2 Systems and Models The concept of modeling in the study of the dynamic behavior of simple system is be able.

Central Tendency Introduction to Statistics Chapter 3 Sep 1, 2009 Class #3.

Chapter 12.  When dealing with measurement or simulation, a careful experiment design and data analysis are essential for reducing costs and drawing.

LECTURER PROF.Dr. DEMIR BAYKA AUTOMOTIVE ENGINEERING LABORATORY I.

Lecture 2: Combinatorial Modeling CS 7040 Trustworthy System Design, Implementation, and Analysis Spring 2015, Dr. Rozier Adapted from slides by WHS at.

AP STATS: Take 10 minutes or so to complete your 7.1C quiz.

CPE 619 Summarizing Measured Data Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.

Lecture 2 Review Probabilities Probability Distributions Normal probability distributions Sampling distributions and estimation.

CS433 Modeling and Simulation Lecture 03 – Part 01 Probability Review 1 Dr. Anis Koubâa Al-Imam Mohammad Ibn Saud University

Relative Values. Statistical Terms n Mean:  the average of the data  sensitive to outlying data n Median:  the middle of the data  not sensitive to.

Central Tendency & Dispersion

Chapter 3 System Performance and Models Introduction A system is the part of the real world under study. Composed of a set of entities interacting.

Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.

Sampling and estimation Petter Mostad

OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.

Probability and Distributions. Deterministic vs. Random Processes In deterministic processes, the outcome can be predicted exactly in advance Eg. Force.

Chapter 20 Statistical Considerations Lecture Slides The McGraw-Hill Companies © 2012.

Lecture 3 Page 1 CS 239, Spring 2007 Variability in Data CS 239 Experimental Methodologies for System Software Peter Reiher April 10, 2007.

Measurements and Their Analysis. Introduction Note that in this chapter, we are talking about multiple measurements of the same quantity Numerical analysis.

CHAPTER – 1 UNCERTAINTIES IN MEASUREMENTS. 1.3 PARENT AND SAMPLE DISTRIBUTIONS  If we make a measurement x i in of a quantity x, we expect our observation.

Copyright © Cengage Learning. All rights reserved. 5 Joint Probability Distributions and Random Samples.

Test Loads Andy Wang CIS Computer Systems Performance Analysis.

AP Statistics From Randomness to Probability Chapter 14.

Confidence Intervals Cont.

MECH 373 Instrumentation and Measurements

OPERATING SYSTEMS CS 3502 Fall 2017

Random variables (r.v.) Random variable

Selecting Evaluation Techniques

CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.

M248: Analyzing data Block A UNIT A3 Modeling Variation.

CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.

Presentation transcript:

Lecture 2 Page 1 CS 239, Spring 2007 Metrics and Review of Basic Statistics CS 239 Experimental Methodologies for System Software Peter Reiher April 5, 2007

Lecture 2 Page 2 CS 239, Spring 2007 Notes for next time Lecture is about 15 minutes short –Maybe add back some of the cut material on SMART Add slides showing mean, medien, mode for ping experiment results

Lecture 2 Page 3 CS 239, Spring 2007 Introduction Metrics Why are we talking about statistics? Important statistics concepts Indices of central tendency

Lecture 2 Page 4 CS 239, Spring 2007 Metrics A metric is a measurable quantity For our purposes, one whose value describes an important phenomenon Most of performance evaluation is about properly gathering metrics

Lecture 2 Page 5 CS 239, Spring 2007 Common Types of Metrics Duration/ response time –How long did the simulation run? Processing rate –How many transactions per second? Resource consumption –How much disk is currently used? Error rates –How often did the system crash? What metrics can we use to describe security?

Lecture 2 Page 6 CS 239, Spring 2007 Examples of Response Time Time from keystroke to echo on screen End-to-end packet delay in networks OS bootstrap time Leaving UCLA to getting on 405

Lecture 2 Page 7 CS 239, Spring 2007 Some Measures of Response Time Response time: request-response interval –Measured from end of request –Ambiguous: beginning or end of response? Reaction time: end of request to start of processing Turnaround time: start of request to end of response

Lecture 2 Page 8 CS 239, Spring 2007 Processing Rate How much work is done per unit time? Important for: –Provisioning systems –Comparing alternative configurations –Multimedia

Lecture 2 Page 9 CS 239, Spring 2007 Examples of Processing Rate Bank transactions per hour Packets routed per second Web pages crawled per night

Lecture 2 Page 10 CS 239, Spring 2007 Common Measures of Processing Rate Throughput: requests per unit time: MIPS, MFLOPS, Mb/s, TPS Nominal capacity: theoretical maximum: bandwidth Knee capacity: where things go bad Usable capacity: where response time hits a specified limit Efficiency: ratio of usable to nominal cap.

Lecture 2 Page 11 CS 239, Spring 2007 Nominal, Knee, and Usable Capacities Response-Time Limit Knee Usable Capacity Knee Cap. Nominal Capacity Load Delay

Lecture 2 Page 12 CS 239, Spring 2007 Resource Consumption How much does the work cost? Used in: –Capacity planning –Identifying bottlenecks Also helps to identify “next” bottleneck

Lecture 2 Page 13 CS 239, Spring 2007 Examples of Resource Consumption CPU non-idle time Memory use Fraction of network bandwidth needed How much of your salary is paid for rent

Lecture 2 Page 14 CS 239, Spring 2007 Measures of Resource Consumption Utilization: where u(t) is instantaneous resource usage –Useful for memory, disk, etc. If u(t) is always either 1 or 0, reduces to busy time or its inverse, idle time –Useful for network, CPU, etc.

Lecture 2 Page 15 CS 239, Spring 2007 Error Metrics Failure rates Probability of failures Time to failure

Lecture 2 Page 16 CS 239, Spring 2007 Examples of Error Metrics Percentage of dropped Internet packets ATM down time Lifetime of a component Wrong answers from IRS tax preparation hotline

Lecture 2 Page 17 CS 239, Spring 2007 Measures of Errors Reliability: P(error) or Mean Time Between Errors (MTBE) Availability: –Downtime: Time when system is unavailable, may be measured as Mean Time to Repair (MTTR) –Uptime: Inverse of downtime, often given as Mean Time Between Failures (MTBF/MTTF)

Lecture 2 Page 18 CS 239, Spring 2007 Security Metrics A difficult problem Often no good metrics to express security goals and achievements –Equally bad, some definable metrics are impossible to measure Some failure metrics are applicable –Expected time to break a cipher

Lecture 2 Page 19 CS 239, Spring 2007 Choosing What to Measure Core question in any performance study Pick metrics based on: –Completeness –(Non-)redundancy –Variability –Feasibility

Lecture 2 Page 20 CS 239, Spring 2007 Completeness Must cover everything relevant to problem –Don’t want awkward questions at conferences! Difficult to guess everything a priori –Often have to add things later

Lecture 2 Page 21 CS 239, Spring 2007 Redundancy Some factors are functions of others Measurements are expensive Look for minimal set Again, often an interactive process

Lecture 2 Page 22 CS 239, Spring 2007 Variability Large variance in a measurement makes decisions impossible Repeated experiments can reduce variance –Very expensive –Can only reduce it by a certain amount Better to choose low-variance measures to start with

Lecture 2 Page 23 CS 239, Spring 2007 Feasibility Some things are easy to measure Others are hard A few are impossible Choose metrics you can actually measure But beware of the “drunk under the streetlamp” phenomenon

Lecture 2 Page 24 CS 239, Spring 2007 Variability and Performance Measurements Performance of a system is often complex –Perhaps not fully explainable One result is variability in most metric readings Good performance measurement takes this into account

Lecture 2 Page 25 CS 239, Spring 2007 An Example 10 pings from UCLA to MIT Tuesday night Each took a different amount of time (expressed in msec): How do we understand what this says about how long a packet takes to get from LA to Boston?

Lecture 2 Page 26 CS 239, Spring 2007 How to Get a Handle on Variability? If something we’re trying to measure varies from run to run, how do we express its behavior? That’s what statistics is all about Which is why a good performance analyst needs to understand them

Lecture 2 Page 27 CS 239, Spring 2007 Some Basic Statistics Concepts Independence of events Random variables Cumulative distribution functions (CDFs)

Lecture 2 Page 28 CS 239, Spring 2007 Independent Events Events are independent if: –Occurrence of one event doesn’t affect probability of other Examples: –Coin flips –Inputs from separate users –“Unrelated” traffic accidents

Lecture 2 Page 29 CS 239, Spring 2007 Non-Independent Events Not all events are independent Second person accessing a web page might get it faster than the first –Or than someone asking for it the next day Kids requesting money from their parents –Sooner or later the wallet is empty

Lecture 2 Page 30 CS 239, Spring 2007 Random Variables Variable that takes values probabilistically –Not necessarily just any value, though Variable usually denoted by capital letters, particular values by lowercase Examples: –Number shown on dice –Network delay –CS239 attendance

Lecture 2 Page 31 CS 239, Spring 2007 Cumulative Distribution Function (CDF) Maps a value a of random variable X to probability that the outcome is less than or equal to a: Valid for discrete and continuous variables Monotonically increasing Easy to specify, calculate, measure

Lecture 2 Page 32 CS 239, Spring 2007 CDF Examples Coin flip (T = 1, H = 2): Exponential packet interarrival times:

Lecture 2 Page 33 CS 239, Spring 2007 Probability Density Function (pdf) A “relative” of CDF Derivative of (continuous) CDF: Useful to find probability of a range:

Lecture 2 Page 34 CS 239, Spring 2007 Examples of pdf Exponential interarrival times: Gaussian (normal) distribution:

Lecture 2 Page 35 CS 239, Spring 2007 Probability Mass Function (pmf) PDF doesn’t exist for discrete random variables –Because their CDF not differentiable pmf instead: f(x i ) = p i where p i is the probability that x will take on value x i

Lecture 2 Page 36 CS 239, Spring 2007 Examples of pmf Coin flip: Typical CS grad class size:

Lecture 2 Page 37 CS 239, Spring 2007 Summarizing Data With a Single Number Most condensed form of presentation of set of data Usually called the average –Average isn’t necessarily the mean More formal term is index of central tendency Must be representative of a major part of the data set

Lecture 2 Page 38 CS 239, Spring 2007 Indices of Central Tendencies Specify center of location of the distribution of the observations in the sample Common examples: –Mean –Median –Mode

Lecture 2 Page 39 CS 239, Spring 2007 Sample Indices The mean (or other index) is the mean of all possible elements of random variable You usually don’t test them all The mean of the ones you test is the sample mean Sample mean ≠ mean –For a different set of samples, you get a different sample mean

Lecture 2 Page 40 CS 239, Spring 2007 An Example If we assign value 1 to red and value 2 to blue, mean value in jar is 1.5 Sample mean is 1.75 Many different sample means would be possible

Lecture 2 Page 41 CS 239, Spring 2007 Sample Mean Take sum of all observations Divide by the number of observations More affected by outliers than median or mode Mean is a linear property –Mean of sum is sum of means –Not true for median and mode

Lecture 2 Page 42 CS 239, Spring 2007 Sample Median Sort the observations –In increasing order Take the observation in the middle of the series More resistant to outliers –But not all points given “equal weight”

Lecture 2 Page 43 CS 239, Spring 2007 Sample Mode Plot a histogram of the observations –Using existing categories –Or dividing ranges into buckets Choose the midpoint of the bucket where the histogram peaks –For categorical variables, the most frequently occurring Effectively ignores much of the sample

Lecture 2 Page 44 CS 239, Spring 2007 Characteristics of Mean, Median, and Mode Mean and median always exist and are unique Mode may or may not exist –If there is a mode, there may be more than one Mean, median and mode may be identical –Or may all be different –Or some of them may be the same

Lecture 2 Page 45 CS 239, Spring 2007 Mean, Median, and Mode Identical Median Mean Mode x pdf f(x)

Lecture 2 Page 46 CS 239, Spring 2007 Median, Mean, and Mode All Different Mean Median Mode pdf f(x) x

Lecture 2 Page 47 CS 239, Spring 2007 So, Which Should I Use? Depends on characteristics of the metric –If data is categorical, use mode –If a total of all observations makes sense, use mean –If not, and the distribution is skewed, use median –Otherwise, use mean But think about what you’re choosing

Lecture 2 Page 48 CS 239, Spring 2007 Some Examples Most-used resource in system –Mode –Ficus replica that receives the most original updates Interarrival times –Mean –Time between file access requests in Conquest Load –Median –Number of packets a DefCOM classifier handles per second

Lecture 2 Page 49 CS 239, Spring 2007 Don’t Always Use the Mean Means are often overused and misused –Means of significantly different values –Means of highly skewed distributions –Multiplying means to get mean of a product Only works for independent variables Errors in taking ratios of means

Lecture 2 Page 50 CS 239, Spring 2007 Geometric Means An alternative to the arithmetic mean Use geometric mean if product of observations makes sense

Lecture 2 Page 51 CS 239, Spring 2007 Good Places To Use Geometric Mean Layered architectures Performance improvements over successive versions Average error rate on multihop network path

Lecture 2 Page 52 CS 239, Spring 2007 Harmonic Mean Harmonic mean of sample {x 1, x 2,..., x n } is Use when arithmetic mean of 1/x 1 is sensible

Lecture 2 Page 53 CS 239, Spring 2007 Example of Using Harmonic Mean When working with MIPS numbers from a single benchmark –Since MIPS calculated by dividing constant number of instructions by elapsed time Not valid if different m’s (e.g., different benchmarks for each observation) m x i = titi

Lecture 2 Page 54 CS 239, Spring 2007 Means of Ratios Given n ratios, how do you summarize them? Can’t always just use harmonic mean –Or similar simple method Consider numerators and denominators

Lecture 2 Page 55 CS 239, Spring 2007 Considering Mean of Ratios: Case 1 Both numerator and denominator have physical meaning Then the average of the ratios is the ratio of the averages

Lecture 2 Page 56 CS 239, Spring 2007 Example: CPU Utilizations MeasurementCPU Duration Busy (%) Sum200 % Mean?

Lecture 2 Page 57 CS 239, Spring 2007 Mean for CPU Utilizations MeasurementCPU Duration Busy (%) Sum200 % Mean? Not 40%

Lecture 2 Page 58 CS 239, Spring 2007 Properly Calculating Mean For CPU Utilization Why not 40%? Because CPU Busy percentages are ratios –And their denominators aren’t comparable The duration-100 observation must be weighed more heavily than the duration-1 observations

Lecture 2 Page 59 CS 239, Spring 2007 So What Is the Proper Average? Go back to the original ratios Mean CPU Utilization = = 21 %

Lecture 2 Page 60 CS 239, Spring 2007 Considering Mean of Ratios: Case 1a Sum of numerators has physical meaning, denominator is a constant Take the arithmetic mean of the ratios to get the mean of the ratios

Lecture 2 Page 61 CS 239, Spring 2007 For Example, What if we calculated the CPU utilization from the last example using only the four duration 1 measurements? Then the average is 1414 ( ) =.45

Lecture 2 Page 62 CS 239, Spring 2007 Considering Mean of Ratios: Case 1b Sum of the denominators has a physical meaning, numerator is a constant Take the harmonic mean of the ratios

Lecture 2 Page 63 CS 239, Spring 2007 Considering Mean of Ratios: Case 2 The numerator and denominator are expected to have a multiplicative, near- constant property a i = c b i Estimate c with geometric mean of a i /b i

Lecture 2 Page 64 CS 239, Spring 2007 Example for Case 2 An optimizer reduces the size of code What is the average reduction in size, based on its observed performance on several different programs? Proper metric is percent reduction in size And we’re looking for a constant c as the average reduction

Lecture 2 Page 65 CS 239, Spring 2007 Program Optimizer Example, Continued Code Size ProgramBeforeAfterRatio BubbleP IntmmP PermP PuzzleP QueenP QuickP SieveP TowersP

Lecture 2 Page 66 CS 239, Spring 2007 Why Not Use Ratio of Sums? Why not add up pre-optimized sizes and post-optimized sizes and take the ratio? –Benchmarks of non-comparable size –No indication of importance of each benchmark in overall code mix When looking for constant factor, not the best method

Lecture 2 Page 67 CS 239, Spring 2007 So Use the Geometric Mean Multiply the ratios from the 8 benchmarks Then take the 1/8 power of the result =.82