1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.

Slides:



Advertisements
Similar presentations
Review bootstrap and permutation
Advertisements

I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Computational Statistics. Basic ideas  Predict values that are hard to measure irl, by using co-variables (other properties from the same measurement.
Hypothesis testing and confidence intervals by resampling by J. Kárász.
October 1999 Statistical Methods for Computer Science Marie desJardins CMSC 601 April 9, 2012 Material adapted.
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
Sampling: Final and Initial Sample Size Determination
Sampling Distributions (§ )
Chap 10: Summarizing Data 10.1: INTRO: Univariate/multivariate data (random samples or batches) can be described using procedures to reveal their structures.
ELEC 303 – Random Signals Lecture 18 – Statistics, Confidence Intervals Dr. Farinaz Koushanfar ECE Dept., Rice University Nov 10, 2009.
Tests Jean-Yves Le Boudec. Contents 1.The Neyman Pearson framework 2.Likelihood Ratio Tests 3.ANOVA 4.Asymptotic Results 5.Other Tests 1.
1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.
Model Fitting Jean-Yves Le Boudec 0. Contents 1 Virus Infection Data We would like to capture the growth of infected hosts (explanatory model) An exponential.
Copyright 2004 David J. Lilja1 What Do All of These Means Mean? Indices of central tendency Sample mean Median Mode Other means Arithmetic Harmonic Geometric.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Regression Analysis. Unscheduled Maintenance Issue: l 36 flight squadrons l Each experiences unscheduled maintenance actions (UMAs) l UMAs costs $1000.
Point and Confidence Interval Estimation of a Population Proportion, p
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
2008 Chingchun 1 Bootstrap Chingchun Huang ( 黃敬群 ) Vision Lab, NCTU.
Chapter 7 Estimation: Single Population
Tests Jean-Yves Le Boudec. Contents 1.The Neyman Pearson framework 2.Likelihood Ratio Tests 3.ANOVA 4.Asymptotic Results 5.Other Tests 1.
AP Statistics Section 10.2 A CI for Population Mean When is Unknown.
Statistical Methods in Computer Science Hypothesis Testing I: Treatment experiment designs Ido Dagan.
BCOR 1020 Business Statistics
Paul Cornwell March 31,  Let X 1,…,X n be independent, identically distributed random variables with positive variance. Averages of these variables.
Standard error of estimate & Confidence interval.
Bootstrapping applied to t-tests
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
Chapter 4 – Modeling Basic Operations and Inputs  Structural modeling: what we’ve done so far ◦ Logical aspects – entities, resources, paths, etc. 
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Statistics & Biology Shelly’s Super Happy Fun Times February 7, 2012 Will Herrick.
Topic 5 Statistical inference: point and interval estimate
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Review of Chapters 1- 5 We review some important themes from the first 5 chapters 1.Introduction Statistics- Set of methods for collecting/analyzing data.
Sampling Distribution ● Tells what values a sample statistic (such as sample proportion) takes and how often it takes those values in repeated sampling.
Stat 13, Tue 5/8/ Collect HW Central limit theorem. 3. CLT for 0-1 events. 4. Examples. 5.  versus  /√n. 6. Assumptions. Read ch. 5 and 6.
University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/ :23 PM 1 Some basic statistical concepts, statistics.
1 Statistical Distribution Fitting Dr. Jason Merrick.
10.1: Confidence Intervals – The Basics. Introduction Is caffeine dependence real? What proportion of college students engage in binge drinking? How do.
Chapter 7: Sample Variability Empirical Distribution of Sample Means.
1 Lecture 16: Point Estimation Concepts and Methods Devore, Ch
Week 6 October 6-10 Four Mini-Lectures QMM 510 Fall 2014.
Statistics : Statistical Inference Krishna.V.Palem Kenneth and Audrey Kennedy Professor of Computing Department of Computer Science, Rice University 1.
Announcements First quiz next Monday (Week 3) at 6:15-6:45 Summary:  Recap first lecture: Descriptive statistics – Measures of center and spread  Normal.
1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.
Statistics What is statistics? Where are statistics used?
Bootstrap Event Study Tests Peter Westfall ISQS Dept. Joint work with Scott Hein, Finance.
Summarizing Risk Analysis Results To quantify the risk of an output variable, 3 properties must be estimated: A measure of central tendency (e.g. µ ) A.
Modern Approaches The Bootstrap with Inferential Example.
Project Plan Task 8 and VERSUS2 Installation problems Anatoly Myravyev and Anastasia Bundel, Hydrometcenter of Russia March 2010.
1 STAT 500 – Statistics for Managers STAT 500 Statistics for Managers.
Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.
The accuracy of averages We learned how to make inference from the sample to the population: Counting the percentages. Here we begin to learn how to make.
Confidence Intervals. Point Estimate u A specific numerical value estimate of a parameter. u The best point estimate for the population mean is the sample.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Estimating standard error using bootstrap
Confidence Intervals Cont.
STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample
Application of the Bootstrap Estimating a Population Mean
Confidence Intervals Copyright  2008 by The McGraw-Hill Companies. This material is intended for educational purposes by licensed users of LearningStats.
Sampling Distributions and Estimation
Stat 31, Section 1, Last Time Sampling Distributions
Statistical Methods For Engineers
Tutorial 9 Suppose that a random sample of size 10 is drawn from a normal distribution with mean 10 and variance 4. Find the following probabilities:
Data Analysis Statistical Measures Industrial Engineering
Sampling Distributions (§ )
Simulation Berlin Chen
Introductory Statistics
Presentation transcript:

1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content

Contents 1.Summarized data 2.Confidence Intervals 3.Independence Assumption 4.Prediction Intervals 5.Which Summarization to Use ? 2

3 1 Summarizing Performance Data How do you quantify: Central value Dispersion (Variability) old new

4 Histogram is one answer old new

5 ECDF allow easy comparison old new

6 Summarized Measures Median, Quantiles Median Quartiles P-quantiles Mean and standard deviation Mean Standard deviation What is the interpretation of standard deviation ? A: if data is normally distributed, with 95% probability, a new data sample lies in the interval

Example 7 mean and standard deviation quantiles

8 Coefficient of Variation Summarizes Variability Scale free Second order For a data set with n samples Exponential distribution: CoV =1 What does CoV = 0 mean ?

Which Summarization Should One Use ? There are (too) many synthetic indices to choose from Traditional measures in engineering are standard deviation, mean and CoV Traditional measures in computer science are mean and JFI JFI is equivalent to CoV In economy, gap and Gini’s index (a variant of Lorenz curve gap) Statisticians like medians and quantiles (robust to statistical assumptions) We will come back to the issue after discussing confidence intervals 9

10 2. Confidence Interval Do not confuse with prediction interval Quantifies uncertainty about an estimation

11 mean and standard deviation quantiles

12 Confidence Intervals for Mean of Difference Mean reduction = 0 is outside the confidence intervals for mean and for median Confidence interval for median

13 Computing Confidence Intervals This is simple if we can assume that the data comes from an iid model Independent Identically Distributed

14 CI for mean and Standard Deviation This is another method, most commonly used method… But requires some assumptions to hold, may be misleading if they do not hold There is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.

15 CI for mean, asymptotic case If central limit theorem holds (in practice: n is large and distribution is not “wild”)

16 Example

17 Normal Case Assume data comes from an iid + normal distribution Useful for very small data samples (n <30)

18 Example n =100 ; 95% confidence level CI for mean: CI for standard deviation: same as before except s instead of   for all n instead of 1.98 for n=100 In practice both (normal case and large n asymptotic) are the same if n > 30 But large n asymptotic does not require normal assumption

19 Tables in [Weber-Tables]

20 Standard Deviation: n or n-1 ?

21 Bootstrap Percentile Method A heuristic that is robust (requires only iid assumption) But be careful with heavy tail, see next but tends to underestimate CI Simple to implement with a computer Idea: use the empirical distribution in place of the theoretical (unknown) distribution For example, with confidence level = 95%: the data set is S= Do r=1 to r=999 (replay experiment) Draw n bootstrap replicates with replacement from S Compute sample mean T r Bootstrap percentile estimate is (T (25), T (975) )

22 Example: Compiler Options Does data look normal ? No Methods and give same result (n >30) Method (Bootstrap) gives same result => Asymptotic assumption valid

Confidence Interval for Fairness Index Use bootstrap if data is iid 23

24

25

26

1.[0 ; 0] 2.[0 ; 0.1] 3.[0 ; 0.11] 4.[0 ; 0.21] 5.[0; 0.31] 27

Confidence Interval for Success Probability 28

29

30

31

32 Take Home Message Confidence interval for median (or other quantiles) is easy to get from the Binomial distribution Requires iid No other assumption Confidence interval for the mean Requires iid And Either if data sample is normal and n is small Or data sample is not wild and n is large enough The boostrap is more robust and more general but is more than a simple formula to apply Confidence interval for success probability requires special attention when success or failure is rare To we need to verify the assumptions

3. The Independence Assumption 33 Confidence Intervals require that we can assume that the data comes from an iid model Independent Identically Distributed How do I know if this is true ? Controlled experiments: draw factors randomly with replacement Simulation: independent replications (with random seeds) Else: we do not know – in some cases we will have methods for time series

34 What does independence mean ?

Example Pretend data is iid: CI for mean is [69; 69.8] Is this biased ? 35 dataACF

What happens if data is not iid ? If data is positively correlated Neighbouring values look similar Frequent in measurements CI is underestimated: there is less information in the data than one thinks 36

37 4. Prediction Interval CI for mean or median summarize Central value + uncertainty about it Prediction interval summarizes variability of data

38 Prediction Interval based on Order Statistic Assume data comes from an iid model Simplest and most robust result (not well known, though):

39 Prediction Interval for small n For n=39, [x min, x max ] is a prediction interval at level 95% For n <39 there is no prediction interval at level 95% with this method But there is one at level 90% for n > 18 For n = 10 we have a prediction interval [x min, x max ] at level 81%

Prediction Interval based on Mean 40

Prediction Interval based on Mean If data is not normal, there is no general result – bootstrap can be used If data is assumed normal, how do CI for mean and Prediction Interval based on mean compare ? 41

Prediction Interval based on Mean 42

43 Re-Scaling Many results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to have it look more normal. Ex: log of the data instead of the data A generic transformation used in statistics is the Box-Cox transformation: Continuous in s s=0 : log s=-1: 1/x s=1: identity

44 Prediction Intervals for File Transfer Times mean and standard deviation on rescaled data mean and standard deviation order statistic

45 Which Summarization Should I Use ? Two issues Robustness to outliers Compactness

46 QQplot is common tool for verifying assumption Normal Qqplot X-axis: standard normal quantiles Y-axis: Ordered statistic of sample: If data comes from a normal distribution, qqplot is close to a straight line (except for end points) Visual inspection is often enough If not possible or doubtful, we will use tests later

47 QQPlots of File Transfer Times

48 Take Home Message The interpretation of  as measure of variability is meaningful if the data is normal (or close to normal). Else, it is misleading. The data should be best re- scaled.

5. Which Summarization to Use ? Issues Robustness to outliers Distribution assumptions 49

A Distribution with Infinite Variance 50 True mean True median True mean True median CI based on std dv CI based on bootsrp CI for median

51 Outlier in File Transfer Time

52 Robustness of Conf/Prediction Intervals mean + std dev CI for median geom mean Outlier removed Outlier present Order stat Based on mean + std dev Based on mean + std dev + re-scaling

Fairness Indices Confidence Intervals obtained by Bootstrap How ? JFI is very dependent on one outlier As expected, since JFI is essentially CoV, i.e. standard deviation Gap is sensitive, but less Does not use squaring ; why ? 53

54 Compactness If normal assumption (or, for CI; asymptotic regime) holds,  and  are more compact two values give both: CIs at all levels, prediction intervals Derived indices: CoV, JFI In contrast, CIs for median does not give information on variability Prediction interval based on order statistic is robust (and, IMHO, best)

55 Take-Home Message Use methods that you understand Mean and standard deviation make sense when data sets are not wild Close to normal, or not heavy tailed and large data sample Use quantiles and order statistics if you have the choice Rescale

Questions 56

Questions 57

Questions 58