Experiment Design Overview The choice of design depends on the number of factors and their levels: one factor with n levels takes a one-factor design (n x r observations with r replications) if the levels are categorical, or regression models if they are numerical; k factors at two levels each (min/max) take a 2^k design for effects and interactions, a 2^(k-p) fractional factorial when interactions are weak, or a 2^k r design with replications to estimate errors; two or more factors with several categorical levels each take a full factorial design, assuming the effects are additive. Analysis to get: Main effects Interactions/errors % of variation explained Significance (CI or ANOVA)

© 1998, Geoff Kuenning Two-Factor Full Factorial Design Without Replications Used when you have only two parameters But multiple levels for each Test all combinations of the levels of the two parameters At this point, without replicating any observations For factors A and B with a and b levels, ab experiments required

Experimental Design The design is the cross product (l_{1,0}, l_{1,1}, …, l_{1,n1-1}) x (l_{2,0}, l_{2,1}, …, l_{2,n2-1}) of the levels of Factor 1 and Factor 2: two different factors, each factor i with n_i categorical levels (and possibly r replications, which we defer for now).
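A short illustration (not from the slides): such a design can be enumerated in Python as a cross product of the level lists. The level names below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical level names for two categorical factors; placeholders,
# not the levels of the example used later in these slides.
factor_a_levels = ["A0", "A1", "A2"]   # l_{1,0} ... l_{1,n1-1}
factor_b_levels = ["B0", "B1", "B2"]   # l_{2,0} ... l_{2,n2-1}

# Full factorial design without replications: one experiment per
# combination of levels, a * b experiments in total.
design = list(product(factor_a_levels, factor_b_levels))

print(len(design))            # 9 experiments for 3 x 3 levels
for a_level, b_level in design:
    print(a_level, b_level)
```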

© 1998, Geoff Kuenning What is This Design Good For? Systems that have two important factors Factors are categorical More than two levels for at least one factor Examples - –Performance of different processors under different workloads –Characteristics of different compilers for different benchmarks –Effects of different reconciliation topologies and workloads on a replicated file system

© 1998, Geoff Kuenning What Isn’t This Design Good For? Systems with more than two important factors –Use general factorial design Non-categorical variables –Use regression Only two levels –Use 2^2 designs

© 1998, Geoff Kuenning Model For This Design y_ij = μ + α_j + β_i + e_ij, where y_ij is the observation, μ is the mean response, α_j is the effect of factor A at level j, β_i is the effect of factor B at level i, and e_ij is an error term. The sums of the α_j's and of the β_i's are both zero.

© 1998, Geoff Kuenning What Are the Model’s Assumptions? Factors are additive Errors are additive Typical assumptions about errors –Distributed independently of factor levels –Normally distributed Remember to check these assumptions!

© 1998, Geoff Kuenning Computing the Effects Need to figure out μ, the α_j's, and the β_i's Arrange observations in a two-dimensional matrix –With b rows and a columns Compute effects such that the error has zero mean –Sum of error terms across all rows and columns is zero

© 1998, Geoff Kuenning Two-Factor Full Factorial Example We want to expand the functionality of a file system to allow automatic compression We examine three choices - –Library substitution of file system calls –A new VFS –UCLA stackable layers Using three different benchmarks –With response time as the metric

© 1998, Geoff Kuenning Sample Data for Our Example [table: measured response times for the Compile, …, and Web Server benchmarks (rows) under the Library, VFS, and Layers alternatives (columns)]

© 1998, Geoff Kuenning Computing μ Averaging the j-th column gives ȳ·j = μ + α_j, because by assumption the error terms add to zero and the β_i's also add to zero; so α_j = ȳ·j − μ. Averaging rows similarly produces β_i = ȳi· − μ. Averaging everything produces μ = ȳ··, the grand mean.

© 1998, Geoff Kuenning So the Parameters Are... μ = ȳ··, α_j = ȳ·j − μ, β_i = ȳi· − μ
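A minimal Python sketch of these parameter calculations, assuming the observations are arranged in a b x a matrix as described above; the numbers are made up for illustration and are not the data from the example.

```python
import numpy as np

# Hypothetical b x a matrix of observations: rows are levels of factor B,
# columns are levels of factor A. The values are illustrative only.
y = np.array([[ 90.0,  85.0, 100.0],
              [210.0, 200.0, 235.0],
              [700.0, 690.0, 740.0]])

mu = y.mean()                    # grand mean:      mu      = y-bar(..)
alpha = y.mean(axis=0) - mu      # column effects:  alpha_j = y-bar(.j) - mu
beta = y.mean(axis=1) - mu       # row effects:     beta_i  = y-bar(i.) - mu

# Residual errors; they sum to zero across every row and every column.
e = y - (mu + alpha[np.newaxis, :] + beta[:, np.newaxis])

print(mu, alpha, beta)
print(np.allclose(alpha.sum(), 0.0), np.allclose(beta.sum(), 0.0))
```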

© 1998, Geoff Kuenning Sample Data for Our Example [the same table, revisited with the row, column, and overall averages used to compute the parameters]

© 1998, Geoff Kuenning Calculating Parameters for Our Example μ = grand mean ≈ 357.4 α_j = (−6.5, −16.3, 22.8) β_i = (−264.1, −122.8, 386.9), where the middle β_i follows from the zero-sum constraint. So, for example, the model predicts that a given benchmark run on the special-purpose VFS will take μ + β_i − 16.3 seconds.

© 1998, Geoff Kuenning Estimating Experimental Errors Similar to estimation of errors in previous designs Take the difference between the model’s predictions and the observations Calculate a Sum of Squared Errors Then allocate the variation

© 1998, Geoff Kuenning Allocating Variation Using the same kind of procedure we’ve used on other models: SSY = SS0 + SSA + SSB + SSE, and SST = SSY − SS0. We can then divide the total variation among SSA, SSB, and SSE.

© 1998, Geoff Kuenning Calculating SS0, SSA, and SSB SS0 = ab·μ², SSA = b·Σ_j α_j², and SSB = a·Σ_i β_i², where a and b are the number of levels for the factors. SSY = Σ_ij y_ij², so SSE = SSY − SS0 − SSA − SSB.
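A sketch of these sums of squares in Python, under the same conventions as before (b x a matrix of observations; the numbers are made up, not the slide data):

```python
import numpy as np

# Hypothetical b x a matrix of observations (illustrative values only).
y = np.array([[ 90.0,  85.0, 100.0],
              [210.0, 200.0, 235.0],
              [700.0, 690.0, 740.0]])
b, a = y.shape

mu = y.mean()
alpha = y.mean(axis=0) - mu
beta = y.mean(axis=1) - mu

ssy = (y ** 2).sum()            # sum of squares of all observations
ss0 = a * b * mu ** 2           # sum of squares due to the grand mean
ssa = b * (alpha ** 2).sum()    # variation explained by factor A
ssb = a * (beta ** 2).sum()     # variation explained by factor B
sse = ssy - ss0 - ssa - ssb     # unexplained (error) variation
sst = ssy - ss0                 # total variation

print("A: %.2f%%  B: %.2f%%  errors: %.2f%%"
      % (100 * ssa / sst, 100 * ssb / sst, 100 * sse / sst))
```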

© 1998, Geoff Kuenning Allocation of Variation For Our Example SSE = 2512 SSY = 1,858,390 SS0 = 1,149,827 SSA = 2489 SSB = 703,561 SST = 708,562 Percent variation due to A: 0.35% Percent variation due to B: 99.3% Percent variation due to errors: 0.35%

© 1998, Geoff Kuenning Analysis of Variation Again, similar to previous models –With slight modifications As before, use an ANOVA procedure –With an extra row for the second factor –And changes in degrees of freedom But the end steps are the same –Compare F-computed to F-table –Compare for each factor

© 1998, Geoff Kuenning Analysis of Variation for Our Example MSE = SSE/[(a-1)(b-1)]=2512/[(2)(2)]=628 MSA = SSA/(a-1) = 2489/2 = 1244 MSB = SSB/(b-1) = 703,561/2 = 351,780 F-computed for A = MSA/MSE = 1.98 F-computed for B = MSB/MSE = 560 The 95% F-table value for A & B = 6.94 A is not significant, B is
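The same ANOVA arithmetic, sketched in Python with the sums of squares quoted above; scipy's F distribution supplies the 95% table value.

```python
from scipy.stats import f

# Sums of squares and factor sizes from the example in the slides.
ssa, ssb, sse = 2489.0, 703561.0, 2512.0
a, b = 3, 3
dof_err = (a - 1) * (b - 1)          # 4 degrees of freedom for the errors

mse = sse / dof_err                  # 628
msa = ssa / (a - 1)                  # about 1244
msb = ssb / (b - 1)                  # about 351,780

f_a = msa / mse                      # about 1.98
f_b = msb / mse                      # about 560

# 95% F-table value with 2 numerator and 4 denominator degrees of freedom.
f_crit = f.ppf(0.95, a - 1, dof_err)     # about 6.94

print("A significant:", f_a > f_crit)    # False
print("B significant:", f_b > f_crit)    # True
```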

© 1998, Geoff Kuenning Checking Our Results With Visual Tests As always, check if the assumptions made by this analysis are correct Using the residuals vs. predicted and quantile-quantile plots
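One way to produce these two diagnostic plots, sketched in Python (matplotlib and scipy assumed available; the data matrix is the same made-up example used earlier, not the slide data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical b x a matrix of observations (illustrative values only).
y = np.array([[ 90.0,  85.0, 100.0],
              [210.0, 200.0, 235.0],
              [700.0, 690.0, 740.0]])

mu = y.mean()
alpha = y.mean(axis=0) - mu
beta = y.mean(axis=1) - mu
predicted = mu + alpha[np.newaxis, :] + beta[:, np.newaxis]
residuals = (y - predicted).ravel()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. predicted response: look for trends or growing spread.
ax1.scatter(predicted.ravel(), residuals)
ax1.axhline(0.0, linestyle="--")
ax1.set_xlabel("Predicted response")
ax1.set_ylabel("Residual")

# Normal quantile-quantile plot: points close to the line support the
# assumption of normally distributed errors.
stats.probplot(residuals, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()
```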

© 1998, Geoff Kuenning Residuals Vs. Predicted Response for Example

© 1998, Geoff Kuenning What Does This Chart Tell Us? Do we or don’t we see a trend in the errors? Clearly they’re higher at the highest level of the predictors But is that alone enough to call a trend? –Perhaps not, but we should take a close look at both the factors to see if there’s a reason to look further –And take results with a grain of salt

© 1998, Geoff Kuenning Quantile-Quantile Plot for Example

© 1998, Geoff Kuenning Confidence Intervals for Effects Need to determine the standard deviation for the data as a whole From which standard deviations for the effects can be derived –Using different degrees of freedom for each Complete table in Jain, pg. 351

© 1998, Geoff Kuenning Standard Deviations for Our Example s_e = √(SSE/[(a−1)(b−1)]) = √628 ≈ 25 Standard deviation of μ: s_e/√(ab) ≈ 8.4 Standard deviation of α_j: s_e·√((a−1)/(ab)) ≈ 11.8 Standard deviation of β_i: s_e·√((b−1)/(ab)) ≈ 11.8

© 1998, Geoff Kuenning Calculating Confidence Intervals for Our Example Just the file system alternatives shown here At the 95% level, with 4 degrees of freedom CI for library solution: (−39, 26) CI for VFS solution: (−49, 16) CI for layered solution: (−10, 55) So none of these solutions is significantly different from the mean at the 95% level
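A sketch of these interval calculations in Python, assuming the standard-deviation form for α_j given above (a stand-in for the full table in Jain); the effects and SSE are the example values from the slides.

```python
import math
from scipy.stats import t

a, b = 3, 3
sse = 2512.0
dof = (a - 1) * (b - 1)                    # 4 degrees of freedom
s_e = math.sqrt(sse / dof)                 # about 25

# Effects of the file-system alternatives (factor A) from the example.
alpha = {"library": -6.5, "vfs": -16.3, "layers": 22.8}

# Assumed standard deviation of an alpha_j effect (see the previous slide).
s_alpha = s_e * math.sqrt((a - 1) / (a * b))
half_width = t.ppf(0.975, dof) * s_alpha   # 95% two-sided interval

for name, effect in alpha.items():
    lo, hi = effect - half_width, effect + half_width
    significant = lo > 0 or hi < 0         # does the interval exclude zero?
    print("%-8s (%6.1f, %5.1f)  significant: %s" % (name, lo, hi, significant))
```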

© 1998, Geoff Kuenning Looking a Little Closer Does this mean that none of the alternatives for adding the functionality are different? Use contrasts to check. A contrast is any linear combination of effects whose coefficients add up to zero (Jain ch. 18.5)

© 1998, Geoff Kuenning Looking a Little Closer For example, is the library approach significantly better than layers? –Using the contrast α_library − α_layers, the confidence interval is (−58, −0.5) (at 95%) –So the library approach is better, with this confidence

Two-Factor Full Factorial Design With Replications Replicating a full factorial design allows separating out the interactions between factors from experimental error Without replication, we implicitly assume that interactions are negligible and can be treated as errors. Read Jain chapter 22
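A hedged sketch of why replication helps, assuming the usual extension of the model with an interaction term γ_ij (y_ijk = μ + α_j + β_i + γ_ij + e_ijk, as in Jain ch. 22); the data below are made up, not from the slides.

```python
import numpy as np

# Hypothetical observations with shape (b, a, r): b levels of factor B,
# a levels of factor A, r replications per cell. Values are made up.
y = np.array([[[ 90.0,  92.0], [ 85.0,  88.0], [100.0,  97.0]],
              [[210.0, 214.0], [200.0, 205.0], [235.0, 230.0]],
              [[700.0, 705.0], [690.0, 688.0], [740.0, 746.0]]])
b, a, r = y.shape

cell_means = y.mean(axis=2)              # mean of the r replications per cell
mu = y.mean()
alpha = cell_means.mean(axis=0) - mu     # factor A effects (columns)
beta = cell_means.mean(axis=1) - mu      # factor B effects (rows)

# Interaction effects: what the cell means show beyond the additive model.
gamma = cell_means - (mu + alpha[np.newaxis, :] + beta[:, np.newaxis])

# With replications, the error is the spread of observations around their
# cell means, so interaction (SSAB) and error (SSE) can be separated.
ssab = r * (gamma ** 2).sum()
sse = ((y - cell_means[:, :, np.newaxis]) ** 2).sum()

print("interaction SS:", ssab, " error SS:", sse)
```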

Full Factorial Design With k Factors The design is the cross product (l_{1,0}, l_{1,1}, …, l_{1,n1-1}) x (l_{2,0}, l_{2,1}, …, l_{2,n2-1}) x … x (l_{k,0}, l_{k,1}, …, l_{k,nk-1}): k different factors, each factor i with n_i levels, and r replications. An informal technique, if you only need to know which combination of levels is best, is simply to rank the responses.
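A sketch of that informal ranking, with hypothetical factors, levels, and a stand-in measurement function (none of these names come from the slides):

```python
from itertools import product

# Hypothetical factors and levels (placeholders; here k = 3 factors).
levels = {
    "cpu":     ["slow", "fast"],
    "disk":    ["hdd", "ssd"],
    "network": ["1g", "10g"],
}

def measure(combo):
    """Stand-in for running the workload under this configuration and
    returning a response (e.g., mean response time in seconds)."""
    penalty = {"slow": 2.0, "fast": 1.0, "hdd": 1.5, "ssd": 0.5,
               "1g": 0.8, "10g": 0.2}
    return sum(penalty[level] for level in combo)

combos = list(product(*levels.values()))
results = {combo: measure(combo) for combo in combos}

# Informal technique: rank the combinations by response (lower is better)
# just to see which combination of levels looks best.
for combo, response in sorted(results.items(), key=lambda kv: kv[1]):
    print(combo, response)
```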

Experiment Design Overview (with readings) One factor, n levels: regression models for numerical levels (Jain ch. 14, 15), one-factor experiments for categorical levels (Jain ch. 20). k factors at two levels each (min/max): 2^k designs for effects and interactions (Jain ch. 17), 2^(k-p) fractional factorial when interactions are weak (Jain ch. 19), 2^k r with replications to estimate errors (Jain ch. 18). Full factorial designs: two-factor without replications (Jain ch. 21), two-factor with replications (Jain ch. 22), general k-factor (Jain ch. 23). If the effects are not additive, consider a log transform.

For Discussion Project Proposal 1.Statement of hypothesis 2.Workload decisions 3.Metrics to be used 4.Method