Design of Experiments and Data Analysis
Graduate Seminar, Fall 2006
Hal Carter
These slides are available at www.ececs.uc.edu/~hcarter/presentations/experimental_design.ppt

Bibliography
1. Julian Faraway, "Practical Regression and Anova using R," July 2002. Available at cran.us.r-project.org/other-docs.html
2. "The R Project for Statistical Computing," available at www.r-project.org. Software available for Linux, Windows, and OS X.
3. Raj Jain, "The Art of Computer Systems Performance Analysis," Wiley, 1991.
4. "An Introduction to R," available at www.cran.us.r-project.org/manuals.html
Agenda
Design Your Experiments
Analyze and Display Data
  - Simple Statistical Analysis
  - Comparing Results
  - Determining O(n)
Design Your Experiments
  - 2^k Designs Including Replications
  - Full Factorial Designs
A System
[Block diagram: system inputs and factors feed the system; the system produces system outputs and responses]
Experimental Research
Define System
  - Define the system outputs first
  - Then define the system inputs
  - Finally, define the behavior (i.e., the transfer function)
Identify Factors and Levels
  - Identify the system parameters that vary (many)
  - Reduce the parameters to the important factors (few)
  - Identify the values (i.e., levels) for each factor
Identify Response(s)
  - Identify the time or space effects of interest
Design Experiments
  - Identify the factor-level experiments
Create and Execute System; Analyze Data
Define Workload
  - Workloads are the inputs applied to the system
  - A workload can be a factor (but often isn't)
Create System
  - Create the system so it can be executed, as a:
    - Real prototype
    - Simulation model
    - Set of empirical equations
Execute System
  - Execute the system for each factor-level binding
  - Collect and archive the response data
Analyze & Display Data
  - Analyze the data according to the experiment design
  - Evaluate the raw and analyzed data for errors
  - Display the raw and analyzed data to draw conclusions
Some Examples

Epitaxial growth
  - New method using a non-linear temperature profile
  - What is the system?
  - Responses: total time, quality of the layer, total energy required, maximum layer thickness
  - Factors: temperature profile, oxygen density, initial temperature, ambient temperature

Analog simulation
  - Which of three solvers is best?
  - What is the system?
  - Responses: fastest simulation time, most accurate result, most robust to the types of circuits being simulated
  - Factors: solver, type of circuit model, matrix data structure
SIMPLE MODELS OF DATA
Evaluation of a new wireless network protocol.
  - System: wireless network with the new protocol
  - Workload: 10 messages applied at a single source, each message identically configured
  - Experiment output: roundtrip latency per message (ms)

Data file "latency.dat": Latency values 22, 23, 19, 18, 15, 20, 26, 17, 19, 17
Summary: mean 19.6 ms, variance 10.71 ms^2, std dev 3.27 ms

> data=read.table("latency.dat",header=T)
> data
   Latency
1       22
2       23
3       19
4       18
5       15
6       20
7       26
8       17
9       19
10      17
> attach(data)
> mean(Latency)
[1] 19.6
> var(Latency)
[1] 10.71111
> sd(Latency)
[1] 3.272783
> Index=c(1:10)
> plot(Index,Latency,pch=19,cex.lab=1.5)
Verify Model Preconditions
Check for a normal distribution
  - Use a quantile-quantile plot
  - The pattern should adhere consistently to the ideal quantile-quantile line
Check randomness
  - Plot the residuals around the mean
  - The residuals should appear random

> # Plot residuals to assess randomness
> Residuals=Latency-mean(Latency)
> plot(Index,Residuals,pch=19,cex.lab=1.5)
> abline(0,0)
> # Plot quantile-quantile plot to assess if residuals
> # normally distributed
> qqnorm(Latency, pch=19,cex.lab=1.5)
> qqline(Latency)
Confidence Intervals
The sample mean estimates the population mean; a confidence interval quantifies the uncertainty of that estimate.
  - More than about 30 samples: CI = x̄ ± z[1-α/2]·s/√n
  - Fewer than about 30 samples: CI = x̄ ± t[1-α/2; n-1]·s/√n

> mean(Latency) - qt(0.975,9)*sd(Latency)/sqrt(10)
[1] 17.25879
> mean(Latency) + qt(0.975,9)*sd(Latency)/sqrt(10)
[1] 21.94121

For the latency data, n = 10, α = 0.05: CI95% = (17.26, 21.94) ms

Raj Jain, "The Art of Computer Systems Performance Analysis," Wiley, 1991.
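As a cross-check (not in the original transcript), the same interval comes directly out of R's one-sample t.test; a minimal sketch, assuming the Latency vector is still attached:

> # One-sample t-test; its 95% confidence interval matches the
> # qt()-based computation above exactly
> t.test(Latency, conf.level=0.95)$conf.int
[1] 17.25879 21.94121
attr(,"conf.level")
[1] 0.95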
Scatter and Line Plots
Resistance profile of a doped silicon epitaxial layer; we expect resistance to increase linearly with depth.

Data file "xyscatter.dat":
Depth  Resistance
1       1.689015
2       4.486722
3       7.915209
4       6.362388
5      11.830739
6      12.329104
7      14.011396
8      17.600094
9      19.022146
10     21.513802

> data=read.table("xyscatter.dat", header=T)
> attach(data)
> model = lm(Resistance ~ Depth)
> summary(model)    # full output on the next slide
> plot(Depth,Resistance,main="Epi Layer Resistance",xlab="Depth,
+ microns",ylab="Resistance, Mohms",pch=19,
+ cex.main=1.5,cex.axis=1.5,cex.lab=1.5)
> abline(-0.05863, 2.13358)
> error=Resistance-(-0.05863+2.13358*Depth)
> plot(Depth,error,main="Residual Plot",xlab="Depth, micron",
+ ylab="Error, Mohms",cex.main=1.5,cex.axis=1.5,pch=19,cex.lab=1.5)
> abline(0,0)
Linear Regression Statistics
> model = lm(Resistance ~ Depth)
> summary(model)

Residuals:
     Min       1Q   Median       3Q      Max
-2.11330 -0.40679  0.05759  0.51211  1.57310

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05863    0.76366  -0.077     0.94
Depth        2.13358    0.12308  17.336 1.25e-07 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 1.118 on 8 degrees of freedom
Multiple R-Squared: 0.9741, Adjusted R-squared: 0.9708
F-statistic: 300.5 on 1 and 8 DF, p-value: 1.249e-07
Validating Residuals
The errors are only marginally normally distributed: the points deviate from the quantile-quantile line in the tails.

> qqnorm(error, pch=19,cex.lab=1.5,cex.axis=1.5,cex.main=1.5)
> qqline(error)
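A numerical complement to the visual Q-Q check (not in the original slides, but standard in base R) is the Shapiro-Wilk normality test; a minimal sketch using the error vector computed above:

> # Shapiro-Wilk test: a small p-value (e.g., < 0.05) would reject
> # normality of the residuals; a larger one is consistent with the
> # "marginally normal" reading of the Q-Q plot
> shapiro.test(error)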
Comparing Two Sets of Data
Example: Consider two different wireless access points. Which one is faster?
Inputs: the same set of 10 messages communicated through both access points.
Response (usecs):

Latency1  Latency2
22        19
23        20
19        24
18        20
15        14
20        18
26        21
17        17
19        17
17        18

Approach: take the difference of the paired data and determine the CI of the difference. If the CI straddles zero, we cannot tell which access point is faster.

> data=read.table("compare.dat", header=T)
> attach(data)
> diff=Latency1-Latency2
> mean(diff)-qt(0.975,9)*sd(diff)/sqrt(10)
[1] -1.273301
> mean(diff)+qt(0.975,9)*sd(diff)/sqrt(10)
[1] 2.873301

CI95% = (-1.27, 2.87) usecs
The confidence interval straddles zero, so we cannot determine which access point is faster with 95% confidence.
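The same conclusion can be reached with a paired t-test, which computes this difference CI directly; a minimal sketch (not in the original transcript), assuming the compare.dat columns are still attached:

> # Paired t-test on the two latency columns; its 95% confidence
> # interval for the mean difference matches the qt()-based interval
> # above, and p > 0.05 confirms no significant difference
> t.test(Latency1, Latency2, paired=TRUE)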
Plots with Error Bars
Execution time of SuperLU linear system solution (Ax = b) on a parallel computer.
For each p, the problem was run multiple times with the same matrix size but different matrix values.
The mean and CI for each p give the curve and its error intervals.

# Load Hmisc library
> library("Hmisc")
# Read data from file
> data <- read.table("demo.data", header=T)
# Display the data on screen
> data
      x      y delta
1   0.1   10.0   0.8
2   0.2   18.6   1.0
3   0.3   38.4   1.5
4   0.4   74.0   3.0
5   0.5  135.0   5.0
6   0.6  227.1  10.0
7   0.7  356.0  20.0
8   0.8  522.0  50.0
9   0.9  751.4  60.0
10  1.0 1010.5  80.0
> attach(data)
# Plot dashed line curve on screen with error bars
> errbar(x, y, y-delta, y+delta, xlab="Number of Processors, p",
+ ylab="Execution Time, msecs")
> lines(x, y, type="l", lty=2)
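If the Hmisc package is unavailable, a similar plot can be drawn with base R alone; a minimal sketch (not from the slides), assuming the same x, y, and delta columns are attached:

> # Base-R alternative to errbar(): draw the points, then add
> # vertical error bars with flat caps using arrows()
> plot(x, y, pch=19, xlab="Number of Processors, p",
+      ylab="Execution Time, msecs")
> arrows(x, y-delta, x, y+delta, angle=90, code=3, length=0.05)
> lines(x, y, lty=2)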
How to Determine O(n)
Fit polynomial models of increasing degree to the timing data and check which terms are statistically significant.

> model = lm(t ~ poly(p,4))
> summary(model)

Call:
lm(formula = t ~ poly(p, 4))

Residuals:
      1       2       3       4       5       6       7       8       9
-0.4072  0.7790  0.5840 -1.3090 -0.9755  0.8501  2.6749 -3.1528  0.9564

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 236.9444     0.7908 299.636 7.44e-10 ***
poly(p, 4)1 679.5924     2.3723 286.467 8.91e-10 ***
poly(p, 4)2 268.3677     2.3723 113.124 3.66e-08 ***
poly(p, 4)3  42.8772     2.3723  18.074 5.51e-05 ***
poly(p, 4)4   2.4249     2.3723   1.022    0.364
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.372 on 4 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 0.9999
F-statistic: 2.38e+04 on 4 and 4 DF, p-value: 5.297e-09

The linear, quadratic, and cubic terms are highly significant, but the quartic term is not (p = 0.364), so the data are consistent with t growing as O(p^3).
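An alternative way to choose the degree (not shown in the original transcript) is a nested-model F-test; a minimal sketch, assuming the vectors p and t from the session above are defined:

> # Fit cubic and quartic models and test whether the quartic term
> # significantly improves the fit; a large p-value favors the
> # simpler cubic model, i.e., O(p^3)
> model3 = lm(t ~ poly(p,3))
> model4 = lm(t ~ poly(p,4))
> anova(model3, model4)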
R^2 - Coefficient of Determination
Variation around the mean:
  SST = Σ(y_i - ȳ)^2 = Σ(y_i^2) - n·ȳ^2 = SSY - SS0
Variation around the model (sum of squared errors):
  SSE = Σ e_i^2
Variation explained by the regression:
  SSR = SST - SSE
  R^2 = SSR/SST = (SST - SSE)/SST
R^2 measures how good the model is; the closer R^2 is to 1, the better.
Example: Let SST = 1499 and SSE = 97. Then R^2 = (1499 - 97)/1499 = 93.5%.
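To connect these formulas to the earlier regression output, R^2 can be computed by hand; a minimal sketch, assuming the Resistance data and fitted model from the scatter-plot slide are still in scope:

> # Compute R^2 from its definition and compare with the
> # "Multiple R-Squared: 0.9741" reported by summary(model)
> SST = sum((Resistance - mean(Resistance))^2)
> SSE = sum(residuals(model)^2)
> (SST - SSE)/SST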
Using the t-test
Consider the following data ("sleep.R"):

   extra group
1    0.7     1
2   -1.6     1
3   -0.2     1
4   -1.2     1
5   -0.1     1
6    3.4     1
7    3.7     1
8    0.8     1
9    0.0     1
10   2.0     1
11   1.9     2
12   0.8     2
13   1.1     2
14   0.1     2
15  -0.1     2
16   4.4     2
17   5.5     2
18   1.6     2
19   4.6     2
20   3.4     2

File "sleep.R" from /usr/lib/R/library/base/data/:

"sleep" <-
structure(list(extra = c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7,
    0.8, 0, 2, 1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4),
    group = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2), .Label = c("1", "2"), class = "factor")),
    .Names = c("extra", "group"),
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"),
    class = "data.frame")

To read into an R session:
> source("sleep.R")
> sleep
   extra group
1    0.7     1
2   -1.6     1
3   -0.2     1
4   -1.2     1
5   -0.1     1
6    3.4     1
7    3.7     1
8    0.8     1
9    0.0     1
<more data>

From "An Introduction to R", http://www.R-project.org
t.test Result

> ## Formula interface
> t.test(extra ~ group, data = sleep)

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.0794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y
     0.75      2.33

> data(sleep)
> plot(extra ~ group, data = sleep)
> ## Traditional interface
> with(sleep, t.test(extra[group == 1], extra[group == 2]))

        Welch Two Sample t-test

data:  extra[group == 1] and extra[group == 2]
t = -1.8608, df = 17.776, p-value = 0.0794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y
     0.75      2.33

The p-value is the smallest significance level at which the null hypothesis can be rejected. Here p = 0.0794 means the difference in means is distinguishable from zero only at confidence levels below about 92%.
2^k Factorial Design
Model (k = 2):
  y = q0 + qA·xA + qB·xB + qAB·xA·xB
SST = total variation around the mean
    = Σ(y_i - ȳ)^2 = SSA + SSB + SSAB
where SSA = 2^2·qA^2 (and similarly for SSB and SSAB)
Note: var(y) = SST/(n-1)
Fraction of variation explained by A = SSA/SST
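The effects q can be obtained directly from a regression on ±1-coded factors; a minimal sketch with hypothetical response values y, since the slide gives no data for the k = 2 case:

> # Hypothetical 2^2 design: xA and xB coded -1/+1, xAB = xA*xB
> xA = c(-1, 1, -1, 1)
> xB = c(-1, -1, 1, 1)
> y  = c(15, 45, 25, 75)        # hypothetical responses
> fit = lm(y ~ xA * xB)
> coef(fit)                     # q0, qA, qB, qAB
> # Sum of squares and fraction of variation for factor A
> qA  = coef(fit)["xA"]
> SSA = 2^2 * qA^2
> SST = sum((y - mean(y))^2)
> SSA/SST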
2^k Design

Are all factors needed? If a factor has little effect on the variability of the output, why study it further?
Method:
  a. Evaluate the variation due to each factor using only two levels each
  b. Must consider interactions as well
(Interaction: the effect of a factor depends on the levels of another.)

Cache experiment (response: misses on an address trace)

Factor              Levels
Line Length (L)     32, 512 words
No. Sections (K)    4, 16 sections
Control Method (C)  multiplexed, linear

Experiment design:
  L    K   C    Misses
  32   4   mux
  512  4   mux
  32   16  mux
  512  16  mux
  32   4   lin
  512  4   lin
  32   16  lin
  512  16  lin

Encoded experiment design:
  L   K   C
 -1  -1  -1
  1  -1  -1
 -1   1  -1
  1   1  -1
 -1  -1   1
  1  -1   1
 -1   1   1
  1   1   1

> data=read.table("2k.data", header=T)
> data
    L  K   C Misses
1  32  4 mux     14
2 512  4 mux     22
3  32 16 mux     10
4 512 16 mux     34
5  32  4 lin     46
6 512  4 lin     58
7  32 16 lin     50
8 512 16 lin     86
> attach(data)
> L=factor(L)
> K=factor(K)
> C=factor(C)
> analysis=aov(Misses~L*K*C)
2^k Design: Analyze Results (Sign Table)

Sign table with responses:
  I   L   K   C  LK  LC  KC LKC  Misses
  1  -1  -1  -1   1   1   1  -1      14
  1   1  -1  -1  -1  -1   1   1      22
  1  -1   1  -1  -1   1  -1   1      10
  1   1   1  -1   1  -1  -1  -1      34
  1  -1  -1   1   1  -1  -1   1      46
  1   1  -1   1  -1   1  -1  -1      58
  1  -1   1   1  -1  -1   1  -1      50
  1   1   1   1   1   1   1   1      86

Each effect is qi = (1/2^k)·Σ(sign_i × Misses_i):
  q0 = 40, qL = 10, qK = 5, qC = 20, qLK = 5, qLC = 2, qKC = 3, qLKC = 1

SSL = 2^3·qL^2 = 8·100 = 800
SST = SSL+SSK+SSC+SSLK+SSLC+SSKC+SSLKC
    = 800+200+3200+200+32+72+8 = 4512
%variation(L) = SSL/SST = 800/4512 = 17.7%

> analysis
Call:
   aov(formula = Misses ~ L * K * C)

Terms:
                   L    K    C  L:K  L:C  K:C L:K:C
Sum of Squares   800  200 3200  200   32   72     8
Deg. of Freedom    1    1    1    1    1    1     1

Estimated effects may be unbalanced
> summary(analysis)
      Df Sum Sq Mean Sq
L      1    800     800
K      1    200     200
C      1   3200    3200
L:K    1    200     200
L:C    1     32      32
K:C    1     72      72
L:K:C  1      8       8
> SSx=c(800,200,3200,200,32,72,8)
> SST=sum(SSx)
> Percent.Variation=100*SSx/SST
> Percent.Variation
[1] 17.7304965  4.4326241 70.9219858  4.4326241  0.7092199  1.5957447  0.1773050
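The q effects above can also be reproduced in R without the hand sign table by regressing on ±1-coded columns; a minimal sketch (not in the original transcript) that re-reads 2k.data to avoid the factor conversions made earlier:

> # Recode each factor as -1/+1 and regress; the coefficients are
> # exactly q0, qL, qK, qC, qLK, qLC, qKC, qLKC from the sign table
> d = read.table("2k.data", header=T)
> Lc = ifelse(d$L == 32, -1, 1)
> Kc = ifelse(d$K == 4, -1, 1)
> Cc = ifelse(d$C == "mux", -1, 1)
> q = coef(lm(d$Misses ~ Lc * Kc * Cc))
> q                         # 40 10 5 20 5 2 3 1
> SSx = 2^3 * q[-1]^2       # SSL, SSK, SSC, SSLK, SSLC, SSKC, SSLKC
> 100 * SSx / sum(SSx)      # percent variation explained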
Full Factorial Design
Model: y_ij = μ + α_j + β_i + e_ij
Effects are computed such that Σα_j = 0 and Σβ_i = 0:
  μ   = ȳ..
  α_j = ȳ.j - μ
  β_i = ȳi. - μ
Experimental errors:
  SSE = Σ e_ij^2
Sums of squares (a column levels, b row levels):
  SS0 = a·b·μ^2
  SSA = b·Σ α_j^2
  SSB = a·Σ β_i^2
  SSY = SS0 + SSA + SSB + SSE
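These definitions are easy to verify numerically; a minimal sketch on a small hypothetical 2x3 response matrix (not from the slides):

> # Hypothetical responses: b = 2 rows (factor B), a = 3 columns (factor A)
> y = matrix(c(10, 14, 18,
+              12, 16, 26), nrow=2, byrow=TRUE)
> mu    = mean(y)                 # grand mean, ȳ..
> alpha = colMeans(y) - mu        # column effects, sum to zero
> beta  = rowMeans(y) - mu        # row effects, sum to zero
> e     = y - (mu + outer(beta, alpha, "+"))   # e_ij = y_ij - μ - α_j - β_i
> SSE = sum(e^2)
> SS0 = length(y) * mu^2          # a·b·μ^2
> SSA = nrow(y) * sum(alpha^2)    # b·Σα^2
> SSB = ncol(y) * sum(beta^2)     # a·Σβ^2
> sum(y^2) - (SS0 + SSA + SSB + SSE)   # = 0, verifying the SSY identity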
Full-Factorial Design Example
Determination of the speed of light (the Morley experiments)
Factors: Experiment No. (Expt), Run No. (Run)
Levels: Expt has 5 experiments; Run has 20 repeated runs

File "morley.tab":
    Expt Run Speed
001    1   1   850
002    1   2   740
003    1   3   900
004    1   4  1070
<more data>
019    1  19   960
020    1  20   960
021    2   1   960
022    2   2   940
023    2   3   960
<more data>
096    5  16   940
097    5  17   950
098    5  18   800
099    5  19   810
100    5  20   870

> mm <- read.table("morley.tab")   # Get file from
>                                  # /usr/lib/R/library/base/data
> mm
  Expt Run Speed
1    1   1   850
2    1   2   740
3    1   3   900
4    1   4  1070
5    1   5   930
<95 more lines>
> attach(mm)
> Expt <- factor(Expt)   # Experiment is a factor with levels 1, 2, ..., 5
> Run <- factor(Run)     # Run is a factor with levels 1, 2, ..., 20
> # Plot a boxplot of each factor
> plot(Expt, Speed, main="Speed of Light Data (units)", xlab="Experiment No.")
> fm <- aov(Speed~Run+Expt, data=mm)   # Determine ANOVA
> summary(fm)                          # Display ANOVA of factors
            Df Sum Sq Mean Sq F value   Pr(>F)
Run         19 113344    5965  1.1053 0.363209
Expt         4  94514   23629  4.3781 0.003071 **
Residuals   76 410166    5397
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Box Plots of Factors
[Figure: box plots of Speed by Expt and by Run]
Two-Factor Full Factorial
> fm <- aov(Speed~Run+Expt, data=mm)   # Determine ANOVA
> summary(fm)                          # Display ANOVA of factors
            Df Sum Sq Mean Sq F value   Pr(>F)
Run         19 113344    5965  1.1053 0.363209
Expt         4  94514   23629  4.3781 0.003071 **
Residuals   76 410166    5397
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Conclusion: variation across runs is acceptably small (Pr = 0.36, not significant), but variation across experiments is significant (Pr = 0.003).
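As with the regression examples earlier, the ANOVA residuals should be checked before trusting these p-values; a minimal sketch (not in the original slides) reusing the fitted model fm:

> # Q-Q plot of the ANOVA residuals: the F-tests above assume the
> # errors are approximately normally distributed
> qqnorm(residuals(fm), pch=19)
> qqline(residuals(fm))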