SJS SDI_31 Design of Statistical Investigations Stephen Senn 3. Design of Experiments 1 Some Basic Ideas
Elements of an Experiment: The Nouns
  Experimental material
    – Basic units
    – Blocks
    – Replications
  Treatments
    – Orderings
    – Dimensions
    – Combinations

Elements of an Experiment: The Verbs
  Allocation
    – Which material gets which treatment
      For example, using some form of randomisation
  Conduct
    – How will it all be carried out?
  Measuring
    – When to measure what
  Analysis
Exp_1 Rat TXB2
  Experimental material
    – 36 rats
  Treatments to be studied
    – 6 in a one-way layout
      4 new chemical entities
      1 vehicle
      1 marketed product

Caution!!!!!
  In practice such things are not given
  Material
    – Why rats and not mice, dogs, or guinea-pigs?
    – Why 36?
  Treatments
    – Why these 6?
  In practice the statistician can also be involved in such decisions

Exp_1 Rat TXB2: Allocation
  If the rats are not differentiable in any way we can determine, we might as well allocate at random
  Unconstrained randomisation is not a good idea, however: some treatments may be allocated to very few rats
  So constrain the randomisation to have 6 rats per group
S-Plus Randomisation
    #M2 Rat TXB2 Randomisation
    #Vector of treatments
    treat<-c(rep("V",6),rep("M",6),rep("a",6),
      rep("b",6),rep("c",6),rep("d",6))
    #Random number for each rat
    rnumb<-runif(36,0,1)
    #Sort rats by random number
    rat<-sort.list(rnumb)
    #Join rats and treatments
    temp.frame<-data.frame(rat,treat)
    #Sort rows by rat
    des.frame<-sort.col(temp.frame,
      c("rat","treat"),"rat")
    #Print design
    des.frame
  We shall illustrate an alternative using the sample function later in the course
Result of Randomisation
    rat treat     rat treat     rat treat
      1 M          14 M          27 a
      2 b          15 a          28 V
      3 V          16 b          29 c
      4 d          17 b          30 d
      5 a          18 d          31 V
      6 M          19 c          32 V
      7 M          20 b          33 d
      8 d          21 c          34 a
      9 M          22 a          35 V
     10 b          23 b          36 c
     11 V          24 d
     12 c          25 c
     13 a          26 M
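The randomisation slide mentions an alternative based on the sample function. A minimal sketch of what that could look like in R (whose syntax here matches S-PLUS) is given below. The seed and the name des.frame2 are illustrative assumptions, not the course's own code.

```r
# Constrained randomisation via sample(): randomly permute a treatment
# vector that already contains each label exactly 6 times.
set.seed(2024)  # arbitrary seed (assumption), only for reproducibility
treat <- rep(c("V", "M", "a", "b", "c", "d"), each = 6)

# Rat i receives the i-th element of the permuted treatment vector
des.frame2 <- data.frame(rat = 1:36, treat = sample(treat))

# Check the constraint: every treatment still has exactly 6 rats
table(des.frame2$treat)
```

Because sample() only permutes the vector, the constraint of 6 rats per group holds by construction, and no sorting step is needed.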
Exp_1 Rat TXB2: Conduct
  We will not cover this in this course
  This does not mean that it is unimportant
  In the Exp_1 example, precise instructions might be necessary for treating the rats

Exp_1 Rat TXB2: Measurement
  Obviously we have to decide what it is important to measure
  Here it has been decided to measure TXB2, a marker of Cox-1 activity
    – Cox = cyclooxygenase
  Analgesics are designed to inhibit Cox-2, which is involved in the synthesis of inflammatory prostaglandins

Measurement (Cont)
  However, they also tend to inhibit Cox-1, which is involved in the synthesis of the prostaglandins that help maintain the gastric mucosa
  Cox-1 inhibition can lead to ulcers
  Ulcers are an unwanted side-effect of non-steroidal anti-inflammatory drugs (NSAIDs)

The Moral
  Even simple experiments may involve complex subject-matter knowledge
  It may be dangerous for the statistician to assume that all that is being produced is sets of numbers, the details being irrelevant
  Teamwork may be necessary

Analysis
  One-way layout
  Six treatments
  Balanced design
  The no-brainer is one-way ANOVA
    – We shall look at the maths of one-way ANOVA in more detail later
    – For the moment, take this as understood
S-PLUS ANOVA Code
    #Analysis of TXB2 data
    #Set contrast options
    options(contrasts=c(factor="contr.treatment",
      ordered="contr.poly"))
    #Input data
    treat<-factor(c(rep(1,6),rep(2,6),
      rep(3,6),rep(4,6),rep(5,6),rep(6,6)),
      labels=c("V","M","a","b","c","d"))
    TXB2<- c(196.85,124.40,91.20,328.05,268.30,214.70,
      2.08,1.97,4.80,5.01,2.52,9.35,
      ,75.60,322.80,212.15,42.95, ,
      ,81.75,52.70,352.85,198.80,107.65,
      83.19,66.80,81.15,39.00,61.96,87.00,
      74.48,60.00,77.00,42.00,48.95,66.30)
    fit1<-aov(TXB2~treat)#ANOVA
    summary(fit1)
S-PLUS Output
    summary(fit1)
              Df Sum of Sq Mean Sq F Value Pr(F)
        treat
    Residuals
  So there is a highly significant difference between treatments, but this does not make this an adequate analysis
S-PLUS Diagnostic Code
    #Diagnostic plot data
    par(mfrow=c(2,2))
    plot(treat~TXB2)
    hist(resid(fit1),xlab="residual")
    plot(fit1$fitted.values,resid(fit1),xlab="fitted",
      ylab="residual")
    abline(h=0)
    qqnorm(resid(fit1),xlab="theoretical",ylab="empirical")
    qqline(resid(fit1))
Model Failure
  Histogram of residuals has heavy tails
  QQ plot shows clear departure from Normality
  Variance increases with mean
    – Suggests log-transformation
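The suggestion of a log-transformation can be explored with a short R sketch. It uses only the four groups (V, M, c, d) whose six values are all present in the data listing, so it is a partial illustration under that restriction and not a re-run of the full six-group analysis. The object names raw.sd, log.sd and fit.log are assumptions introduced here.

```r
# Subset of the TXB2 data: the four groups with all six values available.
# Groups a and b are omitted because some of their values are missing.
TXB2 <- c(196.85, 124.40, 91.20, 328.05, 268.30, 214.70,   # V
          2.08, 1.97, 4.80, 5.01, 2.52, 9.35,              # M
          83.19, 66.80, 81.15, 39.00, 61.96, 87.00,        # c
          74.48, 60.00, 77.00, 42.00, 48.95, 66.30)        # d
treat <- factor(rep(c("V", "M", "c", "d"), each = 6),
                levels = c("V", "M", "c", "d"))

# Group standard deviations on the raw and on the log scale
raw.sd <- tapply(TXB2, treat, sd)
log.sd <- tapply(log(TXB2), treat, sd)
rbind(raw.sd, log.sd)

# One-way ANOVA on the log scale, with the same residual checks as before
fit.log <- aov(log(TXB2) ~ treat)
summary(fit.log)
qqnorm(resid(fit.log)); qqline(resid(fit.log))
```

On this subset the group standard deviations are far more alike on the log scale than on the raw scale, which is the pattern a log-transformation is meant to produce.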
Exp_2: A Simple Design Problem (The Simplest)
  You have N experimental units in total
  They are completely exchangeable
  You have two treatments, A and B
    – with no prior knowledge of their effects
  You wish to compare A and B
    – continuous outcome, assumed Normal
  How many units for A and for B?

Solution is obvious
  Allocate half the units to one treatment and half to the other
    – Assuming that there is an even number of units
  However, we should go through the design cycle
  What sort of data will we collect?
  What will we do with them?

Basic Design Cycle
  Objective
  Tentative Design
  Potential Data
  Possible Analysis
  Possible Conclusions
  Relevant factors

The Anticipated Data
  Two mean outcomes
  Variances expected to be the same
    – An assumption, but reasonable under the null hypothesis
    – No other assumption is more reasonable, given that we know nothing about the treatments
  We will calculate the contrast between these means
Now set the derivatives equal to zero
From (2) and (3) we have
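The equations for this step are not reproduced in the text. A reconstruction consistent with the surrounding slides might run as follows. The symbols n_A, n_B, σ² and λ are assumptions, and the numbering is chosen only so that (2) and (3) are the two conditions referred to above.

```latex
\begin{align}
\operatorname{Var}(\bar{Y}_A - \bar{Y}_B)
  &= \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right),
  \qquad n_A + n_B = N \notag \\
L(n_A, n_B, \lambda)
  &= \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right)
   + \lambda\,(n_A + n_B - N) \tag{1} \\
\frac{\partial L}{\partial n_A}
  &= -\frac{\sigma^2}{n_A^2} + \lambda = 0 \tag{2} \\
\frac{\partial L}{\partial n_B}
  &= -\frac{\sigma^2}{n_B^2} + \lambda = 0 \tag{3} \\
n_A^2 = n_B^2 \;&\Rightarrow\; n_A = n_B = \frac{N}{2} \notag
\end{align}
```

The last line combines (2) and (3) with the constraint n_A + n_B = N, giving the equal split that was guessed at the outset.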
So What!!??
  The solution is obvious
  Statistical theory does not seem to have helped us very much
  However, this was a trivial problem
  We now try a slightly more complicated experiment
  This leads to a non-trivial problem
Exp_3: A More Complicated Case
  Now suppose that we are comparing k experimental treatments to a single control
  The treatments will not be compared to each other
  How many units should we allocate to each treatment?
    – We assume that variances do not vary with treatment: homoscedasticity

Exp_3 Continued
  Arguments of symmetry suggest that each active treatment be given to the same number of units, say n
  Suppose that m units will be allocated to the control
  With N units in total we have N = m + kn
We consider the variance of a typical contrast
Incorporating the necessary constraint using a Lagrange multiplier, we obtain the following objective function
And proceed to minimise this by setting the partial derivatives with respect to m, n and λ equal to zero
(Note that we assume that k and N are fixed in the design specification.)

Set derivatives equal to zero
Solution gives
Setting equal to zero we have

From (4) and (5) we have
Substituting in (4) we have
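The equations for this derivation are not reproduced in the text. The sketch below reconstructs them from the definitions given for Exp_3 (m control units, n units per active treatment, N = m + kn, common variance σ²). The symbols for the means and for λ are assumptions, and because the slide's equation numbers refer to the missing originals, each condition is named here without guessing its number.

```latex
\begin{align*}
\operatorname{Var}(\bar{Y}_i - \bar{Y}_0)
  &= \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)
  && \text{(treatment $i$ versus control)} \\
L(m, n, \lambda)
  &= \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)
   + \lambda\,(m + kn - N)
  && \text{(objective function)} \\
\frac{\partial L}{\partial m}
  &= -\frac{\sigma^2}{m^2} + \lambda = 0 \\
\frac{\partial L}{\partial n}
  &= -\frac{\sigma^2}{n^2} + k\lambda = 0 \\
\frac{\partial L}{\partial \lambda}
  &= m + kn - N = 0
  && \text{(the constraint)} \\
m^2 = k\,n^2 \;&\Rightarrow\; m = n\sqrt{k}
  && \text{(from the first two conditions)} \\
n\sqrt{k} + kn = N \;&\Rightarrow\;
  n = \frac{N}{k + \sqrt{k}}, \qquad m = \frac{N}{1 + \sqrt{k}}
  && \text{(substituting in the constraint)}
\end{align*}
```

Setting k = 1 in the last line gives m = n = N/2.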
Check
  Exp_2 was a special case of Exp_3 with k = 1
  So our general solution must give the same answer as the special case when k = 1
  Indeed, when k = 1 the formula yields m = N/2, which is the solution we reached before
Exp_3 Concluded
  The optimal solution was not easy to guess
  It allocates more units to the control than to each experimental treatment
  Lesson: be careful!
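A small numerical illustration in R may make the conclusion concrete. It assumes the closed form m = N/(1 + √k), n = m/√k, which follows from minimising σ²(1/m + 1/n) subject to N = m + kn. The function names opt.alloc and contrast.var and the choice N = 36, k = 4 are hypothetical, picked only because they give whole numbers.

```r
# Optimal allocation for k active treatments versus one control,
# N units in total, assuming a common variance. The result is
# generally not an integer.
opt.alloc <- function(N, k) {
  m <- N / (1 + sqrt(k))   # units allocated to the control
  n <- m / sqrt(k)         # units allocated to each active treatment
  list(m = m, n = n)
}

# Variance of a treatment-versus-control contrast, in units of sigma^2
contrast.var <- function(m, n) 1 / m + 1 / n

a <- opt.alloc(N = 36, k = 4)   # illustrative values, not Exp_1's design
equal <- 36 / (4 + 1)           # equal split over all five arms

c(optimal = contrast.var(a$m, a$n),
  equal   = contrast.var(equal, equal))
```

With these values the optimal design puts 12 units on the control and 6 on each active treatment, giving a contrast variance of 0.25σ² against roughly 0.278σ² for the equal split. For most N and k the optimum is not a whole number, so some rounding is needed in practice.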
Questions
  What are the practical problems in implementing the solution we found for Exp_3?
  Why might this not be a good solution after all?
  Are there any implications for the design of Exp_1?