1
The language of experimental design
We need to learn a few terms in order to talk about more complicated experiments (those with more than one treatment).

Response: The response is the measurement we make on each subject in our experiment.

Factor: A factor is something we classify our subjects by. It could be some natural grouping or a particular treatment. A factor is a “known source of variation”.

Block/Blocking: A block is another word for a factor. Blocking is the partitioning of experimental subjects into groups.
2
Statistical Data Analysis - Lecture14 - 04/04/03
Replication: Replication is the process whereby we assign the same combination of treatments to a certain number of test subjects.

Randomisation: We use randomisation to evenly distribute the effects of “unknown sources of variation” amongst the treatment groups.

Control group: The control group is the set of experimental subjects that receives no treatment. The control group is used as a “baseline” from which to measure departures.

Treatment group: Corresponding to the control group is the treatment group or experimental group.
3
Some examples

1. A farmer wishes to trial two varieties of tomatoes and compare them on the basis of yield.
2. A drug company has a new drug for the treatment of acne, but is concerned about the potential side effects if alcohol is consumed whilst someone is taking it.
3. A psychologist thinks that reaction times might be affected by cellular phone use and cocaine consumption.
4
Example 1: A farmer wishes to trial two varieties of tomatoes and compare them on the basis of yield.

Here our response is yield. This might be measured in kilograms of fruit per plant.

Our factor is the tomato variety. We expect there to be a difference between the two varieties, but we don’t know, so we carry out our experiment to find out.

We have divided our hot house into sections (plots) and we randomly assign plantings of the different varieties to the plots. This randomisation helps us even out the effects of things we can’t control for, such as potential temperature gradients as the sun moves, moisture level etc.

This is called a completely randomised design.
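The random assignment of varieties to plots can be sketched in a few lines of Python. This is only an illustration: the number of plots (8), the plot names, and the function name `completely_randomised_design` are assumptions, not part of the lecture.

```python
import random

def completely_randomised_design(plots, treatments, seed=None):
    """Randomly assign treatments to plots, with equal replication."""
    if len(plots) % len(treatments) != 0:
        raise ValueError("number of plots must be a multiple of the number of treatments")
    reps = len(plots) // len(treatments)
    # One label per plot: each treatment repeated `reps` times.
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # the randomisation step
    return dict(zip(plots, labels))

# Hypothetical hot house divided into 8 plots.
plots = [f"plot {i}" for i in range(1, 9)]
layout = completely_randomised_design(plots, ["variety A", "variety B"], seed=1)
for plot, variety in layout.items():
    print(plot, "->", variety)
```

Fixing the seed makes the layout reproducible. In a real trial the seed would be chosen (and recorded) before planting.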
5
Blocking

Say our farmer had three hot houses that he wished to carry his experiment out in.

Obviously there is the potential for variation amongst hot houses (one might be shaded by trees, for example).

Therefore, in this situation, we would block on the factor hot house, and then randomly assign treatments (varieties) to the plots within the hot houses (blocks).

This is called a randomised complete block design, and it is generally much more efficient than a completely randomised design.

One measure of the efficiency of a design is the number of test subjects (plots) you need in order to find a significant effect.
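The difference from the completely randomised design is that the shuffle is carried out separately inside each block. A minimal sketch follows, assuming 4 plots per hot house; the plot count and the function name `randomised_block_design` are illustrative assumptions.

```python
import random
from collections import Counter

def randomised_block_design(blocks, plots_per_block, treatments, seed=None):
    """Randomise treatments to plots separately within each block."""
    if plots_per_block % len(treatments) != 0:
        raise ValueError("plots per block must be a multiple of the number of treatments")
    reps = plots_per_block // len(treatments)
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        labels = [t for t in treatments for _ in range(reps)]
        rng.shuffle(labels)  # a fresh randomisation inside every block
        design[block] = labels
    return design

hot_houses = ["hot house 1", "hot house 2", "hot house 3"]
design = randomised_block_design(hot_houses, 4, ["variety A", "variety B"], seed=1)
for house, plot_labels in design.items():
    print(house, plot_labels, dict(Counter(plot_labels)))
```

Every hot house receives each variety equally often, so a hot house that is shaded affects both varieties in the same way.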
6
Example 2. A drug company has a new drug for the treatment of acne, but is concerned about the potential side effects if alcohol is consumed whilst someone is taking it.

In this medical experiment our response is the patient’s reaction to the drugs (say reaction time).

The factors are the acne medication and alcohol consumption.

Each factor has levels associated with it. A level is a number assigned to each variation of a factor. Usually the level corresponds to a dosage (low, medium, high) etc.

In this experiment, the levels are:
no medication / recommended dose for medication
low / medium / high for alcohol
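With two levels of medication and three levels of alcohol there are 2 x 3 = 6 treatment combinations. A short sketch that enumerates them (the variable names are illustrative):

```python
from itertools import product

medication_levels = ["no medication", "recommended dose"]
alcohol_levels = ["low", "medium", "high"]

# Every combination of one medication level with one alcohol level.
treatment_combinations = list(product(medication_levels, alcohol_levels))

for medication, alcohol in treatment_combinations:
    print(f"medication: {medication:16s} alcohol: {alcohol}")
print(len(treatment_combinations), "treatment combinations")
```

Replication then means assigning several subjects to each of these six combinations.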
7
Medical statistics

In biomedical applications of experimental design there is a different language. It is useful to know some of these words.

Placebo: The placebo is a dummy treatment given to the test subject. If the treatment is a pill, then the placebo might be a sugar pill. If it is an injection, then it might be a saline (salt water) solution.

Blinding: Blinding is effectively randomisation, but also means that the experimental unit (patient) is unaware of whether they are receiving the treatment or the placebo. Double blinding occurs when the person who is giving out the treatments (say the nurse doing the injections) has no information as to which treatment they are giving, as well as the patient being unaware.

Placebo effect: The (false) positive response exhibited by those who received the placebo. This is what blinding is supposed to reduce.
8
Example 3. A psychologist thinks that reaction times might be affected by cellular phone use and cocaine consumption.

In this experiment our response is the pigeon’s reaction time (to peck on a red light, say).

The factors are the cellular phone activity and cocaine consumption.

In this experiment, the levels are:
not transmitting / transmitting for the cell phone
low / medium / high for cocaine

In all of these examples, the factors have been under the control of an experimenter. There are situations where the factors are decided by the test subjects (to some extent).
9
Observational studies
There are situations where we cannot allocate treatments to experimental units.

For example, if we wish to measure the effect of cannabis use on the incidence of schizophrenia, we cannot ask people to start using cannabis just so we can see if they develop schizophrenia.

Cot death: we can’t ask mothers to put babies to sleep on their stomachs to see if they die etc.

In these situations, we have what is known as an observational study.

The key fact to remember is that the mode of analysis may be the same, but an observational study cannot establish that a factor causes a change in the response, whereas a well-designed experiment can.
10
Two-way Analysis of Variance
We’ve seen how to deal with an experiment with one factor => one-way ANOVA.

What happens when we have two factors? How do we analyse results from Examples 2 and 3? Two-way ANOVA.

To talk about two-way ANOVA we need a few more terms.

We know what the grand mean and the effects are. In two-way ANOVA the effects become the main effects.

Now that we have two factors, it is conceivable that the factors might work in conjunction with each other to affect the response. To take this into account we also have interactions.
11
The two-way ANOVA probability model
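Our probability model for two-way ANOVA is

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad \varepsilon_{ijk} \sim \text{iid } N(0, \sigma^2)$

or

$Y_{ijk} \sim N(\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \sigma^2)$

where $i = 1, \ldots, I$ indexes the levels of factor A, $j = 1, \ldots, J$ indexes the levels of factor B, and $k$ indexes the replicates within each combination.

In words this says we believe that “each response is given by the grand mean plus an effect due to factor A plus an effect due to factor B plus an effect due to the interaction of factors A and B plus some residual. Furthermore, the residuals are iid Normal with mean zero and variance sigma squared”.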
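One way to get a feel for this model is to simulate from it: each response is the grand mean, plus the effect of the level of factor A, plus the effect of the level of factor B, plus the interaction for that combination, plus a Normal residual. The sketch below does this for a 2 x 2 layout. All parameter values and the function name `simulate_two_way` are made up for illustration.

```python
import random

def simulate_two_way(grand_mean, effects_a, effects_b, interactions, sigma, n_reps, seed=None):
    """Simulate responses: grand mean + A effect + B effect + interaction + N(0, sigma^2) residual."""
    rng = random.Random(seed)
    data = {}
    for i, a in enumerate(effects_a):
        for j, b in enumerate(effects_b):
            cell_mean = grand_mean + a + b + interactions[i][j]
            # random.gauss(mu, sigma) draws from a Normal with mean mu and sd sigma
            data[(i, j)] = [rng.gauss(cell_mean, sigma) for _ in range(n_reps)]
    return data

# Hypothetical parameters; effects sum to zero over each index.
simulated = simulate_two_way(
    grand_mean=10.0,
    effects_a=[-1.0, 1.0],
    effects_b=[-2.0, 2.0],
    interactions=[[0.5, -0.5], [-0.5, 0.5]],
    sigma=1.0,
    n_reps=5,
    seed=42,
)
for cell, responses in simulated.items():
    print(cell, [round(y, 2) for y in responses])
```

Setting every interaction to zero gives data from the additive model, which is a handy way to see what "no interaction" looks like.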
12
The two-way ANOVA data model
Corresponding to our probability model we have a data model (recall that a data model doesn’t say anything about the distribution of the residuals):

$y_{ijk} = t + a_i + b_j + (ab)_{ij} + r_{ijk}$

Whilst it is useful to separate the effects from the grand mean for constructing ANOVA identities, most people and software packages prefer to think of the model without it (i.e. with the grand mean incorporated into the effects).

There is now a set of questions that need to be asked, and the order in which they are asked is important.
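For a balanced design (the same number of replicates in every cell) the pieces of the data model can be computed directly from means: the grand mean, the main effects as deviations of the level means from the grand mean, the interactions as what is left of the cell means, and the residuals as deviations from the cell means. The sketch below uses made-up numbers and a hypothetical helper `decompose`.

```python
from statistics import mean

def decompose(data):
    """Split balanced two-way data into grand mean, main effects, interactions, residuals.

    data[(i, j)] is the list of replicate responses at level i of A and level j of B.
    """
    levels_a = sorted({i for i, _ in data})
    levels_b = sorted({j for _, j in data})
    grand = mean(y for cell in data.values() for y in cell)
    cell_mean = {key: mean(cell) for key, cell in data.items()}
    a = {i: mean(cell_mean[(i, j)] for j in levels_b) - grand for i in levels_a}
    b = {j: mean(cell_mean[(i, j)] for i in levels_a) - grand for j in levels_b}
    ab = {(i, j): cell_mean[(i, j)] - grand - a[i] - b[j]
          for i in levels_a for j in levels_b}
    resid = {key: [y - cell_mean[key] for y in cell] for key, cell in data.items()}
    return grand, a, b, ab, resid

# Made-up balanced 2 x 2 data with 2 replicates per cell.
toy = {(0, 0): [1.0, 3.0], (0, 1): [4.0, 6.0], (1, 0): [5.0, 7.0], (1, 1): [12.0, 14.0]}
grand, a, b, ab, resid = decompose(toy)
print("grand mean:", grand)
print("A effects:", a)
print("B effects:", b)
print("interactions:", ab)
print("residuals:", resid)
```

Adding the pieces back together reproduces every observation exactly, which is the ANOVA identity the data model expresses.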
13
Interactions

The interaction term measures the effect (if any) that the combination of the two factors has on the response.

For example, we know that humans need water to live, and they also need food to live.
If we just fed someone water they’d be basically okay for about 2 months (hunger strikers).
If we just fed them sugar they’d last about two to three days.
If we fed them both sugar and water they’d last about 6 months (then they’d die of scurvy).

If an interaction is present, then it makes no sense to talk about one of the main effects on its own, because it is confounded with the other main effect.
14
Interactions - Additivity
Therefore, we need to ask whether the interactions are significant (non-zero). If they are, then (apart from some diagnostic checking) we proceed no further.

If the interactions are not significant (zero), then we can check whether the “additive” model may describe our data, i.e. does

$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$

(grand mean plus the two main effects plus a residual, with no interaction term) fit?

We can now ask whether the main effects are significant (non-zero).

Now we have a set of questions to ask, and the order in which we need to ask them. How do we answer them?
15
Two-way ANOVA table

Rather than learn the explicit formulae (any book on ANOVA will have them), we should learn to deal with the form presented by most statistics packages.

The following is from Minitab, from an experiment on weight gain in sheep with doses of cobalt and copper as the factors (2 levels in each).

Two-way Analysis of Variance

Analysis of Variance for Weight
Source        DF   SS   MS   F   P
Copper
Cobalt
Interaction
Error
Total
16
Two-way ANOVA table

Notice there are three F-statistics and three P-values. These correspond to each of our questions.

The hypothesis being tested for the interaction term is

$H_0: (\alpha\beta)_{ij} = 0, \quad i = 1, \ldots, I, \; j = 1, \ldots, J$

If we fail to reject this null hypothesis then we can test the hypotheses

$H_0: \alpha_i = 0, \quad i = 1, \ldots, I$, and
$H_0: \beta_j = 0, \quad j = 1, \ldots, J$

If we reject one or both of these hypotheses then the data provide some evidence that the factor(s) affect the response.
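The three F-statistics in such a table can be reproduced by hand for a balanced design. The sketch below is an illustration on made-up numbers (not the sheep data) using the standard balanced two-way sums of squares. The function name `two_way_anova` is hypothetical, and P-values are left out because they need the F distribution, which the Python standard library does not provide.

```python
from statistics import mean

def two_way_anova(data):
    """Return (df, ss, ms, f) dictionaries for a balanced two-way layout with replication."""
    levels_a = sorted({i for i, _ in data})
    levels_b = sorted({j for _, j in data})
    n_a, n_b = len(levels_a), len(levels_b)
    n_rep = len(data[(levels_a[0], levels_b[0])])
    if n_rep < 2 or any(len(cell) != n_rep for cell in data.values()):
        raise ValueError("needs a balanced design with at least 2 replicates per cell")

    grand = mean(y for cell in data.values() for y in cell)
    cell_mean = {key: mean(cell) for key, cell in data.items()}
    a_mean = {i: mean(cell_mean[(i, j)] for j in levels_b) for i in levels_a}
    b_mean = {j: mean(cell_mean[(i, j)] for i in levels_a) for j in levels_b}

    ss = {
        "A": n_b * n_rep * sum((a_mean[i] - grand) ** 2 for i in levels_a),
        "B": n_a * n_rep * sum((b_mean[j] - grand) ** 2 for j in levels_b),
        "A:B": n_rep * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                           for i in levels_a for j in levels_b),
        "Error": sum((y - cell_mean[key]) ** 2 for key, cell in data.items() for y in cell),
    }
    df = {"A": n_a - 1, "B": n_b - 1, "A:B": (n_a - 1) * (n_b - 1),
          "Error": n_a * n_b * (n_rep - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    # Each F-statistic compares a mean square with the error mean square.
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "A:B")}
    return df, ss, ms, f

# Made-up balanced 2 x 2 data, 2 replicates per cell.
toy = {(0, 0): [1.0, 3.0], (0, 1): [4.0, 6.0], (1, 0): [5.0, 7.0], (1, 1): [12.0, 14.0]}
df, ss, ms, f = two_way_anova(toy)
for source in ("A", "B", "A:B", "Error"):
    f_text = f"{f[source]:8.2f}" if source in f else ""
    print(f"{source:6s} {df[source]:3d} {ss[source]:8.2f} {ms[source]:8.2f} {f_text}")
```

The interaction row ("A:B") is the one to read first. Only if its F-statistic is not significant does it make sense to move on to the rows for the main effects.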
17
Example

Trace minerals are important for any animal’s diet.

In the following experiment, the experimenter wished to examine the effect (on weight gain) of adding copper and/or cobalt to a sheep’s diet.

The response here is the percentage weight gain.

The factors are cobalt and copper.

Each factor has two levels: either none added or some added.

Do these trace minerals have an effect on weight gain in sheep?
18
Example

We fit the model

% Weight gain = copper(level i) + cobalt(level j) + interaction(ij) + noise

We code each of the factors as 0/1 variables, i.e. a zero in the copper column means the sheep received no copper, whereas a 1 means it received a dose. Similarly for cobalt. So

0/0 means no copper, no cobalt
0/1 means dose of cobalt only
1/0 means dose of copper only
1/1 means dose of both
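The 0/1 coding can be written out as a small table. In the sketch below the interaction column is formed as the product of the two 0/1 codes, which is one common way software represents an interaction under this coding. That choice is an assumption here, not something the lecture specifies.

```python
from itertools import product

meaning = {
    (0, 0): "no copper, no cobalt",
    (0, 1): "dose of cobalt only",
    (1, 0): "dose of copper only",
    (1, 1): "dose of both",
}

rows = []
for copper, cobalt in product((0, 1), repeat=2):
    rows.append({
        "copper": copper,
        "cobalt": cobalt,
        "copper:cobalt": copper * cobalt,  # 1 only when both minerals are given
        "meaning": meaning[(copper, cobalt)],
    })

print("copper cobalt copper:cobalt meaning")
for row in rows:
    print(f"{row['copper']:6d} {row['cobalt']:6d} {row['copper:cobalt']:13d} {row['meaning']}")
```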
19
Analysis of Variance Table

Response: Weight
               Df  Sum Sq  Mean Sq  F value  Pr(>F)
Copper
Cobalt                                               *
Copper:Cobalt
Residuals

P-value for the interaction term >> 0.05 => not significant, therefore we can examine the effects of the factors without worrying about interaction.

P-value for the Cobalt term < 0.05 => significant at the 5% level, therefore it appears there is some effect due to cobalt.

P-value for the Copper term > 0.05 => not significant at the 5% level, therefore there is no evidence of an effect due to copper.
20
Two Sample T-Test and Confidence Interval

Welch Two Sample t-test
data: Weight[Cobalt == 0] and Weight[Cobalt == 1]
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

Conclusion: with 95% confidence we can say that the addition of cobalt to the sheep’s diet will on average increase the weight gain by 0.3% to 4.7%.
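The t statistic and degrees of freedom reported by a Welch two-sample t-test can be computed from the two group means and variances. The sketch below uses made-up numbers (the sheep data are not reproduced here) and a hypothetical helper `welch_t`. It stops short of the P-value and confidence interval, which need quantiles of the t distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic, Welch-Satterthwaite degrees of freedom, and standard error."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of each group mean
    vy = variance(y) / ny
    se = sqrt(vx + vy)
    t = (mean(x) - mean(y)) / se
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df, se

# Made-up responses for two groups (e.g. without and with a treatment).
without = [1.0, 2.0, 3.0, 4.0, 5.0]
with_dose = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df, se = welch_t(without, with_dose)
print(f"t = {t:.3f}, df = {df:.3f}, standard error = {se:.3f}")
```

A 95% confidence interval for the difference in means is then the observed difference plus or minus the 97.5% quantile of the t distribution (with `df` degrees of freedom) times `se`.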