1 Statistical Principles of Experimental Design. Chris Holmes. Thanks to Dov Stekel.

2 Sources of Variation
- Systematic: dye effects, print tips
- Chance, or random variation: population variability
- Unforeseen effects: e.g. an unnoticed change in experimental conditions

3 Overview
- Fundamental concepts: confidence and power
- How many replicates?
- Blocking and randomization
- Arrangement of samples and arrays

4 Microarray
We are interested in detecting differentially expressed genes. When stating that a gene is (or is not) differentially expressed, two different things can go wrong:
1. You say it is differentially expressed and it is not.
2. You say it is not differentially expressed and it is!

5 Type I and Type II Errors

                          Truth: H0 (no effect)    Truth: H1 (effect)
Our call: H0 (no effect)  Correct                  Type II error
Our call: H1 (effect)     Type I error             Correct

6 Confidence
The confidence is the probability of not getting a false positive (FP) result.
- FP: the gene is not differentially expressed and you say it is.
- Confidence is the probability of accepting the null hypothesis when the null hypothesis is true.
- A false positive result is known as a Type I error.
- We control Type I errors explicitly by selecting an appropriate confidence level.
- In microarray experiments, we must modify the confidence level to account for multiplicity, as illustrated below.
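To make the multiplicity point concrete, here is a small R illustration (not from the original slides; the numbers are assumptions) of how a per-gene threshold relates to the number of genes tested:

    n_genes <- 10000
    # Bonferroni-style correction: control the family-wise error rate at 5%
    alpha_per_gene <- 0.05 / n_genes          # 5e-06 per gene
    # Alternatively, fix a per-gene threshold and count the expected number of
    # false positives under the null; 1e-4 is the threshold used later in these slides
    expected_fp <- n_genes * 1e-4             # about 1 false positive expected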

7 Power
The power is the probability of not getting a false negative (FN) result.
- FN: the gene is differentially expressed and you miss it.
- Power is the probability of rejecting the null hypothesis when the null hypothesis is false.
- A false negative result is known as a Type II error.
- We control the power implicitly, via the confidence level and the experimental design.

8 Type I and Type II Errors

                          Truth: H0 (no effect)    Truth: H1 (effect)
Our call: H0 (no effect)  Correct                  Type II error
Our call: H1 (effect)     Type I error             Correct

9 Power Analysis
Calculating the power of a study is a vital part of experimental design.
- An overpowered study is wasteful of resources.
- An underpowered study will be unable to reveal interesting results.

10 Power Analysis
The power of an analysis depends on the following factors:
- The true (unknown) difference in means we are trying to detect
- The (unknown) standard deviation of the population
- The chosen significance threshold
- The type of test
- The number of replicates

11 Experimental Variability
Sources of variability include: individuals, sample preparations, dyes, print runs, pins, arrays, hybridizations, laboratories, researchers, imaging software.

12 Experimental Variability
In order to perform a power analysis, we must measure and quantify the levels of experimental variability in our system, using:
- Calibration experiments
- Pilot experiments
We perform the analysis relative to the largest source of variability. The best way to get maximum power from a statistical analysis is to minimise the level of experimental error and noise.

13 Power Analysis Assumptions
- We assume that the data are approximately log-normally distributed.
- This corresponds to the standard deviation of the errors in the raw data being proportional to the signal intensity.
- Equivalently, the standard deviation of the logged data is constant.
- The standard deviation divided by the mean is called the coefficient of variation.

14 Log Normally Distributed Data

15 Fold Ratios and Mean Differences
If the data are log-normally distributed:
- The difference in means of the logged data is equal to the log of the fold ratio of the raw data.
- The standard deviation (s) of the logged data relates to the coefficient of variation (v) of the raw data via the formula: s = sqrt(ln(v^2 + 1)) / ln(2)
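As a quick check of this formula, the following R lines (added here for illustration, not part of the original slides) convert the 40% coefficient of variation used in the later example into standard deviations on the two log scales:

    v <- 0.4                                  # coefficient of variation of the raw data
    s_ln   <- sqrt(log(v^2 + 1))              # sd on the natural-log scale, about 0.39
    s_log2 <- sqrt(log(v^2 + 1)) / log(2)     # sd on the log2 scale, about 0.56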

16 Power Analysis
We will use the power.t.test() function in R to calculate the power of one- and two-sample tests: power.t.test(n, delta, sd, sig.level, power, type, alternative). The function is called with exactly one of the first five arguments omitted (or set to NULL), and it calculates that missing value, as sketched below.
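A minimal usage sketch (with assumed values, not taken from the slides): supply all but one of the first five arguments and power.t.test() solves for the missing one, here the power achieved with 10 replicates:

    power.t.test(n = 10,                      # number of replicates
                 delta = 1,                   # difference in means to detect (log2 scale)
                 sd = 0.56,                   # standard deviation of the logged data
                 sig.level = 1e-4,            # per-gene significance threshold
                 type = "one.sample",
                 alternative = "two.sided")   # power is left out, so it is returned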

17 Power Analysis Example: Doxorubicin Chemotherapy
We are interested in the treatment of breast cancer patients with doxorubicin chemotherapy. We want to perform a microarray experiment to determine the genes that are up- or down-regulated as a result of the chemotherapy. We would like to know:
- How should we design the experiment?
- How many patients do we need?

18 Paired vs Unpaired Design
In a paired design, we take samples from each patient before and after treatment, and for each gene we look at the difference in expression before and after treatment. In an unpaired design, we have two groups of patients, one group treated, the other group untreated, and we look at the difference in gene expression between the two groups. Which is the better experiment?

19 Power Analysis Assumptions
- Suppose we know from a pilot study and evaluation of our technology that the coefficient of variation is 40%.
- Let's say that we want to detect genes that are 2-fold regulated.
- We are testing 10,000 genes, so we will use a significance threshold of 0.0001 to compensate for multiplicity.
- How many patients do we need for a power of 80%, 90% and 99%?

20 Paired Experiment
Using the formula from slide 15, a 40% coefficient of variation corresponds to a standard deviation of about 0.56 on the log2 scale (0.39 on the natural-log scale). The difference in means to detect is log2(2) = 1. The number of patients we need is:

Power     80%   90%   99%
Number    14    16    20
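Treating the paired design as a one-sample test on the within-patient differences, a call along the following lines (an illustrative sketch, not the authors' script; the sd comes from the conversion shown earlier) reproduces numbers of this order:

    power.t.test(delta = 1, sd = 0.56, sig.level = 1e-4,
                 power = 0.8, type = "one.sample")
    # solves for n; rounding up gives about 14 patients, and raising power
    # to 0.9 or 0.99 gives figures in line with the table above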

21 Unpaired Experiment
The standard deviation and the difference in means are the same as before. The number of patients we need is:

Power             80%   90%   99%
Group size        18    21    28
Total number      36    42    56
(Paired design)   14    16    20
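The corresponding two-sample sketch (same assumed delta and sd as above) gives the per-group sizes in the table:

    power.t.test(delta = 1, sd = 0.56, sig.level = 1e-4,
                 power = 0.8, type = "two.sample")
    # n here is the size of each group; rounding up gives about 18 per group,
    # i.e. roughly 36 patients in total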

22 Paired vs Unpaired
In this example, we need more than twice as many patients in the unpaired experiment to obtain the same power as the paired experiment. A paired design is more powerful than an unpaired design because the differences between individuals are factored out in the analysis.

23 Blocking, Randomization and Blinding
Arrange the experimental design to minimise problems from extraneous sources of variability:
- Use blocking to avoid confounding
- Use randomization and blinding to avoid bias

24 Toxicity Example
We are interested in characterising the toxic effect of benzo(a)pyrene (BP) on rats. 8 rats are to be treated with BP and 8 rats with a control compound. Each array will be hybridised against a reference sample, giving 16 arrays in the experiment.

25 Experimental Design
Suppose there are two batches of 8 slides from two different print runs (1 and 2). Hybridisation will be done by two researchers, Alison and Brian. What is the best way to arrange the experiment?

26 Design 1
Alison prepares all 8 BP samples and hybridises them to the arrays of print run 1. Brian prepares all 8 control samples and hybridises them to the arrays of print run 2.

27 Design 2
Alison chooses 8 rats and treats 4 with BP and 4 with the control substance. She prepares and hybridises 2 BP samples to arrays from print run 1 and 2 BP samples to arrays from print run 2. She prepares and hybridises 2 control samples to arrays from print run 1 and 2 control samples to arrays from print run 2. Brian does the same with the other 8 rats.

28 Design 2

Alison            Print run 1    Print run 2
  Treated         2 samples      2 samples
  Control         2 samples      2 samples

Brian             Print run 1    Print run 2
  Treated         2 samples      2 samples
  Control         2 samples      2 samples

29 Design 3
8 rats are randomly assigned to Alison, along with 4 BP preparations and 4 control preparations. She is not told which preparations are which. She prepares and hybridises samples to randomly pre-arranged arrays, so that 2 BP samples and 2 control samples are hybridised to 4 arrays from each of print runs 1 and 2. Brian does the same with the other 8 rats.
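A small R sketch of this kind of randomisation and blinding (the rat labels and the seed are made up for illustration):

    set.seed(1)                                    # for a reproducible example
    rats <- paste0("rat", 1:16)
    alison_rats <- sample(rats, 8)                 # random assignment of rats to Alison
    brian_rats  <- setdiff(rats, alison_rats)      # the remaining rats go to Brian
    # Randomly order coded preparations (4 BP, 4 control) for each researcher,
    # so that neither knows which preparation is which
    alison_preps <- sample(rep(c("BP", "control"), each = 4))
    brian_preps  <- sample(rep(c("BP", "control"), each = 4))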

30 What is wrong with design 1?
Treatment, researcher and print run are confounded variables: we cannot tell whether differences between the two groups of rats result from the treatment, the researcher or the print run. Designs 2 and 3 use blocking to deconfound the variability of interest (treatment) from the extraneous variabilities (researcher and print run). Designs 2 and 3 are also balanced, which increases the power of the analyses.
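To show how blocking is reflected in the analysis, here is a minimal per-gene ANOVA sketch in R for a balanced design like design 2 or 3 (the data frame and expression values are simulated placeholders, not real data):

    # One row per array: treatment, researcher and print run are crossed and balanced
    toxicity_data <- expand.grid(print_run  = factor(c(1, 2)),
                                 replicate  = 1:2,
                                 researcher = c("Alison", "Brian"),
                                 treatment  = c("BP", "control"))
    toxicity_data$expr <- rnorm(16)        # stand-in for one gene's log expression
    # The treatment effect is assessed after accounting for the blocking factors
    fit <- aov(expr ~ treatment + researcher + print_run, data = toxicity_data)
    summary(fit)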

31 What is wrong with design 2?
Alison's choice of rats may be biased: for example, she may choose the healthiest rats, confounding potential treatment effects with researcher variability. Design 3 uses randomization and blinding to avoid this bias.

32 Arrangement of Samples and Arrays
- Is it better to use Affymetrix arrays or a two-colour array system?
- If using a two-colour array system, is it better to use a reference sample?
- If using a two-colour array system, what is the best arrangement of samples on the slides?

33 Several Factors
- Available technology
- Cost
- Statistical considerations
We consider the problem from the perspective of three different experiments.

34 Example 1: Hepatocellular Carcinomas
20 samples are taken from diseased and healthy tissue of patients suffering from hepatocellular carcinomas and hybridised to microarrays. We would like to identify genes that are up- or down-regulated in hepatocellular carcinomas relative to healthy tissue.

35 Design 1.1
Array 1: Reference sample vs Healthy 1
Array 2: Reference sample vs Disease 1
(x 20)

36 Design 1.2
GeneChip 1: Healthy 1
GeneChip 2: Disease 1
(x 20)

37 Design 1.3
Array 1: Healthy 1 and Disease 1 hybridised together on one two-colour array
(x 20)

38 Design 1.4
Array 1: Healthy 1 and Disease 1
Array 11: Disease 11 and Healthy 11 (dye assignment reversed)
(x 10 of each arrangement)

39 Design 1.5
Array 1: Healthy 1 and Disease 1
Array 2: Disease 1 and Healthy 1 (same patient, dyes swapped)
(x 20)

40 Which is the best design?
A simple experiment, yet five different designs!
- Design 1.1 is bad because it increases variability.
- Design 1.3 is bad because it confounds colour with disease state.
- Designs 1.4 and 1.5 are best.

41 Design 1.1
Suppose the coefficient of variation of a single hybridisation is 30%. In design 1.1 (Array 1: reference sample vs healthy; Array 2: reference sample vs disease), the healthy-disease comparison combines the noise of two separate hybridisations, so the design increases the effective variability from 30% to about 43%.

42 Design 1.5
With the same 30% coefficient of variation, design 1.5 hybridises the healthy and disease samples together (Array 1: healthy vs disease; Array 2: the dye-swapped repeat) and averages the two direct comparisons, reducing the effective variability from 30% to about 21%.
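These two figures can be checked with a couple of lines of R (added here for illustration), using the coefficient-of-variation relation from slide 15:

    cv_to_sd <- function(v) sqrt(log(v^2 + 1))   # CV of raw data -> sd of logged data
    sd_to_cv <- function(s) sqrt(exp(s^2) - 1)   # and back again
    s <- cv_to_sd(0.30)
    sd_to_cv(s * sqrt(2))    # two hybridisations through a reference: about 0.43
    sd_to_cv(s / sqrt(2))    # averaging a dye-swapped pair: about 0.21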

43 Example 2: B-Cell Lymphomas Samples are taken from 60 patients suffering from B-cell lymphomas and hybridised to microarrays. The aim of the experiment is to identify clinically relevant subgroups of patients using a cluster analysis, and then to build a classification model to differentiate between the subgroups.

44 Design 2.1
Array 1: Patient 1 vs Patient 2
(x 30)

45 Design 2.2
Array 1: Patient 1 vs Reference
(x 60)

46 Design 2.3
GeneChip 1: Patient 1
(x 60)

47 Which design is best?
Design 2.1 is bad because it is difficult to compare patients on an equal footing. Designs 2.2 and 2.3 are good; this is probably the most appropriate use of Affymetrix technology.

48 Example 3: Yeast Time Series
Budding yeast can reproduce sexually by producing haploid cells through a process called sporulation. Yeast was placed in a sporulating medium and samples were taken at 7 timepoints from the start of sporulation. We are interested in identifying genes that show similar profiles over the timecourse.

49 Design 3.1
Array 1: Time 0 vs Time 1
Array 2: Time 0 vs Time 2
Array 3: Time 0 vs Time 3
Array 4: Time 0 vs Time 4
Array 5: Time 0 vs Time 5
Array 6: Time 0 vs Time 6

50 Design 3.2 (loop design)
Array 1: Time 0 vs Time 1
Array 2: Time 1 vs Time 2
Array 3: Time 2 vs Time 3
Array 4: Time 3 vs Time 4
Array 5: Time 4 vs Time 5
Array 6: Time 5 vs Time 6
Array 7: Time 6 vs Time 0

51 Design 3.3
GeneChip 1: Time 0
GeneChip 2: Time 1
GeneChip 3: Time 2
GeneChip 4: Time 3
GeneChip 5: Time 4
GeneChip 6: Time 5
GeneChip 7: Time 6

52 Which is the best design? Design 3.3 requires careful normalisation because timepoint is confounded with array. Design 3.2 is a loop design. It is a good design, but harder to analyse. Design 3.1 may be the best design.

53 Conclusions
- Number of replicates: calculate using power analyses.
- Extraneous variability: block to avoid confounding variables; randomise to avoid bias.
- Blocked experiments require ANOVA analyses.

54 Conclusions
- Two-sample experiments: reference samples increase variability; hybridise both samples to the same array.
- Multiple patient comparisons: reference samples or Affymetrix technology enable comparisons.
- Time series analysis: reference samples are useful.

55 Practical
- Use a reference sample to estimate the coefficient of variability.
- Power analysis for a population inference test.
- Power and false positive analysis for differentially expressed genes in a single patient.


