Complex Experimental Design and Simple Data Analysis: A Pharmaceutical Example
Joseph G Pigeon, Villanova University
Introduction
- Designs with restricted randomization have multiple error measures
- Pharmaceutical example where the split plot structure is even more complex
  - Whole plot structure in two dimensions
  - Correlation structure in two dimensions
- Caveats
  - Limited understanding of the biology involved
  - No originality of statistical methods claimed
Split Plot Designs
- Originated in agricultural experiments where
  - Levels of some factors are applied to whole plots
  - Levels of other factors are applied to sub plots
- Separate randomizations to whole plots and sub plots
- Two types of experimental units
- Two types of error measures
- Correlation among the observations
Split Plot Designs
- Also common in industrial experiments when
  - Complete randomization does not occur
  - Some factor levels may be impractical, inconvenient or too costly to change
- This restriction on randomization results in some whole plot factors and some sub plot factors
- Data analysis needs to account for this restricted randomization or split plot structure
Split Plot Example
- Consider a paper manufacturer who wants to study
  - Effects of 3 pulp preparation methods
  - Effects of 4 temperatures
- Response is tensile strength
- Pilot plant is capable of 12 runs per day
- One replicate on each of three days
Split Plot Example
- Initially, we might consider this to be a 4 × 3 factorial in a randomized block design
- If true, then the order of experimentation within a block should have been completely randomized
- However, this was not feasible; data were not collected this way
Split Plot Example
- Experiment was conducted as follows:
  - A batch of pulp was produced by one of the three methods
  - The batch was divided into four samples
  - Each sample was cooked at one of the four temperatures
- Split plot design with
  - Pulp preparation method as whole plot treatment
  - Temperature as sub (split) plot treatment
Split Plot Example
- Subplot error is less than whole plot error (typical)
Split Plot Example Lessons
- We must carefully consider how the data were collected and incorporate all randomization restrictions into the analysis
  - Whole plot effects measured against whole plot error
  - Sub plot effects measured against sub plot error
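The two error measures can be illustrated with a minimal sketch of a balanced split plot ANOVA. It assumes the layout of the paper example (3 days as blocks, 3 preparation methods as whole plots, 4 temperatures as sub plots). The numbers are synthetic, generated only for illustration, and the function name `split_plot_anova` is hypothetical.

```python
import numpy as np

def split_plot_anova(y):
    """Balanced split plot ANOVA for y[block, whole_plot_level, sub_plot_level]."""
    r, a, b = y.shape
    grand = y.mean()
    ss = {
        "blocks": a * b * np.sum((y.mean(axis=(1, 2)) - grand) ** 2),
        "A": r * b * np.sum((y.mean(axis=(0, 2)) - grand) ** 2),
        "B": r * a * np.sum((y.mean(axis=(0, 1)) - grand) ** 2),
    }
    # Variation among the r*a whole plots splits into blocks + A + whole plot error.
    ss["wp_error"] = b * np.sum((y.mean(axis=2) - grand) ** 2) - ss["blocks"] - ss["A"]
    # Variation among the a*b treatment cells splits into A + B + AxB.
    ss["AB"] = r * np.sum((y.mean(axis=0) - grand) ** 2) - ss["A"] - ss["B"]
    total = np.sum((y - grand) ** 2)
    ss["sp_error"] = total - sum(ss.values())
    df = {"blocks": r - 1, "A": a - 1, "wp_error": (r - 1) * (a - 1),
          "B": b - 1, "AB": (a - 1) * (b - 1), "sp_error": a * (r - 1) * (b - 1)}
    ms = {k: ss[k] / df[k] for k in df}
    # Whole plot factor A is tested against whole plot error;
    # sub plot factor B and the AxB interaction against sub plot error.
    f_ratios = {"A": ms["A"] / ms["wp_error"],
                "B": ms["B"] / ms["sp_error"],
                "AB": ms["AB"] / ms["sp_error"]}
    return ss, df, ms, f_ratios

# Synthetic data (assumption): batch-to-batch noise is shared by the 4 samples of a batch.
rng = np.random.default_rng(0)
method = np.array([-1.0, 0.0, 1.0])[None, :, None]
temp = np.array([-3.0, -1.0, 1.0, 3.0])[None, None, :]
batch_noise = rng.normal(0.0, 2.0, size=(3, 3, 1))
sample_noise = rng.normal(0.0, 1.0, size=(3, 3, 4))
y = 35.0 + method + temp + batch_noise + sample_noise

ss, df, ms, f_ratios = split_plot_anova(y)
print(df)
print({k: round(v, 2) for k, v in f_ratios.items()})
```

Treating the same data as a completely randomized 4 × 3 factorial in blocks would pool the two error terms and test the preparation method against the wrong denominator.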
Description of Example: MQPA Assay
- Multivalent Q-PCR based Potency Assay
- Used to assign potencies (independently) to each of five reassortants of a pentavalent vaccine
- Relies on the quantitation of viral nucleic acid generated in 24 hours
- Two major components
  - Biological component (infection of the standard and sample viruses)
  - Biochemical component (quantitative PCR reaction, where PCR = Polymerase Chain Reaction)
Polymerase Chain Reaction (PCR)
Description of Example: Biological Component
- Vero cell maintenance and setup
- Serial dilutions of known standard and unknown sample are incubated with trypsin
- Each dilution is used to infect 4 replicate wells of Vero cell monolayers seeded in a 96 well plate
- Infection proceeds for 24 hours and is then halted by the addition of a detergent and storage at -70 °C
Description of Example: Biochemical Component
- Lysate is thawed and diluted
- Preparation of a "master mix"
- Preparation of Q-PCR plate (master mix + diluted lysates)
- Configuration of the Q-PCR detection system
- Potency is determined by parallel line analysis of standard and test samples
- Specific interest is in optimization of the PCR portion of the assay
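As a rough illustration of parallel line analysis, the sketch below fits standard and test responses against log10 dilution with a common slope and reads the relative potency off the horizontal shift between the two lines. The response scale, the slope, the dilution series, and the function name `relative_potency` are all assumptions made for the example; the assay's actual model and validity checks (parallelism, linearity) are not described here.

```python
import numpy as np

def relative_potency(logdose_std, resp_std, logdose_test, resp_test):
    """Common-slope fit: resp = intercept + shift * is_test + slope * logdose."""
    x = np.concatenate([logdose_std, logdose_test])
    y = np.concatenate([resp_std, resp_test])
    is_test = np.concatenate([np.zeros(len(logdose_std)), np.ones(len(logdose_test))])
    design = np.column_stack([np.ones_like(x), is_test, x])
    (intercept, shift, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    # The vertical shift divided by the common slope is the horizontal shift,
    # i.e. the log10 of the potency of the test sample relative to the standard.
    return 10.0 ** (shift / slope)

# Synthetic, noise-free data (assumption): the test sample is twice as potent.
log_dilution = np.log10([1.0, 0.5, 0.25, 0.125])
slope_true, intercept_true, potency_true = -3.3, 30.0, 2.0
resp_std = intercept_true + slope_true * log_dilution
resp_test = intercept_true + slope_true * (log_dilution + np.log10(potency_true))

estimate = relative_potency(log_dilution, resp_std, log_dilution, resp_test)
print(round(estimate, 6))
```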
PCR Optimization Design
- Discussions with biologists identified 13 factors
  - 8 factors associated with preparation of master mix
  - 5 factors associated with configuration of PCR detection system (instrument)
- Discussions with biologists identified 3 responses
  - Lowest cycle time (range: 1 to 40)
  - Least variability between replicates
  - Valid amplification plot (range: 0 to 4)
- Completion of experiments and analysis immediately!
PCR Optimization Design Considerations
- Interactions not expected to exist
- Experiments performed in a 96 well plate
- Each plate can accommodate at most 15 master mix combinations
  - 12 run PB design for 8 factors
PCR Optimization Design Considerations
- Time constraints imply at most 16 plates (instrument settings)
  - 2^(5-1) fractional factorial for 5 factors (5 = 1234)
- Concern about using only 12 of 2^8 combinations
  - Half of the plates use a 12 run PB design (123 = 45 = +1)
  - Half of the plates use the foldover PB design (123 = 45 = -1)
Plackett-Burman Design

Factors:   8   Replicates:   1
Design:   12   Runs:        12
Center pts (total): 0

Data Matrix (randomized)

Run  A  B  C  D  E  F  G  H
  1  -  +  +  +  -  +  +  -
  2  +  +  -  +  -  -  -  +
  3  +  -  +  -  -  -  +  +
  4  -  +  +  -  +  -  -  -
  5  +  +  -  +  +  -  +  -
  6  +  -  +  +  -  +  -  -
  7  -  +  -  -  -  +  +  +
  8  +  -  -  -  +  +  +  -
  9  -  -  +  +  +  -  +  +
 10  -  -  -  -  -  -  -  -
 11  +  +  +  -  +  +  -  +
 12  -  -  -  +  +  +  -  +
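A short check of the data matrix above, offered as a sketch rather than as part of the original analysis: it confirms that the eight columns are balanced and mutually orthogonal, builds the foldover by reversing every sign, and measures how strongly main effects are aliased with two-factor interactions before and after the foldover is added. The helper name `worst_alias` is hypothetical.

```python
from itertools import combinations
import numpy as np

SIGNS = """
- + + + - + + -
+ + - + - - - +
+ - + - - - + +
- + + - + - - -
+ + - + + - + -
+ - + + - + - -
- + - - - + + +
+ - - - + + + -
- - + + + - + +
- - - - - - - -
+ + + - + + - +
- - - + + + - +
"""

# 12 runs x 8 factors in coded units (+1 / -1), transcribed from the data matrix.
pb = np.array([[1 if s == "+" else -1 for s in line.split()]
               for line in SIGNS.strip().splitlines()])

# Orthogonal main effects: X'X equals 12 times the identity matrix.
main_effects_orthogonal = np.array_equal(pb.T @ pb, 12 * np.eye(8, dtype=int))

# Foldover: the same 12 runs with every sign reversed.
foldover = -pb
combined = np.vstack([pb, foldover])

def worst_alias(design):
    """Largest |x_i . (x_j * x_k)| over main effects i and factor pairs (j, k)."""
    n_factors = design.shape[1]
    return max(abs(int(design[:, i] @ (design[:, j] * design[:, k])))
               for i in range(n_factors)
               for j, k in combinations(range(n_factors), 2))

print(main_effects_orthogonal)
print(worst_alias(pb), worst_alias(combined))
```

In the 24-run combination every main-effect column is orthogonal to every two-factor interaction column, which is the usual reason for adding a foldover.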
Half Fraction Design

Factors:   5   Base Design:        5, 16   Resolution:   V
Runs:     16   Replicates:             1   Fraction:   1/2
Blocks: none   Center pts (total):     0

Design Generators: E = ABCD

Row  StdOrder  RunOrder   A   B   C   D   E
  1         1         7  -1  -1  -1  -1   1
  2         2         8   1  -1  -1  -1  -1
  3         3         3  -1   1  -1  -1  -1
  4         4        15   1   1  -1  -1   1
  5         5        13  -1  -1   1  -1  -1
  6         6         9   1  -1   1  -1   1
  7         7        10  -1   1   1  -1   1
  8         8         6   1   1   1  -1  -1
  9         9        16  -1  -1  -1   1  -1
 10        10         2   1  -1  -1   1   1
 11        11         4  -1   1  -1   1   1
 12        12        12   1   1  -1   1  -1
 13        13         5  -1  -1   1   1   1
 14        14        11   1  -1   1   1  -1
 15        15        14  -1   1   1   1  -1
 16        16         1   1   1   1   1   1
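The half fraction can be regenerated from its generator in a few lines. The sketch below builds the 16 runs in standard order with E = ABCD and checks that ABC equals DE in every run, so the sign of ABC divides the 16 plates into two sets of 8, matching the 123 = 45 rule used to assign plates to the PB design or its foldover.

```python
from itertools import product
import numpy as np

# Full 2^4 factorial in A, B, C, D; reversing the columns makes A vary fastest
# (standard order).
base = np.array(list(product([-1, 1], repeat=4)))[:, ::-1]

# Generator E = ABCD gives the 2^(5-1) half fraction.
e = base.prod(axis=1)
design = np.column_stack([base, e])

abc = design[:, 0] * design[:, 1] * design[:, 2]
de = design[:, 3] * design[:, 4]

print(design)
print(bool((abc == de).all()), int((abc == 1).sum()), int((abc == -1).sum()))
```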
PCR Optimization Design Layout
- Each marked cell (X in the layout) represents a 12 run PB design
- 16 × 12 = 192 observations
- Whole plot structure in two dimensions

[Figure: layout grid of Plate (rows, ending 15, 16) by Master Mix (columns 1, 2, ..., 11, 12 and 13, 14, ..., 23, 24), with X marking the block of master mixes run on each plate]
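The crossing of the two designs can be sketched as follows. This is an illustration under stated assumptions: the PB matrix is built from the standard 12-run cyclic generator (an equivalent design, not the randomized matrix shown earlier), and the variable names are hypothetical. Each of the 16 instrument settings (plates) receives either the PB design or its foldover, according to the sign of 123.

```python
from itertools import product
import numpy as np

# Instrument settings: 2^(5-1) half fraction, fifth factor = product of the first four.
base = np.array(list(product([-1, 1], repeat=4)))
instr = np.column_stack([base, base.prod(axis=1)])           # 16 plates x 5 factors

# Master mix settings: 12-run Plackett-Burman from the standard cyclic generator,
# keeping the first 8 of its 11 columns (assumption: any 8 columns would do).
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
pb = np.vstack([np.roll(gen, k) for k in range(11)]
               + [-np.ones(11, dtype=int)])[:, :8]           # 12 runs x 8 factors

blocks = []
for plate in instr:
    sign = plate[0] * plate[1] * plate[2]                    # 123, equal to 45 here
    mm_block = pb if sign == 1 else -pb                      # PB or its foldover
    blocks.append(np.column_stack([mm_block, np.tile(plate, (12, 1))]))

layout = np.vstack(blocks)                                   # 192 runs x 13 factors
gram = layout.T @ layout
print(layout.shape, bool(np.array_equal(gram, 192 * np.eye(13, dtype=int))))
```

All 13 main-effect columns are mutually orthogonal over the 192 runs, although the instrument factors change only from plate to plate.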
PCR Optimization Results
- Biologists provided this summary of the 21 runs with an amplification plot rating of 4
PCR Optimization Results

Counts by plate (N = 21)
plate   3   4   5   7  10  11  12  14  15  16
Count   3   4   1   2   1   2   3   1   3   1

Counts by master mix (N = 21)
mm      5   6   8   9  14  19  22
Count   2   3   5   3   2   5   1

Counts by factor level (N = 21 for each factor)
Factor   Count at -1   Count at +1
mm1               11            10
mm2               16             5
mm3                6            15
mm4               16             5
mm5                7            14
mm6                9            12
mm7                0            21
mm8               14             7
instr1            12             9
instr2            10            11
instr3             8            13
instr4            19             2
instr5            13             8
PCR Optimization Analysis Log
- mm7 = 1; instr4 = -1
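One informal way to gauge how lopsided a tally is: an exact two-sided binomial calculation on the split of the 21 top-rated runs (21 to 0 for mm7, 19 to 2 for instr4). This is a swapped-in illustration, not the analysis used in the talk. It assumes each run is equally likely to fall at either level when a factor has no effect, and it treats runs as independent, which the plate structure calls into question, especially for instrument factors.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for a k versus n - k split under a fair 50/50 null."""
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)

# Counts of top-rated runs at the favoured level, taken from the tally (N = 21).
p_mm7 = two_sided_binomial_p(21, 21)      # all 21 runs at mm7 = +1
p_instr4 = two_sided_binomial_p(19, 21)   # 19 of 21 runs at instr4 = -1

print(p_mm7, p_instr4)
```

The same function can be applied to any other row of the tally; with 13 factors examined at once, small values should be read with the number of comparisons in mind.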
PCR Optimization Results

Counts by plate (N = 63)
plate   1   2   4   5   6   7   8   9  10  11  12  13  14  15  16
Count   4   4   3   5   6   5   8   4   6   3   1   3   5   3   3

Counts by master mix (N = 63)
mm      1   2   3   4   7  11  13  15  16  17  18  20  21  22  23
Count   6   6   3   4   7   5   5   2   6   5   3   2   4   3   2

Counts by factor level (N = 63 for each factor)
Factor   Count at -1   Count at +1
mm1               31            32
mm2               26            37
mm3               47            16
mm4               28            35
mm5               30            33
mm6               31            32
mm7               41            22
mm8               21            42
instr1            34            29
instr2            31            32
instr3            42            21
instr4            26            37
instr5            29            34
PCR Optimization Analysis Log
- mm7 = 1; instr4 = -1
- mm3 = 1; mm7 = 1; mm8 = -1; instr3 = 1
PCR Optimization Results

Fractional Factorial Fit: ctgm

Estimated Effects and Coefficients for ctgm (coded units)

Term            Effect     Coef  SE Coef       T      P
Constant                 33.919   0.3852   88.06  0.000
instr1          -1.264   -0.632   0.3852   -1.64  0.103
instr2           0.596    0.298   0.3852    0.77  0.440
instr3          -2.157   -1.078   0.3852   -2.80  0.006
instr4           1.152    0.576   0.3852    1.50  0.137
instr5           0.667    0.333   0.3852    0.87  0.388
instr1*instr2    0.892    0.446   0.3852    1.16  0.249
instr1*instr3    0.424    0.212   0.3852    0.55  0.582
instr1*instr4   -0.221   -0.110   0.3852   -0.29  0.775
instr1*instr5   -0.276   -0.138   0.3852   -0.36  0.721
instr2*instr3   -1.110   -0.555   0.3852   -1.44  0.151
instr2*instr4    0.240    0.120   0.3852    0.31  0.756
instr2*instr5    1.522    0.761   0.3852    1.98  0.050
instr3*instr4    0.484    0.242   0.3852    0.63  0.531
instr3*instr5    0.182    0.091   0.3852    0.24  0.814
instr4*instr5    0.027    0.014   0.3852    0.04  0.972
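For readers less familiar with this kind of output, the sketch below shows how the Effect, Coef and T columns relate in a balanced two-level design: the coefficient is the least squares slope in coded units, the effect is the mean response at +1 minus the mean at -1 (twice the coefficient), and T is the coefficient divided by its standard error. The response values are synthetic and the coefficients used to generate them are hypothetical; only the final two checks use numbers from the table above.

```python
from itertools import product
import numpy as np

# A 2^(5-1) design in coded units: fifth column = product of the first four.
base = np.array(list(product([-1, 1], repeat=4)))
design = np.column_stack([base, base.prod(axis=1)])

# Synthetic response (assumption): hypothetical coefficients plus noise.
rng = np.random.default_rng(1)
true_coef = np.array([-0.6, 0.3, -1.1, 0.6, 0.3])
y = 34.0 + design @ true_coef + rng.normal(0.0, 0.5, size=16)

# Least squares fit with an intercept column.
X = np.column_stack([np.ones(16), design])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
coef = beta[1:]

# Effect = mean response at +1 minus mean response at -1.
effect = np.array([y[design[:, j] == 1].mean() - y[design[:, j] == -1].mean()
                   for j in range(5)])
print(np.allclose(effect, 2 * coef))

# T = Coef / SE Coef, checked against two rows of the table (instr3, instr2*instr5).
t_instr3 = round(-1.078 / 0.3852, 2)
t_instr2_instr5 = round(0.761 / 0.3852, 2)
print(t_instr3, t_instr2_instr5)
```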
PCR Optimization Analysis Log
- mm7 = 1; instr4 = -1
- mm3 = 1; mm7 = 1; mm8 = -1; instr3 = 1
- instr3 = 1; instr2 and instr5 should have opposite signs?
PCR Optimization Results

Estimated Effects and Coefficients for ctgm (coded units)

Term       Effect     Coef  SE Coef        T      P
Constant            33.947   0.3206   105.90  0.000
mm1        -0.304   -0.152   0.3206    -0.47  0.636
mm2         0.699    0.350   0.3206     1.09  0.277
mm3        -4.070   -2.035   0.3206    -6.35  0.000
mm4         0.222    0.111   0.3206     0.35  0.730
mm5        -0.341   -0.171   0.3206    -0.53  0.595
mm6        -0.027   -0.013   0.3206    -0.04  0.967
mm7        -4.525   -2.263   0.3207    -7.06  0.000
mm8         2.061    1.030   0.3206     3.21  0.002
PCR Optimization Analysis Log
- mm7 = 1; instr4 = -1
- mm3 = 1; mm7 = 1; mm8 = -1; instr3 = 1
- instr3 = 1; instr2 and instr5 should have opposite signs?
- mm3 = 1; mm7 = 1; mm8 = -1
PCR Optimization Results

Row  plate  mm    ct1    ct2    ct3    ct4   ctgm  well1  well2
  1      3  14  26.88  27.33  27.25  27.13  27.15  37.98     40
  2      3  19  27.62  28.10  28.02  27.40  27.78  40.00     40
  3      4   5  29.20  29.04  29.39  28.70  29.08  40.00     40
  4     11  14  27.53  26.97  28.04  27.90  27.61  40.00     40
  5     11  19  28.25  28.57  28.64  28.09  28.39  40.00     40
  6     12   5  28.13  28.93  28.39  28.51  28.49  40.00     40

Row  amprating  mm1  mm2  mm3  mm4  mm5  mm6  mm7  mm8  instr1  instr2
  1          4    1    1    1   -1   -1   -1    1   -1       1       1
  2          4   -1   -1    1   -1    1    1    1   -1       1       1
  3          4   -1    1    1    1   -1    1    1   -1      -1      -1
  4          4    1    1    1   -1   -1   -1    1   -1      -1       1
  5          4   -1   -1    1   -1    1    1    1   -1      -1       1
  6          3   -1    1    1    1   -1    1    1   -1       1      -1

Row  instr3  instr4  instr5
  1       1      -1      -1
  2       1      -1      -1
  3       1      -1      -1
  4       1      -1       1
  5       1      -1       1
  6       1      -1       1
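The listing does not define ctgm. The sketch below tests one reading, namely that ctgm is the geometric mean of the four replicate values ct1 to ct4; this interpretation is an assumption. It also computes the replicate standard deviation, as a simple stand-in for the "variability between replicates" response.

```python
import math
import statistics

# (ct1, ct2, ct3, ct4, reported ctgm) for the six listed runs.
runs = [
    (26.88, 27.33, 27.25, 27.13, 27.15),
    (27.62, 28.10, 28.02, 27.40, 27.78),
    (29.20, 29.04, 29.39, 28.70, 29.08),
    (27.53, 26.97, 28.04, 27.90, 27.61),
    (28.25, 28.57, 28.64, 28.09, 28.39),
    (28.13, 28.93, 28.39, 28.51, 28.49),
]

def geometric_mean(values):
    """Geometric mean computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

gms = [geometric_mean(r[:4]) for r in runs]
sds = [statistics.stdev(r[:4]) for r in runs]

for (*cts, reported), gm, sd in zip(runs, gms, sds):
    print(f"reported {reported:.2f}  geometric mean {gm:.4f}  replicate sd {sd:.3f}")
```

For these six rows the computed geometric means agree with the reported ctgm values to within rounding, which is consistent with the assumed definition but does not prove it.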
PCR Optimization Summary
- No complex models: all simple analyses
- 5 factors were found to be significant (mm3, mm7, mm8, instr3 and instr4)
- These factors were further studied using response surface experiments
- Scientists seem quite happy with the results of the PCR optimization experiments
Concluding Remarks
- Many industrial experiments do have a split or strip plot structure, which means multiple and possibly complex error measures
- This structure arises from the conduct of an experiment and/or any restrictions on the randomization
- We need to incorporate these considerations into a proper analysis and interpretation of experimental data
Concluding Remarks
- Experimental designs with balance, symmetry and orthogonality permit simple but effective graphical analyses (even with some missing data)
- Much can be learned from simple analyses following suitable experimental design
- All models are wrong, but some models are useful
- All models are wrong, but some models are more wrong than others