Design and Analysis of Augmented Designs in Screening Trials. Kathleen Yeater, USDA-ARS-SPA. 3rd Curators Workshop, February 3, 2010.
Five Basic Steps of an Experiment: 1. Research Planning; 2. Experimental Design; 3. Summarize Observations; 4. Analysis (Statistical Inference); 5. Document/Present Study Results. Why start with this? This is how the talk is organized.
Research Planning: What Is the Question? Is the focus on development? Are you trying to find something better? Is it discovery research, not necessarily driven by a specific hypothesis? This is why the word “screening” is in the title.
Remember the Basics: the 3 R’s. Replication: valid estimation of error variance. Reduction of variation among plots: controls (reduces) error variance; use blocking to control the heterogeneity present in the experiment, blocking at the scale of variability. Randomization: unbiased estimates of means and variances. Minimum blocking: use fewer blocks, blocking at the scale of variability. Together, the 3 R’s support cause-and-effect statements.
What if replication is impractical, prohibitively expensive, or impossible? Not enough material (seed, -icides); not enough space; not enough time; too many entries.
What leads to an unreplicated design in field trials? The 3 R’s exist to support cause-and-effect statements; in screening, we make a cut based on good/bad performance in testing. What is the research question again? Design for the experiment under consideration; do NOT experiment for the design. The principles of experimental design (the 3 R’s) aim at cause-and-effect statements, whereas in screening we are trying to make first-cut diagnoses, as in early generation variety testing.
Augmented Design: introduced in various publications by W.T. Federer and developed for plant breeding research. Genotypes, yield, disease, insecticides, herbicides, fertilizers: all are excellent subject variables in a screening trial. With screening designs, the focus is on development: we are trying to find something better, so it is discovery research. We identified our research question in the research planning phase. Now we move on to step two: identifying an appropriate design to implement the experiment.
Augmented Design as Experimental Design: uses experimental design principles for the arrangement of the checks. New treatments (n) are not replicated; checks are replicated as points of reference. Usually you want between 4 and 6 checks; n can be large.
Augmented Design – Implementation: I. Select any experimental design for the check(s). II. Enlarge the blocks, or increase the number of rows and/or columns, to accommodate the new test entries (treatments, n). III. Randomly distribute the new test entries among the blocks/rows/columns.
Design Set-up – RCB. [Figure: treatments A, B, C, D arranged in blocks along a moisture gradient.] The block effect now removes the moisture effect, allowing fair comparisons among treatments.
Design Set-up – Augmented RCB. [Figure: the same moisture-gradient layout, with each block enlarged to hold checks A, B, C, D plus unreplicated test entries numbered 1–24.] Now we’ve expanded each block to include the unreplicated treatments.
Advantages of Augmented Design: more than one check is included (4 to 6 is optimal); the replicated checks allow an estimate of experimental error, and the design is efficient; less physical space is needed.
How to select a check: checks are experimental units [varieties/genotypes/cultivars/entries] with known ranges of the measurement characteristics you want to evaluate. What are good checks for your objective? Yield, CHO content, seed characteristics: a quantitative measurement that holds constant.
Variability of Checks: the test entries’ (n) measurements will range from 5.0 to 40.0, with an overall mean of about 12.6. What do you think about these checks? Did I do a good job of selecting appropriate checks?
Consistent checks that cover the range of our test data: the test entries’ (n) measurements will range from 5.0 to 40.0, with an overall mean of about 12.6.
Augmented RCBD. Goal: screen 300 new entries for response X in an early generation screening trial. Four additional genotypes are CHECK entries (A, B, C, D). Six blocks (field plots, time placement for a lab assay, location in a growth chamber). Randomize the location of A, B, C, D within each block (replicate the check genotypes within each block to capture spatial variation). 300/6 = 50 new entries are randomly selected and placed within each block: give each entry a random number from 1 to 300, then numbers 1–50 go to Block 1, 51–100 to Block 2, and so on. Any design can be chosen here; incomplete blocks and split plots/blocks are easily applied. The key is that any experimental design can be augmented to accommodate a set of new treatments that each appear only once (unreplicated).
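The randomization just described can be sketched in Python. This is a minimal sketch, not the IASRI generator: the labels T1–T300 for test entries and A–D for checks, the fixed seed, two copies of each check per block, and the function name `augmented_rcbd` are all illustrative assumptions. Each block receives 50 randomly allocated test entries plus the replicated checks, and the plot order is then shuffled within the block.

```python
import random

def augmented_rcbd(n_entries=300, checks=("A", "B", "C", "D"),
                   n_blocks=6, check_reps=2, seed=2010):
    """Hypothetical helper: randomize an augmented RCBD.

    Returns a list of blocks; each block is a list of plot labels in
    randomized order. Test entries are labelled T1..Tn and appear once;
    every check appears check_reps times in every block.
    """
    if n_entries % n_blocks:
        raise ValueError("this sketch assumes equal block sizes")
    rng = random.Random(seed)
    entries = [f"T{i}" for i in range(1, n_entries + 1)]
    rng.shuffle(entries)                       # random allocation of entries to blocks
    per_block = n_entries // n_blocks          # 300 / 6 = 50
    blocks = []
    for b in range(n_blocks):
        plots = entries[b * per_block:(b + 1) * per_block]
        plots += list(checks) * check_reps     # replicated checks within the block
        rng.shuffle(plots)                     # randomize plot order within the block
        blocks.append(plots)
    return blocks

design = augmented_rcbd()
print([len(block) for block in design])  # [58, 58, 58, 58, 58, 58]
```

Shuffling the full entry list and slicing it is equivalent to assigning random numbers 1–300 and taking consecutive runs of 50, as the slide describes.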
How many repeats of each check give an optimal design? Design Resources Server, Indian Agricultural Statistics Research Institute (ICAR), New Delhi, India: www.iasri.res.in/design (Online Design Generation I: Augmented Design). As far as I can tell, the reference by Parsad et al. has not been peer-reviewed.
Construction of Augmented Designs via the IASRI site. The construction page asks for the number of test treatments, control treatments, blocks, etc.:
Number of Test treatments (w)
Number of Control Treatments (u)
Number of Blocks (b)
Number of replication of control
Site contacts: V.K. Gupta, Rajender Parsad, A. Dhandapani. Screenshot from <http://www.iasri.res.in/design/Augmented%20Designs/home.htm>
Cells filled out; enter block sizes.
Number of Test treatments (w): 300
Number of Control Treatments (u): 4
Number of Blocks (b): 6
Number of replication of control: 2
Optimum Total Number of Experimental Units required: 348
Block sizes: Block 1: 58; Block 2: 58; Block 3: 58; Block 4: 58; Block 5: 58; Block 6: 58
Total Number of Experimental Units = 348; Assigned so far = 348; Remaining = 0.
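The totals the IASRI form reports follow from simple arithmetic, assuming equal block sizes and each control replicated r times in every block: total units N = w + u*r*b and block size k = w/b + u*r. A minimal Python check (the function name is hypothetical):

```python
def augmented_layout_size(w, u, b, r):
    """Total experimental units and per-block size for an augmented RCBD.

    w: test treatments, u: controls (checks), b: blocks,
    r: replications of each control within every block.
    Assumes the w test treatments divide evenly among the b blocks.
    """
    if w % b:
        raise ValueError("unequal block sizes are not handled in this sketch")
    total = w + u * r * b          # 300 + 4*2*6 = 348
    block_size = w // b + u * r    # 50 + 8 = 58
    return total, block_size

print(augmented_layout_size(w=300, u=4, b=6, r=2))  # (348, 58)
```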
Generated Design
Block 1: (T132, T1, T214, T159, T15, T55, T69, T197, C3, T31, T88, T124, T134, T245, T165, C1, T290, T163, T291, T101, T238, T298, T74, T282, C2, T35, C3, T135, T185, T181, T8, T72, T166, T217, T3, T260, T156, T270, T7, T220, T84, T207, T170, C4, T20, T240, C1, T51, T210, T138, T112, T68, T5, C2, C4, T108, T258, T118)
Block 2: (T295, C4, T208, T152, T39, T91, T178, T219, T215, T52, T237, T82, T248, C1, T62, T133, T29, T53, T162, C3, T60, T66, T172, T198, T231, T271, T269, T183, T106, C2, T253, T80, T235, C4, T199, T257, C1, T232, T289, T125, T204, T78, T43, T119, T9, T97, T16, T115, C3, T102, T294, T192, T85, C2, T300, T143, T160, T223)
Block 3: (T107, T184, T280, T42, T200, T131, T17, T21, T46, T276, T100, T169, T93, T180, C1, T267, T283, C1, T4, T239, T275, T120, T262, T236, T233, T193, T70, T277, T56, T168, T110, T105, T287, T38, T171, T27, T18, C2, T274, T136, T265, C3, T293, T281, C2, T175, T59, T79, T128, C4, T158, T146, C4, T252, T161, C3, T225, T89)
Block 4: (C2, T41, T234, T14, T10, T86, T206, T145, T230, T249, T196, T114, T209, T90, T205, C2, C3, T13, C3, T244, T142, T48, T76, C1, T2, C4, T261, T12, T36, T23, T109, T113, T255, T213, T191, T202, T218, T58, C1, T285, T96, T67, T188, T45, T150, T137, T164, T259, T221, T273, T73, T144, T226, T155, T40, T194, C4, T167)
Block 5: (T266, T71, C2, T157, T179, T222, T123, C1, T104, T278, T111, T272, T22, T47, T148, T182, T212, T81, T195, T247, T216, T174, T25, C3, T130, T251, T203, T28, T228, T263, T297, C4, T246, T92, T64, T117, C1, T61, C4, T32, T19, T147, T288, T94, T99, T189, T34, T98, T243, C3, T57, T254, T126, T6, T122, C2, T242, T121)
Block 6: (T54, T139, T151, T77, T49, C4, C2, T201, T284, T129, T241, T227, C2, T65, T95, T264, T11, T211, T268, T296, T75, T154, T141, T33, T149, C3, T173, T176, T286, T224, T250, T37, C1, T256, T87, T177, C3, T44, T116, T103, C4, T187, C1, T190, T299, T26, T140, T50, T127, T30, T279, T229, T63, T292, T24, T83, T153, T186)
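Before going to the field, a generated listing like this can be sanity-checked. The sketch below assumes the text format shown above ("Block k: (label, label, ...)", with test entries labelled T and controls labelled C); the function names `parse_design` and `check_design` are hypothetical. It confirms that every test entry appears exactly once overall and every control appears `reps` times in every block, and it returns a list of any problems found.

```python
import re
from collections import Counter

def parse_design(text):
    """Parse lines like 'Block 1: (T132, T1, C3, ...)' into {block: [labels]}."""
    blocks = {}
    for number, body in re.findall(r"Block\s+(\d+):\s*\(([^)]*)\)", text):
        blocks[int(number)] = [label.strip() for label in body.split(",") if label.strip()]
    return blocks

def check_design(blocks, n_tests, n_checks, reps):
    """List problems in an augmented layout; an empty list means it passed.

    Expects each test entry T1..T<n_tests> exactly once overall and each
    control C1..C<n_checks> exactly `reps` times in every block.
    """
    problems = []
    tests = Counter(label for labels in blocks.values()
                    for label in labels if label.startswith("T"))
    expected = {f"T{i}" for i in range(1, n_tests + 1)}
    if set(tests) != expected:
        problems.append("test entries missing or unexpected")
    problems += [f"{t} appears {n} times" for t, n in tests.items() if n != 1]
    for block, labels in blocks.items():
        counts = Counter(label for label in labels if label.startswith("C"))
        for k in range(1, n_checks + 1):
            check = f"C{k}"
            if counts[check] != reps:
                problems.append(f"block {block}: {check} appears {counts[check]} times")
    return problems

example = """Block 1: (T2, C1, T1, C2, C1, C2)
Block 2: (C2, T3, C1, T4, C2, C1)"""
print(check_design(parse_design(example), n_tests=4, n_checks=2, reps=2))  # []
```

For the full listing, the call would use `n_tests=300, n_checks=4, reps=2`; any non-empty result points to a labelling or copying error.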
RCB Model – Augmented: $Y = \mu + \text{check} + \text{block} + \text{test entry} + \varepsilon$. Checks are fixed effects (the replicated checks are the source of the experimental error estimate); block, test entry, and error are random effects. Recall: fixed effects are used for parameter estimation (means and experimental error); random effects are sources of variability. Just as you can’t replicate a block, we are not replicating a test entry; these are random effects, summarized by variance estimates. Test entries are random because (1) they represent a random selection from the population, and (2) another time we might have a different sample. Select the appropriate model to account for the variation present in the data from the experiment.
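One way to see what this model says is to simulate data from it. The sketch below is an illustration under stated assumptions, not part of the original analysis: it folds μ and each check effect into one fixed mean per check, gives every test entry the same fixed mean plus an independent normal random effect, and draws block and error effects as independent normals. The numeric parameters are borrowed from the PROC MIXED output later in this talk purely as plausible values, and the function name `simulate_augmented` is hypothetical.

```python
import math
import random

def simulate_augmented(layout, check_means, entry_mean,
                       var_block, var_entry, var_error, seed=1):
    """Simulate one response per plot under
    Y = mu + check + block + test entry + error.

    layout: list of blocks, each a list of plot labels. Labels that are keys
    of check_means are checks (fixed means); every other label is an
    unreplicated test entry with fixed mean entry_mean plus a random effect.
    Block, test-entry, and error effects are independent normal draws.
    """
    rng = random.Random(seed)
    block_effects = [rng.gauss(0.0, math.sqrt(var_block)) for _ in layout]
    rows = []
    for block_no, (plots, block_eff) in enumerate(zip(layout, block_effects), start=1):
        for plot in plots:
            if plot in check_means:
                mean = check_means[plot]            # fixed check effect
                entry_eff = 0.0
            else:
                mean = entry_mean                   # fixed part shared by test entries
                entry_eff = rng.gauss(0.0, math.sqrt(var_entry))  # random entry effect
            error = rng.gauss(0.0, math.sqrt(var_error))
            rows.append((block_no, plot, mean + block_eff + entry_eff + error))
    return rows

# Illustrative parameters borrowed from the PROC MIXED output in this talk
checks = {"A": 7.2446, "B": 12.9712, "C": 10.3447, "D": 18.2209}
layout = [["T1", "A", "T2", "B", "C", "D"], ["D", "T3", "C", "A", "T4", "B"]]
for row in simulate_augmented(layout, checks, entry_mean=12.6521,
                              var_block=0.0646, var_entry=25.4391, var_error=2.8793):
    print(row)
```

Because each test entry occurs in only one plot, its random effect and the plot error cannot be separated for that entry; only the replicated checks let the model estimate the error variance.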
Data Structure – Summarize Observations
data augdata;
   input BLOCK ENTRY $ CHECK $ Response;
   datalines;
1 132 0 7.787375521
1 55 0 11.95042697
1 69 0 17.13925023
1 197 0 13.52741812
1 99_3 C 13.12996798
1 99_1 A 8.549609236
2 53 0 12.17290059
2 162 0 6.076716034
2 99_3 C 10.37331168
2 99_2 B 12.80038665
3 120 0 16.8998517
3 262 0 18.39342604
3 236 0 13.57836737
3 233 0 17.51976356
3 193 0 17.2673414
3 70 0 11.65930329
3 277 0 9.888766943
3 56 0 15.04252244
5 272 0 19.11184598
5 22 0 16.55621631
5 47 0 16.19072031
5 148 0 14.56102025
6 99_4 D 18.84718079
6 24 0 10.43401752
6 83 0 7.158976503
6 153 0 18.68505246
;
run;
We need dummy coding to estimate the fixed and random effects: test entries get CHECK = 0, and each check gets a dummy ENTRY code (99_1 = A, 99_2 = B, 99_3 = C, 99_4 = D).
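The dummy coding can be produced mechanically from a plot list. This sketch assumes that the IASRI control labels C1–C4 correspond to checks A–D and to the dummy ENTRY codes 99_1–99_4 used in the data set; that mapping and the names `CHECK_CODES` and `code_plot` are hypothetical. It writes one space-separated data line (BLOCK ENTRY CHECK Response) per plot.

```python
# Hypothetical mapping from IASRI control labels to the dummy ENTRY code
# and CHECK level used in the SAS data set (99_1 = A, ..., 99_4 = D).
CHECK_CODES = {"C1": ("99_1", "A"), "C2": ("99_2", "B"),
               "C3": ("99_3", "C"), "C4": ("99_4", "D")}

def code_plot(block, label, response):
    """Return one data line 'BLOCK ENTRY CHECK Response' for a plot."""
    if label in CHECK_CODES:
        entry, check = CHECK_CODES[label]          # check plot: dummy ENTRY code
    else:
        entry, check = label.lstrip("T"), "0"      # test entry: CHECK = 0
    return f"{block} {entry} {check} {response}"

print(code_plot(1, "T132", 7.787375521))  # 1 132 0 7.787375521
print(code_plot(1, "C3", 13.12996798))    # 1 99_3 C 13.12996798
```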
Analysis of Augmented RCB
proc mixed;
   class CHECK BLOCK ENTRY;
   model Response = CHECK / solution;   /* checks: fixed effects */
   random BLOCK ENTRY / solution;       /* blocks and test entries: random; SOLUTION prints their BLUPs */
   lsmeans CHECK;
run;
Remember: CHECK has the label 0 for all the test entries in the data.
Covariance Parameters
Covariance Parameter Estimates
Cov Parm    Estimate   Meaning
BLOCK       0.0646     variance component of block
ENTRY       25.4391    variance component of entry
Residual    2.8793     error variance
For this response, the variance estimate is greatest for ENTRY. This is what you want: the test entries show the greatest variability.
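To put "greatest with the Entry" in proportion, the estimates above can be expressed as shares of their sum. This is a simple descriptive calculation added for illustration (the function name is hypothetical), not an output of PROC MIXED.

```python
def variance_shares(components):
    """Fraction of the total estimated variance due to each component."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}

# Estimates copied from the Covariance Parameter Estimates table above
estimates = {"BLOCK": 0.0646, "ENTRY": 25.4391, "Residual": 2.8793}
for name, share in variance_shares(estimates).items():
    print(f"{name:<8} {share:.3f}")
# BLOCK    0.002
# ENTRY    0.896
# Residual 0.101
```

About 90% of the estimated variation is among test entries, while blocks contribute almost nothing.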
LSMEANS – Checks
Least Squares Means
Effect   CHECK   Estimate   Standard Error
CHECK    0       12.6521    0.3243
CHECK    A       7.2446     5.0685
CHECK    B       12.9712    5.0685
CHECK    C       10.3447    5.0685
CHECK    D       18.2209    5.0685
Can you see where the entries lie relative to the checks? The mean of the test entries (CHECK 0, 12.6521) falls within the range of the check means (7.2446 to 18.2209). These checks are more appropriate.
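The SOLUTION option in the MODEL statement prints estimates of the fixed-effect parameters.
Solution for Fixed Effects
Effect      CHECK   Estimate   Standard Error
Intercept           18.2209    5.0685
CHECK       0       -5.5689    5.0768
CHECK       A       -10.9763   7.1665
CHECK       B       -5.2497    7.1665
CHECK       C       -7.8762    7.1665
CHECK       D       0          .
The intercept is not the grand mean: SAS sets the effect of the last CHECK level (D) to 0, so the intercept (18.2209) is the estimate for check D, and each CHECK estimate is a difference from D (e.g., 18.2209 - 5.5689 = 12.6520, the CHECK 0 LS mean).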
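The link between the fixed-effect solution and the LS means is plain addition: under this parameterization each level's mean is the intercept (check D) plus that level's effect. The sketch below reproduces the LS means from the solution table; the results agree with the LSMEANS output to within rounding (12.6520 here versus 12.6521 printed).

```python
# Fixed-effect solution copied from the table above. SAS sets the last
# CHECK level (D) to 0, so the intercept is the estimate for check D.
intercept = 18.2209
effects = {"0": -5.5689, "A": -10.9763, "B": -5.2497, "C": -7.8762, "D": 0.0}

# LS mean of each CHECK level = intercept + that level's effect
ls_means = {level: round(intercept + effect, 4) for level, effect in effects.items()}
print(ls_means)
```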
Estimated BLUPs (Best Linear Unbiased Predictors). For random effects we estimate the variance; BLUPs estimate the “realized values of the random variables” (here, the test entries). In augmented designs, request them with the SOLUTION option in the RANDOM statement: random BLOCK ENTRY / solution;
Solution for Random Effects
Effect   ENTRY   BLOCK   Estimate   Std Err Pred   DF   t Value   Pr > |t|
BLOCK            1       -0.1491    0.2289         39   -0.65     0.5186
BLOCK            2       0.02255    0.2289         39   0.10      0.9220
BLOCK            3       -0.01523   0.2289         39   -0.07     0.9473
ENTRY    1               0.4749     1.6420         39   0.29      0.7740
ENTRY    10              3.2731     1.6420         39   1.99      0.0533
ENTRY    100             -2.4803    1.6420         39   -1.51     0.1390
ENTRY    101             -2.9505    1.6420         39   -1.80     0.0801
ENTRY    102             0.9282     1.6420         39   0.57      0.5751
ENTRY    103             -5.2855    1.6420         39   -3.22     0.0026
ENTRY    104             11.2214    1.6420         39   6.83      <.0001
ENTRY    105             -1.4715    1.6420         39   -0.90     0.3757
ENTRY    106             -2.8277    1.6420         39   -1.72     0.0930
ENTRY    107             4.8133     1.6420         39   2.93      0.0056
ENTRY    108             -4.5646    1.6420         39   -2.78     0.0083
ENTRY    109             -4.5226    1.6420         39   -2.75     0.0089
ENTRY    11              9.7842     1.6420         39   5.96      <.0001
ENTRY    110             18.3420    1.6420         39   11.17     <.0001
ENTRY    111             1.6596     1.6420         39   1.01      0.3184
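“Realized values”: rank the entry estimates from highest to lowest, then add the fixed part of the model to get predicted adjusted mean values. Because the intercept (18.2209) is the estimate for the reference check D, the fixed part for a test entry is Intercept + CHECK 0 effect = 18.2209 - 5.5689 = 12.6520 (the CHECK 0 LS mean, 12.6521). The listing that follows adds the intercept instead; that shifts every value by the same 5.5689, so the ranking is unchanged.
/* Add  ods output SolutionR=blups;  to the PROC MIXED step to save the BLUPs */
data entry_blups;
   set blups;
   where Effect = 'ENTRY';              /* keep test-entry BLUPs only */
   pred_adjmean = Estimate + 12.6521;   /* CHECK 0 LS mean + BLUP */
run;
proc sort data=entry_blups;
   by descending Estimate;
run;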
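The same ranking step in Python, as a minimal sketch using only the excerpt of ENTRY BLUPs from the Solution for Random Effects table (so it ranks 15 entries, not all 300). The function name `rank_entries` and the `fixed_part` parameter are hypothetical; because adding one constant to every BLUP never changes their order, the ranking is identical whether `fixed_part` is 12.6521 or 18.2209.

```python
# ENTRY BLUPs copied from the excerpt of the Solution for Random Effects table
blups = {"1": 0.4749, "10": 3.2731, "100": -2.4803, "101": -2.9505,
         "102": 0.9282, "103": -5.2855, "104": 11.2214, "105": -1.4715,
         "106": -2.8277, "107": 4.8133, "108": -4.5646, "109": -4.5226,
         "11": 9.7842, "110": 18.3420, "111": 1.6596}

def rank_entries(blups, fixed_part):
    """Return (entry, BLUP, predicted mean) tuples, highest BLUP first."""
    ranked = sorted(blups.items(), key=lambda item: item[1], reverse=True)
    return [(entry, blup, round(blup + fixed_part, 4)) for entry, blup in ranked]

for entry, blup, predicted in rank_entries(blups, fixed_part=12.6521)[:3]:
    print(entry, blup, predicted)
```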
Predicted Adjusted Mean Values
Obs   Effect   BLOCK   ENTRY   Estimate   StdErrPred   DF   tValue   Probt    pred_adjmean
1     ENTRY    _       97      26.4030    1.6420       39   16.08    <.0001   44.6239
2     ENTRY    _       250     23.1752    1.6420       39   14.11    <.0001   41.3961
3     ENTRY    _       110     18.3420    1.6420       39   11.17    <.0001   36.5629
4     ENTRY    _       203     18.3088    1.6420       39   11.15    <.0001   36.5297
5     ENTRY    _       16      16.0034    1.6420       39   9.75     <.0001   34.2243
6     ENTRY    _       19      13.6315    1.6420       39   8.30     <.0001   31.8524
7     ENTRY    _       104     11.2214    1.6420       39   6.83     <.0001   29.4423
8     ENTRY    _       11      9.7842     1.6420       39   5.96     <.0001   28.0051
9     ENTRY    _       98      9.6497     1.6420       39   5.88     <.0001   27.8706
10    ENTRY    _       115     9.3452     1.6420       39   5.69     <.0001   27.5661
11    ENTRY    _       64      9.1083     1.6420       39   5.55     <.0001   27.3292
12    ENTRY    _       288     9.0960     1.6420       39   5.54     <.0001   27.3169
13    ENTRY    _       57      7.8487     1.6420       39   4.78     <.0001   26.0696
14    ENTRY    _       275     7.8267     1.6420       39   4.77     <.0001   26.0476
15    ENTRY    _       147     7.8158     1.6420       39   4.76     <.0001   26.0367
16    ENTRY    _       139     7.8106     1.6420       39   4.76     <.0001   26.0315
17    ENTRY    _       285     7.6711     1.6420       39   4.67     <.0001   25.8920
18    ENTRY    _       201     7.5526     1.6420       39   4.60     <.0001   25.7735
19    ENTRY    _       114     7.1821     1.6420       39   4.37     <.0001   25.4030
20    ENTRY    _       286     7.1567     1.6420       39   4.36     <.0001   25.3776
How to ‘look’ at the data: if our focus is to find something ‘better’, i.e., higher up the list, there is no real need for multiple comparisons. The ranking of the predicted adjusted means lets you see which treatments are worth pushing forward for further research. You need to be more willing to accept Type I or Type II errors, because they will probably happen. These numbers are predictions from a linear model anchored by the checks. The ranks and the ordering are the information that helps you move forward: they show the potential of each test entry. Because the test-entry effect is random, the conclusions drawn are relative to the fixed effects (the checks) and pertain to the levels at hand: narrow-space inference.
Augmented Designs – Recap. Screening: discovery driven. Select 4–6 meaningful checks. Select an appropriate experimental design and increase rows and columns to include the unreplicated test entries in each block. RCB is the simplest case; split plots and factorials are also possibilities (you can look at interactions and autocorrelations). Use mixed-model analyses. Then Phase II begins: select entries for a pilot study to obtain a better estimate of their true response, and generate hypotheses.
Augmented Designs – References. To get started: Google!
Federer et al. (2001). Agron. J. 93:389–395.
Federer, W.T. (2005). Agron. J. 97:578–586.
Burgueño and Crossa (2000). SAS Macro for Analysing Unreplicated Designs. CIMMYT CRIL (Crop Research Informatics Laboratory), http://www.cimmyt.org
IASRI, Augmented Design tool. http://www.iasri.res.in/design
IRRISTAT.