Design and Analysis of Augmented Designs in Screening Trials


Kathleen Yeater, USDA-ARS-SPA
3rd Curators Workshop, February 3, 2010

5 Basic Steps of an Experiment
1. Research planning
2. Experimental design
3. Summarize observations
4. Analysis (statistical inference)
5. Document / present study results
Why start with this? This is how the talk is organized.

Research Planning - What Is the Question?
Is the focus on development? Are you trying to find something better? Is it discovery research, not necessarily testing a specific hypothesis? This is why the word "screening" is in the title.

Remember the Basics, the 3 R's
- Replication: valid estimation of error variance.
- Reduction of variation among plots: controls (reduces) error variance. Use blocking to control the heterogeneity present in the experiment, and block at the scale of variability.
- Randomization: unbiased estimates of means and variances.
Use minimum blocking: fewer blocks, blocking at the scale of variability. Together these principles support cause-and-effect statements.

What if replication is impractical, prohibitively expensive, or impossible?
- Not enough material (seed, -icides)
- Not enough space
- Not enough time
- Too many entries

What leads to an unreplicated design in field trials?
The principles of experimental design (the 3 R's) exist so that we can make cause-and-effect statements. In screening, by contrast, we are making a cut based on good/bad performance: a first-cut diagnosis, as in early generation variety testing. What is the research question again? Design for the experiment under consideration; DO NOT experiment for the design.

Augmented Design
Introduced in various publications by W.T. Federer and developed for plant breeding research. Genotypes, yield, disease, insecticides, herbicides, and fertilizers are all excellent subject variables in a screening trial. With screening designs the focus is on development: trying to find something better. It is discovery research. We identified our research question in the research planning phase; now we move on to step two, identifying an appropriate design to implement the experiment.

Augmented Design as an Experimental Design
Uses experimental design principles for the arrangement of the checks. New treatments (n) are not replicated; the checks are replicated and serve as points of reference. Usually 4 to 6 checks are wanted, and n can be large.

Augmented Design - Implementation
I – Select any experimental design for the check(s).
II – Enlarge the blocks, or increase the number of rows and/or columns, to accommodate the new test entries (treatments, n).
III – Randomly distribute the new test entries among the blocks/rows/columns.
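As a rough illustration of steps I to III, the Python sketch below lays out an augmented design assuming the check design chosen in step I is an RCB (any design could be used instead). The function `augmented_rcb_layout` and its parameters are hypothetical, not from any package: every block receives each check `check_reps` times, the blocks are enlarged to hold the test entries, and both the allocation of test entries to blocks and the plot order within each block are randomized.

```python
import random


def augmented_rcb_layout(checks, test_entries, n_blocks, check_reps=1, seed=None):
    """Hypothetical sketch of steps I-III for an augmented RCB.

    Step I:   the checks form an RCB (each check in every block, check_reps times).
    Step II:  each block is enlarged to hold its share of the test entries.
    Step III: test entries are randomly allocated to blocks, and plot order
              within each block is randomized.
    """
    rng = random.Random(seed)
    entries = list(test_entries)
    rng.shuffle(entries)                       # random allocation to blocks
    base, extra = divmod(len(entries), n_blocks)
    layout, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)  # spread entries as evenly as possible
        plots = entries[start:start + size]
        plots += [c for c in checks for _ in range(check_reps)]
        start += size
        rng.shuffle(plots)                     # randomize positions within the block
        layout.append(plots)
    return layout
```

For the trial described later in the talk, `augmented_rcb_layout(["A", "B", "C", "D"], test_ids, n_blocks=6, check_reps=2)` with 300 test IDs would give six blocks of 58 plots.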

Design Set-up - RCB
[Figure: treatments A, B, C, D arranged in blocks across a moisture gradient.]
The block effect now removes the moisture effect, giving fair comparisons among treatments.

Design Set-up – Augmented RCB
[Figure: augmented RCB layout across a moisture gradient; the blocks contain checks (A to D) plus unreplicated test entries numbered 1 to 24.]
Now we've expanded each block to include the unreplicated treatments.

Advantages of Augmented Design
- More than one check is included (4 to 6 is optimal).
- Allows estimation of experimental error and is efficient.
- Less physical space is needed.

How to Select a Check
Checks are units of experimentation (varieties/genotypes/cultivars/entries) with known ranges of the measurement characteristics you want to evaluate. What are good checks for your objective? For example: yield, CHO content, seed characteristics; a quantitative measurement that holds constant.

Variability of Checks
The test entries' (n) measurements will range from 5.0 to 40.0, with an overall test-entry mean of about 12.6. What do you think about these checks? Did I do a good job of selecting appropriate checks?

Consistent checks that cover the range of our test data. The test entries' (n) measurements will range from 5.0 to 40.0, with an overall test-entry mean of about 12.6.

Augmented RCBD
Goal: screen 300 new entries for response X. This is an early generation screening trial.
- 4 additional genotypes are CHECK entries (A, B, C, D).
- 6 blocks (field plots, time placement for a lab assay, location in a growth chamber).
- Randomize the location of A, B, C, D within each block (replicate the check genotypes within each block to capture spatial variation).
- 300/6 = 50 new entries are randomly selected and placed within each block: give each genotype a random number from 1 to 300, then numbers 1-50 go in Block 1, 51-100 in Block 2, and so on.
Any design can be selected here: incomplete blocks, split plots and split blocks are easily applied. The key is that any experimental design can be augmented to accommodate a set of new treatments that are replicated only once.
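The random-number allocation rule on this slide can be sketched in a few lines of Python. The function below is hypothetical (not from a specific package): it draws a random permutation of 1 to N so that each genotype gets a distinct number, then sends numbers 1-50 to Block 1, 51-100 to Block 2, and so on. It assumes the number of genotypes divides evenly by the number of blocks, as 300/6 does.

```python
import random


def allocate_by_random_number(genotypes, n_blocks, seed=None):
    """Give each genotype a distinct random number 1..N, then fill blocks with
    consecutive runs of N / n_blocks numbers (1-50 -> Block 1, 51-100 -> Block 2, ...)."""
    n = len(genotypes)
    if n % n_blocks:
        raise ValueError("number of genotypes must divide evenly among blocks")
    per_block = n // n_blocks
    numbers = list(range(1, n + 1))
    random.Random(seed).shuffle(numbers)            # random number for each genotype
    blocks = {b: [] for b in range(1, n_blocks + 1)}
    for genotype, number in zip(genotypes, numbers):
        blocks[(number - 1) // per_block + 1].append(genotype)
    return blocks
```

Within each block the 50 entries and the check plots would still need their positions randomized.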

How many repeats of each check give an optimal design?
Design Resources Server, Indian Agricultural Statistics Research Institute (ICAR), New Delhi, India: www.iasri.res.in/design (Online Design Generation-I: Augmented Design). As far as my research tells me, the reference by Parsad et al. has not been peer-reviewed.

Construction of Augmented Designs via the IASRI Site
The form generates an augmented design from the number of test treatments, control treatments, blocks, etc.:
- Number of test treatments (w)
- Number of control treatments (u)
- Number of blocks (b)
- Number of replications of control
Comments and problems go to V.K. Gupta / Rajender Parsad / A. Dhandapani. Source: <http://www.iasri.res.in/design/Augmented%20Designs/home.htm>

Cells Filled Out, Enter Block Sizes
- Number of test treatments (w): 300
- Number of control treatments (u): 4
- Number of blocks (b): 6
- Number of replications of control: 2
Optimum total number of experimental units required: 348. Block sizes: 58 for each of Blocks 1 to 6. Total number of experimental units = 348; assigned so far = 348; remaining = 0.
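The optimum total shown by the form can be checked by hand: each of the u checks appears r times in each of the b blocks, so N = w + u·r·b = 300 + 4·2·6 = 348 plots, or 58 per block. Only the check plots are replicated, so they alone supply the error degrees of freedom; treating them as an RCB gives u·r·b - u - (b - 1) = 48 - 4 - 5 = 39, which agrees with the DF of 39 in the PROC MIXED output later in the talk. The helper below is a hypothetical sketch of that arithmetic.

```python
def augmented_rcb_counts(w, u, b, r):
    """w test treatments, u checks, b blocks, r copies of each check per block.

    Returns (total plots, plots per block, error df from the replicated checks)."""
    check_plots = u * r * b
    total = w + check_plots
    if total % b:
        raise ValueError("plots cannot be split into equal blocks")
    # Check-only RCB: (check_plots - 1) - (u - 1) - (b - 1)
    error_df = check_plots - u - (b - 1)
    return total, total // b, error_df


print(augmented_rcb_counts(w=300, u=4, b=6, r=2))
```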

Generated Design
Block 1: (T132, T1, T214, T159, T15, T55, T69, T197, C3, T31, T88, T124, T134, T245, T165, C1, T290, T163, T291, T101, T238, T298, T74, T282, C2, T35, C3, T135, T185, T181, T8, T72, T166, T217, T3, T260, T156, T270, T7, T220, T84, T207, T170, C4, T20, T240, C1, T51, T210, T138, T112, T68, T5, C2, C4, T108, T258, T118)
Block 2: (T295, C4, T208, T152, T39, T91, T178, T219, T215, T52, T237, T82, T248, C1, T62, T133, T29, T53, T162, C3, T60, T66, T172, T198, T231, T271, T269, T183, T106, C2, T253, T80, T235, C4, T199, T257, C1, T232, T289, T125, T204, T78, T43, T119, T9, T97, T16, T115, C3, T102, T294, T192, T85, C2, T300, T143, T160, T223)
Block 3: (T107, T184, T280, T42, T200, T131, T17, T21, T46, T276, T100, T169, T93, T180, C1, T267, T283, C1, T4, T239, T275, T120, T262, T236, T233, T193, T70, T277, T56, T168, T110, T105, T287, T38, T171, T27, T18, C2, T274, T136, T265, C3, T293, T281, C2, T175, T59, T79, T128, C4, T158, T146, C4, T252, T161, C3, T225, T89)
Block 4: (C2, T41, T234, T14, T10, T86, T206, T145, T230, T249, T196, T114, T209, T90, T205, C2, C3, T13, C3, T244, T142, T48, T76, C1, T2, C4, T261, T12, T36, T23, T109, T113, T255, T213, T191, T202, T218, T58, C1, T285, T96, T67, T188, T45, T150, T137, T164, T259, T221, T273, T73, T144, T226, T155, T40, T194, C4, T167)
Block 5: (T266, T71, C2, T157, T179, T222, T123, C1, T104, T278, T111, T272, T22, T47, T148, T182, T212, T81, T195, T247, T216, T174, T25, C3, T130, T251, T203, T28, T228, T263, T297, C4, T246, T92, T64, T117, C1, T61, C4, T32, T19, T147, T288, T94, T99, T189, T34, T98, T243, C3, T57, T254, T126, T6, T122, C2, T242, T121)
Block 6: (T54, T139, T151, T77, T49, C4, C2, T201, T284, T129, T241, T227, C2, T65, T95, T264, T11, T211, T268, T296, T75, T154, T141, T33, T149, C3, T173, T176, T286, T224, T250, T37, C1, T256, T87, T177, C3, T44, T116, T103, C4, T187, C1, T190, T299, T26, T140, T50, T127, T30, T279, T229, T63, T292, T24, T83, T153, T186)
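Before using a generated plan it is worth confirming that it has the intended structure. The hypothetical Python checker below parses lines of the form `Block k: (T132, C3, ...)`, as printed by the IASRI tool, and verifies that each check appears the requested number of times in every block and that each test treatment T1 to Tw appears exactly once overall. It assumes the `T`/`C` prefixes used in this output.

```python
import re
from collections import Counter


def check_generated_design(text, n_tests, checks, reps_per_block):
    """Return (ok, message) for an augmented design listed as 'Block k: (...)'."""
    blocks = re.findall(r"Block\s+(\d+):\s*\(([^)]*)\)", text)
    test_counts = Counter()
    for number, body in blocks:
        plots = Counter(p.strip() for p in body.split(",") if p.strip())
        for check in checks:
            if plots[check] != reps_per_block:
                return False, f"Block {number}: {check} appears {plots[check]} times"
        # Accumulate how often each test treatment has been used so far.
        test_counts.update({p: n for p, n in plots.items() if p.startswith("T")})
    expected = {f"T{i}" for i in range(1, n_tests + 1)}
    if set(test_counts) != expected or any(n != 1 for n in test_counts.values()):
        return False, "test treatments are not each used exactly once"
    return True, f"{len(blocks)} blocks checked"
```

For the plan above, the call would be `check_generated_design(plan_text, 300, ["C1", "C2", "C3", "C4"], 2)`.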

RCB Model - Augmented
Y = μ + check + block + test entry + error
- Checks are fixed effects; their replication provides the estimate of experimental error.
- Block, test entry, and error are random effects.
Recall: fixed effects are used for parameter estimation (means and experimental error); random effects are sources of variability. Just as you can't replicate a block, we are not replicating a test entry; these are random effects, estimated as variances. Test entries are random because (1) they represent a random selection from the population, and (2) another time we might have a different sample. Select the appropriate model to account for the variation present in the data from the experiment.
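In symbols, one way to write the augmented RCB model above is shown below; the notation is ours, not the slide's. Level i = 0 of the fixed check factor τ is shared by all test entries and i = A, ..., D are the checks; β_j is the block effect and g_k the entry effect. Normality and mutual independence of the random terms are the usual mixed-model assumptions rather than something stated on the slide. The three variances correspond to the BLOCK, ENTRY and Residual covariance parameters that PROC MIXED reports.

```latex
\begin{aligned}
y_{ijk} &= \mu + \tau_i + \beta_j + g_k + \varepsilon_{ijk}, \qquad i \in \{0, A, B, C, D\},\\
\beta_j &\sim N(0, \sigma^2_b), \quad g_k \sim N(0, \sigma^2_g), \quad \varepsilon_{ijk} \sim N(0, \sigma^2_e).
\end{aligned}
```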

Data Structure - Summarize Observations
Genotypes are referred to as entries. We need dummy coding to estimate the fixed and random effects: CHECK is 0 for every test entry and A to D for the checks, whose ENTRY codes are 99_1 to 99_4. An excerpt of the data:

```sas
data data_set;
  input BLOCK ENTRY $ CHECK $ Response;
  datalines;
1 132 0 7.787375521
1 55 0 11.95042697
1 69 0 17.13925023
1 197 0 13.52741812
1 99_3 C 13.12996798
1 99_1 A 8.549609236
2 53 0 12.17290059
2 162 0 6.076716034
2 99_3 C 10.37331168
2 99_2 B 12.80038665
3 120 0 16.8998517
3 262 0 18.39342604
3 236 0 13.57836737
3 233 0 17.51976356
3 193 0 17.2673414
3 70 0 11.65930329
3 277 0 9.888766943
3 56 0 15.04252244
5 272 0 19.11184598
5 22 0 16.55621631
5 47 0 16.19072031
5 148 0 14.56102025
6 99_4 D 18.84718079
6 24 0 10.43401752
6 83 0 7.158976503
6 153 0 18.68505246
;
run;
```

(The block number for the 99_2 B record was missing in the source; block 2 is assumed from its position. Remaining records are omitted.)
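The coding rule behind this data step can be written out explicitly. In the Python sketch below (function names are hypothetical), a test entry keeps its own number as ENTRY and gets CHECK = 0, while a check genotype gets its letter as CHECK and the ENTRY code 99_1 to 99_4 seen in the data. `check_dummies` shows the 0/1 indicator columns that a CLASS variable such as CHECK expands into in the model's design matrix.

```python
# ENTRY codes used for the check genotypes in the data (99_1 = A, ..., 99_4 = D).
CHECK_ENTRY_CODES = {"A": "99_1", "B": "99_2", "C": "99_3", "D": "99_4"}
CHECK_LEVELS = ("0", "A", "B", "C", "D")   # CHECK = "0" marks every test entry


def code_plot(block, genotype, response):
    """Return one (BLOCK, ENTRY, CHECK, Response) record as read by the data step."""
    genotype = str(genotype)
    if genotype in CHECK_ENTRY_CODES:                  # a replicated check
        return block, CHECK_ENTRY_CODES[genotype], genotype, response
    return block, genotype, "0", response              # an unreplicated test entry


def check_dummies(check):
    """0/1 indicator columns, one per CHECK level, for a single plot."""
    return [1 if check == level else 0 for level in CHECK_LEVELS]
```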

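Analysis of Augmented RCB

```sas
proc mixed data=data_set;
  class CHECK BLOCK ENTRY;
  model Response = CHECK / solution;   /* checks fixed; SOLUTION prints fixed-effect estimates */
  random BLOCK ENTRY / solution;       /* blocks and entries random; SOLUTION prints BLUPs */
  lsmeans CHECK;
  ods output SolutionR=blups;          /* save the random-effect solutions for ranking */
run;
```

Remember: CHECK is coded 0 for all test entries in the data.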

Covariance Parameters

| Cov Parm | Estimate | Interpretation |
|---|---|---|
| BLOCK | 0.0646 | variance component of block |
| ENTRY | 25.4391 | variance component of entry |
| Residual | 2.8793 | error variance |

For this response the variance estimate is greatest for ENTRY. This is what you want: the greatest variability lies among the test entries, not among blocks or plots.
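To put "greatest" in proportion, each variance component can be expressed as a share of the total. This is simple arithmetic on the estimates in the table, sketched in Python: ENTRY accounts for roughly 90% of the total, the residual for about 10%, and BLOCK for well under 1%.

```python
def variance_shares(components):
    """Express each variance component as a fraction of their sum."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Covariance parameter estimates from the PROC MIXED output.
estimates = {"BLOCK": 0.0646, "ENTRY": 25.4391, "Residual": 2.8793}
for name, share in variance_shares(estimates).items():
    print(f"{name:<8} {share:.1%}")
```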

LSMEANS - Checks

| CHECK | Estimate | Standard Error |
|---|---|---|
| 0 | 12.6521 | 0.3243 |
| A | 7.2446 | 5.0685 |
| B | 12.9712 | 5.0685 |
| C | 10.3447 | 5.0685 |
| D | 18.2209 | 5.0685 |

Note where the test entries lie relative to the checks: their mean (CHECK 0, 12.65) falls between check A (7.24) and check D (18.22), so these checks are appropriate reference points.

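The SOLUTION option in the MODEL statement presents estimates of the fixed-effect parameters.

Solution for Fixed Effects

| Effect | CHECK | Estimate | Standard Error |
|---|---|---|---|
| Intercept | | 18.2209 | 5.0685 |
| CHECK | 0 | -5.5689 | 5.0768 |
| CHECK | A | -10.9763 | 7.1665 |
| CHECK | B | -5.2497 | 7.1665 |
| CHECK | C | -7.8762 | 7.1665 |
| CHECK | D | 0 | . |

SAS sets the last CHECK level (D) to zero, so the intercept (18.2209) is the estimate for check D, not the grand mean; every other CHECK estimate is a difference from D.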
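Because D is the zero reference level, each LSMEAN in this one-factor fixed part is just Intercept + CHECK estimate. The Python sketch below rebuilds the LSMEANS from the SOLUTION output: it reproduces A, B, C and D exactly, and CHECK 0 to within rounding (12.6520 vs. 12.6521).

```python
# Fixed-effect solution from the SAS output (CHECK D is the zero reference level).
SOLUTION = {"Intercept": 18.2209, "0": -5.5689, "A": -10.9763,
            "B": -5.2497, "C": -7.8762, "D": 0.0}


def lsmeans_from_solution(solution):
    """LSMEAN for each CHECK level = Intercept + that level's estimate."""
    intercept = solution["Intercept"]
    return {level: intercept + est for level, est in solution.items() if level != "Intercept"}


for level, mean in lsmeans_from_solution(SOLUTION).items():
    print(f"CHECK {level}: {mean:.4f}")
```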

Estimated BLUPs (Best Linear Unbiased Predictors)
- Random effects: estimate the variance.
- Also estimate the "realized values of random variables" (here, the test entries).
- In augmented designs, use the SOLUTION option in the RANDOM statement: random BLOCK ENTRY / solution;
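What the SOLUTION option returns can be illustrated with Henderson's mixed-model equations, the textbook route to BLUPs. The pure-Python sketch below is a simplified stand-in, not PROC MIXED's algorithm: it treats the variance components as known (for example, plugged in from the REML estimates), assumes a diagonal G and R = σ²_e I, and needs a full-rank fixed-effect design X, whereas SAS also estimates the variances and uses generalized inverses. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def henderson_blup(X, Z, y, z_var, resid_var):
    """Fixed-effect estimates and BLUPs from Henderson's mixed-model equations.

    X, Z: design matrices as lists of rows; z_var[k]: variance component of
    random column k (G diagonal); resid_var: residual variance (R = resid_var * I).
    X must have full column rank; variance components are treated as known."""
    p, q = len(X[0]), len(Z[0])
    W = [list(xr) + list(zr) for xr, zr in zip(X, Z)]       # rows of [X Z]
    lhs = [[sum(w[i] * w[j] for w in W) for j in range(p + q)] for i in range(p + q)]
    for k in range(q):
        lhs[p + k][p + k] += resid_var / z_var[k]           # Z'Z + sigma_e^2 G^-1
    rhs = [sum(w[i] * yi for w, yi in zip(W, y)) for i in range(p + q)]
    solution = solve(lhs, rhs)
    return solution[:p], solution[p:]
```

As a small check, with three entries observed twice each (y = 1, 3 / 4, 6 / 7, 9) and both variances equal to 1, the entry means 2, 5, 8 are shrunk by a factor of 2/3 toward the overall mean 5, giving BLUPs of -2, 0 and 2. This shrinkage toward the mean is what distinguishes a BLUP from a raw entry mean.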

Solution for Random Effects

| Effect | ENTRY | BLOCK | Estimate | Std Err Pred | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|---|
| BLOCK | | 1 | -0.1491 | 0.2289 | 39 | -0.65 | 0.5186 |
| BLOCK | | 2 | 0.02255 | 0.2289 | 39 | 0.10 | 0.9220 |
| BLOCK | | 3 | -0.01523 | 0.2289 | 39 | -0.07 | 0.9473 |
| ENTRY | 1 | | 0.4749 | 1.6420 | 39 | 0.29 | 0.7740 |
| ENTRY | 10 | | 3.2731 | 1.6420 | 39 | 1.99 | 0.0533 |
| ENTRY | 100 | | -2.4803 | 1.6420 | 39 | -1.51 | 0.1390 |
| ENTRY | 101 | | -2.9505 | 1.6420 | 39 | -1.80 | 0.0801 |
| ENTRY | 102 | | 0.9282 | 1.6420 | 39 | 0.57 | 0.5751 |
| ENTRY | 103 | | -5.2855 | 1.6420 | 39 | -3.22 | 0.0026 |
| ENTRY | 104 | | 11.2214 | 1.6420 | 39 | 6.83 | <.0001 |
| ENTRY | 105 | | -1.4715 | 1.6420 | 39 | -0.90 | 0.3757 |
| ENTRY | 106 | | -2.8277 | 1.6420 | 39 | -1.72 | 0.0930 |
| ENTRY | 107 | | 4.8133 | 1.6420 | 39 | 2.93 | 0.0056 |
| ENTRY | 108 | | -4.5646 | 1.6420 | 39 | -2.78 | 0.0083 |
| ENTRY | 109 | | -4.5226 | 1.6420 | 39 | -2.75 | 0.0089 |
| ENTRY | 11 | | 9.7842 | 1.6420 | 39 | 5.96 | <.0001 |
| ENTRY | 110 | | 18.3420 | 1.6420 | 39 | 11.17 | <.0001 |
| ENTRY | 111 | | 1.6596 | 1.6420 | 39 | 1.01 | 0.3184 |

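"Realized Values"
Rearrange the entry estimates from highest to lowest, then add the fixed part of the model to each estimate to calculate predicted adjusted means. For a test entry (CHECK = 0) the fixed part is Intercept + CHECK 0 estimate = 18.2209 + (-5.5689) = 12.6520, which matches the CHECK 0 LSMEAN. The table that follows was produced by adding the Intercept (18.2209) alone, which shifts every pred_adjmean up by 5.5689 but leaves the ranking unchanged.

```sas
/* Assumes the random-effect solutions were saved in PROC MIXED, e.g. with
   ods output SolutionR=blups; */
proc sort data=blups;
  by descending Estimate;
run;

data blups_adj;
  set blups;
  where Effect = 'ENTRY';               /* keep entry BLUPs, drop block BLUPs */
  pred_adjmean = Estimate + 12.6520;    /* Intercept + CHECK 0 estimate */
run;
```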
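The sort-and-add step can also be sketched outside SAS. The hypothetical Python function below ranks entry BLUPs from highest to lowest and adds the fixed part for a test entry (Intercept + CHECK 0 estimate, taken from the fixed-effects solution). The few BLUPs in the usage lines are copied from the SAS output on these slides. Because adding a constant never changes the order, the ranking matches the table whichever constant is used; only the predicted means differ.

```python
def predicted_adjusted_means(blups, intercept, check0_effect):
    """Rank entries by BLUP (descending) and add the fixed part for a test entry."""
    fixed_part = intercept + check0_effect           # 18.2209 + (-5.5689) = 12.6520
    ranked = sorted(blups.items(), key=lambda item: item[1], reverse=True)
    return [(entry, blup, fixed_part + blup) for entry, blup in ranked]


# A few entry BLUPs copied from the SAS random-effects output.
blups = {"1": 0.4749, "97": 26.4030, "103": -5.2855,
         "104": 11.2214, "110": 18.3420, "250": 23.1752}
for entry, blup, pred in predicted_adjusted_means(blups, 18.2209, -5.5689):
    print(f"ENTRY {entry:>4}  BLUP {blup:8.4f}  pred_adjmean {pred:8.4f}")
```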

Predicted Adjusted Mean Values

| Obs | Effect | BLOCK | ENTRY | Estimate | StdErr Pred | DF | tValue | Probt | pred_adjmean |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ENTRY | _ | 97 | 26.4030 | 1.6420 | 39 | 16.08 | <.0001 | 44.6239 |
| 2 | ENTRY | _ | 250 | 23.1752 | 1.6420 | 39 | 14.11 | <.0001 | 41.3961 |
| 3 | ENTRY | _ | 110 | 18.3420 | 1.6420 | 39 | 11.17 | <.0001 | 36.5629 |
| 4 | ENTRY | _ | 203 | 18.3088 | 1.6420 | 39 | 11.15 | <.0001 | 36.5297 |
| 5 | ENTRY | _ | 16 | 16.0034 | 1.6420 | 39 | 9.75 | <.0001 | 34.2243 |
| 6 | ENTRY | _ | 19 | 13.6315 | 1.6420 | 39 | 8.30 | <.0001 | 31.8524 |
| 7 | ENTRY | _ | 104 | 11.2214 | 1.6420 | 39 | 6.83 | <.0001 | 29.4423 |
| 8 | ENTRY | _ | 11 | 9.7842 | 1.6420 | 39 | 5.96 | <.0001 | 28.0051 |
| 9 | ENTRY | _ | 98 | 9.6497 | 1.6420 | 39 | 5.88 | <.0001 | 27.8706 |
| 10 | ENTRY | _ | 115 | 9.3452 | 1.6420 | 39 | 5.69 | <.0001 | 27.5661 |
| 11 | ENTRY | _ | 64 | 9.1083 | 1.6420 | 39 | 5.55 | <.0001 | 27.3292 |
| 12 | ENTRY | _ | 288 | 9.0960 | 1.6420 | 39 | 5.54 | <.0001 | 27.3169 |
| 13 | ENTRY | _ | 57 | 7.8487 | 1.6420 | 39 | 4.78 | <.0001 | 26.0696 |
| 14 | ENTRY | _ | 275 | 7.8267 | 1.6420 | 39 | 4.77 | <.0001 | 26.0476 |
| 15 | ENTRY | _ | 147 | 7.8158 | 1.6420 | 39 | 4.76 | <.0001 | 26.0367 |
| 16 | ENTRY | _ | 139 | 7.8106 | 1.6420 | 39 | 4.76 | <.0001 | 26.0315 |
| 17 | ENTRY | _ | 285 | 7.6711 | 1.6420 | 39 | 4.67 | <.0001 | 25.8920 |
| 18 | ENTRY | _ | 201 | 7.5526 | 1.6420 | 39 | 4.60 | <.0001 | 25.7735 |
| 19 | ENTRY | _ | 114 | 7.1821 | 1.6420 | 39 | 4.37 | <.0001 | 25.4030 |
| 20 | ENTRY | _ | 286 | 7.1567 | 1.6420 | 39 | 4.36 | <.0001 | 25.3776 |

How should we "look" at the data? If our focus is to find something "better", i.e. higher up the list, there is no need for multiple comparisons. Ranking the predicted adjusted means lets you see which treatments are worth pushing forward for further research. You need to be more willing to accept Type I and Type II errors, because they will probably happen. These numbers are predictions from a linear model anchored by the checks; the ranks and the ordering are the information that helps you move forward, because they indicate the potential of each test entry. Because the test-entry effects are random, formal conclusions about fixed effects pertain only to the checks; conclusions about the particular levels at hand are narrow-space inference.

Augmented Designs - Recap
- Screening is discovery driven.
- Select 4 to 6 meaningful checks.
- Select an appropriate experimental design and increase the rows and columns to include unreplicated test entries in each block. RCB is the simplest case; split plots and factorials are also possibilities (you can look at interactions and autocorrelations).
- Use mixed model analyses.
- Phase II begins: select entries for a pilot study to elicit a better estimate of the true response, and generate hypotheses.

Augmented Designs - References
To get started: Google!
- Federer et al. (2001). Agron. J. 93:389-395.
- Federer, W.T. (2005). Agron. J. 97:578-586.
- Burgueño and Crossa (2000). SAS Macro for Analysing Unreplicated Designs. CIMMYT CRIL (Crop Research Informatics Laboratory), http://www.cimmyt.org
- IASRI, Augmented Design tool, http://www.iasri.res.in/design
- IRRISTAT