
Mimicking anova in reml mixed-modelling of comparative experiments using the R-package asremlPlus
Chris Brien
Australian Centre for Plant Functional Genomics, University of Adelaide
School of Information Technology and Mathematical Sciences, University of South Australia

Outline
1. A three-factor generalized RCBD on ladybirds.
2. Principles that grew alongside anova.
3. Some features of asremlPlus.
4. Using asremlPlus for reml mixed modelling.
5. Conclusions.

1. Three-factor generalized randomized complete-block experiment on ladybirds (Welham et al. (2015) Example 8.2)
Will ladybirds transfer fungus to aphids on plants? There were 2 runs, each of 36 containers holding a plant and aphids. The 12 treatments are the combinations of Host plant (beans, trefoil), infected Cadavers (5, 10, 20) and Ladybird (-, +).
[Factor-allocation diagram: 2 Host × 3 Cadavers × 2 Ladybird (12 treatments) allocated to 2 Run × 36 Plant in R (72 plants)]
The response analysed is the logit of the proportion of live aphids that were infected.
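As a minimal base-R sketch of the structure just described, the 12 treatments can be laid out 3 times in each of the 2 runs of 36 plants. This assumes a simple standard-order layout, not the randomized allocation actually stored in Ladybird.dat. The data frame `lady` and the function `logit` are hypothetical names used only for illustration.

```r
# Hypothetical layout of the ladybird experiment: standard order, NOT randomized.
# The real allocation is in Ladybird.dat, supplied with asremlPlus.
treats <- expand.grid(Ladybird = factor(c("-", "+"), levels = c("-", "+")),
                      Cadavers = factor(c(5, 10, 20)),
                      Host     = factor(c("beans", "trefoil")))   # 12 treatments
lady <- data.frame(Run   = factor(rep(1:2, each = 36)),           # 2 runs
                   Plant = factor(rep(1:36, times = 2)),          # 36 plants in each run
                   treats[rep(1:12, times = 6), ],                # each treatment 3 times per run
                   row.names = NULL)

# Response: logit of the proportion p of live aphids that were infected (0 < p < 1)
logit <- function(p) log(p / (1 - p))    # same as stats::qlogis(p)

# Replication checks: 6 plants per treatment, 3 per treatment in each run
trt <- interaction(lady$Host, lady$Cadavers, lady$Ladybird)
table(trt)
table(lady$Run, trt)
```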

The anova (the dominant analysis method of the 20th century)

    Ladybird.aov <- aov(logitP ~ Host*Cadavers*Ladybird + Error(Run/Plant),
                        data = Ladybird.dat)
    summary(Ladybird.aov)

    Error: Run
              Df  Sum Sq Mean Sq F value Pr(>F)
    Residuals  1 0.06766 0.06766

    Error: Run:Plant
                           Df Sum Sq Mean Sq F value   Pr(>F)
    Host                    1 13.599  13.599  59.172 1.82e-10
    Cadavers                2 17.027   8.514  37.044 3.78e-11
    Ladybird                1 11.091  11.091  48.257 3.33e-09
    Host:Cadavers           2  0.308   0.154   0.670   0.5158
    Host:Ladybird           1  0.228   0.228   0.992   0.3234
    Cadavers:Ladybird       2  1.735   0.867   3.774   0.0287
    Host:Cadavers:Ladybird  2  0.200   0.100   0.435   0.6493
    Residuals              59 13.560   0.230

The analysis is achieved with a single fit. A second analysis removing the ns terms could be done, but it would pool these terms with the Residual.
The Run:Plant Residual mean square (0.230) in this anova is a pure estimate of error, being derived from replicates only.
The estimate of the Run variance component is negative: (0.06766 - 0.230) / 36 ≈ -0.0045.
The only significant interaction is Cadavers:Ladybird, and Host is significant, so the tables of means for Cadavers:Ladybird and Host are examined.
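The arithmetic behind the negative Run component and the F-ratios can be checked in base R. This is a sketch that uses only the values printed in the table, and it assumes the balanced layout of 36 plants per run, so that E[MS_Run] = σ² + 36σ²_Run.

```r
# Values from the anova table above
ms_run   <- 0.06766          # Error: Run stratum, Residuals (1 df)
ms_res   <- 13.560 / 59      # Error: Run:Plant stratum, Residuals: about 0.230
n_plants <- 36               # plants per run, so E[MS_Run] = sigma^2 + 36 * sigma^2_Run

# Anova (moment) estimate of the Run variance component: negative here
sigma2_run <- (ms_run - ms_res) / n_plants
round(sigma2_run, 4)         # -0.0045

# F for Host = Host mean square / Run:Plant Residual mean square, on 1 and 59 df
F_host <- 13.599 / ms_res
p_host <- pf(F_host, df1 = 1, df2 = 59, lower.tail = FALSE)
c(F = round(F_host, 2), p = signif(p_host, 3))
```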

2. Principles that grew alongside anova
I. Include all unit (recipient) terms as random terms; hypothesis tests for them are not valid.
This is justified by the randomization argument for a valid estimate of error, and by there being no randomization test available for these terms. Dropping terms leads to problematic pooling of terms (Janky, 2000).
II. Test treatment (allocated) terms by testing their sources, but never remove them from the analysis.
This ensures a pure estimate of error in case a Type II error has occurred, because a real effect declared ns would otherwise be pooled into the error.
III. Use hypothesis tests to select a marginality-compliant model (Nelder, 1977) and base conclusions on the corresponding tables of means.
If A#B is significant, there is no need to test the terms marginal to it (the Grand mean, A and B): the fitted model is E[Y] = A:B (ab parameters). If A#B is not significant (ns), E[Y] = A + B and conclusions should be based on the A and B means. These have increased precision because each is based on more observations. Here A:B stands for a separate parameter for each combination of the levels of A and B.
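The point of principle III, that an additive model E[Y] = A + B still yields an estimate for every combination of A and B, can be seen in a small base-R sketch. The data are simulated and hypothetical (a balanced 2 × 3 factorial with 4 replicates), and `lm` is used as a stand-in for the mixed-model fit.

```r
# Hypothetical balanced 2 x 3 factorial, r = 4 replicates per cell (simulated data)
set.seed(2024)
dat <- expand.grid(rep = 1:4, A = factor(1:2), B = factor(1:3))
dat$y <- rnorm(nrow(dat))

# Additive model E[Y] = A + B, as selected when A#B is ns
fit_add <- lm(y ~ A + B, data = dat)
cells <- expand.grid(A = factor(1:2), B = factor(1:3))
pred_add <- predict(fit_add, newdata = cells)    # estimates for all 6 combinations

# With equal replication these equal A mean + B mean - grand mean. Each A mean uses
# 12 observations and each B mean 8, against 4 for a cell mean under E[Y] = A:B
a_mean <- tapply(dat$y, dat$A, mean)
b_mean <- tapply(dat$y, dat$B, mean)
by_hand <- a_mean[cells$A] + b_mean[cells$B] - mean(dat$y)
cbind(cells, pred_add, by_hand = as.numeric(by_hand))
```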

Difficulties with the principles in mixed modelling (focus on II and III)
I. Include all unit (recipient) terms as random terms; hypothesis tests for them are not valid.
Because of the constraints usually imposed, random unit terms whose variance components would be negative are, in effect, omitted, because their components are estimated as zero.
II. Test treatment (allocated) terms by testing their sources, but never remove them from the analysis.
From a modelling perspective, it is natural to remove nonsignificant terms.
III. Use hypothesis tests to select a marginality-compliant model (Nelder, 1977) and base conclusions on the corresponding tables of means.
Researchers want predictions for all factor combinations. To get these when an additive model is selected, must the ns interaction terms be omitted? That is, when E[Y] = A + B (A#B is ns), how can estimates for all combinations of A and B be obtained without removing A#B?

3. Some features of asremlPlus
asremlPlus enhances asreml. It creates an asrtests object that contains (i) the fitted asreml object, (ii) the wald.tab for the fixed terms and (iii) a test.summary data.frame.
Functions:
- setvarianceterms allows the bound placed on random terms in the model to be changed easily.
- chooseModel selects the significant terms in a model, given the marginality relations between the terms.
- predictPlus enhances predict from asreml, including a linear.transformation of the predictions and error.intervals of various types (se, CI, 0.5LSD).
- plotPredictions uses ggplot2 to plot predictions.

4. Using asremlPlus for reml mixed modelling

    library(asreml)
    m <- asreml(logitP ~ Host*Cadavers*Ladybird, random = ~ Run,
                residual = ~ Run:Plant, data = Ladybird.dat)
    Warning in asreml(logitP ~ Host * Cadavers * Ladybird, random = ~Run, residual = ~Run:Plant,  :
      Some components changed by more than 1% on the last iteration.
    summary(m)$varcomp   # shows the bound Run component
                    component  std.error      z.ratio bound  %ch
    Run          2.298309e-08 0.01638903 1.402346e-06     B 93.7
    Run:Plant!R  2.271216e-01 0.04156985 5.463612e+00     P  0.0

The Run component is held on its bound (B) at zero, so Run is, in effect, pooled with the residual: the Run:Plant!R component (0.2271) is close to the anova pool (0.06766 + 13.560) / (1 + 59) ≈ 0.2271. The Run component was therefore set to allow negative estimates.

The fixed-effects analysis
Set up an asrtests object and print its wald.tab (with the denominator df calculated using the Kenward-Roger method).

    library(asremlPlus)
    current.asrt <- asrtests(m)
    Calculating denominator DF
    current.asrt$wald.tab
    Wald tests for fixed effects.
    Response: logitP

                           Df denDF   F.inc      Pr
    (Intercept)             1     1 1550.00 0.01617
    Host                    1    59   59.17 0.00000
    Cadavers                2    59   37.04 0.00000
    Ladybird                1    59   48.26 0.00000
    Host:Cadavers           2    59    0.67 0.51581
    Host:Ladybird           1    59    0.99 0.32342
    Cadavers:Ladybird       2    59    3.77 0.02868
    Host:Cadavers:Ladybird  2    59    0.44 0.64932

With the Run component allowed to be negative, the F.inc values and their p-values reproduce those of the anova.
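The Pr column can be recovered from F.inc and the two degrees of freedom with the F distribution. The following is a base-R sketch using the rounded F.inc values copied from the table, so the recomputed p-values agree with the printed ones only to about two or three decimals. For 2 numerator df the upper tail also has a closed form, which gives an exact cross-check.

```r
# F.inc and degrees of freedom copied from the Wald table above (rounded values)
wald <- data.frame(
  term  = c("Host", "Cadavers", "Ladybird", "Host:Cadavers", "Host:Ladybird",
            "Cadavers:Ladybird", "Host:Cadavers:Ladybird"),
  Df    = c(1, 2, 1, 2, 1, 2, 2),
  denDF = 59,
  F.inc = c(59.17, 37.04, 48.26, 0.67, 0.99, 3.77, 0.44))

# Pr is the upper tail of the F(Df, denDF) distribution beyond F.inc
wald$Pr <- pf(wald$F.inc, wald$Df, wald$denDF, lower.tail = FALSE)

# For Df = 2 the upper tail is (1 + 2F/denDF)^(-denDF/2)
two <- wald$Df == 2
closed <- (1 + 2 * wald$F.inc[two] / wald$denDF[two])^(-wald$denDF[two] / 2)
all.equal(wald$Pr[two], closed)     # TRUE
wald
```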

Use chooseModel to select a marginality-compliant fixed model
The selection is based on the marginality relationships between the terms. One term is marginal to another if the column space of the incidence matrix for the first term is a subspace of that for the second term.
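The subspace definition can be checked numerically. Appending the first term's indicator columns to the second term's columns leaves the rank unchanged exactly when the first column space lies inside the second. The helpers `incidence` and `is_marginal` below are hypothetical base-R illustrations, not the dae or asremlPlus implementation.

```r
# Hypothetical helpers (base R only)
incidence <- function(...) {
  f <- interaction(..., drop = TRUE)   # one level per observed combination
  model.matrix(~ 0 + f)                # 0/1 indicator columns
}
# Term 1 is marginal to term 2 if C(X1) is a subspace of C(X2):
# appending X1's columns to X2 then leaves the rank unchanged
is_marginal <- function(X1, X2) qr(cbind(X2, X1))$rank == qr(X2)$rank

A <- factor(rep(1:2, each = 3))        # 2 x 3 factorial, one unit per combination
B <- factor(rep(1:3, times = 2))
XA  <- incidence(A)
XAB <- incidence(A, B)
is_marginal(XA, XAB)   # TRUE:  A is marginal to A:B
is_marginal(XAB, XA)   # FALSE: A:B is not marginal to A
```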

The marginality matrix

    library(dae)
    Ladybird.pstr <- pstructure(~ Host*Cadavers*Ladybird, data = Ladybird.dat)
    HCL.marg <- marginality(Ladybird.pstr)
    print(HCL.marg)
                           Host Cadavers Host:Cadavers Ladybird Host:Ladybird Cadavers:Ladybird Host:Cadavers:Ladybird
    Host                      1        0             1        0             1                 0                      1
    Cadavers                  0        1             1        0             0                 1                      1
    Host:Cadavers             0        0             1        0             0                 0                      1
    Ladybird                  0        0             0        1             1                 1                      1
    Host:Ladybird             0        0             0        0             1                 0                      1
    Cadavers:Ladybird         0        0             0        0             0                 1                      1
    Host:Cadavers:Ladybird    0        0             0        0             0                 0                      1

pstructure is a dae function that forms the projection matrices for a formula; a side effect is that it produces the marginality matrix. An entry of 1 means that the row term is marginal to the column term.
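The same 0/1 matrix can be re-derived from the rank criterion alone. The following is a hypothetical base-R sketch, not dae's algorithm. It uses one unit per treatment combination, on the assumption that marginality depends only on the factor structure.

```r
# Hypothetical re-derivation of the marginality matrix in base R (not dae's code)
trt <- expand.grid(Host = factor(1:2), Cadavers = factor(1:3), Ladybird = factor(1:2))
trm <- list("Host", "Cadavers", c("Host", "Cadavers"), "Ladybird",
            c("Host", "Ladybird"), c("Cadavers", "Ladybird"),
            c("Host", "Cadavers", "Ladybird"))
names(trm) <- sapply(trm, paste, collapse = ":")

# Indicator (incidence) matrix for the level combinations of each term
X <- lapply(trm, function(v) {
  f <- interaction(trt[v], drop = TRUE)
  model.matrix(~ 0 + f)
})
qrank <- function(M) qr(M)$rank

# Entry [i, j] is 1 when the column space for term i lies inside that for term j
marg <- outer(seq_along(X), seq_along(X), Vectorize(function(i, j)
  as.integer(qrank(cbind(X[[j]], X[[i]])) == qrank(X[[j]]))))
dimnames(marg) <- list(names(trm), names(trm))
marg
```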

Choose the model

    sigmod <- chooseModel(current.asrt, terms.marginality = HCL.marg)
    Calculating denominator DF
    Calculating denominator DF
    Calculating denominator DF
    Calculating denominator DF
    Calculating denominator DF

sigmod is a list with two components: asrtests.obj and sig.terms.

    current.asrt <- sigmod$asrtests.obj
    print(current.asrt$test.summary)
                       terms DF denDF            p         action
    1 Host:Cadavers:Ladybird  2    59 6.493237e-01 Nonsignificant
    2      Cadavers:Ladybird  2    59 2.868479e-02    Significant
    3          Host:Ladybird  1    59 3.234182e-01 Nonsignificant
    4          Host:Cadavers  2    59 5.158105e-01 Nonsignificant
    5                   Host  1    59 1.816387e-10    Significant
    print(sigmod$sig.terms)
    [[1]]
    [1] "Cadavers:Ladybird"

    [[2]]
    [1] "Host"

Cadavers and Ladybird are not tested, because they are marginal to the significant Cadavers:Ladybird term.
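The selection logic can be sketched in base R. Terms are visited from the highest order down. A term marginal to an already-significant term is kept without testing, and any other term is tested and retained only if significant. `choose_terms` is a hypothetical helper, not the code of asremlPlus::chooseModel. Unlike the real function, it reuses the full-model p-values instead of refitting after each term is dropped.

```r
# Marginality matrix as printed above: entry [i, j] = 1 if term i is marginal to term j
trm <- c("Host", "Cadavers", "Host:Cadavers", "Ladybird", "Host:Ladybird",
         "Cadavers:Ladybird", "Host:Cadavers:Ladybird")
marg <- matrix(c(1, 0, 1, 0, 1, 0, 1,
                 0, 1, 1, 0, 0, 1, 1,
                 0, 0, 1, 0, 0, 0, 1,
                 0, 0, 0, 1, 1, 1, 1,
                 0, 0, 0, 0, 1, 0, 1,
                 0, 0, 0, 0, 0, 1, 1,
                 0, 0, 0, 0, 0, 0, 1),
               nrow = 7, byrow = TRUE, dimnames = list(trm, trm))

# Full-model p-values from the tables above (not recomputed after dropping terms)
p <- c(Host = 1.816387e-10, Cadavers = 3.78e-11, "Host:Cadavers" = 0.5158105,
       Ladybird = 3.33e-09, "Host:Ladybird" = 0.3234182,
       "Cadavers:Ladybird" = 0.02868479, "Host:Cadavers:Ladybird" = 0.6493237)

# Hypothetical helper. Rows of marg are assumed ordered so that each term comes
# before every term it is marginal to, so rev() visits higher-order terms first.
choose_terms <- function(marg, p, alpha = 0.05) {
  sig <- character(0); tested <- character(0)
  for (term in rev(rownames(marg))) {
    if (any(marg[term, sig] == 1)) next         # marginal to a significant term: keep, no test
    tested <- c(tested, term)
    if (p[[term]] < alpha) sig <- c(sig, term)  # significant: retain; otherwise dropped
  }
  list(sig.terms = sig, tested = tested)
}

res <- choose_terms(marg, p)
res$tested      # Host:Cadavers:Ladybird, Cadavers:Ladybird, Host:Ladybird, Host:Cadavers, Host
res$sig.terms   # "Cadavers:Ladybird" "Host"
```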

Use predictPlus to obtain predictions
Predictions are obtained for Host:Cadavers:Ladybird that conform to the selected model. Function predictPlus has a linear.transformation argument that can be a formula or a matrix of contrasts. For a formula, the predicted values are projected onto the subspace defined by the formula.

    diffs <- predictPlus(current.asrt$asreml.obj, classify = "Host:Cadavers:Ladybird",
                         linear.transformation = ~ Cadavers:Ladybird + Host,
                         wald.tab = current.asrt$wald.tab,
                         error.intervals = "halfLeast", meanLSD.type = "factor.combination",
                         LSDby = "Host", tables = "predictions")

Because the LSD varies with the Host comparison, the within-Host LSD is used (LSDby = "Host").
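The projection can be illustrated with ordinary least squares: fitting `~ Cadavers:Ladybird + Host` to 12 values indexed by Host:Cadavers:Ladybird returns their orthogonal projection onto that subspace. This is a hypothetical base-R sketch with made-up stand-in predictions (not results from the experiment). It assumes an unweighted projection, which may differ in detail from what predictPlus does internally.

```r
# Hypothetical stand-in for 12 predictions in Host:Cadavers:Ladybird (not real results)
grid <- expand.grid(Host     = factor(c("beans", "trefoil")),
                    Cadavers = factor(c(5, 10, 20)),
                    Ladybird = factor(c("-", "+"), levels = c("-", "+")))
set.seed(1)
grid$pred <- rnorm(nrow(grid))

# Orthogonal projection onto the column space of ~ Cadavers:Ladybird + Host;
# lm's fitted values are that projection (aliased columns are dropped)
grid$proj <- fitted(lm(pred ~ Cadavers:Ladybird + Host, data = grid))

# After projection the Host difference is the same for every Cadavers x Ladybird
# combination and equals the difference between the Host means of pred
host_diff <- with(grid, proj[Host == "trefoil"] - proj[Host == "beans"])
round(host_diff, 6)
```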

An alldiffs.obj from predictPlus
A list consisting of the following components:
- predictions: the predictions, their standard errors and error intervals;
- vcov: the variance matrix of the predictions;
- differences: all pairwise differences between the predictions;
- p.differences: p-values for all pairwise differences between the predictions;
- sed: the standard errors of all pairwise differences between the predictions;
- LSD: the mean, minimum and maximum LSDs, possibly by factors.
All components (except LSD) are in standard order and labelled for the classify: "Host:Cadavers:Ladybird".
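The way the differences, sed, p.differences and LSD components relate to the predictions and their variance matrix can be sketched in base R. The predictions and variance matrix here are made-up values, and the object names only mirror the component names; they are not asremlPlus's internal code.

```r
# Hypothetical predictions and variance matrix, for illustration only
pred <- c(a = 1.2, b = 0.6, c = -0.1)
V <- matrix(c(0.040, 0.010, 0.010,
              0.010, 0.040, 0.010,
              0.010, 0.010, 0.040), nrow = 3,
            dimnames = list(names(pred), names(pred)))
den_df <- 59                                   # denominator df for the t-tests

differences <- outer(pred, pred, "-")          # row prediction minus column prediction
v <- diag(V)
sed <- sqrt(outer(v, v, "+") - 2 * V)          # sqrt(Var_i + Var_j - 2 Cov_ij)
diag(sed) <- NA                                # no self-comparisons
p.differences <- 2 * pt(-abs(differences / sed), den_df)
LSD <- qt(0.975, den_df) * sed                 # 5% LSD for each pair
c(mean = mean(LSD, na.rm = TRUE), minimum = min(LSD, na.rm = TRUE),
  maximum = max(LSD, na.rm = TRUE))
```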

Predictions
The predictions are for all combinations of the levels of Host, Ladybird and Cadavers. However, the pattern in the predictions is the same for both Hosts within each Ladybird level (row of the plot).
The error bars are ±0.5 LSD (5%) (Snee, 1981): when the bars for two predictions within a Host overlap, their difference is ns. The LSD = 0.392, being based on means of 12 values.
If the ns terms are dropped, the predictions are the same, but s² = 0.223 and LSD = 0.386.
For a mean based on 6 values, LSD (5%) = t(0.975; 59 df) × sqrt(2 × 0.230 / 6) = 2.001 × 0.277 = 0.554.
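The LSDs quoted above follow from LSD(5%) = t × sqrt(2s²/n), with s² the Run:Plant Residual mean square on 59 df. A base-R check is below. The pooled variance is computed the anova way, by pooling the SS and df of the three ns lines with the Residual. That is an assumption about how the reml value of 0.223 arises, and the two agree to three decimals.

```r
s2  <- 13.560 / 59                    # Run:Plant Residual mean square, about 0.230
t59 <- qt(0.975, df = 59)             # about 2.001
lsd <- function(s2, n, t) t * sqrt(2 * s2 / n)   # 5% LSD for two means of n values each

lsd(s2, n = 12, t = t59)   # Cadavers:Ladybird means, 12 values each: about 0.392
lsd(s2, n = 6,  t = t59)   # Host:Cadavers:Ladybird cell means, 6 values each: about 0.554

# anova analogue of dropping the ns terms: pool their SS and df with the Residual
s2_pooled <- (13.560 + 0.308 + 0.228 + 0.200) / (59 + 2 + 1 + 2)
s2_pooled                  # about 0.223
```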

Heat map of p-values for pairwise differences
The p-values are based on exact t-tests, so they are equivalent to the LSDs. The differences within a panel lie down the antidiagonal, and all the differences within a panel are significant. Note that the top and the bottom two antidiagonal grid pairs are the same.
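The equivalence follows because, with a common SED, a two-sided t-test at 5% rejects exactly when the absolute difference exceeds LSD = t × SED. A base-R sketch using the within-Host LSD of 0.392 on 59 df is below. The differences in `d` are hypothetical values chosen for illustration.

```r
den_df <- 59
lsd <- 0.392                          # within-Host LSD quoted for the predictions
sed <- lsd / qt(0.975, den_df)        # implied standard error of a difference
d   <- c(0.10, 0.30, 0.38, 0.41, 0.80)   # hypothetical prediction differences
p   <- 2 * pt(-abs(d) / sed, den_df)     # exact two-sided t-test p-values

# p < 0.05 exactly when |d| exceeds the LSD
data.frame(difference = d, p = signif(p, 3), exceeds_LSD = abs(d) > lsd)
```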

5. Conclusions
Properties of the analysis:
- Estimates of variance components are based solely on replicate variation.
- Predictions:
  - are for all combinations of the treatment factors;
  - conform to the selected model;
  - have smaller standard errors for the estimated differences, because they are based on lower-order terms;
  - are tricky for quantitative variables with predictions for several values of each (not implemented as yet).
The R package asremlPlus assists in fulfilling the principles that originate from anova when doing reml mixed modelling.
- It is available from the R repository at http://chris.brien.name/rpackages
- It is compatible with asreml-R versions 3 and 4.1.
- The Ladybird.dat data set is available with asremlPlus.

Thank you for your attention

References
Brien, C. J. (2018). asremlPlus: augments the use of ASReml-R in fitting mixed models. R package version 4.1-04. http://chris.brien.name/rpackages.
Eisenhart, C. (1947). The assumptions underlying the analysis of variance. Biometrics, 3, 1-21.
Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52, 399-433.
Janky, D. G. (2000). Sometimes pooling for analysis of variance hypothesis tests: A review and study of a split-plot model. The American Statistician, 54, 269-279.
Littell, R. C. (2002). Analysis of unbalanced mixed model data: A case study comparison of ANOVA versus REML/GLS. Journal of Agricultural, Biological, and Environmental Statistics, 7, 472-490.
Nelder, J. A. (1977). A reformulation of linear models (with discussion). Journal of the Royal Statistical Society, Series A (General), 140, 48-77.
Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.
Snee, R. D. (1981). Graphical display and assessment of means. Biometrics, 37, 835-836.
Welham, S. J., Gezan, S. A., Clark, S. J., & Mead, A. (2014). Statistical Methods in Biology: Design and Analysis of Experiments and Regression. Boca Raton: Chapman and Hall/CRC.