Whole genome QTL analysis using variable selection in complex linear mixed models Julian Taylor Postdoctoral Fellow Food Futures National Research Flagship.


30th December 2009

CSIRO. QTL analysis using variable selection in mixed models

Outline
Introduction: Motivating Data; The Genetics; The Problem
Mixed Model Variable Selection (MMVS): Epistatic Model and Estimation; Dimension Reduction; Algorithm; Model Selection
Results: Simulations (Main Effects); Example (Main Effects)
Summary

The Motivating Data
This research focuses on improving wheat quality through the analysis of Quantitative Trait Loci (QTLs). QTLs are segments of the genome believed to be linked to a trait of interest. Data have been collected from two field trials, Griffith and Biloela. Each trial consisted of 180 lines from an experimental cross of the wheat varieties Chara and Glenlea. Of interest are wheat quality traits obtained at different phases of the bread-making process, for example Field Trial, Milling and Baking.

The Motivating Data (ctd.)
In fact, many experiments are under investigation, each providing a set of wheat quality traits, for example:
Field: Yield, Average Grain Hardness, Average Grain Weight
Milling: Milling Yield, Flour Protein
Baking: Baking Volume, Oven Spring, Cell No.
[Diagram of the experimental phases: Field, Milling, Mixograph, Water Absorption, RVA, Baking, HPLC, Extensograph, Micro-Zeleny]

The Motivating Data (ctd.)
As there are 180 wheat genotypes under investigation, it is not cost-effective to replicate all varieties completely. Cullis et al. (2006) show that partial replication can be used at each phase of the experimental process. Example from the Griffith site: Field, Milling, Baking. This can be complex, with designed experiments at each phase!

The Genetics
The plant world, including wheat, has been slow to catch up with the high-dimensional data used in other biological areas, e.g. humans. Currently the wheat genetic map has around 1000 markers and is slowly growing; the research in this talk uses a map of around 400 markers. Eventually this will become high dimensional, and epistasis is already of interest.
Epistasis: interaction between genes, not necessarily located on the same chromosome.

The Problem
In plant breeding, without the genetics, we have a possibly complex linear mixed model containing unknown fixed effects, unobserved random effects (such as varieties), and unknown sets of variance ratio parameters usually associated with extraneous variation (spatial, blocks, etc.).
How do we incorporate possibly high-dimensional genetic components into a complex linear mixed model? The method:
needs to be computationally efficient when the number of genetic variables is much larger than the number of observations;
needs to be incorporated into flexible software, as plant breeding analyses are often complex, with fixed and random effect model terms;
needs to slay the dragon and save the princess!
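The model can be written in a standard form, shown below. The notation is assumed here and is not taken from the slides: y is the vector of phenotypes, τ the fixed effects, u the random effects, σ² a scale parameter, and γ and φ the variance ratio parameters of the random-effect and residual covariance structures. The random effects u and the residuals e are independent.

$$
y = X\tau + Zu + e, \qquad u \sim N\big(0, \sigma^2 G(\gamma)\big), \qquad e \sim N\big(0, \sigma^2 R(\phi)\big)
$$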

Mixed Model Variable Selection (MMVS): Epistatic Working Model
We incorporate the genetic component directly into a working model. For the markers/intervals, the genetic effect of the ith genetic line is decomposed into a genetic model made up of a residual polygenic effect, main effects and epistatic effects. The covariates are the indicators of parental type at a QTL in the jth interval. In vector format, and using interval regression (Whittaker 1996), these genetic terms are absorbed into the linear mixed model to give the working mixed model.
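One way to write such a genetic model uses assumed notation, not taken from the slides:

g_i = Σ_j q_ij a_j + Σ_{j<k} q_ij q_ik b_jk + p_i

Here q_ij is the coded parental-type covariate for interval j (for example ±1), a_j are main effects, b_jk are pairwise epistatic effects and p_i is the residual polygenic effect. Under that assumption, each epistatic design column is the element-wise product of two main-effect columns. The minimal sketch below builds those columns; the function name `epistatic_design` is hypothetical. It also shows why the epistatic term becomes large: m intervals give m(m-1)/2 columns, so around 400 intervals would give 79,800.

```python
import numpy as np

def epistatic_design(Q):
    """Build pairwise epistatic columns from an n x m main-effect design Q.

    Column c of the result is Q[:, j] * Q[:, k] for the c-th pair j < k,
    ordered (0,1), (0,2), ..., (1,2), ... as returned by np.triu_indices.
    """
    m = Q.shape[1]
    j_idx, k_idx = np.triu_indices(m, k=1)   # all pairs j < k
    E = Q[:, j_idx] * Q[:, k_idx]            # element-wise products of column pairs
    return E, list(zip(j_idx.tolist(), k_idx.tolist()))

rng = np.random.default_rng(0)
Q = rng.choice([-1.0, 1.0], size=(180, 40))  # 180 lines, 40 intervals (illustrative)
E, pairs = epistatic_design(Q)
print(E.shape)  # (180, 780)
```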

MMVS: Variable Selection Distribution
Our work takes a variable selection approach to the problem. The distribution placed on the epistatic effects has one parameter that acts as a variance parameter and another that determines the severity of the penalty. We respect statistical marginality, and initially let the main effects be ...

MMVS: Estimation
Mixed model equations are derived from the joint likelihood. Focusing on the epistatic effects, we linearise the derivative of their log-density. This gives a diagonal weight matrix with one element for each epistatic effect. In the resulting mixed model equations (MME) for the specified model, the epistatic term is very similar to a random effect, but with these diagonal elements acting as known weights. The associated variance parameters are therefore estimated, along with the other variance components of the mixed model, using REML.
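For orientation, consider a simplified model y = Xτ + Z_b b + e with b ~ N(0, σ²γ_b W) and e ~ N(0, σ²R). Here W is a known diagonal weight matrix. For simplicity the other random effects are omitted, and the notation is assumed rather than taken from the slides. Henderson's standard mixed model equations for this model are:

$$
\begin{bmatrix}
X^\top R^{-1} X & X^\top R^{-1} Z_b \\
Z_b^\top R^{-1} X & Z_b^\top R^{-1} Z_b + \gamma_b^{-1} W^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\tau} \\ \tilde{b} \end{bmatrix}
=
\begin{bmatrix} X^\top R^{-1} y \\ Z_b^\top R^{-1} y \end{bmatrix}
$$

The weights enter only through the γ_b⁻¹W⁻¹ block, exactly where an ordinary random effect's inverse covariance would sit. That block is p × p, with one row per epistatic effect.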

MMVS: Dimension Reduction
Solving the MME requires inverting a matrix whose dimension equals the number of epistatic effects, which is likely to be very large. We use a dimension reduction based on an equivalent linear model in which the epistatic design and its weights are replaced by a matrix of much smaller dimension. The MME are then formed after a first absorption step, which integrates out the remaining terms. The solution for the reduced epistatic effects comes from these smaller equations, and the original epistatic effects are recovered by back transformation.
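A common form of this kind of dimension reduction is sketched below. This is a minimal version under stated assumptions, not the authors' code, and the function names are hypothetical. The assumptions are:
- no fixed effects beyond those already removed from y;
- R = I;
- b ~ N(0, σ_b² W) with known diagonal W;
- a known ratio lam = σ²/σ_b²;
- a reduced n × n matrix.

Because Z W Z' = L L', the term Z b has the same distribution as L u with u ~ N(0, σ_b² Iₙ). The model can therefore be fitted with an n × n system and the p effects recovered as b = W Z' L⁻ᵀ u. The check compares this result with the direct p × p solution.

```python
import numpy as np

def blup_direct(Z, y, w, lam):
    """Epistatic BLUP from the full p x p system (Z'Z + lam W^{-1}) b = Z'y."""
    return np.linalg.solve(Z.T @ Z + lam * np.diag(1.0 / w), Z.T @ y)

def blup_reduced(Z, y, w, lam):
    """Same BLUP via an n x n reparameterisation plus back transformation.

    Z W Z' = L L' (Cholesky), so Z b ~ L u with u ~ N(0, sigma_b^2 I_n).
    Solve the n x n system for u, then recover b = W Z' L^{-T} u.
    """
    n = Z.shape[0]
    K = (Z * w) @ Z.T                  # Z W Z', an n x n matrix
    L = np.linalg.cholesky(K)          # needs K positive definite (Z of full row rank)
    u = np.linalg.solve(L.T @ L + lam * np.eye(n), L.T @ y)
    return w * (Z.T @ np.linalg.solve(L.T, u))  # back transformation to p effects

rng = np.random.default_rng(1)
n, p = 30, 500
Z = rng.choice([-1.0, 1.0], size=(n, p))
w = rng.uniform(0.5, 2.0, size=p)      # known diagonal weights (illustrative)
y = rng.normal(size=n)
b_full = blup_direct(Z, y, w, lam=2.0)
b_red = blup_reduced(Z, y, w, lam=2.0)
print(np.allclose(b_full, b_red))  # True
```

The reduced route never forms or factors a p × p matrix. This is what makes it attractive when p is much larger than n.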

MMVS: Working Model Algorithm
1. Initial estimates for the working model are taken from a baseline model, i.e. one without the main effect or epistatic terms. The penalty parameter is held fixed throughout this algorithm.
2. The linear mixed model is fitted with a main effect term and an epistatic effect term. The mixed model equations are solved, with the variance parameters estimated by REML, and the epistatic effects are found by back transformation.
3. To ensure marginality, only the epistatic effect estimates are extracted. Estimates falling below a threshold are deemed not significant and omitted. This reduced set, along with the reduced design matrix, is placed back in the epistatic term. The algorithm then returns to step 2 and repeats until convergence.
4. The final epistatic set and its associated main effects are fitted additively in the fixed effects, and the epistatic term is removed from the model. The remaining main effects are treated similarly using steps 1–3.
5. The final set of main effects is added to the fixed effects of the final model.
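The fit–threshold–refit loop of steps 2 and 3 can be sketched as below. This is a minimal illustration, not the MMVS implementation. As a loudly labeled stand-in, the REML mixed-model fit is replaced by a ridge (BLUP) solve with a fixed penalty `lam`. The names `ridge_fit` and `threshold_select` and the threshold `thr` are hypothetical. Each pass drops every estimate whose magnitude falls below `thr` and refits on the survivors. The loop stops when nothing more is dropped, which is guaranteed because the active set shrinks on every non-final pass.

```python
import numpy as np

def ridge_fit(Z, y, lam):
    """Stand-in for the REML mixed-model fit: ridge/BLUP with a fixed lam."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

def threshold_select(Z, y, lam, thr, max_iter=100):
    """Repeat fit -> drop |b_j| < thr -> refit until no estimate is dropped."""
    active = np.arange(Z.shape[1])
    for _ in range(max_iter):
        b = ridge_fit(Z[:, active], y, lam)
        keep = np.abs(b) >= thr
        if keep.all():
            return active, b             # converged: every survivor clears thr
        active = active[keep]
        if active.size == 0:
            return active, np.zeros(0)   # everything was dropped
    return active, ridge_fit(Z[:, active], y, lam)

rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 20))
b_true = np.zeros(20)
b_true[[2, 7, 11]] = 3.0                 # three large simulated effects
y = Z @ b_true + rng.normal(scale=0.5, size=50)
active, b_hat = threshold_select(Z, y, lam=1.0, thr=0.5)
print(active, np.round(b_hat, 2))
```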

MMVS: Model Selection (What about the penalty parameter?)
The parameter that determines the severity of the penalty cannot be estimated from the mixed model. We chose to use the Bayesian Information Criterion, $\mathrm{BIC} = -2\ell + k\log n$, where $\ell$ is the final log-likelihood, $k$ is the number of parameters in the model and $n$ is the number of observations. The BIC is calculated for a range of values of the penalty parameter, and the model with the minimum BIC is used as the final model.
We are also investigating the BIC of Broman and Speed (2002) and the DIC (Spiegelhalter et al. 2002). Neither is as easy to implement as it appears. We are also investigating ways of estimating the penalty parameter using descent methods.
This algorithm has been coded alongside the very flexible mixed model software ASReml-R (Butler, 2009).
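The BIC grid search can be sketched as follows. This is a minimal illustration, assuming that the final fitted model for each candidate penalty value has already been summarised by its final log-likelihood and parameter count. The function names and the `fits` dictionary layout are hypothetical.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC = -2 * loglik + n_params * log(n_obs); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_by_bic(fits, n_obs):
    """fits maps each candidate penalty value to (final loglik, n_params).

    Returns the penalty value with minimum BIC and the full score table.
    """
    scores = {lam: bic(ll, k, n_obs) for lam, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Keeping the full score table, not just the minimum, makes it easy to inspect how flat the BIC curve is around the chosen value.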

Simulations (Main Effects)
Low-dimensional study: 9 chromosomes, each with 11 markers equally spaced 10 cM apart. 7 QTLs were simulated, located at the midpoints of:
Chr 1, Interval 4 and Chr 1, Interval 8 (repulsion);
Chr 2, Interval 4 and Chr 2, Interval 8 (coupling);
Chr 3, Interval 6; Chr 4, Interval 4; Chr 5, Interval 1.
All were simulated with size 0.38, except Chr 1, Interval 8, which has size -0.38. 1000 simulations were analysed for each of the population sizes 100, 200 and 400, using WGAIM (Verbyla et al. 2007) and the new Mixed Model Variable Selection (MMVS) method. WGAIM outperforms CIM considerably across all population sizes, so CIM is not presented here.
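A data set with this layout can be generated as sketched below. This is a minimal sketch, not the simulation code used for the talk. Several details are assumptions not stated in the slides:
- the Haldane map function;
- a two-genotype population coded ±1;
- unit residual variance;
- the function names.

Genotypes along each chromosome are generated as a Markov chain. The QTL positions are inserted at interval midpoints, but only the marker columns are returned.

```python
import numpy as np

MARKERS_CM = np.arange(0.0, 101.0, 10.0)   # 11 markers, 10 cM apart
# (chromosome, interval, size); QTLs sit at interval midpoints
QTLS = [(1, 4, 0.38), (1, 8, -0.38), (2, 4, 0.38), (2, 8, 0.38),
        (3, 6, 0.38), (4, 4, 0.38), (5, 1, 0.38)]

def haldane_r(d_cM):
    """Recombination fraction from map distance (Haldane map function)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d_cM) / 100.0))

def simulate_chromosome(n, pos_cM, rng):
    """+1/-1 genotypes at sorted positions, as a Markov chain along the chromosome."""
    G = np.empty((n, len(pos_cM)))
    G[:, 0] = rng.choice([-1.0, 1.0], size=n)
    r = haldane_r(np.diff(pos_cM))
    for j in range(1, len(pos_cM)):
        recomb = rng.random(n) < r[j - 1]  # a crossover flips the parental type
        G[:, j] = np.where(recomb, -G[:, j - 1], G[:, j - 1])
    return G

def simulate_population(n, rng, n_chr=9, sigma=1.0):
    """Return (n x 99 marker matrix, phenotype y) for the main-effects design."""
    y = rng.normal(0.0, sigma, size=n)     # residual error; sigma is an assumption
    blocks = []
    for c in range(1, n_chr + 1):
        qtl = [(10.0 * (i - 1) + 5.0, size) for ch, i, size in QTLS if ch == c]
        pos = np.concatenate([MARKERS_CM, [p for p, _ in qtl]])
        order = np.argsort(pos, kind="stable")
        G = np.empty((n, pos.size))
        G[:, order] = simulate_chromosome(n, pos[order], rng)  # undo the sort
        blocks.append(G[:, :MARKERS_CM.size])                  # markers only
        for k, (_, size) in enumerate(qtl):
            y += size * G[:, MARKERS_CM.size + k]              # add QTL effects
    return np.hstack(blocks), y

X, y = simulate_population(200, np.random.default_rng(3))
print(X.shape, y.shape)  # (200, 99) (200,)
```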

Simulations (ctd.)
Results for the simulated QTLs using the WGAIM and MMVS approaches.

Simulations (ctd.)
Simulation results for extraneous QTLs, both linked and unlinked. The MMVS method has a slightly higher rate of extraneous QTL detection, and this is with BIC. We think this rate can be reduced considerably with a better model selection criterion, such as an alternative BIC, or even by direct estimation of the penalty parameter.

Example: Yield
Main effect QTLs for the yield trait (first phase).

Example: Cell No.
Main effect QTLs for cell number (third phase). All traits analysed show increased detection of QTLs in coupling and repulsion with the MMVS method.

QTL plot from the WGAIM package.

Summary and Future Work
With the new MMVS method, we can incorporate high-dimensional data into complex mixed models in a natural way. This is not restricted to statistical genetics! An R package is coming shortly. The method is general, and so opens the door for high-dimensional analysis in other areas requiring complex mixed models.
Future work: a methods paper on epistatic interactions is in preparation, which will highlight the difficulty of finding these effects. QTL mapping with multi-way crosses using WGAIM and MMVS is in progress.

As Rove calls it... here comes... the plug!
1) Taylor, J. D. and Verbyla, A. P. (2009) A variable selection method for the analysis of QTLs in complex linear mixed models. Finalised.
2) Taylor, J. D. and Verbyla, A. P. (2009) High dimensional analysis of QTLs in complex linear mixed models. In preparation.
3) Taylor, J. D. and Verbyla, A. P. (2009) Efficient variable selection using the normal-inverse gamma specification. Journal of Computational and Graphical Statistics, submitted.
4) Cavanagh, C. R., Taylor, J. D. et al. (2009) Sponge and dough bread making: genetic and phenotypic correlations of sponge wheat quality traits. Theoretical and Applied Genetics, submitted.

Contact Us
Julian Taylor, Postdoctoral Fellow, CMIS/Agribusiness
Ari Verbyla, Professor, CMIS/Agribusiness
Say hi to your mum for me!