1
Workshop on Methods for Genomic Selection (El Batán, July 15, 2013) Paulino Pérez & Gustavo de los Campos
2
Objectives To go, in a very short session, over a set of examples that illustrate how to implement various genome-enabled prediction methods using the BGLR package. BGLR is a new package, recently developed by us, that implements several Bayesian parametric and semi-parametric methods. The focus will be on examples; theory and a deeper treatment will be offered in a short course (last week of September).
3
Outline
- Brief introduction to whole-genome regression & roadmap.
- Ridge Regression and the Genomic BLUP (G-BLUP).
- Bayesian Methods ('The Bayesian Alphabet').
- Kernel Regression.
4
Classical Quantitative Genetic Model
Phenotype = Genetic Value + Environmental Effect. Our error terms will involve both true environmental effects and approximation errors arising from model mis-specification and from imperfect LD between markers and QTL ('error in predictor variables').
5
The two most important challenges
Complexity. How do we incorporate into our models the complexity of a genetic mechanism that may involve interactions between alleles at multiple genes (non-linearity) as well as interactions with environmental conditions? Coping with the curse of dimensionality. In the models we will consider, the number of unknowns (e.g., marker effects) can vastly exceed the sample size. This induces high sampling variance of estimates and consequently large MSE. How do we confront this?
6
Confronting Complexity
Elements of Model Specification
- How many markers? Which markers?
- What type of interactions? Dominance? Epistasis (type, order)?
- What about non-parametric approaches?
7
Confronting the ‘Curse of dimensionality’
In the regressions we will consider, the number of parameters exceeds the number of data points by a large margin. In this context, standard estimation procedures (OLS, ML) cannot be used: often the solution is not unique, and when it is, estimates have large sampling variance. Therefore, in all cases we will consider regularized regressions, which involve shrinkage of estimates, variable selection, or a combination of both.
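The p ≫ n problem can be made concrete with a small simulation. This is an illustrative sketch only (marker coding, dimensions, and the value of λ are assumed for the example, not taken from the workshop data): with more markers than records, OLS has no unique solution, but ridge regression still returns a unique, shrunken estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                  # more markers than records
X = rng.choice([0, 1, 2], size=(n, p)).astype(float)   # genotypes coded 0/1/2
beta = np.zeros(p)
beta[:10] = rng.normal(0, 1, 10)                # 10 markers with true effects
y = X @ beta + rng.normal(0, 1, n)

lam = 10.0                                      # regularization parameter (assumed)
# Ridge solution (X'X + lam*I)^{-1} X'y is unique even though p > n,
# because the penalty makes the system full rank.
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(b_ridge.shape)                            # one (shrunken) effect per marker
```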
8
The Bias-Variance Tradeoffs
[Figure: sampling distribution of estimates; the MSE decomposes into variance plus squared bias (the bias-variance tradeoff).]
9
Roadmap
1. Linear methods
- Effects of shrinkage: a case study based on Ridge Regression.
- Genomic Best Linear Unbiased Predictor (G-BLUP).
- Methods for very large p (e.g., 1 million markers).
- The Bayesian Alphabet (a collection of methods that perform different types of shrinkage of estimates).
2. Reproducing Kernel Hilbert Spaces Regressions (RKHS)
- Choice of the bandwidth parameter.
- Kernel Averaging.
10
1. Parametric Methods
11
Whole-Genome Regression Methods [1]
Penalized (parametric):
- Ridge Regression (shrinkage)
- LASSO (shrinkage and selection)
- Elastic Net
Bayesian:
- Bayesian Ridge Regression (shrinkage)
- Bayes B / Bayes C (selection & shrinkage)
- Bayes A
- Bayesian LASSO
[1]: Meuwissen, Hayes & Goddard (2001)
12
2. Ridge Regression & The Genomic BLUP (G-BLUP)
13
Penalized Regressions
OLS maximizes goodness of fit to the training data (minimizes the residual sum of squares, equivalent to maximizing R²). Problem: when p is large relative to n, estimates have large sampling variance and, consequently, large mean squared error. Penalized regressions therefore minimize RSS(β) + λJ(β), where λ ≥ 0 is the regularization parameter and J(β) is a penalty on model complexity.
14
Commonly Used Penalties (Bridge Regression)
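The bridge family referenced in the slide title nests the penalties used throughout this deck; written out (standard form, not quoted from the slide):

```latex
J(\boldsymbol{\beta}) = \lambda \sum_{j=1}^{p} |\beta_j|^{\gamma},
\qquad \lambda \ge 0,\; \gamma > 0,
```

with γ = 2 giving the ridge penalty (shrinkage) and γ = 1 the LASSO penalty (shrinkage and selection).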
15
Ridge Regression: minimize RSS(β) + λ Σj βj², where λ is the regularization parameter and Σj βj² is the penalty on model complexity.
16
Example 1. How does λ affect:
- Shrinkage of estimates?
- Goodness of fit (e.g., residual sum of squares)?
- Model complexity (e.g., degrees of freedom)?
- Prediction accuracy?
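The questions above can be explored numerically. A hedged sketch (dimensions and the λ grid are assumed, not the workshop's Example 1 data): ridge effective degrees of freedom are df(λ) = Σᵢ dᵢ²/(dᵢ² + λ), with dᵢ the singular values of X, so both the size of the estimates and the model complexity shrink as λ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

d2 = np.linalg.svd(X, compute_uv=False) ** 2     # squared singular values of X
norms, dfs = [], []
for lam in [0.0, 1.0, 100.0, 1e4]:
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    norms.append(np.linalg.norm(b))              # overall size of the estimates
    dfs.append(np.sum(d2 / (d2 + lam)))          # effective degrees of freedom
print(norms, dfs)
# As lambda grows, estimates shrink toward zero and df drops from p toward 0.
```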
17
Results Example 1
18
Results (DF)
19
Results (estimates)
20
Results (estimates)
21
Results (shrinkage of estimates with RR)
22
Results (fit to training data)
23
Results (fit to testing data)
24
Ridge Regression & G-BLUP
25
Example 1.
26
Computation of Genomic Relationship Matrix with large numbers of markers
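A sketch of the genomic relationship matrix computation, including the block-wise accumulation that makes it feasible with large numbers of markers. The centering/scaling below follows the common VanRaden-style formula; the exact scaling used in the workshop scripts is assumed, not quoted from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 500
M = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2

freq = M.mean(axis=0) / 2.0                           # estimated allele frequencies
Z = M - 2.0 * freq                                    # center each marker column
denom = 2.0 * np.sum(freq * (1.0 - freq))
G = Z @ Z.T / denom                                   # n x n genomic relationships

# With very large p, the crossproduct can be accumulated over marker blocks
# so that the full Z never needs to be held (or read) at once:
G2 = np.zeros((n, n))
for start in range(0, p, 100):
    Zb = Z[:, start:start + 100]
    G2 += Zb @ Zb.T
G2 /= denom
print(np.allclose(G, G2))
```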
27
3. The Bayesian Alphabet
28
Penalized and Bayesian Regressions
- In penalized regressions, shrinkage is induced by adding to the objective function a penalty on model complexity - The type of shrinkage induced depends on the form of the penalty
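The penalized and Bayesian views connect directly: the posterior mode of a Bayesian ridge regression (Gaussian prior on effects) equals the ridge estimate with λ = σ²ₑ/σ²ᵦ. A small numerical sketch (variance values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 40
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

var_e, var_b = 1.0, 0.1            # residual and effect variances (assumed)
lam = var_e / var_b                # implied ridge regularization parameter
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Posterior mean/mode of beta | y under beta ~ N(0, var_b I), e ~ N(0, var_e I):
b_bayes = np.linalg.solve(X.T @ X / var_e + np.eye(p) / var_b, X.T @ y / var_e)
print(np.allclose(b_ridge, b_bayes))   # True: same estimator, two derivations
```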
29
Commonly Used Penalties (Bridge Regression)
30
Bayesian Regression Model for Genomic Selection
31
A grouping of priors
32
Results
33
Average Prediction Squared Error of Effects
[Table: average prediction squared error of estimated effects, at markers and at 'QTL', for BRR, BayesA, BL, BayesC and BayesB.]
34
Estimated Marker Effects: BRR
35
Estimated Marker Effects: BayesA
36
Estimated Marker Effects: BayesC
37
Estimates of Marker Effects: BayesA vs BRR
38
Estimates of Marker Effects: BayesA vs BayesC
39
Estimates of Marker Effects: BayesA vs BL
40
Estimates of Marker Effects: BayesA vs BRR
41
Prediction Accuracy of realized genetic values by model
42
4. Kernel Regression
43
Framework: Phenotype = Genetic Value + Model Residual
- Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO, …
- Semi-parametric models: Reproducing Kernel Hilbert Spaces Regression; Neural Networks, …
44
RKHS Regressions (Background)
Regression setting: y = g(x) + e, where g is an unknown function and the covariates x can be of any nature. Uses:
- Scatter-plot smoothing (smoothing splines) [1]
- Spatial smoothing ('Kriging') [2]
- Classification problems (support vector machines) [3]
- Animal model, …
[1] Wahba (1990) Spline Models for Observational Data. [2] Cressie, N. (1993) Statistics for Spatial Data. [3] Vapnik, V. (1998) Statistical Learning Theory.
45
RKHS Regressions (Background)
Non-parametric representation of functions via a reproducing kernel K(x, x'):
- Must be positive (semi-)definite.
- Defines a correlation function.
- Defines an RKHS of real-valued functions [1].
[1] Aronszajn, N. (1950) Theory of Reproducing Kernels.
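The positive semi-definiteness requirement can be checked numerically. A sketch using a Gaussian kernel (the kernel choice and bandwidth are assumed for illustration; the slides only state the general requirement): K(x, x') = exp(−θ‖x − x'‖²).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 8))                        # 15 individuals, 8 covariates
theta = 0.5                                         # bandwidth parameter (assumed)

# Pairwise squared Euclidean distances, then the Gaussian kernel matrix:
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-theta * D2)

# PSD check: a valid reproducing kernel has no negative eigenvalues
# (up to numerical round-off).
print(np.linalg.eigvalsh(K).min())
```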
46
Functions as Gaussian processes
K = A => Animal Model [1]
[1] de los Campos, Gianola and Rosa (2008), Journal of Animal Science.
47
RKHS Regression in BGLR [1]
ETA <- list(list(K=K, model='RKHS'))
fm <- BGLR(y=y, ETA=ETA, nIter=...)
[1]: the algorithm is described in de los Campos et al. (2010), Genetics Research.
48
Choosing the RK based on predictive ability
Strategies:
- Grid of values of θ + CV.
- Fully Bayesian: assign a prior to θ (computationally demanding).
- Kernel Averaging [1].
[1] de los Campos et al. (2010) WCGALP & Genetics Research (In press)
49
Histograms of the off-diagonal entries of each of the three kernels (K1, K2, K3) used in the RKHS models for the wheat dataset
50
How to Choose the Reproducing Kernel? [1]
- Pedigree models: K = A.
- Genomic models: marker-based kinship, or a model-derived kernel.
- Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods.
[1] Shawe-Taylor and Cristianini (2004)
51
Example 2
52
Example 2
53
Example 2
54
Example 2
55
Example 3 Kernel Averaging
56
Kernel Averaging
Strategies:
- Grid of values of θ + CV.
- Fully Bayesian: assign a prior to θ (computationally demanding).
- Kernel Averaging [1].
[1] de los Campos et al., Genetics Research (2010)
57
Kernel Averaging
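A sketch of the kernel-averaging idea (bandwidth grid, distance scaling, and weights are all assumed for illustration): build Gaussian kernels over a grid of bandwidths and note that any non-negative combination of valid kernels is itself a valid (PSD) kernel, so the model can weight them via one variance component per kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(12, 5))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
D2 /= D2.mean()                                       # scale distances (assumed)

thetas = [0.2, 1.0, 5.0]                              # bandwidth grid (assumed)
kernels = [np.exp(-t * D2) for t in thetas]

w = np.array([0.5, 0.3, 0.2])                         # e.g. posterior variance shares
K_avg = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(np.linalg.eigvalsh(K_avg).min())                # still PSD (up to round-off)
```

In BGLR this corresponds (as I understand the package; verify against its manual) to passing each kernel as its own list entry with model='RKHS', so that each gets its own variance parameter.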
58
Example 4 (100th basis function)
59
Example 4 (100th basis function, h=)
60
Example 4 (KA: trace plot residual variance)
61
Example 4 (KA: trace plot kernel-variances)
62
Example 4 (KA: trace plot kernel-variances)
63
Example 4 (KA: prediction accuracy)