1
Workshop on Methods for Genomic Selection (El Batán, July 15, 2013) Paulino Pérez & Gustavo de los Campos
2
Objectives To go, in a very short session, over a set of examples that illustrate how to implement various genome-enabled prediction methods using the BGLR package. BGLR is a new package, recently developed by us, that implements several Bayesian parametric and semi-parametric methods. The focus will be on examples; theory and a deeper treatment will be offered in a short course (last week of September).
3
Outline
- Brief introduction to whole-genome regression & roadmap.
- Ridge Regression and the Genomic BLUP (G-BLUP).
- Bayesian Methods ('The Bayesian Alphabet').
- Kernel Regression.
4
Classical Quantitative Genetic Model
Phenotype = Genetic Value + Environmental Effect. Our error terms will involve both true environmental effects and approximation errors arising from model mis-specification and from imperfect LD between markers and QTL ('error in predictor variables').
5
The two most important challenges
Complexity. How do we incorporate into our models the complexity of a genetic mechanism that may involve interactions between alleles at multiple genes (non-linearity) as well as interactions with environmental conditions? Coping with the curse of dimensionality. In the models we will consider, the number of unknowns (e.g., marker effects) can vastly exceed the sample size. This induces high sampling variance of estimates and consequently large MSE. How do we confront this?
6
Confronting Complexity
Elements of Model Specification
- How many markers? Which markers?
- What type of interactions? Dominance? Epistasis (type, order)?
- What about non-parametric approaches?
7
Confronting the ‘Curse of dimensionality’
In the regressions we will consider, the number of parameters exceeds the number of data points by a large margin. In this context, standard estimation procedures (OLS, ML) cannot be used: often the solution is not unique, and when it is, estimates have large sampling variance. Therefore, in all cases we will consider regularized regressions, which involve shrinkage of estimates, variable selection, or a combination of both.
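The p ≫ n problem can be made concrete with a small simulation. This is an illustrative sketch only (marker coding, dimensions, and the value of λ are assumed for the example, not taken from the workshop data): with more markers than records, OLS has no unique solution, but ridge regression still returns a unique, shrunken estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                  # more markers than records
X = rng.choice([0, 1, 2], size=(n, p)).astype(float)   # genotypes coded 0/1/2
beta = np.zeros(p)
beta[:10] = rng.normal(0, 1, 10)                # 10 markers with true effects
y = X @ beta + rng.normal(0, 1, n)

lam = 10.0                                      # regularization parameter (assumed)
# Ridge solution (X'X + lam*I)^{-1} X'y is unique even though p > n,
# because the penalty makes the system full rank.
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(b_ridge.shape)                            # one (shrunken) effect per marker
```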
8
The Bias-Variance Tradeoffs
[Figure: sampling distribution of estimates; the MSE decomposes into variance plus squared bias (the bias-variance tradeoff).]
9
Roadmap
1. Linear methods
- Effects of shrinkage: a case study based on Ridge Regression.
- Genomic Best Linear Unbiased Predictor (G-BLUP).
- Methods for very large p (e.g., 1 million markers).
- The Bayesian Alphabet (a collection of methods that perform different types of shrinkage of estimates).
2. Reproducing Kernel Hilbert Spaces Regressions (RKHS)
- Choice of the bandwidth parameter.
- Kernel Averaging.
10
1. Parametric Methods
11
Whole-Genome Regression Methods [1]
Penalized (parametric):
- Ridge Regression (shrinkage)
- LASSO (shrinkage and selection)
- Elastic Net
Bayesian:
- Bayesian Ridge Regression (shrinkage)
- Bayes B / Bayes C (selection & shrinkage)
- Bayes A
- Bayesian LASSO
[1]: Meuwissen, Hayes & Goddard (2001)
12
2. Ridge Regression & The Genomic BLUP (G-BLUP)
13
Penalized Regressions
OLS maximizes goodness of fit to the training data (minimizes the residual sum of squares, equivalent to maximizing R²). Problem: when p is large relative to n, estimates have large sampling variance and, consequently, large mean squared error. Penalized regressions therefore minimize RSS(β) + λJ(β), where λ ≥ 0 is the regularization parameter and J(β) is a penalty on model complexity.
14
Commonly Used Penalties (Bridge Regression)
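The bridge family referenced in the slide title nests the penalties used throughout this deck; written out (standard form, not quoted from the slide):

```latex
J(\boldsymbol{\beta}) = \lambda \sum_{j=1}^{p} |\beta_j|^{\gamma},
\qquad \lambda \ge 0,\; \gamma > 0,
```

with γ = 2 giving the ridge penalty (shrinkage) and γ = 1 the LASSO penalty (shrinkage and selection).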
15
Ridge Regression: minimize RSS(β) + λ Σj βj², where λ is the regularization parameter and Σj βj² is the penalty on model complexity.
16
Example 1. How does λ affect:
- Shrinkage of estimates?
- Goodness of fit (e.g., residual sum of squares)?
- Model complexity (e.g., degrees of freedom)?
- Prediction accuracy?
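The questions above can be explored numerically. A hedged sketch (dimensions and the λ grid are assumed, not the workshop's Example 1 data): ridge effective degrees of freedom are df(λ) = Σᵢ dᵢ²/(dᵢ² + λ), with dᵢ the singular values of X, so both the size of the estimates and the model complexity shrink as λ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

d2 = np.linalg.svd(X, compute_uv=False) ** 2     # squared singular values of X
norms, dfs = [], []
for lam in [0.0, 1.0, 100.0, 1e4]:
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    norms.append(np.linalg.norm(b))              # overall size of the estimates
    dfs.append(np.sum(d2 / (d2 + lam)))          # effective degrees of freedom
print(norms, dfs)
# As lambda grows, estimates shrink toward zero and df drops from p toward 0.
```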
17
Results Example 1
18
Results (DF)
19
Results (estimates)
20
Results (estimates)
21
Results (shrinkage of estimates with RR)
22
Results (fit to training data)
23
Results (fit to testing data)
24
Ridge Regression & G-BLUP
25
Example 1.
26
Computation of Genomic Relationship Matrix with large numbers of markers
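A sketch of the genomic relationship matrix computation, including the block-wise accumulation that makes it feasible with large numbers of markers. The centering/scaling below follows the common VanRaden-style formula; the exact scaling used in the workshop scripts is assumed, not quoted from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 500
M = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2

freq = M.mean(axis=0) / 2.0                           # estimated allele frequencies
Z = M - 2.0 * freq                                    # center each marker column
denom = 2.0 * np.sum(freq * (1.0 - freq))
G = Z @ Z.T / denom                                   # n x n genomic relationships

# With very large p, the crossproduct can be accumulated over marker blocks
# so that the full Z never needs to be held (or read) at once:
G2 = np.zeros((n, n))
for start in range(0, p, 100):
    Zb = Z[:, start:start + 100]
    G2 += Zb @ Zb.T
G2 /= denom
print(np.allclose(G, G2))
```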
27
3. The Bayesian Alphabet
28
Penalized and Bayesian Regressions
- In penalized regressions, shrinkage is induced by adding to the objective function a penalty on model complexity - The type of shrinkage induced depends on the form of the penalty
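The penalized and Bayesian views connect directly: the posterior mode of a Bayesian ridge regression (Gaussian prior on effects) equals the ridge estimate with λ = σ²ₑ/σ²ᵦ. A small numerical sketch (variance values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 40
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

var_e, var_b = 1.0, 0.1            # residual and effect variances (assumed)
lam = var_e / var_b                # implied ridge regularization parameter
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Posterior mean/mode of beta | y under beta ~ N(0, var_b I), e ~ N(0, var_e I):
b_bayes = np.linalg.solve(X.T @ X / var_e + np.eye(p) / var_b, X.T @ y / var_e)
print(np.allclose(b_ridge, b_bayes))   # True: same estimator, two derivations
```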
29
Commonly Used Penalties (Bridge Regression)
30
Bayesian Regression Model for Genomic Selection
31
A grouping of priors
32
Results
33
Average Prediction Squared Error of Effects
[Table: average prediction squared error of estimated effects, at markers and at 'QTL', for BRR, BayesA, BL, BayesC and BayesB.]
34
Estimated Marker Effects: BRR
35
Estimated Marker Effects: BayesA
36
Estimated Marker Effects: BayesC
37
Estimates of Marker Effects: BayesA vs BRR
38
Estimates of Marker Effects: BayesA vs BayesC
39
Estimates of Marker Effects: BayesA vs BL
40
Estimates of Marker Effects: BayesA vs BRR
41
Prediction Accuracy of realized genetic values by model
42
4. Kernel Regression
43
Framework: Phenotype = Genetic Value + Model Residual
- Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO, …
- Semi-parametric models: Reproducing Kernel Hilbert Spaces Regression; Neural Networks, …
44
RKHS Regressions (Background)
Regression setting: y = g(x) + e, where g is an unknown function and the covariates x can be of any nature. Uses:
- Scatter-plot smoothing (smoothing splines) [1]
- Spatial smoothing ('Kriging') [2]
- Classification problems (support vector machines) [3]
- Animal model, …
[1] Wahba (1990) Spline Models for Observational Data. [2] Cressie, N. (1993) Statistics for Spatial Data. [3] Vapnik, V. (1998) Statistical Learning Theory.
45
RKHS Regressions (Background)
Non-parametric representation of functions via a reproducing kernel K(x, x'):
- Must be positive (semi-)definite.
- Defines a correlation function.
- Defines an RKHS of real-valued functions [1].
[1] Aronszajn, N. (1950) Theory of Reproducing Kernels.
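The positive semi-definiteness requirement can be checked numerically. A sketch using a Gaussian kernel (the kernel choice and bandwidth are assumed for illustration; the slides only state the general requirement): K(x, x') = exp(−θ‖x − x'‖²).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 8))                        # 15 individuals, 8 covariates
theta = 0.5                                         # bandwidth parameter (assumed)

# Pairwise squared Euclidean distances, then the Gaussian kernel matrix:
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-theta * D2)

# PSD check: a valid reproducing kernel has no negative eigenvalues
# (up to numerical round-off).
print(np.linalg.eigvalsh(K).min())
```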
46
Functions as Gaussian processes
K = A => Animal Model [1]
[1] de los Campos, Gianola and Rosa (2008), Journal of Animal Science.
47
RKHS Regression in BGLR [1]
ETA <- list(list(K=K, model='RKHS'))
fm <- BGLR(y=y, ETA=ETA, nIter=...)
[1]: the algorithm is described in de los Campos et al. (2010), Genetics Research.
48
Choosing the RK based on predictive ability
Strategies:
- Grid of values of θ + CV.
- Fully Bayesian: assign a prior to θ (computationally demanding).
- Kernel Averaging [1].
[1] de los Campos et al. (2010) WCGALP & Genetics Research (In press)
49
Histograms of the off-diagonal entries of each of the three kernels (K1, K2, K3) used in the RKHS models for the wheat dataset
50
How to Choose the Reproducing Kernel? [1]
- Pedigree models: K = A.
- Genomic models: marker-based kinship, or a model-derived kernel.
- Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods.
[1] Shawe-Taylor and Cristianini (2004)
51
Example 2
52
Example 2
53
Example 2
54
Example 2
55
Example 3 Kernel Averaging
56
Kernel Averaging
Strategies:
- Grid of values of θ + CV.
- Fully Bayesian: assign a prior to θ (computationally demanding).
- Kernel Averaging [1].
[1] de los Campos et al., Genetics Research (2010)
57
Kernel Averaging
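A sketch of the kernel-averaging idea (bandwidth grid, distance scaling, and weights are all assumed for illustration): build Gaussian kernels over a grid of bandwidths and note that any non-negative combination of valid kernels is itself a valid (PSD) kernel, so the model can weight them via one variance component per kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(12, 5))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
D2 /= D2.mean()                                       # scale distances (assumed)

thetas = [0.2, 1.0, 5.0]                              # bandwidth grid (assumed)
kernels = [np.exp(-t * D2) for t in thetas]

w = np.array([0.5, 0.3, 0.2])                         # e.g. posterior variance shares
K_avg = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(np.linalg.eigvalsh(K_avg).min())                # still PSD (up to round-off)
```

In BGLR this corresponds (as I understand the package; verify against its manual) to passing each kernel as its own list entry with model='RKHS', so that each gets its own variance parameter.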
58
Example 4 (100th basis function)
59
Example 4 (100th basis function, h=)
60
Example 4 (KA: trace plot residual variance)
61
Example 4 (KA: trace plot kernel-variances)
62
Example 4 (KA: trace plot kernel-variances)
63
Example 4 (KA: prediction accuracy)