Workshop on Methods for Genomic Selection (El Batán, July 15, 2013) Paulino Pérez & Gustavo de los Campos.


1 Workshop on Methods for Genomic Selection (El Batán, July 15, 2013) Paulino Pérez & Gustavo de los Campos

2 Objectives To work, in a very short session, through a set of examples that illustrate how to implement various types of genome-enabled prediction methods using the BGLR package. BGLR is a recently developed package that implements various Bayesian parametric and semi-parametric methods. The focus will be on examples; theory and a deeper treatment will be offered in a short course (last week of September).

3 Outline Brief Introduction to whole-genome regression & roadmap.
Ridge Regression and the Genomic BLUP (G-BLUP). Bayesian Methods (‘The Bayesian Alphabet’). Kernel Regression.

4 Classical Quantitative Genetic Model
Phenotype = Genetic Value + Environmental Effect. Our error terms will involve both true environmental effects and approximation errors arising from model mis-specification and from imperfect LD between markers and QTL ('error in predictor variables').

5 The two most important challenges
Complexity. How do we incorporate into our models the complexity of a genetic mechanism that may involve complex interactions between alleles at multiple genes (non-linearity) as well as interactions with environmental conditions? Coping with the curse of dimensionality. In the models we will consider, the number of unknowns (e.g., marker effects) can vastly exceed the sample size. This induces high sampling variance of estimates and, consequently, large MSE. How do we confront this?

6 Confronting Complexity
Elements of model specification: How many markers? Which markers? What type of interactions? Dominance? Epistasis (type, order)? What about non-parametric approaches?

7 Confronting the ‘Curse of dimensionality’
In the regressions we will consider, the number of parameters far exceeds the number of data points. In this context, standard estimation procedures (OLS, ML) cannot be used: often the solution is not unique, and when it is, estimates have large sampling variance. Therefore, we will consider in all cases regularized regressions, which involve shrinkage of estimates, variable selection, or a combination of both.
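A minimal base-R sketch of this point (simulated data, not from the workshop scripts): when the number of predictors exceeds the sample size, OLS has no unique solution, and lm() returns NA for coefficients it cannot estimate.

```r
# Simulate n = 20 records and p = 50 "markers": more unknowns than data points.
set.seed(1)
n <- 20; p <- 50
X <- matrix(rnorm(n * p), nrow = n)
y <- X[, 1] * 0.5 + rnorm(n)

fit <- lm(y ~ X)
# The design has rank at most n, so many coefficients are undefined (NA).
sum(is.na(coef(fit)))  # > 0
```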

8 The Bias-Variance Tradeoffs
Sampling Distribution of Estimates Variance Squared-Bias Bias-Variance Tradeoffs

9 Roadmap Linear methods
- Effects of shrinkage: a case study based on Ridge Regression.
- Genomic Best Linear Unbiased Predictor (G-BLUP).
- Methods for really large p (e.g., 1 million markers).
- The Bayesian Alphabet (a collection of methods that perform different types of shrinkage of estimates).
2. Reproducing Kernel Hilbert Spaces Regressions (RKHS)
- Choice of the bandwidth parameter.
- Kernel averaging.

10 1. Parametric Methods

11 Whole-Genome Regression Methods [1]
Penalized: Ridge Regression (shrinkage), LASSO (shrinkage and selection), Elastic Net. Bayesian: Bayesian Ridge Regression (shrinkage), Bayes A, Bayes B/C (selection & shrinkage), Bayesian LASSO. [1]: Meuwissen, Hayes & Goddard (2001)

12 2. Ridge Regression & The Genomic BLUP (G-BLUP)

13 Penalized Regressions
OLS maximizes goodness of fit to the training data (minimizing the RSS, equivalent to maximizing R²). Problem: when p is large relative to n, estimates have large sampling variance and, consequently, large mean squared error. Penalized regressions add to the objective function a penalty on model complexity, weighted by a regularization parameter.

14 Commonly Used Penalties (Bridge Regression)

15 Penalty on Model Complexity
Ridge Regression: the penalty on model complexity is the sum of squared regression coefficients, weighted by the regularization parameter λ.

16 Example 1. How does λ affect: shrinkage of estimates;
goodness of fit (e.g., residual sum of squares); model complexity (e.g., DF); prediction accuracy.
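A base-R sketch of the idea behind Example 1 (simulated data; the workshop example itself uses BGLR's data). Ridge estimates have the closed form bHat(λ) = (X'X + λI)⁻¹X'y, and the effective degrees of freedom are Σ dᵢ²/(dᵢ² + λ), where dᵢ are the singular values of X; larger λ means more shrinkage and fewer effective DF.

```r
set.seed(2)
n <- 100; p <- 30
X <- scale(matrix(rnorm(n * p), nrow = n))
b <- rnorm(p, sd = 0.3)
y <- X %*% b + rnorm(n)

# Closed-form ridge estimates for a given regularization parameter lambda.
ridge <- function(X, y, lambda) {
  solve(crossprod(X) + diag(lambda, ncol(X)), crossprod(X, y))
}

# Effective degrees of freedom via the singular values of X.
df_ridge <- function(X, lambda) {
  d2 <- svd(X)$d^2
  sum(d2 / (d2 + lambda))
}

bSmall <- ridge(X, y, lambda = 1)
bLarge <- ridge(X, y, lambda = 1000)

# Larger lambda => estimates shrunk harder toward zero, fewer effective DF.
sum(bLarge^2) < sum(bSmall^2)       # TRUE
df_ridge(X, 1000) < df_ridge(X, 1)  # TRUE
```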

17 Results Example 1

18 Results (DF)

19 Results (estimates)

20 Results (estimates)

21 Results (shrinkage of estimates with RR)

22 Results (fit to training data)

23 Results (fit to testing data)

24 Ridge Regression & G-BLUP

25 Example 1.

26 Computation of Genomic Relationship Matrix with large numbers of markers
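A base-R sketch of this computation (simulated genotypes; block size and scaling choices are illustrative assumptions): G = ZZ'/p, where Z holds centered and scaled marker codes. With very large p, Z may not fit in memory at once, so G can be accumulated over blocks of markers, because column scaling and the cross-product both operate column by column.

```r
set.seed(3)
n <- 50; p <- 2000
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)  # 0/1/2 genotypes

# All markers at once:
G_full <- tcrossprod(scale(X)) / p

# Same computation, markers processed in blocks of 500 columns:
G_block <- matrix(0, n, n)
blocks  <- split(seq_len(p), ceiling(seq_len(p) / 500))
for (cols in blocks) {
  G_block <- G_block + tcrossprod(scale(X[, cols, drop = FALSE]))
}
G_block <- G_block / p

# Block-wise accumulation recovers the full G matrix.
max(abs(G_full - G_block)) < 1e-8  # TRUE
```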

27 3. The Bayesian Alphabet

28 Penalized and Bayesian Regressions
- In penalized regressions, shrinkage is induced by adding to the objective function a penalty on model complexity - The type of shrinkage induced depends on the form of the penalty

29 Commonly Used Penalties (Bridge Regression)

30 Bayesian Regression Model for Genomic Selection

31 A grouping of priors
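A hedged sketch of how the alphabet members are selected in BGLR (assumes the BGLR package is installed and that a marker matrix X and phenotype vector y are already in memory; nIter and burnIn values are illustrative): the members differ only in the prior assigned to marker effects, chosen through the 'model' argument of the linear predictor.

```r
library(BGLR)

# 'model' selects the prior on marker effects:
#   'BRR'    Gaussian prior (Bayesian Ridge Regression)
#   'BayesA' scaled-t prior
#   'BL'     double-exponential prior (Bayesian LASSO)
#   'BayesB'/'BayesC' point-of-mass-plus-slab priors (selection & shrinkage)
ETA <- list(list(X = X, model = 'BayesB'))
fm  <- BGLR(y = y, ETA = ETA, nIter = 12000, burnIn = 2000)

fm$ETA[[1]]$b  # posterior mean estimates of marker effects
```

Swapping 'BayesB' for any of the other model labels fits the corresponding member of the alphabet with the same call.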

32 Results

33 Average Prediction Squared Error of Effects
(Table: average prediction squared error of estimated effects, at marker positions and at 'QTL', for BRR, BayesA, BL, BayesC, and BayesB.)

34 Estimated Marker Effects: BRR

35 Estimated Marker Effects: BayesA

36 Estimated Marker Effects: BayesC

37 Estimates of Marker Effects: BayesA vs. BRR

38 Estimates of Marker Effects: BayesA vs. BayesC

39 Estimates of Marker Effects: BayesA vs. BL

40 Estimates of Marker Effects: BayesA vs. BRR

41 Prediction Accuracy of realized genetic values by model

42 4. Kernel Regression

43 Framework Phenotype = Genetic Value + Model Residual
- Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO.
- Semi-parametric models: Reproducing Kernel Hilbert Spaces Regression; Neural Networks.

44 RKHS Regressions (Background)
Uses: scatter-plot smoothing (smoothing splines) [1]; spatial smoothing ('kriging') [2]; classification problems (support vector machines) [3]; the animal model. Regression setting: yᵢ = g(xᵢ) + εᵢ, where the input xᵢ can be of any nature and g is an unknown function. [1] Wahba (1990) Spline Models for Observational Data. [2] Cressie, N. (1993) Statistics for Spatial Data. [3] Vapnik, V. (1998) Statistical Learning Theory.

45 RKHS Regressions (Background)
Non-parametric representation of functions via a reproducing kernel K(x, x'), which must be positive (semi-)definite; it defines a correlation function and an RKHS of real-valued functions [1]. [1] Aronszajn, N. (1950) Theory of Reproducing Kernels.
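A base-R sketch (simulated inputs, not from the workshop scripts) of building a Gaussian reproducing kernel, K[i, j] = exp(-θ · d(xᵢ, xⱼ)²), from a marker matrix; θ is the bandwidth parameter discussed in the slides that follow. The resulting matrix is symmetric, has ones on the diagonal, and is positive semi-definite.

```r
# Gaussian kernel from squared Euclidean distances between rows of X.
gaussKernel <- function(X, theta) {
  D2 <- as.matrix(dist(X))^2
  exp(-theta * D2)
}

set.seed(4)
X <- matrix(rnorm(10 * 5), nrow = 10)
K <- gaussKernel(X, theta = 0.5)

all(diag(K) == 1)                                # TRUE
isSymmetric(K)                                   # TRUE
min(eigen(K, symmetric = TRUE)$values) > -1e-8   # TRUE (PSD, numerically)
```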

46 Functions as Gaussian processes
K = A => Animal Model [1]. [1] de los Campos, Gianola and Rosa (2008) Journal of Animal Science.

47 RKHS Regression in BGLR [1]
# K: positive semi-definite kernel matrix; y: phenotypes
ETA <- list(list(K = K, model = 'RKHS'))
fm  <- BGLR(y = y, ETA = ETA, nIter = ...)  # ...: number of MCMC iterations
[1]: the algorithm is described in de los Campos et al., Genetics Research (2010)

48 Choosing the RK based on predictive ability
Strategies: grid of values of θ + CV; fully Bayesian: assign a prior to θ (computationally demanding); Kernel Averaging [1]. [1] de los Campos et al. (2010) WCGALP & Genetics Research (in press)

49 Histograms of the off-diagonal entries of each of the three kernels (K1, K2, K3) used in the RKHS models for the wheat dataset

50 How to Choose the Reproducing Kernel? [1]
Pedigree models: K = A. Genomic models: marker-based kinship or a model-derived kernel. Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods. [1] Shawe-Taylor and Cristianini (2004)

51 Example 2

52 Example 2

53 Example 2

54 Example 2

55 Example 3 Kernel Averaging

56 Kernel Averaging Strategies: grid of values of θ + CV; fully Bayesian: assign a prior to θ (computationally demanding); Kernel Averaging [1]. [1] de los Campos et al., Genetics Research (2010)

57 Kernel Averaging
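A hedged sketch of kernel averaging with BGLR (assumes the package, a marker matrix X, and a phenotype vector y; the bandwidth grid, the distance scaling, and the nIter/burnIn values are illustrative assumptions): several Gaussian kernels with different bandwidths enter as separate random effects, and BGLR estimates a variance for each, so kernels that fit the data better receive more weight in the average.

```r
library(BGLR)

# Squared distances between individuals, scaled so theta values are comparable.
D2 <- as.matrix(dist(scale(X)))^2
D2 <- D2 / mean(D2)

thetas <- c(0.2, 1, 5)  # grid of bandwidths (an assumption)
ETA <- lapply(thetas, function(th) list(K = exp(-th * D2), model = 'RKHS'))

fm <- BGLR(y = y, ETA = ETA, nIter = 12000, burnIn = 2000)

# The posterior variance attached to each kernel reflects its weight:
sapply(fm$ETA, function(e) e$varU)
```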

58 Example 4 (100th basis function)

59 Example 4 (100th basis function, h=)

60 Example 4 (KA: trace plot residual variance)

61 Example 4 (KA: trace plot kernel-variances)

62 Example 4 (KA: trace plot kernel-variances)

63 Example 4 (KA: prediction accuracy)

