Workshop on Methods for Genomic Selection (El Batán, July 15, 2013) Paulino Pérez & Gustavo de los Campos.


Objectives To go, in a very short session, over a set of examples that illustrate how to implement various types of genome-enabled prediction methods using the BGLR package. BGLR is a new package, recently developed by us, that implements various types of Bayesian parametric and semi-parametric methods. The focus will be on examples; theory and a deeper treatment will be offered in a short course (last week of September).
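For readers who want to run the examples, a minimal getting-started sketch follows (using the wheat dataset bundled with BGLR is our choice here for illustration, not necessarily the data used in the workshop):

# Install (once) and load the BGLR package from CRAN
install.packages("BGLR")
library(BGLR)

# Example data shipped with the package: the CIMMYT wheat dataset
data(wheat)
X <- wheat.X        # marker matrix (lines x markers)
y <- wheat.Y[, 1]   # grain yield in the first environment
dim(X); length(y)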

Outline Brief Introduction to whole-genome regression & roadmap. Ridge Regression and the Genomic BLUP (G-BLUP). Bayesian Methods (‘The Bayesian Alphabet’). Kernel Regression.

Classical Quantitative Genetic Model Phenotype = Genetic Value + Environmental effect, i.e., y_i = g_i + e_i. Our error terms will include both true environmental effects and approximation errors arising from model misspecification and from imperfect LD between markers and QTL (‘error in predictor variables’).

The two most important challenges Complexity. How do we incorporate in our models the complexity of a genetic mechanism that may involve complex interactions between alleles at multiple genes (non-linearity) as well as interactions with environmental conditions? Coping with the curse of dimensionality. In the models we will consider, the number of unknowns (e.g., marker effects) can vastly exceed the sample size. This induces high sampling variance of estimates and, consequently, large MSE. How do we confront this?

Confronting Complexity Elements of model specification: How many markers? Which markers? What type of interactions (dominance; epistasis: type, order)? What about non-parametric approaches?

Confronting the ‘Curse of Dimensionality’ In the regressions we will consider, the number of parameters far exceeds the number of data points. In this context, standard estimation procedures (OLS, ML) cannot be used (often the solution is not unique, and when it is, estimates have large sampling variance). Therefore, we will in all cases consider regularized regressions, which involve shrinkage of estimates, variable selection, or a combination of both.

The Bias-Variance Tradeoff The sampling distribution of an estimator can be summarized by its variance and its squared bias; the mean-squared error combines the two (MSE = Variance + Squared Bias), which gives rise to a bias-variance tradeoff.

Roadmap 1. Linear methods - Effects of shrinkage: a case study based on Ridge Regression. - Genomic Best Linear Unbiased Predictor. - Methods for really large p (e.g., 1 million markers). - The Bayesian Alphabet (a collection of methods that perform different types of shrinkage of estimates). 2. Reproducing Kernel Hilbert Spaces Regressions (RKHS) - Choice of bandwidth parameter. - Kernel Averaging.

1. Parametric Methods

Whole-Genome Regression Methods [1] Parametric methods fall into two groups. Penalized: Ridge Regression (shrinkage), LASSO (shrinkage and selection), Elastic Net. Bayesian: Bayesian Ridge Regression (shrinkage), Bayes B/C (selection & shrinkage), Bayes A, Bayesian LASSO. [1]: Meuwissen, Hayes & Goddard (2001)

2. Ridge Regression & The Genomic BLUP (G-BLUP)

Penalized Regressions OLS maximizes goodness of fit to the training data (it minimizes the RSS, which is equivalent to maximizing R²). Problem: when p is large relative to n, estimates have large sampling variance and, consequently, large mean-squared error. Penalized regressions instead minimize the RSS plus a penalty on model complexity, weighted by a regularization parameter (λ).

Commonly Used Penalties (Bridge Regression) J(β) = Σ_j |β_j|^γ; γ = 2 gives Ridge Regression and γ = 1 gives the LASSO.

Ridge Regression The ridge estimates minimize the residual sum of squares plus a penalty on model complexity: β̂ = argmin { Σ_i (y_i − x_i'β)² + λ Σ_j β_j² }, where λ ≥ 0 is the regularization parameter.
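As a sketch (not part of the original slides), the ridge estimates have a closed form; here X and y are the marker matrix and phenotype vector from the getting-started sketch, centered before fitting, and the value of lambda is arbitrary:

# Ridge regression estimates: bHat = (X'X + lambda*I)^{-1} X'y
ridge_fit <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + diag(lambda, p), crossprod(X, y))
}
bHat <- ridge_fit(scale(X, scale = FALSE), y - mean(y), lambda = 10)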

Example 1. How does λ affect: shrinkage of estimates; goodness of fit (e.g., residual sum of squares); model complexity (e.g., DF); prediction accuracy?
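A hedged sketch of how an exercise like Example 1 could be coded (the training/testing split and the grid of λ values are illustrative assumptions; X and y are as above). For each λ, the effective degrees of freedom are DF(λ) = Σ_j d_j²/(d_j² + λ), with d_j the singular values of the training marker matrix:

set.seed(123)
Xc  <- scale(X, scale = FALSE); yc <- y - mean(y)  # center markers and phenotypes
tst <- sample(1:nrow(Xc), size = 100)              # illustrative testing set
XTRN <- Xc[-tst, ]; yTRN <- yc[-tst]
XTST <- Xc[ tst, ]; yTST <- yc[ tst]

lambda <- c(1, 5, 10, 50, 100, 500, 1000)          # illustrative grid
d2 <- svd(XTRN, nu = 0, nv = 0)$d^2                # squared singular values
out <- t(sapply(lambda, function(lmb) {
  b <- solve(crossprod(XTRN) + diag(lmb, ncol(XTRN)), crossprod(XTRN, yTRN))
  c(DF      = sum(d2 / (d2 + lmb)),                # model complexity
    RSS_TRN = sum((yTRN - XTRN %*% b)^2),          # fit to training data
    RSS_TST = sum((yTST - XTST %*% b)^2))          # fit to testing data
}))
cbind(lambda, out)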

Results Example 1

Results (DF)

Results (estimates)

Results (estimates)

Results (shrinkage of estimates with RR)

Results (fitness to training data)

Results (fitness to testing data)

Ridge Regression & G-BLUP
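The equivalence with G-BLUP can be exploited directly in BGLR by treating the genomic relationship matrix as the (co)variance structure of a random effect; a minimal sketch (the centering/scaling of markers and the number of iterations are illustrative choices, and X, y are as in the getting-started sketch):

# Genomic relationship matrix from centered-and-scaled markers
Z <- scale(X)                     # assumes no monomorphic markers
G <- tcrossprod(Z) / ncol(Z)      # G = ZZ'/p

# G-BLUP fitted through the 'RKHS' model of BGLR with K = G
fmGBLUP <- BGLR(y = y, ETA = list(list(K = G, model = 'RKHS')),
                nIter = 12000, burnIn = 2000, verbose = FALSE)
head(fmGBLUP$yHat)                # fitted values (intercept + genomic values)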

Example 1.

Computation of Genomic Relationship Matrix with large numbers of markers
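One possibility, sketched below under the assumption that markers can be processed in column blocks, is to accumulate G chunk by chunk so that the full centered-and-scaled marker matrix never has to be held in memory:

compute_G_by_chunks <- function(X, chunkSize = 5000) {
  n <- nrow(X); p <- ncol(X)
  G <- matrix(0, n, n)
  for (start in seq(1, p, by = chunkSize)) {
    cols <- start:min(start + chunkSize - 1, p)
    Zc <- scale(X[, cols, drop = FALSE])   # center & scale this block of markers
    G  <- G + tcrossprod(Zc)               # accumulate ZZ' one block at a time
  }
  G / p
}
G <- compute_G_by_chunks(X)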

3. The Bayesian Alphabet

Penalized and Bayesian Regressions - In penalized regressions, shrinkage is induced by adding to the objective function a penalty on model complexity. - The type of shrinkage induced depends on the form of the penalty.

Commonly Used Penalties (Bridge Regression)

Bayesian Regression Model for Genomic Selection

A grouping of priors
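In BGLR the members of the Bayesian Alphabet are selected through the model argument of each term in ETA; a short hedged sketch (the number of iterations and the saveAt prefixes are illustrative; X and y as before):

models <- c('BRR', 'BayesA', 'BayesB', 'BayesC', 'BL')
fits <- lapply(models, function(m) {
  BGLR(y = y, ETA = list(list(X = X, model = m)),
       nIter = 12000, burnIn = 2000, verbose = FALSE,
       saveAt = paste0(m, '_'))            # prefix for files BGLR writes to disk
})
names(fits) <- models
bHatBA <- fits[['BayesA']]$ETA[[1]]$b      # estimated marker effects, e.g., for BayesA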

Results

Average Prediction Squared Error of Effects

Model   Markers        'QTL'
BRR     1.475324e-05   0.03782839
BA      1.388743e-05   0.03586657
BL      1.513329e-05   0.03705841
BC      4.641837e-05   0.02864067
BB      1.834702e-05   0.03374704

Estimated Marker Effects: BRR

Estimated Marker Effects: BayesA

Estimated Marker Effects: BayesC

Estimates of Marker Effects: BayesA vs. BRR

Estimates of Marker Effects: BayesA vs. BayesC

Estimates of Marker Effects: BayesA vs. BL

Estimates of Marker Effects: BayesA vs. BRR

Prediction Accuracy of realized genetic values by model

4. Kernel Regression

Framework Phenotype = Genetic Value + Model Residual. The genetic value can be modeled with: - Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO, … - Semi-parametric models: Reproducing Kernel Hilbert Spaces Regression; Neural Networks, …

RKHS Regressions (Background) Uses: scatter-plot smoothing (smoothing splines) [1], spatial smoothing (‘Kriging’) [2], classification problems (support vector machines) [3], the animal model, … Regression setting: y_i = g(x_i) + e_i, where x_i is an input (it can be of any nature) and g is an unknown function. [1] Wahba (1990) Spline Models for Observational Data. [2] Cressie, N. (1993) Statistics for Spatial Data. [3] Vapnik, V. (1998) Statistical Learning Theory.

RKHS Regressions (Background) Non-parametric representation of functions. The reproducing kernel K(x_i, x_j) must be positive (semi-)definite; it defines a correlation function and defines a RKHS of real-valued functions [1]. [1] Aronszajn, N. (1950) Theory of reproducing kernels.
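A common choice, sketched here for illustration (the bandwidth value and the rescaling of distances are assumptions), is the Gaussian kernel K(x_i, x_j) = exp(-h * d_ij^2), with d_ij the Euclidean distance between marker genotypes and h a bandwidth parameter:

# Gaussian kernel from marker genotypes (X as in the earlier sketches)
D2 <- as.matrix(dist(X))^2     # squared Euclidean distances between lines
D2 <- D2 / mean(D2)            # rescale so that h is on a comparable scale across datasets
h  <- 0.5                      # illustrative bandwidth
K  <- exp(-h * D2)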

Functions as Gaussian processes: the evaluations of g are assigned a multivariate-normal prior with (co)variance proportional to the kernel matrix K; with K = A (the pedigree-based relationship matrix) this yields the Animal Model [1]. [1] de los Campos, Gianola and Rosa (2008) Journal of Animal Science.

RKHS Regression in BGLR [1]

# K is an n x n kernel matrix (e.g., a genomic relationship or Gaussian kernel)
ETA <- list( list(K=K, model='RKHS') )
fm  <- BGLR(y=y, ETA=ETA, nIter=...)   # set nIter (and burnIn) to the desired chain length

[1]: the algorithm is described in de los Campos et al., Genetics Research (2010)

Choosing the RK based on predictive ability Strategies: (a) grid of values of θ + CV; (b) fully Bayesian: assign a prior to θ (computationally demanding); (c) Kernel Averaging [1]. [1] de los Campos et al. (2010) WCGALP & Genetics Research (in press)

Histograms of the off-diagonal entries of each of the three kernels used (K1, K2, K3) in the RKHS models for the wheat dataset

How to Choose the Reproducing Kernel? [1] Pedigree models: K = A. Genomic models: marker-based kinship, or a model-derived kernel. Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods. [1] Shawe-Taylor and Cristianini (2004)

Example 2

Example 2

Example 2

Example 2

Example 3 Kernel Averaging

Kernel Averaging Strategies: (a) grid of values of θ + CV; (b) fully Bayesian: assign a prior to θ (computationally demanding); (c) Kernel Averaging [1]. [1] de los Campos et al., Genetics Research (2010)
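In BGLR, kernel averaging amounts to including several kernels, each with its own variance parameter, as separate RKHS terms; a minimal sketch using three bandwidths (the bandwidth values and number of iterations are illustrative; D2 is the squared-distance matrix from the earlier kernel sketch):

h    <- c(0.1, 0.5, 2.5)                         # illustrative grid of bandwidths
ETA  <- lapply(h, function(hk) list(K = exp(-hk * D2), model = 'RKHS'))
fmKA <- BGLR(y = y, ETA = ETA, nIter = 12000, burnIn = 2000, verbose = FALSE)
sapply(fmKA$ETA, function(z) z$varU)             # posterior means of the kernel-specific variances;
                                                 # their relative sizes show each kernel's contribution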

Kernel Averaging

Example 4 (100th basis function)

Example 4 (100th basis function, h=)

Example 4 (KA: trace plot residual variance)

Example 4 (KA: trace plot kernel-variances)

Example 4 (KA: trace plot kernel-variances)

Example 4 (KA: prediction accuracy)