What are BLUPs, and why are they useful?

Presentation transcript:

Best Linear Unbiased Prediction (BLUP) is useful for two main reasons: (1) it allows the analysis of UNBALANCED data accumulated from performance tests, and (2) it exploits information from RELATIVES. Inbred recycling in different crop species naturally leads to pedigree relationships among inbreds.

Henderson began his pioneering work on BLUP in the 1940s. In general, we use BLUP to refer to the joint use of both BLUP and Best Linear Unbiased Estimation (BLUE). Fixed effects are estimated in BLUE. These effects are constants rather than random variables, so they do not have a variance-covariance structure. Examples are the overall mean and the effects of soil type, sites or environments, a transgene, etc. The data need to be corrected for possible environmental effects prior to comparing the effects of genotypes.

Random effects are predicted in BLUP
Random effects have a variance-covariance structure, whereas fixed effects do not. Examples:
• Resemblance among relatives (full-sibs, half-sibs).
• Soil trends due to spatial correlation: plot-to-plot variability in the field is correlated according to distance, so plots that are farther apart are less correlated than plots that are close together.
• Variation over time has a covariance structure.

LINEAR MIXED MODELS
The presence of both fixed and random effects leads to a MIXED MODEL. BLUP and BLUE refer to statistical properties of the predictions and estimates rather than to the procedure for obtaining them.
• BEST: the sampling variance of what is estimated or predicted is MINIMIZED.
• UNBIASED (in BLUE): the expected value of the estimates is equal to their true value.
• UNBIASED (in BLUP): the predictions have an expectation of zero, the same as the random effects being predicted.

WHAT IS REQUIRED IN BLUP?
Knowledge of the true values of the variances and covariances of the random effects. These are unknown, so implementing BLUP with estimates of these variances is always an approximation. In practice, BLUP involves the simultaneous prediction of genetic effects and estimation of genetic and non-genetic variance components.

LINEAR MIXED MODEL
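The equation on this slide is not preserved in the transcript. As a hedged reconstruction, the standard textbook form of a linear mixed model is sketched below; the symbols (β for fixed effects, u for random effects, X and Z for their incidence matrices, G and R for the variance-covariance matrices of u and e) are assumptions and may differ from the notation used on the original slide.

```latex
\[
y = X\beta + Zu + e, \qquad
u \sim (0,\, G), \quad e \sim (0,\, R), \quad \operatorname{Cov}(u, e) = 0
\]
```

Under this notation, β is estimated (BLUE) and u is predicted (BLUP).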

DATA: two types of environments and four related genotypes (from Bernardo, 2010)

Mega-environment   Sites   Genotype   Mean performance
Mega-Env 1         18      Morex      4.45
Mega-Env 1         18      Robust     4.61
Mega-Env 1         18      Stander    5.27
Mega-Env 2         9       Robust     5.00
Mega-Env 2         9       Excel      5.82
Mega-Env 2         9       Stander    5.79
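As an illustration of how these six records could be arranged for a mixed model, the sketch below builds a response vector and two incidence matrices. It assumes (this is not stated in the transcript) that mega-environments are treated as fixed effects and genotypes as random effects; the variable names are hypothetical.

```python
import numpy as np

# The six records from the data slide:
# (mega-environment, number of sites, genotype, mean performance)
records = [
    (1, 18, "Morex", 4.45),
    (1, 18, "Robust", 4.61),
    (1, 18, "Stander", 5.27),
    (2, 9, "Robust", 5.00),
    (2, 9, "Excel", 5.82),
    (2, 9, "Stander", 5.79),
]
environments = [1, 2]
genotypes = ["Morex", "Robust", "Excel", "Stander"]

# Response vector (6 x 1)
y = np.array([rec[3] for rec in records])

# Assumed fixed-effect incidence matrix: one column per mega-environment (6 x 2)
X = np.array([[1.0 if rec[0] == env else 0.0 for env in environments]
              for rec in records])

# Assumed random-effect incidence matrix: one column per genotype (6 x 4)
Z = np.array([[1.0 if rec[2] == gen else 0.0 for gen in genotypes]
              for rec in records])

# Number of sites behind each mean, useful for weighting the residuals
n_sites = np.array([rec[1] for rec in records])
```

The column sums of Z show the imbalance directly: Morex and Excel each appear once, Robust and Stander twice.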

MIXED MODEL EQUATIONS (MME)
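The equations on this slide are missing from the transcript. For reference, the general form of Henderson's mixed model equations is sketched below, assuming the model y = Xβ + Zu + e with Var(u) = G and Var(e) = R; this notation is an assumption and the slide may have used a simplified version (for example, with R proportional to an identity matrix).

```latex
\[
\begin{bmatrix}
X' R^{-1} X & X' R^{-1} Z \\
Z' R^{-1} X & Z' R^{-1} Z + G^{-1}
\end{bmatrix}
\begin{bmatrix}
\hat{\beta} \\ \hat{u}
\end{bmatrix}
=
\begin{bmatrix}
X' R^{-1} y \\ Z' R^{-1} y
\end{bmatrix}
\]
```

Solving this system gives the BLUEs of the fixed effects and the BLUPs of the random effects jointly.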

Solution BLUEs and BLUPs
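The numerical solutions shown on this slide are not preserved. The sketch below shows how such a solution could be computed with NumPy for the six records of the example. Several inputs are assumptions made purely for illustration: the genotypes are treated as unrelated (G proportional to an identity matrix) because the relationship coefficients are not in the transcript, the variance components `var_u` and `var_e` are hypothetical values, and residual variances are scaled by the number of sites. The results will therefore not match the slide.

```python
import numpy as np

def solve_mme(X, Z, y, G, R):
    """Solve the mixed model equations; returns (BLUEs, BLUPs)."""
    Rinv = np.linalg.inv(R)
    Ginv = np.linalg.inv(G)
    lhs = np.block([
        [X.T @ Rinv @ X, X.T @ Rinv @ Z],
        [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv],
    ])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]

# Data from the example: columns of X are Mega-Env 1 and 2;
# columns of Z are Morex, Robust, Excel, Stander.
y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
              [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
n_sites = np.array([18, 18, 18, 9, 9, 9], dtype=float)

# ASSUMPTIONS (hypothetical values, not from the source):
var_u = 0.2                      # genetic variance
var_e = 1.0                      # residual variance of a single-site record
G = var_u * np.eye(4)            # unrelated genotypes; a real analysis would use pedigree relationships
R = np.diag(var_e / n_sites)     # a mean over more sites has a smaller residual variance

beta_hat, u_hat = solve_mme(X, Z, y, G, R)
print("BLUEs of mega-environment effects:", beta_hat)
print("BLUPs of genotype effects:", u_hat)
```

Replacing `G` with a matrix built from pedigree relationships is what would let records on relatives inform each genotype's prediction.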

Properties of the fixed-effect estimates
The estimates of the fixed effects from the mixed model equations are identical to the generalized least-squares solution for fixed effects. A re-parameterization is required to make the coefficient matrix of the mixed-model equations non-singular. With the restriction that ti = 0, the estimates are unique (estimable functions).

Properties of the random-effect predictions
The average among unrelated individuals in the base population is 0. The average among related individuals developed through inbred recycling is expected to be non-zero due to selection and genetic drift. For example, for Morex in Mega-Env 1 […] but […]; for Excel in Mega-Env 2 […] but […]. This is the BLUP property of SHRINKAGE.

Properties of the random-effect predictions
Suppose the overall mean is the only fixed effect, all inbreds are unrelated, and the data are balanced. In this case the predicted breeding value of the jth individual is the heritability multiplied by the deviation of its phenotypic value from the overall mean. When heritability = 0 the breeding value is 0, and when heritability = 1 the breeding value equals the phenotypic value (as a deviation from the mean). This is the shrinkage of the BLUP towards the mean.
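The shrinkage behaviour described above can be sketched numerically. The formula used here, heritability times the deviation from the overall mean, is the usual result for this balanced, unrelated case and is an assumption about what the missing slide expression contained; the function name and the input values are hypothetical.

```python
def shrunken_breeding_value(phenotypic_value, overall_mean, heritability):
    """BLUP in the simplest case: the phenotypic deviation shrunk by heritability."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return heritability * (phenotypic_value - overall_mean)

# Hypothetical numbers chosen only to show the effect of heritability
overall_mean = 5.0
phenotypic_value = 5.8
for h2 in (0.0, 0.5, 1.0):
    bv = shrunken_breeding_value(phenotypic_value, overall_mean, h2)
    print(f"h2 = {h2:.1f}: predicted breeding value = {bv:.2f}")
```

The lower the heritability, the more the prediction is pulled towards zero, that is, towards the overall mean.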

GENOMIC SELECTION AND BLUP
Marker-based selection consists of (1) identifying the markers with significant effects for the trait of interest and (2) using these markers in QTL introgression, F2 enrichment, marker-assisted recurrent selection (MARS), etc. The use of significance tests in linkage mapping or association mapping of QTL implies that only a subset of the markers is used in subsequent marker-based selection.

Marker-assisted selection
The use of significance tests to identify which markers to use in F2 enrichment or in MARS is somewhat arbitrary. Markers whose effects exceed the significance threshold are included. Markers whose effects do not exceed the threshold are assigned a value of 0, regardless of how close the estimated effects were to the threshold.

Genomic selection (GS)
Genomic selection uses ALL AVAILABLE markers and is useful for traits that are likely to be controlled by many QTL with small effects rather than by a few major QTL. GS predicts a continuum of effects across all markers: some markers have large effects and others may have effects close to zero, but markers with effects close to zero are still used in selection. GS can be described as marker-based selection without QTL mapping. NEED CHEAP AND ABUNDANT MARKERS!

MARKER EFFECTS IN GS CAN BE CALCULATED BY BLUP
Suppose n = 150 F3 families from the cross of two inbreds are evaluated under similar environmental conditions for testcross performance and genotyped with p = 384 SNPs. The linear mixed-effects model for the performance of the testcrosses on an entry-mean basis is

y = Xg + e
where
• y is the n x 1 vector of responses (i = 1, 2, ..., n);
• X is the n x p marker incidence matrix;
• g is the p x 1 vector of random marker effects, one for each SNP (j = 1, 2, ..., p), with g ~ N(0, I(p x p) V(marker));
• e is the n x 1 vector of random residual effects, with e ~ N(0, I(n x n) V(e)).
The element of X for the jth SNP marker depends on whether the ith F3 family is homozygous for the marker allele from the first parental inbred (xij = 1), heterozygous (xij = 0), or homozygous for the marker allele from the second parental inbred (xij = -1). The effect of each SNP marker is defined as the effect associated with the marker allele from the first parental inbred.

g ~ N(0, I(p x p) V(marker))
To obtain the BLUPs of the SNP effects, we need an estimate of the variance of the random marker effects, V(marker), since BLUP assumes this variance is known. Assume that the genetic variance expressed among the progeny being evaluated, V(g), is divided equally among the p SNP markers, so that each marker has the same variance. Then V(marker) = V(g)/p and g ~ N(0, I(p x p) V(g)/p).

V(marker) = V(g)/p, g ~ N(0, I(p x p) V(g)/p)
Two assumptions:
(1) Each marker accounts for an equal amount of genetic variance, V(marker) = V(g)/p:
• all markers jointly account for 100% of the genetic variance;
• each marker individually accounts for 1/p of the genetic variance.
(2) Epistasis is ignored in the prediction.

MME for solving for the marker effects in g
[X'X + I (V(e) / (V(g)/p))] ĝ = X'y
where V(e) is the residual variance of an entry mean. Genomic selection treats markers as a surrogate for the phenotype, so that the individuals with the best predicted performance can be selected.
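A minimal sketch of solving for the marker effects is given below, assuming the model y = Xg + e with every marker sharing the variance V(g)/p. Under that assumption the mixed model equations reduce to a ridge-regression system with shrinkage parameter V(e) / (V(g)/p). The data are simulated: the genotype frequencies, the variance values `v_g` and `v_e`, and the function name `marker_blup` are all hypothetical and serve only to make the example runnable.

```python
import numpy as np

def marker_blup(X, y, v_g, v_e):
    """BLUP of marker effects when every marker has variance v_g / p."""
    p = X.shape[1]
    shrinkage = v_e / (v_g / p)          # ratio of residual to per-marker variance
    lhs = X.T @ X + shrinkage * np.eye(p)
    rhs = X.T @ y
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(0)
n, p = 150, 384                          # families and SNPs, as in the example

# Simulated marker codes: 1, 0, -1 (frequencies are arbitrary placeholders)
X = rng.choice([1.0, 0.0, -1.0], size=(n, p), p=[0.25, 0.5, 0.25])

# ASSUMED variance components, used both to simulate and to solve
v_g, v_e = 1.0, 1.0
true_g = rng.normal(0.0, np.sqrt(v_g / p), size=p)
y = X @ true_g + rng.normal(0.0, np.sqrt(v_e), size=n)

g_hat = marker_blup(X, y, v_g, v_e)
predicted = X @ g_hat                    # predicted performance of each family
print("largest |marker effect|:", np.abs(g_hat).max())
```

Because p exceeds n here, ordinary least squares has no unique solution; the shrinkage term is what makes the system solvable and keeps every marker in the prediction.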

Accuracy of Genomic Selection Prediction
GS will be superior to MARS if it leads to more accurate predictions of genotypic values. Accuracy is defined as the correlation between the true genotypic value and the genotypic value predicted from marker information. Because the true genotypic value is unobservable, accuracy is estimated as the correlation between the observed and predicted performance divided by the square root of the heritability (to correct for the influence of non-genetic effects on the observed performance).
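The accuracy estimate described above can be sketched as a small function. The function name and the example numbers are hypothetical; in practice the observed and predicted values would typically come from a validation set that was not used to estimate the marker effects.

```python
import numpy as np

def prediction_accuracy(observed, predicted, heritability):
    """Correlation of observed and predicted performance, divided by sqrt(h2)."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    r_observed_predicted = np.corrcoef(observed, predicted)[0, 1]
    return r_observed_predicted / np.sqrt(heritability)

# Hypothetical observed and marker-predicted performance for six entries
observed = np.array([4.4, 4.6, 5.3, 5.0, 5.8, 5.7])
predicted = np.array([4.7, 4.6, 5.1, 5.2, 5.4, 5.6])
heritability = 0.6                       # assumed entry-mean heritability

print("estimated accuracy:", prediction_accuracy(observed, predicted, heritability))
```

Dividing by the square root of the heritability rescales the correlation upwards, since the observed performance is itself only an imperfect measure of the genotypic value.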