Lecture 15: Analysis of Selection Experiments


Variance in the Response to Selection

R = h2S is only the expected value of the response; there is a variance about this value. Hence identically-selected replicate lines are still expected to show variation in response. The major source of such variation is genetic drift.

Consider the mean in generation t,

$\bar{z}_t = \mu + g_t + d_t + e_t$

where $\mu$ is the mean of the original population, $g_t$ is the mean breeding value in generation t, $d_t$ is the effect of any major environmental trend in generation t, and $e_t$ is the error in estimating the environment-corrected mean breeding value from the mean phenotype of a sample.

Under this model, the expected mean over a replicate series of lines is

$E(\bar{z}_t) = \mu + g_t$

with the expected per-generation response R = h2S. The variance is

$\sigma^2(\bar{z}_t) = \sigma^2(g_t) + \sigma^2(e_t) + \sigma^2_d$

where $\sigma^2_d$ is the variance in the environmental trend (whose mean is set to zero) and $\sigma^2(g_t)$ is the variance in the breeding value at generation t. If $M_t$ individuals are measured in generation t, an upper bound on the error variance is

$\sigma^2(e_t) = \sigma^2_z / M_t$

Variance in Breeding Values

Two sources of variation: (i) sampling variance in founding the lines, and (ii) genetic drift (inbreeding) within each line.

$\sigma^2_g(t) = \left( \frac{1}{M_0} + 2 f_t \right) h^2 \sigma^2_z$

where $M_0$ is the size of the founding population and $f_t$ is the amount of inbreeding in generation t,

$f_t = 1 - \left( 1 - \frac{1}{2 N_e} \right)^t \approx \frac{t}{2 N_e}$

The mean breeding values in different generations of the same replicate line are correlated,

$\sigma(g_t, g_{t'}) = \left( \frac{1}{M_0} + 2 f_{t'} \right) h^2 \sigma^2_z \quad \text{for } t' < t$
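A quick numerical check (a sketch, not from the lecture) of how closely the pure-drift approximation $t/(2N_e)$ tracks the exact inbreeding coefficient $f_t = 1 - (1 - 1/(2N_e))^t$; the function name f_exact is illustrative:

```python
# Compare the exact inbreeding coefficient with the pure-drift approximation t/(2*Ne).
def f_exact(t, Ne):
    return 1.0 - (1.0 - 1.0 / (2.0 * Ne)) ** t

Ne = 20
for t in (1, 5, 10, 20):
    print(t, round(f_exact(t, Ne), 4), round(t / (2 * Ne), 4))
```

For t small relative to $N_e$ the two agree closely, which is why the pure-drift expressions on the next slide are adequate for short-term experiments.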

Variance-covariance structure within a line

Assume the initial sample is sufficiently large that we can ignore the $1/M_0$ term.

Variance: $\sigma^2(g_t) = (t/N_e)\, h^2 \sigma^2_z$
Covariance: $\sigma(g_t, g_x) = (x/N_e)\, h^2 \sigma^2_z$ for $x < t$

These expressions (often called the pure-drift approximations) will prove useful in the statistical analysis of selection response.
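As a minimal sketch (variable and function names are illustrative, not from the lecture), the full pure-drift variance-covariance matrix across generations can be built directly from these two expressions:

```python
import numpy as np

def drift_vcov(T, Ne, h2, var_z):
    """V[t, x] = (min(t, x) / Ne) * h2 * var_z for generations t, x = 1..T."""
    gens = np.arange(1, T + 1)
    return (np.minimum.outer(gens, gens) / Ne) * h2 * var_z

# Example: 10 generations, Ne = 20, h2 = 0.3, phenotypic variance 1.0
V = drift_vcov(10, 20, 0.3, 1.0)
print(V[0, 0], V[9, 9], V[2, 7])   # 0.015, 0.15, 0.045
```

This matrix reappears below as the residual covariance structure in the GLS estimate of realized heritability.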

The Realized Heritability

Since R = h2S, this suggests h2 = R/S: the ratio of the observed response to the observed differential provides an estimate of the heritability, the realized heritability.

This definition is obvious for a single generation of response. What about multiple generations of response?

Cumulative selection response = sum of all the responses,

$R_C(t) = \sum_{i=1}^{t} R_i$

Cumulative selection differential = sum of the S's,

$S_C(t) = \sum_{i=1}^{t} S_i$

(1) The Ratio Estimator of realized heritability = total response / total differential,

$\hat{h}^2_r = \frac{R_C(T)}{S_C(T)}$

(2) The Regression Estimator = the slope of the regression of cumulative response on cumulative differential,

$R_C(t) = h^2_r\, S_C(t) + e_t$

Since the regression passes through the origin (R = 0 when S = 0), the slope is

$\hat{h}^2_r = \frac{\sum_t S_C(t)\, R_C(t)}{\sum_t S_C(t)^2}$
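A minimal sketch of computing both estimators from per-generation data (the function realized_h2 and the toy numbers are illustrative, not from the lecture):

```python
import numpy as np

def realized_h2(R, S):
    """R, S: per-generation responses and selection differentials (length T)."""
    Rc, Sc = np.cumsum(R), np.cumsum(S)              # cumulative response and differential
    h2_ratio = Rc[-1] / Sc[-1]                       # total response / total differential
    h2_regression = np.sum(Sc * Rc) / np.sum(Sc**2)  # regression through the origin
    return h2_ratio, h2_regression

# Toy data: 5 generations of selection
R = np.array([0.8, 0.7, 0.9, 0.6, 0.7])
S = np.array([3.0, 2.8, 3.1, 2.9, 3.0])
print(realized_h2(R, S))
```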

[Figure: cumulative response plotted against cumulative differential; note the x axis is the differential, NOT generations. Ratio estimator = 17.4/56.9 = 0.292; regression estimator (slope) = 0.270.]

Standard error of the Ratio Estimator

Ratio estimator: $\hat{h}^2_r = R_T / S_T$.

Recall that the variance of the mean in generation t is $\sigma^2(g_t) + \sigma^2(e_t) + \sigma^2_d$. Assume $M_0 \gg 1$ and that we can ignore the environmental-trend variance; then

$\sigma^2(R_T) = \sigma^2(g_T) + \sigma^2(e_T) = (T/N_e)\, h^2 \sigma^2_z + \sigma^2_z / M_T$

where $M_T$ is the number of individuals sampled in generation T. Since $\sigma^2(ax) = a^2 \sigma^2(x)$,

$\sigma^2(R_T / S_T) = \sigma^2(R_T) / S_T^2$

Hence

$\sigma^2(\hat{h}^2_r) = \left[ (T/N_e)\, h^2 \sigma^2_z + \sigma^2_z / M_T \right] / S_T^2$
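A sketch of this sampling variance as a function (parameter names are illustrative):

```python
def var_ratio_estimator(T, Ne, h2, var_z, M_T, S_T):
    """sigma^2(h2_r) = [ (T/Ne) * h2 * var_z + var_z / M_T ] / S_T^2"""
    return ((T / Ne) * h2 * var_z + var_z / M_T) / S_T**2

# Example: 10 generations, Ne = 20, h2 = 0.25, var_z = 1, 50 measured in generation T,
# cumulative differential S_T = 15
print(var_ratio_estimator(10, 20, 0.25, 1.0, 50, 15.0) ** 0.5)  # SE of the estimate
```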

SE for the (OLS) Regression Estimator

The basic linear model is

$R_C(t) = h^2_r\, S_C(t) + e_t$

Under the OLS framework (residuals homoscedastic and uncorrelated), the design matrix X is just the vector $\mathbf{S}_C$ of cumulative differentials and $\mathbf{y} = \mathbf{R}_C$, giving

$\hat{h}^2_r(\mathrm{OLS}) = (X^T X)^{-1} X^T \mathbf{y} = \frac{\sum_{i=1}^{T} S_C(i)\, R_C(i)}{\sum_{i=1}^{T} S_C(i)^2}$

$\mathrm{Var}\!\left[ \hat{h}^2_r(\mathrm{OLS}) \right] = \hat{\sigma}^2_e\, (X^T X)^{-1}, \quad \hat{\sigma}^2_e = \frac{1}{T-1} \sum_{i} \left( R_C(i) - \hat{h}^2_r\, S_C(i) \right)^2$

Problems with the OLS regression approach

Although the OLS regression estimator of realized heritability is very widely used, it has fatal problems. OLS assumes the residuals are homoscedastic and uncorrelated. In reality, the covariance structure is

$\sigma^2(e_i) = (i/N_e)\, h^2 \sigma^2_z + \sigma^2_z / M_i$
$\sigma(e_k, e_i) = (i/N_e)\, h^2 \sigma^2_z$ for $i < k$

Hence GLS regression is more appropriate. OLS gives an unbiased estimate of the realized heritability, but it seriously underestimates its standard error.

GLS Regression Estimate

$R_C(t) = h^2_r\, S_C(t) + e_t$, with $X = \mathbf{S}_C$ and $\mathbf{y} = \mathbf{R}_C$ as before.

The variance-covariance matrix V of the residuals has elements

$V_{ii} = (i/N_e)\, h^2 \sigma^2_z + \sigma^2_z / M_i$
$V_{ij} = V_{ji} = (i/N_e)\, h^2 \sigma^2_z$ for $i < j$

giving

$\hat{h}^2_r(\mathrm{GLS}) = (X^T V^{-1} X)^{-1} X^T V^{-1} \mathbf{y}$
$\mathrm{Var}\!\left[ \hat{h}^2_r(\mathrm{GLS}) \right] = (X^T V^{-1} X)^{-1}$

Since h2 appears in V but is also what we are trying to estimate, use an iterative approach: try some initial value, use GLS to update it, use the new value in the next round of updating, and continue until the values stabilize. The phenotypic variance $\sigma^2_z$ can be estimated directly from the data.
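A minimal sketch of this iterative GLS scheme, assuming $N_e$, the per-generation sample sizes $M_i$, and $\sigma^2_z$ are known (the function gls_realized_h2 and the toy data are illustrative):

```python
import numpy as np

def gls_realized_h2(Rc, Sc, Ne, M, var_z, n_iter=20):
    """Rc, Sc: cumulative response and cumulative differential for generations 1..T."""
    T = len(Rc)
    gens = np.arange(1, T + 1)
    h2 = np.sum(Sc * Rc) / np.sum(Sc**2)           # start from the OLS estimate
    for _ in range(n_iter):
        # Residual covariance: V_ij = (min(i,j)/Ne) * h2 * var_z, plus var_z/M_i on the diagonal
        V = (np.minimum.outer(gens, gens) / Ne) * h2 * var_z
        V[np.diag_indices(T)] += var_z / np.asarray(M, dtype=float)
        Vinv = np.linalg.inv(V)
        h2 = (Sc @ Vinv @ Rc) / (Sc @ Vinv @ Sc)   # GLS slope (regression through the origin)
    var_h2 = 1.0 / (Sc @ Vinv @ Sc)                # variance of the GLS estimate
    return h2, var_h2

# Toy usage: 8 generations, constant differential and response
Sc = np.cumsum(np.full(8, 2.0))
Rc = np.cumsum(np.full(8, 0.5))
print(gls_realized_h2(Rc, Sc, Ne=30, M=[40] * 8, var_z=1.0))
```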

Just how well does the Breeders' Equation work?

Sheridan (1988) compared realized heritability estimates with estimates of heritability obtained from resemblances between relatives in the base populations.

Punch-line: good, but not great, fit in many settings. A problem with a wider meta-analysis is that standard errors are often not reported, nor are the data presented in a form that allows their calculation.

Comparison of realized and relative-based heritability estimates

Species         Significant difference   NS difference   Total
Drosophila      14 (23%)                 47 (77%)        61
Tribolium        7 (27%)                 19 (73%)        26
Mice/Rats        6 (18%)                 28 (82%)        34
Poultry/Quail    5 (45%)                  6 (55%)        11
Swine/Sheep      8 (53%)                  7 (47%)        15

Asymmetric Selection Response

Divergent selection experiment: select some replicate lines for increased trait value, others for decreased value.

Expectation: roughly equal response in the up and down directions, R = h2S.

[Figure: Rc plotted against Sc for up- and down-selected lines.]

Often an asymmetric response is observed, with a significant difference between the slopes of the up- and down-selected lines.

Potential Causes: I. Design Defects

Different selection differentials (the plot is Rc vs. t, not Rc vs. Sc)
Drift (sample size not sufficiently large)
Scale effects
Undetected environmental trends
Transient effects from previous selection
Decay of epistatic response
Undetected selection on correlated traits

Scale effects: when the trait biologically cannot go below a specific value (e.g., zero), we expect less response as we down-select towards that bound. One remedy is to transform to a log scale.

Potential Causes: II. Nonlinear Parent-Offspring Regression

[Figure: a nonlinear parent-offspring regression, in which equal differentials in the two directions (+S and -S) generate unequal responses (+h2S vs. -h2S).]

What can cause a nonlinear parent-offspring regression?

A major gene with dominance
G x E
Departures from normality

Potential Causes: III. Inbreeding Depression

The observed response is the true genetic response in the absence of inbreeding plus a change in the mean due to inbreeding depression. Inbreeding depression depresses the upward response and enhances the downward response.

Potential Causes: IV. Genetic Asymmetry

• Requires changes in allele frequencies.
• The same absolute change in an allele frequency can result in rather different changes in the variance in the + vs. - direction.
• This produces differences in the additive genetic variance between up- and down-selected lines, and hence changes in h2 and in the response.

Additive variance, VA, with no dominance (k = 0)

[Figure: VA plotted against allele frequency p. If p = 1/2, VA is the same for p + δ and p − δ; if p ≠ 1/2, VA differs for p + δ and p − δ.]

Additive variance, VA, with complete dominance (k = 1)

Additive variance, VA, with overdominance (k = 10)
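These three panels can be reproduced with the standard single-locus expression $\sigma^2_A = 2pq\,a^2[1 + k(q-p)]^2$ (genotypic values 0, a(1+k), 2a). The sketch below is illustrative rather than code from the lecture, and shows how the same shift in p moves VA differently in the up vs. down direction once dominance is present:

```python
# Additive variance at a single locus as a function of allele frequency p,
# for dominance coefficients k = 0 (additive), 1 (complete), 10 (overdominance).
def VA(p, a=1.0, k=0.0):
    q = 1.0 - p
    return 2.0 * p * q * (a * (1.0 + k * (q - p))) ** 2

for k in (0.0, 1.0, 10.0):
    print(k, [round(VA(p, k=k), 3) for p in (0.3, 0.5, 0.7)])
```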

Control Populations

Until now we have ignored the bias caused by not accounting for any environmental trend. One way to deal with this is to include an unselected control population in the design.

Mean of the selected population in generation t:

$\bar{z}_{s,t} = \mu + g_{s,t} + d_t + e_{s,t}$

where $g_{s,t}$ is the genetic mean of the selected population (a random effect with expected value $h^2 S_C(t)$ and drift variance), and $d_t$ is the shared environmental trend (a random effect with mean 0 and variance $\sigma^2_d$).

Mean of the control population in generation t:

$\bar{z}_{c,t} = \mu + g_{c,t} + d_t + e_{c,t}$

where $g_{c,t}$ is the genetic mean of the control population (a random effect with expected value 0 and drift variance). Hence

$E(\bar{z}_{s,t} - \bar{z}_{c,t}) = E(g_{s,t}) = h^2 S_C(t)$

Estimating trends with a control population

The realized heritability is estimated by

$\hat{h}^2_r = \frac{\bar{z}_{s,t} - \bar{z}_{c,t}}{S_C(t)}$

The use of a control also accounts for inbreeding depression.

Complication 1: if G x E is present, the environmental-trend terms of the two populations no longer cancel, so $E(\bar{z}_{s,t} - \bar{z}_{c,t}) = h^2 S_C(t) + d$, where d is the G x E deviation.

Complication 2: selection inbreeds a population more quickly, so the control must be comparably inbred to fully account for inbreeding depression.

Divergent Selection Designs

An alternative experimental design for removing a common environmental trend is the divergent selection design.

Mean of the up-selected line: $\bar{z}_{u,t} = \mu + g_{u,t} + d_t + e_{u,t}$
Mean of the down-selected line: $\bar{z}_{d,t} = \mu + g_{d,t} + d_t + e_{d,t}$

The response is estimated by the divergence,

$R_t = \bar{z}_{u,t} - \bar{z}_{d,t}$

with the realized heritability obtained by dividing by the total (up plus down) cumulative differential $S_C(t)$. Note that this design also accounts for inbreeding depression (assuming the up and down lines are equally inbred).

Variance in Response

We have been assuming that we can ignore $\sigma^2_d$. With a control line and/or a divergent design, we don't have to worry about this.

Single direction, no control: $R_C(t) = \bar{z}_{s,t} - \mu = g_{s,t} + d_t + e_{s,t}$

Control: $R_C(t) = \bar{z}_{s,t} - \bar{z}_{c,t} = (g_{s,t} - g_{c,t}) + (e_{s,t} - e_{c,t})$; the common $d_t$ term cancels.

Divergent design: $R_C(t) = \bar{z}_{u,t} - \bar{z}_{d,t} = (g_{u,t} - g_{d,t}) + (e_{u,t} - e_{d,t})$; again the common $d_t$ term cancels.

The resulting variances and covariances in response become

$\sigma^2[R_C(t)] = 2 f_t\, h^2 \sigma^2_z + B\, \sigma^2_z$
$\sigma[R_C(t), R_C(t')] = 2 f_{t'}\, h^2 \sigma^2_z$ for $t' < t$

where, under the pure-drift approximation, $2 f_t \approx t A$. The values of $f_t$, A, and B for each design are:

Design                                          f_t                 A               B
Selection in a single direction, no control     f_{s,t}             1/N_s           1/M_{s,t}
Selection in a single direction, with control   f_{s,t} + f_{c,t}   1/N_s + 1/N_c   1/M_{s,t} + 1/M_{c,t}
Divergent selection, no control                 f_{u,t} + f_{d,t}   1/N_u + 1/N_d   1/M_{u,t} + 1/M_{d,t}

Variance with a Control

Control populations are not without a cost. When does the use of a control population result in a reduced variance?

Variance with control minus variance without control:

$\Delta\sigma^2[R_C(t)] = \left( \frac{t}{N_c}\, h^2 + \frac{1}{M_{c,t}} \right) \sigma^2_z - \sigma^2_d$

Hence (ignoring the M term), a control reduces the variance only when $\sigma^2_d > (t/N_c)\, h^2 \sigma^2_z$. Regardless of the value of $\sigma^2_d$, if sufficient generations are used, the optimal design (in terms of giving the smallest expected variance in response) is not to use a control. However, this approach runs the risk of an undetected directional environmental trend compromising the estimated heritability.

Optimal Experimental Design

The coefficient of variation (CV) provides one measure for comparing different designs,

$CV[R(t)] = \frac{\sigma[R(t)]}{E[R(t)]}$

Design                                    E[R(t)]           CV[R(t)]
Selection in one direction, with control  t h^2 i σ_z       (2/N_T)^{1/2} / (h i)
Selection in one direction, no control    t h^2 i σ_z       (1/N_T)^{1/2} / (h i)
Divergent selection, no control           2 t h^2 i σ_z     (1/(2 N_T))^{1/2} / (h i)

The CV scales with N_T = total number of individuals over the entire experiment.

Example

Suppose we plan to select the upper 5% of the population on a trait with h2 = 0.25. How large must N_T be to give a CV of 0.01 when no control is used?

p = 0.05 gives i = 2.06. Assuming the drift variance dominates $\sigma^2_d$,

CV = 0.01 = (1/N_T)^{1/2} / (h i) = (1/N_T)^{1/2} / (0.5 × 2.06)

or N_T = 1 / (0.01 × 0.5 × 2.06)^2 = 9426.

Hence we must have at least 9,426 selected parents over the course of the experiment.
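A small sketch of this design calculation, using the CV expressions from the table above (the function required_NT and the design labels are illustrative):

```python
import math

def required_NT(cv_target, h2, i, design="no_control"):
    """Total number measured, N_T, needed to reach a target CV of response."""
    h = math.sqrt(h2)
    factor = {"with_control": 2.0, "no_control": 1.0, "divergent": 0.5}[design]
    # CV = sqrt(factor / N_T) / (h * i)  =>  N_T = factor / (cv_target * h * i)**2
    return factor / (cv_target * h * i) ** 2

print(round(required_NT(0.01, 0.25, 2.06)))  # ~9426, matching the worked example
```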

Nicholas' Criterion

An alternative criterion for choosing N_T, suggested by Nicholas: suppose we want a certain probability that the ACTUAL response will be at least a fraction β of the expected response,

$\Pr\left( R_C(t) > \beta\, E[R_C(t)] \right) = \Pr\left( U > \frac{(\beta - 1)\, E[R_C(t)]}{\sigma[R_C(t)]} \right) = \Pr\left( U > \frac{\beta - 1}{CV} \right)$

where U is a unit normal (subtract E[R_C(t)] from each side and divide by σ[R_C(t)]; note that E[R_C(t)] / σ[R_C(t)] = 1/CV). Solve for N_T to give the desired probability.

Note that for β = 1, Pr(U > 0) = 1/2, so 50% of the time the actual response exceeds the expected response.

Example

Again suppose i = 2.06 and h2 = 0.25. What value of N_T is required for a 95% probability that the observed response is at least 90% of its expected value?

Here β = 0.9, and since Pr(U > -1.65) = 0.95, we need (β - 1)/CV = -0.1/CV = -1.65, or CV = 0.1/1.65 ≈ 0.061.

With no control, CV = (1/N_T)^{1/2} / (h i) = 1 / (0.5 × 2.06 × N_T^{1/2}). Solving CV = 0.1/1.65 gives N_T ≈ 257.
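The same calculation as a sketch (the function name nicholas_NT is illustrative; single direction, no control):

```python
import math
from statistics import NormalDist

def nicholas_NT(beta, prob, h2, i):
    """N_T giving probability `prob` that the realized response exceeds beta * expected."""
    z = NormalDist().inv_cdf(1.0 - prob)   # Pr(U > z) = prob, and z = (beta - 1)/CV
    cv_target = (beta - 1.0) / z           # required CV of the response
    h = math.sqrt(h2)
    return 1.0 / (cv_target * h * i) ** 2  # from CV = 1 / (h * i * sqrt(N_T))

print(round(nicholas_NT(0.9, 0.95, 0.25, 2.06)))  # ~255 (the slide's rounding of z to 1.65 gives 257)
```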

Mixed-Model Estimation

PROVIDED that we have the full pedigree of the individuals in the selection experiment, we can use mixed-model methodology (e.g., BLUP and REML).

Power: the mixed model accounts for ALL the covariances in the sample, not just those between means in different generations, but also ALL of the covariances between related individuals.

Basic model: the so-called animal model

Trait value of the jth individual from generation i:

$y_{ij} = \mu + a_{ij} + e_{ij}$

where $a_{ij}$ is the additive genetic (breeding) value. Vectorizing the data as $\mathbf{y} = (\mathbf{y}_1^T, \mathbf{y}_2^T, \ldots, \mathbf{y}_t^T)^T$, the (simple) model becomes

$\mathbf{y} = \mathbf{1}\mu + \mathbf{a} + \mathbf{e}$

Here the relationship matrix A has elements $A_{jj} = 1 + f_j$ and $A_{ij} = 2\Theta_{ij}$. With additional fixed effects,

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a} + \mathbf{e}$

The estimated genetic mean in generation k is the average of the estimated breeding values in generation k,

$\hat{\bar{a}}_k = \frac{1}{n_k} \sum_{j} \hat{a}_{kj}$

Interesting complication: the BLUP estimate of $\mathbf{a}$ requires a prior estimate of the heritability h2 (e.g., from REML). The relationship matrix A fully accounts for the effects of drift and the generation of linkage disequilibrium (assuming the infinitesimal model holds).
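As a minimal sketch of how these BLUP breeding values are obtained (toy pedigree and assumed h2; this is standard Henderson mixed-model-equation machinery, not code from the lecture):

```python
import numpy as np

def blup_animal_model(y, A, h2):
    """BLUP of breeding values for the simple animal model y = 1*mu + a + e."""
    n = len(y)
    lam = (1.0 - h2) / h2                  # lambda = sigma2_e / sigma2_a
    X = np.ones((n, 1))                    # intercept (mu) only
    Z = np.eye(n)                          # each record maps to its own breeding value
    Ainv = np.linalg.inv(A)
    # Henderson's mixed-model equations:
    # [ X'X   X'Z              ] [mu]   [X'y]
    # [ Z'X   Z'Z + lam * Ainv ] [a ] = [Z'y]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * Ainv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]                 # (mu_hat, vector of breeding values a_hat)

# Toy pedigree: two unrelated parents (1, 2) and their two full-sib offspring (3, 4)
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
y = np.array([10.0, 12.0, 13.0, 11.0])
mu_hat, a_hat = blup_animal_model(y, A, h2=0.3)
print(mu_hat, a_hat)   # averaging a_hat within generations estimates the genetic response
```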