Lecture 15: Analysis of Selection Experiments

Variance in the Response to Selection
$R = h^2 S$ is only the expected value of the response; there is a variance about this value. Hence, identically selected replicate lines are still expected to show variation in response. The major source of such variation is genetic drift.

Consider the mean in generation t:
$\bar{z}_t = \mu + g_t + d_t + e_t$
where $\mu$ is the mean of the original population, $g_t$ is the mean breeding value in generation t, $d_t$ is the effect of any major environmental trend in generation t, and $e_t$ is the error in estimating the environment-corrected mean breeding value from the mean phenotype of a sample.
Under this model, the mean of a replicate series of lines is
$E(\bar{z}_t) = \mu + g_t + d_t$
The variance is given by
$\sigma^2_{\bar{z}}(t) = \sigma^2_g(t) + \sigma^2_d + \sigma^2_e(t)$
where $\sigma^2_g(t)$ is the variance in the breeding value at generation t and $\sigma^2_d$ is the variance in the environmental trend (its mean is set to zero).
In generation t, $M_t$ individuals are measured. An upper bound on the error variance is
$\sigma^2_e(t) = \sigma^2_z / M_t$

Variance in Breeding Values
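Two sources of variation:
(i) Sampling variance in the founding lines
(ii) Genetic drift (inbreeding) within each line
$\sigma^2_g(t) = \left( \frac{1}{M_0} + 2 f_t \right) \sigma^2_A = \left( \frac{1}{M_0} + 2 f_t \right) h^2 \sigma^2_z$
where $M_0$ is the size of the founding population and $f_t$ is the inbreeding in generation t,
$f_t = 1 - \left( 1 - \frac{1}{2N_e} \right)^t \approx \frac{t}{2N_e}$ for $t \ll 2N_e$
The mean breeding values in different generations of the same replicate line are correlated,
$\sigma(g_t, g_{t'}) = \left( \frac{1}{M_0} + 2 f_{t'} \right) h^2 \sigma^2_z$ for $t' < t$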

Variance-covariance structure within a line
Assume the initial sample is sufficiently large that we can ignore $1/M_0$.
Variance: $\sigma^2(g_t) = (t/N_e) h^2 \sigma^2_z$
Covariance: $\sigma(g_t, g_x) = (x/N_e) h^2 \sigma^2_z$ for $x < t$
These expressions (often called the pure-drift approximations) will prove useful in the statistical analysis of selection response.
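The pure-drift variance-covariance structure can be written out as a matrix. The following is a minimal sketch, assuming $1/M_0$ is ignored as above; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def drift_covariance(n_generations, n_e, h2, var_z):
    """Pure-drift covariance matrix of the mean breeding values g_1..g_T.

    Entry (t, x) is (min(t, x) / N_e) * h2 * var_z, i.e. the variance
    (t / N_e) h2 var_z on the diagonal and (x / N_e) h2 var_z for x < t.
    The founder term 1/M_0 is ignored (an assumption of this sketch).
    """
    scale = h2 * var_z / n_e
    generations = range(1, n_generations + 1)
    return [[min(t, x) * scale for x in generations] for t in generations]


# Hypothetical values: N_e = 20, h2 = 0.25, phenotypic variance = 100
cov = drift_covariance(n_generations=3, n_e=20, h2=0.25, var_z=100.0)
for row in cov:
    print(row)
```

The variance grows linearly with t, and the covariance between two generations is set by the earlier of the two, which is why later generations are increasingly correlated.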

The Realized Heritability
Since $R = h^2 S$, this suggests $h^2 = R/S$, so that the ratio of the observed response to the observed selection differential provides an estimate of the heritability, the realized heritability. This is the obvious definition for a single generation of response. What about multiple generations of response?
Cumulative selection response = sum of all responses: $R_C(t) = \sum_{i=1}^{t} R(i)$
Cumulative selection differential = sum of the S's: $S_C(t) = \sum_{i=1}^{t} S(i)$
(1) The Ratio Estimator for realized heritability = total response/total differential: $\hat{h}^2_r = R_C(T) / S_C(T)$
(2) The Regression Estimator: the slope of the regression of cumulative response on cumulative differential, $R_C(t) = h^2_r S_C(t) + e_t$. The regression passes through the origin (R = 0 when S = 0), so the slope is $\hat{b}_C = \sum_t S_C(t) R_C(t) / \sum_t S_C(t)^2$

Note x axis is differential, NOT generations
[Figure: cumulative response plotted against cumulative differential.] Ratio estimator = 17.4/56.9 = 0.292. Regression estimator = slope = 0.270.
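The ratio and regression estimators can be compared on a small data set. This is a minimal sketch: the per-generation differentials and responses below are hypothetical, not the data behind the figure, and the function names are illustrative.

```python
from itertools import accumulate


def ratio_estimator(cum_response, cum_differential):
    """Total response divided by total differential."""
    return cum_response[-1] / cum_differential[-1]


def regression_estimator(cum_response, cum_differential):
    """Slope of the regression of R_C on S_C, forced through the origin."""
    numerator = sum(s * r for s, r in zip(cum_differential, cum_response))
    denominator = sum(s * s for s in cum_differential)
    return numerator / denominator


# Hypothetical per-generation selection differentials and responses
S = [5.0, 5.0, 5.0, 5.0]
R = [1.5, 1.0, 1.5, 1.0]
S_C = list(accumulate(S))  # cumulative differentials
R_C = list(accumulate(R))  # cumulative responses

h2_ratio = ratio_estimator(R_C, S_C)
h2_regression = regression_estimator(R_C, S_C)
print(h2_ratio, h2_regression)
```

The ratio estimator uses only the final generation, while the regression estimator uses every generation, so the two generally differ.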

Standard error of the Ratio Estimator
Ratio estimator: $\hat{h}^2_r = R_T / S_T$
Recall that the variance of the mean in generation t is $\sigma^2(g_t) + \sigma^2(e) + \sigma^2(d)$. Assume $M_0 \gg 1$ and that we can ignore the environmental trend variance. Then
$\sigma^2(R_T) = \sigma^2(g_T) + \sigma^2(e) = (T/N_e) h^2 \sigma^2_z + \sigma^2_z / M_T$
where $M_T$ is the number of individuals sampled in generation T. Since $\sigma^2(ax) = a^2 \sigma^2(x)$,
$\sigma^2(R_T / S_T) = \sigma^2(R_T) / S_T^2$
Hence,
$\sigma^2(\hat{h}^2_r) = [ (T/N_e) h^2 \sigma^2_z + \sigma^2_z / M_T ] / S_T^2$
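The standard error of the ratio estimator follows directly from this variance. The sketch below assumes a plug-in value of h2 (in practice the estimate itself) and uses hypothetical numbers; the function name is illustrative.

```python
import math


def ratio_estimator_se(T, n_e, h2, var_z, m_T, s_T):
    """Standard error of the ratio estimator R_T / S_T.

    Uses var(R_T) = (T / N_e) h2 var_z + var_z / M_T, which ignores the
    founder term 1/M_0 and the environmental trend variance.
    """
    var_response = (T / n_e) * h2 * var_z + var_z / m_T
    return math.sqrt(var_response) / abs(s_T)


# Hypothetical values: 10 generations, N_e = 20, h2 = 0.25, var_z = 100,
# 100 individuals measured in the last generation, total differential 50
se = ratio_estimator_se(T=10, n_e=20, h2=0.25, var_z=100.0, m_T=100, s_T=50.0)
print(se)
```

With these numbers the drift term (12.5) is much larger than the sampling term (1.0), which illustrates why drift is usually the dominant source of error.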

SE for (OLS) Regression Estimator
The basic linear model is
$R_C(t) = h^2_r S_C(t) + e_t$
Under the OLS framework (residuals homoscedastic and uncorrelated), the design matrix X is just the vector $S_C$ of cumulative differentials, $y = R_C$ is the vector of cumulative responses, and $\beta = b_C$:
$\hat{b}_C(OLS) = (X^T X)^{-1} X^T y = \sum_{i=1}^{T} S_C(i) R_C(i) / \sum_{i=1}^{T} S_C(i)^2$
$Var[\hat{b}_C(OLS)] = \sigma^2_e (X^T X)^{-1} = \sigma^2_e / \sum_{i=1}^{T} S_C(i)^2$
$\hat{\sigma}^2_e = \frac{1}{T-1} \sum_{i=1}^{T} ( R_C(i) - \hat{h}^2_r S_C(i) )^2$
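The OLS slope and its nominal standard error can be computed in a few lines. This is a sketch under the OLS assumptions stated above, with T - 1 residual degrees of freedom for the single fitted slope; the data are hypothetical.

```python
import math


def ols_through_origin(s_c, r_c):
    """OLS slope of R_C on S_C through the origin, with its nominal SE."""
    sum_sq = sum(s * s for s in s_c)
    slope = sum(s * r for s, r in zip(s_c, r_c)) / sum_sq
    residual_ss = sum((r - slope * s) ** 2 for s, r in zip(s_c, r_c))
    var_e = residual_ss / (len(s_c) - 1)  # one parameter estimated
    return slope, math.sqrt(var_e / sum_sq)


# Hypothetical cumulative differentials and responses
S_C = [5.0, 10.0, 15.0, 20.0]
R_C = [1.5, 2.5, 4.0, 5.0]
slope, se = ols_through_origin(S_C, R_C)
print(slope, se)
```

The standard error returned here is only as trustworthy as the homoscedastic, uncorrelated-residual assumption behind it.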

Problems with OLS regression approach
Although the OLS regression estimator for realized heritability is very widely used, it has fatal problems. OLS assumes the residuals are homoscedastic and uncorrelated. In reality, the covariance structure is
$\sigma^2(e_i) = (i/N_e) h^2 \sigma^2_z + \sigma^2_z / M_i$
$\sigma(e_k, e_i) = (i/N_e) h^2 \sigma^2_z$ for $i < k$
Hence, GLS regression is more appropriate. OLS gives unbiased estimates of the realized heritability, but it seriously underestimates the SE.

GLS regression Estimate
$R_C(t) = h^2_r S_C(t) + e_t$, with $X = S_C$, $\beta = b_C$, $y = R_C$
The variance-covariance matrix V has elements
$V_{ii} = (i/N_e) h^2 \sigma^2_z + \sigma^2_z / M_i$
$V_{ij} = V_{ji} = (i/N_e) h^2 \sigma^2_z$ for $i < j$
We can directly estimate the phenotypic variance from the data, but $h^2$ is what we are trying to estimate. Use an iterative approach: try some initial value, use GLS to update this value, use the new value for the next round of updating, and continue until the values stabilize.
$\hat{b}_C(GLS) = (S_C^T V^{-1} S_C)^{-1} S_C^T V^{-1} R_C$
$Var[\hat{b}_C(GLS)] = (S_C^T V^{-1} S_C)^{-1}$
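The iterative GLS procedure can be sketched as follows. This is an illustrative implementation, not a reference one: the small Gaussian-elimination solver, the starting value, the convergence tolerance, and all function names are assumptions of the sketch, and no safeguard is included for estimates that drift outside (0, 1).

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def residual_covariance(h2, n_e, var_z, m):
    """V with V_ii = (i/N_e) h2 var_z + var_z/M_i, V_ij = (min(i,j)/N_e) h2 var_z."""
    T = len(m)
    drift = h2 * var_z / n_e
    V = [[min(i, j) * drift for j in range(1, T + 1)] for i in range(1, T + 1)]
    for i in range(T):
        V[i][i] += var_z / m[i]
    return V


def gls_realized_h2(s_c, r_c, n_e, var_z, m, h2_start=0.5, tol=1e-10, max_iter=100):
    """Iterate GLS: rebuild V from the current h2, re-estimate, repeat."""
    h2 = h2_start
    for _ in range(max_iter):
        v_inv_s = solve(residual_covariance(h2, n_e, var_z, m), s_c)  # V^-1 S
        new_h2 = (sum(a * b for a, b in zip(v_inv_s, r_c))
                  / sum(a * b for a, b in zip(v_inv_s, s_c)))
        converged = abs(new_h2 - h2) < tol
        h2 = new_h2
        if converged:
            break
    v_inv_s = solve(residual_covariance(h2, n_e, var_z, m), s_c)
    return h2, 1.0 / sum(a * b for a, b in zip(v_inv_s, s_c))


# Hypothetical data in which the response is exactly 0.3 times the differential
S_C = [5.0, 10.0, 15.0, 20.0]
R_C = [0.3 * s for s in S_C]
h2_hat, var_h2 = gls_realized_h2(S_C, R_C, n_e=20, var_z=100.0, m=[50, 50, 50, 50])
print(h2_hat, var_h2 ** 0.5)
```

Because V is symmetric, the vector V^-1 S serves for both the numerator and the denominator, so each iteration needs only one linear solve.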

Just how well does the Breeders’ Equation work?
Sheridan (1988) compared realized heritability estimates with estimates of heritability obtained from resemblances between relatives in the base populations. Punch line: good, but not great, fit in many settings. A problem for a wider meta-analysis is that standard errors are often not presented, nor are the data presented in a form that allows their calculation.

Comparison of realized and (relative-based) Heritability estimates
Species          Significant difference   NS difference   Total
Drosophila       14 (23%)                 47 (77%)        61
Tribolium         7 (27%)                 19 (73%)        26
Mice/Rats         6 (18%)                 28 (82%)        34
Poultry/Quail     5 (45%)                  6 (55%)        11
Swine/Sheep       8 (53%)                  7 (47%)        15

Asymmetric Selection Response
Divergent selection experiment: select some replicate lines for increased trait value and others for decreased value. Expectation: roughly equal response in the up and down directions, $R = h^2 S$. Often an asymmetric response is observed, with a significant difference in the slopes of the up- vs. down-selected lines. [Figure: Rc plotted against Sc.]

Potential Causes: I. Design Defects
• Different selection differentials (plot is Rc vs. t, not Rc vs. Sc)
• Drift (sample size not sufficiently large)
• Scale effects
• Undetected environmental trends
• Transient effects from previous selection
• Decay of epistatic response
• Undetected selection on correlated traits

Scale effects
When the trait biologically cannot go below a specific value (e.g., 0), we expect less response as we down-select towards that value. Remedy: transform to a log scale.

Potential Causes: II. Nonlinear Parent-Offspring regression
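[Figure: parent-offspring regression, with selection differentials +S and -S and the corresponding responses +h2S and -h2S marked.]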

What can cause a non-linear parent-offspring regression?
• Major gene with dominance
• G x E
• Departures from normality

Potential Causes: III. Inbreeding depression
The observed change in the mean is the true genetic response in the absence of inbreeding plus the change in mean due to inbreeding depression. Inbreeding depression depresses the upward response and enhances the downward response.

Potential Causes: IV. Genetic Asymmetry
• Requires changes in allele frequencies.
• The same absolute change in an allele frequency can result in rather different changes in the variance in the + vs. - direction of change.
• This results in differences in the additive genetic variance between up- and down-selected lines, and hence differences in h2 and response.

Additive variance, VA, with no dominance (k = 0)
If p = 1/2, then VA is the same for p + d and p - d, where d is the change in allele frequency. If p ≠ 1/2, VA differs for p + d and p - d. [Figure: VA plotted against allele frequency p.]

Additive variance, VA, with complete dominance (k = 1)

Additive variance, VA, with overdominance (k = 10)
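The asymmetry can be checked numerically. The slides show only plots, so the sketch below assumes the standard single-locus expression $V_A = 2p(1-p)a^2[1 + k(1-2p)]^2$ for genotypic values 0, (1+k)a, 2a, with p the frequency of the allele that increases the trait; the function names and the choice a = 1 are illustrative.

```python
def additive_variance(p, a=1.0, k=0.0):
    """Single-locus additive variance, V_A = 2p(1-p) a^2 [1 + k(1-2p)]^2.

    p is the frequency of the trait-increasing allele, a sets the scale of
    the allelic effect, and k is the dominance coefficient (k = 0 additive,
    k = 1 complete dominance, k > 1 overdominance).
    """
    return 2.0 * p * (1.0 - p) * a ** 2 * (1.0 + k * (1.0 - 2.0 * p)) ** 2


def va_after_shift(p, delta, k, a=1.0):
    """V_A after the allele frequency moves up or down by delta."""
    return additive_variance(p + delta, a, k), additive_variance(p - delta, a, k)


for k in (0.0, 1.0, 10.0):
    for p in (0.5, 0.3):
        up, down = va_after_shift(p, delta=0.2, k=k)
        print(f"k={k:4.1f}  p={p}  VA(p+d)={up:.4f}  VA(p-d)={down:.4f}")
```

With k = 0 the two values agree only at p = 1/2; with dominance they differ even when selection starts from p = 1/2.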

Control Populations
Until now, we have been ignoring the bias caused by not accounting for any environmental trend. One way to deal with this is to include an unselected control population in the design.
Mean of the selected population in generation t: $\bar{z}_{s,t} = \mu + g_{s,t} + d_t + e_{s,t}$
Mean of the control population in generation t: $\bar{z}_{c,t} = \mu + g_{c,t} + d_t + e_{c,t}$
Here $g_{s,t}$ is the genetic mean of the selected population (random effect, expected value = $h^2 S_C(t)$, variance = drift variance), $g_{c,t}$ is the genetic mean of the control population (random effect, expected value = 0, variance = drift variance), and $d_t$ is the shared environmental trend (random effect, mean 0, variance = $\sigma^2_d$).
Hence, $E(\bar{z}_{s,t} - \bar{z}_{c,t}) = E(g_{s,t} - g_{c,t}) = h^2 S_C(t)$

Estimating trends with a control population
The response in generation t is estimated by
$R_t = (\bar{z}_{s,t} - \bar{z}_{c,t}) - (\bar{z}_{s,t-1} - \bar{z}_{c,t-1})$
The use of a control also accounts for inbreeding depression.
Complication 1: If G x E is present, then $E(\bar{z}_{s,t} - \bar{z}_{c,t}) = h^2 S_C(t) + (d_{s,t} - d_{c,t})$
Complication 2: Selection inbreeds a population more quickly, so the control must be comparably inbred to fully account for inbreeding depression.

Divergent Selection Designs
An alternative experimental design to remove a common environmental trend is the divergent selection design.
Mean of the up-selected line: $\bar{z}_{u,t} = \mu + g_{u,t} + d_t + e_{u,t}$
Mean of the down-selected line: $\bar{z}_{d,t} = \mu + g_{d,t} + d_t + e_{d,t}$
Response estimated by
$R_t = (\bar{z}_{u,t} - \bar{z}_{u,t-1}) - (\bar{z}_{d,t} - \bar{z}_{d,t-1})$
Note that this design also accounts for inbreeding depression (assuming the up and down lines are equally inbred).

Variance in Response
We have been assuming that we can ignore $\sigma^2_d$. With a control line and/or divergent selection, we do not have to worry about this.
Control: $R_C(t) = \bar{z}_{s,t} - \bar{z}_{c,t} = (\mu + g_{s,t} + d_t + e_{s,t}) - (\mu + g_{c,t} + d_t + e_{c,t}) = (g_{s,t} - g_{c,t}) + (e_{s,t} - e_{c,t})$. The common $d_t$ term cancels.
Divergent design: $R_C(t) = \bar{z}_{u,t} - \bar{z}_{d,t} = (g_{u,t} - g_{d,t}) + (e_{u,t} - e_{d,t})$. Again, the common $d_t$ term cancels.

The resulting variances and covariances in response become
$\sigma^2[R_C(t)] = (2 f_t + A) h^2 \sigma^2_z + B \sigma^2_z$
$\sigma[R_C(t), R_C(t')] = (2 f_{t'} + A) h^2 \sigma^2_z$ for $t' < t$
Design                                          f_t                  A                B
Selection in a single direction, no control     f_{s,t}              1/N_s            1/M_{s,t}
Selection in a single direction, with control   f_{s,t} + f_{c,t}    1/N_s + 1/N_c    1/M_{s,t} + 1/M_{c,t}
Divergent selection, no control                 f_{u,t} + f_{d,t}    1/N_u + 1/N_d    1/M_{u,t} + 1/M_{d,t}

Variance with a Control
Control populations are not without a cost. When does the use of a control population result in a reduced variance? Using the pure-drift approximation for the control line,
Variance with control - variance without control = $\sigma^2(g_{c,t}) + \sigma^2(e_{c,t}) - \sigma^2_d = (t/N_c) h^2 \sigma^2_z + \sigma^2_z / M_{c,t} - \sigma^2_d$
Hence (ignoring M terms), omitting the control gives the smaller variance when
$t h^2 \sigma^2_z / N_c > \sigma^2_d$
However, this approach runs the risk of an undetected directional environmental trend compromising the estimated heritability. Regardless of the value of $\sigma^2_d$, if sufficient generations are used, the optimal design (in terms of giving the smallest expected variance in response) is not to use a control.
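The trade-off can be explored numerically. The sketch below assumes the pure-drift approximation for the control line, so that a control adds roughly (t/N_c) h2 var_z of drift variance (plus var_z/M_c of sampling variance if M terms are kept) while removing var_d; the function name and all numbers are hypothetical.

```python
def control_variance_change(t, n_c, h2, var_z, var_d, m_c=None):
    """Variance in response with a control minus variance without one.

    Positive values mean the control increases the variance of the response;
    negative values mean it reduces it. M terms are ignored unless m_c is given.
    """
    added = (t / n_c) * h2 * var_z
    if m_c is not None:
        added += var_z / m_c
    return added - var_d


# Hypothetical values: N_c = 20, h2 = 0.25, var_z = 100, var_d = 5
for t in (2, 4, 8, 16):
    change = control_variance_change(t, n_c=20, h2=0.25, var_z=100.0, var_d=5.0)
    print(t, change)
```

With these values the control reduces the variance for the first few generations, breaks even at generation 4, and increases it thereafter, in line with the statement that a long enough experiment favors omitting the control.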

Optimal Experimental Design
The coefficient of variation (CV) provides one measure for comparing different designs:
$CV[R(t)] = \sigma[R(t)] / E[R(t)]$
Design                                   E[R(t)]                  CV[R(t)]
Selection in one direction, with control $t h^2 i \sigma_z$       $(2/Nt)^{1/2} / (h i)$
Selection in one direction, no control   $t h^2 i \sigma_z$       $(1/Nt)^{1/2} / (h i)$
Divergent selection, no control          $2 t h^2 i \sigma_z$     $(1/(2Nt))^{1/2} / (h i)$
CV scales with Nt = total number over the entire experiment.

Example
Suppose we plan to select the upper 5% of the population on a trait with $h^2 = 0.25$. How large must Nt be to give a CV of 0.01 when no control is used? Here p = 0.05, which gives i = 2.06. Assuming drift variance dominates $\sigma^2_d$, then
CV = 0.01 = (1/Nt)^{1/2}/(h i) = (1/Nt)^{1/2}/(0.5 × 2.06), or Nt = 1/(0.01 × 0.5 × 2.06)^2 = 9426
Hence, we must have at least 9,426 selected parents over the course of the experiment.
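The calculation in this example can be reproduced by inverting the CV formula for Nt. This is a minimal sketch; the function name and the `design_factor` argument (1 for no control, 2 with a control, 0.5 for divergent selection, read off the CV table) are illustrative conventions, not part of the lecture.

```python
import math


def required_nt(target_cv, h2, intensity, design_factor=1.0):
    """Smallest Nt giving CV <= target_cv, from CV = sqrt(factor / Nt) / (h i)."""
    h = math.sqrt(h2)
    return math.ceil(design_factor / (target_cv * h * intensity) ** 2)


# Upper 5% selected (i = 2.06), h2 = 0.25, target CV of 0.01, no control
nt_no_control = required_nt(target_cv=0.01, h2=0.25, intensity=2.06)
print(nt_no_control)
```

Because Nt enters the CV through a square root, halving the target CV quadruples the required number of selected parents.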

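Nicholas' Criterion
An alternative criterion for choosing Nt was suggested by Nicholas. Suppose we wish a certain probability that the ACTUAL response will be at least a fraction β of the expected response:
$\Pr( R_C(t) > \beta E[R_C(t)] ) = \Pr\left( \frac{R_C(t) - E[R_C(t)]}{\sigma[R_C(t)]} > (\beta - 1) \frac{E[R_C(t)]}{\sigma[R_C(t)]} \right) = \Pr( U > (\beta - 1)/CV )$
To obtain this, subtract $E[R_C(t)]$ from each side and divide each side by $\sigma[R_C(t)]$. The left-hand side is then just a unit normal U, and $E[R_C(t)]/\sigma[R_C(t)] = 1/CV$. Solve for Nt to give the desired probability. Note that for β = 1, Pr(U > 0) = 1/2, so that 50% of the time the actual response exceeds the expected response.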

Example
Again suppose i = 2.06 and $h^2 = 0.25$. What value of Nt is required for a 95% probability that the observed response is at least 90% of its expected value? Here β = 0.9, and Pr(U > -1.65) = 0.95, so (β - 1)/CV = -0.1/CV = -1.65, or CV = 0.1/1.65. Here CV = 1/[(0.5 × 2.06)(Nt)^{1/2}]. Solving CV = 0.1/1.65 gives Nt = 257.
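Nicholas' criterion can be solved for Nt in the same way. The sketch below assumes selection in one direction with no control, so that CV = 1/(h i sqrt(Nt)); the function name is illustrative, and the normal quantile can either be supplied (to match the rounded 1.65 used in the example) or taken from the standard library.

```python
import math
from statistics import NormalDist


def nicholas_nt(beta, prob, h2, intensity, z=None):
    """Nt such that Pr(response > beta * expected response) = prob.

    Solves (beta - 1) / CV = -z with CV = 1 / (h * i * sqrt(Nt)).
    Requires beta < 1 and prob > 0.5. If z is None, the exact unit-normal
    quantile for prob is used.
    """
    if z is None:
        z = NormalDist().inv_cdf(prob)
    cv = (1.0 - beta) / z
    return math.ceil(1.0 / (cv * math.sqrt(h2) * intensity) ** 2)


# The example, with the rounded quantile 1.65
nt_rounded = nicholas_nt(beta=0.9, prob=0.95, h2=0.25, intensity=2.06, z=1.65)
# The same design with the exact quantile from the standard library
nt_exact = nicholas_nt(beta=0.9, prob=0.95, h2=0.25, intensity=2.06)
print(nt_rounded, nt_exact)
```

The two answers differ slightly because 1.65 is a rounded value of the 95% quantile.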

Mixed-Model estimation
PROVIDED that we have the full pedigree of the individuals in the selection experiment, we can use mixed-model methodology (e.g., BLUP and REML). Power: a mixed model accounts for ALL of the covariances in the sample, not just those between the means in different generations but also those between all related individuals.

Basic model: the so-called animal model
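Trait value of the jth individual from generation i:
$y_{ij} = \mu + a_{ij} + e_{ij}$
where $a_{ij}$ is the additive genetic value. Vectorize the data as
$y = (y_1^T, y_2^T, \ldots, y_t^T)^T$, where $y_i = (y_{i1}, \ldots, y_{i n_i})^T$
The (simple) model becomes
$y = \mathbf{1}\mu + a + e$
where the covariance matrix of a is $\sigma^2_A A$. Here A is the relationship matrix, with $A_{ii} = 1 + f_i$ and $A_{ij} = 2\Theta_{ij}$ ($f_i$ is the inbreeding coefficient of individual i and $\Theta_{ij}$ is the coefficient of coancestry between i and j). With additional fixed effects,
$y = X\beta + a + e$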

The estimated mean breeding value in generation k is the average of the estimated breeding values in generation k,
$\hat{\bar{a}}_k = \frac{1}{n_k} \sum_{j} \hat{a}_{kj}$
where $n_k$ is the number of individuals in generation k.
Interesting complication: the BLUP estimate of a requires a prior estimate of the heritability $h^2$. The relationship matrix A fully accounts for the effects of drift and the generation of linkage disequilibrium (assuming the infinitesimal model holds).
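For the simple model y = 1μ + a + e, BLUP can be sketched through Henderson's mixed-model equations with λ = (1 - h2)/h2. This is an illustration only: the solver, the function names, the toy data, and the identity relationship matrix are assumptions of the sketch, and a real analysis would build A from the pedigree and estimate h2 (for example by REML) with dedicated software.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def invert(matrix):
    """Matrix inverse, built one column at a time with solve()."""
    n = len(matrix)
    cols = [solve(matrix, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def animal_model_blup(y, A, h2):
    """Solve the mixed-model equations for y = 1*mu + a + e.

    Returns the estimate of mu and the BLUPs of the breeding values a,
    given the relationship matrix A and a prior value of h2.
    """
    n = len(y)
    lam = (1.0 - h2) / h2  # ratio of residual to additive variance
    A_inv = invert(A)
    lhs = [[float(n)] + [1.0] * n]
    for i in range(n):
        lhs.append([1.0] + [(1.0 if i == j else 0.0) + lam * A_inv[i][j]
                            for j in range(n)])
    solution = solve(lhs, [sum(y)] + list(y))
    return solution[0], solution[1:]


def generation_means(a_hat, generation):
    """Average the estimated breeding values within each generation label."""
    means = {}
    for k in sorted(set(generation)):
        values = [a for a, g in zip(a_hat, generation) if g == k]
        means[k] = sum(values) / len(values)
    return means


# Toy data: three unrelated, non-inbred individuals (A = identity), h2 = 0.25
y = [10.0, 12.0, 14.0]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu_hat, a_hat = animal_model_blup(y, A, h2=0.25)
print(mu_hat, a_hat, generation_means(a_hat, generation=[0, 0, 1]))
```

With unrelated individuals the BLUPs reduce to h2 times each deviation from the mean, which gives a simple check on the equations; a pedigree-based A additionally lets information flow between relatives and across generations.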
