Lecture 21: Quantitative Traits I
Date: 11/05/02
Review: covariance, regression, etc.
Introduction to quantitative genetics
Joint Density
Suppose you have two random variables X and Y. For each entity in the sample you take from the population, you obtain a random pair $(X_i, Y_i)$. The joint density function for two random variables is given by $p(x,y)$ such that
$$P(x_1 \le X \le x_2,\; y_1 \le Y \le y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} p(x,y)\,dx\,dy.$$
Conditional Density
The conditional density function for the random variable Y, conditional on a particular realization x of the random variable X, is $p(y|x)$. The joint and conditional densities are related by $p(x,y) = p(y|x)\,p(x)$, where $p(x) = \int p(x,y)\,dy$ is the marginal density of X.
Independent Random Variables
If the random variables X and Y are independent, then $p(x,y) = p(x)\,p(y)$. As a consequence, we also have $p(y|x) = p(y)$ and $p(x|y) = p(x)$. If X and Y are dependent, then the relationship between them is either linear or nonlinear. In either case, a linear relationship is a first approximation to the true relationship.
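Conditional Expectation
The expectation of the product is given by
$$E(XY) = \int\!\!\int x\,y\,p(x,y)\,dx\,dy.$$
The conditional expectation is given by
$$E(Y|X = x) = \int y\,p(y|x)\,dy.$$
If X and Y are independent, then
$$E(XY) = E(X)\,E(Y) \quad\text{and}\quad E(Y|X = x) = E(Y).$$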
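Covariance
Definition: The covariance of two random variables is defined as
$$\mathrm{Cov}(X,Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)\,E(Y),$$
and a covariance estimator is given by
$$\widehat{\mathrm{Cov}}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$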
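The covariance estimate can be computed directly from paired observations. The following is a minimal sketch, assuming the usual unbiased estimator with divisor n - 1; the function name `sample_covariance` and the data values are hypothetical and chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Unbiased sample covariance of paired observations (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two sample means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical data: y tends to rise with x, so the covariance is positive.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
cov_xy = sample_covariance(x, y)
var_x = sample_covariance(x, x)  # the covariance of x with itself is its variance
```

Passing the same list twice returns the sample variance, which is a quick sanity check on the function.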
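Meaning of Covariance
The sign of the covariance indicates how Y responds to changes in X, or vice versa. With a positive covariance, Y tends to increase as X increases. With a negative covariance, Y tends to decrease as X increases. With zero covariance, there is no linear association between X and Y, although they may still be dependent.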
Regression and History
The use of a linear function as a first approximation of the relationship between two random variables is termed regression. In fact, regression was first introduced in a genetics context. Galton (1889) studied the average height of parents X and the height of offspring Y.
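Least Squares Method
The least squares method finds estimates a and b for the coefficients of the assumed linear relationship $y = a + bx + e$ by minimizing the mean squared error $\frac{1}{n}\sum_i (y_i - a - b x_i)^2$. The resulting estimates are
$$b = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}, \qquad a = \bar{y} - b\,\bar{x}.$$
Because means, variances and covariances are available from phenotypic data, these estimates are particularly useful.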
Homoscedasticity and Heteroscedasticity
Definition: If the variance of the residual error is constant regardless of the value of the independent variable x, then the regression of Y on X is homoscedastic.
Definition: There is heteroscedasticity in the data if the residual variance depends on the value of the independent variable x.
Transformations exist to achieve homoscedasticity.
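Example: Regression
Suppose Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15 and the means E(X) = E(Y) = 0. Regress X on Y and Y on X. Because both means are 0, both intercept estimates are 0. The slope estimates are
$$b_{Y|X} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{10}{10} = 1, \qquad b_{X|Y} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)} = \frac{10}{15} = \frac{2}{3}.$$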
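The two regressions in this example can be checked numerically. This is a small sketch, assuming the standard least squares formulas (slope = covariance divided by the variance of the predictor, intercept = mean of the response minus slope times mean of the predictor); the helper name `regression_from_moments` is hypothetical.

```python
def regression_from_moments(cov, var_predictor, mean_predictor, mean_response):
    """Least squares intercept and slope computed from first and second moments."""
    slope = cov / var_predictor
    intercept = mean_response - slope * mean_predictor
    return intercept, slope


# Moments given in the example.
cov_xy, var_x, var_y = 10.0, 10.0, 15.0
mean_x = mean_y = 0.0

# Regress Y on X: X is the predictor.
a_yx, b_yx = regression_from_moments(cov_xy, var_x, mean_x, mean_y)
# Regress X on Y: Y is the predictor.
a_xy, b_xy = regression_from_moments(cov_xy, var_y, mean_y, mean_x)
```

The two slopes differ because each regression divides the same covariance by a different variance.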
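Correlation
Definition: The correlation of two random variables X and Y is defined as
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},$$
with estimator
$$r = \frac{\widehat{\mathrm{Cov}}(x,y)}{\sqrt{\widehat{\mathrm{Var}}(x)\,\widehat{\mathrm{Var}}(y)}}.$$
Correlations are scale independent: for positive constants w and c,
$$\rho(wX, cY) = \rho(X,Y).$$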
Correlation (cont)
A nonzero correlation implies a linear association between the variables. r is the standardized regression coefficient that is obtained if x and y are scaled to have unit variance. $r^2$ measures the proportion of Var(Y) that is explained by the regression if E(Y|X) is linear.
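Scale independence and the meaning of the squared correlation can be illustrated with the moments from the regression example (Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15). This is a sketch only; the function name `correlation` and the rescaling constants `w` and `c` are hypothetical choices.

```python
import math


def correlation(cov_xy, var_x, var_y):
    """Correlation coefficient from a covariance and two variances."""
    return cov_xy / math.sqrt(var_x * var_y)


cov_xy, var_x, var_y = 10.0, 10.0, 15.0
r = correlation(cov_xy, var_x, var_y)

# Rescale X by w and Y by c (both positive): the covariance scales by w * c
# and the variances by w**2 and c**2, so the correlation is unchanged.
w, c = 3.0, 0.5
r_scaled = correlation(w * c * cov_xy, w ** 2 * var_x, c ** 2 * var_y)

# The product of the two regression slopes equals r squared.
slope_product = (cov_xy / var_x) * (cov_xy / var_y)
```

Here `r ** 2` is 2/3, so a linear regression on X accounts for two thirds of Var(Y).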
Example: rats
Quantitative Traits
Definition: A quantitative trait is one with a continuous distribution. In other words, it is a trait that is measured, not counted. It is assumed that quantitative traits are controlled by many genes, each with small effect. Environmental effects are also important.
Definition: A quantitative trait locus (QTL) is a locus controlling a quantitative trait.
Quantitative Trait – A Model
We start with a very simple scenario. Suppose there is one locus determining a quantitative trait. Suppose that there are only two alleles at this locus. We seek a model for this scenario. This model will have two parameters to account for the two degrees of freedom (when location is removed) among the 3 possible outcomes (genotypes).
Phenotypic and Genotypic Value
Let the phenotypic value of a particular genotype be z. When environment has an effect, z is a consequence of both the underlying genotype and the environment. We can write z = G + E. Here G is the genotypic value, and it is the expected phenotypic value averaged over all environments. Each of the three genotypes has an associated genotypic value.
Quantitative Trait – Model A
Genotype          B1B1    B1B2      B2B2
Genotypic value   0       (1+k)a    2a
Quantitative Trait – Model B
Genotype          B1B1    B1B2    B2B2
Genotypic value   -a      d       a
Model A – Parameter Meanings
Genotype          B1B1    B1B2      B2B2
Genotypic value   0       (1+k)a    2a

Value of k    Genetic interpretation
k = 0         additivity
k = 1         complete dominance of B2
k > 1         overdominance
k < -1        underdominance
Example – Scaling Quantitative Trait
The Booroola (B) gene influences fecundity in Merino sheep.
Genotype    Mean litter size
BB          2.66
Bb          2.17
bb          1.48
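The litter sizes can be put on the Model A scale (0, (1+k)a, 2a) by subtracting the lowest homozygote and solving for a and k. The sketch below assumes that bb plays the role of the genotype with value 0 and BB the role of the genotype with value 2a; that mapping and the function name `scale_to_model_a` are assumptions made for illustration.

```python
def scale_to_model_a(g_low, g_het, g_high):
    """Return (a, k) so that the shifted genotypic values are 0, (1 + k) * a, 2 * a."""
    a = (g_high - g_low) / 2.0        # half the difference between homozygotes
    k = (g_het - g_low) / a - 1.0     # from (1 + k) * a = g_het - g_low
    return a, k


# Mean litter sizes for bb, Bb and BB from the table above.
a, k = scale_to_model_a(1.48, 2.17, 2.66)
```

With these data a is 0.59 and k is roughly 0.17, so the B allele shows slight partial dominance on this scale.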
Gene Content
Definition: The $B_1$ gene content of a genotype is the number of copies of the $B_1$ allele. The gene content for allele $B_1$ in genotype $B_1B_2$ is 1. At a single locus, the genotypic value is not a linear function of gene content unless k = 0.
[Figure: genotypic values 0, (1+k)a, 2a plotted against gene content, compared with the additive values 0, a, 2a.]
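Partitioning Genotypic Value
Let $N_1$ be the number of $B_1$ alleles in the genotype. Let $N_2$ be the number of $B_2$ alleles in the genotype. Then regress the genotypic value on the independent variables $N_1$ and $N_2$ by multiple regression:
$$G = \mu_G + \alpha_1 N_1 + \alpha_2 N_2 + \delta.$$
Assume again only two alleles, so that $N_1 = 2 - N_2$. Call $N_2 = N$. Then
$$G = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta.$$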
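Predicted Genotypic Values
The predicted genotypic values are given as
$$\hat{G}_{11} = \mu_G + 2\alpha_1, \qquad \hat{G}_{12} = \mu_G + \alpha_1 + \alpha_2, \qquad \hat{G}_{22} = \mu_G + 2\alpha_2,$$
where $\mu_G$ is the mean genotypic value and $\alpha_i$ is the additive effect of allele $B_i$.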
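Weighted Mean of $\alpha$'s is 0
A least squares regression passes through the means, so the mean predicted genotypic value equals $\mu_G$. With $E(N_1) = 2p_1$ and $E(N_2) = 2p_2$, where $p_i$ is the frequency of allele $B_i$, this gives
$$p_1\alpha_1 + p_2\alpha_2 = 0.$$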
Slope of Regression Line
Recalling the formula for the slope of a regression line, we have
$$\alpha = \frac{\mathrm{Cov}(G,N)}{\mathrm{Var}(N)}.$$
We will now find expressions for the covariance and variance.
Derivation of Slope
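A sketch of the derivation follows. It assumes Hardy-Weinberg genotype frequencies $p_1^2$, $2p_1p_2$, $p_2^2$ for $B_1B_1$, $B_1B_2$, $B_2B_2$, the Model A genotypic values 0, (1+k)a, 2a, and N counting the number of $B_2$ alleles (0, 1, 2). The symbol $\alpha$ denotes the regression slope of G on N.

```latex
\begin{align*}
E(N)    &= 2p_1p_2 \cdot 1 + p_2^2 \cdot 2 = 2p_2 \\
E(N^2)  &= 2p_1p_2 \cdot 1 + p_2^2 \cdot 4 = 2p_1p_2 + 4p_2^2 \\
\mathrm{Var}(N) &= E(N^2) - [E(N)]^2 = 2p_1p_2 \\
E(G)    &= 2p_1p_2(1+k)a + 2p_2^2 a \\
E(GN)   &= 2p_1p_2(1+k)a + 4p_2^2 a \\
\mathrm{Cov}(G,N) &= E(GN) - E(G)E(N) = 2p_1p_2\,a\,[1 + k(p_1 - p_2)] \\
\alpha  &= \frac{\mathrm{Cov}(G,N)}{\mathrm{Var}(N)} = a\,[1 + k(p_1 - p_2)]
\end{align*}
```

When k = 0 the slope reduces to a, independent of the allele frequencies.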
Average Effect of Allelic Substitution
The previous derivations were completed under the assumption of random mating and HWE. The slope is the change in genotypic value associated with the addition of one more $B_2$ allele. To add one more $B_2$ allele, one must replace a $B_1$ allele with $B_2$, so the slope is also called the average effect of allelic substitution. Except under additivity (k = 0), this substitution effect can only be defined in terms of the population.
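Partitioning Genetic Variance
Because we now have a linear function for genotypic value G, we can write the total genetic variance as
$$\mathrm{Var}(G) = \mathrm{Var}(\mu_G + \alpha_1 N_1 + \alpha_2 N_2) + \mathrm{Var}(\delta).$$
There is no covariance term, because the residual $\delta$ of a least squares regression is uncorrelated with the predicted value.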
Additive and Dominance Components
The first term is additive genetic variance: the amount of variance of G that is explained by regression on N. The second term is dominance genetic variance: the residual variance for the regression. We seek an expression for both terms.
Derivation of Slope
Genotype    B1B1      B1B2        B2B2
N           0         1           2
G           0         (1+k)a      2a
Freq.*      p1^2      2 p1 p2     p2^2
GN          0         (1+k)a      4a
N^2         0         1           4
* Assuming random mating (Hardy-Weinberg proportions).
Derivation of Genetic Variance Components
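A sketch of the two components follows, under the same assumptions as before: random mating, Model A genotypic values, and slope $\alpha = a[1 + k(p_1 - p_2)]$. The labels $\sigma^2_A$ and $\sigma^2_D$ for the additive and dominance variances and $\delta_{ij}$ for the dominance deviation of genotype $B_iB_j$ are notation introduced here.

```latex
\begin{align*}
\sigma^2_A &= \mathrm{Var}(\alpha N) = \alpha^2\,\mathrm{Var}(N)
            = 2p_1p_2\,\alpha^2 = 2p_1p_2\,a^2\,[1 + k(p_1 - p_2)]^2 \\
\delta_{11} &= -2p_2^2\,ak, \qquad
\delta_{12} = 2p_1p_2\,ak, \qquad
\delta_{22} = -2p_1^2\,ak \\
\sigma^2_D &= p_1^2\,\delta_{11}^2 + 2p_1p_2\,\delta_{12}^2 + p_2^2\,\delta_{22}^2
            = (2p_1p_2\,ak)^2
\end{align*}
```

The dominance variance vanishes when k = 0, and both components vanish when either allele is fixed.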
Genetic Variance Components
Both components depend on gene frequencies, so they are specific to the population from which they are derived. When k = 0 (purely additive effects), additive genetic variance is maximized when heterozygosity is maximized. With dominance (k > 0), additive genetic variance is maximized at higher frequencies of the recessive allele. Rare recessive alleles cause little genetic variance because they are not often expressed.
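The dependence on gene frequency can be explored numerically. This sketch assumes the random mating expressions for Model A, additive variance $2p_1p_2\alpha^2$ with $\alpha = a[1 + k(p_1 - p_2)]$ and dominance variance $(2p_1p_2ak)^2$; the function names and the grid search are illustrative choices, not part of the lecture.

```python
def variance_components(p1, a, k):
    """Additive and dominance variance for Model A under random mating."""
    p2 = 1.0 - p1
    alpha = a * (1.0 + k * (p1 - p2))      # average effect of allelic substitution
    var_a = 2.0 * p1 * p2 * alpha ** 2
    var_d = (2.0 * p1 * p2 * a * k) ** 2
    return var_a, var_d


def argmax_additive(a, k, steps=100):
    """Frequency of B1 on an evenly spaced grid that maximizes additive variance."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p1: variance_components(p1, a, k)[0])


p1_additive = argmax_additive(a=1.0, k=0.0)   # purely additive locus
p1_dominant = argmax_additive(a=1.0, k=1.0)   # B2 completely dominant, B1 recessive
```

With k = 0 the maximum falls at an allele frequency of 0.5, where heterozygosity peaks. With k = 1 it moves to a frequency of 0.75 for the recessive allele $B_1$.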
Why?
Why have we partitioned the genotypic value into additive and dominance components? When a parent transmits alleles to the offspring, the dominance deviation in the parent is irrelevant, because the parent transmits only one allele per locus in each gamete. We may think of the additive genetic component as the heritable component of an individual's genotypic value.
Average Excess
There are multiple ways to measure the effect of an allele. The effect of allelic substitution is one. The additive effect $\alpha_i$ is another.
Definition: The average excess of an allele is the difference between the mean genotypic value of individuals carrying at least one copy of the allele and the mean genotypic value of a random individual from the entire population.
Average Excess with Random Mating
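Under random mating, the average excess of each allele equals its additive effect: $-p_2\alpha$ for $B_1$ and $p_1\alpha$ for $B_2$, with $\alpha = a[1 + k(p_1 - p_2)]$. The sketch below checks this numerically for Model A. It assumes the mean for carriers is taken by conditioning on one allele and drawing the other at random from the population; the function name and the parameter values are hypothetical.

```python
def average_excess(p1, a, k):
    """Average excesses of B1 and B2 under random mating (Model A values)."""
    p2 = 1.0 - p1
    g11, g12, g22 = 0.0, (1.0 + k) * a, 2.0 * a
    mean_g = p1 ** 2 * g11 + 2.0 * p1 * p2 * g12 + p2 ** 2 * g22
    # Given one allele, the partner allele is B1 with probability p1, B2 with p2.
    excess_b1 = p1 * g11 + p2 * g12 - mean_g
    excess_b2 = p1 * g12 + p2 * g22 - mean_g
    return excess_b1, excess_b2


# Hypothetical parameter values chosen for illustration.
p1, a, k = 0.3, 1.0, 0.5
p2 = 1.0 - p1
alpha = a * (1.0 + k * (p1 - p2))
excess_b1, excess_b2 = average_excess(p1, a, k)
```

The difference `excess_b2 - excess_b1` recovers the average effect of allelic substitution.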
Breeding Value
Definition: The breeding value of an individual is the sum of the additive effects of its alleles.
Genotype    Breeding value
B1B1        $2\alpha_1$
B1B2        $\alpha_1 + \alpha_2$
B2B2        $2\alpha_2$
Breeding Value and Random Mating
Consider the expected genotypic values of progeny produced by parental genotypes.
Genotype    Breeding value           Progeny expected genotypic value    Deviation
B1B1        $2\alpha_1$              $a\,p_2(1+k)$                       $\alpha_1$
B1B2        $\alpha_1 + \alpha_2$    $a[p_2 + (1+k)/2]$                  $(\alpha_1 + \alpha_2)/2$
B2B2        $2\alpha_2$              $a[2p_2 + p_1(1+k)]$                $\alpha_2$
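The deviation of each progeny mean from the population mean is half the parent's breeding value. The sketch below checks that numerically, assuming Model A genotypic values, random mating, and additive effects $\alpha_1 = -p_2\alpha$ and $\alpha_2 = p_1\alpha$; the function names and parameter values are hypothetical.

```python
def progeny_deviations(p1, a, k):
    """Expected progeny genotypic value minus the population mean, per parent genotype."""
    p2 = 1.0 - p1
    g11, g12, g22 = 0.0, (1.0 + k) * a, 2.0 * a
    mean_g = p1 ** 2 * g11 + 2.0 * p1 * p2 * g12 + p2 ** 2 * g22
    # Offspring mean when the parent transmits B1 (or B2) and the mate's allele is random.
    via_b1 = p1 * g11 + p2 * g12
    via_b2 = p1 * g12 + p2 * g22
    progeny_mean = {
        "B1B1": via_b1,
        "B1B2": 0.5 * (via_b1 + via_b2),   # heterozygote transmits each allele half the time
        "B2B2": via_b2,
    }
    return {genotype: m - mean_g for genotype, m in progeny_mean.items()}


def breeding_values(p1, a, k):
    """Breeding values as sums of the additive effects of the two alleles."""
    p2 = 1.0 - p1
    alpha = a * (1.0 + k * (p1 - p2))
    alpha1, alpha2 = -p2 * alpha, p1 * alpha
    return {"B1B1": 2.0 * alpha1, "B1B2": alpha1 + alpha2, "B2B2": 2.0 * alpha2}


deviations = progeny_deviations(p1=0.3, a=1.0, k=0.5)
values = breeding_values(p1=0.3, a=1.0, k=0.5)
```

Comparing `deviations[g]` with `values[g] / 2` for each genotype `g` shows the agreement.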
How to Use this Analysis
A common approach today:
1. Identify candidate loci that are potential contributors to the variation of the trait of interest.
2. Genotype a random sample of individuals using molecular markers.
3. Determine average phenotypic values within each genotypic class.
4. Estimate the fraction of total phenotypic variation associated with the candidate locus.
An Example
Consider the Booroola gene example in two random mating populations where the gene B is present at gene frequencies 0.5.
            BB      Bb      bb
G_ij        2.66    2.17    1.48
P_ij        0.25    0.50    0.25
Booroola (cont)
Value                   Estimate
Mean genotypic value    2.12
Additive effects        $\alpha_B$ = 0.295; $\alpha_b$ = -0.295
Breeding values         A_BB = 0.59; A_Bb = 0; A_bb = -0.59
Dominance deviations    D_BB = -0.05; D_Bb = 0.05; D_bb = -0.05
Booroola (cont)
Value                         Estimate
Additive genetic variance     0.1740
Dominance genetic variance    0.0025
Total genetic variance        0.1765
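The Booroola estimates can be reproduced by regressing genotypic value on gene content, weighting each genotype by its frequency. This is a sketch under stated assumptions: the mean litter sizes 2.66, 2.17 and 1.48 are treated as genotypic values, the frequency of B is 0.5, and genotype frequencies follow Hardy-Weinberg proportions. All variable names are illustrative.

```python
genotypes = ["BB", "Bb", "bb"]
G = {"BB": 2.66, "Bb": 2.17, "bb": 1.48}      # mean litter size per genotype
n_B = {"BB": 2, "Bb": 1, "bb": 0}             # gene content: copies of allele B

p_B = 0.5
p_b = 1.0 - p_B
freq = {"BB": p_B ** 2, "Bb": 2.0 * p_B * p_b, "bb": p_b ** 2}

# Frequency-weighted means of genotypic value and gene content.
mean_g = sum(freq[g] * G[g] for g in genotypes)
mean_n = sum(freq[g] * n_B[g] for g in genotypes)

# Slope of the regression of G on gene content: average effect of substitution.
cov_gn = sum(freq[g] * (G[g] - mean_g) * (n_B[g] - mean_n) for g in genotypes)
var_n = sum(freq[g] * (n_B[g] - mean_n) ** 2 for g in genotypes)
alpha = cov_gn / var_n

# Breeding values are the fitted deviations; dominance deviations are the residuals.
breeding = {g: alpha * (n_B[g] - mean_n) for g in genotypes}
dominance = {g: G[g] - mean_g - breeding[g] for g in genotypes}

var_a = sum(freq[g] * breeding[g] ** 2 for g in genotypes)
var_d = sum(freq[g] * dominance[g] ** 2 for g in genotypes)
var_g = sum(freq[g] * (G[g] - mean_g) ** 2 for g in genotypes)
```

Because the residuals are uncorrelated with gene content, `var_a + var_d` equals `var_g`, which is the variance partition described earlier.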