Introduction to Genetic Analysis Ecology and Evolutionary Biology, University of Arizona Adjunct Appointments Molecular and Cellular Biology Plant Sciences.

Introduction to Genetic Analysis
Bruce Walsh (jbwalsh@u.arizona.edu)
Ecology and Evolutionary Biology, University of Arizona
Adjunct Appointments: Molecular and Cellular Biology, Plant Sciences, Epidemiology & Biostatistics, Animal Sciences

Outline
Mendelian Genetics
–Genes, Chromosomes & DNA
–Mendel’s laws
–Linkage
–Linkage disequilibrium
Quantitative Genetics
–Fisher’s decomposition of genetic value
–Fisher’s decomposition of genetic variances
–Resemblance between relatives
–Searching for the underlying genes

Mendelian Genetics
Following a single gene (or several genes) that we can directly score.
Phenotype is highly informative as to genotype.

Mendel’s Genes
Genes are discrete particles, with each parent passing one copy to its offspring.
Let an allele be a particular copy of a gene. In diploids, each individual carries two alleles for every gene, one inherited from each parent.
Each parent contributes one of its two alleles (at random) to its offspring.
For example, a parent with genotype Aa (a heterozygote for alleles A and a) has a 50% probability of passing an A allele on to its offspring and a 50% probability of passing along an a allele.

Example: Pea seed color
Mendel found that his pea lines differed in seed color, with a single locus (with alleles Y and g) determining green vs. yellow.
YY (Y homozygote) --> yellow phenotype
Yg (heterozygote) --> yellow phenotype
gg (g homozygote) --> green phenotype
Note that in this simple case, each genotype maps to a single phenotype. Likewise, the phenotype can tell us about the underlying genotype: green = gg, yellow = carries a Y allele.
Y is dominant to g; g is recessive to Y.
Cross Yg x Yg: offspring are 1/4 YY, 1/2 Yg, 1/4 gg, giving 3/4 yellow peas and 1/4 green peas.
Cross Yg x gg: offspring are 1/2 Yg, 1/2 gg, giving 1/2 yellow and 1/2 green.
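The two crosses on this slide can be checked by enumerating the four equally likely allele pairings. This is a minimal sketch; the function names `cross` and `yellow_fraction` are illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype frequencies when each parent passes one allele at random."""
    freqs = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "gY" and "Yg" are counted as the same genotype.
        genotype = "".join(sorted((allele1, allele2)))
        freqs[genotype] += Fraction(1, 4)
    return dict(freqs)


def yellow_fraction(freqs):
    """Y is dominant, so any genotype carrying a Y allele is yellow."""
    return sum(f for genotype, f in freqs.items() if "Y" in genotype)


f2 = cross("Yg", "Yg")          # heterozygote x heterozygote
backcross = cross("Yg", "gg")   # heterozygote x recessive homozygote
print(f2, yellow_fraction(f2))
print(backcross, yellow_fraction(backcross))
```

Exact fractions are used so the 1/4 : 1/2 : 1/4 genotype ratio and the 3/4 yellow fraction come out without rounding.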

Dealing with two (or more) genes
For 7 pea traits, Mendel observed Independent Assortment: the genotype at one locus is independent of the genotype at the second.
RR, Rr - round seeds; rr - wrinkled seeds
YY, Yg - yellow seeds; gg - green seeds
Pure round, green (RRgg) x pure wrinkled, yellow (rrYY)
F1 --> RrYg = all round, yellow
What about the F2?

Let R- denote RR and Rr. R- are round. Note that in the F2, Pr(R-) = Pr(Rr) + Pr(RR) = 1/2 + 1/4 = 3/4 and Pr(rr) = 1/4.
Likewise, Y- are YY or Yg, and are yellow.
Phenotype            Genotype   Frequency
Yellow, round        Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled     Y-rr       (3/4)*(1/4) = 3/16
Green, round         ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled      ggrr       (1/4)*(1/4) = 1/16
Or a 9:3:3:1 ratio.
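Under independent assortment the joint phenotype probabilities are just products of the single-locus probabilities. The short sketch below reproduces the 9:3:3:1 table; the dictionary names are illustrative only.

```python
from fractions import Fraction

# Single-locus F2 phenotype probabilities under complete dominance.
seed_color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
seed_shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Independent assortment: joint probability = product of the marginals.
f2_phenotypes = {
    (color, shape): p_color * p_shape
    for color, p_color in seed_color.items()
    for shape, p_shape in seed_shape.items()
}

for phenotype, prob in f2_phenotypes.items():
    print(phenotype, prob)
```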

Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)
PPLL x ppll --> PL/pl F1
Phenotype       Genotype   Observed   Expected
Purple long     P-L-       284        215
Purple round    P-ll       21         71
Red long        ppL-       21         71
Red round       ppll       55         24
Excess of PL, pl gametes over Pl, pL
Departure from independent assortment
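One way to quantify the departure from independent assortment is a chi-square goodness-of-fit statistic against the 9:3:3:1 expectation. This test is not on the slide; it is a standard check added here as an illustration, using the observed counts above and the 5% critical value for 3 degrees of freedom (7.815).

```python
# Observed F2 counts from the Bateson and Punnett table.
observed = {"purple long": 284, "purple round": 21, "red long": 21, "red round": 55}
# Expected 9:3:3:1 ratio under independent assortment.
ratio = {"purple long": 9, "purple round": 3, "red long": 3, "red round": 1}

total = sum(observed.values())
expected = {k: total * r / 16 for k, r in ratio.items()}

# Pearson chi-square statistic: sum of (O - E)^2 / E over the four classes.
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

CRITICAL_5PCT_DF3 = 7.815  # chi-square 5% critical value, 3 degrees of freedom
print(total, expected)
print(chi_square, chi_square > CRITICAL_5PCT_DF3)
```

The statistic is far above the critical value, consistent with the slide's conclusion that these two loci do not assort independently.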

Chromosomal theory of inheritance It was soon postulated that Genes are carried on chromosomes, because chromosomes behaved in a fashion that would generate Mendel’s laws. Early light microscope work on dividing cells revealed small (usually) rod-shaped structures that appear to pair during cell division. These are chromosomes. We now know that each chromosome consists of a single double-stranded DNA molecule (covered with proteins), and it is this DNA that codes for the genes.

Humans have 23 pairs of chromosomes (for a total of 46):
22 pairs of autosomes (chromosomes 1 to 22)
1 pair of sex chromosomes -- XX in females, XY in males
Humans also have another type of DNA molecule, namely the mitochondrial DNA (mtDNA) genome, which exists in tens to thousands of copies in the mitochondria present in our cells.
mtDNA is unusual in that it is strictly maternally inherited. Offspring get only their mother’s mtDNA.

Linkage
If genes are located on different chromosomes they (with very few exceptions) show independent assortment.
Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky in choosing seven traits at random that happen to all be on different chromosomes? Problem: compute this probability.
However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.
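A rough answer to the problem posed on this slide, under the simplifying assumption (not stated in the original) that each gene is equally likely to fall on any of the 7 chromosomes, independently of the others. The function name is illustrative.

```python
from math import factorial


def prob_all_on_different_chromosomes(n_genes, n_chromosomes):
    """Probability that n_genes randomly placed genes land on distinct chromosomes.

    Assumes every chromosome is equally likely for every gene (a simplification:
    real chromosomes differ in size and gene content).
    """
    if n_genes > n_chromosomes:
        return 0.0
    # Ordered ways to pick distinct chromosomes, divided by all possible placements.
    distinct_ways = factorial(n_chromosomes) // factorial(n_chromosomes - n_genes)
    return distinct_ways / n_chromosomes ** n_genes


p = prob_all_on_different_chromosomes(7, 7)  # 7!/7^7
print(p)
```

Under this assumption the probability is 7!/7^7, well under 1%.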

Consider the Bateson-Punnett pea data
Let PL / pl denote that in the parent, one chromosome carries the P and L alleles (at the flower color and pollen shape loci, respectively), while the other chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two parental chromosome types (PL or pl) is passed on to the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl parent can generate Pl and pL recombinant gametes to pass on to its offspring.
Linkage --> excess of parental gametes

Let c denote the recombination frequency: the probability that a randomly chosen gamete from the parent is of the recombinant type (i.e., it is not a parental gamete).
For a PL/pl parent, the gamete frequencies are
Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4
Parental gametes in excess, as (1-c)/2 > 1/4 for c < 1/2
Recombinant gametes in deficiency, as c/2 < 1/4 for c < 1/2
In the Bateson data, Freq(ppll) = 55/381 = 0.144. A ppll offspring requires a pl gamete from each parent, so Freq(ppll) = [(1-c)/2]^2. Solving gives c = 0.24.
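The estimate of c can be reproduced by inverting Freq(ppll) = [(1-c)/2]^2. This is a minimal sketch that assumes, as the slide does, the same recombination frequency in both parents; the function name is illustrative.

```python
from math import sqrt


def recombination_from_double_recessive(n_double_recessive, n_total):
    """Solve freq(ppll) = ((1 - c) / 2) ** 2 for the recombination frequency c."""
    freq = n_double_recessive / n_total
    return 1 - 2 * sqrt(freq)


c = recombination_from_double_recessive(55, 381)

# Implied gamete frequencies for a PL/pl parent.
gametes = {"PL": (1 - c) / 2, "pl": (1 - c) / 2, "Pl": c / 2, "pL": c / 2}
print(round(c, 2), gametes)
```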

Linkage is our friend
While linkage (at first blush) may seem a complication, it is actually our friend, allowing us to map genes: determining which genes are on which chromosomes and also fine-mapping their position on a particular chromosome.
Historically, the genes that have been mapped have direct effects on phenotypes (pea color, fly eye color, any number of simple human diseases, etc.).
In the molecular era, we are often concerned with molecular markers, variations in the DNA sequence that typically have no effect on phenotype.

Molecular Markers
You and your neighbor differ at roughly 22,000,000 nucleotides (base pairs) out of the roughly 3 billion bp that comprise the human genome. Hence, LOTS of molecular variation to exploit.
SNP -- single nucleotide polymorphism. A particular position on the DNA (say base 123,321 on chromosome 1) that has two different nucleotides (say G or A) segregating.
STR -- short tandem repeat. An STR locus consists of a number of short repeats, with alleles defined by the number of repeats. For example, you might have 6 and 4 copies of the repeat on your two copies of chromosome 2.

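Gametes and Gamete Frequencies
When we consider two (or more) loci, we follow gametes.
Under random mating, gametes combine at random, e.g.
$\mathrm{freq}(AABB) = \mathrm{freq}(AB \mid \text{father})\,\mathrm{freq}(AB \mid \text{mother})$
$\mathrm{freq}(AaBB) = \mathrm{freq}(AB \mid \text{father})\,\mathrm{freq}(aB \mid \text{mother}) + \mathrm{freq}(aB \mid \text{father})\,\mathrm{freq}(AB \mid \text{mother})$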

Linkage disequilibrium
At linkage equilibrium (LE), alleles in gametes are independent of each other:
$\mathrm{freq}(AB) = \mathrm{freq}(A)\,\mathrm{freq}(B)$
$\mathrm{freq}(ABC) = \mathrm{freq}(A)\,\mathrm{freq}(B)\,\mathrm{freq}(C)$
When linkage disequilibrium (LD) is present, alleles are no longer independent: knowing that one allele is in the gamete provides information on alleles at other loci.
$\mathrm{freq}(AB) \neq \mathrm{freq}(A)\,\mathrm{freq}(B)$
The disequilibrium between alleles A and B is given by
$D_{AB} = \mathrm{freq}(AB) - \mathrm{freq}(A)\,\mathrm{freq}(B)$
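As a small illustration, the disequilibrium can be computed directly from the four gamete frequencies at two diallelic loci. The gamete frequencies below are made-up numbers for illustration, not data from the slides.

```python
def linkage_disequilibrium(gamete_freqs):
    """D_AB = freq(AB) - freq(A) * freq(B), from the four two-locus gamete frequencies."""
    freq_A = gamete_freqs["AB"] + gamete_freqs["Ab"]
    freq_B = gamete_freqs["AB"] + gamete_freqs["aB"]
    return gamete_freqs["AB"] - freq_A * freq_B


# Hypothetical gamete frequencies (must sum to 1).
in_ld = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
at_le = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}

D_ld = linkage_disequilibrium(in_ld)
D_le = linkage_disequilibrium(at_le)
print(D_ld, D_le)
```

In the first example the AB gamete is more common than the product of its allele frequencies, so D is positive; in the second the alleles are independent and D is zero.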

Forces that Generate LD
Selection
Drift
Migration (admixture)
Mutation
Population structure (stratification)

The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
$\mathrm{freq}(AB) = \mathrm{freq}(A)\,\mathrm{freq}(B) + D_{AB}$
where $\mathrm{freq}(A)\,\mathrm{freq}(B)$ is the LE value and $D_{AB}$ is the departure from LE.
If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is
$D(t) = D(0)\,(1 - c)^t$
where $D(0)$ is the initial LD value.
Note that D(t) -> zero, although the approach can be slow when c is very small.
Not surprising that very tightly linked markers (c << 0.01) are often in LD.
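To get a feel for how slowly LD decays for tight linkage, the decay formula can be evaluated directly. A minimal sketch; the function names and the choice of a 50% threshold are illustrative assumptions.

```python
from math import ceil, log


def ld_after(d0, c, t):
    """Disequilibrium after t generations: D(t) = D(0) * (1 - c)**t."""
    return d0 * (1 - c) ** t


def generations_to_fraction(c, fraction=0.5):
    """Smallest whole number of generations t with (1 - c)**t <= fraction."""
    return ceil(log(fraction) / log(1 - c))


for c in (0.5, 0.1, 0.01, 0.001):
    print(c, generations_to_fraction(c))
```

For unlinked loci (c = 0.5) LD halves every generation, whereas for c = 0.01 it takes dozens of generations to fall by half.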

Key Mendelian Concepts
Genes, Chromosomes & DNA
“Classical” vs. molecular markers
Linkage
–Parental gametes in excess: alleles at nearby loci tend to segregate together
–Excess of parental gametes seen in any particular cross
Linkage disequilibrium (LD)
–LD implies that, in the population, there is a non-random association of alleles
–Unlinked alleles can show LD due to population structure

Quantitative Genetics
The analysis of traits whose variation is determined by both a number of genes and environmental factors.
Phenotype is highly uninformative as to underlying genotype.

Complex (or Quantitative) trait
No (apparent) simple Mendelian basis for variation in the trait.
May be a single gene strongly influenced by environmental factors.
May be the result of a number of genes of equal (or differing) effect.
Most likely, a combination of both multiple genes and environmental factors.
Example: Blood pressure, cholesterol levels
–Known genetic and environmental risk factors

Phenotypic distribution of a trait
Consider a specific locus influencing the trait.
For this locus, mean phenotype = 0.15, while overall mean phenotype = 0.

Goals of Quantitative Genetics
Partition total trait variation into genetic (nature) vs. environmental (nurture) components
Predict resemblance between relatives
–If a sib has a disease/trait, what are your odds?
Find the underlying loci contributing to genetic variation
–QTL -- quantitative trait loci
Deduce molecular basis for genetic trait variation

Dichotomous (binary) traits
Presence/absence traits (such as a disease) can still (and usually do) have a complex genetic basis.
Consider a disease susceptibility (DS) locus underlying a disease, with alleles D and d, where allele D significantly increases your disease risk.
In particular, Pr(disease | DD) = 0.5, so that the penetrance of genotype DD is 50%.
Suppose Pr(disease | Dd) = 0.2 and Pr(disease | dd) = 0.05.
dd individuals only rarely display the disease, largely because of exposure to adverse environmental conditions.

If freq(d) = 0.9, what is Prob(DD | show disease)?
freq(disease) = 0.1^2 * 0.5 + 2 * 0.1 * 0.9 * 0.2 + 0.9^2 * 0.05 = 0.0815
From Bayes’ theorem, Pr(DD | disease) = Pr(disease | DD) * Pr(DD) / Prob(disease) = 0.1^2 * 0.5 / 0.0815 = 0.06 (6%)
Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497
dd individuals can give rise to phenocopies 5% of the time, showing the disease but not as a result of carrying the risk allele.
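The Bayes calculation can be written out for all three genotypes at once. The sketch below assumes Hardy-Weinberg genotype frequencies, as the slide's arithmetic does; the function name is illustrative.

```python
def posterior_genotype_probs(freq_D, penetrance):
    """Prevalence and Pr(genotype | disease), assuming Hardy-Weinberg proportions."""
    p, q = freq_D, 1 - freq_D
    prior = {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}
    # Population prevalence: sum of Pr(genotype) * Pr(disease | genotype).
    prevalence = sum(prior[g] * penetrance[g] for g in prior)
    # Bayes' theorem for each genotype.
    posterior = {g: prior[g] * penetrance[g] / prevalence for g in prior}
    return prevalence, posterior


prevalence, posterior = posterior_genotype_probs(
    freq_D=0.1, penetrance={"DD": 0.5, "Dd": 0.2, "dd": 0.05}
)
print(round(prevalence, 4), {g: round(v, 3) for g, v in posterior.items()})
```

Although DD carries the highest risk, most affected individuals are Dd or dd because those genotypes are far more common.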

Basic model of Quantitative Genetics
Basic model: P = G + E
P = phenotypic value (we will occasionally also use z for this value)
G = genotypic value
E = environmental value
G = average phenotypic value for that genotype if we were able to replicate it over the universe of environmental values, G = E[P]
G x E interaction: G values are different across environments. The basic model now becomes P = G + E + GE

Contribution of a locus to a trait
Alternative ways of writing the genotypic values:
Genotype   Q1Q1     Q2Q1          Q2Q2
           C        C + a(1+k)    C + 2a
           C        C + a + d     C + 2a
           C - a    C + d         C + a
2a = G(Q2Q2) - G(Q1Q1)
d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2
d measures dominance, with d = 0 if the heterozygote is exactly intermediate to the two homozygotes.
k = d/a is a scaled measure of the dominance.

Example: Apolipoprotein E & Alzheimer’s
Genotype                 ee     Ee     EE
Average age of onset     68.4   75.5   84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
ak = d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85
k = d/a = -0.11
Only a small amount of dominance
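The three quantities follow mechanically from the genotypic values, so a short helper makes the sign of k easy to check. The function name is illustrative.

```python
def locus_parameters(g11, g12, g22):
    """Return (a, d, k) from the genotypic values of the two homozygotes and the heterozygote."""
    a = (g22 - g11) / 2          # half the difference between homozygotes
    d = g12 - (g11 + g22) / 2    # heterozygote deviation from the midpoint
    k = d / a                    # scaled dominance
    return a, d, k


# Average age of onset for ee, Ee, EE from the table above.
a, d, k = locus_parameters(68.4, 75.5, 84.3)
print(round(a, 2), round(d, 2), round(k, 2))
```

The heterozygote lies slightly below the midpoint of the two homozygotes, so d and k are slightly negative.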

Fisher’s (1918) Decomposition of G
One of Fisher’s key insights was that the genotypic value consists of a fraction that can be passed from parent to offspring and a fraction that cannot.
Consider the genotypic value $G_{ij}$ resulting from a $Q_i Q_j$ individual:
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Mean value: $\mu_G = \sum G_{ij} \cdot \mathrm{freq}(Q_i Q_j)$
$\alpha_i$: average contribution to genotypic value for allele i
Since parents pass along single alleles to their offspring, the $\alpha_i$ (the average effect of allele i) represent these contributions.
The genotypic value predicted from the individual allelic effects is thus
$\widehat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
Dominance deviations: $G_{ij} - \widehat{G}_{ij} = \delta_{ij}$, the difference (for genotype $Q_i Q_j$) between the actual genotypic value and the genotypic value predicted from the two single alleles.

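Fisher’s decomposition is a Regression
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Predicted value: $\mu_G + \alpha_i + \alpha_j$. Residual error: $\delta_{ij}$.
A notational change clearly shows this is a regression:
$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1) N + \delta_{ij}$
Independent (predictor) variable: N = # of $Q_2$ alleles
Intercept: $\mu_G + 2\alpha_1$
Regression slope: $\alpha_2 - \alpha_1$
Regression residual: $\delta_{ij}$
$2\alpha_1 + (\alpha_2 - \alpha_1) N = \begin{cases} 2\alpha_1 & \text{for } N = 0 \text{, e.g., } Q_1 Q_1 \\ \alpha_1 + \alpha_2 & \text{for } N = 1 \text{, e.g., } Q_1 Q_2 \\ 2\alpha_2 & \text{for } N = 2 \text{, e.g., } Q_2 Q_2 \end{cases}$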

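[Figure: genotypic value G (points $G_{11}$, $G_{21}$, $G_{22}$) plotted against N = 0, 1, 2, with the fitted regression line of slope $\alpha_2 - \alpha_1$. Panels: allele $Q_1$ common, $\alpha_2 > \alpha_1$; allele $Q_2$ common, $\alpha_1 > \alpha_2$; both $Q_1$ and $Q_2$ frequent, $\alpha_1 = \alpha_2 = 0$.]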

Consider a diallelic locus, where $p_1 = \mathrm{freq}(Q_1)$
Genotype           Q1Q1   Q2Q1      Q2Q2
Genotypic value    0      a(1+k)    2a
Mean: $\mu_G = 2 p_2 a (1 + p_1 k)$
Allelic effects:
$\alpha_2 = p_1 a [1 + k(p_1 - p_2)]$
$\alpha_1 = -p_2 a [1 + k(p_1 - p_2)]$
Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$
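The closed-form mean and average effects can be cross-checked against the regression view: a frequency-weighted least-squares regression of genotypic value on N (the number of Q2 alleles) should have slope α2 − α1. The sketch below assumes random-mating (Hardy-Weinberg) genotype frequencies and 0 < p1 < 1; the function name and return layout are illustrative.

```python
def fisher_decomposition(p1, a, k):
    """Mean, regression slope, average effects and dominance deviations
    for genotypic values 0, a(1+k), 2a at a diallelic locus (requires 0 < p1 < 1)."""
    p2 = 1.0 - p1
    # Index genotypes by N = number of Q2 alleles (0, 1, 2).
    value = [0.0, a * (1 + k), 2 * a]
    freq = [p1 ** 2, 2 * p1 * p2, p2 ** 2]  # Hardy-Weinberg proportions

    mean_g = sum(f * g for f, g in zip(freq, value))
    mean_n = sum(f * n for n, f in enumerate(freq))

    # Frequency-weighted least-squares slope of G on N.
    cov_gn = sum(f * (n - mean_n) * (g - mean_g)
                 for n, (f, g) in enumerate(zip(freq, value)))
    var_n = sum(f * (n - mean_n) ** 2 for n, f in enumerate(freq))
    slope = cov_gn / var_n

    # Closed-form average effects from the slide.
    alpha1 = -p2 * a * (1 + k * (p1 - p2))
    alpha2 = p1 * a * (1 + k * (p1 - p2))

    # Dominance deviations: actual value minus the regression prediction.
    predicted = [mean_g + 2 * alpha1 + (alpha2 - alpha1) * n for n in range(3)]
    delta = [g - g_hat for g, g_hat in zip(value, predicted)]
    return mean_g, slope, alpha1, alpha2, delta, freq


mean_g, slope, alpha1, alpha2, delta, freq = fisher_decomposition(p1=0.3, a=2.0, k=0.5)
print(mean_g, slope, alpha2 - alpha1, delta)
```

The regression slope agrees with α2 − α1, and the dominance deviations average to zero over the population, as expected of regression residuals.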

Average effects and Additive Genetic Values
The $\alpha$ values are the average effects of an allele.
A key concept is the Additive Genetic Value (A) of an individual:
$A(G_{ij}) = \alpha_i + \alpha_j$
$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$
Why all the fuss over A?
Suppose father has A = 10 and mother has A = -2 for (say) blood pressure.
Expected blood pressure in their offspring is (10 - 2)/2 = 4 units above the population mean. Offspring A = average of parental A’s.
KEY: parents only pass single alleles to their offspring. Hence, they only pass along the A part of their genotypic value G.

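Genetic Variances
$G_{ij} = \mu_G + (\alpha_i + \alpha_j) + \delta_{ij}$
$\sigma^2(G) = \sigma^2(\mu_G + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$, as $\mathrm{Cov}(\alpha, \delta) = 0$
Over n loci:
$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2\!\left(\alpha_i^{(k)} + \alpha_j^{(k)}\right) + \sum_{k=1}^{n} \sigma^2\!\left(\delta_{ij}^{(k)}\right)$
$\sigma^2_G = \sigma^2_A + \sigma^2_D$
$\sigma^2_A$: Additive Genetic Variance (or simply additive variance)
$\sigma^2_D$: Dominance Genetic Variance (or simply dominance variance)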

Key concepts (so far)
$\alpha_i$ = average effect of allele i
–Property of a single allele in a particular population (depends on genetic background)
A = Additive Genetic Value
–A = sum (over all loci) of average effects
–Fraction of G that parents pass along to their offspring
–Property of an individual in a particular population
Var(A) = additive genetic variance
–Variance in additive genetic values
–Property of a population
Can estimate A or Var(A) without knowing any of the underlying genetical detail (forthcoming)

Additive variance:
$\sigma^2_A = 2E[\alpha^2] = 2 \sum_{i=1}^{m} \alpha_i^2 p_i$
Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$
Dominance variance:
$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m} \sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$
One locus, 2 alleles:
Genotype           Q1Q1   Q1Q2      Q2Q2
Genotypic value    0      a(1+k)    2a
$\sigma^2_A = 2 p_1 p_2 a^2 [1 + k(p_1 - p_2)]^2$
Dominance affects the additive variance. When dominance is present, this is an asymmetric function of allele frequencies.
$\sigma^2_D = (2 p_1 p_2 a k)^2$
Equals zero if k = 0. This is a symmetric function of allele frequencies.
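As a consistency check on the one-locus formulas, the additive and dominance variances should sum to the total genetic variance computed directly from the three genotypic values. The sketch assumes Hardy-Weinberg genotype frequencies; the function names are illustrative.

```python
def variance_components(p1, a, k):
    """Closed-form additive and dominance variances for a diallelic locus."""
    p2 = 1.0 - p1
    var_a = 2 * p1 * p2 * a ** 2 * (1 + k * (p1 - p2)) ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_a, var_d


def total_genetic_variance(p1, a, k):
    """Variance of the genotypic values 0, a(1+k), 2a under Hardy-Weinberg frequencies."""
    p2 = 1.0 - p1
    values = [0.0, a * (1 + k), 2 * a]
    freqs = [p1 ** 2, 2 * p1 * p2, p2 ** 2]
    mean = sum(f * g for f, g in zip(freqs, values))
    return sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))


for p1 in (0.1, 0.5, 0.9):
    var_a, var_d = variance_components(p1, a=1.0, k=1.0)
    print(p1, var_a, var_d, total_genetic_variance(p1, a=1.0, k=1.0))
```

With complete dominance (k = 1) the additive variance differs between p1 = 0.1 and p1 = 0.9, while the dominance variance is the same at both, matching the asymmetric/symmetric remarks above.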

[Figure: additive variance $V_A$ plotted against allele frequency p, with no dominance (k = 0).]

[Figure: $V_A$ and $V_D$ plotted against allele frequency p, with complete dominance (k = 1).]

Epistasis
The genotypic value can be decomposed into:
Additive genetic value
Dominance value -- interaction between the two alleles at a locus
Additive x Additive interactions -- interactions between a single allele at one locus with a single allele at another
Additive x Dominant interactions -- interactions between an allele at one locus with the genotype at another, e.g. allele $A_i$ and genotype $B_k B_j$
Dominance x Dominance interactions -- the interaction between the dominance deviation at one locus with the dominance deviation at another
These components are defined to be uncorrelated (or orthogonal), so that
$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$

Heritability
Central concept in quantitative genetics
Proportion of variation due to additive genetic values
–$h^2 = V_A / V_P$
–Phenotypes (and hence $V_P$) can be directly measured
–Breeding values (and hence $V_A$) must be estimated
Estimates of $V_A$ require known collections of relatives

Key observations The amount of phenotypic resemblance among relatives for the trait provides an indication of the amount of genetic variation for the trait. If trait variation has a significant genetic basis, the closer the relatives, the more similar their appearance

Genetic Covariance between relatives
Genetic covariances arise because two related individuals are more likely to share alleles than are two unrelated individuals.
Sharing alleles means having alleles that are identical by descent (IBD): both copies can be traced back to a single copy in a recent common ancestor.
Two relatives can have no alleles IBD, one allele IBD, or both alleles IBD at a locus.

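Parent-offspring genetic covariance
$\mathrm{Cov}(G_p, G_o)$: parents and offspring share EXACTLY one allele IBD.
Denote this common allele by $A_1$:
$G_p = A_p + D_p = \alpha_1 + \alpha_x + D_{1x}$
$G_o = A_o + D_o = \alpha_1 + \alpha_y + D_{1y}$
Here $\alpha_1$ is the effect of the IBD allele, while $\alpha_x$ and $\alpha_y$ are the effects of the non-IBD alleles.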

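Expanding $\mathrm{Cov}(G_p, G_o) = \mathrm{Cov}(\alpha_1 + \alpha_x + D_{1x},\ \alpha_1 + \alpha_y + D_{1y})$, every term except $\mathrm{Cov}(\alpha_1, \alpha_1)$ is zero (these are the terms shown in white on the slide):
By construction, $\alpha$ and D are uncorrelated.
By construction, $\alpha$ from non-IBD alleles are uncorrelated.
By construction, D values are uncorrelated unless both alleles are IBD.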

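$\mathrm{Cov}(\alpha_x, \alpha_y) = \begin{cases} 0 & \text{if } x \neq y \text{, i.e., not IBD} \\ \mathrm{Var}(A)/2 & \text{if } x = y \text{, i.e., IBD} \end{cases}$
$\mathrm{Var}(A) = \mathrm{Var}(\alpha_1 + \alpha_2) = 2\,\mathrm{Var}(\alpha_1)$, so that $\mathrm{Var}(\alpha_1) = \mathrm{Cov}(\alpha_1, \alpha_1) = \mathrm{Var}(A)/2$
Hence, relatives sharing one allele IBD have a genetic covariance of Var(A)/2.
The resulting parent-offspring genetic covariance becomes $\mathrm{Cov}(G_p, G_o) = \mathrm{Var}(A)/2$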

Half-sibs
Each sib gets exactly one allele from the common father, and different alleles from the different mothers.
The half-sibs share one allele IBD with probability 1/2.
The half-sibs share no alleles IBD with probability 1/2.
Hence, the genetic covariance of half-sibs is just (1/2)Var(A)/2 = Var(A)/4

Full-sibs
Each sib gets exactly one allele from each parent.
Paternal allele IBD [Prob = 1/2], maternal allele IBD [Prob = 1/2] -> Prob(both alleles IBD) = 1/2 * 1/2 = 1/4
Paternal allele not IBD [Prob = 1/2], maternal allele not IBD [Prob = 1/2] -> Prob(zero alleles IBD) = 1/2 * 1/2 = 1/4
Prob(exactly one allele IBD) = 1 - Prob(0 IBD) - Prob(2 IBD) = 1/2

Resulting Genetic Covariance between full-sibs
IBD alleles   Probability   Contribution
0             1/4           0
1             1/2           Var(A)/2
2             1/4           Var(A) + Var(D)
Cov(Full-sibs) = (1/2)Var(A)/2 + (1/4)[Var(A) + Var(D)] = Var(A)/2 + Var(D)/4

Genetic Covariances for General Relatives
Let r = (1/2)Prob(1 allele IBD) + Prob(2 alleles IBD)
Let u = Prob(both alleles IBD)
General genetic covariance between relatives:
Cov(G) = r Var(A) + u Var(D)
When epistasis is present, additional terms appear:
$r^2\,\mathrm{Var}(AA) + r u\,\mathrm{Var}(AD) + u^2\,\mathrm{Var}(DD) + r^3\,\mathrm{Var}(AAA) + \cdots$
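The coefficients r and u for the relationships covered so far follow directly from their IBD probabilities. The sketch below tabulates them and evaluates Cov(G) = r Var(A) + u Var(D), ignoring epistasis; the table and function names are illustrative, and the variance values in the example are made up.

```python
from fractions import Fraction as F

# (Prob 0 alleles IBD, Prob 1 allele IBD, Prob 2 alleles IBD) per relationship.
IBD_PROBS = {
    "parent-offspring": (F(0), F(1), F(0)),
    "half-sibs": (F(1, 2), F(1, 2), F(0)),
    "full-sibs": (F(1, 4), F(1, 2), F(1, 4)),
    "monozygotic twins": (F(0), F(0), F(1)),
}


def coefficients(relationship):
    """Return (r, u) with r = P(1 IBD)/2 + P(2 IBD) and u = P(2 IBD)."""
    _, p_one, p_two = IBD_PROBS[relationship]
    return p_one / 2 + p_two, p_two


def genetic_covariance(relationship, var_a, var_d):
    """Cov(G) = r * Var(A) + u * Var(D), with no epistatic terms."""
    r, u = coefficients(relationship)
    return r * var_a + u * var_d


for rel in IBD_PROBS:
    # Hypothetical variance components: Var(A) = 40, Var(D) = 8.
    print(rel, coefficients(rel), genetic_covariance(rel, 40, 8))
```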

Sample Covariances
Cov(monozygotic twins) = $V_A + V_D + \mathrm{Cov}(E)$
Cov(dizygotic twins) = $V_A/2 + V_D/4 + \mathrm{Cov}(E)$
Cov(parent, offspring) = $V_A/2$
Hence, can estimate genetic variance components from phenotypic covariances using known sets of relatives.
More generally, use all comparisons between relatives in a complex pedigree (REML estimate of variances).
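Read in reverse, these expressions give simple estimators. The sketch below takes V_A = 2 Cov(parent, offspring) and forms the twin contrast 2[Cov(MZ) − Cov(DZ)], which equals V_A + (3/2)V_D only under the added assumption that Cov(E) is the same for both twin types. All numbers and function names are hypothetical.

```python
def additive_variance_from_parent_offspring(cov_parent_offspring, var_p):
    """V_A = 2 * Cov(parent, offspring); h^2 = V_A / V_P."""
    var_a = 2 * cov_parent_offspring
    return var_a, var_a / var_p


def twin_contrast(cov_mz, cov_dz):
    """2 * (Cov(MZ) - Cov(DZ)) = V_A + 1.5 * V_D, assuming equal Cov(E) for both twin types."""
    return 2 * (cov_mz - cov_dz)


# Hypothetical phenotypic (co)variances, for illustration only.
var_a, h2 = additive_variance_from_parent_offspring(cov_parent_offspring=20.0, var_p=100.0)
print(var_a, h2)
print(twin_contrast(cov_mz=60.0, cov_dz=35.0))
```

Because the twin contrast contains dominance variance, it can overstate V_A when V_D is not negligible.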

Relative risks for binary traits
Let $z_1$ and $z_2$ denote the trait state (0 or 1) in two relatives.
Recurrence risk, $K_R$ (for relatives of type R) = $\mathrm{Prob}(z_2 = 1 \mid z_1 = 1)$
James’ identity: $K_R = K + \mathrm{Cov}(z_1, z_2)/K$, where K = Prob(z = 1), i.e., the population prevalence
Relative risk, $\lambda_R = K_R / K$
Risch’s identity: $\lambda_R = 1 + \mathrm{Cov}(z_1, z_2)/K^2$
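Both identities are one-line computations once the prevalence and the covariance between relatives are known. The values used below are hypothetical, and the function names are illustrative.

```python
def recurrence_risk(prevalence, cov_relatives):
    """James' identity: K_R = K + Cov(z1, z2) / K."""
    return prevalence + cov_relatives / prevalence


def relative_risk(prevalence, cov_relatives):
    """Risch's identity: relative risk = K_R / K = 1 + Cov(z1, z2) / K**2."""
    return 1 + cov_relatives / prevalence ** 2


K = 0.01          # hypothetical population prevalence
cov = 0.0004      # hypothetical covariance in disease state between relatives
print(recurrence_risk(K, cov), relative_risk(K, cov))
```

With these made-up numbers a relative of an affected individual has a 5% risk, five times the population prevalence.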

Searching for QTLs: Marker-Trait Associations
I. Within a pedigree
Key: With linkage, there is an excess of parental gametes.
MQ/mq father: M is associated with QTL allele Q (which increases trait value over q). Comparing mean trait values in offspring receiving paternal M vs. paternal m will show (for a sufficiently large sample) a significant difference.
Since the phase may differ across parents (e.g., the mother might be Mq/mQ), it is critical to contrast marker alleles from each parent separately.
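The size of the expected marker contrast depends on the recombination frequency c between marker and QTL. For an MQ/mq father, offspring receiving M carry paternal Q with probability 1 − c, and those receiving m carry it with probability c, so the expected difference is (1 − 2c) times the effect of receiving Q rather than q. This step is not on the slide; the sketch assumes a fixed average effect of paternal Q over q, and the names are hypothetical.

```python
def expected_marker_contrast(c, effect_q_substitution):
    """Expected mean difference between offspring receiving paternal M vs. paternal m
    from an MQ/mq father, given recombination frequency c between marker and QTL.

    effect_q_substitution: assumed average increase in trait value from receiving
    paternal Q instead of q (hypothetical parameter).
    """
    mean_given_M = (1 - c) * effect_q_substitution  # M gametes carry Q with prob 1 - c
    mean_given_m = c * effect_q_substitution        # m gametes carry Q with prob c
    return mean_given_M - mean_given_m              # = (1 - 2c) * effect


for c in (0.0, 0.1, 0.3, 0.5):
    print(c, expected_marker_contrast(c, effect_q_substitution=2.0))
```

The contrast shrinks as the marker moves away from the QTL and vanishes for unlinked markers (c = 0.5), which is why only linked markers show an association within a family.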

Searching for QTLs: Marker-Trait Associations
II. Population-level linkage disequilibrium
Key: With LD, there is a covariance between alleles.
For very tightly linked markers (less than 1 cM), we might expect some population-level disequilibrium.
Hence, we can contrast (say) M vs. m grouped over all individuals to look for a difference in trait value between the two groups. If the marker locus is sufficiently close to a QTL, LD might be present and a marker-trait association detected.
Complication: Population structure can generate a covariance between unlinked markers.

Key concepts
P = G + E = A + D + I + E
Var(G) = Var(A) + Var(D) + Var(I)
Phenotypic covariances can be used to estimate components of Var(G)
$h^2 = \mathrm{Var}(A)/\mathrm{Var}(P)$ is the heritability of a trait, a measure of how parents & offspring resemble each other
Can use linkage (within a pedigree) or linkage disequilibrium (within a population) to search for QTLs via marker-trait associations
