Allele frequency Time.

Presentation on theme: "Allele frequency Time."— Presentation transcript:

1 Allele frequency Time

2 Allele frequency

3 Ewens Watterson test
An empirical sample of size N gives:
1) an estimate of homozygosity, $f_{obs} = \sum_i x_i^2$, where $x_i$ is the frequency of allele i
2) the number of different alleles, n
Apply these values to the expressions.
Compare $f_{obs}$ with $f_{equilib}$ by simulation (bootstrap resampling).
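
The observed half of the test can be sketched in a few lines. This is a minimal illustration, not the full Ewens-Watterson procedure: it only computes the observed homozygosity and the number of distinct alleles from a sample, and the allele labels and counts are made up for the example.

```python
from collections import Counter

def observed_homozygosity(alleles):
    """Return (f_obs, n_alleles) for a list of allele labels.

    f_obs is the sum of squared allele frequencies in the sample.
    """
    counts = Counter(alleles)
    sample_size = sum(counts.values())
    f_obs = sum((c / sample_size) ** 2 for c in counts.values())
    return f_obs, len(counts)

# Hypothetical sample of 10 alleles scored by electrophoretic state.
sample = ["Fast"] * 6 + ["Slow"] * 3 + ["Null"] * 1
f_obs, n_alleles = observed_homozygosity(sample)
print(f_obs, n_alleles)
```

The expected homozygosity under neutrality would then come from simulating samples with the same size and number of alleles, which this sketch does not attempt.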

4

5

6

7 Allele frequency Time

8 Single Nucleotide Polymorphisms and site frequency
Ingman et al. 2000, Nature

9 Nine polymorphic sites in human Y-chromosomes
N = 24, n = 9 haplotypes, S = 9 segregating sites, π = , θ =

10 Allele (Site) frequency distribution
Sequence 20 alleles, record frequency of each SNP.
Frequency spectra: excess of low frequency alleles; neutral (Ewens) frequency spectrum; excess of intermediate frequency alleles.
Ewens (1972): expression for allele frequencies given sample size and number of observed alleles at mutation-drift equilibrium (expected).
Watterson (1977): compare expected (Ewens) and observed homozygosity, $H = \sum_i x_i^2$.
Slatkin (1994): exact test version of the Ewens-Watterson test.
Nielsen (2005) Ann. Rev. Genet. 39:197

11 Tajima’s Test
Example sample: Joe, Mary, Scott, Anne, David.
π = average number of differences between all pairs of sequences. Affected by site (allele) frequency.
θ = variation based on the number of segregating sites, $\theta = S / \sum_{i=1}^{n-1} 1/i$. Independent of site frequency.
π − θ < 0: π − θ is negative.
π − θ > 0: π − θ is positive.
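
As a rough sketch of how π, θ and Tajima's D relate, the following computes all three from a small alignment. The variance constants follow Tajima (1989); the five four-site sequences are invented for illustration (each of the first four carries one singleton), and only the sample names come from the slide.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Return (pi, theta_w, d) for equal-length aligned sequences."""
    n = len(seqs)
    n_sites = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for j in range(n_sites) if len({seq[j] for seq in seqs}) > 1)
    # pi: average number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1  # estimate based on the number of segregating sites
    if s == 0:
        return pi, theta_w, float("nan")  # D is undefined without variation
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d

# Hypothetical sequences: four singletons, so low-frequency variants dominate.
sample = {"Joe": "TAAA", "Mary": "ATAA", "Scott": "AATA",
          "Anne": "AAAT", "David": "AAAA"}
pi, theta_w, d = tajimas_d(list(sample.values()))
print(pi, theta_w, d)
```

Because every variant here is a singleton, π falls below θ and D comes out negative, the pattern the slide associates with an excess of low frequency alleles.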

12 Human mtDNA tree (Ingman et al. 2000, Nature)
Neutral, ancestral: African, $D_{Taj} = -0.72$, P > 0.10
Admixture: Africa + E. Asia, $D_{Taj} = +0.08$, P > 0.10
Founder event: New World, $D_{Taj} = -2.45$, P < 0.01
Tajima (1989): π − θ
Fu & Li (1993): external branches ≈ $4N_e\mu = \theta$; selection affects the number of external branches
Both are sensitive to demographic changes and admixture

13 Human MHC: Human Leukocyte Antigen (HLA) complex
Deficiency of homozygosity (excess heterozygosity). Positive Tajima’s D. Balancing selection favoring heterozygotes, which are better able to mount diverse immune responses.

14 Ewens-Watterson tests vs. Tajima’s D test
What’s the difference? Allele frequency vs. sequence difference.
The Ewens-Watterson test uses alleles of ‘state’ (Fast, Slow, etc.).
Tajima’s D uses nucleotide divergence among alleles, which gives more power.
Garrigan and Hedrick 2003

15 Tajima’s D, Selection and Demography
Polymorphism and divergence of mutations (figure axes: allele frequency vs. time)
Selective force, $\theta_\pi$ vs. $\theta_S$, and Tajima’s D value:
Neutrality: $\theta_\pi = \theta_S$, D = 0
Deleterious mutations: $\theta_\pi < \theta_S$, D negative
Beneficial (sweep): $\theta_\pi < \theta_S$, D negative
Balancing selection: $\theta_\pi > \theta_S$, D positive
Tajima’s D value, selective force and demographic force:
D < 0: selective sweep (demographic: bottleneck; expansion)
D < 0: deleterious mutations (demographic: bottleneck; expansion)
D > 0: balancing selection (demographic: population mixture)

16 Tajima’s D, Selection and Demography
Tajima’s D value, selective force and demographic force:
D < 0: selective sweep (demographic: bottleneck; expansion)
D < 0: deleterious mutations (demographic: bottleneck; expansion)
D > 0: balancing selection (demographic: population mixture)
How do you use the genome to tell the difference?
Compare multiple loci: selection acts on individual genes; demographic forces act on all genes.
Polymorphism and divergence of mutations (figure axes: allele frequency vs. time)

17 Polymorphism and divergence
polymorphism = difference within species
divergence (fixed difference) = difference between species
Figure: aligned sequences from Species 1 and Species 2, with sites labeled as a fixed difference, polymorphic in species 1, and polymorphic in both species.

18 Hudson Kreitman Aguadé 1987
Multilocus approach: Jody Hey’s web page
Maximum likelihood approach: Wright & Charlesworth (2004)
Demography affects the entire genome; selection acts on single (few) loci

19 HKA Test: the “footprint” of balancing selection at Adh in Drosophila
Kreitman and Hudson (1991) Genetics 127:565-82
Figure: HKA test of polymorphism at the Adh locus and Adh-dup around the Fast/Slow polymorphism (values shown: 16, 20, 50, 13; P < 0.02).
Adjacent silent sites are in linkage disequilibrium; distant sites are in linkage equilibrium.

20 Selective sweep in the domestication of maize from teosinte
Adapted from R.-L. Wang, et al., 1999.

21 The effect of recombination on levels of polymorphism
Figure: DNA polymorphism (axis 0.000 to 0.012) and rate of recombination at marker loci, plotted against physical position along the 3rd chromosome (Aquadro, Begun & Kindahl, 1994).
Figure: DNA polymorphism and DNA divergence by locus against rate of recombination, P < 0.05 (Begun & Aquadro 1992).

22 Reduced polymorphism due to selective sweeps and background selection
Reduced polymorphism due to selective sweeps (adaptive mutations and hitchhiking); * = beneficial mutation
No recombination: polymorphism removed
Free recombination: locally reduced variation
Reduced polymorphism due to background selection eliminating deleterious mutations; X = deleterious mutation, leaving “mutation-free” chromosomes
No recombination: polymorphism removed
Free recombination: little effect

23 Recombination rate: supports hitchhiking model (weakly)
419 genes, 24 alleles per gene
Figure panels: DNA polymorphism vs. recombination rate; Tajima’s D vs. recombination rate; DNA divergence vs. recombination rate

24 Gene trees and evolutionary hypotheses
Mitochondrial “eve”-type estimations require rate constancy between species and neutrality within species.
What is a neutral reference locus?
Rand et al Genetics 138:

25 Testing for selection in mtDNA
The genetic code and DNA “phenotypes”

26 Polymorphism in mtDNA and the MK test
Compare dN/dS ‘within’ (population or family) with dN/dS ‘between’ (sister species or unrelated strain).
Polymorphism and divergence of mutations (figure axes: allele frequency vs. time)
Type of mutation: polymorphic within populations? / fixed between species?
Neutral: Yes / Yes
Beneficial: No / Yes
Deleterious: Yes / No
Balanced: Yes / Yes & No
Neutrality Index (NI) = (dN/dS ‘within’) / (dN/dS ‘between’)
NI < 1.0 implies positive selection
NI > 1.0 implies negative selection (opposite of simple dN/dS)
Rand & Kann (1996) MBE; Rand (2008) PLoS Biology
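
The Neutrality Index ratio can be illustrated with a small helper. This is a sketch under the assumption that the within and between ratios are formed from raw counts of nonsynonymous and synonymous changes; the counts below are invented and the parameter names are not from the slides.

```python
def neutrality_index(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn):
    """NI = (nonsyn/syn within species) / (nonsyn/syn between species)."""
    within = poly_nonsyn / poly_syn      # ratio among polymorphic sites
    between = fixed_nonsyn / fixed_syn   # ratio among fixed differences
    return within / between

# Hypothetical counts: excess amino acid polymorphism relative to divergence.
ni = neutrality_index(poly_nonsyn=10, poly_syn=20, fixed_nonsyn=5, fixed_syn=40)
print(ni)
```

With these made-up counts NI is well above 1, the direction the slide reads as negative selection against mildly deleterious amino acid variants.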

27 Polymorphism and Divergence at Silent and Replacement Sites
Rand&Kann 1996 MBE 13: => mildly deleterious => advantageous

28

29 N.I. is very sensitive to selection
Sawyer & Hartl (1992); Akashi (1995); Nachman (1998); Weinreich & Rand (2000); Kimura (1983)
Figure axis: Negative, Neutral, Positive
Stopped 3/8/11

30 MK tests for mtDNA have excess amino acid polymorphism
NI > 1 (negative selection). Rand & Kann 1996, 1998; Weinreich & Rand 2000

31 Drift, Draft, and apparent positive selection
Sweeps remove variation, but fix common variants.
ND (complex I) genes have lower constraint (more variation).
Sweeps and draft may fix an excess of mildly deleterious variants.
Meiklejohn, Montooth, Rand 2007 Trends in Genetics.

32 Fay & Wu (2002): A frequency twist to the MK test
The distribution of site frequencies is important Type of mutation Rare? Intermediate? Common? Yes Yes yes No More likely Yes No No No Yes No

33 419 genes, 24 alleles sequenced/gene, compared to D. simulans

34

35 How does DNA evolve?
Alignment of human (upper case) and chimp (lower case) sequences:
Human   1 ATGCCCCAACTAAATACTACCGTATGGCCCACCATAATTACCCCCATACT 50
          |||||||||||||||||  ||||||| |||||||||||||||||||||||
Chimp   1 atgccccaactaaataccgccgtatgacccaccataattacccccatact 50
Human  51 CCTTACACTATTCCTCATCACCCAACTAAAAATATTAAACACAAACTACC 100
          ||| |||||||| ||| ||||||||||||||||||||||  |||| ||||
Chimp  51 cctgacactatttctcgtcacccaactaaaaatattaaattcaaattacc 100
Human 101 ACCTACCTCCCTCACCAAAGCCCATAAAAATAAAAAATTATAACAAACCC 150
          | ||||| ||||||||||| ||||||||||||||||| || || ||||||
Chimp 101 atctacccccctcaccaaaacccataaaaataaaaaactacaataaaccc 150
Human 151 TGAGAACCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACA 200
          ||||||||||||||||||||||||| ||||||||||||  ||||||||||
Chimp 151 tgagaaccaaaatgaacgaaaatctattcgcttcattcgctgcccccaca 200
Human 201 ATCC 204
          ||||
Chimp 201 atcc 204

36 Measuring DNA Evolution
Align sequences between species
Determine length of sequences, L
Count number of differences
Divergence = proportion of differences: D = p-distance = (number of differences) / (length of sequence)
Rate of divergence = (sequence divergence) / (age of common ancestor) = D / time
Rate of substitution = D / (2 × time)
Example: 5 differences in 100 sites, D = 0.05, t = 6 million years
Rate of divergence = $0.05 / (6 \times 10^{6}) = 8.3 \times 10^{-9}$ per site per year
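
The worked example can be reproduced with a short sketch. The two 100-site sequences are placeholders built to differ at exactly 5 sites; the function and variable names are illustrative only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return differences / len(seq_a)

# Placeholder sequences: 100 sites, 5 differences.
species_1 = "A" * 100
species_2 = "A" * 95 + "G" * 5

d = p_distance(species_1, species_2)
years = 6e6                                # age of common ancestor
rate_of_divergence = d / years             # both lineages combined
rate_of_substitution = d / (2 * years)     # per lineage
print(d, rate_of_divergence, rate_of_substitution)
```

Dividing by twice the time reflects that differences accumulate along both lineages since the common ancestor.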

37 Jukes Cantor One parameter model
$\alpha$ = rate of substitution
$P_{AA}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$ = probability that A remains A at time t
$P_{NN} = \frac{1}{4} + \frac{3}{4} e^{-8\alpha t}$ = probability that two sequences have the same nucleotide at site N
D = proportion of different nucleotides = $1 - P_{NN}$
$\hat{D} = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$
$K = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$, where p = proportion of nucleotide differences (# diffs. / total bp)
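
A minimal sketch of the Jukes-Cantor correction, applied to an illustrative observed proportion of p = 0.05 (the function name is an assumption, not from the slides):

```python
from math import log

def jukes_cantor_distance(p):
    """K = -3/4 * ln(1 - 4p/3), for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * log(1 - 4 * p / 3)

k = jukes_cantor_distance(0.05)
print(k)
```

The corrected distance K is always at least as large as p, because some sites have been hit more than once; the gap widens as p approaches the saturation value of 0.75.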

38 Kimura two-parameter model
$\alpha$ = rate of transition substitution
$\beta$ = rate of transversion substitution
$P_{AA}(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta) t}$ = probability that A remains A at time t
$K = \frac{1}{2} \ln\frac{1}{1 - 2P - Q} + \frac{1}{4} \ln\frac{1}{1 - 2Q}$, where P = proportion of transitional differences and Q = proportion of transversional differences
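
The two-parameter distance can be sketched the same way. The proportions passed in below are illustrative; the function name is hypothetical.

```python
from math import log

def kimura_2p_distance(p_transitions, q_transversions):
    """K = 1/2 ln(1/(1 - 2P - Q)) + 1/4 ln(1/(1 - 2Q))."""
    P, Q = p_transitions, q_transversions
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("differences too large for the correction")
    return 0.5 * log(1 / (1 - 2 * P - Q)) + 0.25 * log(1 / (1 - 2 * Q))

# Illustrative case: 4% transitional and 1% transversional differences.
k2p = kimura_2p_distance(0.04, 0.01)
print(k2p)
```

When transitions make up one third of the differences (P = p/3, Q = 2p/3), the expression collapses to the one-parameter Jukes-Cantor formula, which is a convenient sanity check.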

39

40 Comparison of models
P-distance; Jukes-Cantor; Kimura 2-parameter; Tamura-Nei; etc.

41 Molecular clocks
Approximately constant rate of divergence of proteins
Rate of substitution = mutation rate × proportion of neutral mutations: $k = \mu f_0$
“Saturation” due to multiple hits in DNA evolution

42 Anatomy of a phylogenetic tree
Terminal (external) nodes: taxa = OTUs = operational taxonomic units (Taxon1 to Taxon6)
Polytomy: non-dichotomous splitting
External branch; internal branch; internal nodes; root

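43 Relative rate test
$K_{AC} = K_{BC}$; $K_{OC}$ is shared
Tajima test: $(m_1 - m_2)^2 / (m_1 + m_2)$, chi-square, df = 1
Figure: tree with ancestor O, branch m1 leading to Species A, branch m2 leading to Species B, and outgroup Species C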
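
A short sketch of the Tajima relative rate statistic follows. The counts m1 and m2 are invented, and the p-value uses the identity that a chi-square variable with 1 degree of freedom has upper tail probability erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

def tajima_relative_rate(m1, m2):
    """Return (chi_square, p_value) for changes m1 and m2 on two lineages."""
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    p_value = erfc(sqrt(chi_square / 2))  # upper tail, chi-square with df = 1
    return chi_square, p_value

# Hypothetical counts: 20 changes on the lineage to A, 8 on the lineage to B.
chi_square, p_value = tajima_relative_rate(20, 8)
print(chi_square, p_value)
```

Equal counts give a statistic of 0 and a p-value of 1, which is the clock-like expectation that the two lineages have accumulated changes at the same rate.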

44

45 DNA test of neutrality
Neutral prediction: amino acid (nonsynonymous) substitution rate (dN) should be lower than silent (synonymous) substitution rate (dS). True for most genes. Follows from functional constraint argument.
Different for Major Histocompatibility Complex (MHC) loci:
Antigen recognition sequence shows dN > dS. Amino acid mutations are favored in the antigen recognition region, which promotes diversity and better recognition of foreign peptides.
Rest of molecule shows dN < dS, as expected.
Antigen binding sites: dN/dS > 1, “positive” selection
Rest of molecule: dN/dS < 1, negative (purifying) selection

46 The coalescent: the genealogy of alleles as descendants from a single common ancestral DNA strand
Sampled lineages randomly ‘pick’ parents
Same parents ‘picked’ = coalescence
Last parents ‘picked’ = most recent common ancestor (MRCA)
More lineages picking parents: faster coalescence
More parents to pick from = larger population size: slower coalescence
Rosenberg & Nordborg (2003)

47 Drift vs. coalescence
Drift: 1 allele drifts to fixation in 5 generations, going forward
Coalescence: 5 alleles coalesce (*) to a common ancestral allele 5 generations ago

48 Expected time that n alleles persist
$E(T_n) = 4N / [n(n-1)]$

49 Felsenstein’s box of bugs model
Size of box ~ Ne number of bugs ~ number of alleles

50 Time to next coalescent event
$N_e$ = effective population size
n = number of alleles in sample
$T_n = 4N_e / [n(n-1)]$
$T_2 = 4N_e / [2(1)] = 2N_e$
$T_3 = 4N_e / [3(2)] = 2N_e/3$
$T_4 = 4N_e / [4(3)] = 2N_e/6$
$T_5 = 4N_e / [5(4)] = 2N_e/10$
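
The pattern in these expected times can be tabulated with a short sketch; the effective population size of 1000 is an arbitrary illustrative value.

```python
def expected_coalescence_time(n, ne):
    """Expected generations during which n lineages persist: 4Ne / (n(n-1))."""
    return 4 * ne / (n * (n - 1))

ne = 1000  # illustrative effective population size
times = {n: expected_coalescence_time(n, ne) for n in range(2, 6)}
tree_height = sum(times.values())  # expected time back to the MRCA of 5 alleles
print(times, tree_height)
```

The last interval, with only two lineages left, takes 2Ne generations on average and so accounts for more than half of the expected depth of the tree.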

51 Generating random coalescent trees
i = 6 alleles
Generate i − 1 random numbers x, 0 < x < 1
Rank them, and assign coalescent times $T_i = -2\ln(1-x) / [i(i-1)]$
Hedrick, pg 352: x6 = 0.22, T6 = ; x5 = 0.57, T5 =
Units of 2N generations back in time.
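
One way to sketch this recipe is shown below. It is a simplification: it draws one uniform random number per coalescent interval and applies the formula directly, without the ranking step mentioned on the slide, and the function names are hypothetical.

```python
import random
from math import log

def coalescent_time(i, x):
    """Waiting time (units of 2N generations) while i lineages remain."""
    return -2 * log(1 - x) / (i * (i - 1))

def random_coalescent_times(n_alleles, rng):
    """Draw one waiting time for each interval i = n_alleles, ..., 2."""
    return {i: coalescent_time(i, rng.random())
            for i in range(n_alleles, 1, -1)}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
times = random_coalescent_times(6, rng)
for i, t in times.items():
    print(i, t)
```

Each draw is an exponential waiting time with mean 2 / (i(i − 1)) in units of 2N generations, which matches the expected times of 4N / (i(i − 1)) generations.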

52

53 Mutation on genealogy
Allele frequency in the sample of 5 alleles: f(A2) = 1/5, f(A1) = 2/5, f(A3) = 2/5

54

55

56

57

58 Linkage and Linkage Disequilibrium
Linkage equilibrium: AB = 25%, Ab = 25%, aB = 25%, ab = 25%; f(Ab) = f(A) × f(b)
Linkage disequilibrium: AB = 50%, Ab = 0%, aB = 0%, ab = 50%; f(Ab) ≠ f(A) × f(b)

59

60 A locus and B locus are in Linkage Disequilibrium
D = f(AB) × f(ab) − f(Ab) × f(aB)
D is maximal with no recombination; D → 0 with free recombination (linkage equilibrium).
When allele frequencies are intermediate, f(A) = f(a) = f(B) = f(b) = 0.5, and maximal LD occurs so that no recombinants are present, f(AB) = f(ab) = 0.5, so D = 0.5 × 0.5 − 0.0 × 0.0 = 0.25.
When allele frequencies are skewed, f(A) = 0.9, f(a) = 0.1, f(B) = 0.9, f(b) = 0.1, and maximal LD occurs so that no recombinants are present, D is less than 0.25: f(AB) = 0.9 and f(ab) = 0.1, so D = 0.9 × 0.1 − 0.0 × 0.0 = 0.09.
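
The two worked cases can be checked with a small sketch; the function name and argument order are illustrative.

```python
def linkage_disequilibrium(f_AB, f_Ab, f_aB, f_ab):
    """D = f(AB) * f(ab) - f(Ab) * f(aB) from the four haplotype frequencies."""
    return f_AB * f_ab - f_Ab * f_aB

d_intermediate = linkage_disequilibrium(0.5, 0.0, 0.0, 0.5)   # no recombinants
d_skewed = linkage_disequilibrium(0.9, 0.0, 0.0, 0.1)         # no recombinants
d_equilibrium = linkage_disequilibrium(0.25, 0.25, 0.25, 0.25)
print(d_intermediate, d_skewed, d_equilibrium)
```

The skewed case shows that the maximum attainable D depends on allele frequencies, so raw D values are not directly comparable between locus pairs with different frequencies.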

61 Linkage disequilibrium (LD) decays with distance and time
Gametes from an AB/ab double heterozygote: AB = (1−r)/2, Ab = r/2, aB = r/2, ab = (1−r)/2
r = rate of recombination
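
The decay over time can be sketched with the standard result that, under random mating, D shrinks by a factor of (1 − r) each generation. That recursion is not written on the slide and is brought in here as an assumption; the starting value and recombination rates are illustrative.

```python
def ld_after(d0, r, generations):
    """Expected D after a number of generations: D_t = D_0 * (1 - r) ** t."""
    return d0 * (1 - r) ** generations

# Tightly linked loci (r = 0.01) keep LD far longer than unlinked ones (r = 0.5).
for r in (0.01, 0.5):
    print(r, [round(ld_after(0.25, r, t), 4) for t in (0, 1, 10, 100)])
```

Since r increases with physical distance between loci, the same recursion accounts for the decay of LD with distance as well as with time.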

62 Mutations on a genealogy
Mutation events are proportional to branch length Mutations on external branches define “singletons” Mutations on internal branches define “non-singletons”

63 Heterozygosity on External vs. Internal Branches
Fu and Li test
Total length of external branches = $4N_e$
Total mutations on external branches = $4N_e u$
Total length of ALL branches = $4N_e a$, where $a = \sum_{i=1}^{n-1} 1/i$
Total mutations on ALL branches = $4N_e a u$
Total mutations on internal branches = $4N_e u a - 4N_e u = 4N_e u (a - 1)$
Fu & Li statistic ~ difference between expected number of external and internal mutations: a measure of heterozygosity contributed by singletons vs. heterozygosity contributed by non-singletons
$G = \dfrac{E(\text{external}) - E(\text{internal})/(a-1)}{\sqrt{\mathrm{Var}\left[E(\text{external}) - E(\text{internal})/(a-1)\right]}}$

