Polymorphism
Polymorphism: two or more alleles at a locus exist in a population at the same time.
Nucleotide diversity: $\Pi = \sum_{i<j} x_i x_j \pi_{ij}$ considers both the proportion of sites that differ between sequences i and j ($\pi_{ij}$) and the allele frequencies ($x_i$, $x_j$).

Sequence   Sites        Freq (x)
Seq 1      GAGGTGCAAC   0.4
Seq 2      GAGGACCAAC   0.5
Seq 3      GAGCTGGAAG   0.1

Pairwise differences ($\pi_{ij}$):
        2      3
1      0.2    0.3
2             0.5

$\Pi = (0.4)(0.5)(0.2) + (0.4)(0.1)(0.3) + (0.5)(0.1)(0.5) = 0.077$
In theory, under the infinite-sites model: $E(\Pi) = 4N_e\mu$ = frequency of heterozygotes per nucleotide site.
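The worked example above can be checked with a short script. This is a minimal sketch: the function names `prop_diff` and `nucleotide_diversity` are illustrative, and the sum runs over unordered pairs (i < j), as in the slide's arithmetic.

```python
from itertools import combinations

# Sequences and population frequencies from the worked example above.
seqs = {
    1: ("GAGGTGCAAC", 0.4),
    2: ("GAGGACCAAC", 0.5),
    3: ("GAGCTGGAAG", 0.1),
}

def prop_diff(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nucleotide_diversity(seqs):
    """Sum of x_i * x_j * pi_ij over unordered pairs i < j."""
    total = 0.0
    for i, j in combinations(sorted(seqs), 2):
        (si, xi), (sj, xj) = seqs[i], seqs[j]
        total += xi * xj * prop_diff(si, sj)
    return total

print(round(nucleotide_diversity(seqs), 3))  # 0.077
```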
Nucleotide diversity is low in humans
Polymorphism is also estimated by K, the number of segregating (variable) sites in a sample of alleles.
ATCCGGCTTTCGA
ATCCGAATTTCGA
ATTCGCCTTTCGA
K = 3 for these three sequences.
In theory: $E(K) = 4N_e\mu \cdot a$, where $a = 1 + 1/2 + 1/3 + \dots + 1/(n-1)$ and n is the number of sequences sampled.
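As a quick check of the example, K and a can be computed directly from the three aligned sequences. This is a sketch with illustrative function names. Dividing K by a gives the quantity that Tajima's test later compares with Π.

```python
def segregating_sites(alignment):
    """Count alignment columns that contain more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*alignment))

def harmonic_a(n):
    """a = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))

alignment = ["ATCCGGCTTTCGA", "ATCCGAATTTCGA", "ATTCGCCTTTCGA"]
K = segregating_sites(alignment)
a = harmonic_a(len(alignment))
print(K, a, K / a)  # 3 1.5 2.0
```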
Coalescent Process
[Figure: gene tree with coalescence intervals t2, t3, t4, t5]
$t_m$ is the time for coalescence from m to m - 1 sequences.
Coalescent Process
[Figure: gene tree with branches labeled a through h]
The genealogy of n sequences has 2(n - 1) branches: n are external and n - 2 are internal.
How long will the coalescence process take?
Simplest case: pick two gene copies at random. The probability that the second derives from the same copy in the previous generation as the first is $1/(2N)$. This is the probability that two alleles coalesce in the previous generation. It follows that $1 - 1/(2N)$ is the probability that the two sequences were derived from different sequences in the preceding generation. Therefore, the probability that two sequences derive from the same ancestor 2 generations ago (a grandparent) is $[1 - 1/(2N)] \times 1/(2N)$. It can be shown that the probability that two sequences were derived from the same ancestor t generations ago is:
$[1 - 1/(2N)]^{t-1} \times \frac{1}{2N} \approx \frac{1}{2N} e^{-t/(2N)}$ (for large N)
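The exact probability and its exponential approximation can be compared numerically. The sketch below uses the exponent t - 1, so that t = 1 gives 1/(2N) and t = 2 gives [1 - 1/(2N)] × 1/(2N). The population size N = 1000 and the function names are hypothetical choices for illustration.

```python
import math

def p_coalesce(t, N):
    """Exact probability that two gene copies first share an ancestor t generations ago."""
    p = 1.0 / (2 * N)
    return (1.0 - p) ** (t - 1) * p

def p_coalesce_approx(t, N):
    """Exponential approximation, intended for large N."""
    return math.exp(-t / (2 * N)) / (2 * N)

N = 1000  # hypothetical diploid population size
for t in (1, 100, 2000, 10000):
    print(t, p_coalesce(t, N), p_coalesce_approx(t, N))

# Mean waiting time, summed far enough back that the remaining tail is negligible.
mean_t = sum(t * p_coalesce(t, N) for t in range(1, 200 * N))
print(mean_t)  # close to 2N = 2000
```

The mean of this distribution is where the average pairwise coalescence time of 2N generations comes from.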
Consider the probability of common ancestry g generations ago: $[1 - 1/(2N)]^{g-1} \times 1/(2N)$. Because N is in the denominator, the probability depends on population size.

Generations ago   Prob (N = 5)   Prob (N = 10)
1                 0.400          0.200
2                 0.320          0.182
3                 0.256          0.162

It can be shown that the average time back to common ancestry of a pair of genes in a diploid population is 2N generations, and the average time back to common ancestry of all gene copies is 4N generations.
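The 2N and 4N averages can be illustrated with a result that is not stated on the slide but is standard for the neutral coalescent in a large diploid population: while m lineages remain, the expected waiting time to the next coalescence is 4N / (m(m - 1)) generations. The sketch below sums these intervals. The function names and N = 100 are hypothetical.

```python
from fractions import Fraction

def expected_interval(m, N):
    """Expected generations while m lineages remain: 4N / (m (m - 1))."""
    return Fraction(4 * N, m * (m - 1))

def expected_tmrca(n, N):
    """Expected generations back to the common ancestor of n sampled copies."""
    return sum(expected_interval(m, N) for m in range(2, n + 1))

N = 100  # hypothetical diploid population size
print(expected_tmrca(2, N))             # 200, i.e. 2N
print(float(expected_tmrca(2 * N, N)))  # 398.0, approaching 4N = 400
```

The sum telescopes to 4N(1 - 1/n), which is 2N for a pair and approaches 4N as the sample grows to include all 2N gene copies.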
[Figure: time back to common ancestor in a large versus a small population]
Coalescence with no mutation
The average degree of relationship increases with time. All of the gene copies in a population can be traced back to a single ancestral gene. A population will eventually become monomorphic for one allele or another, and the probability that a given allele is the one fixed is determined by the initial allele frequencies.
Coalescence with mutation
If each lineage experiences $\mu$ mutations per generation, then the number of base pair differences between two gene copies will be #dif = $2\mu t_{CA}$, where $t_{CA}$ is the time back to their common ancestor. If the average time to coalescence is 2N generations for two randomly chosen gene copies, then #dif = $2\mu(2N) = 4N\mu$. Therefore, expect the average number of base pair differences between gene copies to be greater in a larger population.
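To make the dependence on population size concrete, the arithmetic can be run for a few values of N. This is only an illustration: the mutation rate of 1e-6 per lineage per generation and the function name are hypothetical.

```python
def expected_pairwise_differences(mu, N):
    """Expected differences between two gene copies: 2 * mu * (2N)."""
    t_ca = 2 * N          # average generations back to the common ancestor
    return 2 * mu * t_ca  # mutations accumulate along both lineages

mu = 1e-6  # hypothetical mutations per lineage per generation
for N in (1_000, 10_000, 100_000):
    print(N, expected_pairwise_differences(mu, N))
```

Each tenfold increase in N gives a tenfold increase in the expected number of differences.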
Total length of branches of gene tree
I + L = J (internal branches + external branches = total time length)
Now consider mutation among branches during the coalescent process.
$\eta_i + \eta_e = \eta$ (mutations on internal branches + mutations on external branches = total number of mutations in gene tree)
In theory, the total number of mutations equals the number of segregating sites (K).
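The branch bookkeeping can be checked on a simulated genealogy. The sketch below makes assumptions beyond the slides: it uses the usual continuous-time approximation, drawing exponential waiting times with mean 4N / (k(k - 1)) generations while k lineages remain, and the function name `simulate_gene_tree` is illustrative. It confirms that a tree of n sequences has n external and n - 2 internal branches, 2(n - 1) in total, and that I + L = J.

```python
import random

def simulate_gene_tree(n, N, rng):
    """Simulate one neutral genealogy of n gene copies, going back in time.

    Returns counts and total lengths (in generations) of external and
    internal branches, plus the total tree length summed interval by interval.
    """
    lineages = [(0.0, True)] * n  # (time the lineage starts, is it a sampled copy?)
    now = 0.0
    n_ext = n_int = 0
    len_ext = len_int = total = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time with k lineages; mean 4N / (k (k - 1)) generations.
        wait = rng.expovariate(k * (k - 1) / (4.0 * N))
        now += wait
        total += k * wait
        # Two randomly chosen lineages merge, which closes two branches.
        for _ in range(2):
            start, is_leaf = lineages.pop(rng.randrange(len(lineages)))
            if is_leaf:
                n_ext += 1
                len_ext += now - start
            else:
                n_int += 1
                len_int += now - start
        lineages.append((now, False))
    return n_ext, n_int, len_ext, len_int, total

rng = random.Random(1)  # fixed seed so the run is repeatable
n_ext, n_int, L, I, J = simulate_gene_tree(8, 1000, rng)
print(n_ext, n_int, n_ext + n_int)  # 8 6 14
print(abs((I + L) - J) < 1e-6)      # True
```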
Testing for Selective Neutrality
Using the difference in estimates of polymorphism to detect deviation from neutrality.
Tajima's Test (1989): $D = \frac{\Pi - K/a}{\sqrt{V(\Pi - K/a)}}$, where the denominator is a normalizing factor.
Rationale: $\Pi$ and K are differentially influenced by the frequency of alleles.
Few alleles at intermediate frequency: $\Pi > K/a$, so D > 0 (balancing selection)
Many low-frequency, variable alleles: $\Pi < K/a$, so D < 0 (directional selection)
Neutral prediction: $\Pi = K/a$, so D = 0
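A minimal sketch of the calculation follows. Two things are assumptions that do not come from the slides. First, the normalizing factor is computed with the variance estimator constants from Tajima (1989), which the slides do not spell out. Second, Π is taken as the mean number of differences over all pairs of sampled sequences, counted per sequence rather than per site. Both toy alignments are hypothetical data, chosen to show the two patterns described above.

```python
import math
from itertools import combinations

def tajimas_d(alignment):
    """Tajima's D for equal-length aligned sequences (counts per sequence, not per site)."""
    n = len(alignment)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # K: number of segregating sites.
    K = sum(len(set(col)) > 1 for col in zip(*alignment))
    if K == 0:
        raise ValueError("no segregating sites, D is undefined")
    # Pi: mean number of differences over all pairs of sampled sequences.
    pairs = list(combinations(alignment, 2))
    Pi = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    # Constants of Tajima's (1989) variance estimator.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (Pi - K / a1) / math.sqrt(e1 * K + e2 * K * (K - 1))

# Hypothetical sample in which every variant is rare (one copy each).
rare = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "AAAAAAATAA", "AAAAAATAAA"]
# Hypothetical sample with two haplotypes at intermediate frequency.
common = ["AAAAAAAAAA"] * 2 + ["AAAAAATTTT"] * 3
print(tajimas_d(rare) < 0, tajimas_d(common) > 0)  # True True
```

Both samples have K = 4, so the sign of D is driven entirely by how the same number of variable sites is distributed among the sequences.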
Fu and Li's Test (1993)
Using the difference in the number of mutations in the gene tree to detect deviation from neutrality.
$D = \frac{\eta_i - (a - 1)\eta_e}{\sqrt{V[\eta_i - (a - 1)\eta_e]}}$
Rationale: in a neutral gene tree, the number of mutations on interior branches ($\eta_i$) is expected to equal (a - 1) times the number on exterior branches ($\eta_e$).
Few alleles at intermediate frequency: excess of interior mutations ($\eta_i$), so D > 0 (balancing selection)
Many low-frequency, variable alleles: excess of exterior mutations ($\eta_e$), so D < 0 (directional selection)
Neutral prediction: D = 0
Gene genealogies under no selection, positive selection, balancing selection, and background selection.
No selection: 7 neutral mutations have accumulated since the time of the last common ancestor. D = 0
Consider the Effects of Selection on Neutral Sites Linked to a Selected Site
Positive selection: neutral variation at linked sites will be eliminated (swept away) as the advantageous allele is quickly fixed in the population. This process is also called hitchhiking.
[Figure: gene genealogy through time; D < 0]
Balancing selection: neutral variation at linked sites accumulates during the long period of time that both allele lineages are maintained.
[Figure: gene genealogy through time; D > 0]
Background selection: gene lineages become extinct not only by chance, but also because they are linked to deleterious mutations, whose removal eliminates some gene copies.
[Figure: gene genealogy through time; D < 0]
Problem: background selection and hitchhiking are contrasting processes that lead to the same pattern (reduced polymorphism and D < 0). How to differentiate? As a rule of thumb, dramatic reductions in polymorphism point to hitchhiking, while less dramatic reductions point to background selection.