The Neutral Theory M. Kimura, 1968


The Neutral Theory (M. Kimura, 1968)
Most observed DNA substitutions are likely to be neutral (= no effect on fitness).
Polymorphisms arise through new mutations:
- deleterious mutations are rapidly removed by purifying selection, which must be pervasive
- positive selection is likely to be rare
- thus, the remaining observed polymorphisms are likely to be neutral
Given a ~constant mutation rate, the # of substitutions can be converted into the time of divergence since speciation = molecular clock theory.
Neutral changes evolve by genetic drift, not natural selection.
* Most are probably lost; some can become fixed in the population
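The statement that most neutral changes are lost while a few become fixed can be illustrated with a small simulation. This is only a sketch under assumptions not stated on the slide: a Wright-Fisher population of constant size, non-overlapping generations, and a mutation that starts as a single copy. The function names and parameter values are hypothetical.

```python
import random

def neutral_fate(pop_size, rng):
    """Follow one new neutral mutation in a Wright-Fisher population.

    pop_size is the number of diploid individuals (2 * pop_size gene copies).
    Returns True if the mutation eventually fixes, False if it is lost.
    """
    copies = 2 * pop_size
    count = 1  # a new mutation starts as a single copy
    while 0 < count < copies:
        freq = count / copies
        # Every gene copy of the next generation is drawn at random from the
        # current gene pool, so the new count is a binomial sample (pure drift).
        count = sum(rng.random() < freq for _ in range(copies))
    return count == copies

def fixation_fraction(pop_size, trials, seed=1):
    """Fraction of independent new neutral mutations that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(neutral_fate(pop_size, rng) for _ in range(trials))
    return fixed / trials

frac = fixation_fraction(pop_size=10, trials=2000)
print(f"fixed in {frac:.3f} of runs; neutral expectation 1/(2N) = {1 / 20:.3f}")
```

Under these assumptions the fixed fraction should scatter around 1/(2N), so the large majority of new neutral mutations disappear by chance alone.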

The Nearly-Neutral Theory (T. Ohta, 1973)
Many slightly deleterious (or slightly advantageous) substitutions are not selected against efficiently if population sizes are small.
* Small populations are more subject to drift (i.e. random sampling events).
* Selection acts slowly on weakly selected alleles, and takes more generations to play out in large populations … therefore many weakly deleterious substitutions have yet to be removed by selection.
** Therefore the population size N is an important concept (or, more importantly, the ‘effective’ population size Ne; see more later).
Thus, many substitutions that are nearly neutral can evolve mostly by drift.
** Practically, this means that SOME substitutions found in extant sequences can be slightly deleterious and have yet to be removed.
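To get a feel for how population size changes the fate of a weakly selected mutation, one can evaluate Kimura's diffusion approximation for the fixation probability of a new mutation. The formula is not on the slide; the sketch assumes additive (genic) selection with coefficient s, a diploid population, and an effective size equal to the census size N. The chosen values of s and N are illustrative only.

```python
import math

def fixation_probability(s, pop_size):
    """Kimura's approximation for a new mutation (initial frequency 1/(2N)).

    s is the selection coefficient (negative = deleterious) and pop_size is N,
    assumed here to equal the effective population size Ne.
    """
    if s == 0:
        return 1 / (2 * pop_size)  # neutral case: 1/(2N)
    # (1 - e^(-2s)) / (1 - e^(-4Ns)); expm1 keeps precision for tiny s
    return math.expm1(-2 * s) / math.expm1(-4 * pop_size * s)

s = -0.001  # a weakly deleterious mutation (hypothetical value)
for n in (100, 10_000):
    relative = fixation_probability(s, n) * 2 * n
    print(f"N = {n:>6}: fixation probability is {relative:.3g} x the neutral value")

rel_small = fixation_probability(s, 100) * 200
rel_large = fixation_probability(s, 10_000) * 20_000
```

With these numbers the mutation fixes almost as often as a neutral one when N = 100, but essentially never when N = 10,000, which is why the effective population size matters so much for nearly-neutral changes.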

Theory of Darwinian Evolution (Selectionism)
This opposing theory states that much of inter-specific variation may NOT be neutral but instead be driven by positive selection.

Compensatory changes that follow a deleterious mutation can be positively selected (& are context dependent).

The relative importance/frequency of neutral mutations vs. selected advantageous alleles is still debated.
Clearly both are at work, and they likely fluctuate over time (A. Wagner: cycles of neutral change & positive selection).
Much attention is devoted to cases of positive selection, since it shapes novelty.

Signatures of Selection
Different types of selection leave behind different signatures on the genome.
Negative selection: reduces variation at the affected site(s) but also at neighboring ‘linked’ neutral sites through background selection.

Positive selection through a recent selective sweep: reduces variation flanking the selected site (even if neutral) due to hitchhiking.
* Background selection and hitchhiking are important concepts, because they mean that not all neutral-site evolution is stochastic (drift).

Recent selective sweeps can reduce variation flanking a selected locus
Variation can recover over time due to new mutation and recombination.
Strength of selection & time since the sweep affect the degree of reduced variation.
If the sweep goes to completion, the polymorphism is said to be ‘fixed’ in the species (otherwise it is a ‘partial sweep’).

Recombination rates have a major influence on selection signatures
After many generations, recombination acts to shorten the region of ‘linked’ sequence.
[Figure label: Gametes]
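How quickly recombination erodes the association between a selected site and a linked site can be sketched with the textbook decay of linkage disequilibrium, D_t = D_0 (1 - r)^t. This relation is not on the slide; it assumes random mating and a constant recombination fraction r per generation between the two sites, and the values below are hypothetical.

```python
import math

def ld_after(d0, r, generations):
    """Linkage disequilibrium left after a number of generations,
    starting from d0, with recombination fraction r between the two sites."""
    return d0 * (1 - r) ** generations

def ld_half_life(r):
    """Generations needed for linkage disequilibrium to halve."""
    return math.log(0.5) / math.log(1 - r)

for r in (0.001, 0.01, 0.1):
    print(f"r = {r}: LD halves in about {ld_half_life(r):.0f} generations")
```

Sites far from the selected locus (large r) lose their association quickly, so the region that still carries the signature of a sweep shrinks with time.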

Diversifying selection can increase variation, since >1 extreme allele is selected
  e.g. selection for diverse viral antigens to evade the host immune system
Balancing selection can increase variation by maintaining >1 allele in the population
  e.g. maintained heterozygosity (sickle cell anemia) OR different alleles in different subpopulations due to fluctuating environments

Signatures of Selection
There are also different methods of looking for these signatures:
1. Evolutionary rate within species vs. between species
   e.g. Ka/Ks ratio … McDonald-Kreitman test … HKA test
2. Frequency spectrum: frequency of different alleles in the population
   e.g. Tajima’s D … Fay & Wu’s H … Fu & Li’s D*
3. Linkage disequilibrium & haplotype structure
For all of these tests: compare REAL DATA to a MODEL of what the data should look like under neutral evolution.

Ks (sometimes called dS): # of silent (synonymous) substitutions, i.e. codon changes that encode the SAME amino acid
* often these changes are ASSUMED to be neutral**
* given a constant rate of point mutations, Ks can be used to date a sequence
** now people realize that Ks can also be constrained by things other than the encoded amino acid
Ka (sometimes called dN): # of nonsynonymous substitutions, i.e. codon changes that alter the encoded amino acid
If we assume that Ks reflects the underlying neutral rate of change:
Ka/Ks = 1 … rate of amino-acid changes is the same as the rate of silent changes
* taken to mean NO constraint on gene sequence
Ka/Ks < 1 … rate of amino-acid changes is LESS than the rate of neutral change
* implies deleterious amino-acid changes were removed by purifying selection
* therefore implies constraint on gene sequence
Ka/Ks > 1 … rate of amino-acid changes is GREATER than the rate of silent changes
* implies amino-acid changes have been selected for by positive selection

Ka/Ks test is VERY conservative in identifying positive selection: the ratio is averaged over the whole gene, so a few positively selected codons are easily masked by many constrained ones.
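The distinction between synonymous and nonsynonymous changes can be made concrete with a small counting sketch. It is a deliberate simplification and not a real Ka/Ks estimator: it only counts differing codons between two aligned, in-frame sequences using the standard genetic code, without normalizing by the number of synonymous and nonsynonymous sites or correcting for multiple hits. The sequences and function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def count_codon_differences(seq1, seq2):
    """Return (nonsynonymous, synonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    nonsyn = syn = 0
    for pos in range(0, len(seq1), 3):
        c1, c2 = seq1[pos:pos + 3], seq2[pos:pos + 3]
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1      # same amino acid: silent change
        else:
            nonsyn += 1   # different amino acid: replacement change
    return nonsyn, syn

def interpret_ratio(ka, ks):
    """Map a Ka/Ks value onto the three cases listed on the slide."""
    if ks == 0:
        return "undefined (no synonymous changes)"
    omega = ka / ks
    if omega < 1:
        return "constraint (purifying selection)"
    if omega > 1:
        return "positive selection"
    return "no constraint"

print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))  # GCT>GCC silent, AAA>AGA replacement
print(interpret_ratio(0.1, 0.5))
```

In practice a ratio is never compared to 1 by eye; dedicated software estimates Ka and Ks per site and tests whether the ratio differs significantly from 1.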

McDonald-Kreitman (MK) Test for Coding Regions (McDonald & Kreitman, 1991)
Polymorphism within species vs. divergence between species
Under neutrality, expect the evolutionary rate of a gene to be constant. Therefore, the evolutionary rate within the species should be ~ the rate between species.
Ka/Ks within species should be ~ Ka/Ks between species for neutrally evolving sequences.
The test tabulates the number of non-synonymous and synonymous changes that are polymorphic (within-species differences) or fixed (between-species differences).

[Figure: polymorphism data aligned with outgroup sequence(s)]
4 fixed differences between species (1 non-synonymous, 3 synonymous)
3 polymorphic sites within species (2 non-synonymous, 1 synonymous)

                                    Polymorphic          Fixed
                                    (within species)     (between species)
Number of non-synonymous changes    2                    7
Number of synonymous changes        42                   17
Ratio within species (2/42 = 0.048) is LESS than ratio between species (7/17 = 0.41).
Is this significant? Use statistics to assign a p-value (G-test or Fisher’s exact); here p = 0.007.

MK test for Positive Selection
If the rate of evolution is ~equal WITHIN and BETWEEN species (i.e. insignificant p-value): suggests neutral evolution.
If the rate of evolution WITHIN species is LESS than the rate BETWEEN species:
- could be positive selection within the species being considered, because variation within the species has been reduced by a sweep
- could also be relaxed constraint in the outgroup


If the rate of evolution WITHIN species is GREATER than the rate BETWEEN species, it could be several things:
- loss of selective constraint: no more purifying selection to remove variation
- diversifying selection: selection to increase variation
- balancing selection: may be selection to maintain >1 allele
- * OR it can actually reflect deleterious alleles yet to be removed from the population
* Deleterious alleles are likely at low frequency, so the test can be redone using only polymorphisms at medium frequency (e.g. >10% in the population)

A related approach generalized beyond coding sequences: the HKA test (Hudson, Kreitman, & Aguadé, 1987)
Assumes that the # of segregating sites S within the species is related to the # of divergent sites D between species (dependent on the mutation rate μ and the time T between species).
HKA compares estimated parameters at 2 (or more) genes:
θ = 4 Ne μ
  μ = mutation rate per gene per generation
  Ne = ‘effective’ population size (accounting for skews in pop. dynamics, sex ratios, breeding skews)
Neutral model: all genes share the same Ne and have their own μ

HKA neutral model: all genes have their own μ and should share the same Ne
Gene 1: S (# variable sites WITHIN species 1) = 3
Gene 1: D (# variable sites fixed BETWEEN species) = 4
  θ = 4 Ne μ, where θ is calculated from S1 and μ = D1/2T
Gene 2: S (# variable sites WITHIN species 1) = 8
Gene 2: D (# variable sites fixed BETWEEN species) = 9
  θ = 4 Ne μ, where θ is calculated from S2 and μ = D2/2T
If Ne is the same at both genes, can’t reject the null model. But if Ne is different, this indicates that each gene has a distinct history: reject the null.
p-values come from coalescent simulations
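The logic of the two-gene comparison can be sketched numerically. This is not the HKA test itself (which fits all parameters jointly and assesses significance by simulation); it only rearranges θ = 4 Ne μ with μ = D/2T to show the Ne each gene would imply. Estimating θ from S with Watterson's estimator is an assumption here, and the sample size n and divergence time T are hypothetical values that are not given on the slide.

```python
def watterson_theta(seg_sites, n_sequences):
    """Watterson's estimate of theta from the number of segregating sites."""
    a1 = sum(1 / i for i in range(1, n_sequences))
    return seg_sites / a1

def implied_ne(seg_sites, divergent_sites, n_sequences, generations):
    """Ne implied by theta = 4 * Ne * mu, with mu = D / (2T)."""
    theta = watterson_theta(seg_sites, n_sequences)
    mu = divergent_sites / (2 * generations)
    return theta / (4 * mu)

N_SEQ = 10          # hypothetical sample size
T_GEN = 1_000_000   # hypothetical divergence time in generations

ne_gene1 = implied_ne(3, 4, N_SEQ, T_GEN)   # Gene 1: S = 3, D = 4
ne_gene2 = implied_ne(8, 9, N_SEQ, T_GEN)   # Gene 2: S = 8, D = 9
ratio = ne_gene1 / ne_gene2
print(f"implied Ne, gene 1: {ne_gene1:.0f}")
print(f"implied Ne, gene 2: {ne_gene2:.0f}")
print(f"ratio gene 1 / gene 2: {ratio:.3f}")
```

The ratio reduces to (S1/D1)/(S2/D2), so it does not depend on the assumed n or T; whether a departure from 1 is significant is exactly what the simulation-based p-value is for.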

We can imagine that the choice of genes could have a big impact here …
Multi-locus HKA test (Wright & Charlesworth, Genetics 2004): uses maximum likelihood (ML) to compare many different genes (software available).
Compares the log-likelihood of a model in which ALL genes are evolving neutrally vs. a model in which one or more chosen genes are evolving non-neutrally compared to the rest.

We want to USE the signatures to IDENTIFY loci subject to selection
BUT … some of these signatures can emerge due to demography:
* unknown population structure
* population expansions that change allele frequencies and change the selection efficiency
* population contractions (‘bottlenecks’) that eliminate genetic variation
This violates assumptions of many neutral models:
* Random mating within population (panmictic)
* Mating rate according to Poisson distribution
* Constant population size

Population structure can mimic balancing selection
This might appear as an excess of intermediate-frequency alleles …

… when in fact it’s really two distinct, non-mating populations

Changes in population size can mimic selection as well
Population expansion can mimic positive selection (excess of rare, low-frequency alleles)

Historical bottleneck can mimic balancing selection (excess of intermediate-frequency alleles)

The power of genome-wide scans: demographic effects should be roughly constant across the genome, whereas selection affects specific loci.

Methods based on the Allele Frequency Spectrum
1. For each allele at a given locus, calculate its frequency. Some alleles will be at high frequency in the population, some at low frequency (i.e. very uncommon).
2. Make a histogram of the % of individuals that have alleles of different frequencies, e.g. how many individuals have (any of the) low-frequency alleles?
[Figure from Nielsen, Nat Rev Genet 2005 review]
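One common way to build such a histogram is the folded site frequency spectrum: for every variable site, count how many sampled sequences carry the rarer allele, then tally sites by that count. The sketch below uses this site-based variant rather than the per-individual histogram described above, and the five-sequence alignment is a made-up toy example.

```python
from collections import Counter

def folded_sfs(sequences):
    """Map minor-allele count -> number of sites with that count."""
    spectrum = Counter()
    for column in zip(*sequences):      # walk the alignment site by site
        counts = Counter(column)
        if len(counts) != 2:
            continue                    # skip invariant and multi-allelic sites
        spectrum[min(counts.values())] += 1
    return dict(sorted(spectrum.items()))

# Hypothetical sample of five aligned sequences, three sites long
sample = ["TAA", "ATA", "AAT", "AAA", "ATA"]
sfs = folded_sfs(sample)
print(sfs)  # two sites with a singleton allele, one site with a doubleton
```

An excess of sites in the lowest classes, or a bulge at intermediate counts, is the kind of skew that the frequency-spectrum tests try to detect.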

[Figure from “A Haplotype Map of the Human Genome”, Nature 2005]

Tajima’s D (F. Tajima, 1989): takes the # of segregating sites within species (S) and also the average # of differences between each pair of sequences (π)
S = 3
π = [(2 + 2 + 1 + 2) + (2 + 1 + 0) + (1 + 2) + (1)] / 10 pairwise comparisons = 1.4 avg. # of differences between each pair of sequences
Tajima’s D compares S and π to estimate the proportion of low/high-frequency alleles
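The calculation can be sketched end to end. The alignment below is a hypothetical five-sequence example constructed so that it reproduces the pairwise differences listed above (S = 3, π = 1.4); the constants a1, a2, b1, b2, c1, c2, e1, e2 follow the standard formulation from Tajima (1989), which is not spelled out on the slide.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Return (S, pi, D) for a list of aligned sequences of equal length.

    Assumes at least one segregating site; with S = 0 the statistic
    is undefined and this sketch would divide by zero.
    """
    n = len(sequences)
    # S: number of segregating (variable) sites
    seg_sites = sum(len(set(col)) > 1 for col in zip(*sequences))
    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D: difference between the two estimates of theta, scaled by its SD
    d = (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return seg_sites, pi, d

# Hypothetical alignment matching the pairwise differences in the example
sample = ["TAA", "ATA", "AAT", "AAA", "ATA"]
S, pi, D = tajimas_d(sample)
print(f"S = {S}, pi = {pi:.1f}, Tajima's D = {D:.3f}")
```

For this toy sample D is slightly negative (about -0.17), far too close to zero to suggest anything other than neutrality.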

Negative Tajima’s D = excess of low-frequency alleles (= reduced variation)
  Indicates positive selection, OR nearly-neutral alleles, OR population expansion**
Positive Tajima’s D = excess of intermediate-frequency alleles (low amounts of both high- and low-frequency alleles)
  Indicates balancing selection OR partial sweep OR population bottleneck**
How can you get a p-value? Difficult to estimate; best to compare across loci

Empirical model for significance of Tajima’s D
Sliding window across a locus OR compare to several other loci
[Figures from Nielsen, Nat Rev Genet 2005 review, and from Will et al., PLoS Genetics 2010]

Tajima’s D (F. Tajima, 1989): scores an excess of low-frequency alleles or a dearth of low/high-frequency alleles
Fay & Wu’s H (Justin Fay & Chung-I Wu, 2000): looks for an excess of derived alleles (i.e. those not found in the ancestral state) at high frequency
  Designed to look for hitchhiking after positive selection
Fu & Li methods (Yun-Xin Fu & Wen-Hsiung Li, 1993):
* Each test is designed to find skews at different windows of the frequency spectrum