1
Non-stationary population genetic models with selection: Theory and Inference
Scott Williamson and Carlos Bustamante, Cornell University
2
Inferring natural selection from samples
Statistical tests of the neutral theory (lots)
Methods for detecting selective sweeps (lots)
Parametric inference: estimating selection parameters, etc.
Quantification of selective constraint, deleterious mutation
3
The demography problem
Many existing methods assume random mating and constant population size
These assumptions don't apply in most natural populations
The effect of demography can mimic the effect of natural selection
4
Natural selection and population growth
Inferring selection from the frequency spectrum while correcting for demography
The McDonald-Kreitman test: does recent population growth cause you to misidentify negative selection as adaptive evolution?
5
The frequency spectrum: an example
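Figure: an example alignment of sequences 1 to 5 with segregating sites at positions 163, 975, 1972, 2188, 3529, 4424, 4961, 5286 and 7019. At each site the alleles are labeled ancestral or derived, each site is assigned to a frequency class, and the frequency spectrum is the count of sites in each frequency class.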
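The tabulation in this example can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `frequency_spectrum` and the toy 0/1 matrix are hypothetical, and the data are made up rather than taken from the slide.

```python
from collections import Counter

def frequency_spectrum(derived_matrix):
    """Count sites in each derived-allele frequency class.

    derived_matrix[s][k] is 1 if sequence s carries the derived allele at
    segregating site k, and 0 if it carries the ancestral allele.
    Returns a list x where x[i-1] is the number of sites at which exactly
    i of the n sequences carry the derived allele (i = 1 .. n-1).
    """
    n = len(derived_matrix)
    n_sites = len(derived_matrix[0])
    # Derived-allele count at each site = column sum of the matrix.
    counts = Counter(
        sum(seq[k] for seq in derived_matrix) for k in range(n_sites)
    )
    return [counts.get(i, 0) for i in range(1, n)]

# Hypothetical toy data: 5 sequences (rows) by 9 segregating sites (columns).
toy = [
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 1, 1],
]
spectrum = frequency_spectrum(toy)
print(spectrum)
```

Each entry of the result is the number of sites in one frequency class, which is the quantity plotted as "Count" against "Frequency class" in the spectra that follow.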
6
Natural selection and the frequency spectrum
Equilibrium neutral and positively selected frequency spectra (figure: count by frequency class for the neutral case and for 2Ns=2)
7
Natural selection and the frequency spectrum
Equilibrium neutral and negatively selected frequency spectra (figure: count by frequency class for the neutral case and for 2Ns=-2)
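The qualitative shapes of these equilibrium spectra can be reproduced with a short numerical sketch. It assumes the stationary Poisson Random Field density for genic selection, proportional to (1 - exp(-2γ(1-q))) / ((1 - exp(-2γ)) q(1-q)), with γ standing for the scaled selection coefficient. The symbol γ, its exact scaling convention, the function names, and the midpoint-rule integration are all assumptions of this sketch rather than details taken from the slides.

```python
import math

def selection_weight(q, gamma):
    """(1 - exp(-2*gamma*(1-q))) / (1 - exp(-2*gamma)); tends to 1-q as gamma -> 0."""
    if abs(gamma) < 1e-9:
        return 1.0 - q
    return math.expm1(-2.0 * gamma * (1.0 - q)) / math.expm1(-2.0 * gamma)

def expected_spectrum(n, gamma, theta=1.0, steps=5000):
    """Expected number of sites in frequency classes 1 .. n-1 at equilibrium.

    Binomial sampling of n chromosomes is integrated against the assumed
    stationary density with a midpoint rule. The 1/(q(1-q)) factor of the
    density is folded into the exponents of q and 1-q.
    """
    h = 1.0 / steps
    spectrum = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            q = (k + 0.5) * h
            total += q ** (i - 1) * (1.0 - q) ** (n - i - 1) * selection_weight(q, gamma)
        spectrum.append(theta * math.comb(n, i) * total * h)
    return spectrum

def proportions(spectrum):
    total = sum(spectrum)
    return [x / total for x in spectrum]

neutral = expected_spectrum(10, 0.0)
positive = expected_spectrum(10, 1.0)
negative = expected_spectrum(10, -1.0)
print([round(x, 3) for x in proportions(neutral)])
print([round(x, 3) for x in proportions(positive)])
print([round(x, 3) for x in proportions(negative)])
```

Under these assumptions the neutral spectrum falls off as 1/i, negative selection shifts the proportions toward rare variants, and positive selection shifts them toward common variants.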
8
Natural selection vs. demography
Non-stationary neutral and equilibrium selected frequency spectra (figure: count by frequency class for neutral population growth and for equilibrium with 2Ns=-2)
9
How do we distinguish selection from demography?
McDonald-Kreitman approach:
Use a priori information to classify changes as "neutral" (e.g. synonymous, non-coding) or "potentially selected" (e.g. non-synonymous)
Putatively neutral changes are treated as a standard for patterns of neutral evolution in a particular sample
Potentially selected sites are compared to the neutral standard
Can we develop a neutral standard for the frequency spectrum?
10
Comparing frequency spectra for different classes of mutation
Observed frequency spectra (figure: count by frequency class for putatively neutral and potentially selected sites)
This talk:
Likelihood ratio test of neutrality at potentially selected sites, using information from the neutral sites
Biologically meaningful measure of the difference between the two spectra
11
Comparing frequency spectra for different classes of mutation
Observed frequency spectra (figure: count by frequency class for putatively neutral and potentially selected sites)
A model-based approach:
Fit a neutral demographic model to estimate demographic parameters
Given those parameter estimates, fit a selective demographic model to estimate selection parameters, test hypotheses
12
Comparing frequency spectra for different classes of mutation
Observed frequency spectra (figure: count by frequency class for putatively neutral and potentially selected sites)
Requirements:
Demographic model
Frequency spectrum predictions from the model under neutrality
Frequency spectrum predictions from the model subject to natural selection
13
Theory: population growth model
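2-epoch model: the population size changes from an ancestral size NA to a current size NC at some time before the present (figure: population size against time, up to now)
Model parameters: the size ratio NA/NC and the time of the size change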
14
Theory: predicting the frequency spectrum
Definitions:
xi: number of sites in frequency class i
f(q,t): distribution of allele frequency, q, at time t, given the model parameters
n: sample size
Predictions:
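The prediction formula itself did not survive on this slide. A standard form consistent with the definitions above is binomial sampling of n chromosomes from the allele-frequency distribution, normalized over the segregating classes. The expression below is a reconstruction under that assumption, with the model parameters left implicit in f, and it may differ in notation from the original slide.

```latex
P(i, n) \;=\;
\frac{\displaystyle\int_0^1 \binom{n}{i}\, q^{i} (1-q)^{n-i}\, f(q,t)\, dq}
     {\displaystyle\sum_{j=1}^{n-1} \int_0^1 \binom{n}{j}\, q^{j} (1-q)^{n-j}\, f(q,t)\, dq},
\qquad i = 1, \dots, n-1 .
```

Here P(i, n) is the probability that a segregating site falls in frequency class i, so the expected x_i is proportional to the numerator.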
15
Theory: the distribution of allele frequency
Poisson Random Field approach (Sawyer and Hartl 1992):
Use single-locus diffusion theory to predict the distribution of allele frequency
If sites are independent (i.e. in linkage equilibrium) and identically distributed, then the single-locus theory applies across sites
To get f, we need to solve the diffusion equation:
16
Theory: time-dependent solution, neutral case
The forward equation under neutrality:
Kimura's (1964) solution, given some initial allele frequency, p:
17
Theory: time-dependent solution, neutral case
Applying Kimura's solution to the 2-epoch model: ancestral mutations
Kimura's (1964) solution, given some initial allele frequency, p:
Distribution of allele frequency:
18
Theory: time-dependent solution, neutral case
Expected frequency spectrum after a change in population size, with one model parameter set to 0.01 (figure: predicted probability P(i,n) from 0 to 0.8 against frequency class 1 to 9)
19
Theory: time-dependent solution, neutral case
Multinomial likelihood:
Maximum likelihood estimates of the two demographic parameters
Likelihood ratio test of population growth
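The multinomial likelihood and the likelihood ratio test can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the observed counts and the "growth" probabilities are hypothetical placeholders, whereas in the approach described here the alternative probabilities would come from the fitted 2-epoch model. The chi-square tail is coded only for 1 and 2 degrees of freedom, where it has a closed form.

```python
import math

def multinomial_loglik(counts, probs):
    """Log-likelihood of class counts given class probabilities.

    The multinomial coefficient is dropped because it does not depend on
    the model and cancels in likelihood ratios.
    """
    return sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)

def chi2_sf(stat, df):
    """Upper tail probability of a chi-square variate (df = 1 or 2 only)."""
    if stat <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")

def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return the statistic 2*(lnL_alt - lnL_null) and its chi-square P-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)

# Hypothetical observed spectrum for a sample of n = 6 (classes 1 .. 5).
observed = [60, 18, 10, 7, 5]

# Equilibrium neutral prediction: class probabilities proportional to 1/i.
harmonic = sum(1.0 / i for i in range(1, 6))
neutral_probs = [(1.0 / i) / harmonic for i in range(1, 6)]

# Placeholder for the fitted growth model (hypothetical values, sum to 1).
growth_probs = [0.58, 0.19, 0.10, 0.07, 0.06]

stat, p_value = likelihood_ratio_test(
    multinomial_loglik(observed, neutral_probs),
    multinomial_loglik(observed, growth_probs),
    df=2,  # two extra demographic parameters in the 2-epoch model
)
print(stat, p_value)
```

The chi-square approximation applies when the alternative probabilities are evaluated at the maximum likelihood estimates of a model that contains the null as a special case.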
20
Comparing frequency spectra for different classes of mutation
Observed frequency spectra (figure: count by frequency class for putatively neutral and potentially selected sites)
Requirements:
Demographic model
Frequency spectrum predictions from the model under neutrality
Frequency spectrum predictions from the model subject to natural selection
21
Theory: time-dependent solution, selected case
The forward equation with selection:
where the scaled selection coefficient is 2NCs
Initial condition:
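The equation on this slide was lost. For reference, a standard textbook form of the forward (Kolmogorov) equation with genic selection is given below. The symbol γ for the scaled selection coefficient, the fitness scheme 1, 1+s, 1+2s, and the choice of time unit (2N generations at constant size N) are assumptions of this reconstruction and may not match the slide's exact notation.

```latex
\frac{\partial f}{\partial t}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial q^{2}}\bigl[\,q(1-q)\,f\,\bigr]
  \;-\; \gamma\,\frac{\partial}{\partial q}\bigl[\,q(1-q)\,f\,\bigr],
\qquad \gamma = 2Ns .
```

Setting γ = 0 gives the neutral forward equation. When the population size changes through time and time is measured relative to a reference size such as NC, the diffusion term is rescaled by the ratio of the reference size to the size at time t, while γ is defined with the reference size.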
22
Theory: time-dependent solution, selected case
Numerically solve the forward equation using the Crank-Nicolson finite differencing scheme
Use this approximation of f to evaluate the likelihood function:
Fix the demographic parameters to their MLEs from the neutral data
Optimize the likelihood for the selection parameter
Likelihood ratio test of neutrality:
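A minimal Crank-Nicolson sketch for a forward equation of this kind is shown below. It is a simplified illustration and not the authors' solver: it assumes constant population size, time in units of 2N generations, a single starting frequency p with no mutation influx, a uniform grid, and central differences for the selection term. The names `crank_nicolson`, `solve_tridiagonal`, `heterozygosity` and the parameter `gamma` are introduced here for illustration.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are not used."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def crank_nicolson(p, gamma=0.0, t_end=0.5, dt=0.005, m=100):
    """Evolve a unit mass started at frequency p; return (interior grid, density).

    Discretizes df/dt = 0.5 * d2/dq2 [q(1-q) f] - gamma * d/dq [q(1-q) f].
    The boundaries q = 0 and q = 1 are absorbing because q(1-q) vanishes there.
    """
    h = 1.0 / m
    q = [j / m for j in range(m + 1)]
    v = [x * (1.0 - x) for x in q]
    # Tridiagonal spatial operator A on interior nodes j = 1 .. m-1.
    lo = [v[j - 1] * (0.5 / h**2 + 0.5 * gamma / h) for j in range(1, m)]
    di = [-v[j] / h**2 for j in range(1, m)]
    up = [v[j + 1] * (0.5 / h**2 - 0.5 * gamma / h) for j in range(1, m)]
    # Left-hand side of the scheme: I - (dt/2) A.
    lhs_lo = [-0.5 * dt * a for a in lo]
    lhs_di = [1.0 - 0.5 * dt * a for a in di]
    lhs_up = [-0.5 * dt * a for a in up]
    # Initial condition: all mass on the grid node nearest p.
    f = [0.0] * (m - 1)
    f[round(p * m) - 1] = 1.0 / h
    for _ in range(round(t_end / dt)):
        rhs = []
        for k in range(m - 1):
            af = di[k] * f[k]
            if k > 0:
                af += lo[k] * f[k - 1]
            if k < m - 2:
                af += up[k] * f[k + 1]
            rhs.append(f[k] + 0.5 * dt * af)  # (I + (dt/2) A) f
        f = solve_tridiagonal(lhs_lo, lhs_di, lhs_up, rhs)
    return q[1:m], f

def heterozygosity(q, f):
    """Expected 2q(1-q) over the interior density."""
    h = q[1] - q[0]
    return sum(2.0 * x * (1.0 - x) * fx * h for x, fx in zip(q, f))

grid, density = crank_nicolson(p=0.3)
print(heterozygosity(grid, density), 0.42 * math.exp(-0.5))
```

In the neutral case the expected heterozygosity should decay as exp(-t) in these time units, which gives a simple check of the numerical solution against theory. A solver for the 2-epoch model would also need the mutation influx and the change in population size, which this sketch leaves out.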
23
Theory: time-dependent solution, selected case
How can we be sure that the numerical solution actually works?
Von Neumann stability analysis: solution is unconditionally stable
Numerical solution converges to the stationary distribution after ~4NC generations
Comparison with time-dependent neutral predictions: Kimura, Crank, and Nicolson all agree with each other
24
Human Polymorphism Data
From Stephens et al. (2001)
80 individuals, geographically diverse ancestry
313 genes, 720 kb sequenced
~3000 SNPs (72% non-coding, 13% synonymous, 15% non-synonymous)
25
Results for non-coding changes, assuming neutrality
2-epoch model: MLEs of the two demographic parameters are 0.016 and 0.13
Equilibrium neutral (P ≈ 0, d.f. 2)
Goodness-of-fit (P=0.54, d.f. 76)
26
Results for non-synonymous changes, categorized by Grantham’s distance
Category        S     Selection parameter estimate   P-value
conservative    136   -2.24                          0.52
moderate        137   -6.08                          0.07
radical         107   -8.44                          0.02
all nonsyn      380   -4.88                          0.10
27
Ongoing work and future directions
Simulate, simulate, simulate
  How robust is the method to different types of demographic forces?
  How does linkage among some sites affect the analysis?
  How does estimation error affect the LRTs?
Numerical solution for different demographic scenarios (e.g. bottleneck, population structure)
Variable selective effects among new mutations
28
The McDonald-Kreitman test
Sn: number of non-synonymous segregating sites
Dn: number of non-synonymous fixed differences
Ss: number of synonymous segregating sites
Ds: number of synonymous fixed differences
Adaptive evolution: Dn/Ds > Sn/Ss
Negative selection: Dn/Ds < Sn/Ss
Extensions: Sawyer and Hartl (1992), Rand and Kann (1996), Smith and Eyre-Walker (2002), Bustamante et al. (2002), others
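The comparison behind the test can be sketched as a 2x2 table analysis. The example computes the Neutrality Index, NI = (Sn/Ss)/(Dn/Ds), and a two-sided Fisher's exact test using only the standard library. The counts are hypothetical, the function names are introduced here for illustration, and Fisher's exact test is one common choice of test for the table rather than something specified on the slide.

```python
import math

def neutrality_index(sn, ss, dn, ds):
    """NI = (Sn/Ss) / (Dn/Ds); NI < 1 suggests adaptive evolution,
    NI > 1 suggests negative selection."""
    return (sn / ss) / (dn / ds)

def fisher_exact_two_sided(sn, ss, dn, ds):
    """Two-sided Fisher's exact test on the table [[Sn, Ss], [Dn, Ds]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1 = sn + ss            # segregating sites
    col1 = sn + dn            # non-synonymous changes
    n = sn + ss + dn + ds

    def prob(x):
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(sn)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1.0 + 1e-9))
    return min(1.0, total)

# Hypothetical counts, for illustration only.
sn, ss, dn, ds = 10, 40, 20, 20
ni = neutrality_index(sn, ss, dn, ds)
p_value = fisher_exact_two_sided(sn, ss, dn, ds)
print(ni, p_value)
```

With these made-up counts the ratio of non-synonymous to synonymous changes is higher among fixed differences than among segregating sites, so NI falls below 1, the pattern read as adaptive evolution.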
29
Demography and the McDonald-Kreitman test
Robust to different demographic scenarios because it implicitly conditions on the underlying genealogy (see Nielsen 2001)
However, under some demographic scenarios it's possible to misidentify the type of selection
Weak negative selection with population growth:
  When the population size is small, non-synonymous deleterious mutations might be fixed by drift
  Once the population size becomes large, the level of non-synonymous polymorphism would be reduced (relative to the level of synonymous polymorphism)
30
Demography and the McDonald-Kreitman test
Over what range of parameter values might you misidentify negative selection as adaptive evolution? How large is the effect?
Eyre-Walker (2002):
  Addressed these questions, finding that recent population growth or bottlenecks can cause you to misidentify negative selection as adaptive evolution
  Assumed that levels of polymorphism and fixation rates changed instantaneously with population size
31
Demography and the McDonald-Kreitman test
where tdiv is the divergence time, measured in 2NC generations
32
Demography and the McDonald-Kreitman test
Figure: expected Neutrality Index (NI, axis from 1 to 10) against the size ratio NA/NC (axis from 0.01 to 1), in four panels that combine a second parameter value of 0.1 or 1 with tdiv=4 or tdiv=10
33
Demography and the McDonald-Kreitman test: Preliminary results
It is possible to misidentify negative selection as adaptive evolution for some parameter combinations
But the parameter range over which this is true is probably smaller than previously thought, as is the magnitude of the effect
34
Summary
Model-based approach to correcting for demography while inferring selection
Evidence for very recent population growth in humans
Reasonable estimates of selection parameters for classes of non-synonymous changes
McDonald-Kreitman test: negative selection + population growth problem not as severe as previously thought
Numerical methods for solving the diffusion are fast, accurate, and fun!
35
Acknowledgements
Collaborator: Carlos Bustamante
Data: Genaissance Pharmaceuticals
Helpful discussions: Bret Payseur, Rasmus Nielsen, Matt Dimmic, Jim Crow, Hiroshi Akashi, Graham Coop