Published by Abigail Newman. Modified over 9 years ago.
1
Sampling distributions of alleles under models of neutral evolution
2
Plan
1. Genetic drift and mutation
2. Coalescent
3. Pairwise differences and numbers of segregating sites
4. Population with time-varying size
3
Mathematical model for sampling distributions of alleles: genetic drift and mutation
4
Genetic drift
Alleles: A1, A2. Replication = sampling with replacement. Over the generations G1, G2, ..., Gn, allele A1 becomes fixed and allele A2 becomes lost.
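A minimal sketch of the drift process above, assuming a Wright–Fisher population of fixed size in which each generation is formed by sampling gene copies with replacement. The function name `wright_fisher` and its parameters are illustrative, not taken from the slides.

```python
import random


def wright_fisher(freq_a1, pop_size, max_gens=100_000, seed=None):
    """Neutral genetic drift of allele A1 among pop_size gene copies.

    Each new generation is formed by sampling pop_size copies with replacement
    from the previous one. Returns (final count of A1, generations simulated).
    """
    rng = random.Random(seed)
    count = round(freq_a1 * pop_size)
    for gen in range(max_gens):
        if count in (0, pop_size):  # A1 lost (0) or fixed (pop_size): drift stops
            return count, gen
        p = count / pop_size
        # Each of the pop_size offspring copies picks a parent copy at random.
        count = sum(rng.random() < p for _ in range(pop_size))
    return count, max_gens


final, gens = wright_fisher(0.5, 20, seed=1)
status = "A1 fixed" if final == 20 else "A1 lost" if final == 0 else "still polymorphic"
print(status, "after", gens, "generations")
```

Repeating the run with many seeds shows A1 being fixed in a fraction of runs close to its initial frequency, as expected for a neutral allele.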
5
Mutation
(Figure: generations Gk and Gk+1.) Mutation introduces genetic variability into the evolutionary process.
6
Mutation
Mutation follows a Poisson process whose intensity is measured per locus (or per site) per generation. A mutation model is further specified by where mutations occur and what effects they cause. The most commonly applied models are:
- infinite sites model: each mutation takes place at a DNA site that has never mutated before;
- infinite alleles model: each mutation produces an allele never before present in the population;
- recurrent mutation model: the nucleotide at a site can change multiple times;
- stepwise mutation model: mutation acts bidirectionally, increasing or decreasing the number of repeats of a fixed DNA motif.
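A minimal sketch of two of the models listed above, assuming mutations along one lineage arrive as a Poisson process with a hypothetical per-generation intensity `mu`. All function names are illustrative; the floor of one repeat in the stepwise model is an added assumption, not something stated on the slides.

```python
import itertools
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


_fresh_labels = itertools.count(1)  # source of never-seen-before allele labels


def mutate_infinite_alleles(allele, n_mut):
    """Infinite alleles: every mutation yields a brand-new allele label."""
    for _ in range(n_mut):
        allele = next(_fresh_labels)
    return allele


def mutate_stepwise(repeats, n_mut, rng):
    """Stepwise model: each mutation adds or removes one repeat of the motif.

    Assumption: the repeat count is kept at least 1.
    """
    for _ in range(n_mut):
        repeats = max(1, repeats + rng.choice((-1, 1)))
    return repeats


rng = random.Random(0)
mu, generations = 1e-3, 2000  # hypothetical per-locus intensity and lineage length
n_mut = poisson_draw(mu * generations, rng)  # number of mutations along one lineage
print(n_mut, mutate_infinite_alleles(0, n_mut), mutate_stepwise(10, n_mut, rng))
```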
7
Infinite sites model
In the infinite sites model, a mutation configuration is fully described by a map between sequences and mutations, specifying which mutations each sequence carries.
(Figure: example configuration of five sequences and six mutations.)
8
Statistics of mutations (segregating sites)
9
Number of segregating sites
(Figure: sequences versus mutations.) S = 6
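A small sketch of counting segregating sites from a configuration stored as a 0/1 incidence matrix. The matrix `CONFIG` is hypothetical (it is not the one drawn on the slide) but was chosen so that it also has five sequences, six mutations and S = 6; under the infinite sites model every mutation produces exactly one segregating site.

```python
# Hypothetical configuration: rows are the five sequences, columns the six
# mutations; 1 means the sequence carries the mutation.
CONFIG = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
]


def segregating_sites(config):
    """Count sites that are polymorphic in the sample (carried by some but not all sequences)."""
    n = len(config)
    return sum(1 for column in zip(*config) if 0 < sum(column) < n)


print(segregating_sites(CONFIG))  # 6
```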
10
Pairwise differences
(Figure: sequences versus mutations.) The number of differences between sequences 2 and 3 is d23 = 3. Average number of pairwise differences = 3.
11
Histogram of pairwise differences
(Figure: number of pairs versus number of differences.)
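A sketch of the pairwise statistics for the same kind of configuration. `CONFIG` is again a hypothetical example (not the slide's), constructed so that it also gives d23 = 3 and an average of 3 pairwise differences; d_ij is the number of sites at which sequences i and j differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical configuration: rows = sequences 1..5, columns = mutations 1..6.
CONFIG = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
]


def pairwise_differences(config):
    """Map each pair (i, j) of 1-based sequence indices to d_ij."""
    return {
        (i + 1, j + 1): sum(a != b for a, b in zip(config[i], config[j]))
        for i, j in combinations(range(len(config)), 2)
    }


diffs = pairwise_differences(CONFIG)
mean_diff = sum(diffs.values()) / len(diffs)  # average over all n(n-1)/2 pairs
histogram = Counter(diffs.values())  # number of differences -> number of pairs
print(diffs[(2, 3)], mean_diff)  # 3 3.0
print(sorted(histogram.items()))  # [(1, 2), (2, 1), (3, 3), (4, 3), (5, 1)]
```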
12
Classes of mutations
(Figure: sequences versus mutations, highlighting a mutation of class 2, i.e. a mutation carried by two sequences.)
13
Histogram of classes of mutations
(Figure: frequency versus class of mutation.)
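A sketch of the histogram of mutation classes, assuming the class of a mutation is the number of sequences that carry it and that the mutant (derived) state is known. `CONFIG` is the same hypothetical example configuration, not the slide's.

```python
from collections import Counter

# Hypothetical configuration: rows = sequences, columns = mutations.
CONFIG = [
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
]


def class_frequencies(config):
    """Relative frequency of each mutation class 1..n-1, where a mutation's
    class is the number of sequences carrying it."""
    n = len(config)
    classes = [sum(column) for column in zip(*config)]
    counts = Counter(classes)
    return {k: counts[k] / len(classes) for k in range(1, n)}


print(class_frequencies(CONFIG))
# {1: 0.5, 2: 0.3333333333333333, 3: 0.16666666666666666, 4: 0.0}
```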
14
Coalescence method
One looks at the past of an n-sample of sequences taken at present. The events that happen in the past are coalescences, which lead to common ancestors of sequences, and mutations along the branches of the ancestral tree.
15
Coalescence method
(Figure: an n-sample taken at present, traced into the past through generation 1 (τ = 1), generation 2 (τ = 2), ..., generation k (τ = k); population size 2N.)
16
Coalescence – pairwise statistics
Take two sequences. For each sequence, draw a parent at random from generation 1 (τ = 1); then, for each parent, draw a (grand)parent at random from generation 2 (τ = 2), and so on, until the two lineages reach a COMMON ANCESTOR. π2(i) denotes the probability that the COMMON ANCESTOR of the two sequences lived in generation i (τ = i).
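A sketch of the pairwise construction above, assuming a population of 2N gene copies in which every lineage picks its parent uniformly at random. Under that assumption the two lineages meet in any given generation with probability 1/(2N), which gives the geometric law π2(i) = (1/(2N)) (1 − 1/(2N))^(i−1). The names `pi2` and `simulate_pair` are illustrative.

```python
import random


def pi2(i, two_n):
    """Probability that two sequences find their common ancestor exactly i
    generations back, in a population of two_n gene copies (geometric law)."""
    return (1 / two_n) * (1 - 1 / two_n) ** (i - 1)


def simulate_pair(two_n, rng):
    """Trace two lineages back one generation at a time; each picks a parent
    uniformly at random, and they coalesce when they pick the same one."""
    generation = 0
    while True:
        generation += 1
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generation


rng = random.Random(7)
two_n = 20
samples = [simulate_pair(two_n, rng) for _ in range(10_000)]
print(samples.count(1) / len(samples), pi2(1, two_n))  # both close to 1/20
```

The mean of this geometric law is 2N generations, which is why the population time scale on the next slide uses 2N generations as one unit.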
18
Coalescence – continuous-time approximations
Population time scale: 1 unit = 2N generations.
Mutational time scale: 1 unit = 1/(2μ) generations.
19
Coalescence of an n-sample
The durations Tk of the periods during which the sample has k ancestral lineages are independent, exponentially distributed random variables.
μ: mutation intensity; N: population's effective size; θ = 4Nμ: product parameter; t = 2μτ: mutational time scale (τ is time in number of generations).
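A sketch of drawing the coalescence times on the mutational time scale. The rates are derived from the slide's scales rather than quoted from it: one population unit (2N generations) equals 2μ · 2N = θ mutational units, so a pair of lineages that coalesces at rate 1 per population unit coalesces at rate 1/θ per mutational unit, and with k lineages T_k ~ Exponential(k(k−1)/(2θ)). Function names are hypothetical.

```python
import random


def generations_to_mutational(tau, mu):
    """Mutational time t = 2 * mu * tau for tau generations."""
    return 2 * mu * tau


def generations_to_population(tau, n_eff):
    """Population time: one unit = 2N generations."""
    return tau / (2 * n_eff)


def coalescence_times(n, theta, rng):
    """Durations T_k (k = n, ..., 2) on the mutational time scale.

    While k lineages remain, each of the k(k-1)/2 pairs coalesces at rate
    1/theta, so T_k ~ Exponential(rate = k(k-1) / (2 * theta)), independently.
    """
    return {k: rng.expovariate(k * (k - 1) / (2 * theta)) for k in range(n, 1, -1)}


rng = random.Random(11)
print(coalescence_times(5, theta=2.0, rng=rng))
```

In particular E[T_2] = θ on this scale.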
20
Coalescence method
Coalescence theory allows efficient formulation of appropriate models and gives a good basis for approaching model analysis problems such as hypothesis testing or parameter estimation.
(Figure: ancestral tree of a sample, with times t2, ..., t5 and s2, ..., s5 marked.)
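As an example of model analysis with the coalescent, a sketch that simulates the number of segregating sites S under the infinite sites model and compares its mean with Watterson's formula E[S] = θ (1 + 1/2 + ... + 1/(n−1)). It assumes constant population size, the exponential coalescence times sketched for the previous slide, and mutations falling at intensity 1/2 per lineage on the mutational time scale. All names are illustrative.

```python
import random


def coalescence_times(n, theta, rng):
    """T_k ~ Exponential(rate k(k-1)/(2 theta)) on the mutational time scale."""
    return {k: rng.expovariate(k * (k - 1) / (2 * theta)) for k in range(n, 1, -1)}


def poisson_count(rate, length, rng):
    """Number of events of a Poisson process with the given rate in [0, length)."""
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_segregating_sites(n, theta, rng):
    """S for an n-sample: every mutation on the tree is a new segregating site."""
    times = coalescence_times(n, theta, rng)
    total_length = sum(k * t_k for k, t_k in times.items())  # k branches during T_k
    return poisson_count(0.5, total_length, rng)


def expected_segregating_sites(n, theta):
    """Watterson's formula: E[S] = theta * (1 + 1/2 + ... + 1/(n-1))."""
    return theta * sum(1 / k for k in range(1, n))


rng = random.Random(3)
sims = [simulate_segregating_sites(5, 2.0, rng) for _ in range(20_000)]
print(sum(sims) / len(sims), expected_segregating_sites(5, 2.0))
```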
21
Independence of metrics (coalescence times) and topology
Topologies of trees (with ordered branches) are all equally probable. The metrics (distributions of branch lengths) of trees are determined by the coalescence process, which in turn depends on population parameters.
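A sketch of sampling a topology independently of the times: at each coalescence event a uniformly chosen pair of the remaining lineages is merged, so every ranked (ordered) history has the same probability. The nested-tuple representation and the name `random_topology` are assumptions made for illustration; branch lengths would come separately from the coalescence times.

```python
import random


def random_topology(labels, rng):
    """Ranked coalescent topology: at each event, merge a uniformly chosen pair
    of the remaining lineages. Returns nested tuples, e.g. (('A', 'B'), 'C')."""
    lineages = list(labels)
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        merged = (lineages[i], lineages[j])
        lineages = [x for k, x in enumerate(lineages) if k not in (i, j)] + [merged]
    return lineages[0]


rng = random.Random(2)
print(random_topology("ABCDE", rng))
```

For three sequences, each of the three possible first merges (and hence each choice of outgroup) appears with probability 1/3.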
22
Coalescence – statistics of pairwise differences
Assume the mutational time scale. Then mutations occur along each lineage with intensity 1/2. Let A2 denote the $\mathbb{Z}_+$-valued random variable defined by the number of segregating sites between sample 1 and sample 2, and let T be the random coalescence time of the two samples. Given T = t, the two branches have total length 2t, so A2 is Poisson with parameter λ = 2t · 1/2 = t:
$$P[A_2 = n \mid T = t] = \frac{t^{n}}{n!}\, e^{-t}, \qquad n = 0, 1, 2, \ldots$$
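For a constant-size population, T is exponential with mean θ on the mutational time scale (this follows from the per-pair coalescence rate 1/θ; it is not stated on the slide). Integrating the conditional law against that density gives a geometric distribution with mean θ:
P[A2 = n] = ∫ (tⁿ/n!) e^(−t) (1/θ) e^(−t/θ) dt = θⁿ / (1 + θ)^(n+1).
The sketch below, with illustrative names, checks this closed form against a direct simulation.

```python
import random


def pairwise_pmf(n, theta):
    """P[A_2 = n] for constant size: theta**n / (1 + theta)**(n + 1)."""
    return theta ** n / (1 + theta) ** (n + 1)


def simulate_pairwise(theta, rng):
    """Draw T ~ Exp(mean theta), then count mutations on the two branches of
    length T at intensity 1/2 each, i.e. a Poisson process of rate 1 on [0, T)."""
    t_coal = rng.expovariate(1 / theta)
    count, t = 0, rng.expovariate(1.0)
    while t < t_coal:
        count += 1
        t += rng.expovariate(1.0)
    return count


rng = random.Random(4)
theta = 2.0
sims = [simulate_pairwise(theta, rng) for _ in range(20_000)]
print(sims.count(0) / len(sims), pairwise_pmf(0, theta))  # both close to 1/3
```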
24
Coalescence – population with time-varying size
25
Population with time-varying size
If the population's effective size N(t) changes in time, then the product parameter is also a function of time, θ(t) = 4N(t)μ.
Joint probability density function:
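A numerical sketch for the two-sequence case only, assuming time t runs backward from the present on the mutational time scale and two lineages coalesce at hazard 1/θ(t) (which reduces to rate 1/θ for constant size). Then T has density p(t) = exp(−∫₀ᵗ du/θ(u)) / θ(t), and P[A2 = n] = ∫ p(t) e^(−t) tⁿ/n! dt. The function name, the midpoint-rule integration and the expansion history below are all hypothetical choices.

```python
import math


def pairwise_pmf_varying(theta_of_t, n_max, t_max=100.0, steps=20_000):
    """P[A_2 = n], n = 0..n_max, when the product parameter varies in time.

    t runs backward from the present on the mutational time scale. The two
    lineages coalesce at hazard 1/theta(t), so T has density
        p(t) = exp(-int_0^t du / theta(u)) / theta(t),
    and P[A_2 = n] = int p(t) * exp(-t) * t**n / n! dt (midpoint rule here).
    """
    dt = t_max / steps
    pmf = [0.0] * (n_max + 1)
    cum_hazard = 0.0  # int_0^t du / theta(u) up to the left edge of the cell
    for i in range(steps):
        t = (i + 0.5) * dt
        hazard = 1.0 / theta_of_t(t)
        density = hazard * math.exp(-(cum_hazard + 0.5 * hazard * dt))
        cum_hazard += hazard * dt
        poisson_term = math.exp(-t)  # Poisson(t) probability of 0 differences
        for n in range(n_max + 1):
            pmf[n] += density * poisson_term * dt
            poisson_term *= t / (n + 1)  # advance to the probability of n + 1
    return pmf


constant = pairwise_pmf_varying(lambda t: 2.0, n_max=10)
# Hypothetical expansion: large theta in the recent past, small before t = 5.
expansion = pairwise_pmf_varying(lambda t: 20.0 if t < 5.0 else 1.0, n_max=10)
print([round(p, 3) for p in constant])
print([round(p, 3) for p in expansion])
```

With constant θ the result reproduces the geometric law θⁿ/(1 + θ)^(n+1); the expansion history shifts the mass away from zero differences.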
26
How is the history of population size N(t) (equivalently θ(t)) encoded in the histograms of pairwise differences and of mutation classes?
27
Pairwise differences
28
Pairwise differences I
(Figure: θ(t) versus time t; frequency versus number of differences.)
29
Pairwise differences II
(Figure: θ(t) versus time t; frequency versus number of differences.)
30
Pairwise differences III
(Figure: θ(t) versus time t; frequency versus number of differences.)
31
Mutation classes
Frequencies are computed under the assumption that the mutation intensity is low.
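A sketch of the constant-size baseline, assuming the infinite sites model and the standard neutral result that the expected number of class-i mutations is θ/i, so that with low mutation intensity the class frequencies are proportional to 1/i. The sample size n = 11 (giving SNP types 1 to 10) is an assumption made here to match the axis of the figures, not a value stated on the slides.

```python
def expected_class_frequencies(n):
    """Expected relative frequencies of mutation classes 1..n-1 for a
    constant-size population (frequencies proportional to 1 / i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


freqs = expected_class_frequencies(11)  # SNP types 1..10 (assumed n = 11)
print([round(f, 3) for f in freqs])
```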
32
Mutation classes I
(Figure: N(t) = const versus time t; frequency versus SNP type 1 to 10.)
33
Mutation classes II
(Figure: N(t) = N0 exp(rt), r = 10, versus time t; frequency versus SNP type 1 to 10.)
34
Mutation classes III
(Figure: N(t) versus time t; frequency versus SNP type 1 to 10.)
35
Conclusions
- Different histories of population size lead to different sampling distributions of alleles.
- Parametric models of different forms (exponential, stepwise, logistic) can lead to similar, hard-to-distinguish distributions of alleles.
- Estimation of population size history from DNA data can be unstable.
36
Models versus data
- Parametric and nonparametric estimation of population size histories from DNA samples.
- Testing hypotheses on parameter values under parametric models; testing a time-constant versus a time-varying scenario.
37
Models versus data
(Figure: data on the worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987; estimation of the history of human population size.)
38
Models versus data II
(Figure: histogram of classes of mutations. Data on the worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987.)
39
Models versus data III
(Figure: data on types of 44 SNPs randomly located in the genome, Picoult, Newberg 2000; parametric estimates of N(t) based on these data.)