Sampling distributions of alleles under models of neutral evolution.

Plan
1. Genetic drift and mutation
2. Coalescent
3. Pairwise differences and numbers of segregating sites
4. Population with time-varying size

Mathematical model for sampling distributions of alleles
- Genetic drift
- Mutation

Genetic drift
Alleles: A1, A2. Replication = sampling with replacement.
[Figure: generations G1, G2, ..., Gn; allele A1 becomes fixed, allele A2 becomes lost.]
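The replication rule above can be sketched as a small simulation. This is a minimal illustration, assuming a fixed number of gene copies and no mutation; the function name `wright_fisher` and all parameter values are hypothetical and not taken from the slides.

```python
import random

def wright_fisher(num_copies, initial_a1, max_generations, seed=0):
    """Follow the number of A1 copies until A1 is fixed or lost (or time runs out)."""
    rng = random.Random(seed)
    count = initial_a1
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == num_copies:
            break  # A1 has been lost or has become fixed
        freq = count / num_copies
        # Replication = sampling with replacement: every copy in the next
        # generation independently picks a parent from the current one.
        count = sum(1 for _ in range(num_copies) if rng.random() < freq)
        trajectory.append(count)
    return trajectory

# Assumed illustrative values: 20 gene copies, half of them A1 at the start.
trajectory = wright_fisher(num_copies=20, initial_a1=10, max_generations=10_000)
outcome = "fixed" if trajectory[-1] == 20 else "lost"
print(f"A1 was {outcome} after {len(trajectory) - 1} generations")
```

With no mutation, every run ends with one allele fixed and the other lost; only the waiting time and the winner vary with the seed.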

Mutation
[Figure: generations Gk and Gk+1.]
Mutation introduces genetic variability into the evolutionary process.

Mutation
Mutation follows a Poisson process with intensity μ, measured per locus (or per site) per generation. A mutation model is further specified by where mutations occur and what effect they have. The most commonly applied models are:
- infinite sites model, where each mutation is assumed to occur at a DNA site that has never mutated before;
- infinite alleles model, where each mutation produces an allele never before present in the population;
- recurrent mutation model, where a site may change its nucleotide multiple times;
- stepwise mutation model, where mutation acts bidirectionally, increasing or decreasing the number of repeats of a fixed DNA motif.

Infinite sites model
A mutation configuration in the infinite sites model is fully described by a map between sequence numbers and mutation numbers.
[Figure: mutations plotted against sequences.]

Statistics of mutations (segregating sites)

Number of segregating sites
[Figure: mutations plotted against sequences; S = 6.]

Pairwise differences
[Figure: mutations plotted against sequences.]
Number of differences between sequences 2 and 3: d23 = 3. Average number of pairwise differences = 3.

Histogram of pairwise differences
[Figure: histogram; x-axis: number of differences, y-axis: number of pairs.]
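The statistics on the preceding slides can be computed directly from a 0/1 table of sequences against sites. The sketch below uses a hypothetical five-sequence sample invented for illustration (it is not the configuration drawn on the slides), and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: rows are sequences, columns are sites, 1 marks a mutation.
sample = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]

def segregating_sites(rows):
    """Count the columns in which the sampled sequences are not all identical."""
    return sum(1 for column in zip(*rows) if len(set(column)) > 1)

def pairwise_differences(rows):
    """Map each pair (i, j), numbered from 1, to its number of differing sites."""
    return {
        (i + 1, j + 1): sum(a != b for a, b in zip(rows[i], rows[j]))
        for i, j in combinations(range(len(rows)), 2)
    }

s = segregating_sites(sample)
d = pairwise_differences(sample)
mean_d = sum(d.values()) / len(d)   # average number of pairwise differences
histogram = Counter(d.values())     # number of pairs per number of differences
print(s, d[(2, 3)], mean_d, sorted(histogram.items()))
```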

Classes of mutations
[Figure: mutations plotted against sequences, with one mutation marked as a mutation of class 2.]

Histogram of classes of mutations
[Figure: histogram; x-axis: class of mutation, y-axis: frequency.]
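A histogram of mutation classes can be tabulated as follows. The sketch assumes that a mutation of class b is one carried by exactly b sequences of the sample, which is the usual reading but is not spelled out on the slide. The sample is hypothetical and the names are illustrative.

```python
from collections import Counter

# Hypothetical sample: rows are sequences, columns are sites, 1 marks a mutation.
sample = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]

def mutation_classes(rows):
    """Return, for every site, the number of sequences carrying the mutation."""
    return [sum(column) for column in zip(*rows)]

classes = mutation_classes(sample)
counts = Counter(classes)
# Relative frequency of each class among all mutations in the sample.
frequencies = {b: counts[b] / len(classes) for b in sorted(counts)}
print(frequencies)
```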

Coalescence method
One looks into the past of an n-sample of sequences taken at present. The events that can occur in the past are coalescences, which lead to common ancestors of sequences, and mutations along the branches of the ancestral tree.

Coalescence method
[Figure: genealogy of an n-sample traced from the present into the past through generation 1 (τ = 1), generation 2 (τ = 2), ..., generation k (τ = k); population size 2N.]

Coalescence – pairwise statistics
Two sequences. For each sequence, randomly draw a parent in generation 1 (τ = 1); then, for each parent, randomly draw a (grand)parent in generation 2 (τ = 2); and so on, until a COMMON ANCESTOR is reached. The quantity of interest is the probability that a common ancestor of the two sequences lived in generation i (τ = i).
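The parent-drawing procedure leads to a geometric distribution. The sketch below assumes each sequence picks its parent uniformly among 2N gene copies, so two lineages share a parent with probability 1/(2N) in any generation. The function names and the value 2N = 2000 are illustrative assumptions.

```python
import math

def common_ancestor_prob(i, two_n):
    """Probability that two lineages first share a parent in generation i."""
    p = 1.0 / two_n                      # chance of picking the same parent
    return (1.0 - p) ** (i - 1) * p      # i - 1 misses followed by one hit

def exponential_approx(i, two_n):
    """Continuous-time approximation for large 2N (time unit: 2N generations)."""
    return math.exp(-i / two_n) / two_n

two_n = 2000
exact = common_ancestor_prob(1000, two_n)
approx = exponential_approx(1000, two_n)
total = sum(common_ancestor_prob(i, two_n) for i in range(1, 40_001))
print(exact, approx, total)
```

For large 2N the two values nearly coincide, which is what motivates measuring time in units of 2N generations.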

Coalescence – continuous time approximations
Population time scale: 1 unit = 2N generations
Mutational time scale: 1 unit = 1/(2μ) generations

Coalescence
n-sample
Coalescence times: independent, exponentially distributed random variables, indexed by k
μ: mutation intensity
N: population's effective size
θ = 4μN: product parameter
t = 2μτ: mutational time scale (τ is time in number of generations)
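A minimal sketch of drawing the coalescence times for an n-sample. It assumes the standard constant-size coalescent, in which the time spent with k lineages has rate k(k-1)/2 per population time unit, which becomes k(k-1)/(2θ) on the mutational time scale. The name `coalescence_times` and the parameter values are hypothetical.

```python
import random
from math import comb

def coalescence_times(n, theta, rng):
    """Draw the time spent with k lineages, for k = n, ..., 2 (mutational time scale)."""
    # With k lineages any of comb(k, 2) pairs may coalesce; dividing by theta
    # converts the rate from the population to the mutational time scale.
    return {k: rng.expovariate(comb(k, 2) / theta) for k in range(n, 1, -1)}

rng = random.Random(1)
theta = 2.0  # assumed illustrative value of the product parameter
replicates = [coalescence_times(5, theta, rng) for _ in range(20_000)]
mean_t2 = sum(r[2] for r in replicates) / len(replicates)
print(mean_t2)  # should be close to theta, the mean time with two lineages
```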

Coalescence method
Coalescence theory allows efficient formulation of appropriate models and gives a good basis for approaching model analysis problems such as hypothesis testing or parameter estimation.
[Figure: ancestral tree of a sample, with times labeled t2, ..., t5 and s2, ..., s5.]

Independence of metrics (coalescence times) and topology
- Topologies of trees (with ordered branches) are all equally probable.
- Metrics (distributions of branch lengths) of trees are determined by the coalescence process, which in turn depends on population parameters.
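Because topology does not depend on branch lengths, a tree shape can be drawn without reference to any population parameter. The sketch below assumes the usual construction in which, at each coalescence, a pair of lineages is chosen uniformly at random; `random_topology` is a hypothetical helper name.

```python
import random

def random_topology(n, rng):
    """Return the sequence of merges for n sampled sequences numbered 1..n."""
    lineages = [frozenset([i]) for i in range(1, n + 1)]
    merges = []
    while len(lineages) > 1:
        # Every pair of current lineages is equally likely to coalesce next.
        a, b = rng.sample(range(len(lineages)), 2)
        left, right = lineages[a], lineages[b]
        merges.append((sorted(left), sorted(right)))
        lineages = [x for idx, x in enumerate(lineages) if idx not in (a, b)]
        lineages.append(left | right)
    return merges

merges = random_topology(5, random.Random(3))
for left, right in merges:
    print(left, "+", right)
```

Branch lengths can then be attached separately by drawing the coalescence times from the population model.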

Coalescence – statistics of pairwise differences
Assume the mutational time scale. Then mutations occur with intensity 1/2 along each lineage. Let A_2 denote a Z+-valued random variable equal to the number of segregating sites between sequence 1 and sequence 2, and let T be the random variable given by their coalescence time t. Two lineages of length t accumulate mutations at total rate 2 × 1/2 = 1, so the conditional distribution of A_2 given T = t is Poisson with mean t:
$P[A_2 = n \mid T = t] = e^{-t}\, t^n / n!$
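As a worked step, the conditioning on T can be removed when the population size is constant. The derivation below assumes that, on the mutational time scale, T is exponential with mean θ (the standard constant-size result, stated here as an assumption rather than taken from the slide).

```latex
\begin{aligned}
P[A_2 = n]
  &= \int_0^\infty \frac{e^{-t}\, t^n}{n!} \cdot \frac{1}{\theta}\, e^{-t/\theta} \, dt
   = \frac{1}{\theta\, n!} \int_0^\infty t^n e^{-t (1+\theta)/\theta} \, dt \\
  &= \frac{1}{\theta} \left( \frac{\theta}{1+\theta} \right)^{n+1}
   = \frac{1}{1+\theta} \left( \frac{\theta}{1+\theta} \right)^{n},
  \qquad n = 0, 1, 2, \dots
\end{aligned}
```

Under these assumptions the number of pairwise differences is geometric, so its histogram decreases monotonically from n = 0.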

Coalescence – population with time varying size

Population with time-varying size
When the population's effective size N(t) changes in time, the product parameter is also a function of time: θ(t) = 4μN(t).
Joint probability density function:

How is the history of population size N(t), θ(t), encoded in histograms of pairwise differences and mutation classes?
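For two sequences, the question can be explored numerically. The sketch assumes the standard pairwise form in which the coalescence time has density (1/θ(t)) exp(-∫₀ᵗ du/θ(u)) on the mutational time scale, and it mixes the Poisson distribution with mean t over that density. The function name, the integration settings and the step-shaped θ(t) are all hypothetical choices.

```python
import math

def pairwise_difference_distribution(theta, max_n, t_max=200.0, steps=40_000):
    """Approximate P[A2 = n], n = 0..max_n, for a time-varying theta(t)."""
    dt = t_max / steps
    probs = [0.0] * (max_n + 1)
    cumulative_hazard = 0.0                  # integral of 1/theta up to the cell start
    for j in range(steps):
        t = (j + 0.5) * dt                   # midpoint of the integration cell
        rate = 1.0 / theta(t)
        density = rate * math.exp(-(cumulative_hazard + 0.5 * rate * dt))
        cumulative_hazard += rate * dt
        poisson = math.exp(-t)               # P[A2 = 0 | T = t]
        for n in range(max_n + 1):
            probs[n] += poisson * density * dt
            poisson *= t / (n + 1)           # step to P[A2 = n + 1 | T = t]
    return probs

# Constant history versus a hypothetical history that was small in the past.
constant = pairwise_difference_distribution(lambda t: 5.0, max_n=30)
step = pairwise_difference_distribution(lambda t: 20.0 if t < 5.0 else 1.0, max_n=30)
print(constant.index(max(constant)), step.index(max(step)))
```

In this sketch the constant history gives a monotonically decreasing histogram, while the step history moves the mode away from zero, which is the kind of signal the histograms on the following slides illustrate.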

Pairwise differences

[Figure: Pairwise differences I. Panels: θ(t) against time t; frequency against number of differences.]

[Figure: Pairwise differences II. Panels: θ(t) against time t; frequency against number of differences.]

[Figure: Pairwise differences III. Panels: θ(t) against time t; frequency against number of differences.]

Mutation classes
Frequencies are computed under the assumption that the mutation intensity μ is low.

[Figure: Mutation classes I, N(t) = const. Panels: N(t) against time t; frequency against SNP type (1 to 10).]

[Figure: Mutation classes II, N(t) = N0 exp(rt), annotated "N0 r=10". Panels: N(t) against time t; frequency against SNP type (1 to 10).]

[Figure: Mutation classes III. Panels: N(t) against time t; frequency against SNP type (1 to 10).]
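For the constant-size case, the expected histogram has a simple closed form. The sketch relies on the standard coalescent result that, for constant N(t) and low μ, the expected number of mutations of class b is proportional to 1/b. This is background knowledge rather than something stated on the slides, and the sample size n = 11 is an assumed value chosen to give ten classes.

```python
def expected_class_frequencies(n):
    """Expected relative frequencies of classes 1..n-1 for constant N(t)."""
    weights = [1.0 / b for b in range(1, n)]   # expected counts up to a common factor
    total = sum(weights)
    return [w / total for w in weights]

frequencies = expected_class_frequencies(11)
for b, f in enumerate(frequencies, start=1):
    print(b, round(f, 3))
```

Population growth shifts weight toward the low classes relative to this baseline, which is why the class histogram carries information about N(t).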

Conclusions
- Different histories of population size lead to different sampling distributions of alleles.
- Parametric models of different forms (exponential, stepwise, logistic) can lead to similar, difficult to distinguish distributions of alleles.
- Estimation of population size history from DNA data can be unstable.

Models versus data
- Parametric and nonparametric estimation of population size histories from DNA samples
- Testing hypotheses on values of parameters under parametric models; testing the hypothesis of a time-constant versus a time-varying scenario

Models versus data
[Figure: two panels; axis labels not recoverable.]
Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987. Estimation of history of human population size.

Models versus data II
[Figure: histogram of classes of mutations.]
Data on worldwide distribution of mtDNA pairwise differences, R. Cann et al. 1987.

Models versus data III
Data on types of 44 SNPs randomly located in the genome, Picoult, Newberg 2000.
[Figure: two panels; axis labels not recoverable.]
Parametric estimates of N(t) based on the above data.
