Coalescent Theory in Biology (www.coalescent.dk)
Fixed parameters: population structure, mutation, selection, recombination, ...
Reproductive structure; genealogies of non-sequenced data; genealogies of sequenced data (TGTTGT, CATAGT, CGTTAT); parameter estimation; model testing.
Wright-Fisher Model of Population Reproduction
Haploid Model: i. Individuals are made by sampling with replacement in the previous generation. ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/(2N).
Diploid Model: Individuals are made by sampling, with replacement, one chromosome from the females and one from the males of the previous generation.
Assumptions: 1. Constant population size. 2. No geography. 3. No selection. 4. No recombination.
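The haploid sampling step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the population size 2N = 20, the seed, and the function names are assumptions chosen for the example.

```python
import random

def wright_fisher_generation(parents, rng):
    """Next generation: every allele copies a parent chosen with replacement."""
    return [rng.choice(parents) for _ in parents]

def same_parent_frequency(two_n, trials, rng):
    """Estimate the chance that two alleles pick the same parent among two_n."""
    hits = 0
    for _ in range(trials):
        if rng.randrange(two_n) == rng.randrange(two_n):
            hits += 1
    return hits / trials

rng = random.Random(1)
population = list(range(20))      # 2N = 20 labelled alleles (hypothetical size)
offspring = wright_fisher_generation(population, rng)
estimate = same_parent_frequency(20, 200_000, rng)   # should be close to 1/(2N)
```

The estimate should land near 1/(2N) = 0.05, matching point ii of the haploid model.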
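P(k) := P{k alleles had k distinct parents} = $\frac{2N(2N-1)\cdots(2N-(k-1))}{(2N)^k} =: \frac{(2N)_{[k]}}{(2N)^k}$.
Ancestor choices: k -> any: $(2N)^k$; k -> k: $(2N)_{[k]}$; k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$; k -> j: $S_{k,j}(2N)_{[j]}$.
For k << 2N: $P(k) \approx 1 - \binom{k}{2}/2N \approx e^{-\binom{k}{2}/2N}$.
$S_{k,j}$: the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind).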
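The counting argument can be checked numerically. The sketch below assumes the falling-factorial form of P(k) and the usual small-k approximation exp(-C(k,2)/2N); the function names are hypothetical.

```python
from math import comb, exp

def falling_factorial(x, k):
    """x * (x-1) * ... * (x-(k-1))."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def prob_distinct_parents(k, two_n):
    """P(k): k alleles pick k distinct parents among two_n."""
    return falling_factorial(two_n, k) / two_n ** k

def stirling2(k, j):
    """Stirling number of the second kind via the standard recurrence."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)

def ancestor_choices(k, j, two_n):
    """Number of ways k labelled alleles choose exactly j distinct parents."""
    return stirling2(k, j) * falling_factorial(two_n, j)

exact = prob_distinct_parents(5, 20_000)
approx = exp(-comb(5, 2) / 20_000)
```

Summing `ancestor_choices(k, j, two_n)` over j recovers all `two_n ** k` possible choices, which is a quick sanity check on the k -> j count.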
Waiting for most recent common ancestor - MRCA
Distribution of the time until 2 alleles had a common ancestor, $X_2$:
$P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)$
$P(X_2 > j) = (1 - 1/(2N))^j$
$P(X_2 = j) = (1 - 1/(2N))^{j-1} (1/(2N))$
Mean, $E(X_2) = 2N$. Ex.: 2N = , generation time 30 years, $E(X_2)$ = years.
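The waiting time is geometric, so its mean can be verified by direct summation. A small sketch, assuming a hypothetical 2N = 100 (not a value from the slides):

```python
def prob_wait_exceeds(j, two_n):
    """P(X_2 > j): no common ancestor in the first j generations."""
    return (1 - 1 / two_n) ** j

def prob_wait_equals(j, two_n):
    """P(X_2 = j): first common ancestor exactly j generations back."""
    return (1 - 1 / two_n) ** (j - 1) * (1 / two_n)

two_n = 100  # hypothetical number of alleles
# Truncated sum; the tail beyond 20,000 generations is negligible here.
mean_wait = sum(j * prob_wait_equals(j, two_n) for j in range(1, 20_000))
```

The truncated sum should agree with E(X_2) = 2N to many decimal places.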
10 Alleles’ Ancestry for 15 generations
Multiple and Simultaneous Coalescents: 1. Simultaneous events. 2. Multifurcations. 3. Underestimation of coalescent rates.
Discrete -> Continuous Time: $t_c := t_d / 2N_e$. One unit of continuous time corresponds to $2N_e$ generations; e.g. 6 generations correspond to $6/2N_e$.
The Standard Coalescent. Two independent processes:
Continuous: exponential waiting times.
Discrete: choosing pairs to coalesce.
Example, alternating waiting and coalescing: {1}{2}{3}{4}{5} -> (4,5) -> {1}{2}{3}{4,5} -> 3--(4,5) -> {1}{2}{3,4,5} -> 1--2 -> {1,2}{3,4,5} -> (1,2)--(3,(4,5)) -> {1,2,3,4,5}
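The two processes can be combined into a short simulation. This is a sketch under the usual assumption that, with j lineages, the waiting time is exponential with rate C(j,2) in units of 2N generations; the function name and seed are hypothetical.

```python
import random
from math import comb

def simulate_coalescent(k, rng):
    """Return (waiting_time, merged_block, partition) for each coalescence."""
    blocks = [frozenset([i]) for i in range(1, k + 1)]
    events = []
    while len(blocks) > 1:
        # Continuous part: exponential waiting time with rate C(j, 2).
        wait = rng.expovariate(comb(len(blocks), 2))
        # Discrete part: a uniformly chosen pair of lineages coalesces.
        a, b = rng.sample(range(len(blocks)), 2)
        merged = blocks[a] | blocks[b]
        blocks = [blk for i, blk in enumerate(blocks) if i not in (a, b)]
        blocks.append(merged)
        events.append((wait, merged, list(blocks)))
    return events

rng = random.Random(7)
events = simulate_coalescent(5, rng)
tree_height = sum(wait for wait, _, _ in events)
```

Each run yields a sequence of partitions like the one on the slide, ending in a single block containing all five labels.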
Expected Height and Total Branch Length (time in units of 2N generations)
Expected total height of tree: $H_k = 2(1 - 1/k)$.
i. Infinitely many alleles find 1 ancestor in finite time.
ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles.
Expected total branch length in tree: $L_k = 2(1 + 1/2 + 1/3 + \dots + 1/(k-1)) \approx 2\ln(k-1)$.
[Table: time epoch, epoch time, branch lengths]
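Both expectations follow from summing the epoch times. A minimal sketch, assuming the epoch with j lineages lasts 2/(j(j-1)) on average (rate C(j,2)) and contributes j branches of that length:

```python
def expected_height(k):
    """Sum of expected epoch times for j = k, ..., 2 lineages."""
    return sum(2 / (j * (j - 1)) for j in range(2, k + 1))

def expected_total_length(k):
    """Each epoch with j lineages contributes j branches of its length."""
    return sum(j * 2 / (j * (j - 1)) for j in range(2, k + 1))

h10 = expected_height(10)          # compare with 2 * (1 - 1/10)
l4 = expected_total_length(4)      # compare with 2 * (1 + 1/2 + 1/3)
```

The height stays below 2 however large k is, while the total branch length keeps growing, roughly logarithmically.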
Effective Population Size, $N_e$.
In an idealised Wright-Fisher model: i. Variation (heterozygosity) is reduced by a factor 1 - 1/(2N) per generation. ii. The expected waiting time for two random alleles to find a common ancestor is 2N generations.
Factors that influence $N_e$:
i. Variance in offspring number (WF: 1). If the variance is higher, the effective population size is smaller.
ii. Population size variation, example k-cycle $N_1, N_2, \dots, N_k$: $k/N_e = 1/N_1 + \dots + 1/N_k$. $N_1 = 10$, $N_2 = 1000$ => $N_e \approx 19.8$.
iii. Two sexes: $N_e = 4 N_f N_m / (N_f + N_m)$. E.g. $N_f = 10$ with a much larger $N_m$ gives $N_e \approx 40$.
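The two closed-form corrections are easy to evaluate. A short sketch; the function names and the example population sizes are hypothetical and chosen only to illustrate the formulas.

```python
def ne_cycle(sizes):
    """Harmonic-mean N_e for a cycle of sizes: k / N_e = sum(1 / N_i)."""
    return len(sizes) / sum(1 / n for n in sizes)

def ne_two_sexes(n_f, n_m):
    """N_e = 4 * N_f * N_m / (N_f + N_m)."""
    return 4 * n_f * n_m / (n_f + n_m)

constant = ne_cycle([100, 100])        # a constant size is its own N_e
balanced = ne_two_sexes(50, 50)        # equal sexes: N_e equals the census size
skewed = ne_two_sexes(10, 10 ** 9)     # the rarer sex caps N_e at 4 * N_f
```

In both formulas the smallest term dominates: a bottleneck generation or a rare sex pulls the effective size well below the census size.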
6 realisations with 25 leaves. Observations: variation is great close to the root; trees are unbalanced.
Sampling more sequences. The probability that the ancestor of the sample of size n is in a sub-sample of size k is $\frac{(n+1)(k-1)}{(n-1)(k+1)}$. Letting n go to infinity gives (k-1)/(k+1), i.e. even for quite small samples it is quite large.
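The limit can be explored numerically. This sketch assumes the standard closed form (n+1)(k-1)/((n-1)(k+1)) for the probability that a sub-sample of size k already contains the ancestor of the whole sample of size n; the function name is hypothetical.

```python
from fractions import Fraction

def prob_mrca_in_subsample(n, k):
    """Assumed closed form: (n+1)(k-1) / ((n-1)(k+1)), for 2 <= k <= n."""
    return Fraction((n + 1) * (k - 1), (n - 1) * (k + 1))

small_case = prob_mrca_in_subsample(3, 2)          # exact fraction
near_limit = float(prob_mrca_in_subsample(10 ** 6, 10))
limit = (10 - 1) / (10 + 1)                        # (k-1)/(k+1) for k = 10
```

For k = 10 the limiting probability is already 9/11, which is the sense in which small samples capture the deep ancestry.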
Adding Mutations
m: mutation rate per nucleotide per generation. L: sequence length. µ = m*L: mutation rate per allele per generation. $2N_e$: number of alleles. θ := 4N*µ: mutation intensity in the scaled process.
Discrete time, discrete sequence (steps 1/(2N_e) in time, 1/L in sequence) -> continuous time, continuous sequence. Scaled rates: mutation θ/2 per lineage, coalescence 1.
Probability for two genes being identical: P(Coalescence < Mutation) = 1/(1+θ).
Note: Mutation rate and population size usually appear together as a product, making separate estimation difficult.
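The identity probability is a race between two exponential clocks. A simulation sketch, assuming coalescence at rate 1 and mutation at rate theta/2 on each of the two lineages (total rate theta); the value theta = 2 and the seed are hypothetical.

```python
import random

def identical_fraction(theta, trials, rng):
    """Fraction of trials in which coalescence happens before any mutation."""
    identical = 0
    for _ in range(trials):
        t_coal = rng.expovariate(1.0)      # coalescence of the two lineages
        t_mut = rng.expovariate(theta)     # first mutation on either lineage
        if t_coal < t_mut:
            identical += 1
    return identical / trials

rng = random.Random(3)
theta = 2.0                                 # hypothetical scaled mutation intensity
estimate = identical_fraction(theta, 100_000, rng)
expected = 1 / (1 + theta)
```

The simulated fraction should sit close to 1/(1 + theta), and changing N or µ while keeping their product fixed leaves it unchanged.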
Three Models of Alleles and Mutations.
Infinite Allele: i. Only identity, non-identity is determinable. ii. A mutation creates a new type.
Infinite Site: i. Allele is represented by a line. ii. A mutation always hits a new position.
Finite Site: i. Allele is represented by a sequence. ii. A mutation changes the nucleotide at a chosen position.
Example sequences: acgtgctt, acgtgcgt, acctgcat, tcctggct, tcctgcat.
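The difference between the three models is in how a single mutation event updates an allele. The following sketch uses hypothetical representations: an integer type label, a set of points on the unit interval, and a nucleotide string. Forcing the finite-site mutation to pick a different base is an assumption of this example.

```python
import random

def mutate_infinite_allele(known_types):
    """A mutation creates a type never seen before (only identity matters)."""
    return max(known_types, default=0) + 1

def mutate_infinite_site(mutated_positions, rng):
    """Allele = set of points on a line; a mutation hits a new position."""
    position = rng.random()
    while position in mutated_positions:
        position = rng.random()
    return mutated_positions | {position}

def mutate_finite_site(sequence, rng):
    """Allele = sequence; a mutation changes the base at a chosen position."""
    pos = rng.randrange(len(sequence))
    alternatives = [base for base in "acgt" if base != sequence[pos]]
    return sequence[:pos] + rng.choice(alternatives) + sequence[pos + 1:]

rng = random.Random(11)
new_type = mutate_infinite_allele({1, 2, 3})
new_sites = mutate_infinite_site(frozenset({0.5}), rng)
new_sequence = mutate_finite_site("acgtgctt", rng)
```

Only the finite-site model allows the same position to be hit twice, which is why it can produce back mutations and parallel changes.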
Infinite Allele Model
Final Aligned Data Set: Infinite Site Model
Labelling and unlabelling: positions and sequences. Ignoring mutation position; ignoring sequence label.
The forward-backward argument: 9 coalescence events incompatible with data; 4 classes of mutation events incompatible with data.
Infinite Site Model: An example Theta=
Impossible Ancestral States
Finite Site Model. Final aligned data set: acgtgctt, acgtgcgt, acctgcat, tcctgcat (s marks the three variable columns: positions 1, 3 and 7).
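The variable columns of the aligned data set can be found mechanically. A small sketch using the four sequences from the slide; the function name is hypothetical and positions are reported 1-based.

```python
sequences = ["acgtgctt", "acgtgcgt", "acctgcat", "tcctgcat"]

def segregating_sites(seqs):
    """1-based positions of alignment columns with more than one nucleotide."""
    return [
        index + 1
        for index, column in enumerate(zip(*seqs))
        if len(set(column)) > 1
    ]

sites = segregating_sites(sequences)
```

Column 7 carries three different nucleotides (t, g, a), so this data set cannot be explained with one mutation per variable column.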
Diploid Model with Recombination. An individual is made by: 1. Picking a random father. 2. Making that father's two chromosomes recombine to create the individual's paternal chromosome. Similarly for the maternal chromosome.
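The reproduction step can be sketched as follows. This is an illustration under simplifying assumptions not stated on the slide: chromosomes are equal-length lists, at most one crossover occurs per gamete, and `recomb_prob` is a hypothetical per-gamete recombination probability.

```python
import random

def make_gamete(parent, recomb_prob, rng):
    """parent = (chromosome_a, chromosome_b); return one transmitted chromosome."""
    first, second = rng.sample(parent, 2)       # random starting chromosome
    if rng.random() < recomb_prob:
        cut = rng.randrange(1, len(first))      # single crossover point
        return first[:cut] + second[cut:]
    return list(first)

def make_individual(fathers, mothers, recomb_prob, rng):
    """Pick a random father and mother; each contributes a recombined gamete."""
    paternal = make_gamete(rng.choice(fathers), recomb_prob, rng)
    maternal = make_gamete(rng.choice(mothers), recomb_prob, rng)
    return paternal, maternal

rng = random.Random(5)
fathers = [(["F1a"] * 6, ["F1b"] * 6), (["F2a"] * 6, ["F2b"] * 6)]
mothers = [(["M1a"] * 6, ["M1b"] * 6), (["M2a"] * 6, ["M2b"] * 6)]
child = make_individual(fathers, mothers, recomb_prob=1.0, rng=rng)
```

With `recomb_prob=1.0` every transmitted chromosome is a mosaic of the chosen parent's two chromosomes, so it traces back to two different ancestral sequences.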
The Diploid Model Back in Time. A recombinant sequence will have two different ancestor sequences in the grandparent generation.
1-recombination histories I: Branch length change
1-recombination histories II: Topology change
1-recombination histories III: Same tree
1-recombination histories IV: Coalescent time (c) must be further back in time than recombination time (r)
Recombination-Coalescence Illustration (copied from Hudson 1991)
Intensities:
Coales.  Recomb.
1        2
3        2
6        2
3        (2+b)
1        (1+b)
0        b
Age to oldest most recent common ancestor (from Wiuf and Hein, 1999, Genetics). [Figure: age to oldest most recent common ancestor against scaled recombination rate; sequence range 0 kb to 250 kb]
Number of genetic ancestors to the Human Genome. S: number of segments, E(S) = 1 + ρ (ρ: scaled recombination rate). Statements about the number of ancestors are much harder to make: simulations. [Figure: sequence against time, with recombination (R) and coalescence (C) events]
Applications to Human Genome (Wiuf and Hein, 97)
Parameters used: 4N_e; chromosome 1: 263 Mb, 263 cM.
Quantities reported: segments and ancestors for chromosome 1; ancestors for all chromosomes; physical population (Mill.).
A randomly picked ancestor: ancestral material comes in batteries!
Ignoring recombination in phylogenetic analysis. General practice in analysis of viral evolution! Mimics decelerations/accelerations of evolutionary rates. Both no recombination and infinite recombination imply a molecular clock. [Figure: Recombination vs. Assuming No Recombination]
Simulated Example
Genotype and Phenotype Covariation: Gene Mapping
Sampling genotypes and phenotypes.
Decay of local dependency over time. Result: the mapping function (Reich et al. (2001)).
Genotype --> Phenotype function: dominant/recessive; penetrance; spurious occurrence; heterogeneity.
Phenotype: a set of characters; binary decision (0,1); quantitative character.