Published by Carol Patterson; modified over 8 years ago.
1
Coalescent Theory in Biology (www.coalescent.dk)
Topics: reproductive structure; genealogies of non-sequenced data; genealogies of sequenced data; parameter estimation; model testing.
Fixed parameters: population structure, mutation, selection, recombination, ...
Example sequences: TGTTGT, CATAGT, CGTTAT
2
Wright-Fisher Model of Population Reproduction
Haploid model: i. Individuals are made by sampling with replacement from the previous generation. ii. The probability that 2 alleles have the same ancestor in the previous generation is $1/(2N)$.
Diploid model: Individuals are made by sampling one chromosome from a female and one from a male in the previous generation, with replacement.
Assumptions: 1. Constant population size. 2. No geography. 3. No selection. 4. No recombination.
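A minimal Python sketch of the haploid reproduction step, assuming a population of $2N$ alleles. The names `wf_parents` and `share_parent_freq` are illustrative, not from the source. Each allele picks its parent uniformly with replacement, and the simulated frequency with which two given alleles share a parent should approach $1/(2N)$.

```python
import random


def wf_parents(two_n, rng):
    """One haploid Wright-Fisher generation: each of the 2N alleles picks a parent
    uniformly at random, with replacement, from the 2N alleles of the previous generation."""
    return [rng.randrange(two_n) for _ in range(two_n)]


def share_parent_freq(two_n, trials, seed=1):
    """Fraction of simulated generations in which alleles 0 and 1 share a parent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        parents = wf_parents(two_n, rng)
        hits += parents[0] == parents[1]
    return hits / trials


print(share_parent_freq(20, 20000))  # should be close to 1/20 = 0.05
```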
3
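Ancestor choices for $k$ alleles one generation back:
k → any: $(2N)^k$ choices.
k → k: $2N(2N-1)\cdots(2N-(k-1)) =: (2N)_{[k]}$ choices, so $P(k) := P\{k \text{ alleles had } k \text{ distinct parents}\} = (2N)_{[k]}/(2N)^k$.
k → j: $P(k \to j) = S_{k,j}\,(2N)_{[j]}/(2N)^k$, where $S_{k,j}$ is the number of ways to group $k$ labelled objects into $j$ groups (Stirling numbers of the second kind).
For $k \ll 2N$: $P(k \to k) \approx 1 - \binom{k}{2}\frac{1}{2N}$ and $P(k \to k-1) \approx \binom{k}{2}\frac{1}{2N}$.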
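The transition probabilities above can be computed exactly with rational arithmetic. This sketch (function names are illustrative) builds $S_{k,j}$ from the standard recurrence $S_{k,j} = jS_{k-1,j} + S_{k-1,j-1}$ and compares the exact $P(k \to k-1)$ with the $\binom{k}{2}/(2N)$ approximation.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(k, j):
    """S_{k,j}: number of ways to group k labelled objects into j non-empty groups."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)


def falling(n, j):
    """Falling factorial n_[j] = n (n-1) ... (n-j+1)."""
    out = 1
    for i in range(j):
        out *= n - i
    return out


def p_transition(k, j, two_n):
    """Exact probability that k alleles had j distinct parents one generation back."""
    return Fraction(stirling2(k, j) * falling(two_n, j), two_n ** k)


two_n, k = 20_000, 5
exact = float(p_transition(k, k - 1, two_n))
approx = comb(k, 2) / two_n
print(exact, approx)  # nearly equal when k << 2N
```

Summing $P(k \to j)$ over $j = 1, \dots, k$ gives exactly 1, since every one of the $(2N)^k$ parent choices produces some number of distinct parents.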
4
Waiting for the Most Recent Common Ancestor (MRCA)
Distribution of the number of generations $X_2$ until 2 alleles had a common ancestor:
$P(X_2 > 1) = (2N-1)/(2N) = 1 - 1/(2N)$
$P(X_2 > j) = (1 - 1/(2N))^j$
$P(X_2 = j) = (1 - 1/(2N))^{j-1}\,(1/(2N))$
Mean: $E(X_2) = 2N$.
Example: $2N$ = 20,000 and a generation time of 30 years give $E(X_2)$ = 600,000 years.
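$X_2$ is geometric with success probability $1/(2N)$. A small sketch (names are illustrative) evaluates the survival function and checks $E(X_2) = 2N$ by simulation for a deliberately tiny population, $2N = 10$, chosen only to keep the run fast.

```python
import random


def p_x2_greater(j, two_n):
    """P(X_2 > j): the two lineages fail to coalesce in each of j generations."""
    return (1 - 1 / two_n) ** j


def sample_x2(two_n, rng):
    """Generations back until two alleles first share a parent (geometric, p = 1/(2N))."""
    j = 1
    while rng.random() >= 1 / two_n:
        j += 1
    return j


rng = random.Random(42)
samples = [sample_x2(10, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to E(X_2) = 2N = 10
```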
5
Ancestry of 10 Alleles over 15 Generations
6
Multiple and Simultaneous Coalescents
1. Simultaneous events.
2. Multifurcations.
3. Underestimation of coalescent rates.
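To see these effects, one can trace 10 alleles back for 15 generations in a small haploid Wright-Fisher population. This is a sketch under assumed parameters ($2N = 20$, picked only so that events are frequent); `trace_ancestry` is a hypothetical helper. A generation is flagged "simultaneous" if two or more parents each received several lineages, and "multifurcation" if one parent received three or more.

```python
import random
from collections import Counter


def trace_ancestry(k, two_n, generations, seed=0):
    """Follow k alleles back in a haploid Wright-Fisher population of 2N alleles.
    Per generation, record the number of ancestral lineages and whether several
    merges happened at once (simultaneous) or one parent had 3+ children (multifurcation)."""
    rng = random.Random(seed)
    lineages = k
    history = []
    for _ in range(generations):
        children = Counter(rng.randrange(two_n) for _ in range(lineages))
        merges = [c for c in children.values() if c >= 2]
        history.append({
            "lineages": len(children),
            "simultaneous": len(merges) >= 2,
            "multifurcation": any(c >= 3 for c in merges),
        })
        lineages = len(children)
    return history


for gen, step in enumerate(trace_ancestry(10, 20, 15), start=1):
    print(gen, step)
```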
7
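Discrete and Continuous Time
Continuous time is discrete time (in generations) rescaled by $2N_e$: $t_c := t_d/(2N_e)$. A continuous time of 1.0 corresponds to $2N$ generations; e.g. an event 6 generations back lies at $6/(2N_e)$ in continuous time.
[Figure: the same genealogy on a discrete axis (0 to 2N generations) and a continuous axis (0.0 to 1.0)]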
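Under this rescaling the geometric waiting time of two alleles becomes exponential with mean 1, because $(1 - 1/(2N))^{2Nt} \to e^{-t}$. A short numerical check (the function name is illustrative) compares the two for $2N = 10{,}000$:

```python
import math


def scaled_survival(t, two_n):
    """P(X_2 / (2N) > t) in the discrete model, i.e. no coalescence in floor(2N t) generations."""
    return (1 - 1 / two_n) ** math.floor(t * two_n)


for t in (0.5, 1.0, 2.0):
    print(t, scaled_survival(t, 10_000), math.exp(-t))  # discrete vs exponential limit
```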
8
The Standard Coalescent
Two independent processes:
Continuous: exponential waiting times.
Discrete: choosing pairs to coalesce.
Example with sequences 1-5 (waiting, then coalescing):
Coalescing          Partition
(start)             {1}{2}{3}{4}{5}
4--5                {1}{2}{3}{4,5}
3--(4,5)            {1}{2}{3,4,5}
1--2                {1,2}{3,4,5}
(1,2)--(3,(4,5))    {1,2,3,4,5}
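The two processes can be combined into a small simulator. This is a sketch, assuming the usual rate $\binom{k}{2}$ for the exponential waiting time while $k$ lineages remain (time in units of $2N$ generations); the slide does not state the rate, and `simulate_coalescent` is a hypothetical name. Each event records the time and the newly formed block of the partition.

```python
import random


def simulate_coalescent(n, seed=0):
    """Standard coalescent for sequences 1..n; time in units of 2N generations."""
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    events = []
    while len(blocks) > 1:
        k = len(blocks)
        # continuous part: with k lineages the waiting time is exponential with rate k(k-1)/2
        t += rng.expovariate(k * (k - 1) / 2)
        # discrete part: a uniformly chosen pair of lineages coalesces
        a, b = rng.sample(range(k), 2)
        merged = blocks[a] | blocks[b]
        blocks = [blk for i, blk in enumerate(blocks) if i not in (a, b)] + [merged]
        events.append((t, sorted(merged)))
    return events


for time, block in simulate_coalescent(5):
    print(f"{time:.3f}", block)
```

Because the waiting times never depend on which pair merges, the two parts can be drawn independently, as the slide states.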
9
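Expected Height and Total Branch Length (time in units of $2N$ generations)
While $j$ lineages remain, the expected epoch length is $2/(j(j-1))$, and the epoch contributes $j \cdot 2/(j(j-1)) = 2/(j-1)$ to the total branch length (1 and 2 for $j = 2$; 1/3 and 1 for $j = 3$; up to $2/(k-1)$ for $j = k$).
Expected total height of the tree: $H_k = 2(1 - 1/k)$.
i. Infinitely many alleles find 1 ancestor in finite time.
ii. It takes less than twice as long for $k$ alleles to find 1 ancestor as it does for 2 alleles.
Expected total branch length in the tree: $L_k = 2(1 + 1/2 + 1/3 + \dots + 1/(k-1)) \approx 2\ln(k-1)$ (more precisely $2(\ln(k-1) + \gamma)$, with Euler's constant $\gamma \approx 0.577$).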
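Summing the epoch contributions with exact fractions confirms the closed form for $H_k$ and the harmonic sum for $L_k$. A minimal sketch; the function names are illustrative.

```python
from fractions import Fraction
import math


def epoch_mean(j):
    """Expected time while j lineages remain: 1 / C(j, 2) = 2 / (j (j - 1))."""
    return Fraction(2, j * (j - 1))


def expected_height(k):
    """H_k: sum of the epoch means from k lineages down to 2."""
    return sum(epoch_mean(j) for j in range(2, k + 1))


def expected_total_length(k):
    """L_k: during the epoch with j lineages there are j branches."""
    return sum(j * epoch_mean(j) for j in range(2, k + 1))


for k in (2, 10, 100):
    print(k, float(expected_height(k)), float(expected_total_length(k)), 2 * math.log(k - 1))
```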
10
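Effective Population Size, $N_e$
In an idealised Wright-Fisher model: i. variation (heterozygosity) is multiplied by $1 - 1/(2N)$ each generation, i.e. a fraction $1/(2N)$ is lost. ii. The expected waiting time for two random alleles to find a common ancestor is $2N$ generations.
Factors that influence $N_e$:
i. Variance in offspring number (approximately 1 in the Wright-Fisher model). If the variance is higher, the effective population size is smaller.
ii. Population size variation, e.g. a cycle of $k$ generations with sizes $N_1, N_2, \dots, N_k$: $k/N_e = 1/N_1 + \dots + 1/N_k$ (a harmonic mean). $N_1 = 10$, $N_2 = 1000$ ⇒ $N_e = 2/(1/10 + 1/1000) \approx 19.8$.
iii. Two sexes: $N_e = 4N_fN_m/(N_f + N_m)$. E.g. $N_f = 10$, $N_m = 1000$ ⇒ $N_e \approx 39.6$ (about 40).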
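The two worked examples can be reproduced directly from the formulas. A minimal sketch with illustrative function names:

```python
def ne_fluctuating(sizes):
    """Cycle of population sizes N_1..N_k: k / N_e = 1/N_1 + ... + 1/N_k (harmonic mean)."""
    return len(sizes) / sum(1 / n for n in sizes)


def ne_two_sexes(n_f, n_m):
    """N_e = 4 N_f N_m / (N_f + N_m)."""
    return 4 * n_f * n_m / (n_f + n_m)


print(round(ne_fluctuating([10, 1000]), 1))  # 19.8
print(round(ne_two_sexes(10, 1000), 1))      # 39.6
```

In both cases $N_e$ is dominated by the smaller number: the bottleneck generation, or the rarer sex.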
11
6 Realisations with 25 Leaves
Observations: the variation is large close to the root, and the trees are unbalanced.
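Both observations can be checked by generating six such trees. This sketch (hypothetical helper `realisation`, exponential rate $\binom{k}{2}$ assumed as for the standard coalescent) reports each tree's height, the length of its final epoch with 2 lineages, whose mean is 1 out of an expected height of $2(1 - 1/25) = 1.92$, and the split of the 25 leaves at the root. Under this model the size of one root clade is uniformly distributed on $1, \dots, n-1$, so very uneven root splits are common.

```python
import random


def realisation(n, rng):
    """One coalescent tree on n >= 2 leaves (time in units of 2N generations).
    Returns (tree height, length of the final epoch with 2 lineages, size of the smaller root clade)."""
    sizes = [1] * n  # number of leaves below each current lineage
    height = 0.0
    while len(sizes) > 1:
        k = len(sizes)
        wait = rng.expovariate(k * (k - 1) / 2)
        height += wait
        if k == 2:
            last_epoch, smaller = wait, min(sizes)
        a, b = rng.sample(range(k), 2)
        merged = sizes[a] + sizes[b]
        sizes = [s for i, s in enumerate(sizes) if i not in (a, b)] + [merged]
    return height, last_epoch, smaller


rng = random.Random(3)
for height, last, smaller in (realisation(25, rng) for _ in range(6)):
    print(f"height {height:.2f}, final epoch {last:.2f}, root split {smaller}/{25 - smaller}")
```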
12
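Sampling More Sequences
The probability that the ancestor (MRCA) of a sample of size $n$ is also the ancestor of a sub-sample of size $k$ is
$$\frac{(k-1)(n+1)}{(k+1)(n-1)}.$$
Letting $n$ go to infinity gives $(k-1)/(k+1)$, i.e. even for quite small sub-samples the probability is quite large (1/2 for $k = 3$, 9/11 for $k = 10$).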
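A Monte Carlo check of the formula: a sub-sample shares the sample's MRCA exactly when it has members on both sides of the root split. Only the tree topology matters, so this sketch (hypothetical names, uniform random pair merges as in the standard coalescent) skips the waiting times.

```python
import random
from fractions import Fraction


def p_shared_mrca(n, k):
    """P(the MRCA of a size-k sub-sample is the MRCA of the whole size-n sample)."""
    return Fraction((k - 1) * (n + 1), (k + 1) * (n - 1))


def simulate_shared_mrca(n, k, trials, seed=0):
    """Monte Carlo: merge random pairs until two root clades remain, then check
    whether the sub-sample has members in both of them."""
    rng = random.Random(seed)
    sub = set(range(k))  # sequences are exchangeable, so use 0..k-1 as the sub-sample
    hits = 0
    for _ in range(trials):
        blocks = [{i} for i in range(n)]
        while len(blocks) > 2:
            a, b = rng.sample(range(len(blocks)), 2)
            merged = blocks[a] | blocks[b]
            blocks = [blk for i, blk in enumerate(blocks) if i not in (a, b)] + [merged]
        hits += all(sub & blk for blk in blocks)
    return hits / trials


print(p_shared_mrca(10, 3), simulate_shared_mrca(10, 3, 10000))  # 11/18 and about 0.61
```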
13
Adding Mutations
m: mutation rate per nucleotide per generation. L: sequence length. $\mu = mL$: mutation rate per allele per generation. $2N_e$: number of alleles. $\theta := 4N_e\mu$: mutation intensity in the scaled process.
Scaling from discrete to continuous: time steps of $1/(2N_e)$ (discrete time to continuous time) and sequence steps of $1/L$ (discrete sequence to continuous sequence).
In the scaled process each lineage mutates at rate $\theta/2$ and a pair of lineages coalesces at rate 1, so the probability that two genes are identical is
$P(\text{Coalescence} < \text{Mutation}) = 1/(1+\theta)$.
Note: mutation rate and population size usually appear together as a product, making separate estimation difficult.
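The identity probability is the outcome of a race between two exponential clocks: coalescence at rate 1 and a mutation on either of the two lineages at total rate $2 \cdot \theta/2 = \theta$. A sketch with illustrative names; `theta_from` simply applies $\theta = 4N_e m L$ and assumes $\theta > 0$.

```python
import random


def theta_from(n_e, m, seq_len):
    """theta = 4 N_e mu with mu = m * L (mutation rate per allele per generation)."""
    return 4 * n_e * m * seq_len


def p_identical(theta):
    """P(coalescence before mutation) for two genes: 1 / (1 + theta)."""
    return 1 / (1 + theta)


def simulate_identical(theta, trials, seed=0):
    """Race between coalescence (rate 1) and a mutation on either lineage (rate theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t_coal = rng.expovariate(1.0)
        t_mut = rng.expovariate(theta)  # two lineages, each mutating at rate theta/2
        hits += t_coal < t_mut
    return hits / trials


print(p_identical(2.12), simulate_identical(2.12, 20000))
```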
14
Three Models of Alleles and Mutations
Infinite allele: i. Only identity or non-identity is determinable. ii. A mutation creates a new type.
Infinite site: i. An allele is represented by a line. ii. A mutation always hits a new position.
Finite site: i. An allele is represented by a sequence. ii. A mutation changes the nucleotide at a chosen position.
Example sequences: acgtgctt, acgtgcgt, acctgcat, tcctgcat; second example: acgtgctt, acgtgcgt, acctgcat, tcctggct, tcctgcat.
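The three models differ only in what a single mutation does to an allele. A sketch, assuming integer type labels for the infinite allele model and points on the unit interval $[0, 1)$ for the infinite site model; all function names are hypothetical.

```python
import random


def mutate_infinite_allele(types_seen):
    """Infinite allele: a mutation creates a new type (here, the next unused integer)."""
    new_type = max(types_seen, default=0) + 1
    types_seen.add(new_type)
    return new_type


def mutate_infinite_site(positions, rng):
    """Infinite site: an allele is a set of mutated points on the unit line;
    a continuous uniform draw hits a new position with probability 1."""
    return positions | {rng.random()}


def mutate_finite_site(seq, rng, alphabet="acgt"):
    """Finite site: change the nucleotide at a chosen position to a different one."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in alphabet if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


rng = random.Random(0)
print(mutate_infinite_allele({1, 2}))  # 3
print(sorted(mutate_infinite_site(frozenset(), rng)))
print(mutate_finite_site("acgtgctt", rng))
```

Only the finite site model can hit the same position twice, which is why it can produce sites that the infinite site model cannot explain.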
15
Infinite Allele Model
[Figure: genealogy of sequences 1-5]
16
Final Aligned Data Set: Infinite Site Model
17
The Forward-Backward Argument
Labelling and unlabelling: positions and sequences (ignoring mutation position; ignoring sequence label).
9 coalescence events are incompatible with the data.
4 classes of mutation events are incompatible with the data.
[Figure: genealogy of sequences 1-5]
18
Infinite Site Model: An Example ($\theta$ = 2.12)
[Figure: worked example]
19
Impossible Ancestral States
20
Finite Site Model
Final aligned data set (s marks a segregating site):
acgtgctt
acgtgcgt
acctgcat
tcctgcat
s s   s
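The segregating sites of the aligned data set can be found column by column. This sketch (illustrative names) also flags columns with three or more nucleotides. Such a column needs at least two mutations at the same position, which the infinite site model rules out but the finite site model allows.

```python
def segregating_sites(seqs):
    """0-based columns of an alignment where not all sequences carry the same nucleotide."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def multi_state_sites(seqs):
    """Columns with 3+ nucleotides: these need at least two mutations at one site."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 2]


data = ["acgtgctt", "acgtgcgt", "acctgcat", "tcctgcat"]
print(segregating_sites(data))  # [0, 2, 6]
print(multi_state_sites(data))  # [6]
```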
21
Diploid Model with Recombination
An individual is made as follows:
1. The paternal chromosome is obtained by picking a random father.
2. That father's two chromosomes recombine to create the individual's paternal chromosome.
The maternal chromosome is made similarly.
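The two steps can be sketched as follows. The crossover model is an assumption for illustration: between adjacent sites the copy switches to the parent's other chromosome with probability `r`. `make_gamete` and `make_offspring` are hypothetical names.

```python
import random


def make_gamete(chrom_a, chrom_b, r, rng):
    """Copy a chromosome from a parent's two chromosomes; between adjacent sites the copy
    switches to the other chromosome with probability r (a simple, assumed crossover model)."""
    parents = (chrom_a, chrom_b)
    current = rng.randrange(2)
    gamete = [parents[current][0]]
    for i in range(1, len(chrom_a)):
        if rng.random() < r:
            current = 1 - current
        gamete.append(parents[current][i])
    return "".join(gamete)


def make_offspring(fathers, mothers, r, rng):
    """fathers, mothers: lists of (chromosome, chromosome) pairs from the previous generation."""
    paternal = make_gamete(*rng.choice(fathers), r, rng)  # random father, then recombine
    maternal = make_gamete(*rng.choice(mothers), r, rng)  # similarly for the mother
    return paternal, maternal


rng = random.Random(0)
fathers = [("AAAAAAAA", "aaaaaaaa")]
mothers = [("BBBBBBBB", "bbbbbbbb")]
print(make_offspring(fathers, mothers, 0.2, rng))
```

A paternal chromosome that mixes `A` and `a` is recombinant: looking back in time, its sites trace to two different chromosomes of the father.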
22
The Diploid Model Back in Time
A recombinant sequence will have two different ancestor sequences in the grandparents' generation.
23
Histories with One Recombination I: Branch Length Change
[Figure: trees for sequences 1-4]
24
Histories with One Recombination II: Topology Change
[Figure: trees for sequences 1-4]
25
Histories with One Recombination III: Same Tree
[Figure: trees for sequences 1-4]
26
Histories with One Recombination IV: The coalescence time must be further back in time than the recombination time.
[Figure: sequences 1-4; c = coalescence, r = recombination]
27
Recombination-Coalescence Illustration (copied from Hudson 1991)
[Figure: coalescence and recombination intensities for each ancestral state]
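A simplified sketch of how the two intensities compete. It assumes the standard rates, which are not read off the figure: with $k$ ancestral lineages, coalescence occurs at total rate $\binom{k}{2}$ and recombination at total rate $k\rho/2$, where $\rho$ is the scaled recombination rate. Each recombination is treated as splitting one lineage into two, ignoring whether the breakpoint falls in ancestral material, so this lineage count can exceed that of the full process. The names are hypothetical.

```python
import random


def event_intensities(k, rho):
    """Total intensities with k ancestral lineages (time in units of 2N generations):
    coalescence C(k, 2) and recombination k * rho / 2, rho = scaled recombination rate."""
    return k * (k - 1) / 2, k * rho / 2


def lineage_count_path(n, rho, seed=0):
    """Simplified ancestral process: a recombination splits one lineage into two,
    a coalescence merges two. Returns (time, number of lineages) after each event."""
    rng = random.Random(seed)
    k, t = n, 0.0
    path = [(t, k)]
    while k > 1:
        coal, rec = event_intensities(k, rho)
        t += rng.expovariate(coal + rec)
        k += -1 if rng.random() < coal / (coal + rec) else 1
        path.append((t, k))
    return path


print(lineage_count_path(3, 1.0))
```

Because coalescence grows quadratically in $k$ while recombination grows only linearly, the count always returns to a single lineage.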
28
Age of the Oldest Most Recent Common Ancestor (from Wiuf and Hein, 1999, Genetics)
[Figure: age of the oldest MRCA against the scaled recombination rate; sequence lengths from 0 kb to 250 kb]
29
Number of Genetic Ancestors to the Human Genome Sequence
S: number of segments; $E(S) = 1 + \dots$
[Figure: ancestry of a sequence back in time, with recombination (R) and coalescence (C) events]
Statements about the number of ancestors are much harder to make (simulations).
30
Applications to the Human Genome (Wiuf and Hein, 1997)
A randomly picked ancestor: ancestral material comes in batches!
Parameters used: $4N_e$ = 20,000; chromosome 1: 263 Mb, 263 cM.
Chromosome 1: 52,000 segments; 6,800 ancestors.
All chromosomes: 86,000 ancestors.
Physical population: 1.3-5.0 million.
[Figure: ancestral segments of a randomly picked ancestor along chromosome 1, shown at 260 Mb, 7.5 Mb (×35) and 30 kb (×250) scales]
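For orientation, the slide's parameters can be turned into a scaled recombination rate for chromosome 1. This assumes the usual definition $\rho = 4N_e r$ with $r$ the genetic map length in Morgans; that the slide uses exactly this definition is an assumption.

```python
def scaled_recombination_rate(four_ne, centimorgans):
    """rho = 4 N_e r, with r the genetic map length in Morgans (1 M = 100 cM)."""
    return four_ne * centimorgans / 100


print(scaled_recombination_rate(20_000, 263))  # 52600.0 for chromosome 1
```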
31
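Ignoring Recombination in Phylogenetic Analysis
Ignoring recombination mimics decelerations/accelerations of evolutionary rates.
Both no recombination and infinite recombination imply a molecular clock.
Ignoring recombination is general practice in the analysis of viral evolution!
[Figure: a genealogy of sequences 1-4 with recombination, and the tree obtained assuming no recombination]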
32
Simulated Example
33
Genotype and Phenotype Covariation: Gene Mapping
Sampling genotypes and phenotypes.
Genotype → phenotype function: a set of characters; binary decision (0,1); quantitative character; dominant/recessive; penetrance; spurious occurrence; heterogeneity.
Result: the mapping function.
Decay of local dependency over time (Reich et al. 2001).
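One simple genotype → phenotype function for a binary character combines several of the listed ingredients. In this hypothetical sketch, dominant or recessive action decides which genotypes are susceptible, penetrance is the chance that a susceptible individual is affected, and a phenocopy rate stands in for spurious occurrence. The parameter names and values are illustrative, not from the source.

```python
import random


def p_affected(genotype, mode, penetrance, phenocopy):
    """genotype: number of disease alleles (0, 1 or 2).
    mode: "dominant" (one copy is enough) or "recessive" (two copies needed).
    penetrance: P(affected | susceptible genotype);
    phenocopy: P(affected | non-susceptible genotype), i.e. spurious occurrence."""
    needed = 1 if mode == "dominant" else 2
    return penetrance if genotype >= needed else phenocopy


def sample_phenotype(genotype, mode, penetrance, phenocopy, rng):
    """Binary phenotype (0/1) drawn from the genotype -> phenotype function above."""
    return int(rng.random() < p_affected(genotype, mode, penetrance, phenocopy))


rng = random.Random(0)
print([sample_phenotype(g, "recessive", 0.8, 0.05, rng) for g in (0, 1, 2, 2, 1)])
```

Incomplete penetrance and phenocopies both weaken the genotype-phenotype covariation that gene mapping relies on.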