Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.



1 Coalescent Theory in Biology (www.coalescent.dk). Fixed Parameters: Population Structure, Mutation, Selection, Recombination, ... Reproductive Structure. Genealogies of non-sequenced data. Genealogies of sequenced data (TGTTGT, CATAGT, CGTTAT). Parameter Estimation. Model Testing.

2 Wright-Fisher Model of Population Reproduction. Haploid Model: i. Individuals are made by sampling with replacement in the previous generation. ii. The probability that 2 alleles have the same ancestor in the previous generation is $1/(2N)$. Diploid Model: Individuals are made by sampling, with replacement, one chromosome from the females and one from the males of the previous generation. Assumptions: 1. Constant population size. 2. No geography. 3. No selection. 4. No recombination.
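The haploid sampling rule can be sketched as a backward simulation. This is a minimal illustration, not code from the presentation: the function name `trace_lineages`, the population size 2N = 20 and the seed are assumptions chosen for the example.

```python
import random

def trace_lineages(k, two_n, generations, seed=0):
    """Follow k sampled alleles backwards in a haploid Wright-Fisher population.

    Returns the number of distinct ancestral alleles after each generation.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    lineages = k
    counts = []
    for _ in range(generations):
        # Each lineage picks its parent uniformly, with replacement,
        # among the 2N alleles of the previous generation.
        parents = {rng.randrange(two_n) for _ in range(lineages)}
        lineages = len(parents)
        counts.append(lineages)
    return counts

# 10 alleles followed for 15 generations in an assumed population of 2N = 20.
counts = trace_lineages(k=10, two_n=20, generations=15)
print(counts)
```

The number of ancestral lineages can only stay the same or drop as the sample is traced back, which is the behaviour the coalescent approximates.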

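3 $P(k) := P\{k \text{ alleles had } k \text{ distinct parents}\} = \frac{2N(2N-1)\cdots(2N-(k-1))}{(2N)^k} =: \frac{(2N)_{[k]}}{(2N)^k}$. Ancestor choices: k -> any: $(2N)^k$; k -> k: $(2N)_{[k]}$; k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$; k -> j: $S_{k,j}(2N)_{[j]}$, where $S_{k,j}$ is the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind). For k << 2N: $P(k) \approx 1 - \binom{k}{2}/(2N) \approx e^{-\binom{k}{2}/(2N)}$.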
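As a numerical check, the probability that k alleles have k distinct parents can be computed as a product and compared with the standard small-sample approximation $1 - \binom{k}{2}/(2N)$. The function names below are illustrative assumptions, not part of the presentation.

```python
from math import comb

def prob_distinct_parents(k, two_n):
    """Exact P(k): k alleles choose k distinct parents among 2N."""
    p = 1.0
    for i in range(k):
        # The (i+1)-th allele must avoid the i parents already chosen.
        p *= (two_n - i) / two_n
    return p

def approx_distinct_parents(k, two_n):
    """First-order approximation, reasonable when k << 2N."""
    return 1.0 - comb(k, 2) / two_n

exact = prob_distinct_parents(10, 20_000)
approx = approx_distinct_parents(10, 20_000)
print(exact, approx)
```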

4 Waiting for most recent common ancestor - MRCA. Distribution of the time until 2 alleles had a common ancestor, $X_2$: $P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)$; $P(X_2 > j) = (1 - 1/(2N))^j$; $P(X_2 = j) = (1 - 1/(2N))^{j-1}(1/(2N))$. Mean: $E(X_2) = 2N$. Ex.: 2N = 20,000, generation time 30 years, $E(X_2)$ = 600,000 years.
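The waiting time for two alleles is geometric with success probability 1/(2N), so the worked example can be reproduced in a few lines. The helper names here are assumptions made for illustration.

```python
def prob_no_common_ancestor(j, two_n):
    """P(X_2 > j): two alleles have not coalesced within j generations."""
    return (1 - 1 / two_n) ** j

def expected_wait_years(two_n, generation_years):
    """E(X_2) = 2N generations, converted to years."""
    return two_n * generation_years

two_n = 20_000
print(expected_wait_years(two_n, 30))          # 600000
print(prob_no_common_ancestor(two_n, two_n))   # close to exp(-1)
```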

5 10 Alleles’ Ancestry for 15 generations

6 Multiple and Simultaneous Coalescents: 1. Simultaneous events. 2. Multifurcations. 3. Underestimation of coalescent rates.

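7 Discrete → Continuous Time. Time is rescaled as $t_c := t_d/2N_e$, so 6 generations in discrete time becomes $6/2N_e$, and 1.0 in continuous time corresponds to 2N generations. [Figure: genealogy of sequences 1-6 drawn against the discrete axis (0 to 2N) and the continuous axis (0.0 to 1.0)]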

8 The Standard Coalescent. Two independent processes. Continuous: exponential waiting times. Discrete: choosing pairs to coalesce. Example with sequences 1-5, alternating waiting and coalescing: {1}{2}{3}{4}{5}; 4--5 gives {1}{2}{3}{4,5}; 3--(4,5) gives {1}{2}{3,4,5}; 1--2 gives {1,2}{3,4,5}; (1,2)--(3,(4,5)) gives {1,2,3,4,5}.
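The two processes can be combined in a short simulation: draw an exponential waiting time with rate $\binom{k}{2}$ while k lineages remain, then pick a uniformly random pair to merge. This is a minimal sketch; the function name `simulate_coalescent` and the use of frozensets for lineages are choices made for this example, and time is measured in units of 2N generations.

```python
import random

def simulate_coalescent(n, seed=None):
    """Simulate one standard coalescent genealogy for n sequences.

    Returns a list of (time, merged_block) events, oldest event last.
    """
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    time = 0.0
    events = []
    while len(blocks) > 1:
        k = len(blocks)
        # Continuous part: exponential waiting time with rate k(k-1)/2.
        time += rng.expovariate(k * (k - 1) / 2)
        # Discrete part: choose a random pair of lineages to coalesce.
        i, j = rng.sample(range(k), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        events.append((time, merged))
    return events

events = simulate_coalescent(5, seed=1)
for t, block in events:
    print(round(t, 3), sorted(block))
```

A sample of 5 always produces 4 coalescence events, the last of which joins all sequences.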

9 Expected Height and Total Branch Length. Expected total height of tree: $H_k = 2(1 - 1/k)$. i. Infinitely many alleles find 1 ancestor in finite time. ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles. Expected total branch length in tree: $L_k = 2(1 + 1/2 + 1/3 + \cdots + 1/(k-1)) \approx 2\ln(k-1)$. Per time epoch (expected duration, expected branch length): 2 lineages: 1, 2; 3 lineages: 1/3, 1; ...; k lineages: $2/(k(k-1))$, $2/(k-1)$.
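Both expectations follow from summing over epochs: while j lineages remain, the expected duration is $2/(j(j-1))$ and the j branches together contribute $2/(j-1)$. A small sketch, with function names chosen for this example:

```python
def expected_height(k):
    """Sum of expected epoch durations 2/(j(j-1)) for j = 2..k."""
    return sum(2 / (j * (j - 1)) for j in range(2, k + 1))

def expected_total_length(k):
    """Sum of expected epoch branch lengths j * 2/(j(j-1)) = 2/(j-1)."""
    return sum(2 / (j - 1) for j in range(2, k + 1))

for k in (2, 10, 1000):
    print(k, expected_height(k), expected_total_length(k))
```

The height approaches 2 as k grows, while the total branch length keeps growing logarithmically.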

10 Effective Population Size, $N_e$. In an idealised Wright-Fisher model: i. Variation is retained at a rate $1 - 1/(2N)$ per generation, i.e. a fraction $1/(2N)$ is lost. ii. The expected waiting time for two random alleles to find a common ancestor is 2N generations. Factors that influence $N_e$: i. Variance in offspring number (WF: 1). If the variance is higher, the effective population size is smaller. ii. Population size variation, example k-cycle $N_1, N_2, \ldots, N_k$: $k/N_e = 1/N_1 + \cdots + 1/N_k$. $N_1 = 10$, $N_2 = 1000$ => $N_e \approx 19.8$. iii. Two sexes: $N_e = 4N_fN_m/(N_f+N_m)$, e.g. $N_f = 10$, $N_m = 1000$ => $N_e \approx 40$.
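The two formulas for $N_e$ are easy to evaluate directly. The sketch below uses hypothetical function names; with $N_1 = 10$ and $N_2 = 1000$ the cycle formula as written evaluates to about 19.8, and the two-sex formula with $N_f = 10$, $N_m = 1000$ gives about 39.6.

```python
def ne_cycle(sizes):
    """Harmonic mean of a cycle of population sizes: k/N_e = sum(1/N_i)."""
    return len(sizes) / sum(1 / n for n in sizes)

def ne_two_sexes(n_f, n_m):
    """N_e = 4 N_f N_m / (N_f + N_m) for unequal numbers of the two sexes."""
    return 4 * n_f * n_m / (n_f + n_m)

print(round(ne_cycle([10, 1000]), 1))
print(round(ne_two_sexes(10, 1000), 1))
```

In both cases the smaller number dominates: the effective size stays close to a small multiple of 10 even though the other value is 1000.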

11 6 Realisations with 25 leaves. Observations: variation is great close to the root; trees are unbalanced.

12 Sampling more sequences. The probability that the most recent common ancestor of a sample of size n is also the most recent common ancestor of a sub-sample of size k is $\frac{(k-1)(n+1)}{(k+1)(n-1)}$. Letting n go to infinity gives $(k-1)/(k+1)$, i.e. even for quite small samples it is quite large.
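To get a feel for the numbers, the standard coalescent result $(k-1)(n+1)/((k+1)(n-1))$ for the probability that a sub-sample of size k shares its most recent common ancestor with the full sample of size n can be tabulated. The function name is an assumption made for this sketch.

```python
def prob_subsample_has_mrca(k, n):
    """P(sub-sample of size k has the same MRCA as the sample of size n)."""
    return (k - 1) * (n + 1) / ((k + 1) * (n - 1))

for k in (2, 5, 10, 20):
    # Second column: sample of 100; third column: the n -> infinity limit.
    print(k, round(prob_subsample_has_mrca(k, 100), 3), round((k - 1) / (k + 1), 3))
```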

13 Adding Mutations. m: mutation probability per nucleotide per generation. L: sequence length. $\mu = m \cdot L$: mutation probability per allele per generation. $2N_e$: number of alleles. $\theta := 4N\mu$: mutation intensity in the scaled process. Going from discrete time and discrete sequence (steps $1/(2N_e)$ and $1/L$) to continuous time and continuous sequence, mutation occurs at rate $\theta/2$ per lineage and coalescence of a pair at rate 1. Probability of two genes being identical: P(Coalescence < Mutation) = $1/(1+\theta)$. Note: Mutation rate and population size usually appear together as a product, making separate estimation difficult.
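The identity probability is a race between two exponential clocks: a pair coalesces at rate 1, while mutations hit either of the two lineages at total rate $2 \cdot \theta/2 = \theta$. A Monte Carlo sketch (function name, trial count and seed are assumptions for illustration) reproduces $1/(1+\theta)$:

```python
import random

def prob_identical_mc(theta, trials=100_000, seed=0):
    """Estimate P(coalescence before mutation) for two genes; theta > 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t_coal = rng.expovariate(1.0)    # coalescence of the pair, rate 1
        t_mut = rng.expovariate(theta)   # first mutation on either lineage
        if t_coal < t_mut:
            hits += 1
    return hits / trials

for theta in (0.5, 1.0, 2.12):
    print(theta, prob_identical_mc(theta), 1 / (1 + theta))
```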

14 Three Models of Alleles and Mutations. Infinite Allele: i. Only identity, non-identity is determinable. ii. A mutation creates a new type. Infinite Site: i. Allele is represented by a line. ii. A mutation always hits a new position. Finite Site: i. Allele is represented by a sequence. ii. A mutation changes nucleotide at chosen position. [Figure: example sequences acgtgctt, acgtgcgt, acctgcat, tcctggct, tcctgcat]

15 Infinite Allele Model. [Figure: genealogy of sequences 1-5]

16 Infinite Site Model. Final Aligned Data Set.

17 Labelling and unlabelling: positions and sequences. Ignoring mutation position; ignoring sequence label. The forward-backward argument: 9 coalescence events incompatible with data; 4 classes of mutation events incompatible with data. [Figure: genealogies of sequences 1-5, labelled and unlabelled]

18 Infinite Site Model: An example, Theta = 2.12. [Figure with node values 2, 3, 2, 3, 5, 5, 4, 9, 10, 5, 19, 14, 33]

19 Impossible Ancestral States

20 Finite Site Model. Final Aligned Data Set: acgtgctt, acgtgcgt, acctgcat, tcctgcat (three sites marked s in the figure).

21 Diploid Model with Recombination. An individual is made by: 1. Picking a random father. 2. Letting that father's chromosomes recombine to create the individual's paternal chromosome. Similarly for the maternal chromosome.

22 The Diploid Model Back in Time. A recombinant sequence will have two different ancestor sequences in the grandparent generation.

23 1-recombination histories I: Branch length change. [Figure: trees on sequences 1-4]

24 1-recombination histories II: Topology change. [Figure: trees on sequences 1-4]

25 1-recombination histories III: Same tree. [Figure: trees on sequences 1-4]

26 1-recombination histories IV: Coalescent time must be further back in time than recombination time. [Figure: sequences 1-4 with events marked c and r]

27 Recombination-Coalescence Illustration. Copied from Hudson 1991. Intensities (coalescence, recombination): (1, 2ρ); (3, 2ρ); (6, 2ρ); (3, (2+b)ρ); (1, (1+b)ρ); (0, bρ).

28 Age to oldest most recent common ancestor. From Wiuf and Hein, 1999, Genetics. [Figure: age to oldest most recent common ancestor against scaled recombination rate ρ, for sequence lengths from 0 kb to 250 kb]

29 Number of genetic ancestors to the Human Genome. $S_\rho$: number of segments, $E(S_\rho) = 1 + \rho$. Statements about the number of ancestors are much harder to make: simulations. [Figure: sequence against time, with recombination (R) and coalescence (C) events]

30 Applications to Human Genome (Wiuf and Hein, 97). Parameters used: $4N_e$: 20,000; Chromosome 1: 263 Mb, 263 cM. Chromosome 1: 52,000 segments, 6,800 ancestors. All chromosomes: 86,000 ancestors. Physical population: 1.3-5.0 million. A randomly picked ancestor (ancestral material comes in batteries!): [Figure: chromosome 1 at three zoom levels, 260 Mb, 7.5 Mb (x35) and 30 kb (x250), with segments numbered 0 to 52,000 and positions 6890 and 8360 marked]
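The segment count for chromosome 1 is roughly what $E(S_\rho) = 1 + \rho$ gives for these parameters. The sketch below assumes that ρ is $4N_e$ times the map length in Morgans (263 cM = 2.63 Morgans); that definition and the function name are assumptions, not stated on the slide.

```python
def expected_segments(four_ne, map_length_morgans):
    """E(S_rho) = 1 + rho, assuming rho = 4*N_e * (map length in Morgans)."""
    rho = four_ne * map_length_morgans
    return 1 + rho

# Chromosome 1 with the slide's parameters: 4N_e = 20,000 and 263 cM.
segments = expected_segments(20_000, 2.63)
print(round(segments))
```

Under that assumption the result is on the order of 52,000 segments, in line with the figure quoted for chromosome 1.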

31 Ignoring recombination in phylogenetic analysis. General practice in analysis of viral evolution!!! Mimics decelerations/accelerations of evolutionary rates. No and infinite recombination imply a molecular clock. [Figure: tree of sequences 1-4 with recombination, and the tree obtained assuming no recombination]

32 Simulated Example

33 Genotype and Phenotype Covariation: Gene Mapping. Sampling genotypes and phenotypes. Decay of local dependency over time (Reich et al. (2001)). Genotype → Phenotype function: dominant/recessive, penetrance, spurious occurrence, heterogeneity. Phenotype: a set of characters; binary decision (0,1); quantitative character. Result: the mapping function.

