Presentation is loading. Please wait.

Presentation is loading. Please wait.

Coalescence with Mutations Towards incorporating greater realism Last time we discussed 2 idealized models – Infinite Alleles, Infinite Sites A realistic.

Similar presentations


Presentation on theme: "Coalescence with Mutations Towards incorporating greater realism Last time we discussed 2 idealized models – Infinite Alleles, Infinite Sites A realistic."— Presentation transcript:

1 Coalescence with Mutations Towards incorporating greater realism Last time we discussed 2 idealized models – Infinite Alleles, Infinite Sites A realistic model would also incorporate – Insertions, deletions, inversions Sequences are large, but not infinite – Mutations can occur at the same point but on different lineages 5/18/2015Comp 790– Coalescence with Mutations1

2 Finite Sites Model Jukes-Cantor model- all positions are equally likely to mutate, mutations to any of the 3 other nucleotides are equally probable Kimura model- accommodates that transitions (A T and C G) occur more frequently than transversions Positions evolve independently 5/18/2015Comp 790– Coalescence with Mutations2 ACCTGCAT ACGTGCAT ACGTGCTT TCCTGCAT ACGTGCAAACGTGCTA ACGTGCTT ACGTGCTA ACGTGCAA TCCTGCAT

3 Wright-Fisher with Mutations Each gene passed on to the next generation is subject to mutation with probability u (probability 1 - u it is copied without modification) Can accommodate any one of Infinite Alleles, Infinite Sites, or Finite Sites Overall structure is the same Working backwards from the present, the probability that a lineage experiences the first mutation j generations in the past is: 5/18/2015Comp 790– Coalescence with Mutations3 Population of 8 with 4 alleles; one group of 5, and 3 of 1

4 Return of the Basic Coalescent Formula is similar in form to the discrete coalescent T M denotes the number of generations until the first mutation event. It is a geometric variable with parameter u. If time is measured in units of 2N generations then: Where θ = 4Nu, the population mutation rate, can be interpreted as the expected number of mutations separating two samples 5/18/2015Comp 790– Coalescence with Mutations4

5 Probabilities of Events If we consider n disjoint lineages then the time to the first mutation along any line the distribution is exponential with parameter Wait for both mutation and coalescence events, then the parameter is the sum of the two parameters (consequence of min(U,V) ~ Exp(a+b)) Whether the 1 st event is a coalescent of mutation event is determined by a biased Bernoulli trials, with probabilities 5/18/2015Comp 790– Coalescence with Mutations5 for coalescence, and,, of mutation

6 Simulating Sequence Evolution Simulating a set of genes with mutations 5/18/2015Comp 790– Coalescence with Mutations6 1.Put k=n, where n is the sample size 2.Choose an exponential variable with parameter k(k- 1+θ)/2 3.With probability (k-1)/(k-1+θ) the event is a coalescent event and probability θ/(k-1+θ) it is a mutation event 4.If a coalescent event, choose a pair randomly to coalesce, set k  k-1 5.If a mutation event, choose a lineage to mutate 6.Continue until k is one

7 Coalescent in Python Straightforward translation into Python 5/18/2015Comp 790– Coalescence with Mutations7 T = [[i,0.0] for i in xrange(N)] # gene number and time of merge k = N theta = 4.0*N*0.1 t = 0.0 while k > 1: t += expovariate(0.5*k*(k-1+theta)) if (random() < theta/(k-1+theta)): i = randint(0,k-1) T[i] = [T[i], t] else: i = randint(0,k-1) j = randint(0,k-1) while i == j: j = randint(0,k-1) T[i] = [T[i], T[j], t] T.pop(j) k -= 1

8 An Alternate Algorithm Waiting time until a mutation along a lineage is an exponential distribution with parameter θ/2 (slide 4) Equivalent to distributing mutations along a t-length path with Poisson distribution having parameter tθ/2 With mean, tθ/2. Given the number of mutations on a branch, the times are random The number and times of mutations on each branch are independent Thus, coalescence can be computed first, then mutations can be inserted along each branch The placing mutations on branches is called a “Poisson process” 5/18/2015Comp 790– Coalescence with Mutations8

9 Poisson Algorithm A variant of the continuous-time coalescent with mutations 5/18/2015Comp 790– Coalescence with Mutations9 1.Simulate the genealogy of N sequences according to a coalescent process with rate (algorithm from lecture 4) 2.For each branch generate a random number, M t, using a Poisson distribution with parameter tθ/2, where t is the length of the branch 3.For each branch, the times of the M t mutation events are chosen at random

10 Benefits of New Algorithm Mutations can be added onto a genealogy in retrospect Tweak mutation and coalescent parameters independently Mutation does not effect the fundamental results of coalescence Mutations only impact the extant Alleles 5/18/2015Comp 790– Coalescence with Mutations 1.How long ago did N haplotypes diverge? 2.What mutation rate would explain the diversity seen? 1.How long ago did N haplotypes diverge? 2.What mutation rate would explain the diversity seen?

11 Book What’s ahead I will finish chapter 2 next Tuesday From there on, one of you will be responsible for the a chapter – Each chapter in 2 lectures (pick up the pace a bit), a Thursday followed by a Tuesday – You’ll have a weekend to prepare each lecture – I will do chapter 8 5/18/2015Comp 790– Continuous-Time Coalescence11


Download ppt "Coalescence with Mutations Towards incorporating greater realism Last time we discussed 2 idealized models – Infinite Alleles, Infinite Sites A realistic."

Similar presentations


Ads by Google