N-gene Coalescent Problems
Probability of the 1st success after waiting t, given a time constant, a ~ p, of success.
5/20/2015 Comp 790 – Continuous-Time Coalescence
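Review N-genes
The likelihood that k genes have distinct lineages (k different parents in the previous generation) is:
$$\frac{2N}{2N}\cdot\frac{2N-1}{2N}\cdots\frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right)$$
The 1st gene can choose its parent freely, but the next k − 1 must choose from the remainder. (Figure label: genes without a child.)
Manipulating a little:
$$\prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1}\frac{i}{2N} + O\!\left(\frac{1}{N^2}\right) \approx 1 - \binom{k}{2}\frac{1}{2N}$$
where, for large N, the terms of order 1/N^2 are negligible.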
Approx N-gene Coalescence
Approximate probability that k genes have different parents:
$$1 - \binom{k}{2}\frac{1}{2N}$$
The probability that two or more have a common parent:
$$\binom{k}{2}\frac{1}{2N}$$
Repeated distinct lineages for j generations lead to a geometric distribution for the number of generations, $T_k$, until the first coalescence, with
$$P(T_k = j) \approx \left(1-\binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$
Recall that the 2-gene case had a similar form, but with 1 in place of the combinatorial term. Here the combinatorial term accounts for all possible k-choose-2 pairs, which are treated independently.
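Impact of Approximation
The approximation is not “proper” for all values of k < 2N: once $\binom{k}{2} > 2N$, the approximate coalescence probability $\binom{k}{2}/2N$ exceeds 1.
Consider the following values of N:
[Table not recovered from the source; column headings: N, k]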
Fix N and Vary k
Comparing the actual probability to the approximation.
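The comparison on this slide can be reproduced numerically. The sketch below is illustrative only: the function names `exact_distinct` and `approx_distinct` and the choice 2N = 100 are assumptions, not taken from the slides. It computes the exact probability that k genes have k distinct parents alongside the first-order approximation 1 − C(k,2)/(2N).

```python
from math import comb

def exact_distinct(k, two_n):
    """Exact probability that k genes pick k different parents among two_n."""
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / two_n
    return p

def approx_distinct(k, two_n):
    """First-order approximation 1 - C(k,2)/(2N); it can go negative for large k."""
    return 1.0 - comb(k, 2) / two_n

if __name__ == "__main__":
    two_n = 100  # assumed population size, chosen only for illustration
    for k in (2, 3, 5, 10, 15, 20):
        print(k, round(exact_distinct(k, two_n), 4), round(approx_distinct(k, two_n), 4))
```

For small k the two columns agree closely; as k grows toward the point where C(k,2) exceeds 2N, the approximation drops below zero and stops being a probability.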
Concrete Example
In a population of 2N = 10, the probability that 3 genes have one ancestor in the previous generation is:
$$\frac{10}{10}\cdot\frac{1}{10}\cdot\frac{1}{10} = 0.01$$
(The 1st gene can choose its parent freely, while the next 2 must choose the same one.)
The probability that all 3 have a different ancestor is:
$$\frac{10}{10}\cdot\frac{9}{10}\cdot\frac{8}{10} = 0.72$$
(The 1st gene can choose its parent from the 10, while the next 2 must choose from the remainder.)
The remaining probability, 1 − 0.01 − 0.72 = 0.27, is that the 3 genes have two parents in the previous generation.
Example Continued
The probability that 2 or more genes have a common parent in the previous generation is the probability that exactly 2 share a parent plus the probability that all 3 share a parent:
$$0.27 + 0.01 = 0.28$$
By our approximation, the probability that two or more genes share a common parent is:
$$\binom{3}{2}\frac{1}{10} = 0.30$$
The error in the approximation for k = 3, 2N = 10 is therefore 0.02.
This leads to an MRCA estimate of [value missing in source].
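The arithmetic of this example can be checked with exact fractions. This is a minimal sketch; the variable names are illustrative assumptions, and only the setup (2N = 10, k = 3) comes from the slides.

```python
from fractions import Fraction
from math import comb

two_n, k = 10, 3

# All three genes choose the same parent: the first is free, the next two must match it.
p_one = Fraction(1, two_n) ** 2
# All three genes choose different parents.
p_three = Fraction((two_n - 1) * (two_n - 2), two_n ** 2)
# Exactly two distinct parents is whatever probability remains.
p_two = 1 - p_one - p_three

# Exact probability that at least two genes share a parent, and its approximation.
p_coalesce_exact = p_one + p_two
p_coalesce_approx = Fraction(comb(k, 2), two_n)
error = p_coalesce_approx - p_coalesce_exact

print(p_one, p_three, p_two, p_coalesce_exact, p_coalesce_approx, error)
```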
For Large N and Small k
For 2N > 100, the agreement improves, so long as k << 2N.
The advantage of the approximation is that it fits the “form” of a geometric distribution, and thus can be generalized to a continuous-time model.
Continuous-time Coalescent
In the Wright-Fisher model, time is measured in discrete units, generations. A continuous-time approximation is conceptually more useful and, via the given approximation, computationally simple.
Moreover, a continuous model can be constructed that is independent of the population size (2N), so long as our sample size, k, is much smaller (one of those rare cases where a small sample size simplifies matters).
The only time we will need to consider population size (2N) is when we want to convert from time back into generations.
Continuous-time Derivation
As before, let $t = j/2N$, where j is now time measured in generations.
It follows that j = 2Nt translates continuous time, t, back into generations, j. In practice floor(2Nt) is used to assign a discrete generation number.
The waiting time, $T_k$, for k genes to have k − 1 or fewer ancestors is exponentially distributed, $T_k \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, derived from t = j/2N, M = 2N and
$$\left(1 - \binom{k}{2}\frac{1}{M}\right)^{Mt} \to e^{-\binom{k}{2}t} \quad \text{as } M \to \infty$$
Giving:
$$P(T_k \ge t) = e^{-\binom{k}{2}t}$$
the probability that the k genes will first have k − 1 or fewer ancestors at some time greater than or equal to t.
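The limit behind this derivation can be checked numerically. The sketch below is an illustration under assumed names (`geometric_tail`, `exponential_tail`) and assumed values (k = 4, t = 0.25). It compares the discrete probability of no coalescence in the first floor(2Nt) generations with the exponential tail exp(−C(k,2)t) as 2N grows.

```python
from math import comb, exp, floor

def geometric_tail(k, two_n, t):
    """P(no coalescence in the first floor(2N t) generations), discrete approximation."""
    rate = comb(k, 2)
    j = floor(two_n * t)  # convert continuous time back to a generation count
    return (1.0 - rate / two_n) ** j

def exponential_tail(k, t):
    """exp(-C(k,2) t), the continuous-time limit of the geometric tail."""
    return exp(-comb(k, 2) * t)

if __name__ == "__main__":
    k, t = 4, 0.25  # assumed values for illustration
    for two_n in (20, 200, 2000, 20000):
        print(two_n, geometric_tail(k, two_n, t), exponential_tail(k, t))
```

The discrete value approaches the exponential one as 2N increases, which is why the population size drops out of the continuous-time model when k is much smaller than 2N.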
Visualization
[Figure: plots of the waiting-time distribution for k = 3, 4, 5, 6, labeled k=3, k=4, k=5, k=6]
Continuous Coalescent Time Scale
In the continuous-time model, the time constant is a measure of ancestral population size, with the original at time 0, ½ the original at time 0.5, and ¼ at time 1.0.
[Figure: time axis t marked at 0, N, 2N, and 2.6N; axis label "Population size"]
A Coalescent Model
The continuous coalescent lends itself to generative models. The following algorithm constructs a plausible genealogy for n genes:
1. Start with k = n genes.
2. Simulate the waiting time, $T_k \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, to the next event.
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs.
4. Merge i and j into one gene and decrease the sample size by one, k ← k − 1.
5. Repeat from step 2 while k > 1.
This model runs backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture.
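The five steps above can be sketched in Python roughly as follows. This is one possible rendering, not the course's reference implementation: the function name `simulate_coalescent`, the integer labelling of lineages, and the event tuple layout are all assumptions made for illustration.

```python
import random
from math import comb

def simulate_coalescent(n, rng=None):
    """Simulate one genealogy for n genes; return (merge events, tree height).

    Each event is (time, i, j, new_label): lineages i and j merge at `time`
    (in units of 2N generations) into an ancestor labelled new_label.
    """
    rng = rng or random.Random()
    lineages = list(range(n))   # step 1: start with k = n lineages, labelled 0..n-1
    next_label = n
    t = 0.0
    events = []
    while len(lineages) > 1:                  # step 5: repeat while k > 1
        k = len(lineages)
        t += rng.expovariate(comb(k, 2))      # step 2: waiting time ~ Exp(C(k,2))
        i, j = rng.sample(lineages, 2)        # step 3: uniformly chosen pair
        lineages.remove(i)                    # step 4: merge the pair, so k drops by one
        lineages.remove(j)
        lineages.append(next_label)
        events.append((t, i, j, next_label))
        next_label += 1
    return events, t

if __name__ == "__main__":
    events, height = simulate_coalescent(5, random.Random(1))
    for event in events:
        print(event)
    print("height:", height)
```

Times are in coalescent units; multiplying by 2N (and taking the floor) converts them back to generations.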
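Properties of a Coalescent Tree
The height, $H_n$, of the tree is the sum of the time epochs, $T_j$, during which there are j = n, n − 1, n − 2, …, 2 ancestors:
$$H_n = \sum_{j=2}^{n} T_j$$
The distribution of $H_n$ amounts to a convolution of the exponential variables, whose result is:
$$P(H_n \le t) = \sum_{k=1}^{n} e^{-\binom{k}{2}t}\,\frac{(2k-1)(-1)^{k-1}F_k(n)}{G_k(n)}$$
where
$$F_k(n) = n(n-1)\cdots(n-k+1)$$
with
$$G_k(n) = n(n+1)\cdots(n+k-1)$$
Since $E(T_j) = 1/\binom{j}{2}$, the expected height is $E(H_n) = \sum_{j=2}^{n} \frac{2}{j(j-1)} = 2\left(1-\frac{1}{n}\right)$. As n → ∞, E(H_n) → 2, and, if n = 2, E(H_2) = 1. Thus, the waiting time for n genes to find their common ancestor is less than twice the time for 2!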
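The claim about the expected height can be verified with exact arithmetic. The sketch below assumes only that each epoch T_j is exponential with rate C(j,2), so that E(T_j) = 1/C(j,2); the function name `expected_height` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def expected_height(n):
    """E(H_n) as the sum of E(T_j) = 1 / C(j,2) for j = 2..n (units of 2N generations)."""
    return sum(Fraction(1, comb(j, 2)) for j in range(2, n + 1))

if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        # The sum is compared with the closed form 2(1 - 1/n).
        print(n, expected_height(n), 2 * (1 - Fraction(1, n)))
```

The sum stays below 2 for every n, which is the sense in which n genes take less than twice as long as 2 genes to find their common ancestor.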