
1 Combinatorics & the Coalescent (26.2.02) Tree Counting & Tree Properties. Basic Combinatorics. Allele distribution. Polya Urns + Stirling Numbers. Number of ancestral lineages after time t. Inclusion-Exclusion Principle.

2 A set of realisations (from Felsenstein)

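3 Binomial Numbers. $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ counts the ways of choosing k of n items. Binomial Expansion: $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$. Special Cases: $x = y = 1$ gives $\sum_{k} \binom{n}{k} = 2^n$; $x = -1$, $y = 1$ gives $\sum_{k} (-1)^k \binom{n}{k} = 0$ for $n \ge 1$.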

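4 Recursion: $\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}$. Initialisation: $\binom{n}{0} = \binom{n}{n} = 1$.
(Figure: item n is either among the r chosen, leaving r-1 to choose among the other n-1, or it is not, leaving r to choose among the other n-1.)
Rows of the triangle, k = 0, 1, ..., n from left to right, with row sums:
n=0: 1 | sum 1
n=1: 1 1 | sum 2
n=2: 1 2 1 | sum 4
n=3: 1 3 3 1 | sum 8
n=4: 1 4 6 4 1 | sum 16
n=5: 1 5 10 10 5 1 | sum 32
n=6: 1 6 15 20 15 6 1 | sum 64
n=7: 1 7 21 35 35 21 7 1 | sum 128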
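The recursion on this slide can be sketched in a few lines of Python. The function name `pascal_rows` is purely illustrative; the code only assumes the recursion and the initialisation $\binom{n}{0} = \binom{n}{n} = 1$.

```python
def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle via C(n,r) = C(n-1,r-1) + C(n-1,r)."""
    rows = [[1]]  # row n = 0
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # Border entries are 1; interior entries come from the recursion.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for n, row in enumerate(pascal_rows(7)):
        print(n, row, sum(row))
```

Each row sum doubles from the previous one, which is the special case x = y = 1 of the binomial expansion.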

5 The Exponential Distribution on $R^+$, Expo(a). Density: $f(t) = a e^{-at}$, $P(X > t) = e^{-at}$. Properties, for $X \sim \mathrm{Exp}(a)$ and $Y \sim \mathrm{Exp}(b)$ independent:
i. $P(X > t_2 \mid X > t_1) = P(X > t_2 - t_1)$ for $t_2 > t_1$.
ii. $E(X) = 1/a$.
iii. $P(X < Y) = a/(a + b)$.
iv. $\min(X, Y) \sim \mathrm{Exp}(a + b)$.
v. The sum of k iid $X_i$ is $\Gamma(k, a)$ distributed.
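Properties ii to iv can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original slides; the function name, sample size and seed are arbitrary choices made for illustration.

```python
import random


def check_exponential_properties(a, b, n_samples=100_000, seed=1):
    """Estimate E(X), P(X < Y) and E(min(X, Y)) for X ~ Exp(a), Y ~ Exp(b)."""
    rng = random.Random(seed)
    xs = [rng.expovariate(a) for _ in range(n_samples)]
    ys = [rng.expovariate(b) for _ in range(n_samples)]
    mean_x = sum(xs) / n_samples
    frac_x_first = sum(x < y for x, y in zip(xs, ys)) / n_samples
    mean_min = sum(min(x, y) for x, y in zip(xs, ys)) / n_samples
    return mean_x, frac_x_first, mean_min


if __name__ == "__main__":
    # Theory for a = 1, b = 3: E(X) = 1, P(X < Y) = 1/4, E(min) = 1/(a + b) = 1/4.
    print(check_exponential_properties(1.0, 3.0))
```

The estimates should land close to 1/a, a/(a+b) and 1/(a+b), up to sampling noise.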

6 The Standard Coalescent. Two independent processes. Continuous: exponential waiting times. Discrete: choosing pairs to coalesce. (Figure: a tree on leaves 1-5, with the waiting periods on one side and the coalescing events 4--5, 3--(4,5), 1--2 and (1,2)--(3,(4,5)) on the other; partitions shown: {1}{2}{3}{4}{5}, {1,2}{3}{4,5}, {1}{2}{3,4,5}, {1,2}{3,4,5}, {1,2,3,4,5}.)
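The two processes can be combined into a short simulation. This is a sketch under the usual assumption, not stated on the slide, that with k lineages the waiting time is exponential with rate k(k-1)/2 (each pair coalesces at rate 1) and that the coalescing pair is chosen uniformly. The function name `simulate_coalescent` is hypothetical.

```python
import random


def simulate_coalescent(n, seed=None):
    """Simulate one coalescent tree on leaves 1..n.

    Returns a list of (time, merged_block) events, oldest event last.
    Assumes rate k(k-1)/2 while k lineages remain (time scale is an assumption).
    """
    rng = random.Random(seed)
    lineages = [frozenset([i]) for i in range(1, n + 1)]
    events = []
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Continuous process: exponential waiting time.
        time += rng.expovariate(k * (k - 1) / 2)
        # Discrete process: choose a uniformly random pair to coalesce.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [b for idx, b in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((time, merged))
    return events


if __name__ == "__main__":
    for t, block in simulate_coalescent(5, seed=0):
        print(round(t, 3), sorted(block))
```

A sample of n leaves always produces n-1 events, and the last merged block contains all leaves.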

7 Tree Counting. Tree: connected undirected graph without cycles, with k nodes (vertices) & k-1 edges. Nodes with one edge are leaves (tips); the rest are internal. (Figure: rooted tree with leaves s1-s6, internal nodes a1-a4 and root r.) Ignoring the root & branch lengths gives the unrooted tree topology. If the age ordering of the internal nodes is retained, this gives the coalescent topology. Labels of internal nodes are permutable without change of biological interpretation. If labels at leaves are ignored, we have the shape of a tree. Most biological trees are bifurcating: internal nodes have valency 3 (number of edges touching the node) if the tree is made unrooted. Such unrooted trees with n leaves have n-2 internal nodes & 2n-3 edges.

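8 Counting by Bijection. Bijection to a decision series: starting at level 0, there are $k_1$ choices (1, 2, 3, ..., $k_1$) at level 1, $k_2$ choices at level 2, and so on down to level L, so the number of outcomes is $N = k_1 k_2 \cdots k_L$.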

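9 Trees: Rooted, bifurcating & nodes time-ranked. (Figure: among k lineages a pair (i,j) coalesces, leaving k-1 lineages; then a pair (n,m) coalesces, leaving k-2, and so on.) Recursion: $T_k = \binom{k}{2} T_{k-1}$. Initialisation: $T_1 = T_2 = 1$.
k:   3, 4, 5, 6, 7, 8, 9, 10, 15, 20
T_k: 3, 18, 180, 2700, $5.7 \times 10^{4}$, $1.5 \times 10^{6}$, $5.7 \times 10^{7}$, $2.5 \times 10^{9}$, $6.9 \times 10^{18}$, $5.6 \times 10^{29}$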
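The tabulated counts can be reproduced with a short sketch. It assumes the recursion $T_k = \binom{k}{2} T_{k-1}$ (any of the $\binom{k}{2}$ pairs may coalesce first), which matches the values 3, 18, 180, 2700 in the table; the function name is illustrative only.

```python
from math import comb, factorial


def ranked_trees(k):
    """Number of rooted, bifurcating, time-ranked trees on k labelled leaves."""
    t = 1  # T_1 = T_2 = 1
    for m in range(3, k + 1):
        t *= comb(m, 2)  # T_m = C(m, 2) * T_{m-1}
    return t


if __name__ == "__main__":
    for k in (3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
        # Closed form of the same product: k! (k-1)! / 2^(k-1).
        assert ranked_trees(k) == factorial(k) * factorial(k - 1) // 2 ** (k - 1)
        print(k, ranked_trees(k))
```

Unrolling the recursion gives the product of $\binom{m}{2}$ for m = 2..k, which simplifies to $k!\,(k-1)!/2^{k-1}$.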

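10 Trees: Unrooted & valency 3. (Figure: each tree on n leaves is obtained by attaching leaf n to one of the 2n-5 edges of a tree on n-1 leaves, shown for n = 4 and n = 5.) Recursion: $T_n = (2n-5)\, T_{n-1}$. Initialisation: $T_1 = T_2 = T_3 = 1$.
n:   4, 5, 6, 7, 8, 9, 10, 15, 20
T_n: 3, 15, 105, 945, 10395, $1.4 \times 10^{5}$, $2.0 \times 10^{6}$, $7.9 \times 10^{12}$, $2.2 \times 10^{20}$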
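The recursion $T_n = (2n-5)\,T_{n-1}$ is easy to evaluate directly. A minimal sketch with an illustrative function name; for n = 8 it gives 3 × 5 × 7 × 9 × 11 = 10395.

```python
def unrooted_trees(n):
    """Number of unrooted bifurcating (valency 3) trees on n labelled leaves."""
    t = 1  # T_1 = T_2 = T_3 = 1
    for m in range(4, n + 1):
        t *= 2 * m - 5  # leaf m can be attached to any of the 2m-5 edges
    return t


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8, 9, 10, 15, 20):
        print(n, unrooted_trees(n))
```

The result is the product of the odd numbers 3, 5, ..., 2n-5.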

11 Coalescent versus unrooted tree topologies. 4 leaves: 3 unrooted trees & 18 coalescent topologies. 1 unrooted tree topology contains 6 coalescent topologies.

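12 External versus Internal Branches. Inner & outer branches, Fu & Li (1993). E(total external length) = 2. E(total internal length) = $2\sum_{i=1}^{n-1} 1/i - 2$. (Figure: tree with red external branches, the others internal, next to the aligned sequences ACTTGTACGA, TCTTATACGA, ACTTATACGA.) Except for the green branch, external/internal corresponds to singlet/non-singlet segregating sites if only one mutation can happen per position. Let $l_{n,i}$ be the length of the i'th external branch in an n-tree. Obviously E(total external length) = $n\,E(l_{n,i})$ (any i). With $t_n$ the time during which there are n lineages: $l_{n,i} = l_{n-1,j} + t_n$ with probability $1 - 2/n$, and $l_{n,i} = t_n$ with probability $2/n$.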
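The recursion for a single external branch can be evaluated exactly. This sketch assumes the time scale in which a pair of lineages coalesces at rate 1, so that $E(t_n) = 1/\binom{n}{2}$; that scaling is an assumption here, chosen because it reproduces the slide's total external length of 2. The function name is hypothetical.

```python
from fractions import Fraction


def expected_external_branch(n):
    """E(l_n) from E(l_n) = E(t_n) + (1 - 2/n) E(l_{n-1}), E(t_n) = 2/(n(n-1))."""
    # n = 2: both branches are external and have length t_2, with E(t_2) = 1.
    e = Fraction(1)
    for m in range(3, n + 1):
        e = Fraction(2, m * (m - 1)) + (1 - Fraction(2, m)) * e
    return e


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        e = expected_external_branch(n)
        print(n, e, n * e)  # n * E(l_n) is the expected total external length
```

Under these assumptions the recursion gives $E(l_{n,i}) = 2/n$, so the expected total external length is 2 for every n.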

13 Probability of hanging Sub-trees. Kingman (1982b). (Figure: leaves 1, 2, 3, 4, ..., n at t=0 and ancestors 1, 2, ..., k at t=$t_1$.) For a coalescent with n leaves at time 0 and k ancestors at time $t_1$, consider the groups of leaves of the k sub-trees hanging from time $t_1$, and let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the numbers of leaves of these sub-trees. Example: n=8, k=3. Classes observed: 4, 3, 1. The basal division splits the leaves into sets of sizes (k, n-k) with probability 1/(n-1).
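The slide's formula did not survive extraction. The sketch below assumes the standard form of Kingman's result, namely that a given partition of the n leaves into k groups of sizes $\lambda_1, \ldots, \lambda_k$ has probability $\frac{(n-k)!\,k!\,(k-1)!}{n!\,(n-1)!}\,\lambda_1! \cdots \lambda_k!$. Both the formula and the function names are assumptions, not taken from the slide. The second function checks the basal split (k = 2).

```python
from fractions import Fraction
from math import comb, factorial


def partition_probability(sizes):
    """Probability of one specific leaf partition with the given group sizes.

    Assumes Kingman's formula (n-k)! k! (k-1)! / (n! (n-1)!) * prod(size!).
    """
    n, k = sum(sizes), len(sizes)
    p = Fraction(factorial(n - k) * factorial(k) * factorial(k - 1),
                 factorial(n) * factorial(n - 1))
    for s in sizes:
        p *= factorial(s)
    return p


def basal_split_probability(n, m):
    """Probability that the two basal sub-trees have m and n - m leaves (unordered)."""
    # Number of ways to split n labelled leaves into blocks of sizes m and n - m.
    count = comb(n, m) if 2 * m != n else comb(n, m) // 2
    return count * partition_probability([m, n - m])


if __name__ == "__main__":
    print(partition_probability([4, 3, 1]))  # the n=8, k=3 example
    print([basal_split_probability(8, m) for m in range(1, 5)])
```

For an unordered split {m, n-m} with m different from n-m the value is 2/(n-1), which is 1/(n-1) for each of the two orderings, in agreement with the statement on the slide.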

14 Nested subsamples (Saunders et al.(1986) Adv.Appl.Prob.16.471-91.) (Figure: population of size 2N with a sample and a sub-sample at t=0, labelled i and j, and i' and j' at t=$t_1$.) States (i,j) with $1 \le j \le i$, drawn as a triangular array from (1,1) to (9,9). Transitions: (i,j) → (i-1,j) or (i,j) → (i-1,j-1).

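15 Nested subsamples (Saunders et al.(1986) Adv.Appl.Prob.16.471-91.) For a sub-sample of size j within a sample of size i: Pr{MRCA(sub-sample) = MRCA(sample)} = $\frac{(j-1)(i+1)}{(j+1)(i-1)}$. Pr{MRCA(sub-sample) = MRCA(population)} = $\frac{j-1}{j+1}$.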

16 Age of a Mutation. Wiuf & Donnelly (1999), Wiuf (2000), Matthews (2000). The probability that there are k differences between two sequences: going back in time, 2 kinds of events can occur, a mutation (rate $\theta$) or a coalescence (rate 1). This gives a geometric distribution: $P(k) = \left(\frac{\theta}{1+\theta}\right)^{k} \frac{1}{1+\theta}$. (Figure: two lineages with mutations marked *, waiting times Exp(1) for the coalescence and Exp($\theta$) for mutations.)

17 Polya Urns & Infinite Allele Model (Donnelly, 1986 + Hoppe, 1984+87). The only observation made in the infinite allele model is identity/non-identity among all pairs of alleles, i.e. the central observation is a series of classes and their sizes. What is the next event: a duplication of an existing type or the introduction of a "new" allele? This model gives rise to distributions on partitions of {1,2,..,n} like {1,4,7}{2,3}{5}{6}. Since the labelling is arbitrary, only the information about the sizes of these groups is essential, for instance represented as $1^2 2^1 3^1$ (two groups of size 1, one of size 2, one of size 3). The expected number of mutations in a unit interval (2N) is $\theta$.

18 Classical Polya Urns. Feller I. 213. Let $X_0$ be the initial configuration of the urn. A step: take a random ball from the urn and put it back together with an extra ball of the same colour. Let $X_k$ be the content after the k'th step and $Y_k$ the colour of the k'th picked ball. i. $P\{Y_k = j\} = P\{Y_1 = j\}$. ii. Sequences $Y_1, \ldots, Y_k$ resulting in the same $X_k$ have the same probability.

19 Labelling, Polya Urns & Age of Alleles (Donnelly, 1986 + Hoppe, 1984+87). An urn: one ball of weight $\theta$ and ordinary balls of weight 1. A ball is picked proportionally to its weight. If the initial $\theta$-size ball is picked, it is replaced together with a ball of a completely new type. If an ordinary ball is picked, it is replaced together with a copy of itself. There is a simple relationship between the distribution of "the alleles labelled by age ranking" and that of "the alleles labelled by size ranking". (Figure: the same alleles ordered as they come, by size and by age.)
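The urn scheme described here can be simulated directly. A minimal sketch, with a hypothetical function name and the assumption $\theta > 0$; it records only the class sizes, in order of first appearance (oldest type first).

```python
import random


def hoppe_urn(n, theta, seed=None):
    """Draw n ordinary balls from a Hoppe urn; return class sizes, oldest first."""
    rng = random.Random(seed)
    class_sizes = []
    for drawn in range(n):
        # Total weight: theta for the special ball plus 1 per ordinary ball.
        u = rng.random() * (theta + drawn)
        if u < theta:
            class_sizes.append(1)  # special ball picked: a completely new type
            continue
        u -= theta
        for idx, size in enumerate(class_sizes):
            if u < size:
                class_sizes[idx] += 1  # ordinary ball picked: copy its type
                break
            u -= size
        else:
            class_sizes[-1] += 1  # guard against floating-point round-off
    return class_sizes


if __name__ == "__main__":
    print(hoppe_urn(20, theta=1.0, seed=42))
```

The first draw always creates a new type, because the special ball is the only ball in the urn at that point.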

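20 Ewens' formula. (1972 TPB 3.87-112) $P_n(a_1, a_2, \ldots, a_n) = \frac{n!}{\theta^{(n)}} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!}$, where $a_j$ is the number of alleles seen in j copies and $\theta^{(n)} = \theta(\theta+1)\cdots(\theta+n-1)$. k, the number of types, is a minimal sufficient statistic for $\theta$: the probability of the data conditioned on k is $\theta$-less and there is no simpler such statistic. $P_n(a_1, a_2, \ldots, a_n; k) = \frac{n!}{|S_n^{(k)}| \prod_{j} j^{a_j}\, a_j!}$, where $|S_n^{(k)}|$ is the unsigned Stirling number of the first kind. $E_n(k \text{ types}) = \sum_{j=0}^{n-1} \frac{\theta}{\theta + j}$. $P_5(2,0,1,0,0)$ is the probability of seeing 2 singles and one allele in 3 copies in a sample of 5. Obviously, $a_1 + 2a_2 + \cdots + i a_i + \cdots + n a_n = n$.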
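A small exact calculator can make the sampling formula concrete. The sketch assumes the standard form of Ewens' formula, $P_n(a) = \frac{n!}{\theta^{(n)}} \prod_j \frac{(\theta/j)^{a_j}}{a_j!}$ with $\theta^{(n)}$ the rising factorial, and $E_n(k) = \sum_{j=0}^{n-1} \theta/(\theta+j)$. The function names are illustrative, and the value $\theta = 1$ in the usage lines is an arbitrary choice, not the slide's.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(x, n):
    """x (x+1) ... (x+n-1)."""
    result = Fraction(1)
    for j in range(n):
        result *= x + j
    return result


def ewens_probability(a, theta):
    """Ewens sampling probability of the configuration a = (a_1, ..., a_n)."""
    theta = Fraction(theta)
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    p = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a_j in enumerate(a, start=1):
        p *= (theta / j) ** a_j / factorial(a_j)
    return p


def expected_types(n, theta):
    """Expected number of distinct types in a sample of size n."""
    theta = Fraction(theta)
    return sum(theta / (theta + j) for j in range(n))


if __name__ == "__main__":
    print(ewens_probability((2, 0, 1, 0, 0), 1))  # 2 singles, one allele in 3 copies
    print(expected_types(5, 1))
```

Summing `ewens_probability` over all configurations with $a_1 + 2a_2 + \cdots + n a_n = n$ gives 1 for any positive $\theta$, which is a useful sanity check.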

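21 Stirling Numbers. Partitioning n items into k sets: Stirling Numbers (of the second kind), $S_{n,k}$; k unlabelled bins, all non-empty. Bell Numbers, $B_n$: partitioning into any number of sets. Obviously: $B_n = \sum_{k=1}^{n} S_{n,k}$.
Rows of $S_{n,k}$, k = 1, ..., n from left to right, with $B_n$:
n=1: 1 | B = 1
n=2: 1 1 | B = 2
n=3: 1 3 1 | B = 5
n=4: 1 7 6 1 | B = 15
n=5: 1 15 25 10 1 | B = 52
n=6: 1 31 90 65 15 1 | B = 203
n=7: 1 63 301 350 140 21 1 | B = 877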

22 Stirling Numbers. Basic Recursion: $S_{n,k} = k\,S_{n-1,k} + S_{n-1,k-1}$. Initialisation: $S_{n,1} = S_{n,n} = 1$. Item n is either added to one of the k classes of a partition of n-1 items into k classes (k ways), or it forms a new class {n} next to a partition of n-1 items into k-1 classes.
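The basic recursion translates directly into code. A short sketch with illustrative function names; the Bell numbers are obtained as row sums.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Bell number: partitions of n items into any number of sets."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, [stirling2(n, k) for k in range(1, n + 1)], bell(n))
```

Memoising with `lru_cache` keeps the recursion linear in the number of table entries rather than exponential.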

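23 Ewens' formula - example. (1972 TPB 3.87-112) Assume (2,0,1,0,0) has been observed and that 0.5 mutation is expected per unit (2N) time. $P_5(2,0,1,0,0) = \frac{5!}{\theta^{(5)}} \cdot \frac{\theta^2}{2!} \cdot \frac{\theta}{3} = \frac{20\,\theta^3}{\theta(\theta+1)(\theta+2)(\theta+3)(\theta+4)}$. $P_5(2,0,1,0,0;3) = \frac{5!}{35 \cdot 2! \cdot 3} = \frac{4}{7}$, where 35 is the number of permutations of 5 items with 3 cycles. $E_5(k \text{ types}) = \sum_{j=0}^{4} \frac{\theta}{\theta + j}$.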

24 Ancestors to Ancestors. Griffiths (1980), Tavaré (1984). $h_{i,j}(t)$ = probability that i individuals have j ancestors after time t. Notation: $i_{[k]} = i(i-1)\cdots(i-k+1)$, $i_{(k)} = i(i+1)\cdots(i+k-1)$. Example: disappearance of 7 lineages.
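The closed form itself is not preserved in the transcript. The sketch below assumes the expression usually attributed to Tavaré (1984), $h_{i,j}(t) = \sum_{k=j}^{i} e^{-k(k-1)t/2}\, \frac{(2k-1)(-1)^{k-j}\, j_{(k-1)}\, i_{[k]}}{j!\,(k-j)!\, i_{(k)}}$, written with the falling and rising factorials defined on this slide. Treat both the formula and the function names as assumptions.

```python
from math import exp, factorial


def rising(x, k):
    """x (x+1) ... (x+k-1), the slide's i_(k)."""
    r = 1
    for m in range(k):
        r *= x + m
    return r


def falling(x, k):
    """x (x-1) ... (x-k+1), the slide's i_[k]."""
    r = 1
    for m in range(k):
        r *= x - m
    return r


def h(i, j, t):
    """Probability that i lineages have j ancestors after time t (assumed form)."""
    total = 0.0
    for k in range(j, i + 1):
        sign = -1 if (k - j) % 2 else 1
        total += (exp(-k * (k - 1) * t / 2) * (2 * k - 1) * sign
                  * rising(j, k - 1) * falling(i, k)
                  / (factorial(j) * factorial(k - j) * rising(i, k)))
    return total


if __name__ == "__main__":
    t = 0.5
    print([round(h(7, j, t), 4) for j in range(1, 8)])  # distribution for 7 lineages
```

For i = 3, j = 2 the sum reduces to $\frac{3}{2}(e^{-t} - e^{-3t})$, and for each i the values over j = 1..i sum to 1.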

25 Y: number of ancestors at time t. 3 methods of solution:
i. Sum of different independent exponential distributions.
ii. Distribution in a Markov chain (states i, i-1, ..., j+1, j, j-1, ..., 1).
iii. Combination of known probabilities:
a. Probability that i alleles have i or fewer ancestors.
b. This probability is the same for all i-sets.
c. No coalescence within a set implies no coalescence within all its subsets.

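26 3 Ancestors to 2 Ancestors: $\frac{3}{2}(e^{-t} - e^{-3t})$.
(Venn diagram of the events for the pairs {1,2}, {1,3}, {2,3} and the triple {1,2,3}, with region probabilities $e^{-t}$, $e^{-3t}$ and ?.)
?: $(e^{-t} - e^{-3t})/2$.
Exactly one coalescence: $3\left(e^{-t} - (e^{-t} - e^{-3t})/2 - e^{-3t}\right) = \frac{3}{2}(e^{-t} - e^{-3t})$.
Jordan's Sieve, $A_1 - 2A_2 + 3A_3$, with terms $A_1$: $3e^{-t}$; $2A_2$: $2((e^{-t} + e^{-3t})/2)$; $3A_3$: $3e^{-3t}$.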

27 The exclusion-inclusion principle. Venn diagrams (regions labelled by the number of sets covering them). Two sets: {I+II} - {I} - {II} + {I&II} = 0. Three sets: {I+II+III} = {I} + {II} + {III} - ({I,II} + {I,III} + {II,III}) + {I,II,III}.

28 Exclusion-inclusion & Jordan's Sieve. $S_j$, j=1,..,r, are the given sets; $A_k$ is the sum of the sizes of the intersections of k of the sets. (Venn diagram of 4 sets with regions labelled by the number of sets covering them.)
Example, 4 sets:
in exactly 1 set: $A_1 - 2A_2 + 3A_3 - 4A_4$
in exactly 2 sets: $A_2 - 3A_3 + 6A_4$
in exactly 3 sets: $A_3 - 4A_4$
in exactly 4 sets: $A_4$
in some set: $A_1 - A_2 + A_3 - A_4$
Total number (exclusion-inclusion): $\sum_{k=1}^{r} (-1)^{k-1} A_k$. In exactly m sets (Jordan's Sieve): $\sum_{k=m}^{r} (-1)^{k-m} \binom{k}{m} A_k$.
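Both sieves can be checked on small explicit sets. In this sketch $A_k$ is computed by brute force over all k-subsets of the given sets; the function names and the example sets are made up for illustration.

```python
from itertools import combinations
from math import comb


def intersection_sums(sets):
    """[A_1, ..., A_r]: A_k sums |intersection| over all k-subsets of the sets."""
    r = len(sets)
    return [sum(len(set(c[0]).intersection(*c[1:])) for c in combinations(sets, k))
            for k in range(1, r + 1)]


def in_exactly(sets, m):
    """Jordan's sieve: number of elements lying in exactly m of the sets."""
    a = intersection_sums(sets)
    return sum((-1) ** (k - m) * comb(k, m) * a[k - 1]
               for k in range(m, len(sets) + 1))


def in_some(sets):
    """Exclusion-inclusion: number of elements lying in at least one set."""
    a = intersection_sums(sets)
    return sum((-1) ** (k - 1) * a[k - 1] for k in range(1, len(sets) + 1))


if __name__ == "__main__":
    example = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {4, 7}]
    print(intersection_sums(example))
    print([in_exactly(example, m) for m in range(1, 5)], in_some(example))
```

For four sets the coefficients $\binom{k}{m}$ reproduce the rows of the slide, for instance $A_2 - 3A_3 + 6A_4$ for m = 2.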

29 Surviving Lineages. Which probability statements can be made? Let s be a subset of {1,2,..,i} and S(s) be the event that no coalescence has happened within s. Additionally, if s' is a subset of s, then S(s) implies S(s'). (Figure: the subsets arranged by size and number: 1 subset of size i, {1,2,..,i}; i subsets of size i-1, {2,..,i}, {1,3,..,i}, ..., {1,2,..,i-1}; down to the subsets of size 2, {1,2}, ..., {i-1,i}, each surviving with probability $e^{-t}$.)

30 Surviving Lineages. There are r sets, and we want the events that are members of only one of them. In $A_k$, the summation is over all k-subsets of {1,..,r} and the intersection is between the k sets chosen.

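31 $P_k(t_1) = h_{i,k}(t_1)\, h_{k,j}(t - t_1) / h_{i,j}(t)$: the probability of k ancestors at time $t_1$, given i lineages at time 0 and j ancestors at time t. Example: 7 --> 4 lineages, with k = 7, 6, 5 or 4 at time $t_1$.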

32 Summary. Tree Counting & Tree Properties. Basic Combinatorics. Allele distribution. Polya Urns + Stirling Numbers. Number of ancestral lineages after time t. Inclusion-Exclusion Principle.

33 Recommended Literature
Bender (1974) Asymptotic Methods in Enumeration. SIAM Review vol16.4.485-
Donnelly (1986) Theor.Pop.Biol.
Ewens (1972) Theor.Pop.Biol.
Ewens (1989) "Population Genetics Theory - The Past and the Future"
Feller (1968+71) Probability Theory and its Applications I + II. Wiley
Fu & Li (1993) "Statistical Tests of Neutrality of Mutations" Genetics 133.693-709.
Griffiths (1980)
Griffiths & Tavaré (1998) "The Age of a mutation on a general coalescent tree"
Griffiths & Tavaré (1999) "The ages of mutations in gene trees"
Griffiths & Tavaré (2001) "The genealogy of a neutral mutation"
Hoppe (1984) "Polya-like urns and the Ewens' sampling formula" J.Math.Biol. 20.91-94
Kingman (1982) "On the Genealogy of Large Populations" 27-43.
Kingman (1982) "The Coalescent" Stochastic Processes and their Applications 13.235-248.
Kingman (1982)
Matthews, S. (1999) "Times on Trees, and the Age of an Allele" Theor.Pop.Biol. 58.61-75.
Möhle
Pitman
Schweinsberg
Simonsen & Churchill (1997)
Saunders et al. (1986) "On the genealogy of nested subsamples from a haploid population" Adv.Appl.Prob. 16.471-91.
Tajima (1983) Evolutionary Relationships of DNA Sequences in Finite Populations. Genetics 105.437-60.
Tavaré (1984) Line-of-Descent and Genealogical Processes, and Their Application in Population Genetics Models. Theor.Pop.Biol. 26.119-164.
Thompson, R. (1998) "Ages of mutations on a coalescent tree" Math.Bios. 153.41-61.
van Lint & Wilson (1991) A Course in Combinatorics. Cambridge
Wiuf (2000) On the Genealogy of a Sample of Neutral Rare Alleles. Theor.Pop.Biol. 58.61-75.
Wiuf & Donnelly (1999) Conditional Genealogies and the Age of a Mutant. Theor.Pop.Biol. 56.183-201.

