Phylogeny Tree Reconstruction

1 Phylogeny Tree Reconstruction

2 Final Exam
- 24-hour, take-home exam
- More straightforward questions than in the homeworks
- Please email Michael and Serafim by Friday with your preferred day to take the exam
- Exam starts Sunday, …, Thursday noon; ends Monday, ..., Friday noon

3 Number of labeled unrooted tree topologies
How many possibilities are there for leaf 4?

4 Number of labeled unrooted tree topologies
How many possibilities are there for leaf 4?
For the 4th leaf, there are 3 possibilities.

5 Number of labeled unrooted tree topologies
How many possibilities are there for leaf 5?
For the 5th leaf, there are 5 possibilities.

6 Number of labeled unrooted tree topologies
How many possibilities are there for leaf 6?
For the 6th leaf, there are 7 possibilities.

7 Number of labeled unrooted tree topologies
How many possibilities are there for leaf n?
For the nth leaf, there are 2n – 5 possibilities: an unrooted binary tree on n – 1 leaves has 2n – 5 edges, and the new leaf can attach to any one of them.

8 Number of labeled unrooted tree topologies
#unrooted trees for n taxa: $(2n-5)(2n-7)\cdots 3 \cdot 1 = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
#rooted trees for n taxa: $(2n-3)(2n-5)(2n-7)\cdots 3 = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
N = 10: #unrooted: 2,027,025; #rooted: 34,459,425
N = 30: #unrooted: 8.7 × 10^36; #rooted: 4.95 × 10^38
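
The counts above can be checked with a short sketch. The function names are illustrative (not from the slides), and the code assumes n ≥ 3 so that the starting point is the single unrooted tree on three leaves.

```python
def unrooted_topologies(n):
    """Number of labeled unrooted binary trees on n >= 3 leaves.

    Leaf k (for k = 4..n) can attach to any of the 2k - 5 edges of the
    tree built so far, which gives the product 3 * 5 * ... * (2n - 5).
    """
    if n < 3:
        raise ValueError("this sketch assumes at least 3 leaves")
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


def rooted_topologies(n):
    """Number of labeled rooted binary trees on n >= 3 leaves.

    A root can be placed on any of the 2n - 3 edges of an unrooted tree.
    """
    return unrooted_topologies(n) * (2 * n - 3)


if __name__ == "__main__":
    for n in (4, 5, 6, 10, 30):
        print(n, unrooted_topologies(n), rooted_topologies(n))
```

Python integers have arbitrary precision, so the N = 30 values are computed exactly rather than approximated.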

9 Search through tree topologies: Branch and Bound
Observation: adding an edge to an existing tree can only increase the parsimony cost
Enumerate all unrooted trees with at most N leaves: $[i_3][i_5][i_7]\ldots[i_{2N-5}]$, where each $i_k$ can take values from 0 (no edge) to k
At each point keep C = smallest cost so far for a complete tree
Start B&B with tree [1][0][0]……[0]
Whenever the cost of the current tree T is > C, then:
- T is not optimal
- Any tree extending T with more edges is not optimal: increment by 1 the rightmost nonzero counter
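
A minimal sketch of the branch-and-bound idea follows. It makes several assumptions that are not in the slides: trees are stored as nested tuples instead of the counter vector, the parsimony cost is computed with Fitch's algorithm, and all names (`fitch`, `insertions`, `branch_and_bound`) are hypothetical. Leaf 0 is kept directly under the root so that each unrooted topology is generated exactly once.

```python
def fitch(node, seqs):
    """Return (per-site state sets, parsimony cost) for a rooted subtree.

    A node is either a leaf index into `seqs` or a pair (left, right).
    """
    if not isinstance(node, tuple):
        return [{ch} for ch in seqs[node]], 0
    left_sets, left_cost = fitch(node[0], seqs)
    right_sets, right_cost = fitch(node[1], seqs)
    sets, cost = [], left_cost + right_cost
    for a, b in zip(left_sets, right_sets):
        if a & b:
            sets.append(a & b)
        else:                      # disjoint sets: one substitution needed
            sets.append(a | b)
            cost += 1
    return sets, cost


def insertions(sub, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `sub`."""
    yield (sub, leaf)              # attach on the edge above `sub`
    if isinstance(sub, tuple):
        left, right = sub
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(seqs, prune=True):
    """Return (best cost, best tree) over all unrooted topologies (n >= 3)."""
    n = len(seqs)
    best_cost, best_tree = float("inf"), None

    def extend(sub, next_leaf):
        nonlocal best_cost, best_tree
        cost = fitch((0, sub), seqs)[1]
        if prune and cost > best_cost:
            return                 # no extension of this partial tree can win
        if next_leaf == n:         # complete tree: update the bound C
            if cost < best_cost:
                best_cost, best_tree = cost, (0, sub)
            return
        for new_sub in insertions(sub, next_leaf):
            extend(new_sub, next_leaf + 1)

    extend((1, 2), 3)              # the unique tree on leaves 0, 1, 2
    return best_cost, best_tree


if __name__ == "__main__":
    print(branch_and_bound(["AAGT", "AAGC", "CAGT", "CTTC", "CTTT"]))
```

Passing `prune=False` turns the search into plain exhaustive enumeration, which is a convenient way to check that pruning does not change the optimal cost.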

10 Bootstrapping to get the best trees
Main outline of algorithm
1. Select random columns from a multiple alignment; one column can then appear several times
2. Build a phylogenetic tree based on the random sample from (1)
3. Repeat (1), (2) many (say, 1000) times
4. Output the tree that is constructed most frequently
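
The outline above can be sketched as follows. The tree builder is deliberately left as a caller-supplied function (`build_tree` is a hypothetical parameter, not something the slides define), and it is assumed to return a hashable description of the tree, such as a Newick string, so that identical trees can be counted.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Step 1: draw as many columns as the alignment has, with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in picks) for row in alignment]


def bootstrap_consensus(alignment, build_tree, replicates=1000, seed=0):
    """Steps 2-4: build a tree per resample and return the most frequent one.

    `build_tree` is any function mapping an alignment (list of equal-length
    strings) to a hashable tree description; it is an assumption of this
    sketch, not a specific library call.
    """
    rng = random.Random(seed)
    counts = Counter(
        build_tree(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return counts.most_common(1)[0]   # (tree, number of replicates)


if __name__ == "__main__":
    aln = ["ACGT", "AGGT", "ACTT"]
    print(resample_columns(aln, random.Random(1)))
```

Fixing the seed makes a bootstrap run reproducible, which helps when comparing tree-building methods on the same resamples.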

11 Probabilistic Methods
A more refined measure of evolution along a tree than parsimony
$P(x_1, x_2, x_{root} \mid t_1, t_2) = P(x_{root})\, P(x_1 \mid t_1, x_{root})\, P(x_2 \mid t_2, x_{root})$
If we use Jukes-Cantor, for example, and $x_1 = x_{root} = A$, $x_2 = C$, $t_1 = t_2 = 1$:
$= p_A \cdot \tfrac{1}{4}(1 + 3e^{-4\alpha}) \cdot \tfrac{1}{4}(1 - e^{-4\alpha}) = (\tfrac{1}{4})^3 (1 + 3e^{-4\alpha})(1 - e^{-4\alpha})$
[Figure: root $x_{root}$ with children $x_1$ and $x_2$ on branches of length $t_1$ and $t_2$]
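
The Jukes-Cantor example can be reproduced numerically. This is a small sketch with illustrative names (`jc_prob`, `cherry_joint`); it assumes a uniform root distribution (p_A = ¼) and the substitution probabilities used on the slide, ¼(1 + 3e^{-4αt}) for no change and ¼(1 – e^{-4αt}) for each specific change.

```python
import math


def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor probability of `child` given `parent` after time t."""
    decay = math.exp(-4.0 * alpha * t)
    if child == parent:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def cherry_joint(x1, x2, x_root, t1, t2, alpha=1.0):
    """P(x1, x2, x_root | t1, t2) for a root with two leaf children."""
    root_prior = 0.25              # assumed uniform base frequencies
    return root_prior * jc_prob(x1, x_root, t1, alpha) * jc_prob(x2, x_root, t2, alpha)


if __name__ == "__main__":
    alpha = 0.1                    # arbitrary rate chosen for the demo
    print(cherry_joint("A", "C", "A", 1.0, 1.0, alpha))
```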

12 Probabilistic Methods
If we know all internal labels $x_u$,
$P(x_1, x_2, \ldots, x_N, x_{N+1}, \ldots, x_{2N-1} \mid T, t) = P(x_{root}) \prod_{j \neq root} P(x_j \mid x_{parent(j)}, t_{j,\,parent(j)})$
Usually we don’t know the internal labels, therefore
$P(x_1, x_2, \ldots, x_N \mid T, t) = \sum_{x_{N+1}} \sum_{x_{N+2}} \cdots \sum_{x_{2N-1}} P(x_1, x_2, \ldots, x_{2N-1} \mid T, t)$
[Figure: tree with leaves $x_1, x_2, \ldots, x_N$, an internal node $x_u$, and root $x_{root} = x_{2N-1}$]

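13 Computing the Likelihood of a Tree
Define $P(L_k \mid a)$: probability of the subtree rooted at $x_k$, given that $x_k = a$
Then,
$P(L_k \mid a) = \Big( \sum_b P(L_i \mid b)\, P(b \mid a, t_{ki}) \Big) \Big( \sum_c P(L_j \mid c)\, P(c \mid a, t_{kj}) \Big)$
[Figure: node $x_k$ with daughters $x_i$ and $x_j$ on branches $t_{ki}$ and $t_{kj}$]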

14 Felsenstein’s Likelihood Algorithm
To calculate $P(x_1, x_2, \ldots, x_N \mid T, t)$
Initialization: Set k = 2N – 1
Recursion: Compute $P(L_k \mid a)$ for all a
- If k is a leaf node: Set $P(L_k \mid a) = 1(a = x_k)$
- If k is not a leaf node:
  1. Compute $P(L_i \mid b)$, $P(L_j \mid b)$ for all b, for daughter nodes i, j
  2. Set $P(L_k \mid a) = \sum_{b,c} P(b \mid a, t_{ki})\, P(L_i \mid b)\, P(c \mid a, t_{kj})\, P(L_j \mid c)$
Termination: Likelihood at this column $= P(x_1, x_2, \ldots, x_N \mid T, t) = \sum_a P(L_{2N-1} \mid a)\, P(a)$
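
The recursion above can be sketched for a single alignment column. The tree encoding is an assumption of this example: a leaf is a one-character string, and an internal node is a tuple `(left, t_left, right, t_right)`. Jukes-Cantor substitution probabilities and a uniform root prior are also assumed; all function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor probability of `child` given `parent` after time t."""
    decay = math.exp(-4.0 * alpha * t)
    if child == parent:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def conditional(node, alpha=1.0):
    """Return {a: P(L_k | a)} for the subtree rooted at `node`."""
    if isinstance(node, str):                       # leaf: indicator 1(a = x_k)
        return {a: 1.0 if a == node else 0.0 for a in ALPHABET}
    left, t_left, right, t_right = node
    p_left = conditional(left, alpha)
    p_right = conditional(right, alpha)
    result = {}
    for a in ALPHABET:
        # The double sum over (b, c) factors into a product of two sums.
        sum_left = sum(jc_prob(b, a, t_left, alpha) * p_left[b] for b in ALPHABET)
        sum_right = sum(jc_prob(c, a, t_right, alpha) * p_right[c] for c in ALPHABET)
        result[a] = sum_left * sum_right
    return result


def column_likelihood(tree, alpha=1.0):
    """Termination step: sum over root states weighted by the prior P(a)."""
    at_root = conditional(tree, alpha)
    return sum(0.25 * at_root[a] for a in ALPHABET)


if __name__ == "__main__":
    tree = (("A", 0.3, "C", 0.5), 0.2, "A", 0.7)    # three-leaf example
    print(column_likelihood(tree, alpha=0.5))
```

A useful sanity check is that the likelihoods of all possible leaf labelings of a fixed tree sum to 1.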

15 Probabilistic Methods
Given M (ungapped) alignment columns of N sequences,
Define likelihood of a tree:
$L(T, t) = P(\text{Data} \mid T, t) = \prod_{m=1}^{M} P(x_{1m}, \ldots, x_{Nm} \mid T, t)$
Maximum Likelihood Reconstruction: Given data $X = (x_{ij})$, find a topology T and length vector t that maximize the likelihood L(T, t)
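
As a hedged illustration of the product over columns and of the maximization step, here is the simplest possible case: N = 2 sequences, where the topology is fixed and only the total branch length t between the two leaves is optimized. The sketch assumes Jukes-Cantor with uniform base frequencies, uses a plain grid search, and works in log space so the product over columns becomes a sum. Function names and the grid parameters are illustrative choices.

```python
import math


def pair_log_likelihood(seq1, seq2, t, alpha=1.0):
    """log L(t) for two aligned, ungapped sequences separated by time t."""
    decay = math.exp(-4.0 * alpha * t)
    # Per-column likelihood: P(x1) * P(x2 | x1, t) with P(x1) = 1/4.
    log_same = math.log(0.25 * 0.25 * (1.0 + 3.0 * decay))
    log_diff = math.log(0.25 * 0.25 * (1.0 - decay))
    return sum(log_same if a == b else log_diff for a, b in zip(seq1, seq2))


def ml_branch_length(seq1, seq2, alpha=1.0, step=0.001, t_max=2.0):
    """Grid search for the t in (0, t_max] that maximizes the likelihood."""
    grid = [step * i for i in range(1, int(t_max / step) + 1)]
    return max(grid, key=lambda t: pair_log_likelihood(seq1, seq2, t, alpha))


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTACGTTT"
    t_hat = ml_branch_length(s1, s2)
    # Closed-form Jukes-Cantor estimate for comparison (p = fraction of mismatches).
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    print(t_hat, -0.25 * math.log(1.0 - 4.0 * p / 3.0))
```

For more than two sequences, the per-column term would come from Felsenstein's algorithm and the search would also range over topologies; that full search is not attempted here.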

16 Some new sequencing technologies

17 Molecular Inversion Probes

18

19 Single Molecule Array for Genotyping—Solexa

20 Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm

21 Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm

22 Nanopore Sequencing: Assembly
Resulting reads are likely to look different than Sanger reads:
- Long (perhaps 10,000 bp to 1,000,000 bp)
- High error rate (perhaps 10% – 30%)
- Two colors? A/CTG, AT/CG, AG/CT
How can we assemble under such conditions?

23 Pyrosequencing

24 Pyrosequencing on a chip
Mostafa Ronaghi, Stanford Genome Technologies Center
454 Life Sciences

25 Pyrosequencing Signal

26 Pyrosequencing: Assembly
Resulting reads are likely to look different than Sanger reads:
- Short (currently 100 to 200 bp)
- Low error rates, except in homopolymeric runs (AAA…, CCC…, etc.)
- Currently, it is not known how to do paired reads on a chip

27 Polony Sequencing
