Phylogeny Tree Reconstruction


Final Exam
- 24-hour, take-home exam
- More straightforward questions than in the homeworks
- Please email Michael and Serafim by Friday with your preferred day to take the exam
- Exam starts Sunday, …, Thursday at noon; ends Monday, …, Friday at noon


Number of labeled unrooted tree topologies
How many possibilities are there for leaf 4?
For the 4th leaf, there are 3 possibilities: one for each edge of the 3-leaf tree.

Number of labeled unrooted tree topologies
How many possibilities are there for leaf 5?
For the 5th leaf, there are 5 possibilities.

Number of labeled unrooted tree topologies
How many possibilities are there for leaf 6?
For the 6th leaf, there are 7 possibilities.

Number of labeled unrooted tree topologies
How many possibilities are there for leaf n?
For the nth leaf, there are 2n − 5 possibilities, because an unrooted binary tree with n − 1 leaves has 2n − 5 edges.

Number of labeled unrooted tree topologies
Number of unrooted trees for n taxa: $(2n-5)(2n-7)\cdots 3 \cdot 1 = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
Number of rooted trees for n taxa: $(2n-3)(2n-5)(2n-7)\cdots 3 = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
n = 10: 2,027,025 unrooted; 34,459,425 rooted
n = 30: $8.7 \times 10^{36}$ unrooted; $4.95 \times 10^{38}$ rooted
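
The counting argument can be checked numerically. The sketch below multiplies the odd factors directly and compares the result with the closed form; the function names are illustrative and not part of the lecture.

```python
from math import factorial

def odd_product(k):
    """Return k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def count_unrooted(n):
    """Labeled unrooted binary tree topologies on n >= 3 taxa."""
    return odd_product(2 * n - 5)

def count_rooted(n):
    """Labeled rooted binary tree topologies on n >= 2 taxa."""
    return odd_product(2 * n - 3)

def count_unrooted_closed_form(n):
    """Closed form (2n-5)! / (2^(n-3) * (n-3)!), valid for n >= 3."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

if __name__ == "__main__":
    for n in (4, 5, 10, 30):
        print(n, count_unrooted(n), count_rooted(n))
```

For n = 10 this reproduces the 2,027,025 unrooted and 34,459,425 rooted topologies quoted on the slide.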

Search through tree topologies: Branch and Bound
Observation: adding an edge to an existing tree can only increase the parsimony cost (it never decreases it).
Enumerate all unrooted trees with at most N leaves as counter vectors $[i_3][i_5][i_7]\cdots[i_{2N-5}]$, where each $i_k$ can take values from 0 (no edge) to k.
At each point keep C = smallest cost so far for a complete tree.
Start B&B with tree [1][0][0]…[0].
Whenever the cost of the current tree T is > C, then:
- T is not optimal
- Any tree extending T with more edges is not optimal: increment by 1 the rightmost nonzero counter
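
The pruning idea can be sketched in a few dozen lines. This is an illustrative version under stated assumptions, not the lecture's implementation: it uses recursion over stepwise leaf addition instead of the explicit counter vector, scores trees with Fitch parsimony (a technique the slide does not name), and uses made-up names and node ids (leaves are 0..n-1, internal nodes are negative integers).

```python
def fitch_cost(edges, seqs):
    """Fitch parsimony cost of an unrooted binary tree given as (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def visit(node, parent, site):
        children = [c for c in adj[node] if c != parent]
        if not children:                       # a leaf: its observed state
            return {seqs[node][site]}, 0
        (s1, c1), (s2, c2) = (visit(c, node, site) for c in children)
        common = s1 & s2
        if common:
            return common, c1 + c2
        return s1 | s2, c1 + c2 + 1            # empty intersection: one change

    total = 0
    start = adj[0][0]                          # root the traversal at leaf 0
    for site in range(len(seqs[0])):
        states, cost = visit(start, 0, site)
        total += cost + (seqs[0][site] not in states)
    return total

def branch_and_bound(seqs):
    """Return (best cost, edges) over all unrooted topologies; needs >= 3 taxa."""
    n = len(seqs)
    best = {"cost": float("inf"), "edges": None}

    def extend(edges, next_leaf):
        cost = fitch_cost(edges, seqs)
        if cost > best["cost"]:                # bound: no extension can be cheaper
            return
        if next_leaf == n:                     # complete tree
            if cost < best["cost"]:
                best["cost"], best["edges"] = cost, edges
            return
        for i, (u, v) in enumerate(edges):     # branch: attach leaf to each edge
            w = -next_leaf                     # new internal node id (assumed scheme)
            extend(edges[:i] + edges[i + 1:] + [(u, w), (w, v), (w, next_leaf)],
                   next_leaf + 1)

    extend([(-1, 0), (-1, 1), (-1, 2)], 3)     # the single 3-leaf tree
    return best["cost"], best["edges"]

if __name__ == "__main__":
    print(branch_and_bound(["AA", "CC", "AA", "CC"]))
```

The bound is valid only because a partial tree's parsimony cost is a lower bound on the cost of every tree that extends it, which is the observation the slide starts from.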

Bootstrapping to get the best trees
Main outline of algorithm:
1. Select random columns from a multiple alignment, sampling with replacement, so one column can appear several times
2. Build a phylogenetic tree based on the random sample from (1)
3. Repeat (1), (2) many (say, 1000) times
4. Output the tree that is constructed most frequently
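
The outline above can be sketched as follows. The tree-building step is left as a caller-supplied function, `build_tree`, which is a hypothetical placeholder: it is assumed to take a list of aligned sequences and return a hashable description of the topology. The toy builder at the bottom exists only to make the example run.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Draw len(columns) column indices with replacement and rebuild the rows."""
    m = len(alignment[0])
    cols = [rng.randrange(m) for _ in range(m)]
    return ["".join(row[c] for c in cols) for row in alignment]

def bootstrap(alignment, build_tree, replicates=1000, seed=0):
    """Return (most frequent tree, number of replicates that produced it)."""
    rng = random.Random(seed)                  # seeded for reproducibility
    counts = Counter(
        build_tree(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return counts.most_common(1)[0]

if __name__ == "__main__":
    alignment = ["ACGTAC", "ACGTTC", "TCGAAC"]

    def toy_builder(rows):
        # Placeholder only: pairs sequence 0 with whichever other sequence
        # differs from it in the fewest positions.
        def diff(a, b):
            return sum(x != y for x, y in zip(a, b))
        closest = min(range(1, len(rows)), key=lambda i: diff(rows[0], rows[i]))
        return (0, closest)

    print(bootstrap(alignment, toy_builder, replicates=100))
```

In practice the count for each tree, divided by the number of replicates, is also reported as a measure of support.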

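Probabilistic Methods
A more refined measure of evolution along a tree than parsimony:
$P(x_1, x_2, x_{\mathrm{root}} \mid t_1, t_2) = P(x_{\mathrm{root}})\, P(x_1 \mid t_1, x_{\mathrm{root}})\, P(x_2 \mid t_2, x_{\mathrm{root}})$
If we use Jukes-Cantor, for example, and $x_1 = x_{\mathrm{root}} = A$, $x_2 = C$, $t_1 = t_2 = 1$:
$= p_A \cdot \tfrac{1}{4}\left(1 + 3e^{-4\alpha}\right) \cdot \tfrac{1}{4}\left(1 - e^{-4\alpha}\right) = \left(\tfrac{1}{4}\right)^3 \left(1 + 3e^{-4\alpha}\right)\left(1 - e^{-4\alpha}\right)$
[Figure: root $x_{\mathrm{root}}$ with children $x_1$ (branch length $t_1$) and $x_2$ (branch length $t_2$)]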
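
The worked example can be reproduced numerically. The sketch below assumes the standard Jukes-Cantor transition probabilities with rate α and a uniform root distribution of 1/4 per base; the value α = 0.1 is arbitrary and chosen only for illustration.

```python
import math

def jc_prob(child, parent, t, alpha):
    """Jukes-Cantor probability of seeing `child` after time t, given `parent`."""
    decay = math.exp(-4 * alpha * t)
    if child == parent:
        return 0.25 * (1 + 3 * decay)
    return 0.25 * (1 - decay)

def joint_two_leaves(x1, x2, x_root, t1, t2, alpha):
    """P(x1, x2, x_root | t1, t2) for a root with two leaf children."""
    return 0.25 * jc_prob(x1, x_root, t1, alpha) * jc_prob(x2, x_root, t2, alpha)

if __name__ == "__main__":
    alpha = 0.1                                # assumed rate, for illustration
    p = joint_two_leaves("A", "C", "A", 1.0, 1.0, alpha)
    closed = 0.25 ** 3 * (1 + 3 * math.exp(-4 * alpha)) * (1 - math.exp(-4 * alpha))
    print(p, closed)                           # the two values should agree
```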

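Probabilistic Methods
If we know all internal labels $x_u$:
$P(x_1, x_2, \dots, x_N, x_{N+1}, \dots, x_{2N-1} \mid T, t) = P(x_{\mathrm{root}}) \prod_{j \neq \mathrm{root}} P(x_j \mid x_{\mathrm{parent}(j)}, t_{j,\mathrm{parent}(j)})$
Usually we don't know the internal labels, therefore:
$P(x_1, x_2, \dots, x_N \mid T, t) = \sum_{x_{N+1}} \sum_{x_{N+2}} \cdots \sum_{x_{2N-1}} P(x_1, x_2, \dots, x_{2N-1} \mid T, t)$
[Figure: tree with leaves $x_1, x_2, \dots, x_N$, an internal node $x_u$, and root $x_{\mathrm{root}} = x_{2N-1}$]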
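
The sum over unknown internal labels can be written out directly for small trees. The sketch below is a brute-force illustration under assumptions not in the slides: a Jukes-Cantor model with an arbitrary rate, a uniform root distribution, and a tree encoded as two hypothetical dictionaries (`parent` maps each non-root node to its parent, `length` maps it to its branch length). It enumerates all 4^(N-1) internal labelings, so it is practical only for a handful of leaves.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(child, parent, t, alpha):
    """Jukes-Cantor probability of `child` after time t, given `parent`."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 * (1 + 3 * decay) if child == parent else 0.25 * (1 - decay)

def column_likelihood(leaf_states, parent, length, alpha=0.1):
    """P(leaf states | T, t), summing over every labeling of internal nodes."""
    internal = sorted(set(parent.values()))    # every node that has a child
    total = 0.0
    for labels in product(BASES, repeat=len(internal)):
        x = dict(leaf_states)
        x.update(zip(internal, labels))
        p = 0.25                               # uniform P(x_root), an assumption
        for node, par in parent.items():
            p *= jc_prob(x[node], x[par], length[node], alpha)
        total += p
    return total

if __name__ == "__main__":
    parent = {"x1": "u", "x2": "u", "x3": "root", "u": "root"}
    length = {"x1": 0.5, "x2": 0.5, "x3": 1.0, "u": 0.5}
    print(column_likelihood({"x1": "A", "x2": "A", "x3": "C"}, parent, length))
```

Summing this quantity over all possible leaf assignments gives 1, which is a convenient sanity check on any implementation.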

Probabilistic Methods
Given M (ungapped) alignment columns of N sequences, define the likelihood of a tree:
$L(T, t) = P(\mathrm{Data} \mid T, t) = \prod_{m=1}^{M} P(x_{1m}, \dots, x_{Nm} \mid T, t)$
Maximum Likelihood Reconstruction: given data $X = (x_{ij})$, find a topology T and length vector t that maximize the likelihood L(T, t).
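
Because the likelihood is a product over M columns, it is usually accumulated as a sum of logarithms to avoid numerical underflow. The sketch below shows only that bookkeeping; `column_likelihood` is a hypothetical caller-supplied function that returns P(column | T, t) for one alignment column, and the constant used in the example is a stand-in, not a real model.

```python
import math

def log_likelihood(alignment, column_likelihood):
    """Sum of log P(column | T, t) over the columns of an ungapped alignment.

    `alignment` is a list of N equal-length strings; zip(*alignment) yields
    one tuple of N characters per column.
    """
    return sum(math.log(column_likelihood(col)) for col in zip(*alignment))

if __name__ == "__main__":
    alignment = ["ACG", "ACT", "AGG"]
    # Stand-in model: every column gets probability 0.5.
    print(log_likelihood(alignment, lambda col: 0.5))
```

Maximizing the log-likelihood over T and t gives the same tree as maximizing L(T, t), since the logarithm is monotonic.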

Some new sequencing technologies

Molecular Inversion Probes

Single Molecule Array for Genotyping—Solexa

Nanopore Sequencing http://www.mcb.harvard.edu/branton/index.htm

Nanopore Sequencing: Assembly
Resulting reads are likely to look different than Sanger reads:
- Long (perhaps 10,000 bp to 1,000,000 bp)
- High error rate (perhaps 10% to 30%)
- Two colors? A/CTG, AT/CG, AG/CT
How can we assemble under such conditions?

Pyrosequencing

Pyrosequencing on a chip
Mostafa Ronaghi, Stanford Genome Technologies Center
454 Life Sciences

Pyrosequencing Signal

Pyrosequencing: Assembly
Resulting reads are likely to look different than Sanger reads:
- Short (currently 100 to 200 bp)
- Low error rates, except in homopolymeric runs (AAA…, CCC…, etc.)
- Currently, it is not known how to do paired reads on a chip

Polony Sequencing
