Phylogenetics Part II (Doug Raiford, Lesson 9, 12/18/2015)

1 Doug Raiford Lesson 9

2
• 3 approaches
  • Distance
  • Parsimony
  • Maximum likelihood
• Have already seen a distance method

3
• What’s wrong with UPGMA?
• Let’s revisit the example
• Can this be? Doesn’t the derived tree imply that B is equidistant from C and D?

[Figure: derived tree with leaves A, B, C, D]

Distance matrix:
     A  B  C  D
A    0  7  6  7
B       0  4  5
C          0  3
D             0

4
• UPGMA averaged the two and put them both (branches for C and D) at 1.5
• What if we don’t have equal rates of evolution after a divergence?

[Figure: tree with leaves A, B, C, D next to the distance matrix from slide 3; extracted branch labels: 4.5 12 2.5]
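The averaging step can be reproduced with a minimal UPGMA sketch. This is an illustration under stated assumptions, not the lecture's own code: the function name `upgma`, the nested-tuple tree format, and the `heights` dictionary are hypothetical choices, and cluster distances are recomputed as plain averages over leaf pairs instead of being updated incrementally.

```python
from itertools import combinations


def upgma(leaf_dist):
    """Cluster leaves by average linkage (UPGMA).

    leaf_dist maps frozenset({x, y}) -> distance between leaves x and y.
    Returns (tree, heights): tree is a nested tuple of leaf names and
    heights maps each merged cluster (tuple of its leaves) to its node height.
    """
    leaves = sorted({leaf for pair in leaf_dist for leaf in pair})
    # Each cluster is (tuple of member leaves, nested-tuple subtree).
    clusters = [((leaf,), leaf) for leaf in leaves]
    heights = {}

    def avg(c1, c2):
        # Average distance over all leaf pairs spanning the two clusters.
        total = sum(leaf_dist[frozenset((x, y))] for x in c1[0] for y in c2[0])
        return total / (len(c1[0]) * len(c2[0]))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merged = (a[0] + b[0], (a[1], b[1]))
        # UPGMA assumes equal rates, so both sides sit at half the distance.
        heights[merged[0]] = avg(a, b) / 2
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
    return clusters[0][1], heights


# Distance matrix from slide 3 (upper triangle).
matrix = {("A", "B"): 7, ("A", "C"): 6, ("A", "D"): 7,
          ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 3}
leaf_dist = {frozenset(pair): d for pair, d in matrix.items()}
tree, heights = upgma(leaf_dist)
print(tree)
print(heights)
```

On this matrix the sketch joins C and D at height 1.5 and then joins B to (C, D) at height 2.25. The resulting tree therefore places B at distance 4.5 from both C and D, even though the matrix gives 4 and 5.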

5
• Differing rates of evolution can sometimes cause problems with UPGMA
• Especially if very similar (small distances)

[Figure: "This tree" (leaves A, B, C; branch labels 1, 2, 1, 1) "yields this matrix", which "yields this tree" (leaves B, C, A)]

     A  B  C
A    0  4  3
B       0  3
C          0

6
• Also called minimum evolution method
• Definition of parsimony:
  1a: the quality of being careful with money or resources; thrift
  1b: the quality or state of being stingy
  2: economy in the use of means to an end; especially, economy of explanation in conformity with Occam's razor
• Occam's (Ockham's) razor: the simplest explanation is usually the best

7
• Looks at each column of an MSA and attempts to find a tree that describes it
• Builds a consensus tree

atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag

8
• What do we mean when we say “attempts to find a tree that describes”?
• Attempts to fit all possible trees to each column and chooses the best
• How do we determine all possible trees?
• How do we determine which one has the best fit?
• Assume that the majority nucleotide represents the ancestor

[Figure: "One possible tree" for a column with leaves A, A, A, G (example sequences AGCT, AACT); extracted labels: A, 0, 0, "A or a G", "0 if A", "1 if A"]

Total mutations that explain this tree = 1. Pretty darn good.
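Counting the mutations one tree needs for one column can be sketched as follows. Note the substitution: this sketch uses Fitch's small-parsimony algorithm, a standard way to count changes on a fixed tree, in place of the slide's majority-nucleotide rule. The function name `fitch`, the nested-tuple tree format, and the leaf names `s1` to `s4` are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for one column.

    tree is a leaf name or a 2-tuple of subtrees; states maps each leaf
    name to its character in the column being scored.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is needed somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Column with leaves A, A, A, G, as in the slide's figure.
column = {"s1": "A", "s2": "A", "s3": "A", "s4": "G"}
ancestor, changes = fitch((("s1", "s2"), ("s3", "s4")), column)
print(ancestor, changes)
```

For this column the sketch infers A at the root with a single change, which agrees with the slide's total of 1.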

9
• When there are two organisms there is only one possible tree

[Figure: tree with leaves A and B]

10
• What about when there are three?
• The third could go…

[Figure: tree with leaves A and B]

11
• For each of the previous 3 trees, could add a 4th to any of its branches (or could form a new root)
• Each of the possible trees had 4 branches, so could add to one of 4 locations (or splice in at top)
• So the total number of trees with 4 leaves:
  • 3*5 = 15

[Figure: "If this were the tree": tree with leaves A and B]

12
• $N_i$ is the number of trees given $i$ taxa
• $B_i$ is the number of branches in a tree given $i$ taxa
• $B_i = B_{i-1} + 2$, also $B_i = 2i - 2$
• $N_i = N_{i-1}(B_{i-1} + 1)$
  • plus 1 due to possible new root
• $N_2 = 1$
• $B_2 = 2$

Taxa  Branches  Trees
2     2         1
3     4         3
4     6         15
5     8         105
6     10        945
7     12        10,395
8     14        135,135
9     16        2,027,025
10    18        34,459,425
11    20        654,729,075

Defined by a recurrence relation so … That’s right, as usual, exponential (in fact faster than exponential, since the multiplier grows at every step)
What does this growth rate look like?
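The rooted-tree recurrence is easy to check numerically. The following is a small sketch, with the hypothetical function name `rooted_tree_counts`, that starts from the slide's base case (2 taxa, 2 branches, 1 tree) and regenerates the table.

```python
def rooted_tree_counts(max_taxa):
    """Return rows (taxa, branches, trees) for rooted trees, 2..max_taxa."""
    branches, trees = 2, 1  # base case from the slide: B_2 = 2, N_2 = 1
    rows = [(2, branches, trees)]
    for taxa in range(3, max_taxa + 1):
        # A new taxon can join any existing branch, or form a new root.
        trees = trees * (branches + 1)
        branches = branches + 2
        rows.append((taxa, branches, trees))
    return rows


for taxa, branches, trees in rooted_tree_counts(11):
    print(f"{taxa:>4} {branches:>8} {trees:>13,}")
```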

13
• Rooted vs. un-rooted
• Wherever the root is, un-kink it

14
• Always bifurcated
• Can never have 3 branches “from” a single node
• What are the odds?

[Figure: un-rooted tree with leaves A, B, C, D]

15
• Three possible trees

[Figure: three un-rooted trees on leaves A, B, C, D, with leaf orders "A B C D", "A D C B", and "A C B D"]

Are there any other combinations?

16
• For each of the three trees (having 4 taxa) could add a fifth taxon to any of the 5 branches
• 3*5 = 15 trees with 5 taxa

[Figure: un-rooted tree with leaves A, B, C, D]

17
• Outgroup
• Include an organism that is known to be further away from all taxa than they are from each other

[Figure: un-rooted tree with leaves A, B, C, D, annotated "If outgroup goes here…", and the resulting rooted tree with leaves outgroup, A, B, C, D]

18
• $N_i$ is the number of trees given $i$ taxa
• $B_i$ is the number of branches in a tree given $i$ taxa
• $B_i = B_{i-1} + 2$, also $B_i = 2i - 3$
• $N_i = N_{i-1} B_{i-1}$
  • No need for a “plus 1” for a possible new root because there are no roots
• $N_2 = 1$
• $B_2 = 1$ (the single branch joining the two taxa, consistent with $B_i = 2i - 3$)

Taxa  Branches  Trees
3     3         1
4     5         3
5     7         15
6     9         105
7     11        945
8     13        10,395
9     15        135,135
10    17        2,027,025
11    19        34,459,425
12    21        654,729,075

19
• Noticed that for un-rooted trees:
  • $B_i = 2i - 3$ (for $i \ge 2$)
• Also noticed
  • $N_i = N_{i-1} B_{i-1}$
• And reduced to
  • $(2n-5)(2n-7)(2n-9)\cdots(3)(1)$ where $n$ is the number of taxa
• Shorthand: $(2n-5)!!$
• For rooted
  • $N_i = N_{i-1}(B_{i-1} + 1)$
• Reduced to
  • $(2n-3)!!$

$N_i = B_{i-1} N_{i-1} = (2(i-1) - 3) N_{i-1} = (2i-5) N_{i-1} = (2i-5)(2i-7) N_{i-2} = \cdots$
Till the N term gets to $N_3$
Double factorial: each successive number reduced by two
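The double-factorial shorthand can be checked with a short sketch. The helper names `double_factorial`, `unrooted_trees`, and `rooted_trees` are hypothetical; the formulas are the ones stated on the slide.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n):
    """Number of un-rooted bifurcating trees on n >= 3 taxa: (2n-5)!!"""
    return double_factorial(2 * n - 5)


def rooted_trees(n):
    """Number of rooted bifurcating trees on n >= 2 taxa: (2n-3)!!"""
    return double_factorial(2 * n - 3)


for n in range(3, 13):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Because $(2n-3)!! = (2(n+1)-5)!!$, the rooted count for $n$ taxa equals the un-rooted count for $n+1$ taxa. Dropping the root therefore buys exactly one extra taxon for the same search effort.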

20
• Radical reduction in the number
• Still only bought one additional taxon

Taxa  Un-rooted trees  Rooted trees
3     1                3
4     3                15
5     15               105
6     105              945
7     945              10,395
8     10,395           135,135
9     135,135          2,027,025
10    2,027,025        34,459,425
11    34,459,425       654,729,075
12    654,729,075      13,749,310,575

21
• Even brighter mathematicians

Can you see why?

22
• Not really a candidate for dynamic programming
• Don’t repeat a bunch of sub-problems over and over
• Each sub-problem is a tree, and they are all unique

Still exponential

23
• Discard large subsets of possible solutions
• Use heuristics or predictions

[Figure callout: "Don’t bother"]

24
• Calculate a reasonable upper bound using a fast algorithm like UPGMA (hierarchical clustering)
• Incrementally grow potential trees
• Stop investigating any branch that goes over the threshold

[Figure: partial tree with leaves A, B, C, D; three insertion points marked X, labeled "Don’t bother, over threshold"]
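The pruning idea can be sketched as a small branch-and-bound search. Several assumptions apply: trees are rooted nested tuples of row indices, the cost of a tree is its Fitch parsimony count summed over columns, and the upper bound comes from a simple "caterpillar" tree that adds taxa in input order, which stands in for the UPGMA tree the slide suggests. All function names and the toy alignment are hypothetical.

```python
def fitch_cost(tree, column):
    """Fitch count of changes for one column; leaves are row indices."""
    def visit(node):
        if not isinstance(node, tuple):
            return {column[node]}, 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return visit(tree)[1]


def tree_cost(tree, columns):
    return sum(fitch_cost(tree, col) for col in columns)


def insertions(tree, leaf):
    """Yield every tree made by attaching leaf to one branch (or a new root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)


def caterpillar(n_taxa):
    """Quick starting tree (stand-in for UPGMA): add taxa one after another."""
    tree = (0, 1)
    for leaf in range(2, n_taxa):
        tree = (tree, leaf)
    return tree


def branch_and_bound(columns, n_taxa, upper_bound):
    """Return (best cost, trees with that cost) among trees within the bound."""
    best = {"cost": upper_bound, "trees": []}

    def grow(tree, next_leaf):
        cost = tree_cost(tree, columns)
        if cost > best["cost"]:
            return  # over threshold: adding taxa can never lower the cost
        if next_leaf == n_taxa:
            if cost < best["cost"]:
                best["cost"], best["trees"] = cost, []
            best["trees"].append(tree)
            return
        for bigger in insertions(tree, next_leaf):
            grow(bigger, next_leaf + 1)

    grow((0, 1), 2)
    return best["cost"], best["trees"]


seqs = ["AAT", "AAC", "GGT", "GGC"]  # toy alignment, one row per taxon
columns = list(zip(*seqs))
bound = tree_cost(caterpillar(len(seqs)), columns)
cost, trees = branch_and_bound(columns, len(seqs), bound)
print(cost, trees)
```

Pruning is safe here because the parsimony cost of a partial tree never exceeds the cost of any complete tree grown from it. Since Fitch cost does not depend on root placement, the rooted enumeration reports every rooting of the best un-rooted topology.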

25
• Some columns all same
  • Add no meaning
  • All trees minimum
• Columns that are all different
  • Also add no meaning
  • Must have minimum 2 nt’s (or aa’s) that are the same
• Useful in one respect
  • If all the same infer makeup of ancestor

[Figure: sequences AGCT, AACT, ACCT, AAAA; tree for an all-A column with nodes labeled A and branch costs of 0]
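Filtering out columns that cannot distinguish between trees can be sketched as below. The sketch assumes the commonly used criterion for a parsimony-informative site, which is stricter than the slide's wording: at least two different characters must each occur in at least two sequences. Treating `-` as missing data is also an assumption, and the name `is_informative` is hypothetical.

```python
from collections import Counter


def is_informative(column):
    """True if at least two characters each appear in two or more rows."""
    counts = Counter(ch for ch in column if ch != "-")  # gaps ignored (assumption)
    repeated = [ch for ch, n in counts.items() if n >= 2]
    return len(repeated) >= 2


for col in ["AAAA", "ACGT", "AAAG", "AAGG"]:
    print(col, is_informative(col))
```

Under this criterion an all-same column, an all-different column, and a column with a single odd character out are all skipped, because every tree explains them with the same number of changes.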

26
• Each column yields a tree
• If all agree, done
• If some are different, use majority rule
• If the sample is too small, perform bootstrapping
  • Randomly draw columns from the MSA (with replacement)
  • Generate more trees
  • Label branches with the percentage of bootstrap trees in which they appear
  • Used as a measure of support (repeatability)
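The bootstrap bookkeeping can be sketched as follows, assuming the standard procedure of resampling alignment columns with replacement. The tree-building step is left to a caller-supplied function: `build_tree` is a hypothetical parameter that takes a resampled alignment and returns a nested-tuple tree, and the other names are hypothetical as well.

```python
import random
from collections import Counter


def resample_columns(seqs, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_cols = len(seqs[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in seqs]


def clades(tree):
    """Return the set of leaf groups (frozensets) under each internal node."""
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        found.add(members)
        return members

    visit(tree)
    return found


def bootstrap_support(seqs, build_tree, replicates, seed=0):
    """Percentage of bootstrap trees in which each clade appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(clades(build_tree(resample_columns(seqs, rng))))
    return {clade: 100.0 * n / replicates for clade, n in counts.items()}
```

In use, `build_tree` would wrap whatever tree method is being assessed (parsimony, UPGMA, and so on). The returned percentages are the branch labels the slide describes.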

27
• Still have maximum likelihood
• Also, some inferential stuff, but that’s all in the next lecture

28
Phylogenetics Part III

