Phylogenetic trees
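Trees [Figure: several drawings of the same tree on Chimp, Human and Gorilla, marked as equal (=) to one another]
Terminology: a branch = an edge; an external node = a leaf; internal nodes; the root. [Figure: tree on Human, Chimp, Gorilla and Chicken with these parts labeled]
Ingroup / Outgroup [Figure: tree on Human, Chimp, Gorilla and Chicken, with INGROUP and OUTGROUP marked]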
The maximum parsimony principle (the shortest path). Modified from Inferring Phylogenies (book) by Prof. Joe Felsenstein.
Genes: 0 = absence, 1 = presence. [Table: presence/absence of genes g1 to g6 in species s1 to s5]
Evaluate this tree… [Figure: tree with leaves s1, s4, s3, s2, s5]
Gene number 1 [Figure: the same tree, with 0/1 character states added one at a time over successive slides]
Gene number 1. The most parsimonious ancestral character states
Gene number 1, Option number
Gene number 1, Option number 2. Minimal number of changes for gene 1 (character 1) =
Gene number 2 [Figure: the same tree with the states of gene 2]
Gene number 2, Option number
Gene number 2, Option number
Gene number 2, Option number 3. Number of changes for gene 2 (character 2) = 2
Total number of changes given the tree: sum of changes = 9. [Table: presence/absence (0 = absence, 1 = presence) of genes g1 to g6 in species s1 to s5]
Can we do better? Sum of changes = 9
Yes, we can! The MP (most parsimonious) tree has sum of changes = 8, compared with sum of changes = 9 for the tree evaluated before.
The MP (most parsimonious) tree: sum of changes for this tree topology = 8. [Figure: tree with leaves s1, s4, s3, s2, s5]
Intermediate summary: The MP tree is the one for which the minimal number of changes is needed to explain the data. We can now search for the best tree under the MP criterion.
Challenges: Evaluating a big tree "by hand" can be problematic; we want the computer to do it. Going over all the trees? How many trees are there? Can we generalize to nucleotides? To amino acids? Is the parsimony criterion ideal?
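On the question of how many trees there are: for n labeled taxa (n at least 3), the number of distinct unrooted bifurcating trees is the double factorial (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5). This is a standard result that the slide does not state, and the function name below is only illustrative. A minimal sketch of how fast the count grows:

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees on n_taxa labeled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Each added taxon can attach to any existing branch: 3 choices, then 5, then 7, ...
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n))
# 3 1
# 4 3
# 5 15
# 10 2027025
```

With only 10 taxa there are already about two million unrooted trees, so going over all of them quickly becomes impractical as the number of taxa grows.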
MP for nucleotides
Positions:
species  p1  p2  p3  p4  p5  p6
s1       A   A   G   T   A   A
s2       C   A   A   A   A   C
s3       C   A   G   G   A   A
s4       A   A   A   T   A   C
s5       G   C   G   C   C   A
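Position number 1 [Figure: tree with leaf states s1 = A, s4 = A, s3 = C, s2 = C, s5 = G]
Position number 1 [Figure: the same tree with ancestral states (A, C, C, C) assigned to the internal nodes]. Number of changes for position 1 = 2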
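The count for position 1 can be checked the same way the "options" were compared by hand: try every assignment of states to the internal nodes and keep the cheapest one. This is only a sketch. The topology (((s1, s4), (s3, s2)), s5) and the node names n1, n2, n3 and root are assumptions, because the figure itself is not reproduced here.

```python
from itertools import product

# Assumed topology (((s1, s4), (s3, s2)), s5), written as parent-child branches.
BRANCHES = [("n1", "s1"), ("n1", "s4"), ("n2", "s3"), ("n2", "s2"),
            ("n3", "n1"), ("n3", "n2"), ("root", "n3"), ("root", "s5")]
INTERNAL = ["n1", "n2", "n3", "root"]  # hypothetical names for the internal nodes


def min_changes(leaf_states, alphabet="ACGT"):
    """Try all internal-node assignments; return (fewest changes, one best assignment)."""
    best = None
    for combo in product(alphabet, repeat=len(INTERNAL)):
        ancestors = dict(zip(INTERNAL, combo))
        states = {**leaf_states, **ancestors}
        # A change is a branch whose two ends carry different states.
        changes = sum(states[a] != states[b] for a, b in BRANCHES)
        if best is None or changes < best[0]:
            best = (changes, ancestors)
    return best


position1 = {"s1": "A", "s4": "A", "s3": "C", "s2": "C", "s5": "G"}
changes, ancestors = min_changes(position1)
print(changes)  # 2
```

The search visits 4^4 = 256 assignments for this small tree and grows exponentially with the number of internal nodes, which is why a more efficient method is needed.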
Exercise: Find the MP score of the tree for these sequences. [Figure: tree with leaves Human, Chimp, Chicken, Gorilla and Duck, carrying the sequences GACA, GGGA, CAAG, GCGA, GAAA]
How to efficiently compute the MP score of a tree
The Fitch algorithm (1971): Postorder tree scan. At each internal node, if the intersection of the sets of its two children is empty, we apply a union operator; otherwise, an intersection. [Figure: tree with leaves Human, Chimp, Chicken, Gorilla, Duck and leaf states A, G, C, C, A; internal nodes carry the sets {A,G}, {A,C,G}, {A,C}]
Total number of changes = number of union operators.
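The postorder scan can be sketched in a few lines. The tree is written as nested pairs, and both the topology (((Human, Chimp), Gorilla), (Chicken, Duck)) and the assignment of the states A, G, C, C, A to the leaves are assumptions read off the slide labels, not something the text confirms. The function names are illustrative.

```python
def fitch(tree, states):
    """Return (state set, number of unions) for one character on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left, right) of subtrees
    states: dict mapping each leaf name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)      # postorder: children first
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                                     # non-empty intersection: no change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # union: one change


def mp_score(tree, sequences):
    """Sum the Fitch counts over all positions of equal-length sequences."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(length)
    )


# Assumed topology and leaf states (see the note above).
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}
root_set, changes = fitch(tree, states)
print(sorted(root_set), changes)  # ['A', 'C'] 3
```

Under these assumptions the internal sets are {A,G}, {A,C,G} and {A,C}, with {A,C} again at the root, and three union operations give three changes. `mp_score` repeats the scan for every position and adds up the counts.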
Rooting the tree [Figure: from Wikimedia Commons]
Positions:
species  p1  p2  p3  p4  p5  p6
Human    A   A   G   T   A   A
Chimp    A   A   T   T   A   C
Gorilla  A   C   A   T   A   A
Position 1 (Human A, Chimp A, Gorilla A): total number of changes = 0 for all 3 possible tree topologies.
Position 2 (Human A, Chimp A, Gorilla C): total number of changes = 1 for all 3 possible tree topologies.
Position 3 (Human G, Chimp T, Gorilla A): total number of changes = 2 for all 3 possible tree topologies.
The total number of changes is always the same for all 3 possible tree topologies.
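This can be checked for the whole alignment by scoring each of the three rooted topologies with the intersection/union rule, specialized here to three leaves. A minimal sketch; the function names are illustrative.

```python
def changes_three_taxa(a, b, c):
    """Count changes for one position on the rooted tree ((a, b), c)."""
    inner = {a} & {b} or {a} | {b}    # intersection if non-empty, otherwise union
    cost = 0 if a == b else 1         # a union at the inner node is one change
    cost += 0 if inner & {c} else 1   # a union at the root is one more change
    return cost


seqs = {"Human": "AAGTAA", "Chimp": "AATTAC", "Gorilla": "ACATAA"}
topologies = [
    ("Human", "Chimp", "Gorilla"),   # ((Human, Chimp), Gorilla)
    ("Human", "Gorilla", "Chimp"),   # ((Human, Gorilla), Chimp)
    ("Chimp", "Gorilla", "Human"),   # ((Chimp, Gorilla), Human)
]


def score(x, y, z):
    """Sum the per-position changes for the tree ((x, y), z)."""
    return sum(changes_three_taxa(a, b, c)
               for a, b, c in zip(seqs[x], seqs[y], seqs[z]))


scores = {topology: score(*topology) for topology in topologies}
print(scores)  # every topology scores 4
```

All three topologies need 4 changes in total (0 + 1 + 2 + 0 + 0 + 1 over the six positions), so the data cannot favor one of them.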
With 4 taxa (adding Orangutan) [Figure: the 15 possible rooted trees on Chimp (C), Human (H), Gorilla (G) and Orangutan (O); on the following slides a subset of 5 of these trees is highlighted]
Conclusion: The position of the root does not affect the MP score.
Rooting does not change the MP score: after "bending" the trees, the association of changes and branches does not change! [Figure: two examples of trees on Chimp, Orangutan, Gorilla and Human, with character states shown at the leaves and internal nodes]
Back to solving the relationships between human, chimp and gorilla… Using an outgroup
No MP with 3 species: all 3 possible topologies get the same score, so parsimony cannot choose among them.
With 4 taxa, there are 3 different unrooted trees. [Figure: the three unrooted trees (Human, Chimp | Chicken, Gorilla), (Human, Gorilla | Chimp, Chicken) and (Human, Chicken | Chimp, Gorilla)]
One tree gets a better score (fewer changes) than the other trees.
We then use external knowledge, that chicken is the outgroup, and get a rooted tree. [Figure: rooted tree on Human, Chimp, Gorilla and Chicken]
Exercise: Can you root the unrooted tree to obtain the tree below? [Figure: an unrooted tree and a rooted tree with leaves labeled from C, H, O, X, Y]
Exercise: How many rooted trees result from an unrooted tree with n taxa?
Exercise: Assume you have three sequences and the MP score of the unrooted tree is X. You now add another sequence. Can the score of the 4-taxa tree be lower than that of the 3-taxa tree?