Phylogenetic Tree
Evolution: Evolution of organisms is driven by diversity (different individuals carry different variants of the same basic blueprint) and by mutations (the DNA sequence can be changed by single-base changes, deletion or insertion of DNA segments, etc.).
Basic Assumptions: More closely related organisms have more similar genomes. Highly similar genes are homologous (have the same ancestor). A universal ancestor exists for all life forms. Phylogenetic relationships can be expressed by a dendrogram (a “tree”).
Phylogenetic tree: A phylogenetic tree is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.
Common Phylogenetic Tree Terminology [Figure: rooted tree with terminal nodes A, B, C, D, E, labeling the ancestral node (root of the tree), the internal nodes, the branches (lineages), and the terminal nodes.]
Phylogenetic trees diagram the evolutionary relationships between the taxa. [Figure: tree with tips Taxon A, Taxon B, Taxon C, Taxon D, and Taxon E.] The same phylogeny as nested parentheses: ((A,(B,C)),(D,E)). There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom. The other dimension (along the branches) either can have no scale, can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time. Both the tree and the parentheses say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
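The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming label-only strings with no branch lengths and no trailing semicolon; the function names `parse_newick`, `leaves`, and `clades` are illustrative and do not come from any library.

```python
def parse_newick(text):
    """Parse a label-only string such as '((A,(B,C)),(D,E))' into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip '('
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            pos += 1                      # skip ')'
            return tuple(children)
        start = pos                       # otherwise read a taxon label
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def leaves(tree):
    """List the taxa below a node."""
    if isinstance(tree, str):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def clades(tree):
    """Collect the taxon set of every internal node (every clade)."""
    if isinstance(tree, str):
        return []
    result = [frozenset(leaves(tree))]
    for child in tree:
        result.extend(clades(child))
    return result


tree = parse_newick("((A,(B,C)),(D,E))")
all_clades = clades(tree)
print(tree)                                # (('A', ('B', 'C')), ('D', 'E'))
print(frozenset("BC") in all_clades)       # True: B and C form a clade
print(frozenset("AB") in all_clades)       # False: A and B alone do not
```

Representing each clade as a set of taxa makes the statement "A, B, and C form a clade" a simple membership test.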
Historical Note: Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria). Since then, the focus has been on objective criteria for constructing phylogenetic trees, with thousands of articles in the last decades. Phylogenies are important for many aspects of biology, including classification and understanding biological mechanisms.
Morphological vs. Molecular: Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc. Modern biological methods allow the use of molecular features: gene sequences and protein sequences. The analysis is based on homologous sequences in different species.
Morphological topology [Figure: tree relating Archonta, Glires, Ungulata, Carnivora, Insectivora, and Xenarthra.] (Based on McKenna and Bell, 1997)
From sequences to a phylogenetic tree: There are many possible types of sequences to use.
Rat       QEPGGLVVPPTDA
Rabbit    QEPGGMVVPPTDA
Gorilla   QEPGGLVVPPTDA
Cat       REPGGLVVPPTEG
Mitochondrial topology [Figure: tree relating Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, and Lagomorpha + Scandentia.] (Based on Pupko et al.)
What can we get from phylogenetic trees? A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data: Which species are the closest living relatives of modern humans? Did the infamous Florida Dentist infect his patients with HIV?
Which species are the closest living relatives of modern humans? Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas. [Figure: time-scaled tree of chimpanzees, bonobos, humans, gorillas, and orangutans; time axis in MYA (million years ago), running from 14 to 0.]
Did the Florida Dentist infect his patients with HIV? Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people: [Figure: tree with tips DENTIST, Patients A to G, and Local controls 2, 3, 9, and 35; some individuals appear at more than one tip.] Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist. No: The sequences from the remaining patients fall outside that clade.
Types of trees: An unrooted tree represents the same phylogeny without the root node.
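Rooted versus unrooted trees [Figure: rooted trees Tree A, Tree B, and Tree C, and one unrooted tree with positions a, b, and c marked.] The unrooted tree represents the three rooted trees.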
Inferring evolutionary relationships between the taxa requires rooting the tree: To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root. [Figure: unrooted tree of taxa A, B, C, D with a root position marked, next to the rooted tree obtained by pulling at that root.] Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
Now, try it again with the root at another position: [Figure: the same unrooted tree of taxa A, B, C, D with the root marked at a different position, next to the resulting rooted tree.] Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees. [Figure: the unrooted tree 1 over taxa A, B, C, D with its five branches numbered 1 to 5, and the corresponding rooted trees 1a (root on branch 1), 1b (branch 2), 1c (branch 3), 1d (branch 4), and 1e (branch 5).] These trees show five different evolutionary relationships among the taxa!
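Each unrooted tree theoretically can be rooted anywhere along any of its branches. [Figure: example unrooted trees with four taxa (A to D), five taxa (A to E), and six taxa (A to F).] For N taxa, the number of unrooted trees is $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$ and the number of rooted trees is $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$.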
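The counting formulas for fully resolved (bifurcating) trees can be checked numerically. This is a small sketch; the function names `num_unrooted` and `num_rooted` are illustrative. It also checks that each unrooted tree with N taxa has 2N - 3 branches, so rooting on every branch gives the rooted count.

```python
from math import factorial


def num_unrooted(n):
    """Number of unrooted bifurcating trees on n >= 3 labeled taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted(n):
    """Number of rooted bifurcating trees on n >= 2 labeled taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(3, 8):
    branches = 2 * n - 3                  # branches in one unrooted tree
    assert num_rooted(n) == num_unrooted(n) * branches
    print(n, num_unrooted(n), branches, num_rooted(n))
# 3 1 3 3
# 4 3 5 15
# 5 15 7 105
# 6 105 9 945
# 7 945 11 10395
```

The row for four taxa matches the five rootings per unrooted tree described above: 3 unrooted trees times 5 branches gives 15 rooted trees.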
There are two major ways to root trees. By outgroup: Uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa. By midpoint or distance: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. This assumption is built into some of the distance-based tree building methods. [Figure: tree of taxa A, B, C, D with an outgroup marked; in the midpoint example d(A,D) = 18, so the midpoint lies at 18 / 2 = 9.]
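Midpoint rooting can be sketched in a few lines. The branch lengths below are hypothetical (the figure's individual lengths are not given in the text); they were chosen only so that d(A,D) = 18 as in the example. The internal node names `x` and `y` and the function names are likewise assumptions.

```python
from itertools import combinations

# Hypothetical unrooted tree: adjacency map with branch lengths.
ADJ = {
    "A": {"x": 10}, "B": {"x": 2},
    "C": {"y": 2}, "D": {"y": 5},
    "x": {"A": 10, "B": 2, "y": 3},
    "y": {"x": 3, "C": 2, "D": 5},
}


def path_between(adj, src, dst):
    """Return the (node, next_node, length) edges on the unique path src -> dst."""
    stack = [(src, None, [])]
    while stack:
        node, parent, edges = stack.pop()
        if node == dst:
            return edges
        for nxt, length in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, edges + [(node, nxt, length)]))
    raise ValueError("no path between the two nodes")


def midpoint_root(adj, taxa):
    """Find the two most distant taxa and the point halfway between them."""
    def dist(pair):
        return sum(length for _, _, length in path_between(adj, *pair))

    far_pair = max(combinations(taxa, 2), key=dist)
    edges = path_between(adj, *far_pair)
    total = sum(length for _, _, length in edges)
    walked = 0.0
    for u, v, length in edges:
        if walked + length >= total / 2:
            # Root lies on branch (u, v), this far from u.
            return far_pair, total, (u, v, total / 2 - walked)
        walked += length


result = midpoint_root(ADJ, ["A", "B", "C", "D"])
print(result)   # (('A', 'D'), 18, ('A', 'x', 9.0))
```

With these assumed lengths the root falls on the branch leading to A, 9 units from A, which is the midpoint 18 / 2 = 9.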
Two Methods of Tree Construction: Distance: a tree built by recursively combining the two nodes with the smallest distance. Parsimony: a tree with the minimum total number of character changes between nodes.
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG
Distance-based methods: Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building. [Table: pairwise distance matrix with rows and columns Species A to Species E.]
Distance-Based Method: Input: a distance matrix between species. For two sequences s_i and s_j, perform a pairwise (global) alignment. Let f = the fraction of sites with different residues. Then the distance is $d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}f\right)$ (Jukes-Cantor model). Outline: Cluster species together. Initially, clusters are singletons. At each iteration, combine the two “closest” clusters to get a new one.
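As an illustration of the distance computation, the sketch below takes two already-aligned sequences, computes f, and applies the Jukes-Cantor correction for nucleotides. It assumes the sequences are aligned with no gaps; the sequences are those of Species A and Species B from the character table, and the function names are illustrative.

```python
import math

# Aligned sequences of Species A and Species B from the character table.
SEQ_A = "ATGGCTATTCTTATAGTACG"
SEQ_B = "ATCGCTAGTCTTATATTACA"


def p_distance(s, t):
    """Fraction f of aligned sites whose residues differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def jukes_cantor(f):
    """Jukes-Cantor distance for nucleotides; defined only for f < 0.75."""
    if f >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for f >= 0.75")
    return -0.75 * math.log(1 - 4 * f / 3)


f_ab = p_distance(SEQ_A, SEQ_B)   # 4 differing sites out of 20
d_ab = jukes_cantor(f_ab)
print(f_ab)                       # 0.2
print(round(d_ab, 4))             # 0.2326
```

The corrected distance is slightly larger than f because the model accounts for sites that may have changed more than once.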
Unweighted Pair Group Method using Arithmetic Averages (UPGMA): UPGMA is a type of distance-based algorithm. UPGMA steps: 1. Cluster the two species with the smallest distance, putting them into a single group. 2. Recalculate the distance matrix with the new group against the other groups, averaging the distances from the members of the new group. 3. With the new distance matrix, repeat from step 1 until all species have been grouped.
Algorithm
UPGMA Step 1
Species   A     B     C     D
B         9     -     -     -
C         8     11    -     -
D         12    15    10    -
E         15    18    13    …
Merge D & E.
d(DE)A = 0.5 * (dDA+dEA) = 0.5*(12+15) = 13.5
d(DE)B = 0.5 * (dDB+dEB) = 0.5*(15+18) = 16.5
d(DE)C = 0.5 * (dDC+dEC) = 0.5*(10+13) = 11.5
Species   A     B     C
B         9     -     -
C         8     11    -
DE        13.5  16.5  11.5
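UPGMA Step 2
Species   A     B     C
B         9     -     -
C         8     11    -
DE        13.5  16.5  11.5
Merge A & C.
d(AC)B = 0.5 * (dAB+dCB) = 0.5*(9+11) = 10
d(AC)(DE) = 0.5 * (d(DE)A+d(DE)C) = 0.5*(13.5+11.5) = 12.5
Species   B     AC
AC        10    -
DE        16.5  12.5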
UPGMA Steps 3 & 4
Species   B     AC
AC        10    -
DE        16.5  12.5
Merge B & AC (distance 10), then merge ABC & DE.
Resulting tree: (((A,C),B),(D,E))
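The worked example can be reproduced with a short sketch of UPGMA. Two assumptions are made here: the distance d(D,E) is not given in the text, so a hypothetical value of 5 (smaller than every other entry) is used; and new distances are averaged with weights equal to cluster sizes, which reduces to the 0.5 * (…) average whenever the two merged clusters have the same size, as in steps 1 and 2. The function name `upgma` is illustrative.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster labels by UPGMA; dist maps frozenset({a, b}) to a distance.

    Returns the tree as a nested-parentheses string.
    """
    clusters = {name: 1 for name in labels}   # cluster name -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # 1. Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # 2. Size-weighted average distance from the new cluster to the rest.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


PAIRS = {
    ("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11,
    ("A", "D"): 12, ("B", "D"): 15, ("C", "D"): 10,
    ("A", "E"): 15, ("B", "E"): 18, ("C", "E"): 13,
    ("D", "E"): 5,   # hypothetical value, not given in the text
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

tree = upgma("ABCDE", DIST)
print(tree)   # ((D,E),(B,(A,C)))
```

The printed string lists the groups in a different order but describes the same grouping as (((A,C),B),(D,E)): A with C, then B, and D with E.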
Optimality criterion: The ‘most-parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences. Parsimony score: the number of character changes (mutations) along the evolutionary tree. Most parsimonious tree (MP tree): the tree with the minimal parsimony score. Example: [Figure: two candidate trees with nodes labeled by the sequences AAA, AGA, AAG, and GGA; one tree has Score = 4 and the other has Score = 3.]
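One common way to compute the parsimony score of a given tree is Fitch's algorithm, sketched below. The text does not say which method the example uses, and the two topologies here are assumptions chosen for illustration over the sequences AAA, AAG, AGA, and GGA; they are not necessarily the trees in the figure.

```python
def fitch_site(tree, site):
    """Return (possible states, number of changes) for one alignment column.

    A tree is either a leaf sequence (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, site)
    right_states, right_cost = fitch_site(right, site)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, n_sites):
    """Sum the minimum number of changes over all columns."""
    return sum(fitch_site(tree, site)[1] for site in range(n_sites))


# Two assumed topologies over the same four sequences.
TREE_1 = (("AAA", "AAG"), ("AGA", "GGA"))
TREE_2 = (("AAA", "AGA"), ("AAG", "GGA"))

score_1 = parsimony_score(TREE_1, 3)
score_2 = parsimony_score(TREE_2, 3)
print(score_1, score_2)   # 3 4
```

Under these assumptions the first topology is the more parsimonious of the two, since it explains the sequences with one fewer change.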
There are many trees: We cannot go over all the trees, so we need a better way to find the best one. There are approximate solutions, but what if we want to be sure of finding the global optimum? There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND, a general technique in computer science that can be applied to phylogeny.
BRANCH AND BOUND: To illustrate the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution. Later, when the general BNB method is understood, we will see how to apply it to finding the MP tree. The example is the traveling salesperson problem (TSP).
Branch and Bound for TSP: Find a minimum-cost round-trip path that visits each intermediate city exactly once. [Figure: map of the cities A, B, C, D, E, F, G with travel costs.] Greedy approach: A,G,E,F,B,D,C,A = 251
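Search all possible paths: [Figure: search tree of all paths starting at A, with the cost of each partial path in parentheses: A G (20), A G F (88), A G E (55), A B (46), A C (93), A C B (175), A C B E (257); deeper nodes include AGFB, AGFE, AGFC, ACD, and ACF.] Best estimate: 251. A partial path whose cost already exceeds the best estimate, such as A C B E (257 > 251), does not need to be extended any further.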
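The pruning idea can be written as a short depth-first search. This is a minimal sketch, not the exact procedure of the slides: the 4-city cost matrix is hypothetical (the costs of the A to G map are not given in the text), costs are assumed non-negative, and the bound is simply the cost of the best complete tour found so far.

```python
import math


def tsp_branch_and_bound(cost, start=0):
    """Return (best cost, best tour) for a round trip through all cities."""
    n = len(cost)
    best = {"cost": math.inf, "tour": None}

    def extend(path, visited, so_far):
        # Bound: with non-negative costs a partial path can only get longer,
        # so abandon it once it is no better than the best complete tour.
        if so_far >= best["cost"]:
            return
        if len(path) == n:
            total = so_far + cost[path[-1]][start]   # close the round trip
            if total < best["cost"]:
                best["cost"], best["tour"] = total, path + [start]
            return
        # Branch: try the cheapest next city first (greedy order), so that
        # a good bound is found early.
        here = path[-1]
        for city in sorted(range(n), key=lambda c: cost[here][c]):
            if city not in visited:
                extend(path + [city], visited | {city}, so_far + cost[here][city])

    extend([start], {start}, 0)
    return best["cost"], best["tour"]


# Hypothetical symmetric cost matrix for four cities (indices 0 to 3).
COST = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

best_cost, best_tour = tsp_branch_and_bound(COST)
print(best_cost)   # 80
```

Visiting the cheapest extension first plays the role of the greedy estimate: the first complete tour supplies a bound, and every later partial path is compared against it.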
Back to finding the MP tree: BNB helps, though the search is still exponential.
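The MP search tree: [Figure: search tree in which each level adds the next taxon to one of the branches of the current partial tree, with labels of the form “4 is added to branch …”; one node is annotated “There are 5 branches”.]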
MP-BNB: [Figure sequence: step-by-step traversal of the MP search tree, with taxon 4 added to successive branches, tracking the best (minimum) record found so far. The best record starts at 52.]
MP-BNB: Best tree found. MP score = 42. Total number of trees visited: 14.
Order of Evaluation Matters: Evaluate all 3 first. The bound after searching this subtree will be 42. Total trees visited: 9 (instead of 14).