BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011.


1 BINF6201/8201 Molecular phylogenetic methods 1 11-01-2011

2 Phylogenetics  According to evolutionary theory, all life forms on this planet are related to one another by descent.  Traditionally, phylogenetics is the study of the evolutionary relationships among a group of organisms.  The evolutionary relationships of organisms are usually described by means of a phylogenetic tree. [Figure: Charles Darwin’s tree of life, the first conceptual evolutionary tree of life]

3 Molecular phylogenetics  Ernst Haeckel’s tree of organisms was based on the similarity of morphological features of organisms. [Figure: Ernst Haeckel, Stem-Tree of Organisms, 1866, courtesy Robert J. Richards]  With the availability of protein and DNA sequences, phylogenetic trees are now constructed from these sequences.  The study that uses molecular sequences to deduce the evolutionary relationships among organisms and genes is called molecular phylogenetics.  There are good reasons for using molecular sequence data to study phylogenetics: 1. Sequences evolve in a more regular manner than do other features; 2. Sequences are more amenable to quantitative treatment; 3. Sequence data are more abundant.

4 Phylogenetic trees  A phylogenetic tree is an acyclic graph (no loops are allowed), in which nodes denote taxonomic units (organisms, molecules), and branches (edges) connecting the nodes denote the relationships of the taxonomic units in terms of descent and ancestry.  The length of an edge usually reflects the number of evolutionary changes or the time of divergence between the two taxonomic units.  Phylogenetic trees are usually binary trees: no node has more than three edges connected to it.  If a node has more than one edge connected to it, it is called an internal node; otherwise it is called an external node.  Internal nodes represent ancestral taxonomic units; external nodes represent the extant taxonomic units and are referred to as operational taxonomic units (OTUs).

5 Phylogenetic trees  Edges that connect to external nodes are called external or peripheral edges, and those that connect internal nodes are called internal edges.  If we know the earliest-branching lineage, which is called an outgroup, we can place a root on the edge connecting the outgroup to the rest of the tree; the tree is then a rooted tree.  If we know or can deduce the times of divergence between organisms, then the OTUs line up on the same line representing the current time, and the vertical length between two nodes represents the time of their divergence.  The root represents the common ancestor of the OTUs. From the root we can find an evolutionary path to each OTU.  The branching pattern of a tree is called its topology. Rotating a tree around its internal nodes does not change its topology. [Figure: a rooted tree with its outgroup, and the same tree rotated around the root by 180°]

6 Phylogenetic trees  A tree is said to be additive if the distance between any two OTUs is equal to the sum of the lengths of all the branches connecting them. [Figure: example tree with OTUs A, B, C, D, E, nodes F, G, H, I, and branch lengths 2, 1, 2, 1, 2, 3, 6, 1] For example, if additivity holds, then the distance between A and C is 2 + 1 + 3 + 2 = 8.  The distances between OTUs are computed from the molecular sequences, while the branch lengths are estimated from those distances according to certain rules, so additivity does not necessarily hold for some algorithms.
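
As a rough illustration of additivity, the sketch below sums branch lengths along the path between two OTUs. The figure itself is not recoverable from the transcript, so the topology (((A,B),(C,D)),E) with root I, and every length not on the A-to-C path, are assumptions; only the path lengths 2, 1, 3, 2 come from the slide's example. The names `PARENT`, `path_to_root` and `tree_distance` are hypothetical.

```python
# Hypothetical rooted tree stored as child -> (parent, branch length).
# ASSUMPTION: topology and most lengths are invented for illustration;
# only the A-to-C path lengths (2, 1, 3, 2) follow the slide's example.
PARENT = {
    "A": ("F", 2.0), "B": ("F", 2.0),
    "C": ("G", 2.0), "D": ("G", 1.0),
    "F": ("H", 1.0), "G": ("H", 3.0),
    "H": ("I", 1.0), "E": ("I", 6.0),
}


def path_to_root(node):
    """Map each ancestor of `node` (and node itself) to its distance from node."""
    dist = 0.0
    out = {node: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def tree_distance(x, y):
    """Sum of branch lengths on the path joining x and y."""
    up_x, up_y = path_to_root(x), path_to_root(y)
    # The path passes through the lowest common ancestor, which is the
    # shared ancestor that minimizes the combined distance.
    return min(up_x[a] + up_y[a] for a in up_x if a in up_y)


print(tree_distance("A", "C"))  # 2 + 1 + 3 + 2 = 8.0
```

A distance matrix is additive with respect to this tree when every entry equals the corresponding `tree_distance`.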

7 Phylogenetic trees  If we use the number of changes (the evolutionary distance) to scale the edges, the OTUs may not line up on the same line, because the rates of evolution of different taxonomic units may differ. However, if the tree is rooted, we still know the order of branching during the course of evolution.  If the evolutionary rates are the same, then the OTUs line up on the same line, and the evolutionary distances can be converted to times of divergence using a single scaling factor. In this case, the tree is ultrametric.  If the branch lengths on a rooted tree are times of divergence, then for any three OTUs, two of the three pairwise distances are at least as large as the third, and these two largest distances are equal. This property is called ultrametricity.  An ultrametric tree is also additive, but the reverse is not necessarily true.
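
The three-OTU condition can be checked directly on a distance matrix. The following is a minimal sketch, assuming distances are held in a nested dictionary; the helper names `symmetric` and `is_ultrametric`, the tolerance, and the two small matrices are made up for illustration.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested dict d[x][y] = d[y][x] from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def is_ultrametric(d, names, tol=1e-9):
    """True if, for every triple of OTUs, the two largest distances are equal."""
    for x, y, z in combinations(names, 3):
        _, middle, largest = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(largest - middle) > tol:
            return False
    return True


# Made-up matrices: the first satisfies the condition, the second does not.
clock_like = symmetric({("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0})
unequal_rates = symmetric({("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 7.0})

print(is_ultrametric(clock_like, ["A", "B", "C"]))     # True
print(is_ultrametric(unequal_rates, ["A", "B", "C"]))  # False
```

With real sequence data an exact match is unlikely, so in practice the tolerance `tol` would need to be chosen with the expected sampling noise in mind.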

8 Unrooted trees  If we do not know the earliest branch in the tree, the tree is unrooted.  An unrooted tree can only specify the relationships among the OTUs; it does not define an evolutionary path to an OTU.  Most phylogenetic tree construction methods produce unrooted trees.  To convert an unrooted tree to a rooted one, we usually include an outgroup in the analysis.  An outgroup can be identified from other information, such as paleontological or morphological evidence. The root is placed on the edge that connects the outgroup to the other OTUs. [Figure: an unrooted tree, and the corresponding rooted tree with its outgroup]

9 Monophyletic groups and clades  The collection of all the descendants of an ancestor is called a monophyletic group or a clade.  A group of OTUs that does not include all the descendants of its common ancestor is called a paraphyletic group. [Figure: the example tree with OTUs A, B, C, D, E, nodes F, G, H, I, and branch lengths 2, 1, 2, 1, 2, 3, 6, 1]  The OTUs A and B form a monophyletic group; A, B, C, D and E form a monophyletic group; C and D form a monophyletic group.  A and C form a paraphyletic group; A, B and E form a paraphyletic group; B and C form a paraphyletic group; and so on.
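
One way to test whether a set of OTUs is monophyletic is to enumerate the leaf sets below every node of the tree. The sketch below assumes the topology (((A,B),(C,D)),E), which is consistent with the groups listed on the slide but is not recoverable from the transcript; `TREE`, `leaves`, `clades` and `is_monophyletic` are hypothetical names.

```python
# ASSUMPTION: nested-tuple topology chosen to agree with the slide's examples.
TREE = ((("A", "B"), ("C", "D")), "E")


def leaves(node):
    """Set of OTUs found at or below `node`."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """All leaf sets that correspond to a node of the tree (i.e. all clades)."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


def is_monophyletic(group, tree=TREE):
    """True if `group` is exactly the set of descendants of some node."""
    return frozenset(group) in clades(tree)


print(is_monophyletic({"A", "B"}))       # True
print(is_monophyletic({"C", "D"}))       # True
print(is_monophyletic({"A", "C"}))       # False
print(is_monophyletic({"A", "B", "E"}))  # False
```

A single OTU counts as a (trivial) clade under this definition.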

10 Gene trees and species trees  When we construct a phylogenetic tree of a group of genes, the tree reflects the evolutionary relationships of the genes, and it is called a gene tree.  The phylogenetic tree that describes the evolutionary relationships of a group of organisms is called a species tree.  When we want to infer the species tree of a group of organisms using molecular sequence data, we pick the gene or genes that are most informative about the evolutionary history of the organisms, and use the resulting gene tree to represent the species tree. The first gene trees were based on cytochrome c and hemoglobin sequences. The widely accepted species trees of organisms are based on small subunit ribosomal RNA (rRNA) gene sequences, since all organisms have these genes and they tend to evolve slowly.

11 Methods for phylogenetic tree reconstruction  Numerous tree-construction methods have been proposed, because no method performs well under all circumstances.  Most of these methods depend on a multiple alignment of the sequences.  The gaps in the original multiple alignment are removed, and sometimes manual adjustments are necessary to remove uncertain alignment in the variable regions.  A high-quality alignment is necessary for the correct inference of a phylogenetic tree. [Figure: a multiple alignment of mitochondrial rRNA genes of some mammals]

12 Methods for phylogenetic tree reconstruction  These alignment-based tree-construction methods can be divided into three categories: 1. Distance matrix methods: the evolutionary distance (number of substitutions per site) between each pair of sequences is computed based on a substitution model of sequence evolution; a tree is then constructed by an algorithm based on some functional relationships among the distance values. 2. Maximum parsimony methods: the inferred tree is the one that requires the fewest changes to explain the differences among the aligned sequences. 3. Maximum likelihood methods: likelihood values for possible trees are computed, and the tree that has the maximal likelihood value is selected as the inferred tree.

13 Unweighted pair-group method with an arithmetic mean (UPGMA)  This is the simplest method for tree construction. For this method to work correctly, we have to assume that the rate of evolution (rate of substitution) is the same for all lineages, so that a linear relation exists between evolutionary distance and time of divergence.  After obtaining a multiple alignment, we first compute the evolutionary distance d between every pair of sequences based on a sequence substitution model, e.g., the J-C or K2P model. [Figure: the J-C distance matrix of the small subunit mitochondrial rRNA genes in a group of Catarrhini]
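
As a small sketch of the distance step, the code below computes a J-C distance for one pair of aligned sequences using the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The formula is not written out on the slide, and the function names, the gap handling (columns with a gap are skipped) and the example sequences are assumptions for illustration.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance in substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The logarithm's argument is zero or negative: distance is undefined.
        raise ValueError("p >= 0.75: J-C distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Made-up aligned sequences differing at 1 of 10 sites (p = 0.1).
d = jc_distance("ACGTACGTAC", "ACGTACGTAT")
print(round(d, 4))
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site.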

14 The UPGMA algorithm  We have discussed the UPGMA algorithm earlier; here is its adaptation to constructing a phylogenetic tree: Step 1: Assign each OTU to its own cluster; Step 2: Join the two clusters that have the shortest distance; the length of the branch equals the distance between the two clusters, and the branching point is put at the middle of the branch; Step 3: Recompute the distances between the new cluster and the remaining clusters as the average of the pairwise distances between their members; Step 4: Repeat steps 2 and 3 until all OTUs have been joined into a single cluster.
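
The four steps can be sketched roughly as follows. This is one possible implementation under stated assumptions, not the course's own code: distances are stored in a dictionary keyed by `frozenset` pairs, the tree is returned as nested tuples, and the function name `upgma` and the example matrix are made up.

```python
from itertools import combinations


def upgma(names, dist):
    """names: list of OTU labels; dist: {frozenset({x, y}): distance}.

    Returns (tree, height): a nested-tuple tree and the height of its root,
    i.e. the distance from the root to every leaf.
    """
    # Step 1: every OTU starts as its own cluster:
    # label -> (subtree, number of OTUs in the cluster).
    clusters = {n: (n, 1) for n in names}
    d = dict(dist)
    height = 0.0
    while len(clusters) > 1:
        # Step 2: join the closest pair; the branching point sits at the
        # middle of the edge, so the new node's height is half the distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        # Step 3: the size-weighted average of the two old distances equals
        # the mean over all pairs of member OTUs.
        for c in clusters:
            d_ac, d_bc = d[frozenset((a, c))], d[frozenset((b, c))]
            d[frozenset((new, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        # Step 4: loop until a single cluster remains.
    tree, _ = next(iter(clusters.values()))
    return tree, height


# Made-up ultrametric matrix for four OTUs.
pairs = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
         ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
dist = {frozenset(k): v for k, v in pairs.items()}
tree, height = upgma(["A", "B", "C", "D"], dist)
print(tree, height)  # ('D', ('C', ('A', 'B'))) 5.0
```

Ties between equally close pairs are broken by whichever pair `min` encounters first; a more careful implementation might handle ties explicitly.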

15 The UPGMA algorithm  Let’s first look at a toy example with four OTUs A, B, C and D and the distance matrix:

      A      B      C      D
A     -      d_AB   d_AC   d_AD
B     d_AB   -      d_BC   d_BD
C     d_AC   d_BC   -      d_CD
D     d_AD   d_BD   d_CD   -

 If d_AB is the smallest, join A and B, and put a branching point at the middle of the edge. The matrix over the clusters (AB), C and D becomes:

        (AB)     C        D
(AB)    -        d_(AB)C  d_(AB)D
C       d_(AB)C  -        d_CD
D       d_(AB)D  d_CD     -

 If d_(AB)C is the smallest, join (AB) and C, and put a branching point at the middle of the edge.

16 The UPGMA algorithm  Continuing with the matrix over the clusters (AB), C and D: if d_(AB)C is the smallest, join (AB) and C, and put a branching point at the middle of the edge. The matrix then reduces to the single distance d_(ABC)D between the clusters (ABC) and D.  Join (ABC) and D, and put a root at the middle of the edge. [Figure: the resulting rooted tree with OTUs A, B, C and D]

17 The UPGMA algorithm  The tree constructed by the UPGMA method is rooted; the root is put at the midpoint between the last two clusters joined.  The UPGMA tree is ultrametric, i.e., given any three OTUs, the two longest distances among them are the same.  A distance matrix is said to be ultrametric if an ultrametric tree can be constructed such that the distance between any two OTUs is the same as that specified in the matrix.  If the distance matrix is ultrametric, then UPGMA is guaranteed to produce a correct tree, i.e., d^tree_ij = d_ij.  However, in reality ultrametricity may not hold, as the sequences may evolve at different rates.  Even if the sequences evolve at the same rate, the distances among them will be only approximately ultrametric, because of the random nature of nucleotide substitutions.  When the distance matrix is far from ultrametric, the resulting tree can be very different from the true tree, i.e., d^tree_ij ≠ d_ij.

18 The UPGMA algorithm  A real-world example: mitochondrial rRNA genes of a group of closely related Catarrhini.  The algorithm first joins chimpanzee and pygmy chimp, then adds human, and so on.  The tree is ultrametric and hence additive, but the matrix is only approximately ultrametric. [Figure: the resulting UPGMA tree, annotated with d_AB = 0.0865 × 2 = 0.173 for two joined OTUs A and B]

