Bioinformatics 2011 Molecular Evolution Revised 29/12/06.


Phylogeny is the inference of evolutionary relationships
All forms of life share a common origin. The goals are:
- to deduce the correct trees for all species of life
- to estimate the time of divergence between organisms since they last shared a common ancestor

Terminology
Phylogenetic trees are used to assess the relationships of homologous proteins (or nucleotide sequences) in a family. Key terms:
- OTU (operational taxonomic unit) or external node
- Internal node
- Branch
- Bifurcating node
- Clade
- Phylogram

Species tree versus gene tree
- In a species tree, an internal node represents a speciation event.
- In a gene tree, an internal node represents the divergence of an ancestral gene into two new genes with distinct sequences.
- A species tree is not necessarily identical to a gene tree, because of:
  - horizontal gene transfer
  - gene duplications

Species tree versus gene tree (figure: Gray et al.)

Phylogenetic inference
1. Selection of sequences for analysis
2. Multiple sequence alignment
3. Tree building
4. Tree evaluation

Phylogenetic inference
1. Selection of sequences for analysis
DNA:
- Higher phylogenetic signal: synonymous and nonsynonymous substitutions can be distinguished (to detect negative and positive selection)
Protein:
- Phylogenetic signal is less pronounced than in DNA
- Better suited to constructing a tree for evolutionarily distant species or genes
RNA:
- rRNA is often used for constructing species trees

2. Multiple sequence alignment
- This is a critical step in the analysis, because in many cases placing amino acids or nucleotides in the same column of the alignment implies that they share a common ancestor.
- If you misalign a group of sequences you will still be able to produce a tree, but it is not likely to be biologically meaningful. Crap in, crap out!
- Inspect the alignment to be sure that all sequences are homologous.
- Sometimes ClustalW does not align distantly related sequences well. Try different gap opening and gap extension parameters to improve the alignment.
- Only use those columns of the multiple alignment for which you have data for all organisms or sequences. Delete the columns for which this is not the case, i.e. delete columns with gaps.
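The last point above (deleting columns with gaps) can be sketched as follows. This is a minimal illustration, assuming the alignment is held as a list of equal-length strings and that gaps are written as "-" or "."; the function name is hypothetical and not taken from any alignment package.

```python
def drop_gapped_columns(alignment, gap_chars="-."):
    """Return the alignment with every column that contains a gap removed.

    `alignment` is assumed to be a list of equal-length strings.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the column indices where no sequence has a gap character.
    keep = [i for i in range(length)
            if all(seq[i] not in gap_chars for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


aligned = ["ATG-CA", "ATGGCA", "A-GGCA"]
print(drop_gapped_columns(aligned))  # ['AGCA', 'AGCA', 'AGCA']
```

Removing every gapped column is the strictest option; it can discard a lot of data when one sequence is much shorter than the others.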

Phylogenetic inference
3. Tree building
Methods based on an explicit model of evolution:
- Character-based: maximum likelihood methods / Bayesian phylogeny
- Non-character-based: pairwise distance methods
Methods not based on an explicit model of evolution:
- Character-based: maximum parsimony methods

Distance-based methods
- Calculate the distances between molecular sequences using some distance metric.
- Use a clustering method (UPGMA, neighbour joining) to infer the tree from the pairwise distance matrix.
- Treat the sequences from a horizontal perspective, by calculating a single distance between entire sequences.
Advantages:
- Fast
- Allow the use of evolutionary models
Disadvantage:
- Each pair of sequences is reduced to one number

Character-based methods
- Treat the sequences from a vertical perspective: for each column of the alignment, they search for the simplest explanation of how the characters evolved.
- For instance, MP involves a search for the tree with the fewest amino acid (or nucleotide) character changes that account for the observed differences between the protein (gene) sequences.

Phylogenetic inference
4. Tree evaluation: bootstrapping
- A sampling technique for estimating the statistical error in situations where the underlying sampling distribution is unknown.
- Used to evaluate the reliability of the inferred tree, or better, the reliability of specific branches.
How to proceed:
- Columns of the original sequence alignment are chosen at random ('sampling with replacement').
- A new alignment is constructed with the same size as the original one.
- A tree is constructed from the new alignment.
- This process is repeated hundreds of times.
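The resampling step of the procedure above can be sketched as follows. This is only an illustration, assuming the alignment is a list of equal-length strings; the function names and the fixed seed are hypothetical, and the tree-building step for each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling columns with replacement."""
    length = len(alignment[0])
    # Draw `length` column indices at random; the same column may be drawn twice.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Return a list of resampled alignments, each the size of the original."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


aln = ["ACGT", "AGGT", "ACTT"]
for replicate in bootstrap_replicates(aln, n_replicates=3):
    print(replicate)
```

Each replicate would then be passed to the same tree-building method as the original alignment, and the support for a branch is the fraction of replicate trees in which it appears.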

Phylogenetic inference
Show bootstrap values on phylogenetic trees:
- as a majority-rule consensus tree, or
- by mapping the bootstrap values onto the original tree

Maximum parsimony
Principle:
- Select the tree that minimizes the total tree length, i.e. the number of nucleotide substitutions or amino acid replacements required to explain a given set of data.
Method:
- A particular topology is considered.
- For this topology, the ancestral sequences at each branching point are reconstructed.
- The minimum number of events needed to explain the sequence differences over the whole tree is computed: the minimum number of substitutions is computed for each nucleotide (or amino acid) site, and the numbers for all sites are added.
- Another tree topology is chosen, and the procedure is repeated.
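As an illustration of scoring one topology, the sketch below counts the minimum number of changes per site with Fitch's algorithm and sums over sites. Fitch's algorithm is a choice made here for the example; the slides do not name a specific algorithm. The tree encoding (nested 2-tuples of leaf names, rooted and binary) and the function names are assumptions.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    `tree` is a leaf name (str) or a 2-tuple of subtrees;
    `states` maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: one substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the per-site minimum number of changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


alignment = {"A": "AG", "B": "AG", "C": "TC", "D": "TC"}
print(tree_length((("A", "B"), ("C", "D")), alignment))  # 2
print(tree_length((("A", "C"), ("B", "D")), alignment))  # 4
```

The first topology needs fewer changes, so it is the more parsimonious of the two.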

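Maximum parsimony: number of possible tree topologies
Table columns: OTUs | rooted tree topologies | unrooted tree topologies (each count given by an equation)
- Exhaustive search is impossible
- Heuristics are needed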
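To see why an exhaustive search quickly becomes impossible, the sketch below counts bifurcating topologies using the standard results (2n-5)!! for unrooted and (2n-3)!! for rooted trees with n labelled OTUs. These formulas are textbook results and are assumed here; the equation on the original slide was not preserved. The function names are hypothetical.

```python
def odd_double_factorial(k):
    """Product of the odd numbers 1 * 3 * 5 * ... * k (1 if k < 1)."""
    result = 1
    for value in range(3, k + 1, 2):
        result *= value
    return result


def n_unrooted_trees(n_otus):
    """Number of unrooted bifurcating topologies, (2n-5)!!, for n >= 3."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)


def n_rooted_trees(n_otus):
    """Number of rooted bifurcating topologies, (2n-3)!!, for n >= 2."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)


for n in (3, 4, 5, 10):
    print(n, n_rooted_trees(n), n_unrooted_trees(n))
# 3 3 1
# 4 15 3
# 5 105 15
# 10 34459425 2027025
```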

Maximum parsimony
- The search may find several tree topologies that are 'equally parsimonious'.
- Represent the results as a consensus tree:
  - 'strict' consensus tree
  - 'majority-rule' consensus tree
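The difference between the two consensus types can be sketched by treating each rooted tree as a set of clades. This is a simplified illustration: the clade-set representation, the function names, and the example trees are assumptions, and building an actual tree from the retained clades is not shown.

```python
from collections import Counter


def clade_counts(trees):
    """Count in how many trees each clade occurs.

    Each tree is assumed to be a set of clades; each clade is a frozenset
    of the OTU names below one internal node.
    """
    return Counter(clade for tree in trees for clade in tree)


def strict_consensus(trees):
    """Clades present in every input tree."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n == len(trees)}


def majority_rule_consensus(trees):
    """Clades present in more than half of the input trees."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n > len(trees) / 2}


ab, abc, abd = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
trees = [{ab, abc}, {ab, abd}, {ab, abc}]
print(strict_consensus(trees) == {ab})              # True
print(majority_rule_consensus(trees) == {ab, abc})  # True
```

The same counting applies to bootstrap replicates: the fraction of replicate trees containing a clade is its bootstrap value.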

Maximum parsimony
Only the informative sites of the alignment are used in the construction of the tree: a site is informative when it contains at least two different kinds of characters, each represented at least two times.
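The definition above translates directly into a column filter. A minimal sketch, assuming the alignment is a list of equal-length strings without gaps; the function names are hypothetical.

```python
from collections import Counter


def is_informative(column):
    """True if at least two different characters each occur at least twice."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment):
    """Return the 0-based indices of the parsimony-informative columns."""
    return [i for i, column in enumerate(zip(*alignment))
            if is_informative(column)]


alignment = ["AGAC", "AGCC", "ACGT", "ACTT"]
print(informative_sites(alignment))  # [1, 3]
```

Column 0 is constant and column 2 has four different characters, each seen once, so neither helps to choose between topologies.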

Maximum parsimony
Parsimony trees are usually represented only as a tree topology (cladogram): sometimes the parsimony program cannot decide on which branches the substitutions have taken place, so it cannot calculate branch lengths.

Maximum parsimony
Assumptions:
- Equal rate of evolution in all branches
- No correction for multiple mutations, i.e. no substitution model can be applied (see further)
Advantages:
- Sequence information is not reduced to one number (as it is, for example, in pairwise distance methods)
Disadvantages:
- Can be slow for very large datasets
- Sensitive to unequal rates of evolution in different lineages (see further), which leads to long branch attraction

Pairwise distance methods
- Distance calculation
- Inferring the tree topology

Pairwise distance methods
Distance calculation
- Approach: align pairs of sequences and count the number of differences (Hamming distance). For an alignment of length N with n sites at which there are differences: D = (n/N) * 100.
- Problem: the observed differences are not equal to the actual genetic distances between the sequences. The dissimilarity underestimates the true evolutionary distance, because some sequence positions are the result of multiple events.
- Solution: use an evolutionary model that corrects for multiple mutations.
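The observed distance and one possible correction can be sketched as follows. The Jukes-Cantor correction for nucleotides, d = -(3/4) ln(1 - 4p/3), is used here as an example of an evolutionary model; it is an assumption of this sketch that this is the model intended at this point. The function names are hypothetical, and the observed distance is returned as a proportion (multiply by 100 for the percentage D).

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (n / N)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for a nucleotide p-distance."""
    if p >= 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p)                         # 0.1
print(jukes_cantor_distance(p))  # slightly larger than 0.1
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as the sequences diverge.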

Pairwise distance methods
Distance calculation: other evolutionary models

Pairwise distance methods
Distance calculation: unequal mutation rate per position (gamma correction of the Jukes-Cantor model)
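A sketch of the gamma-corrected Jukes-Cantor distance, using the commonly cited form d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], where a is the shape parameter of the gamma distribution of rates across sites. The formula on the original slide was not preserved, so this form and the function names are assumptions.

```python
import math


def jukes_cantor_distance(p):
    """Plain Jukes-Cantor distance (equal rates at all sites)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jukes_cantor_gamma_distance(p, shape):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    `shape` is the gamma shape parameter a; small values mean strong
    rate variation between sites.
    """
    if p >= 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    if shape <= 0:
        raise ValueError("the shape parameter must be positive")
    # (1 - 4p/3)^(-1/a) - 1, written with expm1 for numerical stability.
    return 0.75 * shape * math.expm1(-math.log(1.0 - 4.0 * p / 3.0) / shape)


p = 0.1
print(jukes_cantor_distance(p))
print(jukes_cantor_gamma_distance(p, shape=1.0))
print(jukes_cantor_gamma_distance(p, shape=1e6))
```

With strong rate variation (small shape) the corrected distance is larger than the plain Jukes-Cantor value; for a very large shape the two agree.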

Pairwise distance methods
Tree inference: UPGMA
- Ultrametric trees are rooted trees in which all the end nodes are equidistant from the root of the tree.
- This assumes a molecular clock, i.e. that all sequences evolve at a similar rate.

Pairwise distance methods
Tree inference: WPGMA
- When two OTUs are grouped, we treat them as a new single OTU.
- When OTUs A and B (which have been grouped before) and C are grouped into a new node 'u', the distance from node 'u' to any other node 'k' (e.g. the grouping of D and E) is simply computed as an average of the distances between the two groups.
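The update rule for the distance from a newly merged cluster to another cluster can be sketched in two variants. The formula on the original slide was not preserved, so both standard variants are shown as assumptions: WPGMA takes the plain mean of the two merged clusters' distances, while UPGMA weights them by cluster size, which amounts to averaging over all member pairs. The function names and the example distances are hypothetical.

```python
def wpgma_distance(d_ik, d_jk):
    """Distance from the merged cluster (i + j) to k: plain mean."""
    return (d_ik + d_jk) / 2.0


def upgma_distance(d_ik, d_jk, size_i, size_j):
    """Distance from the merged cluster (i + j) to k: size-weighted mean."""
    return (size_i * d_ik + size_j * d_jk) / (size_i + size_j)


# Cluster (A, B) has 2 members and lies at distance 6 from cluster k;
# C has 1 member and lies at distance 9 from k.
print(wpgma_distance(6.0, 9.0))        # 7.5
print(upgma_distance(6.0, 9.0, 2, 1))  # 7.0
```

The two variants give the same result whenever the merged clusters have equal size.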

Pairwise distance methods
Tree inference: UPGMA
Advantages:
- Fast
- Allows incorporation of evolutionary models
Disadvantages:
- Assumes a molecular clock

Pairwise distance methods
Tree inference: neighbor joining
- Additive distances can be fitted to an unrooted tree such that the evolutionary distance between a pair of OTUs equals the sum of the lengths of the branches connecting them, rather than being an average as in the case of cluster analysis.
- Tree construction by minimum evolution (ME): the tree that minimizes the sum of the lengths of the branches is regarded as the best estimate of the phylogeny.
- Drawback of the ME method: in principle all different tree topologies have to be investigated in order to find the 'minimum' tree.
- The neighbour joining (NJ) method, developed by Saitou and Nei (1987), offers a heuristic approach to solve this problem.
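One step of neighbour joining can be sketched as follows, using the commonly used Q-criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of the distances from i to all other OTUs. This formulation, the dictionary-of-frozensets distance representation, the function name, and the example distances are assumptions of this sketch; the slides' own worked example was not preserved. Only the pair selection and the two new branch lengths are shown, not the full iteration.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Pick the pair to join and the branch lengths to their new node.

    `dist` maps frozenset({i, j}) to the distance between OTUs i and j.
    Requires at least 3 OTUs.
    """
    n = len(names)
    # r[i]: sum of distances from i to every other OTU.
    r = {i: sum(dist[frozenset((i, k))] for k in names if k != i)
         for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - r[i] - r[j]

    i, j = min(combinations(names, 2), key=q)
    d_ij = dist[frozenset((i, j))]
    length_i = d_ij / 2.0 + (r[i] - r[j]) / (2.0 * (n - 2))
    return i, j, length_i, d_ij - length_i


names = ["a", "b", "c", "d", "e"]
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(pair): float(d) for pair, d in pairs.items()}
print(nj_pick_pair(names, dist))  # ('a', 'b', 2.0, 3.0)
```

Note that d and e are the closest pair (distance 3), yet a and b are joined first: the criterion also accounts for how far each OTU is from all the others, which is why no molecular clock is required.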

Tree inference: neighbor joining
Advantages:
- Fast
- Allows incorporation of evolutionary models
- No assumption of a molecular clock