Lecture 24 Inferring molecular phylogeny Distance methods


Bioinformatics, Lecture 24: Inferring molecular phylogeny. Topics: distance methods; discrete methods; comparison of different tree-building methods; estimating sampling error (the bootstrap).

Inferring molecular phylogeny. The objective of molecular phylogenetics is to convert sequence information (DNA, RNA, proteins) into an evolutionary tree for those sequences. The ever-growing number of tree-building methods can very roughly be split along two lines: distance methods versus discrete-character methods, and clustering methods versus search methods. These methods are considered in this lecture.

Distance methods. The simplest distance method, UPGMA (Unweighted Pair Group Method with Arithmetic Mean), is based on the assumption of constant substitution rates and approximately equal lengths of neighbouring branches. As a first step, a distance matrix must be built that holds the distances between all possible pairs of the sequences used for the phylogenetic reconstruction. UPGMA starts by calculating branch lengths.

Distance methods: an idealised case
A. Sequences
Sequence A  ACGCGTTGGGCGATGGCAAC
Sequence B  ACGCGTTGGGCGACGGTAAT
Sequence C  ACGCATTGAATGATGATAAT
Sequence D  ACACATTGAGTGATAATAAT
B. Distances between sequences (number of differing sites)
nAB = 3, nAC = 7, nAD = 8, nBC = 6, nBD = 7, nCD = 3
C. Distance table
OTU   A   B   C   D
A     -   3   7   8
B         -   6   7
C             -   3
D                 -
D. The assumed unrooted tree (figure: OTUs A, B, C and D, with branch lengths 1, 2 and 4 marked)
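The counts in part B can be reproduced by comparing the aligned sequences site by site. The following is a minimal sketch, not part of the original slides; the helper name `count_differences` is hypothetical, and the fourth sequence is assumed to be D, which is what the listed distances nAD, nBD and nCD imply.

```python
from itertools import combinations

# Sequences from the slide. The fourth one is labelled D here (assumption,
# consistent with the distances nAD, nBD and nCD given on the slide).
sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def count_differences(s, t):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


# Pairwise difference counts, keyed by OTU pair, e.g. ("A", "B").
distances = {
    (x, y): count_differences(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}

for (x, y), n in distances.items():
    print(f"n{x}{y} = {n}")
```

Raw difference counts are used here because that is what the slide tabulates; a real analysis would usually apply a substitution-model correction to such counts.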

Diagram illustrating the stepwise construction of a phylogenetic tree for four OTUs (A, B, C, D) according to the unweighted pair group method with arithmetic mean (UPGMA). The resulting tree is ultrametric. Methods used: distance and clustering.
Initial distance table (figure): entries 7, 8, 11, 13 and 14. The values in the later tables are calculated from the data presented in this initial table.
Step 1: A and D are joined first; the node is placed at dAD/2 = 3.5.
Step 2: d(AD)B = (dAB + dDB)/2 = 13.5 and d(AD)C = (dAC + dDC)/2 = 9.5; C joins (AD) at d(AD)C/2 = 4.75.
Step 3: d(ADC)B = (dAB + dDB + dCB)/3 = 12.67; B joins (ADC) at d(ADC)B/2 = 6.33.
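The clustering steps on this slide can be sketched in a few lines of Python. This is an illustrative sketch rather than the lecturer's implementation: the function name `upgma` is hypothetical, and the assignment of the values 7, 8, 11, 13 and 14 to individual OTU pairs is an assumption chosen so that the averages match the slide (dAD = 7, dAC + dDC = 19, dAB + dDB = 27, dCB = 11). The node heights do not depend on which of the compatible assignments is used.

```python
from itertools import combinations


def upgma(names, leaf_dist):
    """Cluster OTUs by UPGMA.

    leaf_dist maps frozenset({x, y}) to the distance between leaves x and y.
    Returns the merges in order as (cluster_1, cluster_2, node_height).
    """
    clusters = {name: (name,) for name in names}  # label -> member leaves

    def mean_dist(p, q):
        # Arithmetic mean over all leaf pairs spanning the two clusters.
        pairs = [(a, b) for a in clusters[p] for b in clusters[q]]
        return sum(leaf_dist[frozenset(pair)] for pair in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest mean distance.
        p, q = min(combinations(clusters, 2), key=lambda pq: mean_dist(*pq))
        height = mean_dist(p, q) / 2  # node sits halfway between the clusters
        merges.append((p, q, height))
        clusters[f"({p},{q})"] = clusters.pop(p) + clusters.pop(q)
    return merges


# Assumed cell assignment (see the note above); only the sums are from the slide.
example = {
    frozenset("AD"): 7, frozenset("AC"): 8, frozenset("CD"): 11,
    frozenset("BC"): 11, frozenset("BD"): 13, frozenset("AB"): 14,
}

merges = upgma("ABCD", example)
for p, q, height in merges:
    print(f"join {p} and {q} at height {height:.2f}")
```

Averaging over the original leaf-to-leaf distances is what keeps the method "unweighted": every OTU in a cluster contributes equally, as in (dAB + dDB + dCB)/3.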

Neighbour-joining tree construction. Methods: distance and clustering.
OTU   H       C      G      O
C     1.45*
G     1.51    1.57
O     2.98    2.94   3.04
R     7.51    7.55   7.39   7.10
H: human; C: chimpanzee; G: gorilla; O: orangutan; R: rhesus monkey.
* Number of nucleotide substitutions per 100 sites between OTUs.

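Neighbours-relation scores obtained from the distance matrix (see previous slide). Calculation of the total scores: for each set of four OTUs, the three possible ways of splitting them into two pairs are compared, and the two pairs in the split with the smallest sum of distances are each assigned a score of 1; the other pairs score 0. For example, for H, C, G and O the sum dHG + dCO is the minimum, so (HG) and (CO) each score 1. Adding the scores over all sets of four gives the total scores shown in the table. (OR) has the highest total score.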
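The scoring rule can be checked with a short script. This is a sketch under stated assumptions: the function name `neighbours_relation_scores` is hypothetical, and the pair-by-pair distances are read off the flattened table on the previous slide (the reading is consistent with the sums quoted on the following slide).

```python
from itertools import combinations

# Distances in substitutions per 100 sites, as read from the previous slide.
d = {
    frozenset("HC"): 1.45, frozenset("HG"): 1.51, frozenset("CG"): 1.57,
    frozenset("HO"): 2.98, frozenset("CO"): 2.94, frozenset("GO"): 3.04,
    frozenset("HR"): 7.51, frozenset("CR"): 7.55, frozenset("GR"): 7.39,
    frozenset("OR"): 7.10,
}


def neighbours_relation_scores(otus, dist):
    """Total neighbours-relation score for every pair of OTUs."""
    scores = {frozenset(pair): 0 for pair in combinations(otus, 2)}
    for a, b, c, e in combinations(otus, 4):
        # The three ways of splitting four OTUs into two pairs.
        splits = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        best = min(
            splits,
            key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])],
        )
        # Both pairs of the minimum-sum split score 1 for this quartet.
        for pair in best:
            scores[frozenset(pair)] += 1
    return scores


scores = neighbours_relation_scores("HCGOR", d)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(pair)), score)
```

With five OTUs there are five quartets, so ten points are distributed in total.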

Building the neighbour-joining (NJ) tree. Treating (OR), which has the highest total score, as a single OTU, the following table can be calculated.
OTU    H      C      G
C      1.45
G      1.51   1.57
(OR)   5.25   5.25   5.22
As only four OTUs are left, it is easy to see that dHC + dG(OR) = 6.67 < dHG + dC(OR) = 6.76 < dH(OR) + dCG = 6.82. Therefore, H and C are chosen as one pair of neighbours and G and (OR) as the other.
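The reduced table and the three sums can be recomputed as follows. This is a sketch, not the lecturer's code: the helper names `dist` and `dist_to_or` are hypothetical, and the distance from an OTU to the composite (OR) is assumed to be the mean of its distances to O and to R, which reproduces the slide's values after rounding to two decimals.

```python
# Distances in substitutions per 100 sites, as read from the earlier table.
d = {
    frozenset("HC"): 1.45, frozenset("HG"): 1.51, frozenset("CG"): 1.57,
    frozenset("HO"): 2.98, frozenset("CO"): 2.94, frozenset("GO"): 3.04,
    frozenset("HR"): 7.51, frozenset("CR"): 7.55, frozenset("GR"): 7.39,
    frozenset("OR"): 7.10,
}


def dist(x, y):
    """Distance between two single OTUs."""
    return d[frozenset((x, y))]


def dist_to_or(x):
    """Distance from OTU x to the composite OTU (OR): mean of d(x,O), d(x,R)."""
    return (dist(x, "O") + dist(x, "R")) / 2


# Sum of the two within-pair distances for each of the three possible splits.
split_sums = {
    "HC | G(OR)": dist("H", "C") + dist_to_or("G"),
    "HG | C(OR)": dist("H", "G") + dist_to_or("C"),
    "H(OR) | CG": dist_to_or("H") + dist("C", "G"),
}

best = min(split_sums, key=split_sums.get)
for split, total in split_sums.items():
    print(f"{split}: {total:.3f}")
print("chosen neighbours:", best)
```

The unrounded sums are 6.665, 6.755 and 6.815, so the ordering on the slide does not depend on how the intermediate values are rounded.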

Maximum parsimony. Methods: discrete characters and search/optimisation. Table: informative sites (*) in four compared sequences, used for phylogenetic reconstruction (rows: sequences; columns: sites 1 to 9; informative sites are marked with *).

Three possible unrooted trees (I, II and III) for four DNA sequences (1, 2, 3, 4) that have been used to choose the most parsimonious tree.
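For four sequences, a site is informative only if it shows two different nucleotides, each present in exactly two sequences; such a site favours one of the three unrooted trees, and the tree favoured by the most informative sites is the most parsimonious. The sketch below illustrates the counting. Because the slide's own table did not survive, it reuses sequences A to D from the idealised distance example as stand-in data (an assumption, with the fourth sequence taken to be D); the function name `informative_split` is hypothetical.

```python
from collections import Counter

# Stand-in data: the four sequences from the idealised distance example.
sequences = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}


def informative_split(column):
    """Return the 2+2 grouping a site supports (e.g. "AB|CD"), else None.

    column maps each of four sequence names to its nucleotide at one site.
    """
    groups = {}
    for name, base in column.items():
        groups.setdefault(base, []).append(name)
    # Informative for four sequences: two states, each seen exactly twice.
    if sorted(len(g) for g in groups.values()) != [2, 2]:
        return None
    return "|".join(sorted("".join(sorted(g)) for g in groups.values()))


support = Counter()      # number of informative sites favouring each tree
informative_sites = []   # 1-based positions of the informative sites
length = len(next(iter(sequences.values())))
for i in range(length):
    column = {name: seq[i] for name, seq in sequences.items()}
    split = informative_split(column)
    if split is not None:
        informative_sites.append(i + 1)
        support[split] += 1

print("informative sites:", informative_sites)
print("support per tree:", dict(support))
```

Sites where only one sequence differs change every tree's length by the same amount, which is why they are skipped when choosing between the three trees.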

Comparison of different tree-building methods
Efficiency: how fast is the method?
Power: how much data does the method need to produce a reasonable result?
Consistency: will it converge on the right answer given enough data?
Robustness: will minor violations of the method’s assumptions result in poor estimates of phylogeny?
Falsifiability: will the method tell us when its assumptions are violated, so that we can avoid using it?

Performance of UPGMA and parsimony methods. The success rate is the percentage of times that the correct tree was recovered in that region of the parameter space. The white area at the top left of both diagrams is where neither method performs well.

MEGA 3

MEGA 3: Sequence Data Explorer (screenshot labels: variable sites, parsimonious sites, sequences continue)

MEGA 3: phylogenetic trees. Neighbor-joining (NJ), Minimum evolution (ME), Maximum parsimony (MP), UPGMA

Bootstrapping: NJ, ME, MP, UPGMA