Dan Graur: Methods of Tree Reconstruction

Presentation transcript:

1 Dan Graur Methods of Tree Reconstruction

2

3

4

5

6 Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state (based on character states) 3. maximum likelihood (based on both character states and distances)

7 DISTANCE-MATRIX METHODS In the distance matrix methods, evolutionary distances (usually the number of nucleotide substitutions or amino-acid replacements between two taxonomic units) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values.

8 Multiple Alignment

9 Compute pairwise distances by correcting for multiple hits at a single site: the observed number of differences is converted into an estimated number of changes (e.g., number of nucleotide substitutions, number of amino acid replacements).
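As an illustration of such a correction, here is a minimal sketch that assumes the Jukes and Cantor one-parameter model, under which the estimated number of substitutions per site is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The function names and the two short sequences are hypothetical and are not taken from the slides.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def jukes_cantor_distance(seq_a, seq_b):
    """Estimated substitutions per site, corrected for multiple hits (JC model)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The logarithm is undefined once p reaches the saturation level of 3/4.
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences that differ at 2 of 10 sites.
seq_x = "ACGTACGTAC"
seq_y = "ACGTTCGTAA"
observed = p_distance(seq_x, seq_y)
corrected = jukes_cantor_distance(seq_x, seq_y)
print(observed, corrected)
```

The corrected distance is always at least as large as the observed proportion of differences, because some sites will have changed more than once.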

10 Distance Matrix* (*Units: numbers of nucleotide substitutions per 1,000 nucleotide sites)

11 Distance Methods: UPGMA, Neighbors-relation, Neighbor joining

12 UPGMA Unweighted pair-group method with arithmetic means

13 UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of decreasing similarity, and the tree is built in a stepwise manner.
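The sequential clustering can be sketched as follows. This is a minimal illustration, not the implementation used to produce the slides: the function name, the nested-tuple tree format, and the distance values are all assumptions made for the example. At each step the two closest OTUs are joined into a composite OTU, and the distance from the composite to every other OTU is the size-weighted arithmetic mean of the distances from its members.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a rooted tree by UPGMA.

    names: list of OTU labels (strings).
    dist:  dict mapping a pair (a, b) to the distance between OTUs a and b.
    Returns nested tuples (left, right, height); height is the node's depth
    below the tips, i.e. half the distance between the joined clusters.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for name in names}      # number of simple OTUs per cluster
    tree = {name: name for name in names}   # cluster id -> subtree
    while len(tree) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(sorted(tree), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        height = d[frozenset((a, b))] / 2
        # Distance from the composite OTU to every remaining cluster.
        for c in tree:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        tree[merged] = (tree[a], tree[b], height)
        size[merged] = size[a] + size[b]
        del tree[a], tree[b]
    return next(iter(tree.values()))

# Hypothetical ultrametric distances among four OTUs.
example = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
           ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
result = upgma(["A", "B", "C", "D"], example)
print(result)
```

With these distances, A and B are joined first, then C is added to the composite OTU (AB), and D joins last.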

14 simple OTUs

15 composite OTU

16

17

18 UPGMA yields the correct answer only if the distances are ultrametric! Q: What happens if the distances are only additive? Q: What happens if the distances are not even additive?
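One way to check whether a distance matrix is ultrametric is the three-point condition: for every triplet of OTUs, the two largest of the three pairwise distances must be equal. The sketch below is an assumption-laden illustration (hypothetical function name, tolerance, and data), not part of the original slides.

```python
from itertools import combinations

def is_ultrametric(names, dist, tol=1e-9):
    """Return True if every triplet satisfies the three-point condition."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    for a, b, c in combinations(names, 3):
        values = sorted([d[frozenset((a, b))],
                         d[frozenset((a, c))],
                         d[frozenset((b, c))]])
        # The two largest distances must coincide.
        if abs(values[2] - values[1]) > tol:
            return False
    return True

# Hypothetical data: the first matrix is clock-like, the second is not.
clocklike = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6}
unequal_rates = {("A", "B"): 4, ("A", "C"): 7, ("B", "C"): 9}
print(is_ultrametric(["A", "B", "C"], clocklike))
print(is_ultrametric(["A", "B", "C"], unequal_rates))
```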

19 Neighborliness methods The neighbors-relation method (Sattath & Tversky) The neighbor-joining method (Saitou & Nei)

20 Neighbors: In an unrooted bifurcating tree, two OTUs are said to be neighbors if they are connected through a single internal node.

21 If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.

22 Four-Point Condition (OTUs A and B are neighbors, as are C and D): $d_{AB} + d_{CD} < d_{AC} + d_{BD} = d_{AD} + d_{BC}$
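The four-point condition can be checked directly from the six pairwise distances among four OTUs. In this minimal sketch (the function name, the tolerance, and the branch lengths are hypothetical), the pairing with the smallest sum identifies the neighbors, and the distances are additive if the two remaining sums are equal.

```python
def four_point(dist, a, b, c, d, tol=1e-9):
    """Return (neighbor pairing, whether the two larger sums are equal)."""
    lookup = {frozenset(pair): value for pair, value in dist.items()}

    def dd(x, y):
        return lookup[frozenset((x, y))]

    sums = {
        ((a, b), (c, d)): dd(a, b) + dd(c, d),
        ((a, c), (b, d)): dd(a, c) + dd(b, d),
        ((a, d), (b, c)): dd(a, d) + dd(b, c),
    }
    ordered = sorted(sums.items(), key=lambda item: item[1])
    neighbors = ordered[0][0]
    additive = abs(ordered[1][1] - ordered[2][1]) <= tol
    return neighbors, additive

# Hypothetical additive tree: terminal branches A=1, B=2, C=3, D=4,
# internal branch = 5, with A and B on one side and C and D on the other.
distances = {("A", "B"): 3, ("C", "D"): 7, ("A", "C"): 9,
             ("B", "D"): 11, ("A", "D"): 10, ("B", "C"): 10}
pairing, additive = four_point(distances, "A", "B", "C", "D")
print(pairing, additive)
```

Here the three sums are 10, 20, and 20, so (A, B) and (C, D) are recovered as neighbors and the distances are additive.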

23 The Neighbor Joining Method

24 In distance-matrix methods, it is assumed that similarity reflects kinship.

25

26 From Similarity to Relationship
Similarities among OTUs can be due to:
Ancestry:
– Shared ancestral characters (symplesiomorphies)
– Shared derived characters (synapomorphies)
Homoplasy:
– Convergent events
– Parallel events
– Reversals

27 Parsimony Methods: Willi Hennig

28 William of Occam (ca ) English philosopher & Franciscan monk William of Occam was “solemnly” excommunicated by Pope John XXII. [Entities must not be multiplied beyond necessity]

29 MAXIMUM PARSIMONY METHODS Maximum parsimony involves the identification of a topology that requires the smallest number of evolutionary changes to explain the observed differences among the OTUs under study. In maximum parsimony methods, we use discrete character states, and the shortest pathway leading to these character states is chosen as the “best” or maximum parsimony tree. Often two or more trees with the same minimum number of changes are found, so that no unique tree can be inferred. Such trees are said to be equally parsimonious.

30 invariant

31 variant

32 uninformative

33 informative

34

35

36

37

38 In the case of four OTUs, an informative site can only favor one of the three possible alternative trees. Thus, the tree supported by the largest number of informative sites is the most parsimonious tree.

39 Inferring the maximum parsimony tree:
1. Identify all the informative sites.
2. For each possible tree, calculate the minimum number of substitutions at each informative site.
3. Sum up the number of changes over all the informative sites for each possible tree.
4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.

40 Maximum parsimony (Practice): Data 1. TGCA 2. TACC 3. AGGT 4. AAGT Step 1. Identify all the informative sites (positions 1, 2, and 3, marked ***; position 4 is uninformative).

41 Maximum parsimony (Practice): Data 1.TGC 2.TAC 3.AGG 4.AAG Step 2. For each possible tree, calculate the minimum number of substitutions at each informative site.

42 Maximum parsimony (Practice): Data 1.TGC 2.TAC 3.AGG 4.AAG Step 3. Sum up the number of changes over all the informative sites for each possible tree

43 Maximum parsimony (Practice): Data 1.TGC 2.TAC 3.AGG 4.AAG Step 4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree
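The four practice steps can be traced in code using the four sequences from the slides. This is a sketch under one stated assumption: with four OTUs, an informative site always has two states that each occur twice, so it costs one substitution on the tree that groups the matching OTUs and two on each of the other trees. The function and variable names are hypothetical.

```python
from collections import Counter

sequences = {1: "TGCA", 2: "TACC", 3: "AGGT", 4: "AAGT"}

# The three possible unrooted trees for four OTUs; each is described by
# the pair of OTUs on one side of its internal branch.
trees = {
    "((1,2),(3,4))": (1, 2),
    "((1,3),(2,4))": (1, 3),
    "((1,4),(2,3))": (1, 4),
}

def is_informative(column):
    """At least two states, each present in at least two OTUs."""
    counts = Counter(column.values())
    return sum(1 for n in counts.values() if n >= 2) >= 2

def changes(column, pair):
    """Minimum substitutions at an informative four-OTU site on one tree."""
    return 1 if column[pair[0]] == column[pair[1]] else 2

n_sites = len(sequences[1])
columns = [{otu: seq[i] for otu, seq in sequences.items()} for i in range(n_sites)]

# Step 1: identify the informative sites (0-based positions).
informative = [i for i, col in enumerate(columns) if is_informative(col)]

# Steps 2 and 3: count changes per site and sum them for each tree.
lengths = {name: sum(changes(columns[i], pair) for i in informative)
           for name, pair in trees.items()}

# Step 4: choose the tree with the smallest number of changes.
best_tree = min(lengths, key=lengths.get)
print(informative, lengths, best_tree)
```

For these data the tree grouping OTUs 1 and 2 requires 4 changes over the informative sites, against 5 and 6 for the two alternatives.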

44 Problem (exaggerated)

45 Fitch’s (1971) method for inferring nucleotides at internal nodes The set at an internal node is the intersection (∩) of the two sets at its immediate descendant nodes if the intersection is not empty. The set at an internal node is the union (∪) of the two sets at its immediate descendant nodes if the intersection is empty. When a union is required to form a nodal set, a nucleotide substitution at this position must be assumed to have occurred.
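The intersection and union rules translate into a short recursive function. The sketch below covers a single site on a rooted bifurcating tree written as nested tuples; the tree, the OTU names, and the nucleotides are hypothetical and do not reproduce the figure on the slide.

```python
def fitch(tree, states):
    """Return (state set at the root of `tree`, number of substitutions).

    tree:   an OTU name, or a tuple (left_subtree, right_subtree).
    states: dict mapping each OTU name to its nucleotide at this site.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no substitution needs to be assumed here.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical five-OTU tree and nucleotides at one site.
example_tree = ((("a", "b"), "c"), ("d", "e"))
example_states = {"a": "C", "b": "T", "c": "G", "d": "T", "e": "A"}
root_set, substitutions = fitch(example_tree, example_states)
print(root_set, substitutions)
```

In this example three unions are needed below the root, so three substitutions are inferred, and the root set is the intersection {T}.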

46 Fitch’s (1971) method for inferring nucleotides at internal nodes 4 substitutions 3 substitutions

47 Testing properties of ancestral proteins The ability to infer in silico the sequence of ancestral proteins, in conjunction with some astounding developments in synthetic biology, allows us to “resurrect” putative ancestral proteins in the laboratory and test their properties. These properties, in turn, can be used to test hypotheses concerning the physical environment which the ancestral organism inhabited (its paleoenvironment).

48 Testing properties of ancestral proteins Gaucher et al. (2003) used EF-Tu (elongation factor thermo-unstable) gene sequences from completely sequenced mesophilic eubacteria to reconstruct candidate ancestral sequences at nodes throughout the bacterial tree. These inferred ancestral proteins were then synthesized in the laboratory, and their activities and thermal stabilities were measured and compared to those of proteins from extant organisms. [Figure: thermostability curves] The temperature optimum of the inferred ancestral protein was 55°C, suggesting that the ancestor of extant mesophiles was a thermophile.

49 Ancestral reconstruction is not possible with morphological data.

50 The impossibility of exhaustively searching for the maximum-parsimony tree when the number of OTUs is large
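To make the scale of the problem concrete, the number of possible trees can be computed from the standard counting formulas, which are not stated on the slide: there are (2n - 5)!! unrooted and (2n - 3)!! rooted bifurcating trees for n OTUs. The function names below are hypothetical.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labeled OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= k
    return count

def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 labeled OTUs."""
    # Rooting a tree is equivalent to adding one extra OTU to an unrooted tree.
    return num_unrooted_trees(n + 1)

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Four OTUs give 3 unrooted trees, but ten OTUs already give more than two million, which is why exhaustive searches stop being feasible quickly.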

51 Exhaustive = Examine all trees, get the best tree (guaranteed).
Branch-and-Bound = Examine some trees, get the best tree (guaranteed).
Heuristic = Examine some trees, get a tree that may or may not be the best tree.

52 Exhaustive

53 Branch-and-Bound Rationale: The length of a tree with n + 1 OTUs can either be equal to or larger than the length of a tree with n OTUs. Reminder: The total number of substitutions in a tree = tree length.

54 Branch-and-Bound: Obtain a tree by a fast method (e.g., the neighbor-joining method). Compute the number of substitutions (L) for this tree. Use L as an upper bound. Rationale: the maximum parsimony tree must be either equal in length to L or shorter.
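A minimal sketch of the search follows, under several assumptions that are not in the slides: trees are nested tuples with the first OTU acting as an outgroup, tree length is computed with Fitch's method, and the starting tree is built by simply adding OTUs in order (a stand-in for the neighbor-joining tree). OTUs are added one at a time to every branch, and a partial tree is abandoned as soon as its length exceeds the current bound.

```python
def fitch_length(tree, seqs):
    """Total number of substitutions on a nested-tuple tree (Fitch's method)."""
    def visit(node, site):
        if not isinstance(node, tuple):
            return {seqs[node][site]}, 0
        ls, lc = visit(node[0], site)
        rs, rc = visit(node[1], site)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, site)[1] for site in range(n_sites))

def insertions(subtree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `subtree`."""
    yield (subtree, leaf)
    if isinstance(subtree, tuple):
        left, right = subtree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(seqs, upper_bound):
    """Return (minimum tree length, list of trees with that length)."""
    taxa = sorted(seqs)
    best = {"length": upper_bound, "trees": []}

    def extend(subtree, next_index):
        tree = (taxa[0], subtree)
        length = fitch_length(tree, seqs)
        if length > best["length"]:
            return                      # adding OTUs can never shorten a tree
        if next_index == len(taxa):
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(tree)
            return
        for bigger in insertions(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend(taxa[1], 2)
    return best["length"], best["trees"]

sequences = {1: "TGCA", 2: "TACC", 3: "AGGT", 4: "AAGT"}
# Initial tree by sequential addition; its length L serves as the upper bound.
start = (1, ((2, 3), 4))
bound = fitch_length(start, sequences)
best_length, best_trees = branch_and_bound(sequences, bound)
print(bound, best_length, best_trees)
```

With only four OTUs nothing is pruned, so the example only demonstrates the bookkeeping. The length of 6 counts all four sites, including two substitutions at the uninformative fourth site, which add the same amount to every tree.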

55 Branch-and-Bound: The magnitude of the search will depend on the data (i.e., luck).

56 Heuristic

57

58 Likelihood Example: Coin tossing. Data: 10 tosses (6 heads + 4 tails). Hypothesis: binomial distribution.
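For the coin-tossing data, the likelihood of a given probability of heads p under the binomial hypothesis is L(p) = C(10, 6) p^6 (1 - p)^4. The sketch below evaluates it on a grid of candidate values; the grid search and the function name are illustrative choices, not part of the slide.

```python
from math import comb

def binomial_likelihood(p, heads=6, tosses=10):
    """Probability of observing `heads` heads in `tosses` tosses given p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Evaluate the likelihood for p = 0.00, 0.01, ..., 1.00 and keep the best value.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binomial_likelihood)
print(best_p, binomial_likelihood(best_p), binomial_likelihood(0.5))
```

The likelihood is highest at p = 0.6, the observed proportion of heads, where it is about 0.25. A fair coin (p = 0.5) gives a lower value of about 0.21.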

59 LIKELIHOOD IN MOLECULAR PHYLOGENETICS The data are the aligned sequences. The model is the probability of change from one character state to another (e.g., the Jukes & Cantor one-parameter model). The parameters to be estimated are the topology and the branch lengths.

60

61 Bayesian Phylogenetics Based on “Bayes’ Theorem”, Thomas Bayes (1701–1761). A = a proposition, a hypothesis. B = the evidence. P(A) = the prior, the initial degree of belief in A. P(A|B) = the posterior, the new degree of belief in A given B (the evidence). P(B|A)/P(B) = the support B provides for A.
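These quantities are tied together by Bayes' theorem, P(A|B) = P(A) × P(B|A) / P(B). As a rough sketch of how this applies to trees, the code below treats three candidate topologies as the hypotheses and computes their posterior probabilities. The priors and likelihood values are invented for illustration, and P(B) is obtained by summing over the three hypotheses.

```python
def posterior(priors, likelihoods):
    """Posterior probability of each hypothesis given priors and likelihoods."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())          # P(B), summed over all hypotheses
    return {h: joint[h] / evidence for h in joint}

# Hypothetical values: equal prior belief in three topologies, and the
# probability of the data (the alignment) under each topology.
priors = {"tree I": 1 / 3, "tree II": 1 / 3, "tree III": 1 / 3}
likelihoods = {"tree I": 0.006, "tree II": 0.003, "tree III": 0.001}
post = posterior(priors, likelihoods)
print(post)
```

With equal priors the posteriors are proportional to the likelihoods, giving 0.6, 0.3, and 0.1 for the three trees.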