Dan Graur: Molecular Phylogenetics

Presentation transcript:

1 Dan Graur Molecular Phylogenetics

2

3

4

5

6 Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state (based on character states) 3. maximum likelihood (based on both character states and distances)

7 DISTANCE-MATRIX METHODS In the distance matrix methods, evolutionary distances (usually the number of nucleotide substitutions or amino-acid replacements between two taxonomic units) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values.
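
As a minimal sketch of the first step described above, the following computes a pairwise distance for every pair of taxa in an alignment. The function names and the toy alignment are hypothetical. The choice of the Jukes-Cantor correction is an assumption; the slides do not say which distance measure was used.

```python
import math
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def jukes_cantor_distance(p):
    """Estimated substitutions per site under the Jukes-Cantor model."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair of taxa."""
    matrix = {}
    for a, b in combinations(alignment, 2):
        d = jukes_cantor_distance(p_distance(alignment[a], alignment[b]))
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix

# Hypothetical toy alignment, not data from the slides.
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
matrix = distance_matrix(alignment)
```

Multiplying each value by 1,000 would express the distances per 1,000 nucleotide sites. Gaps and ambiguous characters are not handled in this sketch.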

8 Multiple Alignment

9 Distance Matrix* (*Units: number of nucleotide substitutions per 1,000 nucleotide sites)

10 Distance Methods: UPGMA, neighbors-relation, neighbor-joining

11 UPGMA Unweighted pair-group method with arithmetic means

12 UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of decreasing similarity, and the tree is built in a stepwise manner.
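
The sequential clustering can be sketched as follows. This is an illustrative implementation, not the one behind the slides: the function name, the nested-tuple tree format, and the example matrix are all assumptions. At each step the two closest OTUs are merged into a composite OTU, and its distance to every other OTU is the arithmetic mean over all pairs of simple OTUs.

```python
def upgma(names, matrix):
    """Cluster OTUs by UPGMA.

    names  : list of OTU labels
    matrix : symmetric list of lists of pairwise distances
    Returns (tree, root_height), with the tree as nested tuples.
    """
    # Each cluster id maps to (subtree, number of simple OTUs inside it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): matrix[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        # Join the pair of clusters separated by the smallest distance.
        i, j = min(d, key=d.get)
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        height = d[(i, j)] / 2.0
        # Size-weighted mean = arithmetic mean over all simple-OTU pairs.
        new_d = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Discard distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height

# Hypothetical ultrametric distances among four OTUs.
names = ["A", "B", "C", "D"]
matrix = [[0, 2, 4, 6],
          [2, 0, 4, 6],
          [4, 4, 0, 6],
          [6, 6, 6, 0]]
tree, root_height = upgma(names, matrix)
```

Ties between equally close pairs are broken by taking the first pair found, which is a simplification.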

13 simple OTUs

14 composite OTU

15

16

17 UPGMA only works if the distances are strictly ultrametric.
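
One way to test this assumption before running UPGMA is the three-point condition: distances are ultrametric when, for every triplet of OTUs, the two largest of the three pairwise distances are equal. The sketch below assumes a symmetric list-of-lists matrix. The function name and tolerance are hypothetical.

```python
from itertools import combinations

def is_ultrametric(matrix, tol=1e-9):
    """Return True if every triplet satisfies the three-point condition."""
    n = len(matrix)
    for i, j, k in combinations(range(n), 3):
        smallest, middle, largest = sorted(
            (matrix[i][j], matrix[i][k], matrix[j][k])
        )
        # The two largest distances in the triplet must coincide.
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical matrices for illustration.
clocklike = [[0, 2, 4, 6],
             [2, 0, 4, 6],
             [4, 4, 0, 6],
             [6, 6, 6, 0]]
not_clocklike = [[0, 2, 5, 6],
                 [2, 0, 4, 6],
                 [5, 4, 0, 6],
                 [6, 6, 6, 0]]
```

Real data rarely meet the condition exactly, so a non-zero tolerance is usually needed.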

18 Neighborliness methods: the neighbors-relation method (Sattath & Tversky) and the neighbor-joining method (Saitou & Nei).

19 Neighbors: In an unrooted bifurcating tree, two OTUs are said to be neighbors if they are connected through a single internal node.

20 If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.

21 Four-Point Condition: $d_{AB} + d_{CD} < d_{AC} + d_{BD} = d_{AD} + d_{BC}$
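
For four OTUs in which A and B are neighbors (and so are C and D), the four-point condition says that the sum d(A,B) + d(C,D) is smaller than the other two pairwise sums, which equal each other. The sketch below uses that idea to pick the neighbor pairs from a distance table. The function names and example distances are hypothetical.

```python
def pair_distance(d, x, y):
    """Look up a distance stored under either (x, y) or (y, x)."""
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def infer_neighbors(d, otus):
    """Return the pairing of four OTUs with the smallest distance sum."""
    a, b, c, e = otus
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    sums = [pair_distance(d, *p) + pair_distance(d, *q) for p, q in pairings]
    best = min(range(3), key=sums.__getitem__)
    return pairings[best], sums

# Hypothetical additive distances: terminal branches of length 1,
# internal branch of length 2, true tree ((A,B),(C,D)).
d = {("A", "B"): 2, ("C", "D"): 2,
     ("A", "C"): 4, ("A", "D"): 4,
     ("B", "C"): 4, ("B", "D"): 4}
neighbors, sums = infer_neighbors(d, ["A", "B", "C", "D"])
```

With noisy distances the two larger sums are rarely exactly equal, so only the "smallest sum" part of the condition is used here.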

22

23

24 In distance-matrix methods, it is assumed that similarity corresponds to kinship.

25 From Similarity to Relationship Similarity = Relationship, only if genetic distances increase with divergence times (monotonic distances).

26 From Similarity to Relationship. Similarities among OTUs can be due to: Ancestry: shared ancestral characters (plesiomorphies) or shared derived characters (synapomorphies). Homoplasy: convergent events, parallel events, or reversals.

27

28 Parsimony Methods: Willi Hennig

29 Occam’s razor: “Pluralitas non est ponenda sine necessitate.” (Plurality should not be posited without necessity.) William of Occam or Ockham, English philosopher and Franciscan monk. Excommunicated by Pope John XXII. Officially rehabilitated by Pope Innocent VI in 1359.

30 MAXIMUM PARSIMONY METHODS Maximum parsimony involves the identification of a topology that requires the smallest number of evolutionary changes to explain the observed differences among the OTUs under study. In maximum parsimony methods, we use discrete character states, and the shortest pathway leading to these character states is chosen as the best or maximum parsimony tree. Often two or more trees with the same minimum number of changes are found, so that no unique tree can be inferred. Such trees are said to be equally parsimonious.

31 invariant

32 variant

33 uninformative

34 informative
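
The four site categories named on the preceding slides can be sketched in code. The sketch assumes the usual parsimony definition: a site is informative when at least two different character states each occur in at least two OTUs. The function name is hypothetical.

```python
from collections import Counter

def classify_site(column):
    """Classify one alignment column for parsimony analysis.

    column: string or list with one character state per OTU.
    Invariant sites show a single state. Variant sites are either
    informative or uninformative.
    """
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Count the states that are shared by two or more OTUs.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"
```

For example, the column `AAGG` is informative, whereas `AAAG` and `ACGT` are variant but uninformative.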

35

36

37

38

39 Inferring the maximum parsimony tree: 1. Identify all the informative sites. 2. For each possible tree, calculate the minimum number of substitutions at each informative site. 3. Sum up the number of changes over all the informative sites for each possible tree. 4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.

40 In the case of four OTUs, an informative site can only favor one of the three possible alternative trees. Thus, the tree supported by the largest number of informative sites is the most parsimonious tree.
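
For four OTUs, the rule above reduces to counting site patterns, as in this sketch. An informative site has the pattern xxyy, xyxy, or xyyx, and each pattern supports exactly one of the three unrooted trees. The function name, tree labels, and sequences are hypothetical.

```python
def four_otu_support(seqs):
    """Count informative sites supporting each tree for four aligned sequences."""
    support = {"((1,2),(3,4))": 0, "((1,3),(2,4))": 0, "((1,4),(2,3))": 0}
    for s1, s2, s3, s4 in zip(*seqs):
        if s1 == s2 and s3 == s4 and s1 != s3:
            support["((1,2),(3,4))"] += 1   # pattern xxyy
        elif s1 == s3 and s2 == s4 and s1 != s2:
            support["((1,3),(2,4))"] += 1   # pattern xyxy
        elif s1 == s4 and s2 == s3 and s1 != s2:
            support["((1,4),(2,3))"] += 1   # pattern xyyx
    return support

# Hypothetical aligned sequences for OTUs 1 to 4.
seqs = ["AAGCT", "AAGTT", "AGACT", "AGATT"]
support = four_otu_support(seqs)
best_tree = max(support, key=support.get)
```

All other columns (invariant or uninformative) are skipped because they require the same number of changes on every tree.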

41 With more than 4 OTUs, an informative site may favor more than one tree, and the maximum parsimony tree may not necessarily be the one supported by the largest number of informative sites.

42 The informative sites that support the internal branches in the inferred tree are deemed to be synapomorphies. All other informative sites are deemed to be homoplasies.

43

44 Parsimony is based solely on synapomorphies

45

46 Variants of Parsimony. Wagner-Fitch: Unordered. Character state changes are symmetric and can occur as often as necessary. Camin-Sokal: Complete irreversibility. Dollo: Partial irreversibility. Once a derived character is lost, it cannot be regained. Weighted: Some changes are more likely than others. Transversion: A type of weighted parsimony, in which transitions are ignored.

47 Fitch’s (1971) method for inferring nucleotides at internal nodes

48 Fitch’s (1971) method for inferring nucleotides at internal nodes. The set at an internal node is the intersection (∩) of the two sets at its immediate descendant nodes if the intersection is not empty. The set at an internal node is the union (∪) of the two sets at its immediate descendant nodes if the intersection is empty. When a union is required to form a nodal set, a nucleotide substitution at this position must be assumed to have occurred. Number of unions = minimum number of substitutions.
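
The intersection/union rule can be sketched as a short recursion over a tree written as nested tuples. The tree format, function name, and example states are assumptions made for illustration.

```python
def fitch(tree, states):
    """Apply Fitch's rule to one site.

    tree   : nested 2-tuples whose leaves are OTU labels (strings)
    states : dict mapping each OTU label to its nucleotide at this site
    Returns (nodal set, number of unions needed below this node).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra substitution is required.
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical five-OTU tree and character states at one site.
tree = ((("a", "b"), "c"), ("d", "e"))
states = {"a": "C", "b": "T", "c": "G", "d": "T", "e": "A"}
root_set, substitutions = fitch(tree, states)
```

Summing the returned count over all sites of an alignment gives the length of the tree.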

49 Fitch’s (1971) method for inferring nucleotides at internal nodes 4 substitutions 3 substitutions

50

51 total number of substitutions in a tree = tree length

52 Searching for the maximum-parsimony tree

53 Exhaustive = Examine all trees, get the best tree (guaranteed). Branch-and-Bound = Examine some trees, get the best tree (guaranteed). Heuristic = Examine some trees, get a tree that may or may not be the best tree.
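
To see why an exhaustive search quickly becomes impractical, it helps to count the trees it must examine. The slides do not give this formula. It is the standard count of unrooted bifurcating trees for n OTUs, 3 × 5 × ... × (2n − 5), and the function name is hypothetical.

```python
def count_unrooted_trees(n_otus):
    """Number of unrooted bifurcating trees for n_otus >= 3 labeled OTUs."""
    if n_otus < 3:
        raise ValueError("at least three OTUs are required")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count
```

Four OTUs give 3 trees, five give 15, and ten already give more than two million.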

54 Exhaustive Descendant trees of tree 2 Ascendant tree 2

55 Branch-and-Bound

56 Branch-and-Bound: Obtain a tree by a fast method (e.g., the neighbor-joining method). Compute its minimum number of substitutions (L). Use L as an upper bound. Rationale: (1) The maximum parsimony tree must be equal in length to L or shorter. (2) A descendant tree is equal in length to or longer than its ascendant tree.
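
The procedure can be sketched as a recursive search that builds trees by adding one OTU at a time and abandons any partial (ascendant) tree that is already longer than the current bound. Everything here is an illustrative assumption: the nested-tuple tree format, the function names, the toy alignment, and the arbitrary starting tree that stands in for a neighbor-joining tree.

```python
def fitch_length(tree, column):
    """Return (nodal set, minimum substitutions) for one site."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    lset, lcost = fitch_length(tree[0], column)
    rset, rcost = fitch_length(tree[1], column)
    common = lset & rset
    if common:
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1

def tree_length(tree, alignment):
    """Sum the Fitch substitution counts over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

def insertions(subtree, leaf):
    """Yield every tree made by attaching leaf to one branch of subtree."""
    yield (subtree, leaf)  # attach on the branch above this node
    if not isinstance(subtree, str):
        left, right = subtree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)

def branch_and_bound(alignment, upper_bound):
    """Return the shortest length found and all trees of that length."""
    names = list(alignment)
    best = {"length": upper_bound, "trees": []}

    def extend(rest, next_index):
        tree = (names[0], rest)
        length = tree_length(tree, alignment)
        if length > best["length"]:
            return  # descendants can only be as long or longer: prune
        if next_index == len(names):
            if length < best["length"]:
                best["length"], best["trees"] = length, [tree]
            else:
                best["trees"].append(tree)
            return
        for new_rest in insertions(rest, names[next_index]):
            extend(new_rest, next_index + 1)

    extend((names[1], names[2]), 3)
    return best

# Hypothetical toy alignment of five OTUs.
alignment = {"a": "ACT", "b": "ACT", "c": "GCT", "d": "GTT", "e": "GTT"}
# Arbitrary starting tree, standing in for a tree from a fast method.
start_tree = ("a", ((("b", "c"), "d"), "e"))
L = tree_length(start_tree, alignment)
result = branch_and_bound(alignment, L)
```

The bound is tightened whenever a shorter complete tree is found, so later partial trees are pruned earlier.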

57 Branch-and-Bound

58 Heuristic

59

60

61 Likelihood. Example: coin tossing. Data: outcome of 10 tosses (6 heads + 4 tails). Hypothesis: binomial distribution.
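
The coin-tossing example can be worked through numerically. The sketch evaluates the binomial likelihood of the observed data (6 heads in 10 tosses) for a range of values of p, the probability of heads, and keeps the value with the highest likelihood. The grid search and the names are assumptions chosen for simplicity.

```python
from math import comb

def binomial_likelihood(p, heads=6, tosses=10):
    """Probability of the observed number of heads given P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

# Evaluate the likelihood on a grid of candidate values for p.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=binomial_likelihood)
```

The maximum falls at p = 0.6, the observed proportion of heads, and the likelihood there is higher than under a fair coin (p = 0.5).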

62 LIKELIHOOD IN MOLECULAR PHYLOGENETICS The data are the aligned sequences. The model is the probability of change from one character state to another (e.g., the Jukes & Cantor one-parameter model). The parameters to be estimated are the topology and the branch lengths.

63

64 Background: Maximum Likelihood. How to calculate the ML score for a tree:
Site:  1 ... j ... N
Seq x: C...GGACGTTTA...C
Seq y: C...AGATCTCTA...C
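
As a hedged illustration of an ML score, the sketch below treats the simplest possible tree: two sequences joined by one branch under the Jukes-Cantor model. The score is the sum over sites of the log of each site's likelihood. Only the visible fragments of Seq x and Seq y are used. The function names, the equal base frequencies of 1/4, and the branch length d in expected substitutions per site are assumptions, not details from the slide.

```python
import math

def jc_probability(x, y, d):
    """Jukes-Cantor probability that state x becomes y over branch length d."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def log_likelihood(seq_x, seq_y, d):
    """Log-likelihood of two aligned sequences separated by branch length d."""
    total = 0.0
    for x, y in zip(seq_x, seq_y):
        # Site likelihood: base frequency (1/4) times the change probability.
        total += math.log(0.25 * jc_probability(x, y, d))
    return total

# Visible fragments of Seq x and Seq y; the elided parts are not used.
seq_x = "GGACGTTTA"
seq_y = "AGATCTCTA"

# Under this model the likelihood peaks at the Jukes-Cantor distance.
p = sum(a != b for a, b in zip(seq_x, seq_y)) / len(seq_x)
d_hat = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For a tree with more than two sequences, each site likelihood additionally sums over the unknown states at the internal nodes. That step is not shown here.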

65 Background: Maximum Likelihood Calculate likelihood for a single site j given tree : A B C R: root where