CAP5510 – Bioinformatics: Phylogeny. Tamer Kahveci, CISE Department, University of Florida.

Presentation transcript:

1 CAP5510 – Bioinformatics
Phylogeny
Tamer Kahveci
CISE Department, University of Florida

2 Goals
Understand phylogenetic trees
Learn
– distance matrix based methods
– maximum likelihood method
– character based methods

3 What is phylogeny?

4 Phylogeny
Shows the ancestral relationship between genes or organisms
Infers the relationship based on genotype rather than phenotype

5 Why Phylogeny?
Understand history of organisms
Understand how various functions evolved
Multiple sequence alignment
Gene function prediction

6 Phylogenetic Tree (1)
Node = taxonomical unit
– Leaf node = gene or organism
– Internal node = inferred ancestor
Bifurcating = two lineages
Multifurcating = more than two lineages
Branch = ancestral relationship

7 Phylogenetic Tree (2)
Rooted = a single node is the common ancestor of all
Unrooted = provides no information about the direction of evolution
[Figure: example tree of viruses of the family Reoviridae]

8 Phylogenetic Tree (3)
n = number of taxa (leaf nodes)
Find the number of rooted trees for n = 3.
Rooted => $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Unrooted => $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
[Table: N_R and N_U for increasing n; values not recoverable]
The three rooted trees for n = 3, in Newick format:
3 -> ((1, 2), 3)
2 -> ((1, 3), 2)
1 -> ((3, 2), 1)
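
The two counting formulas can be checked numerically. The following is a minimal Python sketch; the function names `num_rooted` and `num_unrooted` are hypothetical and do not come from the slides.

```python
from math import factorial

def num_rooted(n):
    """Number of rooted bifurcating trees on n >= 2 labeled leaves."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def num_unrooted(n):
    """Number of unrooted bifurcating trees on n >= 3 labeled leaves."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

# Print a small table of n, N_R, N_U.
for n in (3, 4, 5, 10):
    print(n, num_rooted(n), num_unrooted(n))
```

For n = 3 this gives 3 rooted trees and 1 unrooted tree, which agrees with the three Newick strings on the slide. The counts grow very quickly, which is why exhaustive tree search becomes impractical for more than a handful of taxa.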

9 Distance Matrix Methods UPGMA (Unweighted Pair Group Method with Arithmetic mean)

10 UPGMA (1)
Create a distance matrix between all pairs of taxa
Iteratively do the following until all taxa are merged
– Merge the pair (x, y) with the smallest distance d(x, y) and form xy
– Set distance d(z, xy) = (d(z, x) + d(z, y))/2 for all z

11 UPGMA (2)
Choose two clusters with minimum distance and combine them
      A    B    C    D    E
A     0    ?    ?    9    7
B          0    4    4   14
C               0    6   16
D                    0   13
E                         0
[Figure: leaves A, B, C, D, E before any merge]

12 UPGMA (3)
Update distance matrix
The branch from the new cluster BC to each merged node (B and C) is half of their original distance: 4/2 = 2
      A   BC    D    E
A     0   11    9    7
BC         0    5   15
D               0   13
E                    0
[Figure: B and C joined, each with branch length 2]

13 UPGMA (4)
      A   BC    D    E
A     0   11    9    7
BC         0    5   15
D               0   13
E                    0
[Figure: B and C joined, each with branch length 2]

14 UPGMA (5)
      A  BCD    E
A     0   10    7
BCD        0   14
E               0
[Figure: D joined to the cluster BC]

15 UPGMA (6)
      A  BCD    E
A     0   10    7
BCD        0   14
E               0
[Figure: D joined to the cluster BC]

16 UPGMA (7)
     AE  BCD
AE    0   12
BCD        0
[Figure: A and E joined; clusters AE and BCD remain]

17 UPGMA (8)
Produced tree: (((B, C), D), (A, E))
      A    B    C    D    E
A     0    ?    ?    9    7
B          0    4    4   14
C               0    6   16
D                    0   13
E                         0
Not additive: path lengths in the tree may not equal the actual distances (e.g., C and D)
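
The walk-through on slides 10 to 17 can be reproduced with a short Python sketch. It uses the update rule exactly as stated on slide 10, a simple average of the two merged clusters; textbook UPGMA weights that average by cluster size, so this is an illustration of the slides' procedure rather than a reference implementation. The function name `upgma_simple_average` is hypothetical, and the A–B and A–C distances (10 and 12) are assumed values: the transcript lost them, and only their average (11, slide 12) can be recovered.

```python
from itertools import combinations

def upgma_simple_average(dist):
    """Cluster with the rule d(z, xy) = (d(z, x) + d(z, y)) / 2.

    dist maps frozenset({x, y}) to the distance between taxa x and y.
    Returns the tree as nested tuples and the list of merge distances.
    """
    d = dict(dist)
    clusters = set().union(*dist)  # all taxon labels
    merge_distances = []
    while len(clusters) > 1:
        # Closest pair; sorting by str makes tie-breaking deterministic.
        x, y = min(combinations(sorted(clusters, key=str), 2),
                   key=lambda pair: d[frozenset(pair)])
        merge_distances.append(d[frozenset((x, y))])
        clusters -= {x, y}
        merged = (x, y)
        for z in clusters:
            d[frozenset((z, merged))] = (d[frozenset((z, x))] + d[frozenset((z, y))]) / 2
        clusters.add(merged)
    return clusters.pop(), merge_distances

# AB = 10 and AC = 12 are ASSUMED (not in the transcript); the rest is from the slides.
pairs = {"AB": 10, "AC": 12, "AD": 9, "AE": 7, "BC": 4,
         "BD": 4, "BE": 14, "CD": 6, "CE": 16, "DE": 13}
dist = {frozenset(k): v for k, v in pairs.items()}
tree, merge_distances = upgma_simple_average(dist)
print(tree, merge_distances)
```

Under these assumptions the merges happen at distances 4, 5, 7 and 12, and the resulting topology groups B with C, then D, and separately A with E, as in the tree (((B, C), D), (A, E)).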

18 Other distance based methods

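19 Neighbor Relation Method (1)
Consider all possible arrangements
Choose the one that satisfies the distance relation
[Figure: unrooted four-leaf tree with neighbors A, B on one side and C, D on the other; leaf branches a, b, c, d and internal branch e]
AC + BD = AD + BC
AB + CD < AC + BD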

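20 Neighbor Relation Method (2)
(Sattath, Tversky, 1977)
Taxa: {A, B, C, D, E, F, …}
For each quartet, take the minimum of the three pairings
– {A, B, C, D}: 1. AB + CD  2. AC + BD  3. AD + BC
– {A, B, C, E}: 1. AB + CE  2. AE + BC  3. AC + BE
– ...
Each quartet votes for its neighbor pairs; votes are tallied in a taxa-by-taxa matrix (A B C D E F G H …)
Run UPGMA on the votes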
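
A minimal Python sketch of the quartet test and the voting step follows. The function names and the example distances are hypothetical; the distances were generated from an assumed additive tree ((A, B), (C, D)) with leaf branches 1, 2, 3, 4 and internal branch 5, so that AB + CD is the smallest of the three sums.

```python
from itertools import combinations
from collections import Counter

def quartet_neighbors(d, a, b, c, e):
    """Return the pairing of {a, b, c, e} with the smallest distance sum."""
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    return min(pairings, key=lambda p: d(*p[0]) + d(*p[1]))

def neighbor_votes(taxa, d):
    """Count, over all quartets, how often each pair is chosen as neighbors."""
    votes = Counter()
    for quartet in combinations(taxa, 4):
        for pair in quartet_neighbors(d, *quartet):
            votes[frozenset(pair)] += 1
    return votes

# Hypothetical additive distances (assumed tree, not from the slides).
table = {frozenset(k): v for k, v in
         {"AB": 3, "CD": 7, "AC": 9, "AD": 10, "BC": 10, "BD": 11}.items()}
d = lambda x, y: table[frozenset((x, y))]

print(quartet_neighbors(d, "A", "B", "C", "D"))
print(neighbor_votes("ABCD", d))
```

With more than four taxa, the vote counts form a similarity matrix between pairs of taxa, which is the input the slide passes to UPGMA.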

21 Neighbor Joining Method
Start with a star tree
Merge the pair of nodes that minimizes the sum of branch lengths
[Figure: star tree on A, B, C, D, E, and the tree after one pair has been joined]
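
One common way to pick the pair to merge is the Q-criterion of the standard neighbor joining formulation, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from i to all other taxa. The slide does not spell this out, so treat the Python sketch as an assumption about how the selection step might be implemented; the function name and the five-taxon distances are hypothetical.

```python
from itertools import combinations

def nj_pick_pair(taxa, d):
    """Return (pair, Q) minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j."""
    n = len(taxa)
    # r[i]: total distance from taxon i to every other taxon.
    r = {i: sum(d[frozenset((i, k))] for k in taxa if k != i) for i in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - r[i] - r[j]

    best = min(combinations(taxa, 2), key=q)
    return best, q(best)

# Hypothetical distances between five taxa a..e (not from the slides).
d = {frozenset(k): v for k, v in
     {"ab": 5, "ac": 9, "ad": 9, "ae": 8, "bc": 10,
      "bd": 10, "be": 9, "cd": 8, "ce": 7, "de": 3}.items()}
pair, q_value = nj_pick_pair("abcde", d)
print(pair, q_value)
```

In this example the closest pair by raw distance is (d, e), yet the criterion selects (a, b). That difference from UPGMA is the point of the method: the choice accounts for how far each candidate is from all remaining taxa.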

22 Maximum Likelihood Method

23 Maximum Likelihood Method
Generate all possible trees
Find the likelihood of each tree
– Use substitution probabilities (e.g., Jukes-Cantor)
Choose the tree with the highest likelihood
Exhaustive search. Very slow
Requires computation of inferred ancestors
[Figure: tree over the sequences ACGCTAFKI, GCGCTAFKI, ACGCTAFKL, GCGCTGFKI, GCGCTLFKI, ASGCTAFKL, ACACTAFKL, with branches labeled by substitutions A → G, I → L, A → G, A → L, C → S, G → A]
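
As a small illustration of the substitution probabilities mentioned above, the Python sketch below evaluates the Jukes-Cantor model for a single pair of aligned DNA sequences. It is only the two-sequence building block, not a tree likelihood; the function names and the two example sequences are hypothetical, and distance is measured in expected substitutions per site.

```python
import math

def jc69_prob(x, y, d):
    """P(x -> y) under Jukes-Cantor after d expected substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def pair_log_likelihood(s1, s2, d):
    """Log-likelihood of two aligned DNA sequences separated by distance d."""
    return sum(math.log(0.25 * jc69_prob(x, y, d)) for x, y in zip(s1, s2))

def jc69_distance(s1, s2):
    """Closed-form maximum likelihood distance under Jukes-Cantor."""
    p = sum(x != y for x, y in zip(s1, s2)) / len(s1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical sequences differing at one site out of ten.
s1, s2 = "ACGTACGTAC", "ACGTACGTAA"
d_hat = jc69_distance(s1, s2)
print(d_hat, pair_log_likelihood(s1, s2, d_hat))
```

A full maximum likelihood method would combine such per-branch probabilities over all sites and all possible ancestral states of each candidate tree, which is where the cost noted on the slide comes from.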

24 Character Based Methods

25 Parsimony (1)
There are various trees that could explain the phylogeny of the following sequences: AAG, AAA, GGA, AGA
[Figure: two candidate trees over the leaves AAG, AAA, GGA, AGA with inferred ancestors]
Parsimony prefers the second tree because it requires fewer substitution events

26 Parsimony (2)
Multiply align sequences
For each column of the alignment
– Generate all possible trees
– Compute the number of substitutions
– Vote for the tree with the smallest number of substitutions
Pick the tree with the best vote
1: G G G G G G
2: G G G A G T
3: G G A T A G
4: G A T C A T
[Figure: two alternative trees over the leaves 1G, 2G, 3A, 4A]

27 How can we infer the ancestors?
[Figure: tree whose internal nodes are marked "?"]

28 Inferring Ancestor (1/3)
[Figure: example tree with leaves A, T, G, G, A; X and Y are the state sets of two children, Z the set of their parent]
If X ∩ Y = ∅ then Z = X ∪ Y
Else Z = X ∩ Y

29 Inferring Ancestor (2/3)
[Figure: the example tree with candidate state sets at the internal nodes, e.g. {G, A}, {G, T}, {G, A, T}]
If X ∩ Y = ∅ then Z = X ∪ Y
Else Z = X ∩ Y

30 Inferring Ancestor (3/3)
[Figure: the example tree with candidate state sets at the internal nodes, e.g. {G, A}, {G, T}, {G, A, T}]
Minimum number of substitutions = # unique characters - 1 (a lower bound; a particular tree may need more)
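
The set rule on these slides (intersection if non-empty, otherwise union) can be sketched in Python as follows. Each union step corresponds to one substitution, so the same pass also counts substitutions for a column. The function name `fitch` and the nested-tuple tree encoding are assumptions made for this illustration; the example column (G, G, A, A) is column 5 of the alignment on slide 26.

```python
def fitch(tree, leaf_states):
    """Return (state set at the root of `tree`, number of substitutions).

    tree is a leaf name (str) or a pair (left, right) of subtrees;
    leaf_states maps each leaf name to its character in one column.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    x, cost_x = fitch(left, leaf_states)
    y, cost_y = fitch(right, leaf_states)
    common = x & y
    if common:                         # X ∩ Y non-empty: Z = X ∩ Y
        return common, cost_x + cost_y
    return x | y, cost_x + cost_y + 1  # otherwise Z = X ∪ Y, one substitution

column = {"1": "G", "2": "G", "3": "A", "4": "A"}
print(fitch((("1", "2"), ("3", "4")), column))
print(fitch((("1", "3"), ("2", "4")), column))
```

For this column the tree ((1, 2), (3, 4)) needs one substitution while ((1, 3), (2, 4)) needs two, so the column would vote for the first tree.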

31 Branch and Bound Method
1. Find an upper bound to tree length (L)
– E.g., use UPGMA
2. Start with a small tree
3. Incrementally add more branches to the tree
– Exclude trees with length > L

32 Branch and Bound Example
[Figure: three-leaf tree on A, B, C, extended to the three four-leaf trees obtained by adding D]

33 Consensus Trees
There may be many trees of the same parsimony
Consensus tree summarizes them by collapsing nodes
– Resulting tree may not be bifurcating
Strict consensus
T% majority rule consensus

34 Consensus
1: (A, ((B, (C, D)), (E, (F, G))))
2: ((A, (C, (B, D))), (E, (F, G)))
3: ((A, (D, (B, C))), (E, (F, G)))
Strict: (A, (B, C, D), (E, (F, G)))
50%: ((A, (B, C, D)), (E, (F, G)))
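
The consensus example can be checked by comparing the clades (leaf sets under each internal node) of the three trees. This is a minimal Python sketch under the assumption that trees are written as nested tuples; the function names are hypothetical. It returns the set of clades to keep rather than a Newick string.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set of this subtree, set of all clades inside it)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def consensus_clades(trees, min_count):
    """Clades that occur in at least min_count of the trees."""
    counts = Counter(c for t in trees for c in clades(t)[1])
    return {c for c, k in counts.items() if k >= min_count}

trees = [
    ("A", (("B", ("C", "D")), ("E", ("F", "G")))),
    (("A", ("C", ("B", "D"))), ("E", ("F", "G"))),
    (("A", ("D", ("B", "C"))), ("E", ("F", "G"))),
]
strict = consensus_clades(trees, len(trees))             # in every tree
majority = consensus_clades(trees, len(trees) // 2 + 1)  # in more than 50%
print(sorted("".join(sorted(c)) for c in strict))
print(sorted("".join(sorted(c)) for c in majority))
```

The strict set contains {B, C, D}, {F, G}, {E, F, G} and the full leaf set, which corresponds to (A, (B, C, D), (E, (F, G))). The 50% majority set additionally keeps {A, B, C, D}, found in two of the three trees.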

35 Tree Confidence
Is the resulting tree reliable?
Usually a confidence value is computed for each part of the tree
– Bootstrapping

36 Bootstrapping
Given a phylogenetic tree T
1. Multiply align sequences based on T
2. Randomly select columns from the alignment (with replacement) to create a new dataset of the same size
3. Find the phylogenetic tree T' for the resampled dataset
4. Repeat steps 2-3 many times
5. Compute the fraction of times T' overlaps with T
Original alignment:
1: G G G A G G A T C A
2: G G G A G T A T C A
3: G G A T A G A C A T
4: G A T C A T G T A T
5: G T T C A T A T C T
Resampled alignment:
1: G G G G G G G C C C
2: G G G G G T T C C C
3: G G A A A G G A A A
4: G G T A A T T A A A
5: G G T A A T T A C C
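
Steps 2 and 5 can be sketched in Python as follows. The tree-building step (step 3) is deliberately left out, since any of the methods above could be used for it. The function names are hypothetical, and the clade sets passed to `clade_support` are made-up inputs that only show the bookkeeping.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping the same width."""
    width = len(alignment[0])
    picks = [rng.randrange(width) for _ in range(width)]
    return ["".join(row[c] for c in picks) for row in alignment]

def clade_support(reference_clades, replicate_clade_sets):
    """Fraction of replicate trees that contain each clade of the reference tree."""
    n = len(replicate_clade_sets)
    return {c: sum(c in rep for rep in replicate_clade_sets) / n
            for c in reference_clades}

alignment = ["GGGAGGATCA", "GGGAGTATCA", "GGATAGACAT", "GATCATGTAT", "GTTCATATCT"]
rng = random.Random(0)  # fixed seed so the run is repeatable
replicate = bootstrap_columns(alignment, rng)
print(replicate)

# Hypothetical clade sets from four replicate trees (illustration only).
reference = {frozenset("AB")}
replicates = [{frozenset("AB")}, set(), {frozenset("AB")}, {frozenset("AB")}]
print(clade_support(reference, replicates))
```

Each replicate has the same number of rows and columns as the original alignment, and every one of its columns is a copy of some original column. The support value per clade is the per-branch confidence mentioned on slide 35.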

37 Reading Assignment
Krane, Chapters 4 and 5
Mount, Chapter 7