CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:

Slides:



Advertisements
Similar presentations
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
An Introduction to Phylogenetic Methods
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
. Phylogenetic Trees (2) Lecture 13 Based on: Durbin et al 7.4, Gusfield , Setubal&Meidanis 6.1.
The Evolutionary Basis of Bioinformatics: An Introduction to Phylogenetics > Sequence 1 GAGGTAGTAATTAGATCCGAAA… > Sequence.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Phylogenetic reconstruction
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Molecular Evolution Revised 29/12/06
Problem Set 2 Solutions Tree Reconstruction Algorithms
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
Phylogeny Tree Reconstruction
Overview of Phylogeny Artiodactyla (pigs, deer, cattle, goats, sheep, hippopotamuses, camels, etc.) Cetacea (whales, dolphins, porpoises)
Bioinformatics Algorithms and Data Structures
Phylogeny Tree Reconstruction
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Phylogeny Tree Reconstruction
. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony Based on presentations by Dan Geiger, Shlomo.
Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Perfect Phylogeny MLE for Phylogeny Lecture 14
Phylogenetic trees Sushmita Roy BMI/CS 576
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Phylogenetics II.
Applied Bioinformatics Week 8 Jens Allmer. Practice I.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Phylogenetic Trees Tutorial 5. Agenda How to construct a tree using Neighbor Joining algorithm Phylogeny.fr tool Cool story of the day: Horizontal gene.
Molecular Evolution.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
CSCE555 Bioinformatics Lecture 12 Phylogenetics I Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Calculating branch lengths from distances. ABC A B C----- a b c.
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
Evolutionary tree reconstruction (Chapter 10). Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships.
Evolutionary tree reconstruction
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
Tutorial 5 Phylogenetic Trees.
Phylogenetic Trees - Parsimony Tutorial #13
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
Phylogenetic basis of systematics
Distance based phylogenetics
CSCI2950-C Lecture 7 Molecular Evolution and Phylogeny
dij(T) - the length of a path between leaves i and j
Inferring a phylogeny is an estimation procedure.
Character-Based Phylogeny Reconstruction
Multiple Alignment and Phylogenetic Trees
BNFO 602 Phylogenetics Usman Roshan.
#30 - Phylogenetics Distance-Based Methods
Phylogeny.
Presentation transcript:

CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: University of South Carolina Department of Computer Science and Engineering HAPPY CHINESE NEW YEAR

Outline Review For Exams Data for Phylogenetic Tree inference Classification of Tree inference approaches Neighbor-joining algorithm Parsimony-based tree reconstruction Least Square Best-fit reconstruction 3/21/20162

Midterm, Midterm How to review: read slides and textbooks, especially CG book. Format of problems: examples ◦ Brief questions: what is the difference between global alignment and local alignment? ◦ calculation: build a HMM model for a multiple seq alignment ◦ Definition: blasting, Motif, ORF

Covered Topics Understand: concepts, algorithm ideas, tools ◦ Sequencing/blasting ◦ Gene finding ◦ Alignment algorithms and applications ◦ DNA motif search ◦ HMM profiles ◦ Gene prediction algorithms ◦ Promoter predictions ◦ Comparative genomics ◦ ……

A B C D E Characters Taxa A B C D E Distances Phylogenetic Reconstruction There are essentially two types of data for phylogenetic tree estimation: ◦ Distance data, usually stored in a distance matrix, e.g. DNA×DNA hybridisation data, morphometric differences, immunological data, pairwise genetic distances ◦ Character data, usually stored in a character array;  e.g. multiple sequence alignment of DNA sequences, morphological characters.

Phylogenetic Reconstruction Given the huge number of possible trees even for small data sets, we have two options: ◦ Build one according to some clustering algorithm ◦ Assign a “goodness of fit” criterion (an objective function) and find the tree(s) which optimise(s) this criterion

CS Distances Nucleotide Sites Type of Data UPGMA Neighbor-Joining Minimum Evolution Maximum Parsimony Maximum Likelihood Tree Building Method Optimality Criterion Clustering Algorithm Phylogenetic Reconstruction

Phylogenetic Methods Maximum likelihood Maximizes likelihood of observed data Many different procedures exist. Three of the most popular: Maximum parsimony Minimizes total evolutionary change Neighbor-joining Minimizes distance between nearest neighbors

Distance based tree Construction Given a set of species (leaves in a supposed tree), and distances between them – construct a phylogeny which best “fits” the distances. Orc: ACAGTGACGCCCCAAACGT Elf: ACAGTGACGCTACAAACGT Dwarf: CCTGTGACGTAACAAACGA Hobbit: CCTGTGACGTAGCAAACGA Human: CCTGTGACGTAGCAAACGA Orc Elf Dwarf Hobbit Human

Distance Matrix Given n species, we can compute the n x n distance matrix D ij D ij may be defined as the edit distance between a gene in species i and species j, where the gene of interest is sequenced for all n species. Dij can also be any other feature-based distances

Distances in Trees Edges may have weights reflecting: ◦ Number of mutations on evolutionary path from one species to another ◦ Time estimate for evolution of one species into another In a tree T, we often compute d ij (T) - the length of a path between leaves i and j

Distances in Trees Edges may have weights reflecting: ◦ Number of mutations on evolutionary path from one species to another ◦ Time estimate for evolution of one species into another In a tree T, we often compute d ij (T) - the length of a path between leaves i and j

Distance in Trees: an Exampe d 1,4 = = 68 i j

Fitting Distance Matrix Given n species, we can compute the n x n distance matrix D ij Evolution of these genes is described by a tree that we don’t know. We need an algorithm to construct a tree that best fits the distance matrix D ij

Reconstructing a 3 Leaved Tree Tree reconstruction for any 3x3 matrix is straightforward We have 3 leaves i, j, k and a center vertex c Observe: d ic + d jc = D ij d ic + d kc = D ik d jc + d kc = D jk

Reconstructing a 3 Leaved Tree d ic + d jc = D ij + d ic + d kc = D ik 2d ic + d jc + d kc = D ij + D ik 2d ic + D jk = D ij + D ik d ic = (D ij + D ik – D jk )/2 Similarly, d jc = (D ij + D jk – D ik )/2 d kc = (D ki + D kj – D ij )/2

Trees with > 3 Leaves An tree with n leaves has 2n-3 edges This means fitting a given tree to a distance matrix D requires solving a system of “n choose 2” equations with 2n-3 variables This is not always possible to solve for n > 3

Additive Distance Matrices Matrix D is ADDITIVE if there exists a tree T with d ij (T) = D ij NON-ADDITIVE otherwise

Distance Based Phylogeny Problem Goal: Reconstruct an evolutionary tree from a distance matrix Input: n x n distance matrix D ij Output: weighted tree T with n leaves fitting D If D is additive, this problem has a solution and there is a simple algorithm to solve it

Using Neighboring Leaves to Construct the Tree Find neighboring leaves i and j with parent k Remove the rows and columns of i and j Add a new row and column corresponding to k, where the distance from k to any other leaf m can be computed as: D km = (D im + D jm – D ij )/2 Compress i and j into k, iterate algorithm for rest of tree

Finding Neighboring Leaves To find neighboring leaves we simply select a pair of closest leaves.

Finding Neighboring Leaves To find neighboring leaves we simply select a pair of closest leaves. WRONG

Finding Neighboring Leaves Closest leaves aren’t necessarily neighbors i and j are neighbors, but (d ij = 13) > (d jk = 12) Finding a pair of neighboring leaves is a nontrivial problem!

Neighbor Finding: Seitou & Nei algorithm (1987) Theorem (Saitou & Nei) Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree. Definitions

Neighbor Joining Algorithm In 1987 Naruya Saitou and Masatoshi Nei developed a neighbor joining algorithm for phylogenetic tree reconstruction Finds a pair of leaves that are close to each other but far from other leaves: implicitly finds a pair of neighboring leaves Advantages: works well for additive and other non-additive matrices, it does not have the flawed molecular clock assumption

Neighbor-joining Guaranteed to produce the correct tree if distance is additive May produce a good tree even when distance is not additive Step 1: Finding neighboring leaves Define D ij = d ij – (r i + r j ) Where 1 r i = –––––  k d ik |L|

Algorithm: Neighbor-joining Initialization: Define T to be the set of leaf nodes, one per sequence Let L = T Iteration: Pick i, j s.t. D ij is minimal Define a new node k, and set d km = ½ (d im + d jm – d ij ) for all m  L Add k to T, with edges of lengths d ik = ½ (d ij + r i – r j ) Remove i, j from L; Add k to L Termination: When L consists of two nodes, i, j, and the edge between them of length d ij

Rooting a tree, and definition of outgroup Neighbor-joining produces an unrooted tree How do we root a tree between N species using n-j? An outgroup is a species that we know to be more distantly related to all remaining species, than they are to one another Example: Human, mouse, rat, pig, dog, chicken, whale Which one is an outgroup? Outgroup can act as a root

Neighbor Joining Algorithm-Widely Used Applicable to matrices which are not additive Known to work good in practice The algorithm and its variants are the most widely used distance-based algorithms today.

Maximum Parsimony Method for Tree Inference A Character-based method Input: h sequences (one per species), all of length k. Goal: Find a tree with the input sequences at its leaves, and an assignment of sequences to internal nodes, such that the total number of substitutions is minimized. Two sub-problems: 1.Find the parsimony cost of a given tree (easy) 2.Search through all tree topologies (hard)

Example Input: four nucleotide sequences: AAG, AAA, GGA, AGA taken from four species. AGA AAA GGA AAG AAA Total #substitutions = 4 By the parsimony principle, we seek a tree that has a minimum total number of substitutions of symbols between species and their originator in the phylogenetic tree. Here is one possible tree.

Least Squares Distance Phylogeny Problem If the distance matrix D is NOT additive, then we look for a tree T that approximates D the best: Squared Error : ∑ i,j (d ij (T) – D ij ) 2 Squared Error is a measure of the quality of the fit between distance matrix and the tree: we want to minimize it. Least Squares Distance Phylogeny Problem: finding the best approximation tree T for a non- additive matrix D (NP-hard).

Search through tree topologies: Branch and Bound Observation: adding an edge to an existing tree can only increase the parsimony cost Enumerate all unrooted trees with at most n leaves: [i 3 ][i 5 ][i 7 ]……[i 2N–5] ] where each i k can take values from 0 (no edge) to k At each point keep C = smallest cost so far for a complete tree Start B&B with tree [1][0][0]……[0] Whenever cost of current tree T is > C, then: ◦ T is not optimal ◦ Any tree with more edges containing T, is not optimal: Increment by 1 the rightmost nonzero counter

Comparison of Methods Neighbor-joiningMaximum parsimonyMaximum likelihood Very fastSlowVery slow Easily trapped in local optima Assumptions fail when evolution is rapid Highly dependent on assumed evolution model Good for generating tentative tree, or choosing among multiple trees Best option when tractable (<30 taxa, strong conservation) Good for very small data sets and for testing trees built using other methods

Summary Category of phylogenetic inference algorithms Neighbor-joining algorithm

Acknowledgement Anonymous authors