Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Doug Brutlag Professor.

Presentation transcript:

Computational Molecular Biology Biochem 218 – BioMedical Informatics Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy) Phylogenies

Cladogram Representation of Phylogenies
[Figure: cladogram with leaves A through Q; axis in years]

Dendrogram Representation of Phylogenies
[Figure: GrowTree phylogram, February 1, 2010; scale in substitutions per 100 residues]

Cladogram

Phenogram

Curve-O-Gram

Eurogram

RadialGram

Methods for Determining Phylogenies
Parsimony (character based)
o Assigns mutations to branches
o Minimize number of edits
o Topology maximizes similarity of neighboring leaves
Distance methods
o Branch lengths = D(i,j)/2 for sequences i, j
o Distances must be at least metric
o Distances can reflect time or edits
o Distance must be relatively constant per unit branch length

Methods for Determining Phylogenies
Parsimony
o Minimum mutation (Fitch, PAUP)
o Minimal length encoding
Probabilistic
o Branch and Bound
o Maximum likelihood
Distance methods
o Ultrametric Trees
o Additive Trees
o UPGMA
o Neighbor Joining

Properties of Trees
Rooted or Unrooted
Nodes and Branches
o Internal Nodes
o External Nodes - leaves
Operational Taxonomic Units
Outgroups
Topology
One path/pair
Distances
[Figure: example distance matrix and trees for X, Y, Z with root R]

Orthologous Evolution

Paralogous Evolution hemoglobins

Challenges Making Trees: Gene Duplication versus Speciation

Orthology and Paralogy
[Figure: gene tree with leaves HB Human, WB Worm, HA1 Human, HA2 Human, Yeast, WA Worm]
Orthologs: Derived by speciation
Paralogs: Gene Duplications
Thanks to Seraphim Batzoglou

Challenges Making Trees: Gene Conversion
[Figure: nucleotide sequences]
Thanks to Maryellen Ruvolo

Challenges Making Trees: Gene Conversion
[Figure: nucleotide sequences]
Thanks to Maryellen Ruvolo

Challenges Making Trees: Gene Conversion
[Figure: Gene N and Gene M in species A, B, C, D]
Orthologs: Derived by speciation
Paralogs: Gene Duplications
Thanks to Maryellen Ruvolo

Challenges Making Trees: C_M Has Been Converted from C_N
[Figure: gene tree for Gene N and Gene M with leaves A_N, B_N, C_M, C_N, D_N, A_M, B_M, D_M]
Orthologs: Derived by speciation
Paralogs: Gene Duplications
Thanks to Maryellen Ruvolo

Consensus CG/LH Tree Thanks to Maryellen Ruvolo

Gene conversion between 1st & 2nd exons of LH, CG2 Genes
[Figure: LH gene and CG2 gene; labels 168 nt, 15 nt; No Conversion, Conversion]
Thanks to Maryellen Ruvolo

Challenges Making Trees: Varying Rates of Mutation

Challenges Making Trees: Horizontal Gene Transfer

Maximum Ultrametric Distance Trees
Matrix D is ultrametric for tree T if:
o D is a symmetric n by n matrix of distances
o T contains n leaves, one for each row or column
o Each node of T is labeled by one entry from D
o Numbers from root to leaves strictly decrease
o For any two leaves i, j, D(i,j) labels the nearest common ancestor of i and j in the tree
[Figure: Matrix D and Tree T]

Maximum Ultrametric Distance Trees
A symmetric matrix D is ultrametric if and only if for every three leaves i, j, and k, there is a tie for the maximum distance among D(i,j), D(i,k) and D(j,k).
[Figure: tree with nodes U, V, I, J, K]
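The three-point condition on this slide can be tested directly on a distance matrix. The following is a minimal sketch, not part of the original slides; the function name `is_ultrametric`, the tolerance `tol`, and both example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Return True if every triple of leaves has a tie for its largest distance."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances of the triple.
        smallest, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        # The two largest values must be equal (up to tol).
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical example matrices (assumed for illustration only).
clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
not_clock_like = [[0, 2, 6],
                  [2, 0, 5],
                  [6, 5, 0]]

print(is_ultrametric(clock_like))
print(is_ultrametric(not_clock_like))
```

The check examines all triples, so it runs in O(n^3) time, which is fine for the small matrices used in teaching examples.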

Additive Distance Trees
[Figure: Matrix D and Tree T with leaves A, B, C, D]

Distance Metrics Obey the Triangle Inequality
D(i,j) ≤ D(i,k) + D(j,k) for all i, j, k
(Max Score - Smith-Waterman Score) is a Metric if
o Gap-penalty ≥ 1 + Gap-size/(n-1)
o Assuming match = 1 and mismatch = -1
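As a small illustration of the triangle inequality stated on this slide, the sketch below checks D(i,j) ≤ D(i,k) + D(j,k) for every ordered triple. The function name and the two example matrices are assumptions made for illustration; they do not come from the slides.

```python
from itertools import permutations


def satisfies_triangle_inequality(D, tol=1e-9):
    """Return True if D(i,j) <= D(i,k) + D(j,k) for all distinct i, j, k."""
    n = len(D)
    return all(
        D[i][j] <= D[i][k] + D[j][k] + tol
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical matrices (assumed for illustration only).
metric_like = [[0, 3, 4],
               [3, 0, 5],
               [4, 5, 0]]
violating = [[0, 1, 5],
             [1, 0, 1],
             [5, 1, 0]]  # D(0,2) = 5 exceeds D(0,1) + D(2,1) = 2

print(satisfies_triangle_inequality(metric_like))
print(satisfies_triangle_inequality(violating))
```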

Three Leaf Tree
Observe: D1,2 D1,3 D2,3
Calculate: L1,A L2,A L3,A
[Figure: leaves 1, 2, 3 joined at node A by branches L1,A, L2,A, L3,A]

Three Leaf Tree
Observe: D1,2 D1,3 D2,3
Calculate: L1,A L2,A L3,A
D1,2 = L1,A + L2,A
D1,3 = L1,A + L3,A
D2,3 = L2,A + L3,A
[Figure: leaves 1, 2, 3 joined at node A by branches L1,A, L2,A, L3,A]

Solution to Three Species Tree
[Figure: leaves 1, 2, 3 joined at node A by branches L1,A, L2,A, L3,A]
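The solution formulas did not survive in this transcript, but they follow from the three equations D1,2 = L1,A + L2,A, D1,3 = L1,A + L3,A and D2,3 = L2,A + L3,A. Adding the two equations that contain a given branch and subtracting the third leaves twice that branch, for example L1,A = (D1,2 + D1,3 - D2,3)/2. The sketch below applies this; the function name and the sample distances are assumptions for illustration.

```python
def three_leaf_branches(d12, d13, d23):
    """Solve D1,2 = L1+L2, D1,3 = L1+L3, D2,3 = L2+L3 for the branch lengths."""
    l1 = (d12 + d13 - d23) / 2
    l2 = (d12 + d23 - d13) / 2
    l3 = (d13 + d23 - d12) / 2
    return l1, l2, l3


# Hypothetical distances (assumed for illustration only).
l1, l2, l3 = three_leaf_branches(5, 7, 8)
print(l1, l2, l3)

# The branch lengths reproduce the observed distances.
assert (l1 + l2, l1 + l3, l2 + l3) == (5, 7, 8)
```

A negative result for any branch signals that the three distances violate the triangle inequality.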

Four Species Tree
Calculate: L1,A L2,A L3,B L4,B LA,B
Observe: D1,2 D1,3 D1,4 D2,3 D2,4 D3,4
[Figure: leaves 1 and 2 joined at node A, leaves 3 and 4 joined at node B, internal branch LA,B]

Four Species Topology
Label species 1, 2, 3, and 4 so that:
D(1,2) + D(3,4) ≤ D(1,3) + D(2,4) = D(1,4) + D(2,3)
[Figure: leaves 1 and 2 joined at node A, leaves 3 and 4 joined at node B, internal branch LA,B]
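The labeling rule on this slide amounts to comparing the three ways of pairing four species: the pairing with the smallest sum identifies the neighbors, and for an additive matrix the other two sums tie. A minimal sketch follows, using 0-based indices; the function name and the example matrix are assumptions for illustration.

```python
def four_point_topology(D, tol=1e-9):
    """Return (pairing with the smallest distance sum, whether the other two sums tie)."""
    sums = {
        ((0, 1), (2, 3)): D[0][1] + D[2][3],
        ((0, 2), (1, 3)): D[0][2] + D[1][3],
        ((0, 3), (1, 2)): D[0][3] + D[1][2],
    }
    ordered = sorted(sums.items(), key=lambda item: item[1])
    best_pairing = ordered[0][0]
    # Additive (tree-like) distances make the two larger sums equal.
    is_additive = abs(ordered[2][1] - ordered[1][1]) <= tol
    return best_pairing, is_additive


# Hypothetical additive matrix (assumed for illustration only), built from
# branch lengths 2, 3 (leaves 0, 1), 4, 5 (leaves 2, 3) and internal branch 1.
D = [[0, 5, 7, 8],
     [5, 0, 8, 9],
     [7, 8, 0, 9],
     [8, 9, 9, 0]]

print(four_point_topology(D))
```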

Solution for Four Species
L1,A = 1/4*(D1,3 + D1,4 - D2,3 - D2,4) + 1/2*D1,2
L2,A = 1/4*(D2,3 + D2,4 - D1,3 - D1,4) + 1/2*D1,2
L3,B = 1/4*(D1,3 + D2,3 - D1,4 - D2,4) + 1/2*D3,4
L4,B = 1/4*(D1,4 + D2,4 - D1,3 - D2,3) + 1/2*D3,4
LA,B = 1/4*(D1,3 + D1,4 + D2,3 + D2,4) - 1/2*(D1,2 + D3,4)
[Figure: leaves 1 and 2 joined at node A, leaves 3 and 4 joined at node B, internal branch LA,B]
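The five formulas on this slide translate directly into code. The sketch below assumes the species are already labeled so that 1 and 2 are neighbors (0-based indices 0 and 1 in the matrix); the function name, dictionary keys, and example matrix are illustrative assumptions.

```python
def four_species_branches(D):
    """Branch lengths for the tree ((1,2),(3,4)) from a 4x4 distance matrix."""
    d12, d13, d14 = D[0][1], D[0][2], D[0][3]
    d23, d24, d34 = D[1][2], D[1][3], D[2][3]
    return {
        "L1,A": (d13 + d14 - d23 - d24) / 4 + d12 / 2,
        "L2,A": (d23 + d24 - d13 - d14) / 4 + d12 / 2,
        "L3,B": (d13 + d23 - d14 - d24) / 4 + d34 / 2,
        "L4,B": (d14 + d24 - d13 - d23) / 4 + d34 / 2,
        "LA,B": (d13 + d14 + d23 + d24) / 4 - (d12 + d34) / 2,
    }


# Hypothetical additive matrix (assumed for illustration only), built from
# branch lengths L1,A = 2, L2,A = 3, L3,B = 4, L4,B = 5, LA,B = 1.
D = [[0, 5, 7, 8],
     [5, 0, 8, 9],
     [7, 8, 0, 9],
     [8, 9, 9, 0]]

print(four_species_branches(D))
```

Because the example matrix was built from known branch lengths, the function recovers them exactly, which is a quick way to sanity-check the formulas.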

Four Species => Three Topologies
[Figure: the three unrooted topologies for species 1, 2, 3, 4 with internal nodes A and B]

Species, Distances, Branches & Topologies

Number of Topologies for n Species
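The table for this slide is missing from the transcript. As a hedged reconstruction, the sketch below uses the standard counts for bifurcating trees: n(n-1)/2 pairwise distances, 2n-3 branches in an unrooted tree, (2n-5)!! unrooted topologies, and (2n-3)!! rooted topologies. These formulas are textbook results and are not taken from the slide itself; the function names are assumptions.

```python
def odd_double_factorial(m):
    """Product m * (m-2) * (m-4) * ... down to 1, for odd m >= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def pairwise_distances(n):
    return n * (n - 1) // 2


def unrooted_branches(n):
    return 2 * n - 3


def unrooted_topologies(n):
    if n < 3:
        raise ValueError("need at least 3 species")
    return odd_double_factorial(2 * n - 5)


def rooted_topologies(n):
    if n < 2:
        raise ValueError("need at least 2 species")
    return odd_double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, pairwise_distances(n), unrooted_branches(n),
          unrooted_topologies(n), rooted_topologies(n))
```

For four species this gives three unrooted topologies, and the count grows so quickly that exhaustive search becomes impractical beyond a modest number of species.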

UPGMA: Unweighted Pair Group Method with Arithmetic Average
Where D1,(34) = (D1,3 + D1,4)/2 and D2,(34) = (D2,3 + D2,4)/2
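The averaging rule on this slide is the first step of UPGMA clustering. A minimal sketch of the full procedure follows: repeatedly merge the closest pair of clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. For two single-leaf clusters this weighting reduces to the slide's (D1,3 + D1,4)/2. The function name, the nested-tuple tree representation, and the example matrix are assumptions for illustration.

```python
def upgma(names, D):
    """Return (tree as nested tuples, list of (left, right, node height) merges)."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    dist = {(i, j): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Closest pair of current clusters (keys are stored as (smaller id, larger id)).
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to every other cluster.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {pair: v for pair, v in dist.items()
                if a not in pair and b not in pair}
        for c, v in new_dist.items():
            dist[(c, next_id)] = v  # existing ids are always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, d / 2))
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree, merges


# Hypothetical distances for species 1-4 (assumed for illustration only).
names = ["1", "2", "3", "4"]
D = [[0, 6, 10, 12],
     [6, 0, 11, 13],
     [10, 11, 0, 4],
     [12, 13, 4, 0]]

tree, merges = upgma(names, D)
print(tree)
for left, right, height in merges:
    print(left, right, height)
```

In this example species 3 and 4 merge first, giving D1,(34) = (10 + 12)/2 = 11 and D2,(34) = (11 + 13)/2 = 12, as in the slide's formula.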

UPGMA Dendrogram

UPGMA Clustering

Neighbor Joining Method
[Figure: joining neighbors i and j through nodes X and Y; counts n, n-1, n-2, n-3, n-4]
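Only figure labels survive for this slide, so the sketch below uses the standard neighbor-joining formulas rather than anything recoverable from the transcript: the criterion Q(i,j) = (n-2)*D(i,j) - r_i - r_j with r_i the row sum of D, branch lengths from the joined pair to the new node, and reduced distances from the new node to the remaining taxa. It performs a single joining step; the function name and example matrix are assumptions for illustration.

```python
def neighbor_joining_step(D):
    """One NJ step on an n x n matrix (n >= 3).

    Returns (joined pair, branch lengths to the new node, distances from the
    new node to each remaining taxon).
    """
    n = len(D)
    r = [sum(row) for row in D]  # total distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from i and j to the new internal node.
    length_i = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = D[i][j] - length_i
    # Distances from the new node to every other taxon.
    new_row = {k: (D[i][k] + D[j][k] - D[i][j]) / 2
               for k in range(n) if k not in (i, j)}
    return (i, j), (length_i, length_j), new_row


# Hypothetical five-taxon matrix (assumed for illustration only).
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]

pair, lengths, new_row = neighbor_joining_step(D)
print(pair, lengths, new_row)
```

Note that the pair chosen here (taxa 0 and 1, distance 5) is not the closest pair in the matrix (taxa 3 and 4, distance 3). Unlike UPGMA, neighbor joining corrects for unequal rates by subtracting the row sums.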

Nearest Neighbor Dendrogram
[Figure: dendrogram with nodes A, B, C, D, E, R, W, X, Y, Z]

New Hampshire Standard Tree
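The New Hampshire format, also known as Newick, writes a tree as nested parentheses with optional labels and branch lengths, terminated by a semicolon. The slide's own example is not in the transcript, so the sketch below serializes a small hypothetical tree. The (label, length, children) tuple layout and the function name are assumptions for illustration.

```python
def newick(node):
    """Serialize a (label, branch_length, children) tuple to Newick text, without the final ';'."""
    label, length, children = node
    text = label
    if children:
        # Internal node: children in parentheses, then the optional label.
        text = "(" + ",".join(newick(child) for child in children) + ")" + label
    if length is not None:
        text += f":{length}"
    return text


# Hypothetical unrooted four-leaf tree (assumed for illustration only).
tree = ("", None, [
    ("A", 2.0, []),
    ("B", 3.0, []),
    ("", 1.0, [("C", 4.0, []), ("D", 5.0, [])]),
])

newick_string = newick(tree) + ";"
print(newick_string)
```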

SeqWeb GrowTree Program

GrowTree Parameters

GrowTree Distances

GrowTree Phylogram (UPGMA)

GrowTree Alignment

GrowTree Neighbor Joining Tree

GrowTree VegF Input

GrowTree VegF Neighbor Joining Tree

VegF Growth Factors

GrowTree VegF UPGMA Tree

GrowTree VegF Alignment