CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods: Models of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological.

Slides:



Advertisements
Similar presentations
Introduction to molecular dating methods. Principles Ultrametricity: All descendants of any node are equidistant from that node For extant species, branches,
Advertisements

Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Phylogenetic Trees Lecture 4
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Maximum Likelihood. Likelihood The likelihood is the probability of the data given the model.
Molecular Evolution Revised 29/12/06
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis
What is the probability that of 10 newborn babies at least 7 are boys? p(girl) = p(boy) = 0.5 Lecture 10 Important statistical distributions Bernoulli.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Model Selection Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Brief Introduction to the Theory of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Maximum Likelihood. Historically the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Its slow uptake by the scientific community.
Distance Methods. Distance Estimates attempt to estimate the mean number of changes per site since 2 species (sequences) split from each other Simply.
Branch lengths Branch lengths (3 characters): A C A A C C A A C A C C Sum of branch lengths = total number of changes.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Maximum Likelihood Flips usage of probability function A typical calculation: P(h|n,p) = C(h, n) * p h * (1-p) (n-h) The implied question: Given p of success.
We have shown that: To see what this means in the long run let α=.001 and graph p:
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Molecular Clocks, Base Substitutions, & Phylogenetic Distances.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Bayesian Inference Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Consensus Trees Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Model Selection Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
Phylogenetic trees Sushmita Roy BMI/CS 576
1 Additive Distances Between DNA Sequences MPI, June 2012.
Phylogenetic Analysis Unit 16 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.
Lecture 3: Markov models of sequence evolution Alexei Drummond.
Tree Inference Methods
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
MOLECULAR PHYLOGENETICS Four main families of molecular phylogenetic methods :  Parsimony  Distance methods  Maximum likelihood methods  Bayesian methods.
Calculating branch lengths from distances. ABC A B C----- a b c.
Lecture 10 – Models of DNA Sequence Evolution Correct for multiple substitutions in calculating pairwise genetic distances. Derive transformation probabilities.
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
MAT 4830 Mathematical Modeling
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis
Molecular Evolution Distance Methods Biol. Luis Delaye Facultad de Ciencias, UNAM.
Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Building Phylogenies. Phylogenetic (evolutionary) trees Human Gorilla Chimp Gibbon Orangutan Describe evolutionary relationships between species Cannot.
Evolutionary Change in Sequences
Lecture 14 CS5661 Neighbor Joining Generates unrooted tree, allowing for unequal branches Given: Distance matrix for sequences Steps: Repeat 1-3 till all.
Lecture 6B – Optimality Criteria: ML & ME
Inferring a phylogeny is an estimation procedure.
Maximum likelihood (ML) method
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Distances.
Goals of Phylogenetic Analysis
Inferring phylogenetic trees: Distance and maximum likelihood methods
Why Models of Sequence Evolution Matter
Lecture 6B – Optimality Criteria: ML & ME
Phylogenetic tree based on 16S rRNA gene sequence comparisons over 1,260 aligned bases showing the relationship between species of the genus Actinomyces.
Phylogenetic tree based on predominant 16S rRNA gene sequences obtained by C4–V8 Sutterella PCR from AUT-GI patients, Sutterella species isolates, and.
Presentation transcript:

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods: Models of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods 1.Construct multiple alignment of sequences 2.Construct table listing all pairwise differences (distance matrix) 3.Construct tree from pairwise distances Gorilla : ACGTCGTA Human : ACGTTCCT Chimpanzee: ACGTTTCG GoHuCh Go-44 Hu-2 Ch- Go Hu Ch

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Optimal Branch Lengths: Least Squares Fit between given tree and observed distances can be expressed as “sum of squared differences”:Fit between given tree and observed distances can be expressed as “sum of squared differences”: Q =  (D ij - d ij ) 2 Q =  (D ij - d ij ) 2 Find branch lengths that minimize Q - this is the optimal set of branch lengths for this tree.Find branch lengths that minimize Q - this is the optimal set of branch lengths for this tree. S1 S3 S2 S4 a b c d e Distance along tree D 12  d 12 = a + b + c D 13  d 13 = a + d D 14  d 14 = a + b + e D 23  d 23 = d + b + c D 24  d 24 = c + e D 34  d 34 = d + b + e Goal: j>i

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Superimposed Substitutions Actual number ofActual number of evolutionary events:5 Observed number ofObserved number of differences:2 Distance is (almost) always underestimatedDistance is (almost) always underestimated ACGGTGC C T GCGGTGA

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Model-based correction for superimposed substitutions Goal: try to infer the real number of evolutionary events (the real distance) based onGoal: try to infer the real number of evolutionary events (the real distance) based on 1. Observed data (sequence alignment) 2. A model of how evolution occurs

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Jukes and Cantor Model Four nucleotides assumed to be equally frequent (f=0.25)Four nucleotides assumed to be equally frequent (f=0.25) All 12 substitution rates assumed to be equalAll 12 substitution rates assumed to be equal Under this model the corrected distance is:Under this model the corrected distance is: D JC = x ln( x D OBS ) For instance:For instance: D OBS =0.43 => D JC =0.64 ACGT A -3     C    G    T   

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Other models of evolution

CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS General Time Reversible Model Time-reversibility: The amount of change from state x to y is equal to the amount of change from y to x π A x P AG = π G x P GA => π A x π G x  = π G x π A x 