Lecture 6B – Optimality Criteria: ML & ME

Slides:



Advertisements
Similar presentations
Modeling of Data. Basic Bayes theorem Bayes theorem relates the conditional probabilities of two events A, and B: A might be a hypothesis and B might.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
BALANCED MINIMUM EVOLUTION. DISTANCE BASED PHYLOGENETIC RECONSTRUCTION 1. Compute distance matrix D. 2. Find binary tree using just D. Balanced Minimum.
Phylogenetic Trees Lecture 4
1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.
Phylogenetic trees Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Chapter 2.
Maximum Likelihood. Likelihood The likelihood is the probability of the data given the model.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a Presentation partially taken.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem, random variables, pdfs 2Functions.
Maximum Likelihood. Historically the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Its slow uptake by the scientific community.
Branch lengths Branch lengths (3 characters): A C A A C C A A C A C C Sum of branch lengths = total number of changes.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Maximum Likelihood Flips usage of probability function A typical calculation: P(h|n,p) = C(h, n) * p h * (1-p) (n-h) The implied question: Given p of success.
. Maximum Likelihood (ML) Parameter Estimation with applications to reconstructing phylogenetic trees Comput. Genomics, lecture 6b Presentation taken from.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony Based on presentations by Dan Geiger, Shlomo.
CISC667, F05, Lec16, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (III) Probabilistic methods.
. Phylogenetic Trees Lecture 13 This class consists of parts of Prof Joe Felsenstein’s lectures 4 and 5 taken from:
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Terminology of phylogenetic trees
Why Models of Sequence Evolution Matter Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy.
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Phylogenetic trees School B&I TCD Bioinformatics May 2010.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Available at DNA variation in Ecology and Evolution DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic.
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Calculating branch lengths from distances. ABC A B C----- a b c.
Lecture 10 – Models of DNA Sequence Evolution Correct for multiple substitutions in calculating pairwise genetic distances. Derive transformation probabilities.
Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Lecture 6A – Introduction to Trees & Optimality Criteria Branches: n-taxa -> 2n-3 branches 1, 2, 4, 6, & 7 are external (leaves) 3 & 5 are internal branches.
Why do trees?. Phylogeny 101 OTUsoperational taxonomic units: species, populations, individuals Nodes internal (often ancestors) Nodes external (terminal,
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Phylogenetic Trees - Parsimony Tutorial #13
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
R. Kass/W03 P416 Lecture 5 l Suppose we are trying to measure the true value of some quantity (x T ). u We make repeated measurements of this quantity.
Building Phylogenies. Phylogenetic (evolutionary) trees Human Gorilla Chimp Gibbon Orangutan Describe evolutionary relationships between species Cannot.
Phylogenetic basis of systematics
Lecture 10 – Models of DNA Sequence Evolution
Inferring a phylogeny is an estimation procedure.
Lecture 6A – Introduction to Trees & Optimality Criteria
Clustering methods Tree building methods for distance-based trees
Multiple Alignment and Phylogenetic Trees
Goals of Phylogenetic Analysis
Recitation 5 2/4/09 ML in Phylogeny
Inferring phylogenetic trees: Distance and maximum likelihood methods
CS 581 Tandy Warnow.
Why Models of Sequence Evolution Matter
Lecture 6B – Optimality Criteria: ML & ME
BNFO 602 Phylogenetics – maximum likelihood
BNFO 602 Phylogenetics Usman Roshan.
The Most General Markov Substitution Model on an Unrooted Tree
Lecture 7 – Algorithmic Approaches
Lecture 10 – Models of DNA Sequence Evolution
Phylogeny.
Molecular data assisted morphological analyses
Lecture 6A – Introduction to Trees & Optimality Criteria
10.4 How to Construct a Cladogram
Presentation transcript:

Lecture 6B – Optimality Criteria: ML & ME LH = Pr (Data | Hypothesis) = P (D | H, M) L(t) = P (D | t, m) Just as in parsimony, we assume independence of characters. Single-site Likelihood

given the tree and model of evolution. Single-site Likelihood – Probability of a single column in the alignment, given the tree and model of evolution. Note, we’re indexing branches, labelled vx,y,, and we’re interested in their lengths (Units are substitutions per site, a function of rate x time). To calculate the SSL for this site, we sum the probabilities for all possible character-state reconstructions at each internal node. There are n-1 internal nodes, each of which could have one of 4 possible states. There are 4n-1 unique reconstructions: 4n-1 = 43 = 64.

Calculating Single-site Likelihoods (Sum probabilities of all reconstructions) r1 r2 r3 r64 or So now, the issue becomes how we calculate the Pr( Rr|t ).

Calculating Reconstruction Probabilities In reconstruction r, m is the state at node 3, k is the state at node 1, l is the state at node 2 Pr (Rr | t ) = pm x Pm,k(v3,1) x Pk,G(v1,w) x Pk,A(v1,x) x Pm,l(v3,2) x Pl,C(v2,y) x Pl,C(v2,z) pm is the frequency of the nucleotide m. This provides an estimate of the probability of observing state m at the root node. Pi,j is the probability of substitution between states i & j. This is derived the model of sequence evolution that we assume (much more on those in a few weeks). This is in some sense analogous to the step matrix that determines costs for transformations between states in the Sankoff algorithm in parsimony.

Calculating Reconstruction Probabilities Start at the root & traverse the tree Pr (Rr | t ) = pm x Pm,k(v3,1) x Pk,G(v1,w) x Pk,A(v1,x) x Pm,l(v3,2) x Pl,C(v2,y) x Pl,C(v2,z) So the SSL would be the sum of all possible reconstructions. Each summation is across all four nucleotides. Branch lengths (vi,j) are parameters to be optimized.

A few points to note: Site patterns The frequency of site pattern a. 1 A G T A C A . . . . . . . . . . . . . . . 2 A G T A . . . . . . . . . . . . . . . . . 3 A G T A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . n A G T A . . . . . . . . . . . . . . . . . Pattern 1 2 3 1 . . . . . . . . . . . . . . . .(a) Site patterns The frequency of site pattern a.

C. Minimum Evolution

Additivity Additivity: dAB = pAB = v1 + v2 , dAC = pAC = v1 + v3 + v4 , dAD = pAD = v1 + v3 + vd , dBC = pBC = v2 + v3 + v4 , dBD = pBD = v2 + v3 + v5 , dCD = pCD = v4 + v5 a usually equals 2 (so this is a least-squares method). wij allows weighting of the piar-wise error terms. wij = 1 assumes that the errors are identical across all dij. wij = 1/dij assumes that errors are proportional to dij. wij = 1/d2ij assumes that errors are proportional to the square root of dij. wij = 1/s2ij weights by the expected variance in the dij (which may not be known).

Relationships among Optimality Criteria Both MP and ML are character based (whereas ME is not). Both ME and MP minimize the amount of evolution (i.e., sum of branch lengths). Both ME and ML rely on an explicit model of sequence evolution.