Maximum Likelihood Flips usage of probability function A typical calculation: P(h|n,p) = C(h, n) * p h * (1-p) (n-h) The implied question: Given p of success.

Slides:



Advertisements
Similar presentations
Maximum Likelihood:Phylogeny Estimation Neelima Lingareddy.
Advertisements

Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
An Introduction to Phylogenetic Methods
Phylogenetic Trees Lecture 4
1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.
Phylogenetic reconstruction
Maximum Likelihood. Likelihood The likelihood is the probability of the data given the model.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Model Selection Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Phylogenetic Reconstruction: Distance Matrix Methods Anders Gorm Pedersen Molecular Evolution Group Center for.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a Presentation partially taken.
Maximum Likelihood. Historically the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Its slow uptake by the scientific community.
Distance Methods. Distance Estimates attempt to estimate the mean number of changes per site since 2 species (sequences) split from each other Simply.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Lecture 24 Inferring molecular phylogeny Distance methods
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
CISC667, F05, Lec16, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (III) Probabilistic methods.
The concept of likelihood refers given some data D, a decision must be made about an adequate explanation of the data. In the phylogenetic framework, one.
Probabilistic methods for phylogenetic trees (Part 2)
Lecture 13 – Performance of Methods Folks often use the term “reliability” without a very clear definition of what it is. Methods of assessing performance.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Model Selection Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical.
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Terminology of phylogenetic trees
BINF6201/8201 Molecular phylogenetic methods
Molecular phylogenetics
Why Models of Sequence Evolution Matter Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy.
.. . Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 6a Presentation taken from.
Tree Inference Methods
COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Lecture 25 - Phylogeny Based on Chapter 23 - Molecular Evolution Copyright © 2010 Pearson Education Inc.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Molecular phylogenetics 4 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Lecture 2: Principles of Phylogenetics
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Lecture 10 – Models of DNA Sequence Evolution Correct for multiple substitutions in calculating pairwise genetic distances. Derive transformation probabilities.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Statistical stuff: models, methods, and performance issues CS 394C September 16, 2013.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Phylogeny Ch. 7 & 8.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Probabilistic methods for phylogenetic tree reconstruction BMI/CS 576 Colin Dewey Fall 2015.
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
Bioinf.cs.auckland.ac.nz Juin 2008 Uncorrelated and Autocorrelated relaxed phylogenetics Michaël Defoin-Platel and Alexei Drummond.
Statistical stuff: models, methods, and performance issues CS 394C September 3, 2009.
Building Phylogenies. Phylogenetic (evolutionary) trees Human Gorilla Chimp Gibbon Orangutan Describe evolutionary relationships between species Cannot.
Molecular Evolution. Study of how genes and proteins evolve and how are organisms related based on their DNA sequence Molecular evolution therefore is.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
Introduction to Bioinformatics Resources for DNA Barcoding
Phylogenetic basis of systematics
Distance based phylogenetics
Lecture 6B – Optimality Criteria: ML & ME
The ideal approach is simultaneous alignment and tree estimation.
Maximum likelihood (ML) method
Inferring phylogenetic trees: Distance and maximum likelihood methods
Molecular Evolution.
Why Models of Sequence Evolution Matter
Lecture 6B – Optimality Criteria: ML & ME
The Most General Markov Substitution Model on an Unrooted Tree
Phylogenetic analysis of AquK2P.
Presentation transcript:

Maximum Likelihood Flips usage of probability function A typical calculation: P(h|n,p) = C(h, n) * p h * (1-p) (n-h) The implied question: Given p of success in single trial, what is probability of h success over n trials?

The ML question The ML calculation: L(p|nh) = C(h, n) * p h * (1-p )(n-h) What is probability that parameter p results in h success over n trials? Experiment with test values of p and choose the one that results in highest likelihood

Consider a small alignment AG AC TG Let #sequences = s = 3 Each position a data point For each position, 4 s possible values, e.g. {A,A,A},{A,A,T}… In this example, 64 possible values each position.

Probability/Likelihood function The simplest model – use an arbitrary p for each of the 64 possible values based on its observed freq. 2 patterns have p=0.5, all others p=0. Result “works” but is not biologically interesting.

Maximum likelihood testing model

Definition Method for the inference of phylogeny Method that searches for the tree with the highest probability or likelihood.

Example going through the Maximum likelihood model Assume that we have the aligned nucleotide sequences for four taxa: (1) A G G C U C C A A....A (2) A G G U U C G A A....A (3) A G C C C A G A A.... A (4) A U U U C G G A A.... C Evaluate the likelihood of the uprooted tree represented by the nucleotides of site j in the sequence

Since the likelihood of the tree is independent of the position of the root, we can display the figure as shown in Figure B. Assume that the nucleotides evolve independently (the Markovian model of evolution) Calculate the likelihood for each site separately and combine the likelihood into a total value towards the end.. To calculate the likelihood for site j, we have to consider all the possible scenarios by which the nucleotides present at the tips of the tree could have evolved. Therefore the likelihood for a particular site is the summation of the probabilities of every possible reconstruction of ancestral states, given some model of base substitution.

So in this specific case all possible nucleotides A, G, C, and T occupying nodes (5) and (6), or 4 x 4 = 16 possibilities: Protein sequences each site may occupy 20 states (that of the 20 amino acids) 20x20 thus 400 possibilities have to be considered. Since any one of these scenarios could have led to the nucleotide configuration at the tip of the tree, we must calculate the probability of each and sum them to obtain the total probability for each site j.

The likelihood for the full tree then is product of the likelihood at each site. Since the individual likelihoods are extremely small numbers it is convenient to sum the log likelihoods at each site and report the likelihood of the entire tree as the log likelihood.

This above procedure is then repeated for all possible topologies (for all possible trees). The tree with the highest probability is the tree with the highest maximum likelihood.

Hulsenbeck J., Crandall, K. Annu. Rev. Ecol. Syst., 1997, 28:

DNA Substitution Models

General DNA Substitution Model Likelihood L is the propability of observing data D given hypothesis H L = Pr(D/H) The use of maximum likelihood (ML) algorithms in developing phylogenetic hypotheses requires a model of evolution.

The rate matrix for a general model of DNA substitution is given by The rows and columns are ordered A, C, G and T. The matrix gives the rate of change from nucleotide i(arranged along the rows) to nucleotide j(along the columns). For example r2pC gives the rate of change from A to C.

Let P(v,s) be the transition probability matrix where p i,j (v,s) is the probability that nucleotide i changes into j over branch length v. The vector s contains the parameters of the substitution model(eg. pA, pC, pG, pT, r1,r2…). For two-state case, to calculate the probability of observing a change over a branch of length v, the following matrix calculation is performed: P (v,s) = e Qv

DNA substitution Models

Advantages of Maximum likelihood Lower variance than other methods Least affected by sampling error Robust to many violations of the assumptions of the evolutionary model, even with very short sequences, they outperform other methods). Are less error prone. Statistically well founded. Evaluate different tree topologies.

Disadvantages of Maximum likelihood CPU intensive and may take a long time to complete an evaluation The result is dependent on the model of evolution used.