Maximum Likelihood (ML) Parameter Estimation, with applications to inferring phylogenetic trees. Comput. Genomics, lecture 7a. Presentation partially taken from Dan Geiger, modified by Benny Chor. Background reading: Durbin et al., Chapter 8.

2 Our Probabilistic Model (Reminder). Now we know neither the states at the internal node(s) nor the edge parameters p_e1, p_e2, p_e3. A single edge is a fairly boring tree… [Figure: three leaves carrying the sequences XXYXY, YXYXX, YYYYX, joined to an internal node with unknown states (?????) by edges with parameters p_e1, p_e2, p_e3.]

3 Maximum Likelihood. Maximize the likelihood (over the edge parameters) while averaging over the states of the unknown internal node(s). [Figure: the same tree as on slide 2.]

4 Maximum Likelihood (2). Consider the phylogenetic tree to be a stochastic process. The probability of a transition from character a to character b along edge e is given by the parameter p_e. Given the complete tree, the likelihood of the data is determined by the values of the p_e's. [Figure: a tree whose nodes carry three-character states such as XYX, YYX, XXX, XXY; the leaf states are observed, the internal states are unobserved.]

5 Maximum Likelihood (3). We assume each site evolves independently of the others. This allows us to decompose the likelihood of the data (the sequences at the leaves) into a product over sites, given the (same) tree and edge probabilities: Pr(D | Tree, θ) = ∏_i Pr(D^(i) | Tree, θ). This is the first key to an efficient DP algorithm for the tiny ML problem (Felsenstein, 1981). We will now show how Pr(D^(i) | Tree, θ) is computed efficiently. [Figure: aligned leaf sequences, one column per site.]
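To make the per-site decomposition concrete, here is a minimal Python sketch of the outer loop; the helper per_site_lik is a placeholder for the per-site computation developed on the next slides, and all names here are illustrative rather than from the lecture. Summing log-likelihoods avoids multiplying many small numbers directly.

```python
import math

def log_likelihood(sites, tree, theta, per_site_lik):
    """Total log-likelihood of the leaf data, assuming sites evolve independently.

    `sites` holds the per-site leaf data D(1), ..., D(m); `per_site_lik(site, tree, theta)`
    is assumed to return Pr(D(i) | Tree, theta).  Summing logs corresponds to the
    product Pr(D | Tree, theta) = prod_i Pr(D(i) | Tree, theta) above.
    """
    return sum(math.log(per_site_lik(site, tree, theta)) for site in sites)
```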

6 Computing the Likelihood. Let T be a binary tree with subtrees T_1 and T_2. Let L_X(D | T, θ) be the likelihood of T with X at T's root; define L_Y(D | T, θ) similarly. [Figure: a root node above tree 1 and tree 2, reached by edges with substitution probabilities p_1 and p_2.]

7 Computing the Likelihood (2). By the definition of likelihood (summing over the internal assignments), L(D | T, θ) = L_X(D | T, θ) + L_Y(D | T, θ). This is the second key to an efficient DP algorithm for the tiny ML problem (Felsenstein, 1981).

8 Computing L_X(D | Tree, θ). [Figure: the root is fixed to state X; its subtrees tree 1 and tree 2 hang from edges with substitution probabilities p_1 and p_2, and each subtree's root may be in state X or Y.]

9 Computing L_X(D | Tree, θ). L_X(D | Tree, θ) = ( L_X(D | Tree 1, θ)·(1 - p_1) + L_Y(D | Tree 1, θ)·p_1 ) · ( L_X(D | Tree 2, θ)·(1 - p_2) + L_Y(D | Tree 2, θ)·p_2 ).
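The recursion above translates almost literally into code; this sketch (names are illustrative) combines the likelihood pairs of the two subtrees into the pair for their parent, for the two-state alphabet {X, Y}.

```python
def node_likelihoods(L1, L2, p1, p2):
    """Combine the {'X': L_X, 'Y': L_Y} pairs of tree 1 and tree 2 into the pair
    for their parent node, given edge substitution probabilities p1 and p2."""
    def edge(L, p, state):
        # Contribution of one child subtree when the parent is in `state`:
        # no change across the edge with probability 1 - p, a change with probability p.
        other = 'Y' if state == 'X' else 'X'
        return L[state] * (1 - p) + L[other] * p

    return {s: edge(L1, p1, s) * edge(L2, p2, s) for s in ('X', 'Y')}
```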

10 The Dynamic Programming Algorithm. The algorithm starts at the leaves and proceeds up towards the root. For each subtree visited, keep both L_X(D | subtree, θ) and L_Y(D | subtree, θ). Each of L_X and L_Y w.r.t. T can then be computed with 5 multiplications and 2 additions.

11 The Dynamic Programming Algorithm. The algorithm thus takes O(1) floating-point operations per internal node of the tree. If there are n leaves, a rooted binary tree has n - 1 internal nodes, so the overall complexity is O(n).

12 What About Initialization? Well, this is easy. If T is a leaf that contains X, then L_X(D | T, θ) = 1 and L_Y(D | T, θ) = 0. (The case where T is a leaf that contains Y is left as a bonus assignment.)
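Putting the leaf initialization and the recursion together gives the full tiny-ML DP for a single site. The tree encoding below (a leaf is its observed character, an internal node is a tuple (left, p_left, right, p_right)) is an assumption made for this sketch, not notation from the slides.

```python
def site_likelihood(tree):
    """Return {'X': L_X, 'Y': L_Y} for one site by Felsenstein's pruning, two states."""
    if tree in ('X', 'Y'):
        # Leaf: likelihood 1 for the observed character, 0 for the other one.
        return {'X': 1.0 if tree == 'X' else 0.0, 'Y': 1.0 if tree == 'Y' else 0.0}
    left, p1, right, p2 = tree
    L1, L2 = site_likelihood(left), site_likelihood(right)
    def edge(L, p, state):
        other = 'Y' if state == 'X' else 'X'
        return L[state] * (1 - p) + L[other] * p
    return {s: edge(L1, p1, s) * edge(L2, p2, s) for s in ('X', 'Y')}

# Example on a four-leaf tree; the per-site likelihood sums over the root state (slide 7):
# L = site_likelihood(((('X', 0.1, 'Y', 0.1), 0.2, 'X', 0.3), 0.05, 'Y', 0.2))
# pr_site = L['X'] + L['Y']
```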

13 A Few More Question Marks. What if the tree is not binary? This would not really affect the complexity. What if the tree is unrooted? One can show that symmetry of the substitution probabilities implies the likelihood is invariant under the choice of root. Numerical issues (underflow, stability). Non-binary alphabets.
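On the underflow point raised here: per-site likelihoods shrink geometrically with tree size, so implementations usually either work in log space or rescale at each internal node and accumulate the log scaling factors. This is a standard remedy rather than something spelled out on the slide; a minimal sketch:

```python
import math

def rescale(L):
    """Rescale a {'X': ..., 'Y': ...} likelihood pair so its largest entry is 1.

    Returns the rescaled pair and log(scale); the true per-site log-likelihood is
    log(L['X'] + L['Y']) plus the sum of the log-scales accumulated over all nodes.
    """
    scale = max(L.values())
    return {s: v / scale for s, v in L.items()}, math.log(scale)
```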

14 From Two to Four States Model. Maximize the likelihood (over the edge parameters) while averaging over the states of the unknown internal node(s). But what do the edge probabilities mean now? [Figure: three leaves carrying the DNA sequences ACCGT, AAGTT, CGGCT, joined to an internal node with unknown states (?????) by edges with parameters p_e1, p_e2, p_e3.]

15 From Two to Four States Model (2). So far, our models consisted of a "regular" tree where, in addition, the edges are assigned substitution probabilities. For simplicity, we assumed our "DNA" has only two states, say X and Y. If edge e is assigned probability p_e, this means that the probability of a substitution (X ↔ Y) across e is p_e. Now a single p_e can no longer express all 16 - 4 = 12 possible substitution probabilities.

16 From Two to Four States Model (3). Now a single p_e can no longer express all 16 - 4 = 12 possible substitution probabilities. The most general model will indeed have 12 independent parameters per edge, e.g. p_e(C→A), p_e(T→A), etc.; it need not be symmetric. Still, the most popular models are symmetric and use far fewer parameters per edge. For example, the Jukes-Cantor substitution model assumes an equal substitution probability for any pair of distinct nucleotides (on each edge separately).
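To illustrate "12 independent parameters per edge": the 12 off-diagonal entries of a 4x4 substitution matrix can be chosen freely (as long as each row remains a probability distribution), and the diagonal is then forced. A small sketch, with an assumed dict-based representation:

```python
NUCS = 'ACGT'

def edge_matrix(off_diag):
    """Build a per-edge 4x4 substitution matrix from its 12 off-diagonal probabilities.

    `off_diag` maps ordered pairs such as ('C', 'A') to p_e(C -> A); the diagonal
    entry p_e(a -> a) is whatever probability mass remains in row a.
    """
    P = {}
    for a in NUCS:
        row = sum(off_diag[(a, b)] for b in NUCS if b != a)
        assert row <= 1.0, "off-diagonal probabilities in a row exceed 1"
        for b in NUCS:
            P[(a, b)] = off_diag[(a, b)] if a != b else 1.0 - row
    return P
```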

17 The Jukes-Cantor model (1969). Jukes and Cantor assume an equal probability of change between any two distinct nucleotides; per edge, the substitution matrix is
        A       G       C       T
  A   1-3α      α       α       α
  G     α     1-3α      α       α
  C     α       α     1-3α      α
  T     α       α       α     1-3α
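Under the same dict-based representation as in the previous sketch, the Jukes-Cantor matrix is determined by the single parameter α (which must satisfy 0 ≤ α ≤ 1/3):

```python
NUCS = 'ACGT'

def jukes_cantor(alpha):
    """Per-edge Jukes-Cantor substitution matrix: each of the three possible changes
    has probability alpha, so staying put has probability 1 - 3*alpha."""
    assert 0.0 <= alpha <= 1.0 / 3.0
    return {(a, b): (1 - 3 * alpha) if a == b else alpha
            for a in NUCS for b in NUCS}
```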

18 Tiny ML on Four States: Like Before, Only More Cases. The same DP can handle DNA substitution models, AA substitution models, and so on. The constant number of operations per node depends on the alphabet size. For example, with the parent in state G and the left subtree conditioned on C, the recursion contains the term P(G→C) · P_C(left subtree).
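The four-state recursion at an internal node, in the same style as the two-state sketches: for each possible parent state a, sum over the child's state b, weighted by the edge's substitution probability P(a→b) (so terms like P(G→C) · P_C(left subtree) appear). The names and the dict-based matrices are assumptions carried over from the earlier sketches.

```python
NUCS = 'ACGT'

def node_likelihoods4(L1, P1, L2, P2):
    """Combine two children's conditional likelihoods {state: L_state} into the
    parent's, for DNA. P1 and P2 map (parent_state, child_state) to the
    substitution probability on the corresponding edge."""
    def edge(L, P, a):
        # Sum over the child subtree's root state b.
        return sum(P[(a, b)] * L[b] for b in NUCS)
    return {a: edge(L1, P1, a) * edge(L2, P2, a) for a in NUCS}
```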

19 Kimura's K2P model (1980). The Jukes-Cantor model does not take into account that transition rates, A ↔ G (between purines) and C ↔ T (between pyrimidines), differ from transversion rates (A ↔ C, A ↔ T, C ↔ G, G ↔ T). Kimura's two-parameter model therefore uses a different substitution matrix, with α for transitions and β for transversions:
        A         G         C         T
  A   1-α-2β      α         β         β
  G     α       1-α-2β      β         β
  C     β         β       1-α-2β      α
  T     β         β         α       1-α-2β
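The K2P matrix in the same dict-based representation, with α for transitions and β for transversions (requiring α + 2β ≤ 1); a sketch:

```python
def kimura_k2p(alpha, beta):
    """Per-edge Kimura two-parameter substitution matrix: transitions (A<->G, C<->T)
    get probability alpha, transversions get beta, diagonals get 1 - alpha - 2*beta."""
    purines = {'A', 'G'}
    def prob(a, b):
        if a == b:
            return 1 - alpha - 2 * beta
        # A transition keeps the chemical class (purine or pyrimidine); otherwise it is a transversion.
        return alpha if (a in purines) == (b in purines) else beta
    return {(a, b): prob(a, b) for a in 'ACGT' for b in 'ACGT'}
```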

20 Kimura's K2P model (cont.). Using similar methods, this leads to the K2P distance correction K = -(1/2)·ln(1 - 2P - Q) - (1/4)·ln(1 - 2Q), where P and Q are the observed fractions of transitions and transversions, respectively.
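Assuming the distance correction above is indeed the intended result of this slide, it is a one-liner; P and Q are the observed fractions of transition and transversion differences between two aligned sequences.

```python
import math

def k2p_distance(P, Q):
    """Standard K2P corrected distance; defined only when 2*P + Q < 1 and 2*Q < 1."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```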

21 Additional Models. There are yet more involved DNA substitution models, reflecting further phenomena observed in DNA. Some of the models (like Jukes-Cantor, Kimura's two-parameter model, and others) exhibit a "group-like" structure that helps the analysis. The most general of these is a matrix where all rates of change are distinct (12 parameters). Models for AA (protein) sequences typically have less structure. Further discussion is outside the scope of this course; please refer to the Molecular Evolution course (life sciences).

22 Back to the Two-State Model. We showed an efficient solution to the tiny ML problem. We now want to solve the tiny AML problem efficiently.

23 Two Ways to Go. In the second way (maximizing over the states of the internal nodes, rather than averaging over them) we are looking for the "most likely" ancestral states. This is called ancestral maximum likelihood (AML). In some sense, AML is "between" MP (it reconstructs ancestral states) and ML (the goal is still to maximize a likelihood).

24 Two Ways to Go. In some sense, AML is "between" MP (it reconstructs ancestral states) and ML (the goal is still to maximize a likelihood). The tiny AML algorithm will resemble Fitch's small-parsimony algorithm: it goes up to the root, then back down to the leaves.

25 Computing the Ancestral Likelihood. Let T be a binary tree with subtrees T_1 and T_2. Let L_E(D | T, θ) be the ancestral likelihood of T with E (X or Y) at T's father, i.e., maximized over the states of T's internal nodes given state E at the father. [Figure: subtree T hangs from its father (state E) by an edge with substitution probability p; T's root has subtrees tree 1 and tree 2 on edges p_1 and p_2.]

26 Computing the Ancestral Likelihood (2). By the definition of the ancestral likelihood (maximizing over the internal assignments), L_X(D | T, θ) = max( (1 - p)·L_X(D | tree 1, θ)·L_X(D | tree 2, θ), p·L_Y(D | tree 1, θ)·L_Y(D | tree 2, θ) ). This is the key to an efficient DP algorithm for the tiny AML problem (Pupko et al., 2000).

27 Computing the Ancestral Likelihood (3). Boundary conditions: at a leaf, L_X(D | T, θ) = 1 - p if the leaf's label is X, and p otherwise. At the root, we pick the label E (X or Y) that maximizes L_E(D | tree 1, θ)·L_E(D | tree 2, θ). We then go down the tree: at each node we pick the label that maximizes the likelihood given the (already known) label of its father. The total running time is O(n).
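A sketch of the whole tiny-AML procedure from the last three slides. The encoding is an assumption made for this sketch: a leaf is ('leaf', label, p) and an internal node is ('node', left, right, p), where p is the substitution probability on the edge from the node's father; the root is handled separately, as on this slide.

```python
def up(tree):
    """Bottom-up pass: for each node, tabulate, for each possible father state E,
    the best achievable subtree likelihood together with the node state achieving it."""
    if tree[0] == 'leaf':
        _, label, p = tree
        # A leaf's state is observed: the only question is whether the edge changed.
        table = {E: ((1 - p) if label == E else p, label) for E in 'XY'}
        return {'table': table, 'children': []}
    _, left, right, p = tree
    l, r = up(left), up(right)
    table = {}
    for E in 'XY':                      # state of this node's father
        table[E] = max(
            (((1 - p) if S == E else p) * l['table'][S][0] * r['table'][S][0], S)
            for S in 'XY')              # S = state chosen for this node itself
    return {'table': table, 'children': [l, r]}

def down(rec, father_state, labels):
    """Top-down pass: fix each node to the stored argmax given its father's label."""
    _, state = rec['table'][father_state]
    labels.append(state)
    for child in rec['children']:
        down(child, state, labels)

# Usage: the root has two subtrees t1, t2 encoded as above.
# l, r = up(t1), up(t2)
# root_state = max('XY', key=lambda E: l['table'][E][0] * r['table'][E][0])
# labels = [root_state]; down(l, root_state, labels); down(r, root_state, labels)
```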