Recitation 5 2/4/09 ML in Phylogeny


Comp. Genomics Recitation 5, 2/4/09: ML in Phylogeny. Based on slides by Ron Shamir and Nir Friedman.

Outline: Maximum likelihood (ML); ML in phylogeny; Ancestral sequence reconstruction using ML.

Maximum likelihood. One of the methods for parameter estimation. Likelihood: $L = P(\text{Data} \mid \text{Parameters})$. Simple example: a coin with $P(\text{head}) = p$; 10 coin tosses give 6 heads and 4 tails, so $L = P(\text{Data} \mid \text{Params}) = \binom{10}{6} p^6 (1-p)^4$.

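Maximum likelihood. We want to find the $p$ that maximizes $L = \binom{10}{6} p^6 (1-p)^4$. Infi 1, remember? Log is a monotonically increasing function, so we can optimize $\log L = \log \binom{10}{6} + 6 \log p + 4 \log(1-p)$ instead. Differentiating with respect to $p$ and setting the derivative to zero gives $6/p - 4/(1-p) = 0$. Estimate for $p$: 0.6, the observed fraction of heads. (Makes sense?)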
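The coin example can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `log_likelihood` and the grid resolution are illustrative choices. It evaluates the log-likelihood on a grid of candidate values of p and picks the best one.

```python
import math

def log_likelihood(p, heads=6, tails=4):
    """log L = log C(n, heads) + heads*log(p) + tails*log(1-p)."""
    n = heads + tails
    return (math.log(math.comb(n, heads))
            + heads * math.log(p)
            + tails * math.log(1.0 - p))

# Candidate values of p strictly between 0 and 1 (the log is undefined at the ends).
grid = [i / 1000 for i in range(1, 1000)]

# The maximizer of log L is also the maximizer of L, because log is monotonic.
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

The grid search agrees with the analytic solution of 6/p - 4/(1-p) = 0; for a one-parameter problem like this the closed form is of course preferable.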

Likelihood of a Tree. Input (small problem): n sequences and a tree T with labels on the leaves (X). Find the optimal labeled tree: a labeling of the internal nodes (Y) and branch lengths (b) maximizing the likelihood P(X|T,Y,b).

Likelihood (2). How to compute P(X|T,Y,b)? Assumptions: each character is independent. The branching is a Markov process: the probability of a node having a given label is a function only of the parent node's label and the branch length b between them. The probabilities P(x|y,t) are known.
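Under these assumptions, the probability of one fully labeled site is a product of one term per branch. The sketch below is illustrative only: the slides say the probabilities P(x|y,t) are known but do not name a model, so the Jukes-Cantor model is assumed here, along with a uniform prior of 1/4 on the root label. The names `jc_prob`, `labeled_tree_likelihood` and the small example tree are hypothetical.

```python
import math

def jc_prob(child, parent, t):
    """P(child | parent, t) under the Jukes-Cantor model (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e

def labeled_tree_likelihood(edges, labels, root_prior=0.25):
    """Probability of one site when every node, leaf or internal, is labeled.

    edges  : list of (child, parent, branch_length) triples
    labels : dict mapping every node name to its state
    Includes the prior of the root label, so this is the joint probability
    of all labels on the tree.
    """
    like = root_prior
    for child, parent, t in edges:
        # Markov property: each branch contributes an independent factor.
        like *= jc_prob(labels[child], labels[parent], t)
    return like

# Hypothetical tree: root -> (u, x3), u -> (x1, x2).
edges = [("u", "root", 0.1), ("x3", "root", 0.3),
         ("x1", "u", 0.2), ("x2", "u", 0.2)]
labels = {"root": "A", "u": "A", "x1": "A", "x2": "A", "x3": "C"}
print(labeled_tree_likelihood(edges, labels))
```

For a sequence with several characters, the per-site values are multiplied together (or their logs added), since the characters are assumed independent.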

Example. [Figure: tree over nodes x1, x2, x3, x4, x5 with branch lengths t1, t2, t3, t5.]

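What if we want P(X|T,b)? Assume that the branch lengths b are known. Sum over all labelings Y of the internal nodes: $P(X \mid T, b) = \sum_Y P(X, Y \mid T, b)$, using independence of sites and the Markov property (independence of each branch). ALGMB, December 01 © Ron Shamir, TAU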

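Properties of P. Additivity: $\sum_y P(x \mid y, t)\, P(y \mid z, s) = P(x \mid z, t+s)$. Reversibility: $P(x)\, P(y \mid x, t) = P(y)\, P(x \mid y, t)$, where $P(x)$ is the stationary probability of state x. Reversibility allows us to freely move the root.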

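Efficient Likelihood Calculation (Felsenstein '73). Use dynamic programming, similar to parsimony.
Need: $S_j(v,a) = \Pr(\text{leaves of the subtree rooted in } v \mid v_j = a)$, where $v_j$ is the state of node v at character j.
Initialization: for each leaf v, set $S_j(v,a) = 1$ if v is labeled by a at character j; otherwise $S_j(v,a) = 0$.
Recursion: traverse the tree in postorder. For each node v with children u and w, and for each state a, set $S_j(v,a) = \left[\sum_x P(x \mid a, t_{vu})\, S_j(u,x)\right] \cdot \left[\sum_y P(y \mid a, t_{vw})\, S_j(w,y)\right]$.
Complexity: $O(nmk^2)$ for n species, m characters, k states.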
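The recursion can be sketched for a single character as follows. This is an illustration under stated assumptions rather than the slides' own code: the Jukes-Cantor model and the uniform root prior of 1/4 are assumed, and the names `pruning`, `site_likelihood` and the example tree are hypothetical.

```python
import math

STATES = "ACGT"

def jc_prob(child, parent, t):
    """P(child | parent, t) under the Jukes-Cantor model (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e

def pruning(tree, node, leaf_state):
    """Return {a: Pr(leaves below node | node has state a)} for one character.

    tree       : dict mapping an internal node to a list of (child, branch_length)
    leaf_state : dict mapping each leaf to its observed state
    """
    children = tree.get(node, [])
    if not children:  # leaf: indicator of the observed state
        return {a: 1.0 if a == leaf_state[node] else 0.0 for a in STATES}
    S = {a: 1.0 for a in STATES}
    for child, t in children:  # postorder: children are solved first
        S_child = pruning(tree, child, leaf_state)
        for a in STATES:
            S[a] *= sum(jc_prob(b, a, t) * S_child[b] for b in STATES)
    return S

def site_likelihood(tree, root, leaf_state, prior=0.25):
    """Sum the root table over root states, weighted by the root prior."""
    S = pruning(tree, root, leaf_state)
    return sum(prior * S[a] for a in STATES)

# Hypothetical tree: root -> (u, x3), u -> (x1, x2).
tree = {"root": [("u", 0.1), ("x3", 0.3)], "u": [("x1", 0.2), ("x2", 0.2)]}
print(site_likelihood(tree, "root", {"x1": "A", "x2": "A", "x3": "C"}))
```

Each internal node does k sums of k terms per child, which is where the k^2 factor in the complexity comes from; repeating the call for each of the m characters gives the full sequence likelihood.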

Ancestral sequence reconstruction. Input: a rooted tree with extant (leaf) sequences, a substitution matrix, and branch lengths. Problem: find the assignment of sequences to the internal nodes that maximizes the total tree likelihood.

Solving ancestral sequence reconstruction. Simple with parsimony methods, through the Fitch/Sankoff algorithms. Here, we're interested in ML: maximizing P(ancestral S | contemporary S). Joint vs. marginal. Marginal: focus on a single node (e.g., the root) and maximize its likelihood. Joint: infer all the sequences together.

Solutions. We can enumerate all the possible ancestral states and check their likelihood: there are $c^n$ possible combinations per character, where c is the number of possible states and n is the number of internal nodes. This is inapplicable when the tree is large. Koshi and Goldstein (1996): fast algorithm for marginal reconstruction. Pupko, Pe'er, Shamir and Graur (2004): fast algorithm for joint reconstruction.
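The exhaustive approach is easy to write down, which also makes its cost visible. The sketch below is an illustration only: the Jukes-Cantor model and a uniform root prior are assumed, and `brute_force_joint` and the example tree are hypothetical names.

```python
import itertools
import math

STATES = "ACGT"

def jc_prob(child, parent, t):
    """P(child | parent, t) under the Jukes-Cantor model (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e

def brute_force_joint(edges, leaf_state, internal, prior=0.25):
    """Try every assignment of states to the internal nodes for one character.

    edges    : list of (child, parent, branch_length) triples
    internal : list of internal node names (root included)
    The loop runs len(STATES) ** len(internal) times.
    """
    best_like, best_states = -1.0, None
    for combo in itertools.product(STATES, repeat=len(internal)):
        labels = dict(leaf_state)
        labels.update(zip(internal, combo))
        like = prior
        for child, parent, t in edges:
            like *= jc_prob(labels[child], labels[parent], t)
        if like > best_like:
            best_like, best_states = like, dict(zip(internal, combo))
    return best_states, best_like

# Hypothetical tree: root -> (u, x3), u -> (x1, x2); all leaves observed as A.
edges = [("u", "root", 0.1), ("x3", "root", 0.3),
         ("x1", "u", 0.2), ("x2", "u", 0.2)]
leaves = {"x1": "A", "x2": "A", "x3": "A"}
states, like = brute_force_joint(edges, leaves, ["root", "u"])
print(states, like)
```

With 4 states and 2 internal nodes this is only 16 evaluations, but the count grows as c^n, so it is useful mainly as a reference for checking faster algorithms on small trees.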

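Basics. We assume different sites evolve independently, so we work one site at a time. $P_{ij}(t)$: the probability that state i is replaced by state j along a branch of length t. We want to maximize P(v|data) = P(data|v) P(v) / P(data). P(data) is constant, so it suffices to maximize P(data|v) P(v).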

DP to the rescue. DP is often suitable for tree problems. Idea: start from the leaves and climb up the tree. The subtree under every node depends only on the state of its parent! For each node x compute $L_x(i)$ and $C_x(i)$. $L_x(i)$: the likelihood of x's subtree under the condition that its parent is assigned state i. $C_x(i)$: the state of x that gives rise to this likelihood.

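Algorithm phase I.
Initialization: for a leaf y assigned state j, with branch length $t_y$ to its parent: $C_y(i) = j$, $L_y(i) = P_{ij}(t_y)$.
Progression: for an internal node z with sons x, y already visited and branch length $t_z$ to its parent: for each i we compute $L_z(i) = \max_j P_{ij}(t_z)\, L_x(j)\, L_y(j)$ and $C_z(i)$ = the j attaining this maximum.
Termination: for the root with sons x, y, z, choose the state k maximizing $P_k\, L_x(k)\, L_y(k)\, L_z(k)$, where $P_k$ is the prior probability of state k.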

Algorithm phase II: "Traceback". Traverse the tree from the root to the leaves. For every internal node x whose father y has already been reconstructed with state i, reconstruct the state of x by setting it to $C_x(i)$. Continue until all the nodes are reconstructed.
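The two phases can be sketched for one site as follows. This is an illustrative sketch, not the published implementation: the Jukes-Cantor model and the uniform root prior are assumptions, and `joint_reconstruction`, `up`, `down` and the example tree are hypothetical names. Here L[x][i] is the best likelihood of x's subtree given that x's parent has state i, and C[x][i] is the state of x achieving it.

```python
import math

STATES = "ACGT"

def jc_prob(i, j, t):
    """P_ij(t) under the Jukes-Cantor model (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def joint_reconstruction(tree, root, leaf_state, prior=0.25):
    """tree maps each internal node to a list of (child, branch_length)."""
    L, C = {}, {}

    def up(node, t):
        """Phase I: fill L[node] and C[node]; t is the branch to the parent."""
        children = tree.get(node, [])
        if not children:  # leaf with observed state j
            j = leaf_state[node]
            L[node] = {i: jc_prob(i, j, t) for i in STATES}
            C[node] = {i: j for i in STATES}
            return
        for child, tc in children:
            up(child, tc)
        L[node], C[node] = {}, {}
        for i in STATES:  # state of the parent
            scores = {j: jc_prob(i, j, t) * math.prod(L[c][j] for c, _ in children)
                      for j in STATES}
            best = max(scores, key=scores.get)
            L[node][i], C[node][i] = scores[best], best

    for child, tc in tree[root]:
        up(child, tc)
    # Termination: pick the root state k maximizing prior * product over sons.
    root_scores = {k: prior * math.prod(L[c][k] for c, _ in tree[root])
                   for k in STATES}
    k = max(root_scores, key=root_scores.get)
    states = {root: k}

    def down(node):
        """Phase II: traceback from the root towards the leaves."""
        for child, _ in tree.get(node, []):
            if tree.get(child):  # only internal nodes need reconstruction
                states[child] = C[child][states[node]]
                down(child)

    down(root)
    return states, root_scores[k]

# Hypothetical tree: root -> (u, x3), u -> (x1, x2).
tree = {"root": [("u", 0.1), ("x3", 0.3)], "u": [("x1", 0.2), ("x2", 0.2)]}
states, like = joint_reconstruction(tree, "root", {"x1": "A", "x2": "A", "x3": "C"})
print(states, like)
```

On small trees the result can be compared against exhaustive enumeration of all internal-state combinations, which should return the same maximum likelihood.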

Example

Complexity. For n internal nodes and c possible states we compute a DP table of $O(nc)$ cells. As we maximize in every cell over c states, the time is $O(nc^2)$. As c is constant, this is $O(n)$.