Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2nd, 2014



Readings
Chapter 8 – Sections 8.1, 8.2, 8.3, 8.4

Key concepts
Scoring a tree based on the likelihood of the observed data
– How to compute the likelihood of sequences using a given tree and conditional probabilities
– Felsenstein’s algorithm
General understanding of where the conditional probabilities are obtained from
General understanding of search strategies
– (similar to parsimony)

Probabilistic methods for phylogenies
data: a set of n sequences
tree: a phylogenetic tree for the n sequences
Two approaches
– Maximum likelihood framework: P(data|tree)
– Bayesian framework: P(tree|data)
Both need a probabilistic model to compute the probability of a set of sequences, given a tree and branch lengths
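The two frameworks are linked by Bayes’ rule. As a sketch, with P(tree) denoting a prior over trees (a symbol that does not appear on the slide and is introduced here only for illustration):

```latex
P(\text{tree} \mid \text{data})
  = \frac{P(\text{data} \mid \text{tree})\, P(\text{tree})}{P(\text{data})}
```

Read this way, the Bayesian framework reuses the same likelihood P(data|tree) and additionally needs a prior over trees.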

Maximum likelihood methods for phylogenetic trees
The “best” tree is the one that maximizes the likelihood of the data (sequences) given the model (tree topology and branch lengths)
Phylogenetic tree construction requires
– Scoring a tree: computing the likelihood of the sequences given the tree, that is, the probability of the sequences given the tree
– Searching the space of possible trees
Given a probabilistic model of sequence changes, we can search for the tree with the greatest likelihood (maximum likelihood)

Notation for computing the probability of sequences given a tree
– $x^j$: sequence at node j
– $x^j_i$: character in the i-th position of the j-th sequence
– $t_j$: length of the j-th branch
– $T$: tree topology
– $P(x \mid y, t)$: probability of switching from y to x, from ancestor to child, along a branch of length t
– We will come back to defining such probabilistic models later
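As one illustration of what P(x|y,t) can look like, the sketch below uses the Jukes-Cantor model, in which every substitution occurs at the same rate. The choice of model, the function name `jc_prob` and the rate parameter `alpha` are assumptions made for illustration only; the slides defer the definition of their own model.

```python
import math

def jc_prob(x, y, t, alpha=1.0):
    """P(x | y, t) under Jukes-Cantor: probability that ancestral
    character y has become x after a branch of length t."""
    decay = math.exp(-4.0 * alpha * t)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

# For a fixed ancestor, the probabilities over the four outcomes sum to 1.
row_sum = sum(jc_prob(x, "A", 0.3) for x in "ACGT")
```

At t = 0 no change is possible, and for very long branches every character becomes equally likely, which is the qualitative behaviour any such model should show.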

Computing the probability of sequences on a tree
If we know $P(x \mid y, t)$, we can compute the probability of the sequences
This relies on a key assumption:
– The sequence at a child node i is independent of every node that is not its descendant, given i’s parent
– E.g., for a node i whose parent is $\alpha(i)$: $P(x^i \mid x^{\alpha(i)}, x^j, x^k, \ldots) = P(x^i \mid x^{\alpha(i)})$
We will also make additional simplifying assumptions
– We have an ungapped alignment
– Characters at different sites evolve independently

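Example of computing the probability of sequences given a tree
Assume we are given the following tree for three sequences $x^1$, $x^2$ and $x^3$ at the leaf nodes
[Figure: tree with leaves $x^1$, $x^2$, $x^3$, internal nodes $x^4$, $x^5$, and branch lengths $t_1, \ldots, t_4$]
The probability of these sequences given this tree is $P(x^1, \ldots, x^5 \mid T, t_1, \ldots, t_4)$
First, assume that the sites evolve independently and that there are a total of N sites: $P(x^1, \ldots, x^5 \mid T, t_1, \ldots, t_4) = \prod_{u=1}^{N} P(x^1_u, \ldots, x^5_u \mid T, t_1, \ldots, t_4)$
Hereafter, let’s just focus on one site, u
Also, for clarity, we will use t for all branch lengths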

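Example continued
[Figure: the same tree, with nodes $x^1, \ldots, x^5$ and branch lengths $t_1, \ldots, t_4$]
The expression $P(x^1_u, \ldots, x^5_u \mid T, t)$
Assuming conditional independence, the above factorizes into one term for the root character and one term per branch
Written more compactly as $P(x^1_u, \ldots, x^5_u \mid T, t) = q_{x^5_u} \prod_{i=1}^{4} P(x^i_u \mid x^{\alpha(i)}_u, t)$
$\alpha(i)$ denotes the parent of i, and $q_a$ is the probability of character a at the root (node 5)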
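A sketch of the expanded factorization for this example, assuming the topology in which node 4 is the parent of leaves 1 and 2, and the root (node 5) is the parent of node 4 and leaf 3. The topology is an assumption, since the tree figure itself is not available here; q denotes the probability of a character at the root.

```latex
P(x^1_u, \ldots, x^5_u \mid T, t)
  = q_{x^5_u}\,
    P(x^4_u \mid x^5_u, t)\,
    P(x^3_u \mid x^5_u, t)\,
    P(x^2_u \mid x^4_u, t)\,
    P(x^1_u \mid x^4_u, t)
```

Each non-root node contributes exactly one factor, conditioned on its parent, which is what the compact product over $\alpha(i)$ expresses.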

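Example continued
Or, more generally, for n sequences at the leaves:
$P(x^1_u, \ldots, x^{2n-1}_u \mid T, t) = q_{x^{2n-1}_u} \prod_{i=n+1}^{2n-2} P(x^i_u \mid x^{\alpha(i)}_u, t) \prod_{i=1}^{n} P(x^i_u \mid x^{\alpha(i)}_u, t)$
– First product: between internal nodes
– Second product: between extant and internal nodes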

But... the ancestral sequences cannot be observed
So our probability calculation needs to sum over all ancestral states
Let us consider a simple example of two sequences
[Figure: two leaves $x^1_u$ and $x^2_u$ joined at a root by branches of lengths $t_1$ and $t_2$]

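Summing over the ancestral state for a pair of sequences
[Figure: leaves $x^1_u$ and $x^2_u$ joined at root node 3, which carries character a, by branches of lengths $t_1$ and $t_2$]
$P(x^1_u, x^2_u \mid T, t_1, t_2) = \sum_a q_a\, P(x^1_u \mid a, t_1)\, P(x^2_u \mid a, t_2)$
$q_a$ is the probability of observing character a at the root (node 3)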
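The sum over the unobserved root character can be sketched in a few lines. The substitution matrix below (0.7 for no change, 0.1 for each change, identical on both branches) and the uniform root distribution are illustrative assumptions, and `pair_likelihood` is a hypothetical helper name.

```python
ALPHABET = "ACGT"

# Illustrative P[a][x]: probability of child character x given parent a.
P = {a: {x: (0.7 if x == a else 0.1) for x in ALPHABET} for a in ALPHABET}
q = {a: 0.25 for a in ALPHABET}  # assumed distribution at the root

def pair_likelihood(x1, x2):
    """Sum over root characters a of q_a * P(x1 | a) * P(x2 | a)."""
    return sum(q[a] * P[a][x1] * P[a][x2] for a in ALPHABET)

same = pair_likelihood("A", "A")  # both leaves carry the same character
diff = pair_likelihood("A", "C")  # the leaves differ
total = sum(pair_likelihood(x1, x2) for x1 in ALPHABET for x2 in ALPHABET)
```

Summing the result over all 16 possible leaf pairs gives 1, a quick check that the expression is a proper probability distribution over the observed characters.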

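Generalizing this to n sequences
Requires us to sum over the characters at all of the internal nodes:
$P(x^1_u, \ldots, x^n_u \mid T, t) = \sum_{a^{n+1}, \ldots, a^{2n-1}} q_{a^{2n-1}} \prod_{i=n+1}^{2n-2} P(a^i \mid a^{\alpha(i)}, t) \prod_{i=1}^{n} P(x^i_u \mid a^{\alpha(i)}, t)$
– First product: between internal nodes
– Second product: between extant and internal nodes
$\alpha(i)$ gives the parent of i
$a^i$ is a variable storing the character at internal node i
Felsenstein’s algorithm gives an efficient way to compute this quantity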

Felsenstein’s algorithm
Input: a set of n sequences at the leaf nodes, the conditional probability distribution of character switches, and a tree topology
Output: the likelihood of the sequences
Also based on dynamic programming
– Relies on computations performed in subtrees for computations at the root of these subtrees
Very similar ideas as in the weighted parsimony algorithm

Notation for Felsenstein’s algorithm
– $P(L_k \mid a)$: probability of the leaves below node k, given that the residue at k is a
– i and j will denote the children of k
– a, b, c: characters at any node
– We’ll drop the subscript u and work with only one site

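Felsenstein’s algorithm
Initialize: k = 2n-1
Recursion: compute $P(L_k \mid a)$ for all a as follows
– If k is a leaf node: $P(L_k \mid a) = 1$ if $a = x^k$, and 0 otherwise
– Else, compute $P(L_i \mid b)$ and $P(L_j \mid c)$ for all b and c at daughters i and j, and set $P(L_k \mid a) = \sum_{b,c} P(b \mid a, t)\, P(L_i \mid b)\, P(c \mid a, t)\, P(L_j \mid c)$
Termination
– Likelihood at a site: $P(x^1, \ldots, x^n \mid T, t) = \sum_a q_a\, P(L_{2n-1} \mid a)$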
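A minimal sketch of the recursion for a single site, written bottom-up rather than as a recursive call from the root. It assumes leaves are numbered 1..n, internal nodes n+1..2n-1 with the root last, that every daughter has a smaller number than its parent, and that one matrix `P` applies to every branch. The function name and the dictionary-based inputs are illustrative choices, not part of the slides.

```python
ALPHABET = "ACGT"

def felsenstein(leaf_chars, children, P, q):
    """Likelihood of one alignment column.

    leaf_chars: {k: character} for leaves k = 1..n.
    children:   {k: (i, j)} for internal nodes k = n+1..2n-1, where
                daughters i and j have smaller numbers than k.
    P:          P[a][b] = probability of child character b given parent a.
    q:          q[a] = probability of character a at the root.
    """
    n = len(leaf_chars)
    table = {}  # table[k][a] holds P(L_k | a)
    for k in range(1, 2 * n):
        if k in leaf_chars:
            table[k] = {a: 1.0 if a == leaf_chars[k] else 0.0 for a in ALPHABET}
        else:
            i, j = children[k]
            # Product of one sum per daughter (equal to the sum over b, c).
            table[k] = {
                a: sum(P[a][b] * table[i][b] for b in ALPHABET)
                   * sum(P[a][c] * table[j][c] for c in ALPHABET)
                for a in ALPHABET
            }
    root = 2 * n - 1
    return sum(q[a] * table[root][a] for a in ALPHABET)

# Illustrative inputs: leaves A, C, A; node 4 joins leaves 1 and 2.
P = {a: {b: (0.7 if a == b else 0.1) for b in ALPHABET} for a in ALPHABET}
q = {a: 0.25 for a in ALPHABET}
likelihood = felsenstein({1: "A", 2: "C", 3: "A"}, {4: (1, 2), 5: (4, 3)}, P, q)
```

Each node is visited once and does a constant amount of work per character pair, so the cost grows linearly with the number of nodes instead of exponentially with the number of internal nodes.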

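An observation and a simplification
Note that the double sum in the recursion factorizes into one sum per daughter: $\sum_{b,c} P(b \mid a, t)\, P(L_i \mid b)\, P(c \mid a, t)\, P(L_j \mid c) = \left[\sum_b P(b \mid a, t)\, P(L_i \mid b)\right] \left[\sum_c P(c \mid a, t)\, P(L_j \mid c)\right]$
Furthermore, we will assume that the conditional probabilities are independent of branch length
Finally, assume $q_a = 0.25$ for all a in {A, T, G, C}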

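What is the probability of the following set of residues?
[Figure: tree whose leaves carry the residues A, T and G]
[Table: conditional probability matrix $P(b \mid a)$ with rows a and columns b over A, C, G, T; of its entries only the value 0.7 is legible]
Assume the above conditional probability matrix $P(b \mid a)$ for all branches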

In-class exercise

The probabilities computed for each node
Probability of the sequences given the tree is $0.25 \sum_a P(L_{\text{root}} \mid a) = 0.0073$
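The per-node values can be checked with a short script. It assumes that the tree groups the leaves A and T under one internal node with G attached at the root, and that every off-diagonal entry of P(b|a) is 0.1. Both are assumptions, since only the value 0.7 and the leaf residues are available here. Under these assumptions the result agrees with the 0.0073 quoted above.

```python
ALPHABET = "ACGT"
P = {a: {b: (0.7 if a == b else 0.1) for b in ALPHABET} for a in ALPHABET}

def leaf(residue):
    # P(L_k | a) at a leaf: 1 if a matches the observed residue, else 0.
    return {a: 1.0 if a == residue else 0.0 for a in ALPHABET}

def parent(left, right):
    # P(L_k | a) at an internal node, from the tables of its two daughters.
    return {
        a: sum(P[a][b] * left[b] for b in ALPHABET)
           * sum(P[a][c] * right[c] for c in ALPHABET)
        for a in ALPHABET
    }

node4 = parent(leaf("A"), leaf("T"))  # internal node above A and T
node5 = parent(node4, leaf("G"))      # root
likelihood = sum(0.25 * node5[a] for a in ALPHABET)

for name, table in (("node 4", node4), ("node 5 (root)", node5)):
    print(name, {a: round(p, 4) for a, p in table.items()})
print("likelihood:", round(likelihood, 4))
```

Because the assumed matrix treats all four characters symmetrically and the three leaf residues are all different, grouping any other pair of leaves under the internal node gives the same total.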

Felsenstein’s algorithm comments
Very similar to the weighted parsimony case
– Main differences are at the leaf nodes, and in minimization versus summation for internal nodes
Can it be used to infer ancestral states as well?
– Instead of summing, we would maximize
– As in the parsimony case, we would need to keep track of the maximizing assignment
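A minimal sketch of the maximizing variant for a single site. Each sum over a daughter's character is replaced by a max, and the maximizing assignment is carried along with the score. The nested-tuple tree encoding, the function name `best`, the 0.7/0.1 matrix and the uniform root distribution are all illustrative assumptions.

```python
ALPHABET = "ACGT"
P = {a: {b: (0.7 if a == b else 0.1) for b in ALPHABET} for a in ALPHABET}
q = {a: 0.25 for a in ALPHABET}

def best(node):
    """Map each character a at this node to (score, labeled_subtree).

    A leaf is a one-character string; an internal node is a (left, right)
    tuple. A labeled internal node is (character, left_label, right_label).
    """
    if isinstance(node, str):
        return {a: (1.0 if a == node else 0.0, node) for a in ALPHABET}
    left, right = best(node[0]), best(node[1])
    out = {}
    for a in ALPHABET:
        # Pick the best character for each daughter independently.
        b = max(ALPHABET, key=lambda ch: P[a][ch] * left[ch][0])
        c = max(ALPHABET, key=lambda ch: P[a][ch] * right[ch][0])
        score = P[a][b] * left[b][0] * P[a][c] * right[c][0]
        out[a] = (score, (a, left[b][1], right[c][1]))
    return out

table = best((("A", "A"), ("A", "G")))
root_char = max(ALPHABET, key=lambda a: q[a] * table[a][0])
score, labeled_tree = table[root_char]
```

Storing the labeled subtree next to each score plays the role of the traceback pointers used in the parsimony case; a pointer-based traceback would avoid copying labels but is longer to write.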

Errata
– Slide 16 and slide 17: replace $P(L_i \mid a)$ by $P(L_i \mid b)$
– Slide 20 had typos