Published by Caitlin Briggs, modified over 9 years ago
1
Maximum Likelihood
Given competing explanations for a particular observation, which explanation should we choose?
Maximum likelihood methodologies suggest that the preferred explanation is the one that makes the observation most likely.
Given some data, D, and a hypothesis, H, the likelihood of the data is the probability of observing D given H: $L_D = \Pr(D \mid H)$
In the notation used earlier: $L = P(D \mid M, \theta, \tau, \nu)$
For our purposes, D is the data set (typically aligned sequences) and H is any possible tree relating those sequences.
The best tree is the one that makes the observed data most likely.
The main idea behind maximum likelihood (ML) phylogenetic inference is to determine the tree topology, branch lengths, and evolutionary model that maximize the probability of observing the sequences we actually have:
$L(\tau, \theta) = \Pr(\text{Data} \mid \tau, \theta) = \Pr(\text{aligned sequences} \mid \text{tree}, \text{model})$
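As a minimal sketch of "choose the hypothesis that makes the data most likely", consider the simplest case: two aligned sequences and a single unknown, the distance d (expected substitutions per site), under the Jukes–Cantor (JC69) model. The model, the grid search, and the function names are assumptions for illustration; real ML programs optimize numerically over a whole tree. For JC69 the maximum is also known in closed form, $\hat{d} = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$ with p the proportion of differing sites, which gives a check on the search.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """ln Pr(two aligned sequences | distance d) under the JC69 model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # a given base at one end, the same base at the other
    p_diff = 0.25 * (0.25 - 0.25 * e)   # a given base at one end, a specific other base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(seq1, seq2, step=1e-4, d_max=3.0):
    """Return the grid value of d that makes the observed pair of sequences most likely."""
    n_sites = len(seq1)
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

# Only the nine sites shown for taxa 1 and 2 (5 differences out of 9).
d_hat = ml_distance("TTCGCTTAA", "TTTCTGCAA")
p = 5 / 9
closed_form = -0.75 * math.log(1 - 4 * p / 3)
print(round(d_hat, 4), round(closed_form, 4))
```

The grid search and the closed form should agree to within the grid step, which is the point: the ML estimate is simply the parameter value with the highest likelihood.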
2
Maximum Likelihood
Given four taxa and their associated sequences:
1 – …TTCGCTTAA…
2 – …TTTCTGCAA…
3 – …TTGCTGGTA…
4 – …TCTCGGCAA…
If we have an evolutionary model, we have an estimate of the instantaneous rates of change between nucleotides at any given site.
We can also derive any number of hypothetical trees on which to map the data.
Our job is to determine the likelihood of the data given the evolutionary model and each of the possible trees.
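A model supplies instantaneous rates of change, but likelihood calculations need the probability of change along a branch of length t, which comes from the matrix exponential $P(t) = e^{Qt}$. This sketch assumes the JC69 model (every base changes to each other base at the same rate α) and computes $e^{Qt}$ numerically with plain Python lists; the rate α = 1 and branch length t = 0.3 are hypothetical. For JC69 the result can be checked against the closed form $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$ and $P_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$.

```python
import math

def jc69_rate_matrix(alpha):
    """Q for JC69: every base changes to each other base at rate alpha; rows sum to 0."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def mat_exp(m, terms=20):
    """exp(m) for a 4x4 matrix by scaling and squaring plus a truncated Taylor series."""
    norm = max(sum(abs(x) for x in row) for row in m)
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(4)] for i in range(4)]    # identity matrix
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]  # scaled^k / k!
        result = [[x + y for x, y in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)   # undo the scaling: exp(m) = exp(m / 2^s)^(2^s)
    return result

def transition_matrix(alpha, t):
    """P(t) = exp(Qt): probability of each end base given each start base (order A, C, G, T)."""
    q = jc69_rate_matrix(alpha)
    return mat_exp([[x * t for x in row] for row in q])

alpha, t = 1.0, 0.3                       # hypothetical rate and branch length
P = transition_matrix(alpha, t)
e = math.exp(-4 * alpha * t)
print(P[0][0], 0.25 + 0.75 * e)           # A stays A
print(P[0][1], 0.25 - 0.25 * e)           # A becomes C
```

Richer models (unequal base frequencies, transition/transversion bias) change Q, but the same $e^{Qt}$ step turns rates into branch probabilities.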
3
Maximum Likelihood
Given four taxa and their associated sequences:
1 – …TTCGCTTAA…
2 – …TTTCTGCAA…
3 – …TTGCTGGTA…
4 – …TCTCGGCAA…
For the highlighted site (tip states C, G, T and T), there are three possible unrooted trees.
Assuming a given evolutionary model, a likelihood value can be computed for each topology.
[Figure: the three possible trees for this site, with tips C, G, T(2) and T(3) joined through internal nodes X and Y]
4
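Maximum Likelihood
1 – …TTCGCTTAA…
2 – …TTTCTGCAA…
3 – …TTGCTGGTA…
4 – …TCTCGGCAA…
Just as we are considering only one site among many, we can consider one tree among many.
This is one possible rooted tree.
The internal nodes X and Y are not observed; each can be any of the four nucleotides, so there are 4 × 4 = 16 possible combinations of states for X and Y.
Again, let's choose one of them: X = A and Y = T.
[Figure: rooted tree with tips G, C, T(2) and T(3); internal nodes X = A and Y = T]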
5
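Maximum Likelihood
1 – …TTCGCTTAA…
2 – …TTTCTGCAA…
3 – …TTGCTGGTA…
4 – …TCTCGGCAA…
Since we have a model of evolutionary change, we can calculate the probability of the observed states at this site on this tree, given the chosen ancestral states.
It is the product of the probabilities of all of the states and changes of state under our model of sequence evolution: $P(\tau) = \prod_{i=1}^{N} P_i$, a product of N individual probabilities. For this tree and these ancestral states:
$P(\tau) = P_A \times P_{AG} \times P_{AC} \times P_{AT} \times P_{TT} \times P_{TT}$
Here $P_A$ is the probability of state A at the root X, $P_{AG}$ and $P_{AC}$ are the probabilities of A changing to G and to C along the branches to those tips, $P_{AT}$ covers the branch from X to Y, and each $P_{TT}$ covers a branch from Y to one of the T tips.
[Figure: the same rooted tree with X = A and Y = T]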
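The six-factor product can be evaluated directly once a model and branch lengths are fixed. This sketch assumes JC69, sets every branch to a hypothetical 0.1 expected substitutions per site (the slides give no values), uses 1/4 as the root-state probability $P_A$, and encodes the pictured tree: root X with children G, C and Y, and Y with children T(2) and T(3). Because X and Y are unobserved, the site likelihood is the sum of this product over all 16 (X, Y) combinations, so the sketch computes that sum as well; the function names are illustrative only.

```python
import math
from itertools import product

def jc69_p(i, j, t):
    """Pr(base j at the end of a branch of length t | base i at its start) under JC69,
    with t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

BRANCH = 0.1   # hypothetical length used for every branch

def assignment_probability(x, y, t=BRANCH):
    """P_X * P_XG * P_XC * P_XY * P_YT * P_YT for the pictured tree:
    root X has children G, C and Y; internal node Y has children T(2) and T(3)."""
    return (0.25                                       # P_X: JC69 root-state frequency
            * jc69_p(x, "G", t) * jc69_p(x, "C", t)    # X -> G and X -> C
            * jc69_p(x, y, t)                          # X -> Y
            * jc69_p(y, "T", t) * jc69_p(y, "T", t))   # Y -> T(2) and Y -> T(3)

one_case = assignment_probability("A", "T")            # X = A, Y = T as on the slide
site_likelihood = sum(assignment_probability(x, y)
                      for x, y in product("ACGT", repeat=2))   # all 16 (X, Y) pairs
print(one_case, site_likelihood)
```

Enumerating all combinations grows exponentially with the number of internal nodes; Felsenstein's pruning algorithm computes the same sum efficiently by working up the tree one node at a time.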
6
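Maximum Likelihood
1 – …TTCGCTTAA…
2 – …TTTCTGCAA…
3 – …TTGCTGGTA…
4 – …TCTCGGCAA…
The probability must be calculated for every site on this tree, $P(\tau) = \prod_{i=1}^{N} P_i$, where N is now the number of sites and each site probability is summed over all possible ancestral states (the 16 combinations of X and Y for the site above).
Then the same is done for all sites on every candidate tree.
These numbers are very small, so they are typically expressed as log likelihoods, writing $L_i$ for the likelihood of site i:
$\ln L(\tau) = \sum_{i=1}^{N} \ln L_i$
$\ln L(\tau)$ is the log likelihood of observing the given alignment under the chosen evolutionary model, given that particular tree and the branch lengths on the tree.
Because we are dealing not only with tree topologies but also with branch lengths, the search space is even larger than the number of topologies suggests: each topology must also have its branch lengths optimized.
Heuristic (approximate) methods are usually applied.
[Figure: the same rooted tree with X = A and Y = T]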
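To see why log likelihoods are used, this sketch multiplies 2,000 hypothetical per-site likelihoods of 0.01 each (invented values, for illustration only). The true product, $10^{-4000}$, is far below the smallest positive double-precision number (about $5 \times 10^{-324}$), so the running product underflows to 0.0, while the sum of logs stays comfortably in range.

```python
import math

site_likelihoods = [0.01] * 2000   # hypothetical: 2,000 sites, each with L_i = 0.01

product_l = 1.0
for l_i in site_likelihoods:
    product_l *= l_i               # L = prod L_i: underflows to 0.0 long before the end

log_l = sum(math.log(l_i) for l_i in site_likelihoods)   # ln L = sum ln L_i
print(product_l)   # 0.0
print(log_l)       # about -9210.34
```

Comparing trees by $\ln L$ instead of $L$ changes nothing about which tree wins, because the logarithm is monotonic.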
7
Maximum Likelihood
Because we are dealing not only with tree topologies but also with branch lengths, there are even more candidate trees than the topologies alone
To reduce the computational burden, heuristic methods are usually applied to suggest reasonable starting trees
Exact methods: will find the best tree under a given criterion, but are not feasible for large data sets
   Branch and bound
Heuristic methods: any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goal
   Stepwise addition
   Branch swapping methods
   Quartet puzzling
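The number of candidate topologies shows why exact searches soon become infeasible. For n ≥ 3 taxa there are $(2n-5)!! = 3 \times 5 \times \dots \times (2n-5)$ unrooted binary topologies, each of which would still need its branch lengths optimized. A minimal sketch (the function name is illustrative):

```python
def count_unrooted_topologies(n):
    """Number of unrooted binary tree topologies for n >= 3 labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three taxa")
    total = 1
    for k in range(3, 2 * n - 4, 2):   # multiply the odd numbers 3, 5, ..., 2n - 5
        total *= k
    return total

for n in (4, 6, 10, 20):
    print(n, count_unrooted_topologies(n))
```

Four taxa give the three trees of the earlier slide, while ten taxa already give 2,027,025 topologies.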
8
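Maximum Likelihood Branch-and-Bound Method
Add taxa to trees along 'paths'
Quit a path as soon as it is apparent that no tree along that path can be optimal
This is done by evaluating the tree criterion after each addition and comparing it with the score of the best complete tree found so far (the bound); adding taxa can only make the score worse
Good for 12–25 taxa
Will find the optimal tree under the chosen criterion (it is an exact method)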
9
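Maximum Likelihood Branch-and-Bound Method
L = the set of terminal taxa
1. Choose an initial tree with three leaves from L
2. Add a terminal taxon at a defined position
3. Repeat until all taxa are added
4. Evaluate the tree using the optimality criterion
5. Set this value as the upper bound for the optimality criterion
6. Repeat for other addition positions, abandoning any partial tree whose score is already worse than the bound and updating the bound whenever a better complete tree is found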
10
Maximum Likelihood Branch-and-Bound Method
[Figure: branch-and-bound search in which taxa A–F are evaluated]
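The branch-and-bound steps can be sketched in code. To stay short, this sketch uses Fitch parsimony length as the optimality criterion instead of likelihood (these search strategies also apply to parsimony); the bound is valid because adding a taxon never makes a tree shorter. Trees are unrooted edge lists, taxa are numbered 0 to n−1, internal nodes get larger numbers, and taxa are added in a fixed order. All function names are hypothetical, and the search returns one optimal tree.

```python
from itertools import count

def fitch_subtree(adj, seqs, node, parent, site):
    """Fitch state set and number of changes for the subtree at `node`, away from `parent`."""
    if node < len(seqs):                                  # taxa (leaves) are 0..n-1
        return {seqs[node][site]}, 0
    (a, ca), (b, cb) = (fitch_subtree(adj, seqs, child, node, site)
                        for child in adj[node] if child != parent)
    if a & b:
        return a & b, ca + cb
    return a | b, ca + cb + 1                             # disjoint sets: one extra change

def fitch_length(edges, seqs):
    """Parsimony length of an unrooted binary tree given as a list of (node, node) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaf = next(node for node in adj if node < len(seqs))   # any taxon in the tree
    start = adj[leaf][0]
    total = 0
    for site in range(len(seqs[0])):
        states, changes = fitch_subtree(adj, seqs, start, leaf, site)
        total += changes + (0 if seqs[leaf][site] in states else 1)
    return total

def branch_and_bound(seqs):
    """Exact search adding taxa 3, 4, ..., n-1 in turn, abandoning any partial tree
    whose length already reaches that of the best complete tree found so far."""
    n = len(seqs)
    best = {"score": float("inf"), "tree": None}
    new_node = count(n + 1)                               # internal node ids (n is the hub)

    def extend(edges, k):
        score = fitch_length(edges, seqs)
        if score >= best["score"]:
            return                                        # bound: more taxa cannot shorten it
        if k == n:
            best["score"], best["tree"] = score, edges    # new upper bound
            return
        for i, (u, v) in enumerate(edges):                # branch: taxon k on every edge
            w = next(new_node)
            extend(edges[:i] + edges[i + 1:] + [(u, w), (w, v), (w, k)], k + 1)

    extend([(0, n), (1, n), (2, n)], 3)
    return best["score"], best["tree"]

score, tree = branch_and_bound(["AA", "AA", "CC", "CC"])
print(score, tree)
```

With taxa 0 and 1 sharing AA and taxa 2 and 3 sharing CC, the search returns a length of 2 for the tree that pairs 0 with 1 and 2 with 3; an ML version would replace `fitch_length` with an optimized negative log likelihood.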
11
Maximum Likelihood Stepwise Addition Method
Select three random taxa from the n terminal taxa
Find the most likely tree for these three
Add another random taxon at the position that gives the most likely tree
Repeat until all taxa have been added (n − 3 additions in total)
Will find a locally optimal tree; other addition orders may give a better tree
Perform tree rearrangements to search for other optimal trees
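Stepwise addition can be sketched the same way. Again Fitch parsimony stands in for likelihood (an ML version would score each candidate by its optimized likelihood), and the helper functions are repeated so the block runs on its own. Each new taxon is tried on every branch and kept where the tree is best; because earlier placements are never revisited, the result can depend on the addition order, so the hypothetical `stepwise_addition` accepts an explicit `order` or a random seed. The tie-breaking rule (first best branch wins) is an assumption.

```python
import random
from itertools import count

def fitch_subtree(adj, seqs, node, parent, site):
    """Fitch state set and number of changes for the subtree at `node`, away from `parent`."""
    if node < len(seqs):                                  # taxa (leaves) are 0..n-1
        return {seqs[node][site]}, 0
    (a, ca), (b, cb) = (fitch_subtree(adj, seqs, child, node, site)
                        for child in adj[node] if child != parent)
    if a & b:
        return a & b, ca + cb
    return a | b, ca + cb + 1

def fitch_length(edges, seqs):
    """Parsimony length of an unrooted binary tree given as a list of (node, node) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaf = next(node for node in adj if node < len(seqs))
    start = adj[leaf][0]
    total = 0
    for site in range(len(seqs[0])):
        states, changes = fitch_subtree(adj, seqs, start, leaf, site)
        total += changes + (0 if seqs[leaf][site] in states else 1)
    return total

def stepwise_addition(seqs, order=None, seed=0):
    """Greedy stepwise addition under Fitch parsimony; returns (length, edges)."""
    n = len(seqs)
    if order is None:
        order = list(range(n))
        random.Random(seed).shuffle(order)                # random addition order
    first, second, third, *rest = order
    edges = [(first, n), (second, n), (third, n)]         # the only 3-taxon tree
    new_node = count(n + 1)
    for taxon in rest:
        w = next(new_node)
        candidates = [edges[:i] + edges[i + 1:] + [(u, w), (w, v), (w, taxon)]
                      for i, (u, v) in enumerate(edges)]  # taxon on each branch
        edges = min(candidates, key=lambda tree: fitch_length(tree, seqs))  # keep the best
    return fitch_length(edges, seqs), edges

print(stepwise_addition(["AA", "AA", "CC", "CC"], order=[0, 1, 2, 3]))
```

Running it with several seeds and keeping the best result is a cheap way to reduce the dependence on addition order before branch swapping.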
12
Maximum Likelihood Stepwise Addition Method
13
Maximum Likelihood
Once you have found a reasonable tree using a heuristic method, perform branch swapping to search for other, possibly better, trees:
Nearest neighbor interchange (NNI)
Subtree pruning and regrafting (SPR)
Tree bisection and reconnection (TBR)
[Figure: example NNI, SPR and TBR rearrangements]
14
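Maximum Likelihood Nearest neighbor interchange (NNI)
Each internal edge separates four subtrees, and there are three ways to group those four subtrees into two pairs: the current arrangement plus two alternatives, each reached by swapping one subtree from either side of the edge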
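A minimal sketch of one NNI, with trees stored as sets of undirected edges (an assumption of this example, as is the function name): for internal edge (u, v), u carries subtrees a and b and v carries c and d, and swapping b with c or with d gives the two alternative pairings.

```python
from collections import defaultdict

def nni_neighbors(edges, u, v):
    """The two trees reachable by a nearest neighbor interchange across internal edge (u, v)."""
    adj = defaultdict(list)
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)
    a, b = [x for x in adj[u] if x != v]          # subtrees hanging off u
    c, d = [x for x in adj[v] if x != u]          # subtrees hanging off v
    tree = {frozenset(e) for e in edges}
    neighbors = []
    for other in (c, d):                          # swap b with c, then b with d
        swapped = tree - {frozenset((u, b)), frozenset((v, other))}
        swapped |= {frozenset((u, other)), frozenset((v, b))}
        neighbors.append(swapped)
    return neighbors    # current tree is ab|cd; neighbours are ac|bd and ad|bc

quartet = [(0, 4), (1, 4), (4, 5), (5, 2), (5, 3)]   # taxa 0-3, internal nodes 4 and 5
for t in nni_neighbors(quartet, 4, 5):
    print(sorted(tuple(sorted(e)) for e in t))
```

A tree with n taxa has n − 3 internal edges, so a full NNI sweep examines 2(n − 3) neighbouring trees.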
15
Maximum Likelihood Subtree pruning and regrafting (SPR)
Clip (prune) a subtree and reinsert (regraft) it at every possible location on the rest of the tree
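SPR can be sketched with the same edge-set representation. The hypothetical `spr_neighbors` prunes the subtree rooted at node s, which hangs from internal node p, joins p's other two neighbours so the remaining tree stays binary, then regrafts p (carrying the subtree) onto each branch of the remaining tree except the one that would recreate the original.

```python
from collections import defaultdict

def spr_neighbors(edges, p, s):
    """Prune the subtree rooted at s (attached at internal node p) and regraft it
    onto every other branch of the remaining tree. Trees are sets of frozenset edges."""
    adj = defaultdict(set)
    for n1, n2 in edges:
        adj[n1].add(n2)
        adj[n2].add(n1)
    x, y = adj[p] - {s}                            # p's two other neighbours
    clade, stack = {s}, [s]                        # collect the nodes of the pruned subtree
    while stack:
        for nb in adj[stack.pop()] - {p} - clade:
            clade.add(nb)
            stack.append(nb)
    pruned = {frozenset(e) for e in edges if set(e) <= clade}
    rest = {frozenset(e) for e in edges if not set(e) & (clade | {p})}
    rest.add(frozenset((x, y)))                    # close the gap left by removing p
    trees = []
    for target in rest:
        if target == frozenset((x, y)):
            continue                               # regrafting here restores the original
        a, b = target
        trees.append((rest - {target}) | pruned
                     | {frozenset((a, p)), frozenset((p, b)), frozenset((p, s))})
    return trees

five = [(0, 5), (1, 5), (5, 6), (6, 2), (6, 7), (7, 3), (7, 4)]   # taxa 0-4
print(len(spr_neighbors(five, 5, 0)))   # prune leaf 0
print(len(spr_neighbors(five, 6, 7)))   # prune the clade of taxa 3 and 4
```

Pruning a single leaf from this five-taxon tree leaves a four-taxon tree with five branches, so four regrafts are new trees; pruning the larger clade leaves only three branches and gives two.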
16
Maximum Likelihood Tree bisection and reconnection (TBR)
Cut the tree into two subtrees by removing a branch
Reconnect the subtrees by creating a new branch that joins a branch of one subtree to a branch of the other, trying every such pair of branches
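TBR differs from SPR in that both pieces may be reconnected at any of their branches. In this sketch (hypothetical names, same edge-set representation), cutting internal edge (u, v) leaves u and v with two neighbours each; each is suppressed into a single branch, and then every branch of u's piece is joined to every branch of v's piece by a new edge (u, v). One of these reconnections restores the original tree.

```python
from collections import defaultdict

def tbr_neighbors(edges, u, v):
    """Cut the internal edge (u, v) and reconnect the two pieces through every pair of
    branches, one from each piece. Trees are returned as sets of frozenset edges."""
    adj = defaultdict(set)
    for n1, n2 in edges:
        adj[n1].add(n2)
        adj[n2].add(n1)

    def piece(root, cut):
        """Branches of root's piece after the cut, with root (now of degree 2) suppressed."""
        nodes, stack = {root}, [root]
        while stack:
            for nb in adj[stack.pop()] - {cut} - nodes:
                nodes.add(nb)
                stack.append(nb)
        a, b = adj[root] - {cut}
        inner = {frozenset(e) for e in edges if set(e) <= nodes and root not in e}
        return inner | {frozenset((a, b))}

    side_u, side_v = piece(u, v), piece(v, u)
    trees = []
    for bu in side_u:
        for bv in side_v:
            (a1, a2), (b1, b2) = bu, bv
            trees.append((side_u - {bu}) | (side_v - {bv}) | {
                frozenset((a1, u)), frozenset((u, a2)),   # u now splits branch bu
                frozenset((b1, v)), frozenset((v, b2)),   # v now splits branch bv
                frozenset((u, v))})                       # the new connecting branch
    return trees   # includes the original tree (both suppressed branches rejoined)

caterpillar = [(0, 6), (1, 6), (6, 7), (7, 2), (7, 8), (8, 3), (8, 9), (9, 4), (9, 5)]
print(len(tbr_neighbors(caterpillar, 7, 8)))
```

For this six-taxon caterpillar, cutting the middle edge leaves two three-taxon pieces with three branches each, so there are 3 × 3 = 9 reconnections, one of which is the starting tree.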
17
Maximum Likelihood Quartet Puzzling Method
Given any set of sequences, any group of four is a quartet
18
Maximum Likelihood Quartet Puzzling Method
1. Estimate parameters for the model to be used:
   Build a distance matrix (D) and the corresponding neighbor-joining (NJ) tree using a given model
   Determine ML branch lengths and use them to re-estimate the model parameters
   Using the new estimates, rebuild D and the NJ tree, and re-estimate the parameters
   Iterate the last two steps until the parameters are stable
2. Calculate likelihoods for all quartets: each quartet has three possible topologies, so $3 \times \frac{n!}{4!\,(n-4)!}$ quartet trees must be evaluated
3. Add taxa in random order, placing each in the least contradictory position based on the quartet likelihoods
   Repeat using different addition orders to generate a set of trees
4. Build a consensus tree in which the percentage occurrence of each branch is reported as its puzzle support value
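The count in step 2 is easy to check: there are $\binom{n}{4} = n!/(4!(n-4)!)$ quartets, and each has three possible unrooted topologies. This short sketch (illustrative function name, valid for n ≥ 4) computes it with factorials, as on the slide, and confirms the value against `math.comb`.

```python
import math

def quartet_trees_to_evaluate(n):
    """3 topologies for each of the n!/(4!(n-4)!) quartets of n >= 4 taxa."""
    by_factorials = math.factorial(n) // (math.factorial(4) * math.factorial(n - 4))
    assert by_factorials == math.comb(n, 4)   # the same binomial coefficient
    return 3 * by_factorials

for n in (6, 10, 50):
    print(n, quartet_trees_to_evaluate(n))
```

Six taxa need 45 quartet trees and 50 taxa need 690,900: polynomial growth, unlike the super-exponential number of full topologies.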
19
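Maximum Likelihood Quartet Puzzling Method
Assume 6 taxa
1. Pick four at random and build the best ML tree
2. Pick another random sequence and form every quartet that contains it and three taxa already in the tree
3. Evaluate the ML tree for each of these quartets
4. Graft the new taxon onto the best branch based on these ML results
5. Repeat steps 2–4 until all taxa have been added
[Figure: steps 2–4 applied to the fifth taxon and repeated (2′, 3′, 4′) for the sixth, with the best branch marked each time]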
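One common way to find the "least contradictory" branch is to count penalties. This sketch assumes the best ML topology of every quartet is already known and is supplied as a hypothetical `sister` table naming, for each trio of taxa already in the tree, which of the three the new taxon pairs with. For a quartet that groups the new taxon with x, every branch on the path between the other two taxa gets one penalty, because attaching the new taxon there would contradict that quartet; the taxon is then grafted onto the branch with the fewest penalties (the first such branch on ties). Leaves are strings and internal nodes are integers in this example.

```python
from collections import defaultdict, deque
from itertools import combinations

def path_edges(edges, start, goal):
    """Edges on the unique path between two nodes of a tree."""
    adj = defaultdict(set)
    for n1, n2 in edges:
        adj[n1].add(n2)
        adj[n2].add(n1)
    parent, queue = {start: None}, deque([start])
    while queue:                                   # breadth-first search from start
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    path, node = [], goal
    while parent[node] is not None:                # walk back from goal to start
        path.append(frozenset((node, parent[node])))
        node = parent[node]
    return path

def puzzle_step(edges, new, sister):
    """Attach taxon `new` to the branch contradicted by the fewest quartets."""
    leaves = sorted({node for e in edges for node in e if isinstance(node, str)})
    penalty = {frozenset(e): 0 for e in edges}
    for triple in combinations(leaves, 3):
        partner = sister[frozenset(triple)]        # taxon grouped with `new` in this quartet
        y, z = [taxon for taxon in triple if taxon != partner]
        for e in path_edges(edges, y, z):
            penalty[e] += 1                        # placing `new` here contradicts the quartet
    a, b = min(edges, key=lambda e: penalty[frozenset(e)])
    w = max(node for e in edges for node in e if isinstance(node, int)) + 1
    return [e for e in edges if e != (a, b)] + [(a, w), (w, b), (w, new)]

tree = [("A", 0), ("B", 0), (0, 1), (1, "C"), (1, "D")]
sister = {frozenset("ABC"): "A", frozenset("ABD"): "A",
          frozenset("ACD"): "A", frozenset("BCD"): "B"}
print(puzzle_step(tree, "E", sister))
```

Here every quartet places E next to A (or, when A is absent, next to B, A's neighbour), so only the branch leading to A collects no penalties and E is grafted there. Repeating this for random addition orders yields the set of trees summarized in the consensus step.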
20
Maximum Likelihood
All of these search methods work for ML, parsimony, and Bayesian methods
21
Maximum Likelihood Objections to ML
Phylogenetic inference using ML requires an explicit model of evolution
Good – we are aware of all of our assumptions
Bad – where do we get our parameter estimates?
If we knew the actual parameters, we could better infer the evolutionary history
But to get the actual parameters, we need to know the evolutionary history