BNFO 602 Phylogenetics – maximum likelihood Usman Roshan
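Maximum Likelihood
D = data, M = model
Bayes rule: P(M|D) = P(D|M) P(M) / P(D)
  P(M|D) is the posterior probability
  P(D|M) is the likelihood
  P(M) is the prior probability on the model
By rewriting P(D) as a sum over all models we get
  P(M|D) = P(D|M) P(M) / ∑M P(D|M) P(M)
which implies that P(M|D) is proportional to P(D|M) P(M).
Note that by assuming uniform priors over k models, P(M) = 1/k, we get
  P(M|D) = P(D|M) (1/k) / ∑M P(D|M) (1/k)
The factor 1/k cancels, so P(M|D) is proportional to P(D|M): the model with the highest likelihood is also the model with the highest posterior probability.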
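The cancellation of the uniform prior can be checked numerically. The sketch below is only an illustration: the function name `posterior` and the three likelihood values are made up for this example and do not come from the slides.

```python
def posterior(likelihoods, priors=None):
    """Return P(M|D) for each model, given P(D|M) and optional priors P(M)."""
    k = len(likelihoods)
    if priors is None:
        priors = [1.0 / k] * k  # uniform prior: P(M) = 1/k
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(D) = sum over M of P(D|M) P(M)
    return [j / evidence for j in joint]


likelihoods = [0.02, 0.06, 0.12]  # hypothetical P(D|M) for three models
post = posterior(likelihoods)

# Under a uniform prior, the ranking by posterior matches the ranking by likelihood.
best_by_posterior = max(range(len(post)), key=lambda i: post[i])
best_by_likelihood = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
```

With a non-uniform `priors` argument the two rankings can differ, which is why the uniform-prior assumption matters for equating maximum likelihood with maximum posterior.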
Maximum Likelihood
Data (input) is the alignment.
Model consists of
  the tree with branch lengths and leaves labeled with the DNA sequences in the data (input)
  a DNA sequence evolution model (such as Jukes-Cantor)
How do we compute the likelihood P(D|T) of the tree below?
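As a starting point for the question above, here is a minimal sketch of the Jukes-Cantor substitution probabilities and of the likelihood for the simplest possible tree, two sequences joined by a single branch. It assumes that branch length t is measured in expected substitutions per site and that base frequencies are uniform (1/4 each); the function names and the toy sequences are illustrative, not taken from the slides.

```python
import math


def jc69_prob(a, b, t):
    """P(b | a) after a branch of length t under the Jukes-Cantor model."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def branch_likelihood(seq_a, seq_b, t):
    """Likelihood of two aligned sequences joined by one branch of length t."""
    like = 1.0
    for a, b in zip(seq_a, seq_b):
        like *= 0.25 * jc69_prob(a, b, t)  # 0.25 = uniform base frequency
    return like


# Each row of the substitution matrix sums to 1.
row_sum = sum(jc69_prob("A", b, 0.3) for b in "ACGT")

# Two sequences that differ at one of four sites: a short branch explains
# them better than a very long one.
short = branch_likelihood("ACGT", "ACGA", 0.1)
long_ = branch_likelihood("ACGT", "ACGA", 5.0)
```

Sites are treated as independent, so the likelihood of the alignment is the product of the per-site likelihoods.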
Which of the two trees below has the higher likelihood?
Maximum Likelihood
ML problem: under a fixed model, find the tree with branch lengths and internal nodes that has the highest likelihood.
  Very large search space
  NP-hard
Sub-problems
  What is the likelihood of a tree with branch lengths and internal nodes given? Linear-time solution.
  What if no internal nodes are given? Felsenstein’s algorithm gives a linear-time solution.
  What if no branch lengths are given? We use gradient descent on the negative log-likelihood to fit the branch lengths numerically.
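The two sub-problems without given internal nodes or branch lengths can be sketched as follows. This is a minimal illustration under stated assumptions: the Jukes-Cantor model with uniform base frequencies, trees written as nested lists of (child, branch_length) pairs, and a plain finite-difference gradient step for one branch length. All names, the toy alignment, and the branch lengths are hypothetical and chosen only for this example.

```python
import math

BASES = "ACGT"


def jc69_prob(a, b, t):
    """P(b | a) after a branch of length t under the Jukes-Cantor model."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial(node, leaves, site):
    """Return {x: P(leaf data below node | node is in state x)} for one site."""
    if isinstance(node, str):  # leaf: only the observed base is possible
        return {x: 1.0 if x == leaves[node][site] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:  # internal node: list of (child, branch_length)
        below = partial(child, leaves, site)
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result


def tree_likelihood(root, leaves):
    """P(D|T): product over sites, summing over all internal node states."""
    n_sites = len(next(iter(leaves.values())))
    like = 1.0
    for site in range(n_sites):
        p = partial(root, leaves, site)
        like *= sum(0.25 * p[x] for x in BASES)  # uniform root frequencies
    return like


def optimize_branch(make_root, leaves, t=0.1, rate=0.01, steps=200, h=1e-6):
    """Gradient ascent on the log-likelihood for a single branch length t."""
    def loglik(x):
        return math.log(tree_likelihood(make_root(x), leaves))
    for _ in range(steps):
        grad = (loglik(t + h) - loglik(t - h)) / (2 * h)  # finite difference
        t = max(1e-4, t + rate * grad)  # keep the branch length positive
    return t


# Toy four-leaf alignment scored on two candidate trees (values are made up).
leaves = {"A": "ACGT", "B": "ACGA", "C": "AGGT", "D": "TGGT"}
tree_ab_cd = [([("A", 0.1), ("B", 0.1)], 0.05), ([("C", 0.2), ("D", 0.2)], 0.05)]
tree_ac_bd = [([("A", 0.1), ("C", 0.2)], 0.05), ([("B", 0.1), ("D", 0.2)], 0.05)]
lik_ab_cd = tree_likelihood(tree_ab_cd, leaves)
lik_ac_bd = tree_likelihood(tree_ac_bd, leaves)

# Two-leaf case: fit the single branch joining two sequences.
pair = {"A": "ACGTACGTAC", "B": "ACGTACGTGT"}
t_hat = optimize_branch(lambda t: [("A", t), ("B", 0.0)], pair)
```

Each node is visited once per site, which is where the linear running time comes from. Gradient ascent on the log-likelihood is the same procedure as gradient descent on the negative log-likelihood; the fixed step size used here is a simplification, and the likelihood surface of a full tree can have several local optima.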
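Maximum Likelihood
Comparison to MP (maximum parsimony):
  Both are NP-hard.
  For a fixed tree, it takes polynomial time to find the parsimony score.
  For a fixed tree with branch lengths, the likelihood is also computed in polynomial time; without branch lengths, finding the maximum likelihood score requires numerical optimization of the branch lengths, and no polynomial-time exact algorithm is known.
  Similar local search heuristics as MP.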
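For the parsimony side of the comparison, the polynomial-time score on a fixed tree can be sketched with Fitch's algorithm, which is one standard way to compute it (the slides do not name a specific method). The sketch assumes a rooted binary tree written as nested pairs; the function names and the toy alignment are illustrative.

```python
def fitch(node, leaves, site):
    """Return (candidate state set, number of changes) for one site."""
    if isinstance(node, str):  # leaf: the observed base, no changes yet
        return {leaves[node][site]}, 0
    left, right = node  # internal node: a pair of subtrees
    left_set, left_cost = fitch(left, leaves, site)
    right_set, right_cost = fitch(right, leaves, site)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, leaves):
    """Sum the minimum number of changes over all sites."""
    n_sites = len(next(iter(leaves.values())))
    return sum(fitch(tree, leaves, s)[1] for s in range(n_sites))


leaves = {"A": "ACGT", "B": "ACGA", "C": "AGGT", "D": "TGGT"}
score_ab_cd = parsimony_score((("A", "B"), ("C", "D")), leaves)  # 3 changes
score_ac_bd = parsimony_score((("A", "C"), ("B", "D")), leaves)  # 4 changes
```

Unlike the likelihood score, the parsimony score needs no branch lengths and no substitution model, which is why it is cheaper to evaluate on a fixed tree.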