Building Phylogenies: Maximum Likelihood
Methods
–Distance-based
–Parsimony
–Maximum likelihood
ML is based on a Markov model of evolution
Observed: the species labeling the leaves
Hidden: the ancestral states
Transition probabilities: the mutation probabilities
Assumptions:
–Only mutations are allowed
–Sites are independent
Models of evolution at a site
Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
Branches may have different lengths
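A minimal sketch of such a matrix, assuming every specific change has the same probability per time unit (the value 0.1 and the names `make_matrix` and `branch_matrix` are illustrative, not from the slides). It also assumes a branch of length t, in whole time units, uses the t-th power of M, which is how a Markov chain composes over several steps.

```python
BASES = "ACGT"

def make_matrix(p_change=0.1):
    """Hypothetical one-time-unit matrix: each specific change has probability p_change."""
    stay = 1.0 - 3 * p_change
    return {i: {j: (stay if i == j else p_change) for j in BASES} for i in BASES}

def multiply(P, Q):
    """Matrix product of two transition matrices stored as dicts of dicts."""
    return {i: {j: sum(P[i][k] * Q[k][j] for k in BASES) for j in BASES} for i in BASES}

def branch_matrix(M, t):
    """Transition probabilities along a branch of t whole time units (M to the power t)."""
    result = {i: {j: (1.0 if i == j else 0.0) for j in BASES} for i in BASES}
    for _ in range(t):
        result = multiply(result, M)
    return result

M = make_matrix()
M3 = branch_matrix(M, 3)  # probabilities for a branch three time units long
```

Every row of `M` and of `M3` still sums to 1, and longer branches move probability away from the diagonal.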
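The probability of an assignment
Leaves: A, G, C, T. Internal nodes: X = T (root), Y = G (parent of leaves A and G), Z = T (parent of leaves C and T)
Probability $= m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$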
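The product can be checked numerically. The sketch below assumes a toy matrix in which a base stays the same with probability 0.7 and changes to each other base with probability 0.1 (values chosen for illustration only), all branches of length one time unit, and a root X whose children Y and Z are the parents of leaves (A, G) and (C, T).

```python
BASES = "ACGT"
# Hypothetical transition matrix: 0.7 to stay, 0.1 for each specific change.
M = {i: {j: (0.7 if i == j else 0.1) for j in BASES} for i in BASES}

def assignment_probability(x, y, z, leaves=("A", "G", "C", "T")):
    """Product of the six branch probabilities for ancestral states x, y, z."""
    l1, l2, l3, l4 = leaves  # l1, l2 hang below Y; l3, l4 hang below Z
    return M[x][y] * M[y][l1] * M[y][l2] * M[x][z] * M[z][l3] * M[z][l4]

# The assignment from the slide: X = T, Y = G, Z = T.
p = assignment_probability("T", "G", "T")
print(p)
```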
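Ancestral reconstruction: most likely assignment
Leaves: A, G, C, T. Internal nodes: X (root), Y (parent of leaves A and G), Z (parent of leaves C and T)
$L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using the Viterbi algorithm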
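A Viterbi-style sketch for this four-leaf tree: the maximization is pushed down the tree, so each child subtree is maximized separately for every root state instead of trying all 64 triples (X, Y, Z). The toy matrix (0.7 to stay, 0.1 per change) and the function name are assumptions for illustration.

```python
BASES = "ACGT"
# Hypothetical transition matrix: 0.7 to stay, 0.1 for each specific change.
M = {i: {j: (0.7 if i == j else 0.1) for j in BASES} for i in BASES}

def most_likely_assignment(leaves=("A", "G", "C", "T")):
    """Return (L*, x, y, z) for the tree with root X and children Y, Z."""
    l1, l2, l3, l4 = leaves
    # Probability of the two leaves below Y (or Z) given that node's state.
    below_y = {y: M[y][l1] * M[y][l2] for y in BASES}
    below_z = {z: M[z][l3] * M[z][l4] for z in BASES}
    best = None
    for x in BASES:
        # For a fixed root state, the two subtrees can be maximized independently.
        y = max(BASES, key=lambda s: M[x][s] * below_y[s])
        z = max(BASES, key=lambda s: M[x][s] * below_z[s])
        score = M[x][y] * below_y[y] * M[x][z] * below_z[z]
        if best is None or score > best[0]:
            best = (score, x, y, z)
    return best

score, x, y, z = most_likely_assignment()
```

With this symmetric toy matrix several assignments tie for the maximum, so the returned states are one optimal choice among several.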
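Likelihood of a tree
Leaves: A, G, C, T. Internal nodes: X (root), Y (parent of leaves A and G), Z (parent of leaves C and T)
$L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
Compute using the forward algorithm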
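The same recursion with sums in place of maxima gives the tree likelihood. This is a sketch under the assumptions used for illustration here: a toy matrix (0.7 to stay, 0.1 per change), unit branch lengths, and, as in the slide formula, no prior weight on the root state.

```python
BASES = "ACGT"
# Hypothetical transition matrix: 0.7 to stay, 0.1 for each specific change.
M = {i: {j: (0.7 if i == j else 0.1) for j in BASES} for i in BASES}

def tree_likelihood(leaves=("A", "G", "C", "T")):
    """Sum over all ancestral states X, Y, Z for the tree ((l1, l2), (l3, l4))."""
    l1, l2, l3, l4 = leaves
    # Probability of the leaves below Y (or Z) given that node's state.
    below_y = {y: M[y][l1] * M[y][l2] for y in BASES}
    below_z = {z: M[z][l3] * M[z][l4] for z in BASES}
    total = 0.0
    for x in BASES:
        # Given the root state, the two subtrees are independent, so the sums factor.
        from_y = sum(M[x][y] * below_y[y] for y in BASES)
        from_z = sum(M[x][z] * below_z[z] for z in BASES)
        total += from_y * from_z
    return total

L = tree_likelihood()
```

The factored form does work proportional to the number of nodes times the square of the alphabet size, rather than enumerating every combination of ancestral states.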
Analyzing a site
Analysis for site j
Analysis for all sites
Use enumeration (exhaustive, branch and bound, branch swapping, etc.) to find the ML tree
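Because sites are assumed independent, the likelihood of an alignment is the product of the per-site likelihoods, or equivalently the sum of their logarithms. The sketch below does an exhaustive search over the three ways of pairing four sequences into a tree of the shape ((a, b), (c, d)). The toy matrix, the unit branch lengths, the sequence names, and the alignment are all made up for illustration.

```python
import math

BASES = "ACGT"
# Hypothetical transition matrix: 0.7 to stay, 0.1 for each specific change.
M = {i: {j: (0.7 if i == j else 0.1) for j in BASES} for i in BASES}

def site_likelihood(a, b, c, d):
    """Likelihood of one site on the tree ((a, b), (c, d)), summed over ancestral states."""
    below_y = {y: M[y][a] * M[y][b] for y in BASES}
    below_z = {z: M[z][c] * M[z][d] for z in BASES}
    return sum(
        sum(M[x][y] * below_y[y] for y in BASES) * sum(M[x][z] * below_z[z] for z in BASES)
        for x in BASES
    )

def log_likelihood(seqs, pairing):
    """Sum of per-site log-likelihoods; valid because sites are independent."""
    (p, q), (r, s) = pairing
    return sum(
        math.log(site_likelihood(seqs[p][j], seqs[q][j], seqs[r][j], seqs[s][j]))
        for j in range(len(seqs[p]))
    )

def best_topology(seqs):
    """Exhaustively score the three pairings of four sequences."""
    n1, n2, n3, n4 = sorted(seqs)
    candidates = [((n1, n2), (n3, n4)), ((n1, n3), (n2, n4)), ((n1, n4), (n2, n3))]
    return max(candidates, key=lambda t: log_likelihood(seqs, t))

# Made-up alignment: three columns group s1 with s2, one groups s1 with s3.
seqs = {"s1": "AACGT", "s2": "AACGA", "s3": "GGCTT", "s4": "GGCTA"}
best = best_topology(seqs)
```

Exhaustive scoring is only feasible for a handful of taxa, since the number of tree topologies grows very quickly with the number of leaves.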
Comments
–ML is robust
–ML converges to the correct answer as more data is added
–ML can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
–ML can be slow
Complicating factors
Issues
Complicating factors:
–Gene duplication
–Horizontal gene transfer: exchange of genetic material between species
–Chimeric genes
Evolution may not be described by a tree, but by a network
Gene Duplication
[Figure: human globin gene cluster with duplicated genes, drawn 5' to 3']
Homology, orthology, and paralogy
Homology: similarity attributed to descent from a common ancestor.
Orthologous sequences: homologous sequences in different species that arose from a common ancestral gene during speciation
–May or may not be responsible for a similar function
Paralogous sequences: homologous sequences within a single species that arose by gene duplication.
Orthology and Paralogy
Conflicts between genes and species?
[Figure: a species tree with leaves A, B, C next to a gene tree with leaves A, C, B]
Resolving the conflict
[Figure: trees over leaves A, B, C]
Problem: Resolve conflicts using the minimum number of duplications
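One common way to count duplications, which the slides do not name, is to map each gene-tree node to the lowest species-tree node containing all species below it, and to call a gene node a duplication when it maps to the same species node as one of its children. The sketch below assumes trees written as nested tuples with gene leaves labeled by their species; the function names are illustrative.

```python
def clusters(tree):
    """Leaf sets of every node of a nested-tuple binary tree; the tree's own set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = clusters(left), clusters(right)
    return cl + cr + [cl[-1] | cr[-1]]

def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as one of their children."""
    species_clusters = clusters(species_tree)

    def lca(species):
        # Smallest species-tree cluster containing every species in the set.
        return min((c for c in species_clusters if species <= c), key=len)

    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return frozenset([node])
        left, right = (visit(child) for child in node)
        both = left | right
        if lca(both) in (lca(left), lca(right)):
            dups += 1  # this node cannot be explained by a speciation
        return both

    visit(gene_tree)
    return dups

species = (("A", "B"), "C")
genes = (("A", "C"), "B")
n_dups = count_duplications(genes, species)
```

For the conflicting pair above, the gene-tree root maps to the species root, as does its child (A, C), so one duplication is inferred; a gene tree identical to the species tree needs none.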