Maximum Likelihood Molecular Evolution
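Maximum Likelihood
The likelihood function is the joint density of the observations, viewed as a function of the model parameters $\theta$:
$L(\theta) = \Pr(\text{Data} \mid \theta)$
If the observations $D_1, \dots, D_n$ are independent, the likelihood decomposes into a product:
$L(\theta) = \prod_{i=1}^{n} \Pr(D_i \mid \theta)$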
An example
Consider estimating the heads probability p of a coin tossed n times (here n = 11).
Data = HHTTHTHHTTT
$L(p) = \Pr(D \mid p) = p\,p\,(1-p)(1-p)\,p\,(1-p)\,p\,p\,(1-p)(1-p)(1-p) = p^5 (1-p)^6$
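The coin example can be sketched in a few lines of Python. This is only an illustration: the function name `coin_likelihood` is made up for this sketch, and it assumes the tosses are independent, as in the example.

```python
def coin_likelihood(p, tosses):
    """Likelihood of an independent toss sequence, given heads probability p."""
    likelihood = 1.0
    for toss in tosses:
        # Each head contributes a factor p, each tail a factor (1 - p).
        likelihood *= p if toss == "H" else (1.0 - p)
    return likelihood

data = "HHTTHTHHTTT"
heads = data.count("H")  # number of factors p
tails = data.count("T")  # number of factors (1 - p)

# The per-toss product agrees with the closed form p^heads * (1 - p)^tails.
for p in (0.2, 0.5, 0.8):
    print(p, coin_likelihood(p, data), p**heads * (1.0 - p)**tails)
```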
$L(p) = p^5 (1-p)^6$
[Figure: L(p) plotted against p; the maximum lies at p = 5/11]
Maximum Likelihood
Take the derivative of L with respect to p:
$\frac{dL}{dp} = 5p^4(1-p)^6 - 6p^5(1-p)^5 = p^4 (1-p)^5 (5 - 11p)$
Equate it to zero and solve:
$\hat{p} = 5/11$
Log Likelihood
For computational reasons, we maximise the logarithm of the likelihood:
$\ln L = 5 \ln p + 6 \ln(1-p)$
with derivative
$\frac{d \ln L}{dp} = \frac{5}{p} - \frac{6}{1-p} = 0 \implies \hat{p} = 5/11$
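A minimal numerical check of the log-likelihood result, assuming a simple grid search is acceptable in place of the analytic solution. The grid resolution and the 2000-toss underflow demonstration are choices made for this sketch, not part of the original example.

```python
import math

def log_likelihood(p, heads=5, tails=6):
    """ln L(p) for the coin example: heads * ln p + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)

# Grid search over the open interval (0, 1); the end points are excluded
# because ln 0 is undefined.
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat, 5 / 11)

# One computational reason for working with logarithms: a product of many
# probabilities underflows in floating point, while a sum of logs does not.
product_of_probs = 1.0
sum_of_logs = 0.0
for _ in range(2000):
    product_of_probs *= 0.5
    sum_of_logs += math.log(0.5)
print(product_of_probs, sum_of_logs)
```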
A tree (for one column of the alignment)
[Figure: a tree whose leaves are the nucleotides in one alignment column: … A …, … C …, … G …]
Tree likelihood: Assumptions
1. Evolution in different sites is independent.
2. Evolution in different lineages is independent.
$\Pr(A,C,C,C,G,x,y,z,w \mid T) = \Pr(x)\,\Pr(y \mid x,t_6)\,\Pr(A \mid y,t_1)\,\Pr(C \mid y,t_2)\,\Pr(z \mid x,t_8)\,\Pr(C \mid z,t_3)\,\Pr(w \mid z,t_7)\,\Pr(C \mid w,t_4)\,\Pr(G \mid w,t_5)$
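The factorisation can be turned into a small sketch that also sums over the unobserved internal states x, y, z, w to obtain the likelihood of the observed column. Several things here are assumptions made for illustration only: the Jukes-Cantor model with branch lengths in expected substitutions per site, a uniform root distribution Pr(x) = 1/4, and the branch lengths themselves, which are hypothetical. The brute-force sum over all 4^4 internal assignments is used for clarity, not efficiency.

```python
import math
from itertools import product

NUCS = "ACGT"

def jc69(i, j, t):
    """Jukes-Cantor transition probability (assumed model for this sketch)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

def joint_probability(x, y, z, w, t):
    """Pr(A,C,C,C,G,x,y,z,w | T), one factor per branch of the tree."""
    return (0.25                                   # Pr(x), uniform root (assumed)
            * jc69(x, y, t[6]) * jc69(y, "A", t[1]) * jc69(y, "C", t[2])
            * jc69(x, z, t[8]) * jc69(z, "C", t[3])
            * jc69(z, w, t[7]) * jc69(w, "C", t[4]) * jc69(w, "G", t[5]))

def site_likelihood(t):
    """Sum the joint probability over every assignment of the internal nodes."""
    return sum(joint_probability(x, y, z, w, t)
               for x, y, z, w in product(NUCS, repeat=4))

# Hypothetical branch lengths t_1 .. t_8, chosen only to make the sketch run.
branch_lengths = {k: 0.1 for k in range(1, 9)}
print(site_likelihood(branch_lengths))
```

Because sites are assumed independent, the likelihood of a whole alignment would be the product of such per-column values (or the sum of their logarithms).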
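Using models
[Figure: observed differences versus actual changes; nucleotide labels A, G, C, T]
Example: Jukes-Cantor, in the parameterisation where t is measured in expected substitutions per site:
$P_{ij}(t) = \frac{1}{4} + \frac{3}{4} e^{-4t/3}$ if i = j
$P_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4t/3}$ if i ≠ j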
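To illustrate the gap between observed differences and actual changes, here is a sketch of the Jukes-Cantor example. It assumes t is measured in expected substitutions per site; other parameterisations (for instance in terms of a rate and a time) rescale the exponent. The function names are invented for this sketch.

```python
import math

def jc69_prob(i, j, t):
    """P_ij(t) under Jukes-Cantor; t in expected substitutions per site (assumed)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

def expected_observed_difference(t):
    """Expected proportion of differing sites after t actual changes per site."""
    return 3.0 * jc69_prob("A", "C", t)

def jc69_distance(p_observed):
    """Actual changes per site implied by an observed proportion of differences."""
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)

# Observed differences lag behind actual changes and saturate below 0.75,
# because repeated changes at the same site hide one another.
for t in (0.05, 0.5, 2.0):
    print(t, round(expected_observed_difference(t), 4))
```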
DNA substitution models
Comparison of substitution models
30 nucleotides from -globin genes of two primates on a one-edge tree
            * *
Gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
Orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG
There are two differences (marked *) and 28 similarities.
[Figure: lnL plotted against t, annotated with the maximising value of t and the corresponding lnL]
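The log-likelihood curve for this pair of sequences can be sketched as follows. The sketch assumes the Jukes-Cantor model with equal base frequencies of 1/4 and t measured in expected substitutions per site; the grid search and its resolution are arbitrary choices for illustration.

```python
import math

gorilla   = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
orangutan = "GGACTCCTTGAGAAATAAACTGCACACTGG"

n_diff = sum(a != b for a, b in zip(gorilla, orangutan))
n_same = len(gorilla) - n_diff

def log_likelihood(t):
    """lnL(t) on a one-edge tree under Jukes-Cantor (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 * (0.25 + 0.75 * decay)   # site pattern with identical bases
    diff = 0.25 * (0.25 - 0.25 * decay)   # site pattern with different bases
    return n_same * math.log(same) + n_diff * math.log(diff)

# Grid search for the branch length that maximises lnL.
grid = [i / 100000 for i in range(1, 100001)]
t_hat = max(grid, key=log_likelihood)

# Closed-form Jukes-Cantor estimate from the observed proportion of differences.
t_closed = -0.75 * math.log(1.0 - 4.0 * n_diff / (3.0 * len(gorilla)))

print(n_diff, n_same, t_hat, t_closed, log_likelihood(t_hat))
```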
Codon models
Goldman-Yang/Muse-Gaut model: 60+1 parameters
Detecting selection
Codon table
Ka/Ks ratio
Ka: number of non-synonymous changes / number of non-synonymous sites
Ks: number of synonymous changes / number of synonymous sites
Ka/Ks: indicative of selective action
<1: purifying selection
=1: selectively neutral
>1: positive (Darwinian) selection
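The definitions above translate directly into a small sketch. The counts in the usage line are hypothetical, and the `tolerance` around 1 is an assumption of this sketch: in practice an estimated ratio is never exactly 1, and a statistical test would normally be used to decide whether it departs from neutrality.

```python
def ka_ks(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Ka/Ks from raw counts, following the definitions of Ka and Ks."""
    ka = nonsyn_changes / nonsyn_sites
    ks = syn_changes / syn_sites
    return ka / ks

def interpret(ratio, tolerance=0.05):
    """Map a Ka/Ks ratio to its reading; `tolerance` is a hypothetical cut-off."""
    if ratio < 1.0 - tolerance:
        return "purifying selection"
    if ratio > 1.0 + tolerance:
        return "positive (Darwinian) selection"
    return "selectively neutral"

# Hypothetical counts, for illustration only.
ratio = ka_ks(nonsyn_changes=5, nonsyn_sites=200, syn_changes=10, syn_sites=100)
print(ratio, interpret(ratio))
```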
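Counting syn/non-syn changes
Alignment:
Seq 1: ... CCC ...
Seq 2: ... AAA ...
Possible pathways from CCC to AAA, changing one nucleotide at a time:
CCC (Pro) → ACC (Thr) → AAC (Asn) → AAA (Lys)
CCC (Pro) → ACC (Thr) → ACA (Thr) → AAA (Lys)
CCC (Pro) → CAC (His) → AAC (Asn) → AAA (Lys)
CCC (Pro) → CAC (His) → CAA (Gln) → AAA (Lys)
CCC (Pro) → CCA (Pro) → ACA (Thr) → AAA (Lys)
CCC (Pro) → CCA (Pro) → CAA (Gln) → AAA (Lys)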
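The pathway counting can be sketched by enumerating every order in which the differing codon positions may change. The sketch assumes the standard genetic code and weights all pathways equally; it makes no attempt to exclude pathways through stop codons, which established counting methods treat specially (none occur in this example).

```python
from itertools import permutations, product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def pathway_counts(start, end):
    """Return (path, synonymous, non-synonymous) for each mutational pathway."""
    differing = [i for i in range(3) if start[i] != end[i]]
    results = []
    for order in permutations(differing):
        codon, syn, nonsyn, path = start, 0, 0, [start]
        for pos in order:
            # Change one position to match the target codon.
            nxt = codon[:pos] + end[pos] + codon[pos + 1:]
            if CODON_TABLE[codon] == CODON_TABLE[nxt]:
                syn += 1
            else:
                nonsyn += 1
            codon = nxt
            path.append(codon)
        results.append((path, syn, nonsyn))
    return results

paths = pathway_counts("CCC", "AAA")
for path, syn, nonsyn in paths:
    print(" -> ".join(path), "syn:", syn, "non-syn:", nonsyn)

# Averages over pathways, assuming each pathway is equally likely.
mean_syn = sum(s for _, s, _ in paths) / len(paths)
mean_nonsyn = sum(n for _, _, n in paths) / len(paths)
print(mean_syn, mean_nonsyn)
```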
Synonymous and non-synonymous substitutions
Molecular clock
Zuckerkandl & Pauling (1965)
Molecular clock: use in taxonomy
18S rRNA (a component of the small ribosomal subunit)
Very slowly evolving: good for microbes
Conservation
For most sequences, the molecular clock does not apply:
Intron-exon structure
Domains
Sudden evolutionary bursts
Varying effective population sizes
Varying selective pressures