DIVERSIFYING SELECTION AND FUNCTIONAL CONSTRAINT: ESTIMATING THE dN/dS RATIO FOR GENE SEQUENCES IN THE PRESENCE OF RECOMBINATION
Danny Wilson, 12th October 2004
Menu: Codon-based models of molecular evolution; A new method for estimating omega with recombination; Does it work? Simulation studies and example data
Part one: Codon-based models of molecular evolution
[Figure: mutation turns an ancestral type into neutral mutants and inviable mutants; selection then removes the inviable mutants. Sampling usually occurs at this point, i.e. post-selection.]
Because we sample after selection, the underlying rate of non-synonymous mutation is usually confounded with selection against inviable mutants. It is therefore convenient to model functional constraint as a mutational bias (or rather, to make no attempt to disentangle the two).
Types of single nucleotide mutation: transitions vs. transversions
[Figure: purines (A, G) and pyrimidines (C, T); transitions change a base within its class, transversions change it to the other class.]
For any base there are always two possible transversions and one possible transition.
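A minimal Python sketch of the transition/transversion distinction, assuming unambiguous uppercase DNA bases; `mutation_type` is a hypothetical helper name, not part of any package. It also checks the slide's counting claim for every base.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = ("A", "C", "G", "T")


def mutation_type(src: str, dst: str) -> str:
    """Classify a single-nucleotide change as a transition or a transversion."""
    if src == dst or src not in BASES or dst not in BASES:
        raise ValueError(f"not a point mutation: {src}->{dst}")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {src, dst} <= PURINES or {src, dst} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Every base has exactly one transition partner and two transversion partners.
for base in BASES:
    kinds = [mutation_type(base, other) for other in BASES if other != base]
    print(base, kinds.count("transition"), kinds.count("transversion"))
# prints "A 1 2", "C 1 2", "G 1 2", "T 1 2" on separate lines
```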
Types of codon mutation: synonymous vs. non-synonymous
[Figure: single-base changes to a leucine codon; one gives another leucine codon (synonymous), another gives methionine (non-synonymous).]
Leucine: isoelectric point (pI) 5.98; 6-fold degeneracy in the genetic code; (CH3)2-CH-CH2-CH(NH2)-COOH
Methionine: isoelectric point (pI) 5.74; single unique codon ATG; CH3-S-(CH2)2-CH(NH2)-COOH
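To make the synonymous/non-synonymous distinction concrete, here is a small sketch that builds the standard genetic code from the usual 64-letter compact table (codons in TCAG order) and checks the degeneracy figures above. The codon pairs in the example (CTA to TTA, CTG to ATG) are illustrations chosen here, not necessarily the ones drawn on the slide; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO_ACIDS[n] for n, c in enumerate(product(BASES, repeat=3))}


def is_synonymous(codon_from: str, codon_to: str) -> bool:
    """True if both codons encode the same amino acid."""
    return CODE[codon_from] == CODE[codon_to]


def degeneracy(amino_acid: str) -> int:
    """Number of codons encoding the given one-letter amino acid."""
    return sum(1 for aa in CODE.values() if aa == amino_acid)


print(degeneracy("L"), degeneracy("M"))  # 6 1
print(is_synonymous("CTA", "TTA"))       # True  (Leu -> Leu)
print(is_synonymous("CTG", "ATG"))       # False (Leu -> Met)
```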
Example: CTT (leucine)
[Figure: single-base neighbours of CTT and their relative rates. CTT to TTT (Phe): non-synonymous transition, ωκμ. CTT to ATT (Ile) and CTT to GTT (Val): non-synonymous transversions, ωμ. Third-position changes that keep leucine: synonymous transversion μ, synonymous transition κμ. Ser, Tyr and Cys are also labelled in the figure.]
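The rate classes in this example can be generated mechanically. A minimal sketch, assuming the standard genetic code and writing the rates as symbolic labels; `neighbours` and `rate_label` are hypothetical names.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO_ACIDS[n] for n, c in enumerate(product(BASES, repeat=3))}
PURINES = {"A", "G"}


def neighbours(codon: str):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def rate_label(src: str, dst: str) -> str:
    """Relative rate: mu, kappa*mu, omega*mu or omega*kappa*mu ("0" if not one change)."""
    diffs = [i for i in range(3) if src[i] != dst[i]]
    if len(diffs) != 1:
        return "0"
    a, b = src[diffs[0]], dst[diffs[0]]
    transition = (a in PURINES) == (b in PURINES)
    # Stop codons count as non-synonymous here; codon models normally exclude them.
    synonymous = CODE[src] == CODE[dst]
    return ("" if synonymous else "omega*") + ("kappa*" if transition else "") + "mu"


for nb in neighbours("CTT"):
    print(nb, CODE[nb], rate_label("CTT", nb))
```

For CTT this gives omega*kappa*mu for TTT (Phe), omega*mu for ATT (Ile) and GTT (Val), kappa*mu for CTC and mu for CTA and CTG, matching the rate classes on the slide.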
Nielsen and Yang (1998) codon-based model of molecular evolution
Mutation type: rate
Synonymous transversion: μ
Synonymous transition: κμ
Non-synonymous transversion: ωμ
Non-synonymous transition: ωκμ
Other (more than one base change): 0
Parameter: interpretation
κ: transition-transversion ratio
ω = dN/dS: relative viability of non-synonymous mutations
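The table above can be turned into an instantaneous rate matrix over the 61 sense codons. This is a sketch of the mutation-rate part only: it omits codon equilibrium frequencies and any other details of the published model, and `rate_matrix` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO_ACIDS[n] for n, c in enumerate(product(BASES, repeat=3))}
SENSE = [c for c in CODE if CODE[c] != "*"]  # 61 sense codons
PURINES = {"A", "G"}


def rate_matrix(kappa: float, omega: float, mu: float = 1.0) -> list:
    """Rate matrix Q: off-diagonal entries from the table, rows summing to zero."""
    n = len(SENSE)
    q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(SENSE):
        for j, dst in enumerate(SENSE):
            diffs = [p for p in range(3) if src[p] != dst[p]]
            if len(diffs) != 1:
                continue  # same codon, or more than one base change: rate 0
            a, b = src[diffs[0]], dst[diffs[0]]
            rate = mu
            if (a in PURINES) == (b in PURINES):
                rate *= kappa  # transition
            if CODE[src] != CODE[dst]:
                rate *= omega  # non-synonymous
            q[i][j] = rate
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q


q = rate_matrix(kappa=2.0, omega=0.5)
idx = {c: i for i, c in enumerate(SENSE)}
print(len(SENSE))                  # 61
print(q[idx["CTT"]][idx["TTT"]])   # 1.0  (omega * kappa * mu)
print(q[idx["CTT"]][idx["CTC"]])   # 2.0  (kappa * mu)
```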
codeml
Pros: a viable method for detecting the mode of selection on a codon sequence
Cons: categorizes possible values of omega into a small number of discrete intervals; results can be misleading in the presence of recombination
Part two: A new method for estimating omega with recombination
Inference with recombination
Li and Stephens (2003) Approximation to the likelihood
Li and Stephens (2003) Approximation to the likelihood
TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT
TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC
TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC
TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT
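A simplified sketch of the kind of copying model behind the Li and Stephens (2003) approximation: each new haplotype is modelled as an imperfect mosaic of the haplotypes already sampled, using a hidden Markov chain over which haplotype is being copied, and the likelihood is approximated by a product of these conditionals. The switch parameter `rho_per_site`, the flat `mismatch` probability, the uniform treatment of the first haplotype and all function names are illustrative assumptions. This is not the exact Li and Stephens parametrisation, nor the codon-based modification described in this talk.

```python
import math
from typing import List, Sequence


def copying_log_likelihood(target: str, panel: Sequence[str],
                           rho_per_site: float, mismatch: float) -> float:
    """log P(target | panel) by the forward algorithm of a copying HMM.

    Hidden state at each site = index of the panel haplotype being copied.
    Between adjacent sites the copied haplotype is kept with probability
    exp(-rho_per_site / k), otherwise redrawn uniformly from the k haplotypes.
    The copied base is emitted with probability 1 - mismatch, each of the
    three other bases with probability mismatch / 3.
    """
    k = len(panel)
    if k == 0 or any(len(h) != len(target) for h in panel):
        raise ValueError("panel must be non-empty and aligned with target")
    stay = math.exp(-rho_per_site / k)

    def emit(site: int, j: int) -> float:
        return 1.0 - mismatch if panel[j][site] == target[site] else mismatch / 3.0

    log_lik = 0.0
    forward = [1.0 / k] * k  # uniform choice of the first haplotype copied
    for site in range(len(target)):
        if site > 0:
            # forward sums to 1 after rescaling, so the jump term is (1 - stay) / k
            forward = [stay * f + (1.0 - stay) / k for f in forward]
        forward = [f * emit(site, j) for j, f in enumerate(forward)]
        scale = sum(forward)
        log_lik += math.log(scale)
        forward = [f / scale for f in forward]  # rescale to avoid underflow
    return log_lik


def pac_log_likelihood(haplotypes: List[str], rho_per_site: float, mismatch: float) -> float:
    """Product of approximate conditionals, on the log scale.

    Assumption of this sketch: the first haplotype gets probability 1/4 per site.
    """
    total = len(haplotypes[0]) * math.log(0.25)
    for k in range(1, len(haplotypes)):
        total += copying_log_likelihood(haplotypes[k], haplotypes[:k], rho_per_site, mismatch)
    return total


# The four sequences shown on the slide.
HAPLOTYPES = [
    "TTTGATACTGTTGCCGAAGGTTTGGGCGAAATTCGCGATTTATTGCGCCGTTATCATCAT",
    "TTTGATACCGTTGCCGAAGGTTTGGGTGAAATTCGCGATTTATTGCGCCGTTACCACCGC",
    "TTTGATACCGTTGCCGAAGGTTTGGGTAAAATTCGCGATTTATTGCGCCGTTACCACCGC",
    "TTTGATACCGTTGCCGAAGGTTTGGGCGAAATTCGTGATTTATTGCGCCGTTATCATCAT",
]
print(copying_log_likelihood(HAPLOTYPES[3], HAPLOTYPES[:3], rho_per_site=0.05, mismatch=0.01))
print(pac_log_likelihood(HAPLOTYPES, rho_per_site=0.05, mismatch=0.01))
```

The rescaling step keeps the forward probabilities summing to one at every site, which is what lets the uniform jump term be written as (1 - stay) / k.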
My modification to Li and Stephens (2003)
Estimating variable omega
The problem: a constant-omega model tends to average over sites under positive selection (omega > 1) and sites under negative, i.e. purifying, selection (omega < 1) within a gene, while allowing every site its own omega leaves little information for inference.
The solution: a change-point model in which windows of adjacent sites share the same omega.
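A minimal sketch of the change-point representation, assuming each block is stored as its 0-based first site plus one omega value; the names `per_site_omega`, `starts` and `omegas` are hypothetical. The point is that a gene with many sites is described by only as many omega parameters as there are blocks.

```python
from typing import List, Sequence


def per_site_omega(n_sites: int, starts: Sequence[int], omegas: Sequence[float]) -> List[float]:
    """Expand a change-point description into one omega per codon site.

    starts[i] is the 0-based first site of block i (starts[0] must be 0 and
    starts must be strictly increasing); every site in block i gets omegas[i].
    """
    if len(starts) != len(omegas) or not starts or starts[0] != 0:
        raise ValueError("need one omega per block and a block starting at site 0")
    if any(b <= a for a, b in zip(starts, starts[1:])) or starts[-1] >= n_sites:
        raise ValueError("block starts must be strictly increasing and inside the gene")
    ends = list(starts[1:]) + [n_sites]
    site_omegas: List[float] = []
    for start, end, w in zip(starts, ends, omegas):
        site_omegas.extend([w] * (end - start))
    return site_omegas


print(per_site_omega(10, [0, 4, 7], [0.1, 2.5, 0.3]))
# [0.1, 0.1, 0.1, 0.1, 2.5, 2.5, 2.5, 0.3, 0.3, 0.3]
```

Here ten sites are described by three omega values; the middle block (omega > 1) is not averaged away by the flanking blocks (omega < 1).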
Estimating variable omega
MCMC moves: change omega for a single block; extend a block 5' or 3'; split an existing block; merge adjacent blocks.
[Figure: a sequence divided into blocks with omegas ω1 to ω5]
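The four moves can be sketched as proposals on a list of block starts and block omegas. The move types come from the slide; everything else is an assumption of this sketch: the multiplicative random walk on omega, the uniform choice of a new boundary position, and merging two blocks by the geometric mean of their omegas. Only the proposal step is shown. A full sampler would also need the Metropolis-Hastings acceptance step, and because split and merge change the number of parameters, that step would have to be a reversible-jump acceptance ratio. `Blocks` and `propose` are hypothetical names.

```python
import bisect
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Blocks:
    starts: List[int]    # 0-based first site of each block; starts[0] == 0
    omegas: List[float]  # one omega per block
    n_sites: int


def propose(state: Blocks, rng: random.Random, step: float = 0.5) -> Blocks:
    """Draw one of the four moves and return the proposed state (no acceptance step).

    Moves that cannot be applied (e.g. merge with one block) return an unchanged copy.
    """
    starts, omegas = list(state.starts), list(state.omegas)
    n_blocks = len(starts)
    move = rng.choice(["change", "extend", "split", "merge"])
    if move == "change":
        # Change omega for a single block; the multiplicative walk keeps omega > 0.
        i = rng.randrange(n_blocks)
        omegas[i] *= math.exp(rng.uniform(-step, step))
    elif move == "extend" and n_blocks > 1:
        # Move the boundary between blocks i-1 and i: moving it left extends
        # block i 5', moving it right extends block i-1 3'. Both stay non-empty.
        i = rng.randrange(1, n_blocks)
        lo = starts[i - 1] + 1
        hi = starts[i + 1] - 1 if i + 1 < n_blocks else state.n_sites - 1
        starts[i] = rng.randint(lo, hi)
    elif move == "split":
        # Split the block containing a new boundary site into two blocks.
        taken = set(starts)
        free = [s for s in range(1, state.n_sites) if s not in taken]
        if free:
            s = rng.choice(free)
            i = bisect.bisect_right(starts, s) - 1  # block that contains site s
            starts.insert(i + 1, s)
            omegas.insert(i + 1, omegas[i] * math.exp(rng.uniform(-step, step)))
    elif move == "merge" and n_blocks > 1:
        # Merge block i into block i-1, using the geometric mean of the two omegas.
        i = rng.randrange(1, n_blocks)
        del starts[i]
        omegas[i - 1] = math.sqrt(omegas[i - 1] * omegas[i])
        del omegas[i]
    return Blocks(starts, omegas, state.n_sites)


rng = random.Random(1)
state = Blocks([0], [1.0], n_sites=50)
for _ in range(1000):
    state = propose(state, rng)  # every proposal accepted: illustrates moves only
print(len(state.starts), state.starts[:5])
```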
Part three: Does it work? Simulation studies and example data
Posterior distribution for known and unknown genealogy
Neutral dataset
[Figure: true omega, posterior mean and posterior HPD interval]
Non-neutral dataset
[Figure: true omega, posterior mean and posterior HPD interval]
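The simulation results are summarised by a posterior mean and a highest posterior density (HPD) interval. A common way to estimate both from MCMC output is sketched below, assuming a unimodal posterior and a list of sampled values; the HPD interval is taken as the shortest interval containing the requested fraction of samples. The function names are hypothetical, and the simulated draws in the example are arbitrary, not data from this study.

```python
import math
import random
from typing import Sequence, Tuple


def posterior_mean(samples: Sequence[float]) -> float:
    return sum(samples) / len(samples)


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing a fraction `mass` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    widths = [xs[i + m - 1] - xs[i] for i in range(n - m + 1)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return xs[best], xs[best + m - 1]


# Arbitrary simulated draws standing in for MCMC output.
rng = random.Random(0)
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(10000)]
print(posterior_mean(draws), hpd_interval(draws, 0.95))
```

For a skewed posterior such as this log-normal, the HPD interval sits further left than the equal-tailed interval, which is why it is a natural summary for omega.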
HIV envelope gene Slow Non-Progressors vs Rapid Progressors
Neisseria meningitidis PorB3
Neisseria meningitidis PorB3
95% HPD interval: lower 0.0187, upper 0.0386
Work in progress: variable recombination rate; modelling indels; a falsifiability test; a test for sensitivity to rate heterogeneity.
Acknowledgements: Gil McVean (supervisor), Martin Maiden (supervisor), Ziheng Yang, Rachel Urwin (meningococcal data), Charlie Edwards (HIV data)