In double vision when drunk
By Thomas Huber
23 November 2001, Alexandra Headland

What would we like to be able to calculate/predict/model?
Which proteins are expressed?
–Alternative splicing
–Post-translational modifications
What is a protein’s evolution?
–Ancestor/descendant relationships (Phylogeny)
What is a protein’s structure?
–Does a sequence adopt a new/known fold?
What is a protein’s function?
–Functional annotation
How stable is a protein?
–Thermodynamics
How fast does a protein work/fold?
–Kinetics
Which molecules interact?
–Protein interaction networks

Three basic choices in molecular modelling
Representation
–Which degrees of freedom are treated explicitly
Scoring
–Which scoring function (force field)
Searching
–Which method to search or sample conformational space

Two Lineages of Protein Structure Prediction
The physicist’s approach
–Thermodynamics: Structures with low energy are more likely
The biologist’s approach
–Similar sequences fold into similar structures
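
The physicist's statement that low-energy structures are more likely can be made concrete with Boltzmann weights. The sketch below is only an illustration: the function name, the energies and the choice of kT = 1 (arbitrary units) are assumptions, not values from the talk.

```python
import math

def boltzmann_probabilities(energies, kT=1.0):
    """Return normalised Boltzmann probabilities for a list of energies."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical energies (arbitrary units) of three candidate conformations.
energies = [-5.0, -3.0, 0.0]
probs = boltzmann_probabilities(energies)

for e, p in zip(energies, probs):
    print(f"E = {e:5.1f}  p = {p:.3f}")
```

The lowest-energy conformation receives the largest probability, and raising kT flattens the distribution.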

Why is the physicist’s approach better?
Biologist’s concept: Homologous sequences fold into similar structures
–Needs reference sequence
Physicist’s concept: Doing what nature does (forces due to structure)
–Reference free

Challenge I: The force field (score function)
Simple task
–Find a residue interaction function (parameters) that can tell good sequence-structure matches from bad ones
Approaches
–Scoring functions based on statistics
–Scoring functions based on optimisation
–Scoring functions based on physics
–Combinations of all above
Limitations
–Limit of accuracy due to resolution of representation
  20-40% “success” rate in hard fold recognition predictions
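
One common way to build a scoring function based on statistics is the inverse Boltzmann relation, E(a,b) = -kT ln(f_obs(a,b) / f_exp(a,b)), applied to residue contacts counted in known structures. The sketch below assumes a two-letter alphabet (H and P), a random-mixing reference state and a pseudocount of 1; the contact list is made up for illustration and this is not necessarily the score function used in the talk.

```python
import math
from collections import Counter

def statistical_potential(contacts, kT=1.0, pseudocount=1.0):
    """Contact energies from observed vs. expected pair frequencies."""
    # Count unordered residue-type pairs and individual residue types.
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    residue_counts = Counter(r for p in contacts for r in p)
    types = sorted(residue_counts)
    n_res = sum(residue_counts.values())
    all_pairs = [(a, b) for i, a in enumerate(types) for b in types[i:]]
    n_pairs = len(contacts) + pseudocount * len(all_pairs)

    potential = {}
    for a, b in all_pairs:
        f_obs = (pair_counts[(a, b)] + pseudocount) / n_pairs
        x_a = residue_counts[a] / n_res
        x_b = residue_counts[b] / n_res
        # Random-mixing reference: unlike pairs can form in two ways.
        f_exp = x_a * x_b * (1 if a == b else 2)
        potential[(a, b)] = -kT * math.log(f_obs / f_exp)
    return potential

# Hypothetical contacts observed in a set of structures.
contacts = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
potential = statistical_potential(contacts)

for pair, e in potential.items():
    print(pair, round(e, 3))
```

Pairs seen more often than the reference predicts get negative (favourable) energies; the choice of reference state is the main modelling decision.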

Challenge II: Prediction of Structure
Individual proteins (for understanding)
–ATPase, RNA polymerases
–Peptide folding on non-linear lattices
Lots of proteins (structural genomics)
–Predicted models NOT good enough for molecular replacement in Xtallography!
  Fragment detection for initial phases??
  Use of structure factors in model search??
–BUT, models are useful in NMR process and for biochemists
How to improve models?
–Incomplete (quick & cheap) experimental constraints
  Crosslinking + MS
  Incomplete data from NMR (1, ¹J, ³J)
–Iterative refinement
  Assignment using models
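
A minimal sketch of how sparse crosslinking constraints could be used to rank predicted models: count, for each model, how many crosslinked residue pairs lie within a distance cutoff. The model coordinates, the crosslink list and the 12 Å cutoff are all hypothetical; a real cutoff depends on the crosslinker used.

```python
import math

def satisfied_crosslinks(coords, crosslinks, max_distance):
    """Count crosslinked residue pairs within max_distance (same units as coords)."""
    count = 0
    for i, j in crosslinks:
        if math.dist(coords[i], coords[j]) <= max_distance:
            count += 1
    return count

# Hypothetical one-point-per-residue coordinates for two candidate models.
models = {
    "compact": [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 5, 0)],
    "extended": [(0, 0, 0), (15, 0, 0), (30, 0, 0), (45, 0, 0)],
}
# Hypothetical crosslinks given as pairs of residue indices.
crosslinks = [(0, 2), (0, 3)]

scores = {name: satisfied_crosslinks(xyz, crosslinks, max_distance=12.0)
          for name, xyz in models.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Even a handful of such constraints can discard models whose overall shape is wrong, which is the appeal of quick and cheap experiments.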

Challenge III: Inverse Folding
Many industrial applications
[Figure: structures labelled metallochaperone, ribosomal protein, acylphosphatase and papillomavirus DNA binding domain, with the values 11%, 10%, 8% and 11%, shown alongside GlnB]
Given a structure, is there a “better” sequence?
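
The inverse folding question can be sketched as a search over sequences on a fixed structure. The toy below assumes a two-letter (H/P) alphabet, a hypothetical contact map and a made-up pair energy table in which only H-H contacts are favourable; it swaps residues (keeping composition fixed) whenever a swap lowers the energy. It is an illustration of the idea, not the method used in the talk.

```python
from itertools import combinations

# Hypothetical pair energies: only H-H contacts are rewarded.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}

def energy(sequence, contacts):
    """Sum pair energies over all contacts of the fixed structure."""
    return sum(PAIR_ENERGY[tuple(sorted((sequence[i], sequence[j])))]
               for i, j in contacts)

def design_by_swaps(sequence, contacts):
    """Greedy search: accept any residue swap that strictly lowers the energy."""
    seq = list(sequence)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] == seq[j]:
                continue
            before = energy(seq, contacts)
            seq[i], seq[j] = seq[j], seq[i]
            if energy(seq, contacts) < before:
                improved = True
            else:
                seq[i], seq[j] = seq[j], seq[i]  # undo the swap
    return "".join(seq)

# Hypothetical structure: positions 0, 3 and 5 form a mutually contacting core.
contacts = [(0, 3), (0, 5), (3, 5)]
start = "PHPHPH"
designed = design_by_swaps(start, contacts)
print(start, energy(start, contacts))
print(designed, energy(designed, contacts))
```

Fixing the composition prevents the trivial all-H answer; real design methods also need to check that the new sequence does not prefer some other structure.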

Whazon 2002 Opportunities for Collaborations
CASP5: The protein structure prediction Olympics (4th participation)
Protein score function based on Gaussian mixture models
Molecular simulations
–On and off lattices
Evolutionary analysis based on sequence, structure and energetics
–E.g. ATPase
Structural genomics
–Please introduce me to someone with an MS!!
–Mathematician in Fourier space wanted!!
Inverse folding
–Want to design a new protein and can make it?