Protein Scoring Functions: Essential Tools or Fancy Fad?
Thomas Huber
Computational Biology and Bioinformatics Environment (ComBinE), Department of Mathematics, The University of Queensland
Why do we (still) care about protein structures and structure prediction?
Academic curiosity?
–Understanding how nature works
Urgency of prediction
–10⁴ structures have been determined: insignificant compared to all known proteins
–Sequencing = fast & cheap
–Structure determination = hard & expensive
[Figure: growth of transistors in Intel processors, TrEMBL sequences (computer annotated), SwissProt sequences (annotated) and structures in the PDB]
What would we like to be able to predict?
What is a protein's structure?
–Does a sequence adopt a known fold? Fold recognition
–Does a sequence adopt a new fold? New fold prediction (dream of structural genomics)
How stable is a protein?
–Thermodynamic stability
What is a protein's function?
–Functional annotation
Three basic choices in molecular modelling
Representation
–Which degrees of freedom are treated explicitly
Scoring
–Which scoring function (force field)
Searching
–Which method to search or sample conformational space
Two Lineages of Protein Structure Prediction
The physicist's approach
–Thermodynamics: structures with low energy are more likely
The biologist's approach
–Similar sequences → similar structures
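The three choices can be kept independent in code. The following is a minimal sketch, not the method used in this work: it assumes a toy representation (each residue in one of three hypothetical backbone states), a made-up scoring function `toy_score`, and Metropolis Monte Carlo as the search method. The acceptance rule reflects the physicist's premise that low-energy structures are more likely (Boltzmann weighting).

```python
import math
import random
from typing import Callable, List

# Representation (hypothetical): one discrete backbone state per residue.
STATES = ["alpha", "beta", "coil"]
Conformation = List[str]


def toy_score(conf: Conformation) -> float:
    """Hypothetical scoring function (lower = better).

    Rewards neighbouring residues that share a state; a stand-in for a
    real force field or knowledge-based potential.
    """
    return -sum(1.0 for a, b in zip(conf, conf[1:]) if a == b)


def metropolis_search(
    start: Conformation,
    score: Callable[[Conformation], float],
    steps: int = 2000,
    temperature: float = 0.5,
    seed: int = 0,
) -> Conformation:
    """Search: Metropolis Monte Carlo over single-residue state changes."""
    rng = random.Random(seed)
    current, current_score = list(start), score(start)
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = rng.choice(STATES)
        delta = score(trial) - current_score
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, current_score + delta
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best


start = ["alpha", "beta", "coil"] * 4
best = metropolis_search(start, toy_score)
print(toy_score(start), toy_score(best))
```

Swapping any one of the three pieces (a finer representation, a real force field, a different search method) leaves the other two untouched.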
Fragment Scoring
Proteins are decomposed into overlapping fragments of 7 residues
Each fragment is described by
–Amino-acid-specific local structure
–Non-specific environment
Fragments are clustered and a statistical model is built for each cluster
Total score = Σ fragment scores
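The additive fragment score can be sketched as follows. This is an illustration under stated assumptions: the structure is reduced to a hypothetical one-letter secondary-structure string, and `toy_fragment_score` is a made-up stand-in for the per-cluster statistical models described above. Only the decomposition into overlapping 7-residue windows and the summation follow the slide.

```python
from typing import Callable, List

FRAGMENT_LENGTH = 7  # overlapping fragments of 7 residues


def fragments(text: str, k: int = FRAGMENT_LENGTH) -> List[str]:
    """Decompose a string into all overlapping windows of length k."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def total_score(sequence: str, structure: str,
                fragment_score: Callable[[str, str], float]) -> float:
    """Total score = sum of the scores of all overlapping fragments."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    pairs = zip(fragments(sequence), fragments(structure))
    return sum(fragment_score(seq_frag, ss_frag) for seq_frag, ss_frag in pairs)


def toy_fragment_score(seq_frag: str, ss_frag: str) -> float:
    # Hypothetical: reward hydrophobic residues in helix ('H') positions.
    # A real model would score the fragment against its cluster's statistics.
    hydrophobic = set("AILMFVW")
    return -sum(1.0 for aa, ss in zip(seq_frag, ss_frag)
                if aa in hydrophobic and ss == "H")


seq = "MKALIVAGLEELLKK"
ss = "CHHHHHHHHHHHHCC"  # hypothetical secondary-structure assignment
print(len(fragments(seq)), total_score(seq, ss, toy_fragment_score))
```

Because the windows overlap, each interior residue contributes to up to seven fragment scores.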
Finding Remote Homologues with sausage
572 sequence-structure pairs
–Structures are similar (FSSP)
–Over 70% structurally aligned
–Under 20% sequence identity
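The selection criteria for the 572 pairs can be written as a filter. A sketch only: the `Pair` record and its fields are hypothetical, the structurally aligned fraction is assumed to be precomputed (for example from an FSSP alignment), and sequence identity is computed over gap-free alignment columns, which is one common convention; the slide does not say which definition was used.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical record for one sequence-structure pair."""
    name: str
    aligned_query: str          # query row of a pairwise alignment, '-' = gap
    aligned_template: str       # template row of the same alignment
    structurally_aligned: float  # fraction structurally aligned (0-1), assumed precomputed


def sequence_identity(a: str, b: str) -> float:
    """Identical positions / alignment columns where neither row has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def is_remote_homologue(p: Pair) -> bool:
    # Thresholds from the slide: >70% structurally aligned, <20% identity.
    return (p.structurally_aligned > 0.70
            and sequence_identity(p.aligned_query, p.aligned_template) < 0.20)


pairs = [
    Pair("close", "ACDEFGHIKL", "ACDEFGHIKL", 0.95),    # sequences too similar
    Pair("remote", "ACDEFGHIKL", "LKIHGFEDCA", 0.80),   # passes both filters
    Pair("partial", "ACDEFGHIKL", "LKIHGFEDCA", 0.50),  # too little structural overlap
]
selected = [p.name for p in pairs if is_remote_homologue(p)]
print(selected)
```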
RNA-dependent RNA Polymerases
A Real Case Example
RNA-dependent RNA polymerases
–Dengue virus
–Bacteriophage φ6
Testing/Breaking the Scoring Function
Designed β-sheet (Serrano)
–12 residues
–Forms a stable β-sheet at room temperature
Another Uniquely Folded Mini-Protein
Villin headpiece (36 residues)
–High thermodynamic stability (Tm > 70 °C)
–Folds autonomously
A Uniquely Folded Mini-Protein
Zinc finger analogue (Mayo)
–28 residues
–Thermodynamically stable (Tm 25 °C)
Trimer Stability
Nitrogen regulation proteins
–2 proteins: PII (GlnB) and GlnK
–112 residues
–Sequence: 67% identities, 82% positives
–Structure: 0.7 Å RMSD
–Trimeric
–Dr S. Vasudevan: hetero-trimers
Hetero-trimer Stability
Which is the most/least stable trimer?
Why use a low-resolution force field?
–Structures differ (0.7 Å RMSD)
–Side chains are hard to optimise
Calculation:
–GlnB₃ > GlnB₂-GlnK > GlnB-GlnK₂ > GlnK₃
Experiment:
–GlnB₃ > GlnB₂-GlnK > GlnB-GlnK₂ > GlnK₃
[Figure: GlnK and GlnB]
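With two subunit types there are exactly four trimer compositions, and a simple way to rank them is to sum interface energies. The sketch below assumes a pairwise-additive, non-directional interface model with hypothetical energy values (`INTERFACE_ENERGY`) chosen only for illustration; they are not results from the calculation reported here, and a real low-resolution force field would score modelled trimer structures instead.

```python
from itertools import combinations, combinations_with_replacement
from typing import Dict, Tuple

# Hypothetical interface energies (arbitrary units, lower = more stable).
# Illustrative placeholders only, NOT values from the calculation above.
INTERFACE_ENERGY: Dict[Tuple[str, str], float] = {
    ("GlnB", "GlnB"): -3.0,
    ("GlnB", "GlnK"): -2.5,
    ("GlnK", "GlnK"): -2.0,
}


def interface(a: str, b: str) -> float:
    """Energy of one subunit-subunit interface (assumed symmetric)."""
    key = (a, b) if a <= b else (b, a)
    return INTERFACE_ENERGY[key]


def trimer_energy(subunits: Tuple[str, ...]) -> float:
    """Sum over the three pairwise interfaces of a trimer."""
    return sum(interface(a, b) for a, b in combinations(subunits, 2))


# All four compositions: GlnB3, GlnB2-GlnK, GlnB-GlnK2, GlnK3.
trimers = list(combinations_with_replacement(["GlnB", "GlnK"], 3))
ranking = sorted(trimers, key=trimer_energy)  # most stable first
for t in ranking:
    print("-".join(t), trimer_energy(t))
```

Under this simplified model, exactly those values with E(GlnB-GlnB) < E(GlnB-GlnK) < E(GlnK-GlnK) give the ordering GlnB₃ > GlnB₂-GlnK > GlnB-GlnK₂ > GlnK₃.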
People
sausage
–Andrew Torda (RSC)
–Oliver Martin (RSC)
GlnB/GlnK, RdR polymerases
–Subhash Vasudevan (JCU)
Summary
Sausage and Cassandra freely available
Increasing urgency for in-silico proteomics
Good force fields = essential for success
–Different tasks (may) require different scoring schemes