Current Status of Homology Modeling Using MCSG Structures
The 319 MCSG structures in the PDB have over 400,000 sequence homologues. These structures represent ~350 domains. Models are built with MODELLER (Sali), and their quality is assessed with PROSA (Sippl). High-quality models can be generated for ~80,000 proteins. A web site has been established that allows automated modeling of sequence homologues and assessment of model quality.
[Figure: 1t5b domain template vs. Q92LV5 domain model; labeled residues Gly140/141, Phe97, Asp96, Tyr95, Gly154/155, Phe103, Trp102]
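The numbers above imply a triage step: of the ~400,000 homologues, only ~80,000 (about 20%) yield high-quality models. The following is a minimal Python sketch of such a filter, assuming each model has already been built and scored. It is not the MCSG pipeline and calls neither MODELLER nor PROSA: the `HomologueModel` record, the identity and z-score cutoffs, and the convention that a more negative score is better are all hypothetical, with `quality_z` merely standing in for a PROSA assessment.

```python
from dataclasses import dataclass


@dataclass
class HomologueModel:
    """One modeled sequence homologue (hypothetical record layout)."""
    accession: str     # identifier of the modeled homologue
    template: str      # PDB ID of the MCSG template, e.g. "1t5b"
    identity: float    # % sequence identity between homologue and template
    quality_z: float   # stand-in for a PROSA-style z-score (assumed: lower is better)


# Hypothetical cutoffs chosen for illustration; not taken from the MCSG pipeline.
MIN_IDENTITY = 30.0
MAX_Z_SCORE = -4.0


def is_high_quality(model: HomologueModel) -> bool:
    """Accept a model only if both its identity and its quality score pass the cutoffs."""
    return model.identity >= MIN_IDENTITY and model.quality_z <= MAX_Z_SCORE


def triage(models: list[HomologueModel]) -> tuple[list[HomologueModel], float]:
    """Return the accepted models and the accepted fraction of all models."""
    accepted = [m for m in models if is_high_quality(m)]
    fraction = len(accepted) / len(models) if models else 0.0
    return accepted, fraction


# Usage with made-up values (hypothetical accessions and scores).
example = [
    HomologueModel("hyp_1", "1t5b", 45.0, -6.2),
    HomologueModel("hyp_2", "1t5b", 22.0, -5.1),  # too distant
    HomologueModel("hyp_3", "1t5b", 35.0, -1.3),  # poor quality score
    HomologueModel("hyp_4", "1t5b", 18.0, -0.5),  # fails both cutoffs
    HomologueModel("hyp_5", "1t5b", 27.0, -4.8),  # just below the identity cutoff
]
accepted, fraction = triage(example)
print([m.accession for m in accepted], fraction)  # prints ['hyp_1'] 0.2
```

Requiring both criteria reflects the slide's two separate steps (building the model, then assessing it); in practice the cutoffs would have to be calibrated against models of known quality.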
Protein Structure Initiative: the Need for Large-Scale Homology Modeling
In the next five years, the PSI can determine approximately 3,000-4,000 protein structures, mainly at coarse granularity. Reality check: novel structures in the PDB will represent a very small fraction of the sequences in GenBank, so reliable homology modeling is critical for obtaining 3D models and extending experimental work. In PSI-2, targets for structure determination are selected from large families; therefore, the determined structures have large numbers of sequence homologues spanning a wide range of sequence similarity. These proteins often display different functions. Homology modeling must provide tools and 3D protein models that can be used for high-confidence, reliable interpretation of specific structural features in distant (15-25%) sequence homologues, for protein function assignment, and for studies of evolution. Models should guide an increasing number of more sophisticated experiments, including: (i) mutagenesis and biochemical studies, (ii) prediction of ligand binding, (iii) prediction of oligomerization state, and (iv) prediction of cellular interactions (protein/protein, protein/DNA, protein/RNA). We need to consider how PSI target selection and subsequent structure determination can improve homology modeling and the quality of the models.
Major Issues with Large-Scale Homology Modeling for Structural Genomics
3D protein models for distant (15-25%) sequence homologues are often unsuitable. Because of sequence divergence within very large families, only a small fraction of sequences (10-20%) can be reliably modeled. Homology modeling must provide input to target selection to achieve fine-grained coverage of protein families. Domain parsing needs improvement. We should be able to model multi-domain proteins from the structures of individual domains. We should be able to model neighboring side chains and important structural and functional features that are currently difficult to assign and predict correctly. We need methods to predict unusual features and departures from the structure used as the modeling template. Modeling of loops and high-B-factor regions needs improvement.
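One concrete starting point for the loop and high-B-factor problem is simply to locate such regions in the template. The Python sketch below is an illustration, not a method from these slides: it averages B factors per residue from PDB-format ATOM/HETATM records, using the fixed PDB columns (B factor in columns 61-66), and flags residues more than `k` standard deviations above the mean of the per-residue values. The cutoff `k` is an arbitrary assumption, and insertion codes, alternate locations and waters are not treated specially.

```python
from statistics import mean, pstdev


def residue_b_factors(pdb_lines: list[str]) -> dict[tuple[str, int, str], float]:
    """Mean B factor per residue, keyed by (chain ID, residue number, residue name)."""
    b_values = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Columns 22 (chain), 23-26 (residue number), 18-20 (residue name).
        key = (line[21], int(line[22:26]), line[17:20].strip())
        b_values.setdefault(key, []).append(float(line[60:66]))  # columns 61-66
    return {key: mean(values) for key, values in b_values.items()}


def flag_high_b(pdb_lines: list[str], k: float = 1.0) -> list[tuple[str, int, str]]:
    """Residues whose mean B exceeds the mean of all per-residue values by more than k SDs."""
    per_residue = residue_b_factors(pdb_lines)
    if len(per_residue) < 2:
        return []
    values = list(per_residue.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(key for key, b in per_residue.items() if b > threshold)
```

Flagged residues could then be treated as candidates for separate loop modeling rather than being copied directly from the template.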
Structure of P5CR Exemplifies Challenges for Homology Modeling
Two structures of P5CR (pyrroline-5-carboxylate reductase) were determined. The proteins share 22% sequence identity and 47% sequence similarity. The monomer structures are very similar but show individual features. Problems: the protein has two domains and forms oligomers; one domain shows major domain swapping; and the protein forms different oligomeric states in different species.
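Percent identity and percent similarity, as quoted for the two P5CR structures, are both computed over a pairwise alignment. This Python sketch assumes the two sequences are already aligned (gaps written as `-`) and divides by the number of gap-free aligned columns; other conventions divide by the length of the shorter sequence. A pair counts as "similar" if both residues fall in the same class of a hypothetical physicochemical grouping. Published similarity figures are usually derived from a substitution matrix such as BLOSUM62, so this sketch is not expected to reproduce the 22%/47% values.

```python
# Hypothetical grouping of amino acids into "similar" classes, for illustration only.
SIMILAR_GROUPS = ("AVLIM", "FWY", "ST", "KRH", "DE", "NQ", "G", "P", "C")
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(x == y for x, y in pairs)
    # Identical pairs also count as similar.
    similar = sum(
        x == y or (x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y])
        for x, y in pairs
    )
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)
```

The gap between the two numbers for the P5CR pair (22% vs. 47%) illustrates why similarity-based scores can suggest a closer relationship than identity alone.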
Human Aldose Reductase: SeMet MAD at 0.9 Å
Comparison: Experimental vs. Refined Map
Refined map: 0.9 Å, sigmaA-weighted $2mF_o - DF_c$, contour level: 1σ
Experimental map: 0.9 Å, $F_o$, contour level: 1σ
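For reference, the refined map uses sigmaA-weighted $2mF_o - DF_c$ coefficients, and both maps are contoured at 1σ. The Python sketch below computes the coefficient for a single acentric reflection, $(2m|F_o| - D|F_c|)\exp(i\varphi_c)$, where $m$ is the figure of merit, $D$ the sigmaA-derived scale factor and $\varphi_c$ the model phase. Centric reflections, which conventionally use $m|F_o|$ instead, are not handled. It also computes a 1σ contour threshold as the map mean plus one r.m.s. deviation of the grid values, which is one common convention. All inputs are assumed to be supplied by the caller; this is not how any particular program computes its maps.

```python
import cmath
import math
from statistics import mean, pstdev


def two_mfo_dfc(f_obs: float, f_calc: float, phi_calc_deg: float, m: float, d: float) -> complex:
    """SigmaA-weighted 2mFo-DFc Fourier coefficient for one acentric reflection.

    f_obs, f_calc: observed and calculated structure-factor amplitudes
    phi_calc_deg:  model (calculated) phase in degrees
    m:             figure of merit
    d:             sigmaA-derived scale factor D
    Centric reflections conventionally use m*Fo instead; they are not handled here.
    """
    amplitude = 2.0 * m * f_obs - d * f_calc
    return amplitude * cmath.exp(1j * math.radians(phi_calc_deg))


def contour_level(grid_values: list[float], n_sigma: float = 1.0) -> float:
    """Density threshold n_sigma r.m.s. deviations above the map mean."""
    return mean(grid_values) + n_sigma * pstdev(grid_values)
```

A map contoured at 1σ then shows every grid point whose density is at or above `contour_level(values, 1.0)`.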
MAD Map at 3.2 Å, 1.8 Å, 1.6 Å and 1.1 Å
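Maps at different resolutions, such as the 3.2, 1.8, 1.6 and 1.1 Å maps here, are typically computed from data truncated at the corresponding limit $d_{\min}$. Because the reflections inside that limit fill a sphere of radius $1/d_{\min}$ in reciprocal space, their number grows roughly as $(4\pi/3)\,V/d_{\min}^3$ for a cell of volume $V$. Going from 3.2 Å to 1.1 Å therefore multiplies it by about $(3.2/1.1)^3 \approx 25$. The Python sketch below checks this by enumerating Miller indices. It assumes an orthorhombic cell, so that $1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2$, and applies no symmetry reduction. The small cell in the usage lines is hypothetical and is not the aldose reductase cell.

```python
import math


def d_spacing(h: int, k: int, l: int, a: float, b: float, c: float) -> float:
    """Interplanar spacing d (Å) for an orthorhombic cell with edges a, b, c (Å)."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)


def reflections_within(d_min: float, a: float, b: float, c: float) -> list[tuple[int, int, int]]:
    """All non-zero Miller indices (h, k, l) with d >= d_min, without symmetry reduction."""
    # d >= d_min requires |h| <= a / d_min (likewise for k, l); ceil gives a safe bound.
    h_max, k_max, l_max = (math.ceil(edge / d_min) for edge in (a, b, c))
    return [
        (h, k, l)
        for h in range(-h_max, h_max + 1)
        for k in range(-k_max, k_max + 1)
        for l in range(-l_max, l_max + 1)
        if (h, k, l) != (0, 0, 0) and d_spacing(h, k, l, a, b, c) >= d_min
    ]


def estimated_count(d_min: float, a: float, b: float, c: float) -> float:
    """Sphere-volume estimate (4*pi/3) * V / d_min**3 of the number of reflections."""
    return 4.0 / 3.0 * math.pi * a * b * c / d_min ** 3


# Usage with a hypothetical 20 x 20 x 20 Å cell.
cell = (20.0, 20.0, 20.0)
for d_min in (3.2, 1.8, 1.6, 1.1):
    print(d_min, len(reflections_within(d_min, *cell)), round(estimated_count(d_min, *cell)))
```

The cubic growth is why a 1.1 Å map contains far more Fourier terms than a 3.2 Å map computed from the same crystal.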
Inhibitor Head Existing in Double Conformation: Hard to Interpret at RT (1.45 Å), Clear at 100 K (0.8 Å)
[Figure labels: Tyr48, His110]
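A double conformation like the one described for the inhibitor head is normally deposited as alternate locations: each atom carries an alternate-location indicator (PDB column 17, e.g. A and B) and a partial occupancy (columns 55-60). As an illustrative Python sketch, not a description of how this structure was refined, the function below groups the atoms of one residue by alternate-location indicator and reports the atom count and mean occupancy of each conformer. The ligand residue name `LIG` used in the usage comment is a hypothetical placeholder.

```python
from collections import defaultdict


def alternate_conformers(pdb_lines: list[str], res_name: str) -> dict[str, tuple[int, float]]:
    """Map each alternate-location ID of residue `res_name` to (atom count, mean occupancy).

    Atoms with a blank alternate-location indicator are shared by all
    conformers and are therefore left out of the result.
    """
    occupancies = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == res_name:
            alt_loc = line[16]  # column 17
            if alt_loc != " ":
                occupancies[alt_loc].append(float(line[54:60]))  # columns 55-60
    return {
        alt: (len(occs), sum(occs) / len(occs))
        for alt, occs in sorted(occupancies.items())
    }


# Usage (hypothetical): alternate_conformers(open("model.pdb").read().splitlines(), "LIG")
```

For a simple two-state model of a fully bound ligand, the mean occupancies of conformers A and B would be expected to sum to about 1.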