Protein Structure Visualisation PyMOL
What is PyMOL? Open-source molecular viewing program http://www.pymol.org
PyMOL Representations Surfaces Ray-tracing (rendering) Lines, sticks, ribbon, spheres, cartoon(s) Surfaces Transparency, quality Ray-tracing (rendering) Modes
Selections & Objects Every molecule (pdb file) is an object. Selections refer to objects Make smaller or composite objects Changes in representation can affect objects or selections.
Links PDB (protein structure database) PyMOL home: PyMOL manual: www.pdb.org/ PyMOL home: http://pymol.sourceforge.net/ PyMOL manual: http://pymol.sourceforge.net/newman/user/toc.html PyMOL Wiki: http://www.pymolwiki.org/index.php/Main_Page PyMOL settings (documented): http://cluster.earlham.edu/detail/bazaar/software/pymol/modules/pymol/setting.py
Aesthetics of Molecular Images Personal taste General guidelines: Focus on relevant parts Strip away unnecessary information Find good viewing angle! Choice of representation No need for excessive graphics Good figures are (almost) self-explanatory.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Molecular Images Molecular graphics is an example of selective reduction of complexity.
Figures in PyMOL
Representations ray_trace_mode 0
Representations ray_trace_mode 1
Representations ray_trace_mode 2
Representations ray_trace_mode 3
Representations
Representations
Representations
Homology Modelling
No Structure = No Go? Do homology modelling. Validate the model. Use known protein structures to make reliable models. Validate the model. Adjust expectations and use accordingly!
Basic concepts - Evolution preserves important functions - Function is performed by specific structure Most of the sequences don't fold
Basic concepts Homologous proteins
Basic concepts Homologous proteins derive from the same ancestral protein
Basic concepts Homologous proteins derive from the same ancestral protein Orthology occurs Paralogy occurs
Basic concepts Homologous proteins derive from the same ancestral protein Orthology occurs with Speciation Structure and function very conserved Paralogy occurs with Duplication Structure conserved
Homology modeling Identify a homolog with solved structure (structural template) using sequence similarity to copy the conserved regions
Homology modeling Identify a homolog with solved structure (structural template) using sequence similarity to copy the conserved regions Structure and function are more conserved than sequence
How Often Can We Do It? Currently ~100,000 structures in the PDB Approx. ~12000 with sequence ident. <30 % and resolution <3.0 Å. 25% of all sequences can be modelled. 50% can be assigned to a fold class.
How Well Can We Do It? Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20–M24 (1999)
How Is It Done? Identify template(s) Improve alignment Initial alignment Improve alignment Backbone generation Loop modelling Side chains Refinement Validation
Template Identification Search with sequence Blast (>60%) Psi-Blast (>30%) profile-profile alignment (>15%) Fold recognition methods (>10%) Use biological information Functional annotation in databases Active site/motifs
alignment pipelines Hhpred (profile-profile comparison) 3 psiblast/hhblits cycles (improve alignment locally with mac) - find homologs with 2 cycles of cs-blast realign with muscle/mafft check ins/dels use the alignment with HHpred
Alignment
1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS F D I C R L P G S A E V 6 -2 -3 2 -1 N 1 5 8 T
1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS F D I C R L P G S A E V 6 -2 -3 2 -1 N 1 5 8 T
Improving the Alignment 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS From ”Professional Gambling” by Gert Vriend http://www.cmbi.kun.nl/gv/articles/text/gambling.html
Thumb rules no deletion with terminals far apart in space no insertion in the core check for conserved residues, especially G, P, C check for motif conservation be careful with secondary structure prediction identify disordered/accessory regions
Template Quality Selecting the best template is crucial! The best template may not be the one with the highest % id (best p-value…) Template 1: 93% id, 3.5 Å resolution Template 2: 90% id, 1.5 Å resolution
Backbone Generation Generate the backbone coordinates from the template for the aligned regions. Several programs can do this, most of the groups at CASP6 use Modeller: http://www.salilab.org/modeller/
Side Chains Side chain rotamers are dependent on backbone conformation. Most successful method in CASP6 was SCWRL by Dunbrack et al.: Graph-theory knowledge based method to solve the combinatorial problem of side chain modelling. http://dunbrack.fccc.edu/scwrl4/index.php
Side Chains Prediction accuracy: High for buried residues Low for surface residues Experimental reasons: side chains at the surface are more flexible. Theoretical reasons: much easier to handle hydrophobic packing in the core than the electrostatic interactions including H-bonds to waters. Myoglobin
Side Chains If the seq. id is high, the networks of side chain contacts may be conserved. Keeping the side chain rotamers from the template may be better than predicting new ones.
Error Recovery Errors in the model can NOT be recovered at a later step The alignment can not make up for a bad choice of template. Loop modeling can not make up for a poor alignment. The step where the errors were introduced should be redone.
Model refinement Close loops with Rosetta, modeller, FREAD SCWRL4 for side chains Check for error/clashes before refining don't over-refine (KobaMin)
Validation Most programs will get the bond lengths and angles right. Model Rama. plot ~ template Rama. plot. select a high quality template! Inside/outside distributions of polar and apolar residues.
Model Validation – ProQ ProQ is a neural network-based predictor Structural features quality of a protein model. ProQ is optimized to find correct models… …NOT (necessarily) native structures. Two quality measures: MaxSub & LGscore Arne Elofssons group: http://www.sbc.su.se/~bjorn/ProQ/
Structure Validation Program Suites ProCheck http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/ WhatIf server http://swift.cmbi.ru.nl/whatif/