Protein Structure Visualisation

Protein Structure Visualisation
PyMOL

What is PyMOL? Open-source molecular viewing program

PyMOL Representations Surfaces Ray-tracing (rendering)
Lines, sticks, ribbon, spheres, cartoon(s) Surfaces Transparency, quality Ray-tracing (rendering) Modes

Selections & Objects Every molecule (pdb file) is an object.
Selections refer to objects Make smaller or composite objects Changes in representation can affect objects or selections.

Links PDB (protein structure database) PyMOL home: PyMOL manual:
PyMOL home: PyMOL manual: PyMOL Wiki: PyMOL settings (documented):

Aesthetics of Molecular Images
Personal taste General guidelines: Focus on relevant parts Strip away unnecessary information Find good viewing angle! Choice of representation No need for excessive graphics Good figures are (almost) self-explanatory.

Molecular Images Molecular graphics is an example of selective reduction of complexity.

Figures in PyMOL

Representations ray_trace_mode 0

Representations

Homology Modelling

No Structure = No Go? Do homology modelling. Validate the model.
Use known protein structures to make reliable models. Validate the model. Adjust expectations and use accordingly!

Basic concepts - Evolution preserves important functions
- Function is performed by specific structure Most of the sequences don't fold

Basic concepts Homologous proteins

Basic concepts Homologous proteins
derive from the same ancestral protein

derive from the same ancestral protein Orthology occurs Paralogy occurs

derive from the same ancestral protein Orthology occurs with Speciation Structure and function very conserved Paralogy occurs with Duplication Structure conserved

Homology modeling Identify a homolog with solved structure
(structural template) using sequence similarity to copy the conserved regions

Homology modeling Identify a homolog with solved structure
(structural template) using sequence similarity to copy the conserved regions Structure and function are more conserved than sequence

How Often Can We Do It? Currently ~100,000 structures in the PDB
Approx. ~12000 with sequence ident. <30 % and resolution <3.0 Å. 25% of all sequences can be modelled. 50% can be assigned to a fold class.

How Well Can We Do It? Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20–M24 (1999)

How Is It Done? Identify template(s) Improve alignment
Initial alignment Improve alignment Backbone generation Loop modelling Side chains Refinement Validation 

Template Identification
Search with sequence Blast (>60%) Psi-Blast (>30%) profile-profile alignment (>15%) Fold recognition methods (>10%) Use biological information Functional annotation in databases Active site/motifs

alignment pipelines Hhpred (profile-profile comparison)
3 psiblast/hhblits cycles (improve alignment locally with mac) - find homologs with 2 cycles of cs-blast realign with muscle/mafft check ins/dels use the alignment with HHpred

Alignment

PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS F D I C R L P G S A E V 6 -2 -3 2 -1 N 1 5 8 T

Improving the Alignment
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS From ”Professional Gambling” by Gert Vriend

Thumb rules no deletion with terminals far apart in space
no insertion in the core check for conserved residues, especially G, P, C check for motif conservation be careful with secondary structure prediction identify disordered/accessory regions

Template Quality Selecting the best template is crucial!
The best template may not be the one with the highest % id (best p-value…) Template 1: 93% id, 3.5 Å resolution  Template 2: 90% id, 1.5 Å resolution 

Backbone Generation Generate the backbone coordinates from the template for the aligned regions. Several programs can do this, most of the groups at CASP6 use Modeller:

Side Chains Side chain rotamers are dependent on backbone conformation. Most successful method in CASP6 was SCWRL by Dunbrack et al.: Graph-theory knowledge based method to solve the combinatorial problem of side chain modelling.

Side Chains Prediction accuracy: High for buried residues
Low for surface residues Experimental reasons: side chains at the surface are more flexible. Theoretical reasons: much easier to handle hydrophobic packing in the core than the electrostatic interactions including H-bonds to waters. Myoglobin

Side Chains If the seq. id is high, the networks of side chain contacts may be conserved. Keeping the side chain rotamers from the template may be better than predicting new ones.

Error Recovery Errors in the model can NOT be recovered at a later step The alignment can not make up for a bad choice of template. Loop modeling can not make up for a poor alignment. The step where the errors were introduced should be redone.

Model refinement Close loops with Rosetta, modeller, FREAD
SCWRL4 for side chains Check for error/clashes before refining don't over-refine (KobaMin)

Validation Most programs will get the bond lengths and angles right.
Model Rama. plot ~ template Rama. plot. select a high quality template! Inside/outside distributions of polar and apolar residues.

Model Validation – ProQ
ProQ is a neural network-based predictor Structural features  quality of a protein model. ProQ is optimized to find correct models… …NOT (necessarily) native structures. Two quality measures: MaxSub & LGscore Arne Elofssons group:

Structure Validation Program Suites
ProCheck WhatIf server

Protein Structure Visualisation

Similar presentations

Presentation on theme: "Protein Structure Visualisation"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Protein Structure Visualisation

Similar presentations

Presentation on theme: "Protein Structure Visualisation"— Presentation transcript:

Similar presentations

About project

Feedback