Protein modelling ● Protein structure is the key to understanding protein function ● Protein structure ● Topics in modelling and computational methods – Comparative/homology modelling – Fold recognition – Fold prediction – Dynamics of proteins
Motivation ● Protein structure determines protein function ● For the majority of proteins the structure is not known
Correlation between structure and sequence ● Chothia & Lesk (1986): Correlation between structural divergence and sequence similarity [Figure labels: Fold space; Time; Fold 1; Fold 2; Evolution]
Comparative/homology modelling [Diagram labels: Template sequence; Template structure; Target sequence; Alignment; Model]
The crucial importance of the alignment ● An alignment defines structurally equivalent positions! [Diagram labels: Template sequence; Template structure; Target sequence; Alignment; Model]
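How an alignment defines equivalent positions can be sketched in a few lines. The function below is a minimal illustration, not part of any modelling package; the function name, the gap character and the example sequences are assumptions made for this sketch.

```python
def equivalent_positions(target_aln: str, template_aln: str, gap: str = "-") -> dict:
    """Map target residue indices to template residue indices (0-based).

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length. Columns with a gap in either row yield no mapping.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    t_idx = 0  # residues seen so far in the target
    s_idx = 0  # residues seen so far in the template
    for t_char, s_char in zip(target_aln, template_aln):
        t_is_res = t_char != gap
        s_is_res = s_char != gap
        if t_is_res and s_is_res:
            mapping[t_idx] = s_idx
        t_idx += t_is_res
        s_idx += s_is_res
    return mapping


# Hypothetical toy alignment: one template insertion, one unaligned target residue.
pairs = equivalent_positions("MK-LVE", "MKALV-")
```

Target residues that do not appear as keys in `pairs` have no template coordinates to copy, so they would have to be built by loop modelling.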
Steps in comparative modelling ● Find suitable template(s) ● Build alignment between target and template(s) ● Build model(s) – Replace sidechains – Resolve conflicts in the structure – Model loops (regions without an alignment) ● Evaluate and select model(s)
State of the art in homology modelling ● Template search – (iterative) sequence database searches (PSI-BLAST) ● Alignment step – multiple alignment of close to fairly distant homologues ● Modelling step – rigid body assembly – segment matching – satisfaction of spatial restraints
Modelling by spatial restraints ● Generate many restraints: – Homology derived restraints ● Distances and angles between aligned positions should be similar – Stereochemical restraints ● Bond lengths, bond angles, dihedral angles, nonbonded atom-atom contacts ● Model derived by minimizing violations of the restraints ● Modeller: Sali & Blundell (1993)
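The idea of deriving a model by minimizing restraint violations can be sketched with harmonic distance restraints and plain gradient descent. This is an assumption-laden toy, not Modeller's actual objective function or optimizer; the restraint format `(i, j, d0, k)`, the step size and the example distances are hypothetical (3.8 Å is the typical distance between consecutive Cα atoms).

```python
import math


def restraint_energy(coords, restraints):
    """Sum of harmonic penalties k * (d - d0)**2 over all distance restraints."""
    total = 0.0
    for i, j, d0, k in restraints:
        d = math.dist(coords[i], coords[j])
        total += k * (d - d0) ** 2
    return total


def minimize(coords, restraints, step=0.01, n_iter=2000):
    """Gradient descent on the restraint energy (illustrative only)."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0, k in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident atoms
            factor = 2.0 * k * (d - d0) / d
            for axis in range(3):
                g = factor * (coords[i][axis] - coords[j][axis])
                grads[i][axis] += g
                grads[j][axis] -= g
        for p, g in zip(coords, grads):
            for axis in range(3):
                p[axis] -= step * g[axis]
    return coords


# Three atoms, three target distances in Angstrom (toy values).
restraints = [(0, 1, 3.8, 1.0), (1, 2, 3.8, 1.0), (0, 2, 5.5, 1.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
model = minimize(start, restraints)
energy_before = restraint_energy(start, restraints)
energy_after = restraint_energy(model, restraints)
```

A real restraint set mixes homology derived and stereochemical terms and is far too large and rugged for simple gradient descent, which is why dedicated optimizers are used in practice.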
Loop modelling ● Exposed loop regions usually more variable than protein core ● Often very important for protein function ● Loops longer than 5 residues difficult to build ● Mini-protein folding problem
Model evaluation ● Check of stereochemistry – bond lengths & angles, peptide bond planarity, side-chain ring planarity, chirality, torsion angles, clashes ● Check of spatial features – hydrophobic core, solvent accessibility, distribution of charged groups, atom-atom distances, atomic volumes, main-chain hydrogen bonding ● 3D profiles/mean force potentials – residue environment
Knowledge-based mean force potentials (Melo & Feytmans, 1997) ● Compute typical atomic/residue environments based on known protein structures
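A common way to turn observed environments into a potential is the inverse Boltzmann relation, E = -kT ln(f_observed / f_reference). The sketch below assumes that formulation; it is not claimed to be the exact procedure of the cited paper. The distance bins, counts, pseudocount and the value of kT (about 0.593 kcal/mol near 298 K) are illustrative assumptions.

```python
import math


def mean_force_potential(observed, reference, kT=0.593, pseudocount=1.0):
    """Energy per distance bin from counts, via E = -kT * ln(f_obs / f_ref).

    observed:  counts for one atom-pair type in each distance bin
    reference: counts for all atom pairs in the same bins
    A pseudocount keeps empty bins from producing log(0).
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference need the same bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    energies = []
    for n_obs, n_ref in zip(observed, reference):
        f_obs = (n_obs + pseudocount) / obs_total
        f_ref = (n_ref + pseudocount) / ref_total
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


# Hypothetical counts for three distance bins (short, medium, long range).
energies = mean_force_potential([10, 60, 30], [100, 300, 600])
```

Bins that are enriched relative to the reference get a negative (favourable) energy, depleted bins a positive one. A model can then be scored by summing these energies over all its atom pairs.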
Modelling a transcription factor ● Sequence from different species ● Is binding to ligand conserved?
Ligand binding domain [Figure labels: hydrogen bonds to ligand; homo-serine lactone moiety binding; acyl moiety binding]
DNA binding domain [Figure labels: Linker; DNA binding domain]
MODELLER output [Figure labels: Template; Target; Variable loops; New loop]
Ligand binding pocket
Errors in comparative modelling (Marti-Renom et al., 2000) a) Side chain packing b) Distortions and shifts c) Loops d) Misalignments e) Incorrect template [Figure legend: Template; Model; True structure]
Modelling accuracy Marti-Renom et al. (2000)
Applications of homology modelling Marti-Renom et al. (2000)
Structural genomics ● Post-genomics: – many new sequences, no function ● Aim: a structure for every protein ● High-throughput structure determination – robotics – standard protocols for cloning/expression/crystallization
Structural coverage (Vitkup et al., 2001) [Figure labels: high quality models; complete models; total = 43 %]
Target selection
Fold recognition ● Structure is more conserved than sequence [Figure labels: Limit of sequence similarity searches; Structural similarity; Fold space; Target; Protein structures]
Fold recognition / Threading ● Is a sequence compatible with a structure? ● The idea: evolutionarily related proteins share common folding motifs ● Contact matrix = motif ● Mean-force potentials to score every contact ● Optimize alignment to minimize pseudo-energy
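The threading idea can be sketched as follows: place the sequence on the template's contact pattern, score every contact with a pair potential, and keep the placement with the lowest pseudo-energy. Everything specific here is an assumption for illustration: the two-class hydrophobic/polar potential with made-up values, the contact list, and the restriction to ungapped placements (real threading optimizes gapped alignments).

```python
HYDROPHOBIC = set("AVLIMFWC")


def contact_energy(a: str, b: str) -> float:
    """Toy pair potential (hypothetical values): reward hydrophobic pairs."""
    a_h = a in HYDROPHOBIC
    b_h = b in HYDROPHOBIC
    if a_h and b_h:
        return -1.0
    if a_h != b_h:
        return 0.5
    return 0.0


def threading_energy(sequence, contacts, offset=0):
    """Pseudo-energy of the sequence placed on the template at a given offset.

    contacts holds pairs (i, j) of template positions that are in contact.
    Contacts involving template positions not covered by the sequence are skipped.
    """
    total = 0.0
    for i, j in contacts:
        si, sj = i - offset, j - offset
        if 0 <= si < len(sequence) and 0 <= sj < len(sequence):
            total += contact_energy(sequence[si], sequence[sj])
    return total


def best_ungapped_threading(sequence, contacts, template_length):
    """Try every ungapped placement and return (lowest energy, offset)."""
    offsets = range(template_length - len(sequence) + 1)
    return min((threading_energy(sequence, contacts, o), o) for o in offsets)


# Hypothetical template of 8 positions with three contacts.
template_contacts = [(2, 5), (3, 6), (2, 6)]
best = best_ungapped_threading("LKSVI", template_contacts, template_length=8)
```

In this toy case the best placement is the one that puts the hydrophobic residues of the sequence onto the contacting template positions.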
Fold prediction – Rosetta method (Simons et al., 1997) ● Knowledge based scoring function ● Bayes' law:
$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure}) \, P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$
● P(structure) = probability of a protein-like structure (no clashes, globular shape) ● P(sequence|structure) = f(residue contacts in native structures) [Figure labels: protein-like structures; sequence consistent local structure; near-native structures]
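Because P(sequence) is the same for every candidate structure of a given sequence, Bayes' law implies that candidates can be ranked by P(structure) * P(sequence|structure) alone. A minimal sketch in log space, where the decoy names and log-probabilities are invented purely for illustration:

```python
def log_posterior(log_p_structure: float, log_p_seq_given_structure: float) -> float:
    """Unnormalised log P(structure | sequence).

    log P(sequence) is constant across candidates and therefore dropped.
    """
    return log_p_structure + log_p_seq_given_structure


# Hypothetical decoys: (log P(structure), log P(sequence | structure)).
decoys = {
    "decoy_a": (-2.0, -40.0),
    "decoy_b": (-5.0, -31.0),
    "decoy_c": (-1.0, -45.0),
}
ranked = sorted(decoys, key=lambda name: log_posterior(*decoys[name]), reverse=True)
```

Working with logarithms avoids numerical underflow when many small probabilities are multiplied.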
Environment specific scoring function (Simons et al., 1997) ● Environment E_i specific interactions ● Environment – defined by the number of neighbours – implicitly distinguishes between buried and exposed residues [Equation: scoring terms taken over residues i and residue pairs i<j] ● cf. mean force potential
Collection of putative backbone conformations (Simons et al., 1997) ● Protein sequence ● Library of small segments (sequences, structures, ...) ● For each window of 9 residues: look up the 25 closest (sequence) neighbours in the library ...
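The window lookup can be sketched as a nearest-neighbour search over a segment library. The similarity measure used here (number of identical residues) and the library layout as `(segment_sequence, conformation)` tuples are assumptions for this sketch; the source does not specify how sequence closeness is measured.

```python
import heapq

WINDOW = 9          # window length named on the slide
N_NEIGHBOURS = 25   # neighbours kept per window, as on the slide


def identity(a: str, b: str) -> int:
    """Number of positions at which two equally long segments agree."""
    return sum(x == y for x, y in zip(a, b))


def fragment_candidates(sequence, library, window=WINDOW, n=N_NEIGHBOURS):
    """For each window start, return the n most similar library entries.

    library is a list of (segment_sequence, conformation) tuples, where
    conformation stands for whatever backbone description is stored.
    """
    result = {}
    for start in range(len(sequence) - window + 1):
        query = sequence[start:start + window]
        result[start] = heapq.nlargest(
            n, library, key=lambda entry: identity(query, entry[0])
        )
    return result


# Tiny hypothetical library with window 3 and 2 neighbours, for readability.
toy_library = [("AKL", "helix"), ("GKL", "strand"), ("PPG", "turn"), ("AKV", "helix")]
candidates = fragment_candidates("AKLG", toy_library, window=3, n=2)
```

The per-window candidate lists are the pool from which backbone conformations are later drawn during optimization.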
MC-SA optimization (Simons et al., 1997) ● In each step, at a random position – pick a random neighbour – replace backbone conformation – calculate probability of new structure ● MC: Monte-Carlo – accept up-hill moves with a certain probability ● SA: simulated annealing – first allow many changes, later fewer changes
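A generic Monte-Carlo simulated-annealing loop in the spirit of these bullets might look like the sketch below. It uses the Metropolis acceptance rule and a geometric cooling schedule, both of which are assumptions (the source only says that up-hill moves are accepted with a certain probability). The energy is taken to be a score to minimize, for example a negative log-probability; the toy torsion-angle problem is hypothetical.

```python
import math
import random


def mc_sa(initial, neighbours, energy, n_steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over discrete per-position choices.

    neighbours[pos] lists the candidate values for position pos.
    Returns the best state seen and its energy.
    """
    rng = random.Random(seed)
    state = list(initial)
    e = energy(state)
    best, best_e = list(state), e
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        pos = rng.randrange(len(state))            # random position
        trial = list(state)
        trial[pos] = rng.choice(neighbours[pos])   # random neighbour
        delta = energy(trial) - e
        # Metropolis rule: always accept down-hill, sometimes accept up-hill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state, e = trial, e + delta
            if e < best_e:
                best, best_e = list(state), e
    return best, best_e


# Toy problem: six positions, four candidate angles each, optimum at -60 everywhere.
options = [[-120, -60, 60, 180] for _ in range(6)]


def toy_energy(state):
    return sum(abs(x + 60) for x in state) / 60.0


best_state, best_energy = mc_sa([180] * 6, options, toy_energy)
```

At high temperature almost every move is accepted, which lets the search escape local minima; as the temperature drops, the loop behaves more and more like a greedy descent.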
Results ● Small molecules: ok ● Proteins with mostly α-helices: ok ● Proteins with mostly β-sheets: not so ok Simons et al. (1997)
Dynamics of proteins ● Local Motions (0.01 to 5 Å, […] to […] s) – Atomic fluctuations – Sidechain Motions – Loop Motions ● Rigid Body Motions (1 to 10 Å, […] to 1 s) – Helix Motions – Domain Motions (hinge bending) – Subunit motions ● Large-Scale Motions (> 5 Å, […] to 10^4 s) – Helix coil transitions – Dissociation/Association – Folding and Unfolding
Molecular dynamics/molecular modelling ● Molecular mechanics ● Normal mode analysis ● Quantum mechanical simulations ●...
Molecular mechanics ● Atom representation – sphere – charge – topology ● Forces – Bonded interactions – Non-bonded interactions ● Electrostatic interactions ● Van der Waals interactions – Forcefields: AMBER, GROMOS,... ● Newton's laws of motion
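How forces and Newton's second law combine into a trajectory can be sketched with the velocity Verlet integrator applied to a single harmonic "bond" in one dimension. The integrator choice, the reduced units (mass, force constant and time step all dimensionless) and the function names are assumptions for illustration; real force fields such as AMBER or GROMOS sum many bonded and non-bonded terms.

```python
def harmonic_force(x: float, k: float = 1.0, x0: float = 0.0) -> float:
    """Force of a harmonic bond with force constant k and rest position x0."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate Newton's equation m * a = F(x) with velocity Verlet."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return x, v, trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x_end, v_end, traj = velocity_verlet(1.0, 0.0, 1.0, harmonic_force, dt=0.01, n_steps=1000)
energy_drift = abs(total_energy(x_end, v_end) - total_energy(1.0, 0.0))
```

The time step has to be small compared with the period of the fastest motion in the system, otherwise the total energy drifts and the trajectory becomes unphysical.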
Molecular mechanics ● Molecular mechanics simulations take long! – because of the size of the system ● Proteins are large ● Water molecules to consider solvent effects ● up to millions of atoms – because of the number of iterations ● update atom positions according to time-scale of fastest fluctuations: bond vibrations ca. 1 fs ● movements of interest frequently have long time-scale, e.g. folding ● 1 s => 10^15 iterations!
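The iteration count follows from dividing the simulated time by the time step: 1 s at 1 fs per step is 10^15 steps. The short calculation below makes this explicit. The throughput of one million steps per second is a purely hypothetical figure, chosen only to give a feeling for the wall-clock cost.

```python
FS_PER_SECOND = 10**15  # femtoseconds in one second


def integration_steps(simulated_time_fs: int, time_step_fs: int = 1) -> int:
    """Number of integration steps needed to cover the simulated time."""
    return simulated_time_fs // time_step_fs


steps_for_one_second = integration_steps(FS_PER_SECOND)

# Hypothetical throughput, for illustration only.
ASSUMED_STEPS_PER_WALL_SECOND = 10**6
wall_seconds = steps_for_one_second / ASSUMED_STEPS_PER_WALL_SECOND
wall_years = wall_seconds / (365.25 * 24 * 3600)
```

Under that assumed throughput, one simulated second would take roughly three decades of computing, and each step additionally requires evaluating the forces on every atom in the system.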
Benefit of simulations ● Result is an ensemble of structures – Time-averaged statistical quantities – e.g., relative free energies of different conformations ● Protein engineering – e.g., relative free energies of different mutants ● Physical accuracy of models? – chemical reactions? – cutoff and long-range interactions? – dielectric constant? [Movie from: C. Letner, G. Alter, Journal of Molecular Structure (Theochem) 368 (1996) 205–212]
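One way an ensemble yields a relative free energy is through state populations: if conformations A and B are both sampled at equilibrium, ΔG(A relative to B) = -kT ln(N_A / N_B). The sketch below assumes that each trajectory frame has already been assigned a state label; the labels, frame counts and temperature are hypothetical.

```python
import math
from collections import Counter

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def relative_free_energy(labels, state, reference, temperature=300.0):
    """Free energy of `state` relative to `reference` from frame counts.

    Uses dG = -kT * ln(N_state / N_reference) and assumes the frames
    sample the equilibrium distribution.
    """
    counts = Counter(labels)
    if counts[state] == 0 or counts[reference] == 0:
        raise ValueError("both states must be sampled at least once")
    return -K_B * temperature * math.log(counts[state] / counts[reference])


# Hypothetical trajectory: 100 frames "open", 900 frames "closed".
frames = ["open"] * 100 + ["closed"] * 900
dg_open = relative_free_energy(frames, "open", "closed")
```

A positive value means the state is less populated, and therefore less stable, than the reference. The estimate is only as good as the sampling, which ties back to the long simulation times discussed above.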
The end Proteins are beautiful!