Protein Structure Prediction and Protein Homology Modeling. Dinesh Gupta, ICGEB, New Delhi. 9/19/2018 8:51 PM
Why protein structures? Identifying active and binding sites; determining a protein's mechanism (catalysis and interactions); searching for ligands for a binding site; understanding the molecular basis of diseases; designing mutants; drug design. Mechanism - type 1 and 2 Kdarinim. Diseases - myosin 6/7 - deafness. Predix Pharmaceuticals Holdings - GPCR ... Drugs Latzhiimer
Why protein structure prediction? The number of known protein sequences far exceeds the number of experimentally determined structures. Release 2017_01 (18 Jan 2017) of UniProtKB/Swiss-Prot contains 553,474 sequence entries (www.uniprot.org), whereas the PDB (www.rcsb.org, 24 Jan 2017) holds only 40,094 distinct protein structures. Experimental structure determination is slow and expensive, and it fails in certain cases. Hence methods that can reasonably predict protein structures are important.
Protein structure prediction methods: homology (comparative) modeling; threading; ab initio.
Protein homology modeling: Homology modeling extrapolates the structure of a target sequence from the known 3D structure of a similar sequence, which serves as the template. Basis: proteins with similar sequences are likely to adopt the same fold. Certain proteins with as little as 25% sequence similarity have been observed to adopt the same 3D structure.
The accuracy of a homology model generally increases with the similarity between the primary sequences of target and template.
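Since model accuracy tracks sequence similarity, it helps to be explicit about how similarity is measured. The sketch below computes percent identity from an already aligned pair of sequences. The function name and the choice of denominator (columns where neither sequence has a gap) are assumptions made for illustration; other conventions, such as dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over the aligned (gap-free) columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0    # columns where both sequences have a residue
    identical = 0  # aligned columns where the residues match
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Toy alignment (not real protein sequences): 5 matches over 6 aligned columns.
print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
```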
Homology modeling consists of the following steps: template recognition (database search using BLAST); sequence alignment (pairwise or multiple alignment); model generation; model validation (statistical potentials or physics-based energy calculations).
Template recognition: The sequence of the target protein is first scanned against the sequences of known structures using a pairwise sequence alignment program such as BLAST or FASTA. For remote homologs, however, more sensitive methods based on multiple sequence alignment, such as PSI-BLAST, are preferred. The best template is selected on several factors, such as high sequence identity (> 30%), good coverage of the aligned region, and the quality of the template structure.
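The selection criteria above can be sketched as a simple filter-and-rank step. Everything here is illustrative: the `TemplateHit` record, the template identifiers, and the ranking key (identity multiplied by coverage, with crystallographic resolution as a tie-breaker standing in for "template quality") are assumptions, not a prescribed scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str   # hypothetical identifier for a candidate template
    identity: float    # percent sequence identity to the target
    coverage: float    # fraction of the target covered by the alignment (0..1)
    resolution: float  # resolution in angstroms; lower is better


def rank_templates(hits, min_identity=30.0):
    """Drop hits at or below the identity cutoff, then rank the rest."""
    usable = [h for h in hits if h.identity > min_identity]
    # Higher identity * coverage first; better (lower) resolution breaks ties.
    return sorted(usable, key=lambda h: (-h.identity * h.coverage, h.resolution))


hits = [
    TemplateHit("tmplA", 45.0, 0.90, 2.5),
    TemplateHit("tmplB", 60.0, 0.50, 1.8),
    TemplateHit("tmplC", 25.0, 1.00, 1.5),  # below the 30% identity cutoff
    TemplateHit("tmplD", 45.0, 0.90, 1.9),
]
for hit in rank_templates(hits):
    print(hit.template_id, hit.identity, hit.coverage, hit.resolution)
```

In practice the candidate list would come from a BLAST or PSI-BLAST search against sequences of known structures; the weighting of the criteria is a judgment call.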
Model generation: Once an initial target-template alignment is built, a variety of methods are available to construct a 3D model. The most commonly used approach applies distance geometry and optimization techniques to satisfy spatial restraints obtained from the target-template alignment (Modeller). Spatial restraints come from two sources. First, homology-derived restraints on distances and dihedral angles in the target sequence are extracted from its alignment with the template structures. Second, stereochemical restraints such as bond length and bond angle preferences are obtained from the CHARMM-22 molecular mechanics force field.
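The idea of building a model by satisfying spatial restraints can be illustrated with a toy optimizer. This is a minimal sketch and not Modeller's actual algorithm: it uses only harmonic distance restraints and plain gradient descent, whereas a real program also restrains dihedral angles and uses probability density functions and more robust optimizers. The function names, step size, and the three-point example are assumptions.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared deviations between model distances and restraint targets."""
    total = 0.0
    for i, j, target in restraints:
        total += (math.dist(coords[i], coords[j]) - target) ** 2
    return total


def satisfy_restraints(coords, restraints, step=0.05, iterations=1000):
    """Move points by gradient descent so pair distances approach their targets."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords


# Three points with target distances as if copied from a template (toy values).
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.5, 0.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]
model = satisfy_restraints(start, restraints)
print(restraint_violation(start, restraints), restraint_violation(model, restraints))
```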
Model validation: A homology model is susceptible to error and needs to be assessed for quality, either by statistical potentials or by physics-based energy calculations. Energy-based checks test whether bond lengths and bond angles are within normal ranges, and whether the model contains many bumps (steric clashes), which correspond to high van der Waals energy (PROCHECK). Statistical potentials are empirical methods based on observed residue-residue contact frequencies among proteins of known structure in the PDB. They assign a probability or energy score to each possible pairwise interaction between amino acids and combine these pairwise scores into a single score for the entire model (ProSA and DOPE).
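A statistical potential of the kind described above can be sketched with the inverse Boltzmann relation, E = -ln(observed / expected), in units of kT. This is a toy illustration, not the ProSA or DOPE implementation; the contact counts are invented, and real potentials also depend on distance and use carefully defined reference states.

```python
import math


def pair_energies(observed_counts, expected_counts):
    """Convert contact counts into pseudo-energies: E = -ln(observed / expected)."""
    energies = {}
    for pair, n_obs in observed_counts.items():
        energies[pair] = -math.log(n_obs / expected_counts[pair])
    return energies


def model_score(contacts, energies):
    """Sum the pair energies over all residue-residue contacts in a model."""
    return sum(energies[tuple(sorted(contact))] for contact in contacts)


# Invented counts: L-V contacts seen more often than expected, D-E less often.
observed = {("L", "V"): 200, ("D", "E"): 50}
expected = {("L", "V"): 100, ("D", "E"): 100}
energies = pair_energies(observed, expected)

# Keys are stored in sorted order, so ("V", "L") is looked up as ("L", "V").
print(model_score([("V", "L"), ("L", "V")], energies))  # favorable (negative)
print(model_score([("D", "E")], energies))              # unfavorable (positive)
```

Lower total scores indicate a model whose contacts resemble those seen in known structures.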
Software for homology modeling. Molecular graphics: PyMOL, RasMol, etc. Freeware, available for all operating systems. Downloadable: Modeller (Sali, 1998) (https://salilab.org/modeller/); DeepView (Swiss-PdbViewer) (http://spdbv.vital-it.ch); YASARA (WHATIF) (http://yasara.org). Web based: SWISS-MODEL server (https://swissmodel.expasy.org/interactive); CPHmodels server (http://www.cbs.dtu.dk/services/CPHmodels); Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index).
Common errors in homology modeling: errors in modeling side chains; errors in modeling sequence segments without a template (e.g., variable and/or loop regions); errors due to selecting the wrong template for modeling.
Petrey, D. and Honig, B. Molecular Cell, Vol. 20, 811–819, December 22, 2005
Databases of protein models
Homology modeling summary: Homology modeling methods can efficiently predict structure if the target protein has similarity to known folds. The greater the sequence identity to the template, the better the quality of the model. Homology modeling can yield high-accuracy, high-resolution atomic models. As the number of unique, experimentally solved structures grows, homology modeling methods should become more powerful.
Protein structure prediction methods: homology (comparative) modeling; threading; ab initio.
Threading: A structure prediction method that picks up where homology modeling leaves off. Homology modeling aligns sequence to sequence; threading aligns sequence to structure. Threading is based on the observation that only a limited number of protein folds occur in nature. It can recognize folds in proteins that have no sequence similarity to proteins of known structure. The resulting models are very approximate. A sequence is forced into each known fold, and the packing of amino acid residues, including side chains, is checked in each fold using scoring functions.
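The sequence-to-structure idea behind threading can be sketched with a toy scoring function. This is a deliberately crude illustration and not how I-TASSER or RaptorX score folds: each fold is reduced to a string of per-position environments ("B" for buried, "E" for exposed), the alignment is gapless, and the hydrophobic residue set and fold names are assumptions.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set for this toy example


def thread_score(sequence, environments):
    """Gapless score: +1 if a residue suits its position's environment, else -1."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        buried = env == "B"
        # Hydrophobic residues are rewarded in buried positions, polar in exposed.
        score += 1 if (residue in HYDROPHOBIC) == buried else -1
    return score


def best_fold(sequence, fold_library):
    """Return the name of the fold whose environments best fit the sequence."""
    return max(fold_library, key=lambda name: thread_score(sequence, fold_library[name]))


# Hypothetical fold library: alternating pattern vs. a buried central pair.
folds = {"fold1": "BEBEBE", "fold2": "EEBBEE"}
print(best_fold("LKVEAD", folds), thread_score("LKVEAD", folds["fold1"]))
```

Real threading programs allow gaps, use richer structural environments and pair potentials, and assess the statistical significance of the best score.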
Threading software: I-TASSER (http://zhanglab.ccmb.med.umich.edu/I-TASSER/); RaptorX (http://raptorx.uchicago.edu).
Protein structure prediction methods: homology (comparative) modeling; threading; ab initio.
Ab initio structure prediction: still experimental and less accurate. ROSETTA (David Baker) (http://robetta.bakerlab.org/).
Energy minimization (molecular mechanics, MM): the heart of any protein modeling technique. Energy minimization techniques try to minimize the energy of a molecule using equations, called force fields, that represent the energies of molecular systems. Assumption: molecular systems tend toward their lowest-energy state at equilibrium. In principle, MM could be used to calculate large-scale conformational changes over long periods of time, but this is currently computationally infeasible.
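Energy minimization can be illustrated on a single degree of freedom. The sketch below applies steepest descent with a numerical gradient to a toy threefold torsion energy, E(phi) = 1 + cos(3 phi). The energy function, step size, and iteration count are assumptions chosen for illustration; production minimizers work on thousands of coordinates and typically use conjugate gradient or quasi-Newton methods.

```python
import math


def torsion_energy(phi):
    """Toy threefold torsion term with minima at 60, 180 and 300 degrees."""
    return 1.0 + math.cos(3.0 * phi)


def steepest_descent(energy, x0, step=0.05, iterations=1000, h=1e-5):
    """Follow the negative numerical gradient of a one-variable energy function."""
    x = x0
    for _ in range(iterations):
        gradient = (energy(x + h) - energy(x - h)) / (2.0 * h)  # central difference
        x -= step * gradient
    return x


# The result depends on the starting point: each start relaxes to the nearest minimum.
for start_deg in (100.0, 130.0):
    phi = steepest_descent(torsion_energy, math.radians(start_deg))
    print(start_deg, round(math.degrees(phi), 3))
```

The two starting angles end in different minima, which is why minimization alone finds a nearby low-energy conformation rather than the global one.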
What do force fields represent? How atoms stretch, vibrate, rotate, etc. They represent the constraints on atomic motion (e.g., van der Waals, electrostatic, bonds). They must also represent solvation effects, etc. Quantum solutions exist but are too complex to calculate for such large systems, so empirical (approximate) energy functions must be used. No single best function exists.
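An empirical energy function of this kind is a sum of simple terms. The sketch below combines a harmonic bond term, a Lennard-Jones (van der Waals) term, and a Coulomb (electrostatic) term. It is a minimal illustration under stated assumptions: the parameter values are invented rather than taken from CHARMM or any other published force field, angle and torsion terms are omitted, and solvation is not modeled. The constant 332.06 converts charge products in elementary charges and distances in angstroms to kcal/mol.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*angstrom/(mol*e^2), conventional conversion factor


def bond_energy(r, r0, k):
    """Harmonic bond stretch around the equilibrium length r0."""
    return k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: repulsive at short range, weakly attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic interaction between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def total_energy(coords, bonds, nonbonded):
    """Sum bonded and non-bonded terms over the listed atom pairs."""
    energy = 0.0
    for i, j, r0, k in bonds:
        energy += bond_energy(math.dist(coords[i], coords[j]), r0, k)
    for i, j, epsilon, sigma, qi, qj in nonbonded:
        r = math.dist(coords[i], coords[j])
        energy += lennard_jones(r, epsilon, sigma) + coulomb(r, qi, qj)
    return energy


# Three atoms with invented parameters: one bond and one non-bonded pair.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 4.0, 0.0)]
bonds = [(0, 1, 1.5, 300.0)]
nonbonded = [(1, 2, 0.1, 4.0, 0.5, -0.5)]
print(total_energy(coords, bonds, nonbonded))
```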