
CSCE555 Bioinformatics
Lecture 18: Protein Tertiary Structure Prediction
Meeting: MW 4:00PM-5:15PM, SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering, 2008
www.cse.sc.edu

Outline
Experimental limitations of protein structure determination
Tertiary structure prediction
◦ Ab initio
◦ Homology modeling
◦ Threading

Experimental Protein Structure Determination
High-resolution structure determination
◦ X-ray crystallography (<1 Å)
◦ Nuclear magnetic resonance (NMR) (~1-2.5 Å)
Lower-resolution structure determination
◦ Cryo-EM (cryo-electron microscopy): ~10-15 Å
Theoretical models?
◦ Highly variable, but a few are equivalent to X-ray structures!

Tertiary Structure Prediction
Fold (tertiary structure) prediction can be formulated as a search for the minimum-energy conformation
◦ The search space is defined by the backbone φ/ψ dihedral angles and the side-chain rotamers
◦ The search space is enormous even for small proteins!
◦ The number of local minima increases exponentially with the number of residues
Computationally, it is an exceedingly difficult problem!

Levinthal Paradox of Protein Folding: How does nature search?
Assume that there are three conformations for each amino acid (e.g., α-helix, β-sheet, and random coil). If a protein is made up of 100 amino acid residues, the total number of conformations is 3^100 = 515377520732011331036461129765621272702107522001 ≈ 5 × 10^47. If 100 psec (10^-10 sec) were required to convert from one conformation to another, a random search of all conformations would require 5 × 10^47 × 10^-10 sec ≈ 1.6 × 10^30 years. However, proteins fold on a millisecond-to-second timescale. Therefore, proteins fold not via a random search but via a more sophisticated search process. We want to watch the folding process of a protein using molecular simulation techniques.
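Levinthal's arithmetic can be checked directly. Python integers are exact, so 3^100 is computed without rounding; the year length (365.25 days) is an assumption of this sketch, not a value from the slide.

```python
# Levinthal's estimate with the numbers from the slide:
# 3 conformations per residue, 100 residues, 1e-10 s (100 ps) per conformational change.
STATES_PER_RESIDUE = 3
N_RESIDUES = 100
SECONDS_PER_MOVE = 1e-10
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year

n_conformations = STATES_PER_RESIDUE ** N_RESIDUES  # exact integer
search_seconds = n_conformations * SECONDS_PER_MOVE  # converts to float
search_years = search_seconds / SECONDS_PER_YEAR

print(n_conformations)
print(f"{n_conformations:.2e} conformations")
print(f"{search_years:.1e} years for an exhaustive random search")
```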

Steps in Protein Folding
1. "Collapse": the driving force is burial of hydrophobic amino acids (fast: msecs)
2. Molten globule: helices and sheets form, but "loosely" (slow: secs)
3. "Final" native folded state: compaction; some secondary structures are rearranged
Native state? Assumed to be the lowest free-energy state; may be an ensemble of structures

Protein Folding Funnel [Figure: folding funnel; labels: local minima, global minimum, native structure]

Protein Structure Prediction
Ab initio
◦ Use only first principles: energy, geometry, and kinematics
Homology
◦ Find the best match to a database of sequences with known 3D structures
Combinations
Threading
Meta-servers and other methods
Knowledge-based approaches

Ab Initio Prediction
Basic idea: Anfinsen's theory: the native structure of a protein corresponds to the state with the lowest free energy of the protein-solvent system.
General procedure
◦ Develop a potential (energy) function to
  - evaluate the energy of a protein conformation
  - select the native structure
◦ Use a conformational search algorithm to
  - produce new conformations
  - search the potential energy surface and locate the global minimum (native conformation)
Provides both the folding pathway and the folded structure
Can only be applied to very small proteins

Potential Functions for PSP
Potential function
◦ Physics-based energy function
  - Empirical all-atom force fields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS
  - Parameterization: quantum mechanical calculations, experimental data
  - Simplified potential: UNRES (united residue)
◦ Solvation energy
  - Implicit solvation models: Generalized Born (GB) model, surface-area-based model
  - Explicit solvation model: TIP3P water (computationally expensive)

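General Form of All-atom Forcefields
A typical all-atom force field (AMBER-style functional form shown) sums bonded and nonbonded terms, where r is a bond length or interatomic distance, θ a bond angle, φ a dihedral angle, and q a partial charge:
$$
\begin{aligned}
E ={}& \sum_{\text{bonds}} K_r (r - r_{eq})^2 && \text{(bond stretching)}\\
&+ \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 && \text{(angle bending)}\\
&+ \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] && \text{(dihedral)}\\
&+ \sum_{i<j} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] && \text{(van der Waals)}\\
&+ \sum_{i<j} \frac{q_i q_j}{\varepsilon\, r_{ij}} && \text{(electrostatic)}\\
&+ \sum_{\text{H-bonds}} \left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right] && \text{(H-bonding)}
\end{aligned}
$$
The nonbonded terms (van der Waals, electrostatic, H-bonding) run over all atom pairs and are the most time-demanding part.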
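Why the nonbonded terms dominate the cost can be seen in a minimal sketch of a Lennard-Jones (A/r^12 - B/r^6) plus Coulomb (q_i q_j / (ε r)) energy: every atom pair is visited, so the work grows as O(N^2). Simplifying assumptions: a single shared A/B pair of coefficients (real force fields look them up per atom-type pair), no exclusion or scaling of bonded 1-2, 1-3, 1-4 pairs, and the Coulomb constant 332.0636 kcal·Å/(mol·e²) for kcal/mol units.

```python
import math
from itertools import combinations

# Coulomb constant in kcal*Å/(mol*e^2), the unit convention of AMBER/CHARMM-style codes
COULOMB_K = 332.0636

def nonbonded_energy(coords, charges, A, B, dielectric=1.0):
    """Return (E_vdw, E_elec) summed over all atom pairs.

    coords: list of (x, y, z) in Å; charges: partial charges in units of e.
    A, B: one shared Lennard-Jones coefficient pair (simplifying assumption).
    The double loop over pairs is what makes these terms O(N^2).
    """
    e_vdw = 0.0
    e_elec = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        e_vdw += A / r**12 - B / r**6
        e_elec += COULOMB_K * charges[i] * charges[j] / (dielectric * r)
    return e_vdw, e_elec

# Two atoms 1 Å apart; with A=1, B=2 the Lennard-Jones minimum sits exactly at r=1 Å.
vdw, elec = nonbonded_energy([(0, 0, 0), (1, 0, 0)], [0.5, -0.5], A=1.0, B=2.0)
print(vdw, elec)
```

For a Lennard-Jones pair the minimum lies at r = (2A/B)^(1/6) with depth -B²/(4A), which is a quick sanity check on any parameter set.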

Search Potential Energy Surface
We are interested in the minima of the potential energy surface (PES)
Conformational search techniques
◦ Energy minimization
◦ Monte Carlo
◦ Molecular dynamics
◦ Others: genetic algorithms, simulated annealing

Energy Minimization
Energy minimization methods
◦ First-derivative minimization: steepest descent, conjugate gradient
◦ Second-derivative methods: Newton-Raphson
◦ Quasi-Newton methods: L-BFGS
These methods converge to a local minimum
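A minimal steepest-descent sketch, run on a hypothetical two-dimensional "energy surface" (a tilted double well, not a real force field) with a fixed step size (an assumption; practical minimizers use line searches or the conjugate-gradient and quasi-Newton updates listed above). It illustrates that gradient-based minimization stops at whichever minimum lies downhill from the starting point, not necessarily the global one.

```python
def energy(x, y):
    # Hypothetical toy PES: tilted double well in x plus a harmonic term in y.
    # Local minimum near x ≈ 0.96, global minimum near x ≈ -1.04 (both at y = 0).
    return (x**2 - 1) ** 2 + 0.3 * x + y**2

def gradient(x, y):
    # Analytic partial derivatives of energy(x, y)
    return 4 * x * (x**2 - 1) + 0.3, 2 * y

def steepest_descent(x, y, step=0.05, tol=1e-8, max_iter=10_000):
    """Move along -grad E with a fixed step until the gradient norm drops below tol."""
    for _ in range(max_iter):
        gx, gy = gradient(x, y)
        if gx * gx + gy * gy < tol**2:
            break
        x, y = x - step * gx, y - step * gy
    return x, y

for start in [(0.5, 1.0), (-0.5, 1.0)]:
    x, y = steepest_descent(*start)
    print(start, "->", (round(x, 3), round(y, 3)), "E =", round(energy(x, y), 3))
```

Starting at x = 0.5 the minimizer settles in the higher-energy well near x ≈ 0.96; starting at x = -0.5 it finds the lower well. This is why minimization alone cannot solve the global search problem.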

Monte Carlo
In molecular simulations, "Monte Carlo" is an importance sampling technique:
1. Make a random move to produce a new conformation
2. Calculate the energy change ΔE for the new conformation
3. Accept or reject the move based on the Metropolis criterion, using the Boltzmann factor P = exp(−ΔE/kT):
◦ If P ≥ 1 (i.e., ΔE ≤ 0), accept the new conformation
◦ Otherwise, accept if P > rand(0,1), else reject
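The three Monte Carlo steps can be sketched with the standard library as follows. Assumptions: the "conformation" is a single coordinate x, the energy is a hypothetical tilted double well, and kT, step size, and step count are arbitrary illustrative values. Unlike pure minimization, the Metropolis rule sometimes accepts uphill moves, so the walker can escape a local minimum.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept downhill moves; accept uphill moves
    with probability equal to the Boltzmann factor exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return math.exp(-delta_e / kT) > rng.random()

def monte_carlo(energy, x0, n_steps=10_000, step=0.5, kT=0.3, seed=0):
    """Minimal Metropolis Monte Carlo on one coordinate (a stand-in for a
    conformation). Returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)         # 1. random move
        e_new = energy(x_new)                        # 2. energy change e_new - e
        if metropolis_accept(e_new - e, kT, rng):    # 3. Metropolis test
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def tilted_double_well(x):
    # Hypothetical energy: local minimum near x ≈ 0.96, global minimum near x ≈ -1.04
    return (x**2 - 1) ** 2 + 0.3 * x

# Start in the higher-energy (local) well and let the sampler cross the barrier.
best_x, best_e = monte_carlo(tilted_double_well, x0=0.96)
print(round(best_x, 2), round(best_e, 3))
```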

Ab initio Prediction – CASP results

Comparative Modeling (Knowledge-based approach)
Provides the folded structure only
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target

Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among the available structures, choose the one with the closest sequence match to the target as the template (steps 1 and 2 can be combined by using PDB-BLAST)
3. Build the model by placing residues in the corresponding positions of the homologous structure, then refine by "tweaking"
Homology modeling works "well"
◦ Computationally? Not very expensive
◦ Accuracy? Higher sequence identity → better model
◦ Requires ~30% sequence identity with a sequence whose structure is known
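Step 2 (template choice) together with the ~30% identity rule of thumb can be sketched as below. Assumptions: the pairwise alignments are already computed (e.g., by a BLAST-family search) and given as gapped strings; percent identity is measured over aligned non-gap columns, which is only one of several conventions in use; the template IDs are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned (non-gap) columns of two pre-aligned sequences."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)

def choose_template(alignments, min_identity=30.0):
    """alignments: {template_id: (aligned_query, aligned_template)}.
    Returns (template_id, identity) for the closest template, or None if no
    template reaches the identity threshold."""
    scored = {tid: percent_identity(q, t) for tid, (q, t) in alignments.items()}
    best = max(scored, key=scored.get, default=None)
    if best is None or scored[best] < min_identity:
        return None
    return best, scored[best]

# Hypothetical alignments of one query against two candidate templates
alignments = {
    "tmplA": ("ACDEFGHIKL", "ACDEFGHIKV"),
    "tmplB": ("ACDEFGHIKL", "AC-QWGHMKT"),
}
print(choose_template(alignments))
```

Returning None below the threshold mirrors the slide's point: without a sufficiently similar template, homology modeling is not the right tool.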

Homology-based Prediction
Raw model → Loop modeling → Side-chain placement → Refinement

Homology-based Prediction

Threading - Fold Recognition
Identify the "best" fit between the target sequence and a template structure
Threading works "sometimes"
◦ Computationally? Can be expensive or cheap, depending on the energy function and on whether threading is "all-atom" or "backbone only"
◦ Accuracy? In theory, should not depend on sequence identity (it should depend on the quality of the template library and on "luck")
◦ Usually, higher sequence identity to a protein of known structure → better model

Threading Algorithm for PSP
◦ Database of 3D structures and sequences: Protein Data Bank (or a non-redundant subset)
◦ Query sequence: sequence with < 25% identity to known structures
◦ Alignment protocol: dynamic programming
◦ Evaluation protocol: distance-based potential or secondary structure
◦ Ranking protocol
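The dynamic-programming alignment protocol can be sketched as a Needleman-Wunsch-style recursion that aligns query residues onto template positions while minimizing energy. Assumptions: each template position is labeled only as buried ("B") or exposed ("E"), the per-residue fit energy is a hypothetical hydrophobic/polar rule, and gaps have a linear penalty. With only per-position terms and gaps, DP finds the optimal alignment exactly; adding pairwise contact terms makes optimal gapped threading NP-hard in general, which is why practical threaders approximate or use other optimizers.

```python
HYDROPHOBIC = set("AVLIMFWC")

def fit_energy(residue, env):
    """Hypothetical singleton energy: hydrophobic residues prefer buried ('B')
    positions, polar residues prefer exposed ('E') ones. Lower is better."""
    return -1.0 if (residue in HYDROPHOBIC) == (env == "B") else 1.0

def thread_dp(query, template_env, gap=2.0):
    """Global alignment of a query onto a template's environment string that
    minimizes total fit energy plus a linear gap penalty.
    Returns (energy, [(query_index, template_index), ...])."""
    n, m = len(query), len(template_env)
    INF = float("inf")
    # D[i][j] = minimal energy aligning query[:i] with template_env[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                D[i][j] = min(D[i][j], D[i - 1][j - 1]
                              + fit_energy(query[i - 1], template_env[j - 1]))
            if i > 0:
                D[i][j] = min(D[i][j], D[i - 1][j] + gap)   # query residue unaligned
            if j > 0:
                D[i][j] = min(D[i][j], D[i][j - 1] + gap)   # template position unaligned
    # Traceback from the bottom-right cell to recover aligned pairs
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + fit_energy(query[i - 1], template_env[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return D[n][m], pairs[::-1]

print(thread_dp("LVKE", "BBEE"))
```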

Threading
Basic premise:
◦ The number of unique structural folds in nature is fairly small (probably 2000-3000)
◦ Statistics from the Protein Data Bank (~40,000 structures): until very recently, 90% of new structures submitted to the PDB had similar structural folds already in the PDB
◦ Thus, the chances for a protein to have a native-like structural fold in the PDB are quite good
◦ Note: proteins with similar structural folds could be either homologs or analogs

Steps in Threading
1. Align the target sequence with template structures (fold library) from the Protein Data Bank (PDB)
2. Calculate an energy score to evaluate the goodness of fit between the target sequence and each template structure
3. Rank the models based on their energy scores
[Figure: target sequence ALKKGF…HFDTSE threaded onto structure templates]
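The align-score-rank loop can be sketched in its simplest form, gapless threading: slide the query along each template, keep the best energy per template, then sort. Assumptions: templates are represented only by hypothetical buried/exposed ("B"/"E") environment strings, the fit energy is a hypothetical hydrophobic/polar rule, and the template IDs are made up; real threaders allow gaps and use richer energy functions.

```python
HYDROPHOBIC = set("AVLIMFWC")

def fit_energy(residue, env):
    # Hypothetical singleton term: -1 for a good environment fit, +1 otherwise
    return -1.0 if (residue in HYDROPHOBIC) == (env == "B") else 1.0

def gapless_thread(query, template_env):
    """Steps 1-2: slide the query along the template without gaps and return
    the lowest total energy over all offsets."""
    n, m = len(query), len(template_env)
    if n > m:
        return float("inf")  # template too short for a gapless fit
    return min(
        sum(fit_energy(r, e) for r, e in zip(query, template_env[k:k + n]))
        for k in range(m - n + 1)
    )

def rank_templates(query, library):
    """Step 3: rank templates by threading energy, best (lowest) first.
    library: {template_id: environment string}."""
    scores = {tid: gapless_thread(query, env) for tid, env in library.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical fold library
library = {"t1": "BBEE", "t2": "EEBB", "t3": "EBBEE"}
ranking = rank_templates("LVKE", library)
print(ranking)
```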

Threading Issues
◦ Structure database: must be complete; there is no decent model if there is no good template in the library!
◦ Sequence-structure alignment algorithm: bad alignment → bad score!
◦ Energy function (scoring scheme):
  - must distinguish correct sequence-fold alignments from incorrect ones
  - must distinguish the "correct" fold from close decoys
◦ Prediction reliability assessment: how do we determine whether the predicted structure is correct (or even close)?
Goal: find the "correct" sequence-structure alignment of a target sequence with its native-like fold in the PDB

Threading: Template database
◦ Build a database of structural templates (e.g., the ASTRAL domain library derived from the PDB)
◦ Supplement it with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

Threading: Energy function
Two main methods (and combinations of these)
◦ Structural profile (environmental): physico-chemical properties of amino acids
◦ Contact potential (statistical): based on contact statistics from the PDB, e.g., Miyazawa & Jernigan (ISU)
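The idea behind a statistical contact potential can be sketched as an inverse-Boltzmann log-odds score: a residue pair seen in contact more often than chance gets a negative (favorable) energy. This is a deliberately simplified form (energies in units of kT, expected frequencies from residue composition among contacts); it is not Miyazawa & Jernigan's full quasi-chemical treatment, which also models solvent contacts. The contact list below is hypothetical toy data, not PDB statistics.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

def contact_potential(contacts):
    """Simplified knowledge-based contact potential, in units of kT:
    e(a, b) = -ln(observed fraction of a-b contacts / fraction expected by chance).
    contacts: list of (aa1, aa2) residue pairs observed in contact."""
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    total_pairs = sum(pair_counts.values())
    # Residue composition among all contact partners
    res_counts = Counter(aa for p in contacts for aa in p)
    total_res = sum(res_counts.values())
    freq = {aa: c / total_res for aa, c in res_counts.items()}
    energies = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        observed = pair_counts.get((a, b), 0) / total_pairs
        # Chance of drawing the unordered pair {a, b}: f_a^2 if a == b, else 2 f_a f_b
        expected = freq[a] * freq[b] * (1 if a == b else 2)
        if observed > 0:  # unobserved pairs would need pseudocounts; skipped here
            energies[(a, b)] = -math.log(observed / expected)
    return energies

# Hypothetical contact observations
contacts = [("L", "V")] * 4 + [("L", "L")] * 2 + [("K", "E")] * 3 + [("K", "L")]
for pair, e in sorted(contact_potential(contacts).items()):
    print(pair, round(e, 3))
```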

Protein Threading: Typical energy function
◦ How well does a specific residue fit its structural environment? (E_s)
◦ What is the "probability" that two specific residues are in contact? (E_p)
◦ Alignment gap penalty (E_g)
Total energy: E_p + E_s + E_g
Goal: find a sequence-structure alignment that minimizes the energy function
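Evaluating E_p + E_s + E_g for one fixed sequence-structure alignment can be sketched as follows. Assumptions (all hypothetical): E_s is a buried/exposed hydrophobic rule, E_p rewards hydrophobic-hydrophobic pairs at template positions that are in spatial contact, and E_g charges a linear penalty for every unaligned query residue or template position.

```python
HYDROPHOBIC = set("AVLIMFWC")

def singleton(residue, env):
    # Hypothetical E_s term: -1 if the residue fits its environment ('B'/'E'), else +1
    return -1.0 if (residue in HYDROPHOBIC) == (env == "B") else 1.0

def pair_energy(a, b):
    # Hypothetical E_p term: hydrophobic-hydrophobic contacts are favorable
    return -0.5 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def threading_energy(query, template_env, template_contacts, alignment, gap=2.0):
    """Total energy E = E_p + E_s + E_g of one sequence-structure alignment.

    alignment: list of (query_index, template_index) pairs.
    template_contacts: pairs of template positions that are in spatial contact."""
    q_at = {t: q for q, t in alignment}  # query residue placed on each template position
    e_s = sum(singleton(query[q], template_env[t]) for q, t in alignment)
    e_p = sum(pair_energy(query[q_at[i]], query[q_at[j]])
              for i, j in template_contacts if i in q_at and j in q_at)
    n_unaligned = (len(query) - len(alignment)) + (len(template_env) - len(alignment))
    e_g = gap * n_unaligned
    return e_p + e_s + e_g

# Hypothetical template: environments plus two contacts between positions
full = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(threading_energy("LVKE", "BBEE", [(0, 1), (1, 3)], full))
```

Because E_p couples residues that may be far apart in sequence, this total cannot in general be minimized position by position, which is what makes threading with contact potentials hard.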

CAFASP Goal
The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, which allows expert (user) intervention, CAFASP asks how well servers do without any intervention of experts, i.e., how well ANY user relying only on automated methods can predict protein structure.

Performance Evaluation in CAFASP3
Server(s) (54 servers in total)   Sum MaxSub score   # correct (30 FR targets)
3ds5, robetta                     5.17-5.25          15-17
pmod, 3ds3, pmode3                4.21-4.36          13-14
RAPTOR                            3.98               13
shgu                              3.93               13
3dsn                              3.64-3.90          12-13
pcons3                            3.75               12
fugu3, orf_c                      3.38-3.67          11-12
…                                 …                  …
pdbblast                          0.00               0
(http://ww.cs.bgu.ac.il/~dfischer/CAFASP3, released in December 2002.)
Servers with names in italics are meta-servers.
The MaxSub score for each target ranges from 0 to 1; therefore, the maximum total score over the 30 targets is 30.

One structure where RAPTOR did best
Red: true structure; Blue: correct part of the prediction; Green: wrong part of the prediction
Target size: 144 residues
Superimposable size within 5 Å: 118 residues
RMSD: 1.9 Å
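Statistics like "superimposable within 5 Å" and RMSD can be computed roughly as sketched below. Assumptions: predicted and true Cα coordinates are already optimally superimposed (the superposition step itself, e.g. Kabsch fitting, is omitted), and RMSD is taken over only the residues within the cutoff. The slide does not state which residues its RMSD covers, so this is one possible convention, not necessarily the one used there.

```python
import math

def superposition_stats(pred, true, cutoff=5.0):
    """pred, true: equal-length lists of (x, y, z) C-alpha coordinates in Å,
    ASSUMED to be already optimally superimposed.
    Returns (number of residues within `cutoff` Å, RMSD over those residues)."""
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    close = [d for d in dists if d <= cutoff]
    if not close:
        return 0, float("nan")
    rmsd = math.sqrt(sum(d * d for d in close) / len(close))
    return len(close), rmsd

# Toy coordinates: deviations of 0, 1, and 10 Å
pred = [(0, 0, 0), (1, 0, 0), (10, 0, 0)]
true = [(0, 0, 0), (1, 1, 0), (0, 0, 0)]
n_close, rmsd = superposition_stats(pred, true)
print(n_close, round(rmsd, 3))
```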

Some more results by other programs

Summary of current state of the art

Automated Web-Based Homology Modeling
◦ SWISS-MODEL: http://www.expasy.org/swissmod/SWISS-MODEL.html
◦ WHAT IF: http://www.cmbi.kun.nl/swift/servers/
◦ The CPHModels Server: http://www.cbs.dtu.dk/services/CPHmodels/
◦ 3D Jigsaw: http://www.bmm.icnet.uk/~3djigsaw/
◦ SDSC1: http://cl.sdsc.edu/hm.html
◦ EsyPred3D: http://www.fundp.ac.be/urbm/bioinfo/esypred/

Comparative Modeling Servers & Programs
◦ COMPOSER: http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html
◦ MODELLER: http://salilab.org/modeler
◦ InsightII: http://www.msi.com/
◦ SYBYL: http://www.tripos.com/
