Published by Sharon Richard. Modified over 9 years ago.
1
Lecture 12, CS566
Structural Bioinformatics
Motivation
Concepts
Structure Prediction
Summary
2
Motivation
Holy Grail: mapping between sequence and structure. Structure = F(Sequence). What is F?
Why?
  – Structure dictates chemistry and thermodynamics, and therefore function
  – Not all structures can be (need be?) determined experimentally
      Cost
      Experimental limitations
3
Concepts - Prediction spectrum
Methods in order of decreasing reliance on known structures:
Homology modeling, threading, ab initio, quantum mechanics
4
Concepts - Common principles
Constraints to reduce the search space
Consideration of many alternate conformations
  – Protein backbone dihedral angles (‘twists along the axis of the protein’)
  – Amino-acid geometry (‘amino acids can have more than one shape’)
Method for local optimization
Scoring function to compare conformations
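The backbone dihedral angles mentioned above are each defined by four consecutive atoms. A minimal sketch of that calculation follows, assuming plain (x, y, z) tuples as input; the function name `dihedral` is illustrative and not from the lecture.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, from four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)   # first bond vector
    b1 = sub(p2, p1)   # central bond (rotation axis)
    b2 = sub(p3, p2)   # last bond vector
    n1 = cross(b0, b1)  # normal of the plane p0-p1-p2
    n2 = cross(b1, b2)  # normal of the plane p1-p2-p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone, phi would use the atoms C(i-1), N(i), CA(i), C(i) and psi the atoms N(i), CA(i), C(i), N(i+1); a cis arrangement of the outer atoms gives 0 degrees and a trans arrangement gives 180 degrees in magnitude.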
5
Evaluation of quality of prediction
RMSD comparison with the experimentally known structure
Comparison with crystal structure quality criteria
  – Ramachandran plot
      Residue-specific dihedral angle distribution
CASP (Critical Assessment of Structure Prediction) and CAFASP (Critical Assessment of Fully Automated Structure Prediction) competitions
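The RMSD comparison can be sketched in a few lines. This version assumes the two coordinate lists hold equivalent atoms in the same order and have already been superimposed; optimal superposition (for example by the Kabsch algorithm) is a separate step not shown here. The function name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))
```

With coordinates in Ångström, the result is in Ångström as well, which is the unit in which prediction accuracy is usually quoted.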
6
Methods
Knowledge-based constraints of search space
  – Homology modeling
  – Threading
  – ab initio (based on knowledge primitives: not true ab initio)
Approaches to refinement
  – Quantum mechanics (ab initio)
      Based on a quantum mechanical model of elementary particles
      Unscalable
  – Molecular mechanics
      Uses parametric force fields (Newton’s laws, Hooke’s law, …)
      Typically used for local or constrained global optimization
      Molecular dynamics or Monte Carlo based
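To make the molecular mechanics idea concrete, here is a toy local optimization of a Hooke's-law bond-stretch term by steepest descent. It is a sketch under strong assumptions: atoms sit on a line, only bond stretching is modeled, and the force constant `k`, equilibrium length `r0`, and step size are made-up values rather than parameters from any real force field.

```python
def bond_energy(xs, k=100.0, r0=1.5):
    """Sum of harmonic bond terms 0.5 * k * (r - r0)^2 between neighbouring atoms."""
    return sum(0.5 * k * ((xs[i + 1] - xs[i]) - r0) ** 2 for i in range(len(xs) - 1))

def bond_gradient(xs, k=100.0, r0=1.5):
    """Analytical derivative of bond_energy with respect to each atom position."""
    grad = [0.0] * len(xs)
    for i in range(len(xs) - 1):
        stretch = (xs[i + 1] - xs[i]) - r0
        grad[i] -= k * stretch
        grad[i + 1] += k * stretch
    return grad

def minimize(xs, k=100.0, r0=1.5, step=0.002, iterations=2000):
    """Steepest descent: repeatedly move each atom against its energy gradient."""
    xs = list(xs)
    for _ in range(iterations):
        grad = bond_gradient(xs, k, r0)
        xs = [x - step * g for x, g in zip(xs, grad)]
    return xs

# Three atoms with one compressed and one stretched bond relax to r0 spacing.
relaxed = minimize([0.0, 1.0, 3.5])
```

Steepest descent only finds the nearest minimum, which is why the slide describes molecular mechanics as a tool for local or constrained optimization rather than a global search.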
7
Homology modeling
Homology
  – Based on sequence-sequence similarity (> ~25%; the higher, the better)
  – Steps
      Pair-wise local sequence similarity search to identify related structures (possible templates)
      Refine the alignment by global pair-wise sequence similarity and multiple sequence alignment (MSA)
      Overlay the sequence backbone (N-Cα-C) on the template
      Model loops based on
        – Statistical knowledge from databases of known structures
        – Molecular mechanics
      Model side chains (approach similar to that of loops)
      Molecular mechanical unconstrained local optimization
      Pray for a good solution!
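A quick way to check a candidate template against the ~25% rule of thumb is to compute percent identity over an existing alignment. The sketch below assumes the threshold is applied to identity over columns where neither sequence has a gap, which is only one of several conventions in use; the function name and the `-` gap symbol are illustrative choices.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def usable_template(aligned_query, aligned_template, threshold=25.0):
    """True if the template clears the (assumed) identity threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, `percent_identity("ACDEF", "ACE-F")` counts three matches over four ungapped columns and returns 75.0.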
8
Threading
Based on sequence-structure similarity
Concept
  – Residues in the core adopt fewer conformations than those on the surface
Approach
  – Thread the sequence through all known structures
  – Score the match with the core of each structure based on
      Environmental scoring matrices and/or
      Amino acid neighborhood matrices (à la dot matrix)
  – Refine the structure using molecular mechanics based on the best template(s)
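The threading idea can be illustrated with a deliberately crude environment score. In this sketch each template is reduced to a string of environment labels (`B` for buried, `E` for exposed), hydrophobic residues are rewarded in buried positions and penalized in exposed ones, and the sequence is slid along each template without gaps. The residue grouping, the +1/-1 scores, and all function names are assumptions for illustration; real environmental scoring matrices are derived from known structures, and real threading permits gaps.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed grouping, for illustration only

def residue_env_score(residue, env):
    """+1 when hydrophobicity matches burial ('B' buried, 'E' exposed), else -1."""
    buried = env == "B"
    return 1.0 if buried == (residue in HYDROPHOBIC) else -1.0

def thread_score(sequence, template_envs):
    """Best ungapped placement of the sequence on one template: (score, offset)."""
    n, m = len(sequence), len(template_envs)
    if n > m:
        return None
    best = None
    for offset in range(m - n + 1):
        score = sum(residue_env_score(r, template_envs[offset + i])
                    for i, r in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def best_template(sequence, templates):
    """Thread the sequence through every template; return (name, score, offset)."""
    results = []
    for name, envs in templates.items():
        placed = thread_score(sequence, envs)
        if placed is not None:
            results.append((placed[0], name, placed[1]))
    if not results:
        return None
    score, name, offset = max(results, key=lambda r: r[0])
    return name, score, offset
```

With templates `{"t1": "EBEBE", "t2": "EEEEE"}`, the sequence `"LKVE"` fits `t1` best at offset 1, where both hydrophobic residues land on buried positions.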
9
Rosetta (“ab initio”) approach
Pioneered by David Baker’s group in the late 1990s
Remarkable success in CASP and CAFASP experiments
Recently made publicly available on an automated server by Christopher Bystroff’s group
Potpourri of many different approaches
Key components
  – ‘Divide and conquer’ strategy with respect to the length of the sequence to be modeled
  – Use of a knowledge-based energy function
10
‘Divide and conquer’
Mimics the natural process of protein folding
Compromise between the extremes of
  – Looking for homologous sequences with known structure
  – Modeling a priori (one amino acid at a time)
Uses a library of 3D structures of fragments of length 3 and 9 derived from the crystal structure database (a priori estimates of the number of possible sequences: 20^3 = 8K and 20^9 ≈ 5 × 10^11, i.e. ~10^12)
Breaks up the query sequence into a set of 3mers and 9mers to find matches with the above library, using a sequence profile approach
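The a priori estimates and the windowing step are easy to reproduce. The sketch below only enumerates overlapping 3mers and 9mers of a query; matching them against a fragment library with sequence profiles is not shown. The query string is an arbitrary example, not a sequence from the lecture.

```python
AMINO_ACIDS = 20

# Number of distinct sequences of each fragment length.
POSSIBLE_3MERS = AMINO_ACIDS ** 3   # 8,000
POSSIBLE_9MERS = AMINO_ACIDS ** 9   # 512,000,000,000, on the order of 10^12

def kmers(sequence, k):
    """All overlapping windows of length k, as (start_index, subsequence) pairs."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]

query = "MKTAYIAKQR"          # arbitrary 10-residue example
three_mers = kmers(query, 3)  # 8 windows
nine_mers = kmers(query, 9)   # 2 windows
```

A sequence of length L yields L - k + 1 windows of length k, so the number of fragment positions grows only linearly with sequence length even though the space of possible fragments is huge.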
11
‘Divide and conquer’
Once matches are found, the problem reduces to the combinatorial one of selecting the set of fragments that gives the most energetically favorable structure
In practice, a Monte Carlo based search of possible combinations is carried out
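A Monte Carlo search over fragment combinations can be sketched as follows. This is a toy, not Rosetta: a conformation is a list of one torsion angle per residue, each fragment is a short list of angles that overwrites a window, moves are accepted with the Metropolis criterion, and the energy is a made-up distance to a hypothetical target. `TARGET`, `FRAGMENTS`, `toy_energy`, and the temperature are all assumptions for illustration.

```python
import math
import random

def fragment_monte_carlo(n_residues, fragments, energy, steps=2000,
                         temperature=1.0, seed=0):
    """Metropolis search: repeatedly substitute a random fragment at a random window."""
    rng = random.Random(seed)
    conf = [0.0] * n_residues
    current_e = energy(conf)
    best, best_e = list(conf), current_e
    starts = sorted(fragments)
    for _ in range(steps):
        start = rng.choice(starts)
        frag = rng.choice(fragments[start])
        trial = list(conf)
        trial[start:start + len(frag)] = frag
        delta = energy(trial) - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            conf, current_e = trial, current_e + delta
            if current_e < best_e:
                best, best_e = list(conf), current_e
    return best, best_e

# Toy problem: six residues, 3-residue fragments, made-up target torsions.
TARGET = [-60.0, -60.0, -60.0, -120.0, -120.0, -120.0]
FRAGMENTS = {s: [[-60.0] * 3, [-120.0] * 3, [180.0] * 3] for s in range(4)}

def toy_energy(conf):
    """Hypothetical energy: scaled absolute deviation from TARGET (no angle wraparound)."""
    return sum(abs(a - t) for a, t in zip(conf, TARGET)) / 60.0

best_conf, best_energy = fragment_monte_carlo(6, FRAGMENTS, toy_energy)
```

Accepting some uphill moves is what lets the search escape local minima; lowering `temperature` makes the walk greedier.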
12
Knowledge-based energy function
Fundamentally, ∆G = ∆H - T∆S
The free energy change is the enthalpy change less an entropic term that is proportional to temperature
Entropy is proportional to the natural log of the number of conformations/possible states: S = k ln W (k is Boltzmann’s constant)
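Combining the two relations gives the entropy change between two sets of states. The labels W_unfolded and W_folded below are introduced here for illustration, and only the conformational entropy of the chain is considered (solvent contributions are ignored).

```latex
\Delta G = \Delta H - T\,\Delta S, \qquad S = k \ln W
\quad\Longrightarrow\quad
\Delta S = S_{\text{folded}} - S_{\text{unfolded}}
         = k \ln \frac{W_{\text{folded}}}{W_{\text{unfolded}}}
```

Because the folded state has far fewer conformations than the unfolded one, this ∆S is negative, so the term -T∆S is positive and opposes folding in this simplified picture.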
13
Knowledge-based energy function
Hence it makes sense to use the existing distribution of structures to derive an energy function
The energy function takes the statistical distribution of 3D shapes in the database of known structures as the underlying probability distribution
For a given structure, deviations from this probability distribution are subject to proportional energetic penalties
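One common way to turn observed frequencies into energies is the inverse Boltzmann relation E = -kT ln p, so that rarely observed states receive higher penalties. The sketch below applies it to discrete state labels with pseudocounts; the state names, pseudocount, and `kT = 1` are assumptions, and the lecture does not specify that Rosetta's function has exactly this form.

```python
import math
from collections import Counter

def statistical_energies(observations, states, pseudocount=1.0, kT=1.0):
    """Map each state to -kT * ln(p), with p estimated from observed counts."""
    counts = Counter(observations)
    total = len(observations) + pseudocount * len(states)
    return {s: -kT * math.log((counts[s] + pseudocount) / total) for s in states}

def score_structure(structure_states, energies):
    """Total energy of a structure described as a sequence of state labels."""
    return sum(energies[s] for s in structure_states)

# Toy 'database': per-residue states seen in known structures (made-up counts).
observed = ["helix"] * 6 + ["strand"] * 3 + ["coil"] * 1
energies = statistical_energies(observed, ["helix", "strand", "coil"])
```

The pseudocount keeps unobserved states from receiving an infinite energy, at the cost of slightly flattening the distribution.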
14
Rosetta - Steps used in CASP4
1. If possible, use PSI-BLAST to find similar sequences
   A. If found, use the multiple sequence alignment to break down the sequence into domains to be modeled independently
   B. For domains with similarity to known structures, use the homology-based approach
   C. For the remaining domains, carry out Rosetta
15
Rosetta - Steps
2. For domains with similarity to other sequences, apply the following steps to the homologs as well (consensus modeling)
3. Generate a fragment library for each query
   A. Collect 3mer and 9mer sub-structures from the PDB with similarity to 3mer and 9mer subsequences
4. Use a Monte Carlo approach for backbone fragment substitution into the query
   A. Pick a fragment at random from the library (~40,000 fragment substitutions for each structure)
   B. Repeat A several times
   C. Between 10K and 100K conformations (‘decoys’) are generated for each target
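Step 3 can be sketched as a ranking problem: for each window of the query, keep the database windows whose sequence looks most similar, together with their backbone angles. The version below uses a plain identity count as a stand-in for the profile-based similarity the slides describe, and the tiny `database` of (sequence, angles) pairs is made up rather than taken from the PDB.

```python
def pick_fragments(query, database, k=3, top_n=2):
    """For each query window of length k, keep the top_n most similar database windows.

    database: list of (sequence, angles) pairs with one angle per residue.
    Returns {window_start: [angle_list, ...]}.
    """
    library = {}
    for start in range(len(query) - k + 1):
        window = query[start:start + k]
        candidates = []
        for db_seq, db_angles in database:
            for j in range(len(db_seq) - k + 1):
                matches = sum(1 for a, b in zip(window, db_seq[j:j + k]) if a == b)
                candidates.append((matches, db_angles[j:j + k]))
        # Stable sort keeps database order among equally similar windows.
        candidates.sort(key=lambda c: c[0], reverse=True)
        library[start] = [angles for _, angles in candidates[:top_n]]
    return library

# Made-up database entries: sequence plus one torsion angle per residue.
database = [("AKLV", [-60, -60, -120, -120]),
            ("GKLG", [90, -70, -110, 80])]
library = pick_fragments("KLV", database, k=3, top_n=1)
```

The resulting dictionary has the same shape a fragment-substitution search needs: a list of candidate angle sets for every window start.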
16
Rosetta - Steps
5. Filter the set of conformations to remove unlikely structures
   A. Remove structures with minimal long-range interactions (low contact order)
   B. Remove structures with unrealistic strands
6. Add side chains as statistically predicted by the backbone conformation
7. Cluster the set of conformations (including, when available, the generated structures of homologues)
8. Representative structures from the top 5 most-populous clusters are the candidate structures
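The contact-order filter (step 5A) and the clustering step (step 7) can both be sketched compactly. Relative contact order is the average sequence separation of contacting residues divided by chain length; here contacts are approximated from Cα coordinates with an assumed 8 Å cutoff, whereas published definitions use all-atom contacts. The clustering is a simple greedy scheme with a caller-supplied distance function (for example an RMSD), and is not claimed to be the procedure Rosetta uses.

```python
import math

def relative_contact_order(ca_coords, cutoff=8.0, min_separation=3):
    """Mean sequence separation of contacting residue pairs, divided by chain length."""
    n = len(ca_coords)
    total_separation, n_contacts = 0, 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n * n_contacts)

def greedy_clusters(decoys, distance, radius):
    """Repeatedly extract the decoy with the most neighbours within radius.

    Returns lists of decoy indices, largest cluster first.
    """
    remaining = list(range(len(decoys)))
    clusters = []
    while remaining:
        members = max(
            ([j for j in remaining if distance(decoys[i], decoys[j]) <= radius]
             for i in remaining),
            key=len,
        )
        clusters.append(members)
        remaining = [i for i in remaining if i not in members]
    return clusters
```

Taking `greedy_clusters(...)[:5]` gives the five most-populous clusters, whose centres would then serve as candidate structures.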
17
Summary
Methods like Rosetta represent a breakthrough in the ab initio prediction of protein 3D structure and are very useful in cases where homology cannot be observed
For CASP4, at least one subsequence longer than 50 residues could be predicted ‘correctly’ (< 6.5 Å RMSD) in 17 of 21 cases
A combination of various approaches works best
18
Summary
However, both completeness and accuracy of prediction leave ample room for improvement
  – RMS error is frequently too high to be useful
  – Even in homology modeling, the template itself is often a better match to the true structure than the model!
  – Often, only subsequences are accurately modeled, and not the whole structure
  – The Nobel Prize is still up for grabs!