4. Modeling of side chains



Side chain modeling is part of structure prediction
Protein structure prediction:
– given: sequence of protein
– predict: structure of protein
Challenges:
– conformation space. Goal: describe the continuous, immense space of conformations in an efficient and representative way
– realistic energy function. Goal: energy minimum at or near the experimentally derived (native) structure
– efficient and reliable search algorithm. Goal: locate the minimum (global minimum energy conformation, GMEC)
Prediction of side chain conformations:
– subtask of protein structure prediction

The importance of side chain modeling
Side chain prediction is a subtask of protein structure prediction:
– given: correct backbone conformation
– predict: side chain conformations (i.e. whole protein)
– successful prediction of protein structure depends on successful prediction of the side chain conformations
– complete details are not solved by experiment
– allows evaluation of a protocol at the detailed, full-atom level
– allows flexibility in docking

Today’s menu: prediction of side chain conformations
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate GMEC or MECs: Rosetta & other approaches (DEE - dead end elimination; SCWRL; BP - belief propagation; LP - linear/integer programming)

Side chains are described as rotamers
Dihedral angles χ1 to χ4 define the side chain (assuming equilibrium bond lengths and angles)
[Figure from Wikipedia]

Side chains assume discrete conformations
Serine χ1 preferences: t = 180°, g− = −60°, g+ = +60°
Staggered conformations minimize collisions with neighboring atoms
(Lovell)
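The three staggered χ1 states above can be assigned by a nearest-center rule. The following is only a minimal sketch: the function name `classify_chi` is hypothetical, and it assumes the naming convention of this slide (g+ = +60°, g− = −60°, t = 180°), which is not the only convention in use.

```python
def classify_chi(angle_deg):
    """Assign a dihedral angle (degrees) to the nearest staggered state.

    Assumes the slide's convention: g+ = +60, g- = -60, t = 180.
    """
    centers = {"g+": 60.0, "t": 180.0, "g-": -60.0}

    def circular_distance(a, b):
        # Smallest absolute difference between two angles on a circle.
        d = (a - b) % 360.0
        return min(d, 360.0 - d)

    return min(centers, key=lambda name: circular_distance(angle_deg, centers[name]))
```

Note that −170° is assigned to t because the distance is measured on the circle, not on the number line.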

Rotamer libraries contain preferred conformations
Rotamer: discrete side chain conformation defined by χ1 to χ4
– Dunbrack, 2002
– Shapovalov and Dunbrack*, 2011 (BBDEP)
* Shapovalov & Dunbrack, Structure

Representative rotamer libraries are surprisingly small
Ponder & Richards, 1987:
– analysis of ~20 proteins (~2000 side chains)
– 67 rotamers can adequately represent side chain conformations (for 17 of the 20 amino acids)

Backbone dependent rotamer libraries
Dunbrack & Karplus, 1993:
– for each (φ, ψ) bin (20° × 20°), derive statistics on χ1 to χ4 values
– reflects the dependence of side chain conformation on backbone conformation
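The binning idea behind a backbone dependent library can be sketched as a simple counting exercise. This is an illustration under stated assumptions, not the published procedure: the helper names `bin_index` and `rotamer_frequencies` are hypothetical, and the input is assumed to be a list of (φ, ψ, rotamer label) observations with angles in degrees.

```python
from collections import Counter, defaultdict


def bin_index(angle_deg, width=20.0):
    # Map an angle in [-180, 180) to a bin number; +180 wraps to bin 0.
    return int(((angle_deg + 180.0) % 360.0) // width)


def rotamer_frequencies(observations, width=20.0):
    """Relative rotamer frequencies per (phi, psi) bin.

    observations: iterable of (phi, psi, rotamer_label) tuples.
    Returns {(phi_bin, psi_bin): {rotamer_label: frequency}}.
    """
    counts = defaultdict(Counter)
    for phi, psi, rotamer in observations:
        counts[(bin_index(phi, width), bin_index(psi, width))][rotamer] += 1
    return {
        bb_bin: {r: n / sum(c.values()) for r, n in c.items()}
        for bb_bin, c in counts.items()
    }
```

Raw counts like these become unreliable in sparsely populated bins, which is the motivation for the statistical treatments on the later slides.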

Rotamer preferences depend on backbone conformation: example Valine
The observed frequencies of gauche+, gauche− and trans differ strongly between backbone conformations: sheet, helix, and coil regions (n = 850 proteins, resolution < 1.7 Å, pairwise sequence identity < 50%)

Bayesian statistical analysis of rotamer library
Dunbrack 1997:
– use Bayesian statistics to estimate populations for all rotamers, of all side chain types, for each (φ, ψ) bin (10° × 10°): P(χ1, χ2, χ3, χ4 | φ, ψ)
– using the Bayesian formalism, combine a prior distribution based on P(φ)·P(ψ) with fully (φ, ψ)-dependent data
– ... to describe both well-sampled and sparsely sampled regions

Rotamer energy (E_dun): a knowledge-based score
1. Calculate p_obs: frequencies of rotamers (or any other feature)
2. Convert into an effective potential energy using the Boltzmann equation: $\Delta G = -RT \ln(p_{obs}/p_{exp})$
(Boas & Harbury, 2007)
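The conversion from frequency to effective energy is a one-line calculation. The sketch below is illustrative only: the function name, the choice of kcal/mol units and the default temperature of 298.15 K are assumptions, not values taken from the slides.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def knowledge_based_energy(p_obs, p_exp, temperature=298.15):
    """Effective energy dG = -RT ln(p_obs / p_exp) in kcal/mol."""
    return -R_KCAL * temperature * math.log(p_obs / p_exp)
```

A rotamer observed more often than expected gets a negative (favorable) score, one observed less often than expected gets a positive score, and the score is zero when p_obs equals p_exp.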

Structure determination revisited
Refit electron density maps:
– 15% of non-rotameric side chains can be refitted to 1 (or 2) rotameric conformations
(Shapovalov & Dunbrack, 2007)

Structure determination revisited
Refit electron density maps:
– rotameric side chains have lower entropy (dispersion of electron density around χ) than side chains with multiple conformations in the PDB, or non-rotameric side chains
[Figure: χ1 entropy by residue type]
(Shapovalov & Dunbrack, 2007)

2011: Improved Dunbrack library
Many good reasons:
1. More structural data
2. Improved set: electron density calculations remove highly dynamic side chains
3. Derive accurate and smooth density estimates of rotamer populations (incl. rare rotamers) as a continuous function of backbone dihedral angles
4. Derive smooth estimates of the mean values and variances of rotameric side-chain dihedral angles
5. Improve treatment of non-rotameric degrees of freedom
(Shapovalov & Dunbrack, 2011)

The 2011 Dunbrack library
Calculate the rotamer preference for a given (φ, ψ) bin. Adaptive kernel density estimation allows:
– a smoother density function (prevents steep derivatives in Rosetta minimizations!)
– more detailed binning
1. For each rotamer r of amino acid aa: determine a probability density estimate ρ(φ, ψ | r) (= Ramachandran distribution for each rotamer)
2. Use Bayes’ rule to invert this density to produce an estimate of the rotamer probability given the backbone; P(r) is the backbone independent probability of rotamer r
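The Bayes inversion in step 2 amounts to P(r | φ, ψ) = ρ(φ, ψ | r) P(r) / Σ_r' ρ(φ, ψ | r') P(r'). A minimal sketch follows; it assumes the densities ρ(φ, ψ | r) have already been evaluated at one backbone point (the kernel density estimation itself is not shown), and the function name and input dictionaries are hypothetical.

```python
def rotamer_posterior(density_at_backbone, prior):
    """Invert per-rotamer backbone densities into rotamer probabilities.

    density_at_backbone: {rotamer: rho(phi, psi | rotamer)} at one (phi, psi).
    prior: {rotamer: backbone independent probability P(rotamer)}.
    Returns {rotamer: P(rotamer | phi, psi)}.
    """
    unnormalized = {r: density_at_backbone[r] * prior[r] for r in prior}
    total = sum(unnormalized.values())
    return {r: value / total for r, value in unnormalized.items()}
```

Because the result is normalized over rotamers, smooth density estimates translate directly into smooth rotamer probabilities.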

Smoother density function
P(r = g+ | φ, ψ, aa = Ser)
[Figure panels: histogram; original probability density; using adaptive density kernels (integrate over a neighborhood of adaptive size)]

Better description of non-rotameric side chains
Not all side chain atoms show a rotameric distribution: Met χ1 (sp3) vs. Gln χ3 (sp2)
Example: Gln χ3 angles for (χ1 = g+; χ2 = t)
[Figure: original library vs. new library; panels for alpha helix, beta sheet, loops (polyP II)]

Better description of non-rotameric side chains
Example: Asn χ2 angles for (χ1 = g+)

... leads to a slight improvement in modeling

Some conclusions about rotamer libraries
Rotamer frequency:
– rare conformations reflect increased internal strain, so it is important to take frequency into account
– frequency can be used as an energy term: $E_i = -K \ln P_i$
Increasing availability of high-resolution structures:
– narrows the distribution around each rotamer in the library
– indicates that errors are responsible for outliers
Refitting of electron density maps:
– non-rotameric conformations are often incorrectly modeled and high in entropy

Some conclusions about rotamer libraries
Rotamericity < 100%: include more side chain conformations!
– position-dependent rotamers (example: unbound conformations in docking predictions)
– additional conformations around each rotamer (± sd)
– non-rotameric side chain angles: describe as a continuous density function

Today’s menu: prediction of side chain conformations
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate GMEC or MECs: Rosetta & other approaches (DEE - dead end elimination; SCWRL; BP - belief propagation; LP - linear/integer programming)

Backrub motions: “How protein backbone shrugs when side chain dances”
– most common local backbone move in ultra-high resolution structures (< 1.0 Å)
– changes side chain orientation without effect on the backbone
– 3 rotations around Cα-Cα axes: a change of the 1,3 rotation angle, with compensatory changes of the 1,2 and 2,3 rotation angles
– observed in 3% of all residues (one quarter of them serine)
– example: two distinct Ile rotamers (tt, mm) related by backrub moves
(Davis, 2006)

Today’s menu: prediction of side chain conformations
1. rotamer libraries
2. dependence on backbone accuracy
3. approaches that locate GMEC or MECs: Rosetta & other approaches (DEE - dead end elimination; SCWRL; BP - belief propagation; LP - linear/integer programming)

Prediction of side chain conformations using rotamers
Given:
– protein backbone
– for each residue: set of possible conformations (rotamers from library)
Wanted: the combination of rotamers that results in the lowest total energy
$E_{GMEC} = \min \left( \sum_i E_{i_r} + \sum_i \sum_{j>i} E_{i_r j_s} \right)$
Location of the GMEC is NP-hard (Fraenkel, 1997; Pierce, 2002)
[Figure: self energy and pair energy for positions i, i+1, i+2]

Side chain modeling = find best combination of rotamers
How?
1. Systematic scan: for a protein with
– 50 residues, and
– 9 rotamers/residue
the number of combinations to scan is $N = 9^{50} \approx 5 \times 10^{47}$ !
→ feasible only for small proteins
→ search space needs to be reduced
$E_{tot} = \sum_i E_i + \sum_{i<j} E_{ij}$
[Figure: table of pair energies e(ia,ja), e(ib,ja), e(ia,jb), e(ib,jb) between rotamers a, b, c at positions i and j]
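A systematic scan is easy to write down for a toy problem, which also makes the combinatorial explosion concrete. The sketch below is not from the slides: the three-position energy tables `SELF_E` and `PAIR_E` are invented for illustration, and pair energies not listed are assumed to be zero.

```python
from itertools import product

# Hypothetical toy problem: 3 positions with 2 rotamers each.
SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]          # SELF_E[i][r]
PAIR_E = {(0, 0, 1, 1): 2.0, (1, 1, 2, 0): 1.5}        # key (i, r, j, s), i < j


def total_energy(assignment, self_e, pair_e):
    """E_tot = sum_i E_i + sum_{i<j} E_ij for one rotamer assignment."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    n = len(assignment)
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e.get((i, assignment[i], j, assignment[j]), 0.0)
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every combination; only feasible for very small problems."""
    choices = [range(len(e)) for e in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))


n_combinations = 9 ** 50  # 50 residues, 9 rotamers each: about 5e47
```

The toy problem has 2^3 = 8 combinations, while `n_combinations` is a 48-digit number, which is why the remaining slides focus on reducing the search space.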

Search strategies for locating GMEC or MECs
Deterministic approaches (e.g. DEE):
– guarantee location of the GMEC
– can be slow
– advantageous when the GMEC is (the only) near-native conformation
Heuristic approaches (e.g. MC):
– locate a population of low-energy models (not necessarily the GMEC)
– faster, often converge

Guaranteed finding of GMEC
DEE (dead-end elimination):
– prune impossible rotamers, determine the GMEC from the reduced rotamer set
Residue-interacting graphs (SCWRL):
– use dynamic programming on the graph to find the GMEC
– start with “leaves”: residues with low connectivity in the graph
Linear programming (Kingsford):
– solve a set of linear constraints
– can locate the GMEC for sparsely connected graphs
– dependent on energy function

Dead End Elimination (DEE)
Approach: remove rotamers that cannot be part of the GMEC
Rotamer r at position i can be eliminated if there exists a rotamer t such that:
$E_{i_r} + \sum_{j \ne i} \min_s E_{i_r j_s} > E_{i_t} + \sum_{j \ne i} \max_s E_{i_t j_s}$
– iterative application of DEE removes many rotamers; at certain positions only one rotamer is left
– note that some rotamers can be removed from the beginning because they clash with the backbone (self energy too high)
[Figure: energy E of rotamers r and t over combinations of rotamers at positions j ≠ i]
(Desmet & Lasters, 1992)

Refined DEE
Approach: remove rotamers that cannot be part of the GMEC, second criterion:
Rotamer r at position i can be eliminated if there exists a rotamer t such that:
$E_{i_r} - E_{i_t} + \sum_{j \ne i} \min_s \left( E_{i_r j_s} - E_{i_t j_s} \right) > 0$
– this criterion allows removal of additional rotamers
[Figure: energy E of rotamers r and t over combinations of rotamers at positions j ≠ i]
(Goldstein, 1994)
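The refined (Goldstein) criterion removes rotamer r at position i when some rotamer t satisfies E(i_r) − E(i_t) + Σ_{j≠i} min_s [E(i_r, j_s) − E(i_t, j_s)] > 0. A minimal sketch of iterating that test is given below. It is not an optimized implementation, and the function name, the dictionary layout of the pair energies and the toy energies are all assumptions made for illustration (unlisted pair energies count as zero).

```python
# Hypothetical toy energies: 3 positions with 2 rotamers each.
DEE_SELF_E = [[0.0, 3.0], [0.0, 0.5], [0.0, 0.2]]      # DEE_SELF_E[i][r]
DEE_PAIR_E = {(0, 0, 1, 1): 1.0, (1, 0, 2, 1): 0.4}    # key (i, r, j, s), i < j


def goldstein_dee(self_e, pair_e):
    """Iteratively prune rotamers with the Goldstein criterion.

    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        # Pair energies are stored once, with the lower position first.
        if i > j:
            i, r, j, s = j, s, i, r
        return pair_e.get((i, r, j, s), 0.0)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(i, r, j, s) - pair(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # r is always worse than t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive
```

On this toy input every position is reduced to a single rotamer, so the surviving assignment is the GMEC; on realistic inputs several rotamers typically survive at some positions and a further search over the reduced set is still needed.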

More sophisticated DEE criteria...
Approach: remove rotamers that cannot be part of the GMEC, additional criterion:
Rotamer r at position i can be eliminated if there exist rotamers t1 and t2 such that:
[Equation: elimination criterion involving t1 and t2]
– takes more time to compute
– at the end, we are left with 1 combination, or with only a few combinations that need to be evaluated using other criteria
[Figure: energy E of rotamers r, t1 and t2 over combinations of rotamers at positions j ≠ i]

DEE-based approaches
– DEE guarantees to find the GMEC...
– ... but may miss conformations that have only slightly worse energy
– given that the energy function is not perfect, we also want to find additional conformations with comparable energy
– approach used in Orbit: use MC to find additional low-energy combinations that resemble the GMEC

Design of a sequence that adopts a zinc finger fold without zinc
Local sampling starting from the GMEC reveals the conservation pattern of designs
[Figure: alignment with zif268 second finger; conservation across 1000 simulations; ranking of predicted sequences]
(Dahiyat & Mayo, 1997)

SCWRL - residue-interacting graphs
– after DEE, the residues that remain with > 1 rotamer are the “active residues”
– build an undirected graph of active residues: side chains = vertices; residues with interacting rotamer pairs are connected by an edge
– identify articulation points (break the cluster apart) and bi-connected components (cannot be broken into different parts by removing one node)
– very simple energy function: only Dunbrack energy and repulsion
(Canutescu, 2003)

SCWRL - residue-interacting graphs
– solve a cluster using bi-connected components
– for each component, calculate the best energy given a specific rotamer in the bi-connected residue
– pruning is easy since the energy function is only positive
– [Backtracking: when a certain threshold is used, a specific rotamer (combination) can be deleted]
(Canutescu, 2003)

Heuristic approaches
– define cutoff values to prune branches that probably do not contain low-energy conformations
– mean-field approach, belief propagation
– self-consistent algorithms
– Monte-Carlo sampling

Side chain modeling in Rosetta: part of a cycle
[Flowchart: START, random perturbation, side chain optimization, rigid body minimization, MC, FINISH]
[Figure: energy vs. rigid body orientations; rigid body optimization, backbone optimization]

Side chain modeling protocols in Rosetta
– Monte-Carlo procedure: heuristic, does not converge; several runs are needed to locate the solution
– uses the backbone-dependent rotamer library (Dunbrack)
Approaches:
– “Repacking”: model side chain conformation from scratch
– “Rotamer Trial”: refine side chain conformations
– “Rotamer Trial with minimization” (RTmin): off-rotamer sampling by minimization

Monte Carlo sampling
Pre-calculate the $E_{i_r}$ and $E_{i_r j_t}$ matrices:
– self energy: energy between rotamer r at position i and the constant part
– pairwise energy: between rotamer r at position i and rotamer t at position j (sparse matrix)
$E_{total} = \sum_i E_{i_r} + \sum_i \sum_{j>i} E_{i_r j_t}$
Simulated annealing:
– make a random change
– start with a high acceptance rate, gradually lower the temperature
– acceptance based on the Boltzmann distribution
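The annealing loop over precomputed self and pair energies can be sketched in a few lines. This is a generic Metropolis simulated annealing sketch, not the Rosetta packer: the toy energy tables, the geometric cooling schedule, the step count and all function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical toy energies: 3 positions with 2 rotamers each.
MC_SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]       # MC_SELF_E[i][r]
MC_PAIR_E = {(0, 0, 1, 1): 2.0, (1, 1, 2, 0): 1.5}     # key (i, r, j, t), i < j


def pair_energy(pair_e, i, r, j, t):
    # Sparse lookup; pairs that are not stored do not interact.
    if i > j:
        i, r, j, t = j, t, i, r
    return pair_e.get((i, r, j, t), 0.0)


def energy_of(state, self_e, pair_e):
    e = sum(self_e[i][r] for i, r in enumerate(state))
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            e += pair_energy(pair_e, i, state[i], j, state[j])
    return e


def anneal(self_e, pair_e, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    rng = random.Random(seed)
    state = [rng.randrange(len(e)) for e in self_e]
    energy = energy_of(state, self_e, pair_e)
    best, best_energy = list(state), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(len(state))
        new_r = rng.randrange(len(self_e[i]))
        # Only terms involving position i change, so evaluate just those.
        delta = self_e[i][new_r] - self_e[i][state[i]]
        for j, t in enumerate(state):
            if j != i:
                delta += (pair_energy(pair_e, i, new_r, j, t)
                          - pair_energy(pair_e, i, state[i], j, t))
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state[i] = new_r
            energy += delta
            if energy < best_energy:
                best, best_energy = list(state), energy
    return best, best_energy
```

Precomputing the matrices pays off because each trial move needs only the energy terms of the one position being changed, not a full re-evaluation.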

“Repacking”: full combinatorial side chain optimization
– remove all side chains
– gradually add side chains: select from the backbone-dependent rotamer library
– add position-specific rotamers (e.g. from the unbound conformation): set their energy to the minimum rotamer energy, to ensure acceptance
– use simulated annealing to create increasingly well packed side chains
– repeat to sample a range of low-energy conformations

“Rotamer trial”: side chain adjustment
Find better rotamers for an existing structure:
– pick a residue at random
– search for a rotamer with lower energy
– replace the rotamer
Repeated until all high-energy positions are improved
Fast
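The rotamer trial steps above describe a greedy local refinement, which can be sketched as follows. This is an illustration under stated assumptions rather than the Rosetta implementation: the toy energies, the random visiting order and the function names are hypothetical.

```python
import random

# Hypothetical toy energies: 3 positions with 2 rotamers each.
RT_SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]       # RT_SELF_E[i][r]
RT_PAIR_E = {(0, 0, 1, 1): 2.0, (1, 1, 2, 0): 1.5}     # key (i, r, j, s), i < j


def local_energy(state, i, r, self_e, pair_e):
    """Energy of rotamer r at position i in the context of the other positions."""
    e = self_e[i][r]
    for j, s in enumerate(state):
        if j != i:
            key = (i, r, j, s) if i < j else (j, s, i, r)
            e += pair_e.get(key, 0.0)
    return e


def rotamer_trials(start, self_e, pair_e, seed=0):
    rng = random.Random(seed)
    state = list(start)
    improved = True
    while improved:
        improved = False
        order = list(range(len(state)))
        rng.shuffle(order)  # visit residues in random order
        for i in order:
            energies = [local_energy(state, i, r, self_e, pair_e)
                        for r in range(len(self_e[i]))]
            best_r = min(range(len(energies)), key=energies.__getitem__)
            if energies[best_r] < energies[state[i]]:
                state[i] = best_r  # replace only if strictly better
                improved = True
    return state
```

Because only downhill replacements are accepted, the procedure is fast but stops at a local minimum; on this toy input both (0, 0, 0) and (1, 1, 1) are local minima, and only the first is the GMEC.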

Side chain modeling: Summary
– side chain modeling based on rotamer libraries → combinatorial problem
– approaches for side chain modeling involve smart reduction of combinatorial complexity (heuristic or exact)
– side chain modeling as a “toy model” for structural modeling
– side chain modeling can be extended to design by adding rotamer options of different amino acids