In silico Protein Design: Implementing Dead-End Elimination algorithm

In silico Protein Design: Implementing Dead-End Elimination algorithm
CS273
Tyrone Anderson, Yu Bai & Caroline E. Moore-Kochlacs
May 31st 2005

Computational protein design
(Diagram labels: native structure, backbone scaffold, new sequence, iterative refinement.)
Given the backbone coordinates, find the best sequence(s) with which the protein is stable in that structure.

Components of the problem
The protein design problem can be roughly divided into a search procedure and a scoring function. The search procedure samples both sequence space and side-chain conformational space to generate conformations. The scoring function evaluates each conformation generated by the search procedure. The scores are used to rank the conformations (and therefore the sequences) and to pick the best one as the final model.

Why is the search procedure difficult?
Consider a short protein with 20 amino acids. The number of possible sequences is $S = 20^{20} \approx 10^{26}$. Each side chain has on average 2 dihedral angles (χ angles). If we sample every 40° in dihedral-angle space, the number of side-chain conformations per sequence is $N = (360/40)^{2 \times 20} = 9^{40} \approx 10^{38}$. The product $S \times N \approx 10^{64}$ is far too large to be sampled naively, so we need algorithms that find good solutions while screening only part of the search space.
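The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that only encodes the slide's own assumptions (20 residues, 20 amino acids per position, about 2 χ angles per side chain, one sample every 40°); the constant and function names are illustrative.

```python
import math

# Assumptions taken from the slide.
N_RESIDUES = 20
N_AMINO_ACIDS = 20
CHI_PER_SIDE_CHAIN = 2
CHI_STEP_DEG = 40

def sequence_space(n_residues: int, n_aa: int) -> int:
    """Number of possible sequences, S = n_aa ** n_residues."""
    return n_aa ** n_residues

def conformation_space(n_residues: int, chi_per_sc: int, step_deg: int) -> int:
    """Side-chain conformations per sequence, N = (360/step) ** (chi_per_sc * n_residues)."""
    bins_per_chi = 360 // step_deg
    return bins_per_chi ** (chi_per_sc * n_residues)

S = sequence_space(N_RESIDUES, N_AMINO_ACIDS)
N = conformation_space(N_RESIDUES, CHI_PER_SIDE_CHAIN, CHI_STEP_DEG)
print(f"S ~ 10^{math.log10(S):.1f}")        # S ~ 10^26.0
print(f"N ~ 10^{math.log10(N):.1f}")        # N ~ 10^38.2
print(f"S*N ~ 10^{math.log10(S * N):.1f}")  # S*N ~ 10^64.2
```

Python integers are arbitrary precision, so the exact counts are computed before taking logarithms.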

Rotamer libraries
As early as the 1970s, Janin et al. showed that side-chain conformations are not evenly distributed over dihedral-angle space but tend to cluster in specific regions, much as backbone angles do in the Ramachandran plot. In the 1980s, this observation was used to improve the modeling of side-chain conformations. Today, essentially all programs that model side-chain conformations use rotamer libraries.

What do rotamer libraries provide?
Rotamer libraries significantly reduce the number of conformations that need to be evaluated during the search, with almost no risk of missing the real conformations: even small libraries of about 100-150 rotamers cover about 96-97% of the side-chain conformations actually found in protein structures. The probability of each rotamer in the library can also be used, through the Boltzmann distribution, to estimate the potential energy due to interactions within the side chain and with the local backbone atoms (not applied in this project): $E = -k_B T \ln P$, up to an additive constant.
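To make the Boltzmann relation concrete, here is a small sketch that turns rotamer probabilities into energies. It is not part of the project (the slide says this term was not applied); the value of kT (about 0.593 kcal/mol near 298 K) and the three probabilities are assumptions for illustration.

```python
import math

KT_KCAL_PER_MOL = 0.593  # assumed: k_B * T at ~298 K, in kcal/mol

def rotamer_energy(probability: float, kt: float = KT_KCAL_PER_MOL) -> float:
    """Energy of a rotamer from its library probability, E = -kT ln(P),
    defined only up to an additive constant."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -kt * math.log(probability)

# Hypothetical library probabilities for three rotamers of one residue type.
probs = {1: 0.60, 2: 0.30, 3: 0.10}
energies = {rot: rotamer_energy(p) for rot, p in probs.items()}

# The most frequently observed rotamer gets the lowest (best) energy.
best = min(energies, key=energies.get)
print(best)  # 1
```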

Rotamer library creation
Source: http://honiglab.cpmc.columbia.edu/programs/sidechain/rotamers.html
Parsing: select all nitrogens (N), oxygens (O), alpha carbons (CA) and all other carbons (e.g. CD, CZ); exclude all other elements and the end-of-file lines.
Storage: a 3D array indexed by residue (1st dimension) and rotamer (2nd dimension), each entry holding that rotamer's atom coordinates.
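The filtering rule can be sketched as follows. This assumes the library file has already been split into records of (residue, rotamer number, atom name, x, y, z); that record layout, the helper names and the "first letter of the atom name is the element" heuristic are assumptions, not the project's actual parser. A nested dictionary stands in for the 3D array.

```python
from collections import defaultdict

def element_of(atom_name: str) -> str:
    """Element = first alphabetic character of a PDB-style atom name
    (e.g. 'NH1' -> 'N', '1HB' -> 'H', 'CA' -> 'C')."""
    for ch in atom_name:
        if ch.isalpha():
            return ch
    return ""

def keep_atom(atom_name: str) -> bool:
    """Keep nitrogens, oxygens and carbons (including CA); drop everything else."""
    return element_of(atom_name) in {"N", "O", "C"}

def build_library(records):
    """library[residue][rotamer_number] -> list of (x, y, z) for the kept atoms."""
    library = defaultdict(dict)
    for residue, rot, name, x, y, z in records:
        if keep_atom(name):
            library[residue].setdefault(rot, []).append((x, y, z))
    return library

# A few atoms of rotamer ('R', 1) from the example below.
records = [
    ("R", 1, "N", 4.986, -6.494, 10.983),
    ("R", 1, "CA", 3.99, -5.511, 11.369),
    ("R", 1, "NH1", 4.736, -1.573, 6.181),
    ("R", 1, "HN", 4.75, -7.184, 10.299),   # hydrogen: excluded
    ("R", 1, "1HB", 3.357, -5.292, 9.424),  # hydrogen: excluded
]
lib = build_library(records)
print(len(lib["R"][1]))  # 3
```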

Rotamer library creation: example
Original rotamer ('R', 1), residue R (arginine):
Atom   X        Y        Z
N      4.986    -6.494   10.983
CA     3.99     -5.511   11.369
C      2.774    -6.288   11.848
O      2.209    -7.039   11.068
CB     3.528    -4.689   10.203
CG     4.563    -3.671   9.815
CD     4.228    -2.905   8.546
NE     5.119    -1.768   8.418
CZ     5.354    -1.136   7.268
NH1    4.736    -1.573   6.181
NH2    6.164    -0.097   7.202
HN     4.75     -7.184   10.299
HA     4.344    -4.953   12.119
1HB    3.357    -5.292   9.424
2HB    2.682    -4.216   10.45
1HG    4.656    -3.012   10.561
2HG    5.434    -4.141   9.675
1HD    4.338    -3.509   7.756
2HD    3.281    -2.586   8.592
HE     5.618    -1.384   9.195
1HH1   4.882    -1.115   5.304
2HH1   4.123    -2.361   6.237
1HH2   6.314    0.365    6.328
2HH2   6.628    0.228    8.026
HEAD
1      7.156    -7.614   11.118
2      6.202    -6.499   11.516
3      6.564    -5.625   12.293
In the original slide, color marked the lines included in the array (black), excluded from the array (red) and not part of the array at all (blue). Following the selection rule above, the nitrogen, oxygen and carbon atoms (N through NH2) are included and the hydrogens (HN through 2HH2) are excluded.

Aligning with the backbone
1. Translate the backbone and the rotamer to the origin: the CA atom of ('R', 1) and the backbone CA are placed at (0, 0, 0).
2. Rotate the rotamer around the X axis.
3. Rotate the rotamer around the Z axis.
4. Translate the rotamer back based on the original position of the CA atom, e.g. CA of ('R', 1) = (3.99, -5.511, 11.369).
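The four steps above can be sketched as a rigid-body transform. The slides do not say how the two rotation angles are chosen, so here they are plain inputs (hypothetical parameters `angle_x` and `angle_z`, in radians), and `target_ca` is where the CA should end up; the function names are illustrative.

```python
import math

Vec = tuple[float, float, float]

def rot_x(p: Vec, a: float) -> Vec:
    """Rotate point p by angle a (radians) around the X axis."""
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (x, c * y - s * z, s * y + c * z)

def rot_z(p: Vec, a: float) -> Vec:
    """Rotate point p by angle a (radians) around the Z axis."""
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def align_rotamer(coords: list[Vec], ca_index: int, angle_x: float,
                  angle_z: float, target_ca: Vec) -> list[Vec]:
    cax, cay, caz = coords[ca_index]
    out = []
    for x, y, z in coords:
        p = (x - cax, y - cay, z - caz)        # step 1: CA -> origin
        p = rot_z(rot_x(p, angle_x), angle_z)  # steps 2-3: rotate about X, then Z
        out.append((p[0] + target_ca[0],       # step 4: translate CA to target
                    p[1] + target_ca[1],
                    p[2] + target_ca[2]))
    return out

# N and CA of rotamer ('R', 1); the angles are arbitrary example values.
rot = [(4.986, -6.494, 10.983), (3.99, -5.511, 11.369)]
aligned = align_rotamer(rot, ca_index=1, angle_x=0.3, angle_z=-0.5,
                        target_ca=(3.99, -5.511, 11.369))
print(aligned[1])  # CA lands on target_ca
```

Because only translations and rotations are applied, bond lengths inside the rotamer are preserved.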

Rotamer library manipulation
To retrieve a specific rotamer, provide the residue and the rotamer number, e.g. ('R', 1) returns the first rotamer of arginine. The rotamer is already aligned with the backbone, and only the atom coordinates are returned, as a 2D array.
Aligned rotamer ('R', 1):
X           Y          Z
5.0288367   -5.67582   10.82734
3.99        -5.511     11.369
2.7217014   -5.64128   12.04117
2.1324015   -5.76721   10.94661
3.50813     -5.37317   9.732783
4.587644    -5.20248   9.188313
4.2382361   -5.07404   7.407558
5.4126639   -4.77743   5.614174
24.594637   -4.74892   15.97595

Now consider again our protein of 20 amino acids. Each side chain has on average 9 rotamers. If we now search in the space of rotamers, $N = 9^{20} \approx 10^{19}$. The search space is restricted and more focused, but the number of conformations is still too large for a naive search.

Algorithms for searching (side-chain) conformational space
Greedy search (systematically scans the search space)
DEE (algorithmic approaches to reduce the search space)
Self-consistent algorithms (iterative sequential procedure)
Monte Carlo algorithms (random search)
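As an example of the "random search" family, here is a minimal Metropolis Monte Carlo sketch over rotamer assignments. It is not the method used in this project. The energy model (self energies `self_e[i][r]` plus pair energies `pair_e[(i, j)][r][s]` for i < j), the temperature and the step count are all assumptions, and the random energies are toy data.

```python
import math
import random

def total_energy(assign, self_e, pair_e):
    """E = sum of self energies + sum of pair energies over residue pairs (i < j)."""
    e = sum(self_e[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair_e.items():
        e += table[assign[i]][assign[j]]
    return e

def monte_carlo(self_e, pair_e, steps=5000, temperature=1.0, seed=0):
    rng = random.Random(seed)
    n = len(self_e)
    assign = [rng.randrange(len(rots)) for rots in self_e]  # random start
    energy = total_energy(assign, self_e, pair_e)
    best, best_e = list(assign), energy
    for _ in range(steps):
        i = rng.randrange(n)
        old = assign[i]
        assign[i] = rng.randrange(len(self_e[i]))     # propose a new rotamer at residue i
        new_e = total_energy(assign, self_e, pair_e)  # full recompute: simple, not fast
        delta = new_e - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_e                            # Metropolis acceptance
            if energy < best_e:
                best, best_e = list(assign), energy
        else:
            assign[i] = old                           # rejected: restore previous rotamer
    return best, best_e

# Toy problem: 4 residues x 3 rotamers with random energies.
gen = random.Random(42)
self_e = [[gen.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
pair_e = {(i, j): [[gen.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
          for i in range(4) for j in range(i + 1, 4)}
best, best_e = monte_carlo(self_e, pair_e)
print(best, round(best_e, 3))
```

Unlike DEE, Monte Carlo gives no guarantee of reaching the global minimum; it only returns the best assignment it happened to visit.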

DEE (Dead-End Elimination)
DEE aims to safely eliminate (clusters of) rotamers without losing the GMEC (Global Minimum Energy Conformation). The energy is split into E(i_r), the energy of rotamer i_r in the force field of the backbone only, and E(i_r, j_s), the interaction of rotamer i_r with rotamer j_s of another residue j. Given residue i, rotamer i_r is eliminated if the minimum energy it can obtain by interacting with the conformational background (the rotamers j_s) is higher (worse) than the maximum possible energy that another rotamer i_t of the same residue can have.

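Original DEE criterion (Desmet et al., 1992): rotamer $i_r$ is eliminated if, for some other rotamer $i_t$ of the same residue,
$$E(i_r) + \sum_{j \ne i} \min_{s} E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_{s} E(i_t, j_s)$$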
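The criterion can be sketched as one elimination pass over toy energy tables. The data layout is an assumption chosen for illustration: `self_e[i][r]` holds E(i_r), `pair_e[i][j][r][s]` holds E(i_r, j_s) and is stored for both orders of i and j, and `alive[i]` is the set of rotamers of residue i not yet eliminated.

```python
def dee_desmet_pass(self_e, pair_e, alive):
    """One pass of the original DEE singles criterion; returns new alive sets."""
    n = len(self_e)
    new_alive = [set(a) for a in alive]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Lowest energy each rotamer of residue i can reach against the background...
        best = {r: self_e[i][r] + sum(min(pair_e[i][j][r][s] for s in alive[j])
                                      for j in others)
                for r in alive[i]}
        # ...and the highest energy each rotamer of residue i can reach.
        worst = {t: self_e[i][t] + sum(max(pair_e[i][j][t][s] for s in alive[j])
                                       for j in others)
                 for t in alive[i]}
        for r in alive[i]:
            # r is a dead end if its best case is worse than some t's worst case.
            if any(t != r and best[r] > worst[t] for t in alive[i]):
                new_alive[i].discard(r)
    return new_alive

# Two residues with two rotamers each (toy numbers).
self_e = [[0.0, 5.0], [0.0, 0.0]]
m01 = [[0.0, 1.0], [0.0, 0.5]]          # E(0_r, 1_s)
m10 = [list(col) for col in zip(*m01)]  # E(1_s, 0_r): the transpose
pair_e = [[None, m01], [m10, None]]
print(dee_desmet_pass(self_e, pair_e, [{0, 1}, {0, 1}]))  # [{0}, {0}]
```

Here the surviving assignment (0, 0) is also the brute-force minimum (energy 0.0 of the four combinations 0.0, 1.0, 5.0, 5.5), as the criterion guarantees.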

The Goldstein improvement
Rotamer i_r can be safely eliminated if some other rotamer i_t of the same residue has a lower (better) energy even in the environment that most favors i_r over i_t. This criterion is much less restrictive and therefore more powerful, though it requires more computational time.

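Goldstein criterion: rotamer $i_r$ is eliminated if, for some other rotamer $i_t$ of the same residue,
$$E(i_r) - E(i_t) + \sum_{j \ne i} \min_{s} \left[ E(i_r, j_s) - E(i_t, j_s) \right] > 0$$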
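A sketch of one Goldstein pass, with the same hypothetical data layout as a plain dictionary-free toy: `self_e[i][r]` = E(i_r), `pair_e[i][j][r][s]` = E(i_r, j_s) stored for both orders, and `alive[i]` the rotamers still allowed. The toy numbers are chosen so that the original criterion cannot remove rotamer 0 of residue 0 (its best case, 1.0, is not worse than rotamer 1's worst case, 2.0) while the Goldstein criterion can.

```python
def goldstein_pass(self_e, pair_e, alive):
    """One pass of the Goldstein singles criterion; returns new alive sets."""
    n = len(self_e)
    new_alive = [set(a) for a in alive]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in alive[i]:
            for t in alive[i]:
                if t == r:
                    continue
                # Smallest possible value of E(conformation with i_r) - E(same
                # conformation with i_t): the background that most favors i_r.
                gap = self_e[i][r] - self_e[i][t] + sum(
                    min(pair_e[i][j][r][s] - pair_e[i][j][t][s] for s in alive[j])
                    for j in others
                )
                if gap > 0:  # i_t beats i_r in every background: i_r is a dead end
                    new_alive[i].discard(r)
                    break
    return new_alive

# Residue 0: rotamer 1 has the better self energy and identical pair energies.
self_e = [[1.0, 0.0], [0.0, 0.0]]
m01 = [[0.0, 2.0], [0.0, 2.0]]          # E(0_r, 1_s)
m10 = [list(col) for col in zip(*m01)]  # E(1_s, 0_r)
pair_e = [[None, m01], [m10, None]]
print(goldstein_pass(self_e, pair_e, [{0, 1}, {0, 1}]))  # [{1}, {0}]
```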

Scoring function: energy function terms (Gordon et al., 1999)
Van der Waals: represents packing specificity.
Hydrogen bonding: typically represented by an angle-dependent 12-10 hydrogen-bond potential.
Electrostatics: guards against destabilizing interactions between like-charged residues.
Internal coordinate terms: "bonded" energies.
Solvation energy: protein-solvent interactions.
Entropy: assumes that conformational space is completely restricted in the folded state.

Van der Waals
Interaction between two uncharged atoms: mildly attractive as the two atoms approach from a distance, and repulsive when they come too close. It represents packing specificity, favoring native-like folded states with well-organized cores over disordered or molten-globule states (Gordon et al., 1999).

Van der Waals: 12-6 Lennard-Jones potential
The standard approximation is the 12-6 Lennard-Jones potential:
$$E_{vdW} = D_{ij}\left[\left(\frac{R_0}{R}\right)^{12} - 2\left(\frac{R_0}{R}\right)^{6}\right]$$
where R is the distance between the two atoms, R_0 is the sum of their van der Waals radii and D_ij is the well depth. In the variation from Kuhlman and Baker (2004), the repulsive part E_rep is dampened to account for the fixed backbone and the discrete rotamer set being used.
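The Lennard-Jones form above can be sketched directly. The radius sum and well depth below are hypothetical values, a single R_0 and D_ij is used for every atom pair, and the Kuhlman and Baker dampening of the repulsive part is not implemented.

```python
import math

def lennard_jones(r: float, r0: float, d_ij: float) -> float:
    """12-6 LJ: d_ij * ((r0/r)**12 - 2*(r0/r)**6); minimum of -d_ij at r == r0."""
    x6 = (r0 / r) ** 6
    return d_ij * (x6 * x6 - 2.0 * x6)

def vdw_energy(atoms_a, atoms_b, r0: float, d_ij: float) -> float:
    """Sum over all atom pairs between two groups of (x, y, z) coordinates,
    using one R0 and D_ij for every pair (a simplification)."""
    return sum(lennard_jones(math.dist(a, b), r0, d_ij)
               for a in atoms_a for b in atoms_b)

R0, DEPTH = 3.8, 0.1  # hypothetical values (Angstrom, kcal/mol)
print(lennard_jones(R0, R0, DEPTH))            # -0.1
print(lennard_jones(0.8 * R0, R0, DEPTH) > 0)  # True: repulsive when too close
```

The energy crosses zero at R = R_0 / 2^{1/6}, which is where the repulsive wall begins.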

Electrostatics
Stability: at moderate temperatures, favorable electrostatic interactions are not thought to be strong enough to compensate for the energy of desolvation; under extreme conditions, salt bridges may be stabilizing.
Specificity: the more significant role of electrostatics may be in the specificity of folding and of functional interactions.
Currently, this term guards against destabilizing interactions between like-charged residues (Gordon et al., 1999).

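Electrostatics: approximations
Coulomb's law (Gordon et al., 1999):
$$E_{elec} = \frac{k\,Q_i Q_j}{\varepsilon R}$$
where Q_i and Q_j are the charges on the amino acids, R is the distance between them, ε = 40 is the dielectric constant and k is Coulomb's constant (about 332 kcal·Å/(mol·e²)).
Bayesian version (Kuhlman & Baker, 2004): uses the probability, derived from the PDB, of two amino acids being close together given their environment and distance (aa = amino acid, d = distance, env = environment).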
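A minimal sketch of the Coulomb term with the constant dielectric ε = 40 from the slide. The conversion factor 332.06 kcal·Å/(mol·e²) assumes charges in elementary charges and distances in Å; the example charges and distance are illustrative.

```python
COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2); charges in e, distances in Angstrom

def coulomb(q_i: float, q_j: float, r: float, epsilon: float = 40.0) -> float:
    """Coulomb energy (kcal/mol) with a constant dielectric; epsilon=40 as on the slide."""
    return COULOMB_K * q_i * q_j / (epsilon * r)

# Two like-charged groups (e.g. Lys/Lys) 4 Angstrom apart: positive = destabilizing.
print(round(coulomb(+1, +1, 4.0), 3))  # 2.075
```

A positive value flags exactly the like-charge contacts that, according to the slides, this term is meant to guard against.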

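Solvation
Hydrophobic effects drive folding, so modeling solvation effects is critical for a protein design force field, but it is computationally expensive. Solvent model from Lazaridis and Karplus (1999):
$$\Delta G_i^{solv} = \Delta G_i^{ref} - \sum_{j \ne i} \frac{2\,\Delta G_i^{free}}{4\pi\sqrt{\pi}\,\lambda_i\, d_{ij}^2} \exp\left[-\left(\frac{d_{ij} - r_i}{\lambda_i}\right)^2\right] V_j$$
where d_ij = distance between atoms i and j, r_i = van der Waals radius of atom i, V_j = atomic volume of atom j, ΔG^ref = reference solvation free energy, ΔG^free = solvation free energy of the free (isolated) group and λ = correlation length.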
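The Lazaridis-Karplus expression can be sketched for a single atom. The parameter values below are placeholders, not real LK parameters (those come from the published tables), and neighbours are given directly as (distance, volume) pairs, which is an assumed input format.

```python
import math

def lk_solvation(dg_ref, dg_free, lam, radius_i, neighbours):
    """Lazaridis-Karplus solvation free energy of atom i.

    neighbours: iterable of (d_ij, V_j) pairs, i.e. distance to atom j and j's volume.
    """
    total = dg_ref
    for d_ij, v_j in neighbours:
        x = (d_ij - radius_i) / lam
        f = 2.0 * dg_free / (4.0 * math.pi * math.sqrt(math.pi) * lam * d_ij ** 2)
        total -= f * math.exp(-x * x) * v_j  # solvent excluded by atom j
    return total

# Placeholder parameters for one atom (not real LK values).
DG_REF, DG_FREE, LAM, RAD = 1.0, 1.5, 3.5, 1.9
print(lk_solvation(DG_REF, DG_FREE, LAM, RAD, []))  # 1.0: no neighbours, reference value
```

Distant neighbours contribute almost nothing because of the Gaussian factor, which is what keeps the model cheaper than explicit solvent.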

Energy function: an incomplete model
Current standard models include Bayesian terms based on PDB statistics. Several terms (hydrogen bonding, electrostatics, internal coordinates) have not been thoroughly validated as useful for design (Gordon et al., 1999). Current standard models are ad hoc: physical quantities and variables are weighted according to "what works best".

Integrated algorithm schema
(Diagram: rotamer strings such as ..N1 I2L3D2E1F2.. pass through 1st-order DEE, 2nd-order DEE and an exhaustive search to give the best sequence.)
First-order DEE is iterated until no more rotamers can be eliminated. At that point only one rotamer is left for each of several side chains (these are part of the GMEC), and for several others only a (hopefully) small number remain. DEE is then extended to second order (cluster size = 2); so far only the core design has finished this step. An additional exhaustive search over the surviving rotamers is then applied to obtain the final model.
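The pipeline above can be sketched on random toy energies, with two simplifications that should be kept in mind: first-order elimination uses the Goldstein criterion, and the second-order (pair) DEE step is omitted, so the exhaustive search runs directly on the first-order survivors. The energy layout and all function names are hypothetical. The final check compares the result with a brute-force search, since DEE should never remove the GMEC.

```python
import itertools
import random

def toy_problem(n_res, n_rot, seed):
    """Random self energies E(i_r) and symmetric pair energies E(i_r, j_s)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-2, 2) for _ in range(n_rot)] for _ in range(n_res)]
    pair_e = [[None] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(i + 1, n_res):
            m = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
            pair_e[i][j] = m
            pair_e[j][i] = [list(col) for col in zip(*m)]  # transpose: E(j_s, i_r)
    return self_e, pair_e

def energy(assign, self_e, pair_e):
    n = len(assign)
    return (sum(self_e[i][assign[i]] for i in range(n))
            + sum(pair_e[i][j][assign[i]][assign[j]]
                  for i in range(n) for j in range(i + 1, n)))

def goldstein_pass(self_e, pair_e, alive):
    """One first-order (Goldstein) elimination pass; returns the new alive sets."""
    n = len(self_e)
    new_alive = [set(a) for a in alive]
    for i in range(n):
        for r in alive[i]:
            for t in alive[i] - {r}:
                gap = self_e[i][r] - self_e[i][t] + sum(
                    min(pair_e[i][j][r][s] - pair_e[i][j][t][s] for s in alive[j])
                    for j in range(n) if j != i)
                if gap > 0:  # i_t always beats i_r: eliminate i_r
                    new_alive[i].discard(r)
                    break
    return new_alive

def dee_then_exhaustive(self_e, pair_e):
    alive = [set(range(len(rots))) for rots in self_e]
    while True:  # iterate until no more rotamers can be eliminated
        new_alive = goldstein_pass(self_e, pair_e, alive)
        if new_alive == alive:
            break
        alive = new_alive
    # Exhaustive search over the (hopefully small) surviving combinations.
    survivors = itertools.product(*(sorted(a) for a in alive))
    best = min(survivors, key=lambda a: energy(a, self_e, pair_e))
    return list(best), alive

self_e, pair_e = toy_problem(n_res=5, n_rot=4, seed=1)
best, alive = dee_then_exhaustive(self_e, pair_e)
brute = min(itertools.product(range(4), repeat=5),
            key=lambda a: energy(a, self_e, pair_e))
print(best == list(brute))      # True: DEE is safe, the GMEC survives
print([len(a) for a in alive])  # rotamers left per residue
```

The payoff is the size of the final enumeration: it covers only the product of the surviving set sizes instead of all 4^5 combinations.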

Design cold-shock protein (core) & Trp-Cage protein
Cold-shock protein (1MJC.pdb): 10 residues (core)
Trp-Cage (1L2Y.pdb): 20 residues

Cold-shock protein (core) after 1st-order DEE
(Figure: surviving rotamers of each amino acid at the core positions, shown for residues 0, 1, 3 and 8.)
Hydrophobic amino acids, with the number of rotamers in the library in parentheses: A (1), F (3), I (3), L (2), V (2), W (7).

Trp-Cage protein (all 20 amino acids)
Residue 9, full rotamer set (excerpt as shown on the slide): H: 1-8; I: 1-3; K: 1-87; L: 1-2; M: 1-17; N: 1-9; P: 1; Q: 1-30; R: 1-114; S: 1-2; T: 1; V: 1-2; W: 1-7; Y: 1-3.
Residue 9, rotamers remaining after 1st-order DEE: A: 1; C: 2; D: 6; E: 6, 15; F: 1; G: 1; H: 7; I: 2; K: 18, 22, 59; L: 1; M: 1, 12; N: 6; P: 1; Q: 4; R: 7, 107; S: 2; T: 1; V: 1, 2; W: 6; Y: 1.
Both designs have finished first-order DEE and moved on to second-order DEE; the Trp-Cage run is still in progress. The core design has finished both rounds, and the best sequences have been evaluated further.

Results for cold-shock protein (core)
Cold-shock protein (1MJC.pdb), 10 core residues; N = native sequence, 1-10 = best designed sequences:
Rank  Sequence              E score
N     V F I V V I L V F V   -46.47
1.    I F I I I I L I F V   -53.58
2.    I F I I V I L I F V   -52.48
3.    V F I I I I L I F V   -51.70
4.    I F I I I I L V F V   -50.72
5.    V F I I V I L I F V   -50.53
6.    I F I I V I L V F V   -49.63
7.    I F V V I I L I F V   -49.34
8.    V F V I I I L I F V   -49.23
9.    I F V I I I L V F V   -48.92
10.   V F I I I I L V F V   -48.88
We are very happy to see that the predicted sequences are at least 50% identical to the native one (6 to 8 of the 10 positions match), and that the remaining positions are limited to two amino acids (I and V), which include the native ones. Second, the energies of the top sequences are very similar, which suggests that the energy landscape around the global minimum is smooth and that our solution is likely to be stable.
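The identity figures can be recomputed from the table with a short sketch; the sequences are copied from the table (spaces removed) and positions are compared one-to-one, since all sequences have the same length.

```python
NATIVE = "VFIVVILVFV"
DESIGNS = ["IFIIIILIFV", "IFIIVILIFV", "VFIIIILIFV", "IFIIIILVFV", "VFIIVILIFV",
           "IFIIVILVFV", "IFVVIILIFV", "VFVIIILIFV", "IFVIIILVFV", "VFIIIILVFV"]

def identity(a: str, b: str) -> float:
    """Fraction of aligned positions where the two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

ids = [identity(NATIVE, s) for s in DESIGNS]
print(min(ids), max(ids))  # 0.6 0.8
```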

Summary & future
Speed
Achievement: naïve search ~$10^{7}$ sequences × $10^{4}$ rotamers; with DEE ~3000 sequences × 200 rotamers; BioX cluster (~600 2.8 GHz Xeon CPUs): 26 hrs.
Future: rotamer ordering by self-energies (Gordon 1998); comparison cluster focusing (Looger 2001); stronger elimination criteria (Looger 2001).
Accuracy
Achievement: 50% identity with the native sequence; high similarity in total energy.
Future: additional energy terms (H-bond, solvation); incorporate rigorous force-field calculators (Gromacs); structure relaxation.
We have not had a chance to compare with other algorithms (greedy, MC), but the simplest demonstration is that naïve search needs …

Thanks!