Short fast history of protein design Site-directed mutagenesis -- protein engineering (J. Wells, 1980's) Coiled coils, helix bundles (W. DeGrado, 1980's-90's)

Presentation transcript:

Short fast history of protein design
Site-directed mutagenesis -- protein engineering (J. Wells, 1980's)
Coiled coils, helix bundles (W. DeGrado, 1980's-90's)
Extreme protein stabilization (S. Mayo, 1990's)
Binding pocket design (H. Hellinga, 2000)
New fold design (B. Kuhlman, )
Protein-protein interface design (J. Gray, 2004)
Experimental (non-computational) approaches: in vitro evolution, phage display
**Other names in protein design: Hill, Vriend, Regan, D. Baker, Richardson, Dunbrack, Choma, several more.

The goal of sequence design
Given a desired structure, find an amino acid sequence that folds to that structure.
[Figure: the sequence MIKYGTKIYRINSDNSG KJHGCKAHNEEEGHA and a structure, linked by arrows labeled "design" and "folding"]
To do this, we must assign an energy to each possible sequence.

Theoretical complexity of sequence design
To design THE OPTIMAL sequence, we need the best amino acid, and its best rotamer, at every position. We can treat each position as one of 193 possible rotamers: 191 rotamers in the Richardson library, plus Gly and Ala (which have no rotamers).
How many possible sets of rotamers are there for a protein of length 100? $193^{100} \approx 3.6 \times 10^{228}$
DEE reduces the complexity of sequence design to about $(193L)^2 \approx 3.7 \times 10^{8}$
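
Both figures on this slide follow from a short logarithm calculation. A worked sketch, taking L = 100 and 193 choices per position as stated on the slide:

```latex
\begin{align*}
\log_{10}\!\left(193^{100}\right) &= 100 \,\log_{10} 193 \approx 100 \times 2.28556 = 228.556 \\
193^{100} &\approx 10^{0.556} \times 10^{228} \approx 3.6 \times 10^{228} \\
(193L)^2 &= (193 \times 100)^2 = 19300^2 = 372{,}490{,}000 \approx 3.7 \times 10^{8}
\end{align*}
```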

Sequence space maps to structure space... as many-to-one.
This means that there is a lot of potential for "slop" in a sequence design. Moderately big sequence changes are possible, and the sequence can still fold to the same general structure.
[Figure: sequence families mapping to one fold]
Good news for protein designers.

Dead end elimination theorem (reminder)
$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$
This can be translated into plain English as follows: If the "worst case scenario" for t is better than the "best case scenario" for r, then you always choose t over r.

DEE algorithm
[Figure: DEE energy matrix, labeled E(r1)]
Find two columns (rotamers) within the same residue, where one is always better than the other. Eliminate the rotamer that can always be beaten. (Repeat until only 1 rotamer per residue remains.)

DEE with alternative sequences
[Figure: DEE energy matrix with pairwise energies E(r1, r2) and self energies E(r1), E(r2); rotamers labeled a, b, c per position, with two rotamer groups labeled Asp and Leu]
"Rotamers" within the DEE framework can have different atoms, i.e. they can be different amino acids. Using DEE, we choose the best set of rotamers. Now we have the sequence of the lowest energy structure. In the example, we have D or L at position 3.

Sequence design using DEE
Selected residues (or all) are chosen for mutating. Selected (or all) amino acids are allowed at those positions. For the selected amino acids, all rotamers are considered. Now "rotamer" comes to mean the amino acid identity and its conformation. Since there are as many as 193 rotamers in the rotamer library for all amino acids, each selected position can have as many as 193 "rotamers." If "fine grained" rotamers are used, this number may be much larger.
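
The idea that a design "rotamer" is an amino acid identity plus a conformation can be sketched in a few lines. This is only an illustration: the per-residue counts in `TOY_ROTAMER_COUNTS` are made up (they are not the Richardson library numbers), and `design_rotamers` is a hypothetical helper name.

```python
# Hypothetical rotamer counts per amino acid: illustrative values only,
# NOT the actual numbers in the Richardson library.
TOY_ROTAMER_COUNTS = {"Gly": 0, "Ala": 0, "Asp": 5, "Leu": 5, "Phe": 4}


def design_rotamers(allowed_amino_acids):
    """List the DEE "rotamers" for one design position.

    Each entry is (amino acid, conformer index), so the amino acid identity
    and its conformation are chosen together.
    """
    choices = []
    for aa in allowed_amino_acids:
        # Gly and Ala have no sidechain rotamers but still count as one choice.
        n_conformers = max(TOY_ROTAMER_COUNTS[aa], 1)
        choices.extend((aa, k) for k in range(n_conformers))
    return choices


# A position where only Asp or Leu is allowed.
position_3 = design_rotamers(["Asp", "Leu"])
print(len(position_3), position_3[:3])
```

Allowing every amino acid at a position simply makes this list longer, which is where the slide's upper bound of 193 choices per position comes from.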

DEE with alternative sequences and ligands
[Figure: DEE energy matrix with an added ligand row and column (L), pairwise energies E(r1, r2) and self energies E(r1), E(r2); rotamer groups labeled Asp and Leu; inset labeled "Ligand conformers"]
Ligands can have multiple conformations and locations within the active site. In DEE, each position of the ligand is another "rotamer", i.e. another row and column in the DEE matrix.

Sidechain modeling
Given a backbone conformation and the sequence, can we predict the sidechain conformations? Energy calculations are sensitive to small changes, so the wrong sidechain conformation will give the wrong energy.
[Figure: two sidechain conformations, marked "≠"]

Goal of sidechain modeling
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure from Desmet et al, Nature v.356, pp (1992): fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.]

Sidechain space is discrete, almost
A random sampling of phenylalanine sidechains, when superimposed, falls into three classes: rotamers. This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers and we're close to the right answer.

What determines rotamers
[Figure: three staggered conformations about the CA-CB bond (atoms N, C=O, CA, CB, CG, H), labeled "m" = -60° gauche, "t" = 180° anti/trans, "p" = +60° gauche]
3-bond or 1-4 interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
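
The m/t/p labels can be assigned to a measured chi angle by binning. A minimal sketch, assuming 120°-wide bins centered on -60°, +60° and 180°; the bin edges and the name `rotamer_class` are choices made here for illustration, not something the slides specify.

```python
def rotamer_class(chi_degrees: float) -> str:
    """Classify a chi angle as 'm' (-60 gauche), 'p' (+60 gauche) or 't' (180 trans)."""
    # Wrap any angle into the interval [-180, 180).
    chi = (chi_degrees + 180.0) % 360.0 - 180.0
    if -120.0 <= chi < 0.0:
        return "m"
    if 0.0 <= chi < 120.0:
        return "p"
    return "t"  # everything within 60 degrees of +/-180


for angle in (-65.0, 62.0, 175.0, -170.0, 300.0):
    print(angle, rotamer_class(angle))
```

A residue with several chi angles would get one letter per angle, for example "mt" for chi1 near -60° and chi2 near 180°.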

Rotamer Libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.
Two commonly used rotamer libraries:
*Jane & David Richardson:
Roland Dunbrack:
*rotamers of W on the previous page are from the Richardson library.

Dead end elimination theorem
There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer. In other words: GMEC is the set of rotamers that has the lowest energy.
Energy is a pairwise thing. Total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).
fixed-fixed: E is a constant, $= E_{template}$
fixed-movable: E depends on rotamer, but independent of other rotamers
movable-movable: E depends on rotamer, and depends on surrounding rotamers

Theoretical complexity of sidechain modeling
The Global Minimum Energy Configuration (GMEC) is one, unique set of rotamers. How many possible sets of rotamers are there?
$n_1 n_2 n_3 n_4 n_5 \cdots n_L$, where $n_1$ is the number of rotamers for residue 1, and so on.
Estimated complexity for a protein of 100 residues, with an average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$
DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$.
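
The product formula is easy to evaluate for any list of per-residue rotamer counts. A small sketch: the function names are invented here, and `dee_estimate` generalizes the slide's $(5L)^2$ to "total number of rotamers, squared", which is an assumption that reduces to $(5L)^2$ when every residue has 5 rotamers.

```python
import math


def n_rotamer_sets(rotamer_counts):
    """Size of the exhaustive search space: n_1 * n_2 * ... * n_L."""
    return math.prod(rotamer_counts)


def dee_estimate(rotamer_counts):
    """Rough DEE cost: (total number of rotamers) squared."""
    return sum(rotamer_counts) ** 2


counts = [5] * 100  # 100 residues, 5 rotamers each, as on the slide
print(f"exhaustive: {n_rotamer_sets(counts):.1e}")
print(f"DEE estimate: {dee_estimate(counts)}")
```

Python integers have arbitrary precision, so the exhaustive count is computed exactly and only rounded for display.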

Dead end elimination theorem
Each residue is numbered (i or j) and each residue has a set of rotamers (r, s or t). So, the notation $i_r$ means "choose rotamer r for position i". The total energy is the sum of the three components (fixed-fixed, fixed-movable, movable-movable):
$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_j E(i_r, j_s)$
where r and s are any choice of rotamers.
NOTE: $E_{global} \ge E_{GMEC}$ for any choice of rotamers.
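
The three-component energy sum can be written out directly. A minimal sketch with made-up integer energies for two residues with two rotamers each; it assumes each residue pair is counted once (j < i in the double sum), which the slide does not state explicitly. Enumerating all choices confirms the note that no choice of rotamers falls below the GMEC energy.

```python
from itertools import product


def total_energy(choice, e_template, e_self, e_pair):
    """E_global for one rotamer choice; choice[i] is the rotamer index at residue i."""
    energy = e_template                      # fixed-fixed term, constant
    for i, r in enumerate(choice):
        energy += e_self[i][r]               # fixed-movable term E(i_r)
        for j in range(i):                   # movable-movable, each pair once
            energy += e_pair[(j, i)][choice[j]][r]
    return energy


# Hypothetical toy energies (arbitrary units).
E_TEMPLATE = 10
E_SELF = [[1, 3], [2, 0]]
E_PAIR = {(0, 1): [[5, -1], [0, 4]]}  # E_PAIR[(0, 1)][r][s]

all_choices = list(product(range(2), repeat=2))
energies = {c: total_energy(c, E_TEMPLATE, E_SELF, E_PAIR) for c in all_choices}
gmec = min(energies, key=energies.get)
print(energies)
print("GMEC:", gmec, "energy:", energies[gmec])
```

Brute-force enumeration like this is only feasible for tiny systems, which is the motivation for dead end elimination.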

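Dead end elimination theorem
If $i_g$ is in the GMEC and $i_t$ is not, then we can separate the terms that contain $i_g$ or $i_t$ and re-write the inequality (sums over j and k exclude position i):
$E_{GMEC} = E_{template} + E(i_g) + \sum_j E(i_g, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$
...is less than...
$E_{notGMEC} = E_{template} + E(i_t) + \sum_j E(i_t, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$
Canceling all terms common to both sides (shown in black on the slide), we get:
$E(i_t) + \sum_j E(i_t, j_g) > E(i_g) + \sum_j E(i_g, j_g)$
That is, in the context of the GMEC rotamers at all other positions, any non-GMEC rotamer at position i has a higher energy than $i_g$.
So, if we find two rotamers $i_r$ and $i_t$, and:
$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$
Then $i_r$ cannot possibly be in the GMEC: whatever rotamers the other residues adopt, swapping $i_r$ for $i_t$ lowers the energy.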

Dead end elimination theorem
$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$
This can be translated into plain English as follows: If the "worst case scenario" for rotamer t is better than the "best case scenario" for rotamer r, then you can eliminate r.
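
The criterion translates almost word for word into code. A sketch under stated assumptions: the data layout (`e_self[i][r]`, `e_pair[(i, j)][r][s]` with i < j, `alive[j]` as the list of surviving rotamers at residue j) and the function names are choices made for this example, and the energies are made up.

```python
def pair_energy(e_pair, i, r, j, s):
    """Look up E(i_r, j_s); each residue pair is stored once, keyed (low, high)."""
    if i < j:
        return e_pair[(i, j)][r][s]
    return e_pair[(j, i)][s][r]


def can_eliminate(i, r, t, e_self, e_pair, alive):
    """True if rotamer r at residue i is a dead end with respect to rotamer t."""
    best_case_r = e_self[i][r]
    worst_case_t = e_self[i][t]
    for j, rotamers_j in enumerate(alive):
        if j == i:
            continue
        best_case_r += min(pair_energy(e_pair, i, r, j, s) for s in rotamers_j)
        worst_case_t += max(pair_energy(e_pair, i, t, j, s) for s in rotamers_j)
    return best_case_r > worst_case_t


# Hypothetical two-residue example, two rotamers each.
e_self = [[0, 10], [0, 0]]
e_pair = {(0, 1): [[1, 2], [3, 4]]}
alive = [[0, 1], [0, 1]]
print(can_eliminate(0, 1, 0, e_self, e_pair, alive))  # best case of r=1 is 13, worst case of t=0 is 2
print(can_eliminate(0, 0, 1, e_self, e_pair, alive))  # best case of r=0 is 1, worst case of t=1 is 14
```

The inequality is strict, so a rotamer that merely ties with its competitor is kept.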

Exercise: Dead End Elimination
Using the DEE worksheet:
(1) Find a rotamer that satisfies the DEE theorem.
(2) Eliminate it.
(3) Repeat until each residue has only one rotamer.
What is the final GMEC energy?

DEE exercise
Three sidechains, each with three rotamers (a, b, c). Therefore, there are 3x3x3=27 ways to arrange the sidechains. Each rotamer has an energy E(r), which is the non-bonded energy between sidechain and template. Each pair of rotamers has an interaction energy E(r1, r2), which is the non-bonded energy between sidechains.

DEE exercise
[Worksheet: matrix of pairwise energies E(r1, r2), with rows and columns for rotamers a, b, c of each residue, and the self energies E(r1) and E(r2) along the margins]

DEE exercise: instructions
(1) The best (worst) energies are found using the worksheet: Add E(r1) to the sum of the lowest (highest) E(r1, r2) that have not been previously eliminated.
(2) There are 9 possible DEE comparisons to make: 1a versus 1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.
(3) Scratch out the eliminated rotamer and repeat until one rotamer per position remains.
If the “best case scenario” for r1 is worse than the “worst case scenario” for r2, you can eliminate r1.
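
The worksheet procedure can be automated. The sketch below uses made-up integer energies for three residues with three rotamers each (the actual worksheet values are not part of this transcript), omits the constant template energy, and checks the DEE answer against brute-force enumeration of all 27 combinations. All names are illustrative.

```python
from itertools import product

# Hypothetical worksheet: rotamers a, b, c are indices 0, 1, 2.
E_SELF = [[0, 6, 5], [1, 0, 4], [3, 1, 0]]
# E_PAIR[(i, j)][r][s] = interaction of rotamer r at residue i with rotamer s at residue j.
E_PAIR = {
    (0, 1): [[0, 1, 2], [1, 0, 3], [4, 2, 1]],
    (0, 2): [[1, 0, 2], [2, 1, 0], [3, 3, 2]],
    (1, 2): [[0, 2, 1], [1, 0, 0], [2, 3, 1]],
}


def pair(i, r, j, s):
    return E_PAIR[(i, j)][r][s] if i < j else E_PAIR[(j, i)][s][r]


def bounds(i, r, alive):
    """Instruction (1): best (lowest) and worst (highest) energy for rotamer r at residue i."""
    best = worst = E_SELF[i][r]
    for j, rotamers_j in enumerate(alive):
        if j != i:
            energies = [pair(i, r, j, s) for s in rotamers_j]
            best += min(energies)
            worst += max(energies)
    return best, worst


def dee(n_res=3, n_rot=3):
    """Instructions (2) and (3): eliminate dead-end rotamers until nothing changes."""
    alive = [set(range(n_rot)) for _ in range(n_res)]
    changed = True
    while changed:
        changed = False
        for i in range(n_res):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t != r and bounds(i, r, alive)[0] > bounds(i, t, alive)[1]:
                        alive[i].discard(r)  # best case of r is worse than worst case of t
                        changed = True
                        break
    return alive


def total(choice):
    """Total energy without the constant template term."""
    n = len(choice)
    e = sum(E_SELF[i][choice[i]] for i in range(n))
    return e + sum(pair(i, choice[i], j, choice[j]) for i in range(n) for j in range(i + 1, n))


alive = dee()
dee_best = min(product(*[sorted(a) for a in alive]), key=total)
brute_best = min(product(range(3), repeat=3), key=total)
print("survivors:", alive)
print("DEE:", dee_best, total(dee_best), "brute force:", brute_best, total(brute_best))
```

With these particular energies the elimination happens to converge to one rotamer per residue. In general this simple criterion can stall with several survivors, which is why the sketch finishes by enumerating whatever remains rather than assuming convergence.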