7. Protein Design 1. Protein design: structure prediction and design are inverse problems.

Presentation transcript:

7. Protein Design 1

Protein design: structure prediction and design are inverse problems

Protein design: why?
- Industrial applications: design of thermostable or super-soluble proteins (e.g. Dantas, Kuhlman & Baker 2003; Malakauskas & Mayo 1998)
- Improve fold prediction: enrich the PSSM of a fold family using designed sequences (e.g. Koehl & Levitt 1999; Kuhlman & Baker 2000)
- Identify functional sites: positions conserved in native but not in designed sequences (e.g. Cheng, Samudrala & Baker 2004)
- Crucial step before functional design (e.g. Liang & Baker 2008; Ashworth & Baker 2007, etc.)

Protein Design - Overview
Protein design: design a sequence that fits a given structure
1. Recover sequence profiles (redesign)
2. Design new protein fold
3. Enzyme design
4. Multiple state design
5. Negative design
6. Interface design

Design a sequence that fits a given structure
Assumption: design can retrieve the sequences that fit a given structure, i.e. it can recover sequences that occur in nature for this protein fold
ATCSFFGRKL..

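Protein design: an extension of side chain modeling
Given:
- protein backbone
- for each residue: a set of possible conformations (rotamers from a library) for different amino acids
Wanted: the combination of rotamers that results in the lowest total energy (global minimum energy conformation, GMEC)
$$\mathrm{GMEC} = \min \left( \sum_{i} E_{i_r} + \sum_{i<j} E_{i_r j_s} \right)$$
Here $E_{i_r}$ is the self energy of rotamer $r$ at position $i$, and $E_{i_r j_s}$ is the pair energy between rotamer $r$ at position $i$ and rotamer $s$ at position $j$.
The GMEC defines the designed sequence.
[Figure: residues i, i+1, i+2, illustrating self energy and pair energy]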
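The GMEC definition above can be illustrated with a minimal sketch that enumerates every rotamer combination of a tiny system. The energy tables, the `(i, j)` keying of pair energies, and the function names are assumptions made for this example; real design problems are far too large for enumeration, which is why DEE and Monte Carlo are used instead.

```python
from itertools import product

def total_energy(assignment, self_e, pair_e):
    """Self energies plus pair energies for one rotamer choice per position."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    n = len(assignment)
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy

def brute_force_gmec(self_e, pair_e):
    """Enumerate all rotamer combinations and return the lowest-energy one."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))

# Hypothetical toy system: 3 positions, 2 rotamers each.
SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
PAIR_E = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, -1.0]],
}

gmec = brute_force_gmec(SELF_E, PAIR_E)
print(gmec, total_energy(gmec, SELF_E, PAIR_E))
```

In a design setting each "rotamer" index would also carry an amino acid identity, so the winning assignment fixes the sequence as well as the side chain conformations.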

Protein design: an extension of side chain modeling
Sampling: same techniques as used in side chain modeling, e.g.
- DEE (dead-end elimination)
- MC (Monte Carlo), etc.
Scoring: add a term E_aa that reflects amino acid preference, optimized for recovery of amino acid frequencies in natural sequences

Fold-tree formulation of design
- Filled colored circles: flexible side chain
- Empty colored circles: flexible amino acid (design)

1. Recover sequence profiles with protein design (RosettaDesign)
Native protein sequences are close to optimal for their structures (Kuhlman & Baker 2000)

RosettaDesign: Basic approach
Evaluate the best sequence for a given backbone using:
- amino acids: 19 aa (Cys excluded)
- side chains: rotamer library
- scaffold: constant backbone coordinates
- location of (G)MEC: Monte Carlo procedure for amino acid and rotamer assignment
- energy function: additional term E_aa, a reference energy to reproduce amino acid frequencies observed in native proteins; trained (optimized) on a set of 30 proteins
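A Monte Carlo search over amino acid assignments can be sketched as follows. This is a generic Metropolis loop, not the RosettaDesign implementation: the toy energy function, the temperature `kT`, the step count, and all names are hypothetical, and rotamers are omitted so that each move only swaps the amino acid at one position.

```python
import math
import random

AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"  # 19 amino acids, Cys excluded

def toy_energy(seq):
    """Hypothetical energy: penalize non-hydrophobic residues (all positions treated as core)."""
    return sum(0.0 if aa in "AVLIFYW" else 1.0 for aa in seq)

def monte_carlo_design(n_pos, energy, steps=2000, kT=1.0, seed=0):
    """Metropolis Monte Carlo: mutate one position per step, keep track of the best sequence."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in range(n_pos)]
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        pos = rng.randrange(n_pos)
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        e_new = energy(current)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / kT):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[pos] = old  # reject: restore the previous residue
    return "".join(best), e_best

sequence, score = monte_carlo_design(4, toy_energy)
print(sequence, score)
```

Because the search is stochastic, the result is a low-energy sequence rather than a guaranteed GMEC, which matches the "(G)MEC" notation on the slide.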

Computational Experiment
Dataset: 108 sequences with solved structure (resolution ≤ 3 Å), ≤ 30% sequence identity
Results:
- 51% of the core residues in the designed sequences were identical to the naturally occurring residues
- 27% of all residues in the designed sequences were identical to the naturally occurring residues
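The recovery percentages above are fractions of positions at which the designed residue equals the native one. A minimal sketch of that calculation, with made-up sequences and a hypothetical list of core positions:

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of the selected positions where designed and native residues are identical."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    positions = list(range(len(native))) if positions is None else list(positions)
    if not positions:
        raise ValueError("no positions selected")
    identical = sum(native[i] == designed[i] for i in positions)
    return identical / len(positions)

# Hypothetical example sequences and core positions (0-based).
NATIVE = "MKVLAAGIVG"
DESIGNED = "MRVLSAGLVG"
CORE = [2, 3, 5, 8]

print(sequence_recovery(NATIVE, DESIGNED))        # all residues: 0.7
print(sequence_recovery(NATIVE, DESIGNED, CORE))  # core residues only: 1.0
```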

Sequence conservation in designed sequences correlates with conservation in protein families (core residues)

Analysis of SH3 domain designed sequences
Input:
- 233 known sequences of SH3 domains
- 11 crystal structures
Procedure:
- design 1000 sequences for each structure
- derive the amino acid profile of the designed sequences
- compare it to the profile of the native sequences
Results:
- good match between profiles
- evolution has sampled most of the sequence space compatible with the SH3 domain
- equilibrium reached
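The profile comparison in this procedure can be sketched as per-position amino acid frequencies plus a similarity score. The overlap measure used here (sum of the smaller frequency for each amino acid) is an assumption chosen for simplicity, not necessarily the measure used in the original analysis, and the sequences are invented.

```python
from collections import Counter

def position_profile(sequences, pos):
    """Amino acid frequencies at one alignment position."""
    counts = Counter(seq[pos] for seq in sequences)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def profile_overlap(p, q):
    """Shared probability mass of two profiles: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(p.get(aa, 0.0), q.get(aa, 0.0)) for aa in set(p) | set(q))

# Hypothetical aligned sequences (equal length, no gaps).
DESIGNED_SEQS = ["ALV", "ALI", "AIV", "GLV"]
NATIVE_SEQS = ["ALV", "ALV", "SLI", "ALI"]

for pos in range(3):
    overlap = profile_overlap(position_profile(DESIGNED_SEQS, pos),
                              position_profile(NATIVE_SEQS, pos))
    print(pos, overlap)
```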

Amino acid profiles for six core residues in SH3 domains
Shaded: designed; empty: natural

Redesign: conclusions
- The templates retain information about their specific sequence
  - use a set of different templates to improve sequence profile recovery
  - add backbone sampling to improve profile recovery
- The energy function is adequate to reproduce a large fraction of the naturally occurring amino acids of a given fold
- Low-energy sequences are close to native (not necessarily the GMEC)
- Stability is the major constraint in the evolution of core residues
- Native protein sequences are close to optimal for their structures

First complete redesign of a protein: a Zn finger that folds without Zn (Dahiyat & Mayo, 1997)

ORBIT Protocol for Zn finger design (Dahiyat & Mayo, 1997)
Restrict allowed amino acids using 3 position-based sets:
1) Exposed class: A, S, T, H, D, N, E, Q, K, R
2) Core class: A, V, L, I, F, Y, W
3) Boundary class (in between): (1) and (2) combined
→ 1.9 × 10^27 combinations
Scoring: VdW, solvation, h-bonds and secondary structure propensity.
Use a rotamer library and the DEE algorithm to locate the GMEC.
Find additional low-energy solutions by local sampling starting from the GMEC.
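Dead-end elimination prunes rotamers that cannot be part of the GMEC. The sketch below applies the original single-rotamer DEE criterion: rotamer r at position i is removed if its best-case energy is still worse than the worst-case energy of some alternative rotamer t at the same position. The toy energy tables, the `(i, j)` keying, and the function names are assumptions for illustration and do not reproduce the ORBIT code.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up E(i_r, j_s); pair_e is keyed by (i, j) with i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def dee_pass(self_e, pair_e, alive):
    """One pass of the original DEE criterion over all positions."""
    n = len(self_e)
    pruned = [set(rots) for rots in alive]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best = {r: self_e[i][r] + sum(min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                                      for j in others) for r in alive[i]}
        worst = {r: self_e[i][r] + sum(max(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                                       for j in others) for r in alive[i]}
        for r in alive[i]:
            # r is dead-ending if some t is better even in t's worst case.
            if any(best[r] > worst[t] for t in alive[i] if t != r):
                pruned[i].discard(r)
    return pruned

def dead_end_elimination(self_e, pair_e):
    """Repeat DEE passes until no more rotamers can be eliminated."""
    alive = [set(range(len(rots))) for rots in self_e]
    while True:
        pruned = dee_pass(self_e, pair_e, alive)
        if pruned == alive:
            return alive
        alive = pruned

# Hypothetical toy system: 3 positions, 2 rotamers each.
DEE_SELF_E = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
DEE_PAIR_E = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, -1.0]],
}

print(dead_end_elimination(DEE_SELF_E, DEE_PAIR_E))
```

In this toy case elimination converges to a single rotamer per position. In general DEE only shrinks the search space, and the remaining combinations still have to be searched.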

Local sampling starting from the GMEC reveals the conservation pattern of the designs
Design of a sequence that adopts a zinc finger fold without zinc (Dahiyat & Mayo, 1997)
[Figure panels: alignment with zif268 second finger; conservation across 1000 simulations; ranking of predicted sequences]

FSD-1 (Full Sequence Design) (Dahiyat & Mayo, 1997)
- Similarity to zif268: 6/28 identical, 11/28 similar
- Core: additional hydrophobic amino acids fill the space vacated by removal of the metal-binding site; C/H replaced with F/A/K
- Helix: stabilized by N-capping interactions
- NMR: RMSD within 2 Å
- No similar sequences found by BLAST

Prediction of Functional Residues (Cheng, … & Baker, NAR 2005)
Functional residues are:
- conserved within a set of proteins of the same function
- energetically unfavorable
- in general accessible to substrate
Identify positions in the alignment that are:
- conserved in sequence
- not recovered by design (high in energy)
→ "Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design"
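The two criteria (conserved in the native alignment, not recovered by design) can be combined in a simple filter. This is only a frequency-based sketch under stated assumptions: the thresholds `min_conservation` and `max_recovery`, the function names, and the sequences are hypothetical, and the published method scores positions differently.

```python
from collections import Counter

def consensus(column):
    """Most common residue in an alignment column and its frequency."""
    aa, count = Counter(column).most_common(1)[0]
    return aa, count / len(column)

def candidate_functional_positions(native_seqs, designed_seqs,
                                   min_conservation=0.8, max_recovery=0.2):
    """Positions conserved in native sequences but rarely recovered by design."""
    hits = []
    for pos in range(len(native_seqs[0])):
        aa, conservation = consensus([seq[pos] for seq in native_seqs])
        recovery = sum(seq[pos] == aa for seq in designed_seqs) / len(designed_seqs)
        if conservation >= min_conservation and recovery <= max_recovery:
            hits.append(pos)
    return hits

# Hypothetical aligned sequences (equal length, no gaps).
NATIVE_FAMILY = ["ARLH", "ARLH", "GRLH", "ARIH"]
DESIGNED_SET = ["ALLV", "AKLL", "ALLI", "GELV"]

print(candidate_functional_positions(NATIVE_FAMILY, DESIGNED_SET))  # [1, 3]
```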

Prediction of Functional Residues: Conserved & Energetically Unstable (Cheng, … & Baker, NAR 2005)
- Functional residues are more conserved than others
- Functional residues are less well recovered by design than others

Prediction of Functional Residues: Conserved & Energetically Unstable (Cheng, … & Baker, NAR 2005)
Identify positions in the alignment that are:
- conserved in sequence
- not recovered by design (high in energy)
Arginine kinase (functional residues in spacefill):
- colored according to sequence conservation (red … blue)
- colored according to predicted functionality (red … blue)

2. Design new protein from scratch
ATCSFFGRKL..

TOP7: Design of a new fold (Kuhlman, Dantas, … & Baker, Science)
1. Define a new scaffold not observed in nature
2. Find a sequence that will fold into the scaffold
Iterate between structure prediction (with fixed sequence) and sequence design (with fixed structure)

Creation scheme of TOP7
1. Derive constraints from sketch
2. Build backbones that fulfill constraints (150)
3. Design sequences for backbone templates: all amino acids at 71/93 positions, polar only for surface positions
4. Optimize backbone conformations: initial perturbation, followed by side chain optimization and backbone torsion angle minimization
5x15 cycles

Assessment of Design (blue: model; red: X-ray)
(1) Structure: 1.17 Å backbone RMSD, highly accurate!
(2) Stability: stable at 98 °C! Stable at ~5 M GuHCl!
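The backbone RMSD quoted above is the root-mean-square distance between equivalent atoms of model and experimental structure. A minimal sketch, assuming the two coordinate sets are already optimally superimposed and listed in the same atom order (superposition itself is not shown); the coordinates are invented.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in Å: the second set is shifted by 1 Å along z.
MODEL = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
XRAY = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(MODEL, XRAY))  # 1.0
```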

TOP7
- No sequence memory → more stringent test of force field and minimization procedure
- Optimized steric packing prevents molten globules
- No similarity to natural sequences (using basic protocols; PSI-BLAST finds somewhat similar sequences)
→ What can we learn from a protein that did not undergo natural selection?

Folding of TOP7 (Watters et al., Cell 2007)
- Measure the folding pathway using stopped-flow kinetics, circular dichroism, and NMR experiments
- No evolutionary pressure acting on the folding free energy landscape
- ±3 distinct folding phases
- Non-native conformation is stable at equilibrium
→ Folding of Top7 is less cooperative than that of native proteins
→ Cooperative folding and smooth free energy landscapes are not general properties of folding proteins, but a product of natural selection

Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Goal: define basic rules that govern simple tertiary motifs and, in turn, more complex structures (independent of the amino acids in the sequence, dependent only on segment lengths)

Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Rosetta simulations and statistical analysis (sequence-independent) → dependence of orientation on loop length
(1) ββ rule

Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
(2) βα rule

Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
(3) αβ rule

De novo design of 5 folds using these rules (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Folds: Ferredoxin-like, Rossmann 2x2, IF3-like, P-loop 2x2, Rossmann 3x1

De novo design of 5 folds using these rules (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Accuracy compared to NMR
Folds: Ferredoxin-like, Rossmann 2x2, IF3-like, P-loop 2x2, Rossmann 3x1

3. Design of a novel enzyme (Röthlisberger et al. 2008; Liang et al.)
Goal: design artificial enzymes
Enzymes lower the activation barrier by:
- stabilizing the transition state
- shielding reactants

Design of a novel enzyme (Röthlisberger et al. 2008; Liang et al.)
Approach:
1. Model the transition state of the reaction (QM)
2. Stabilize it with carefully placed chemical groups around it
3. Graft the resulting active site into an existing protein
4. Alter the sequence of the protein to accommodate the active site

Kemp elimination
[Figure: water-mediated reaction; model of the transition state]

Search for Template
Inside-out:
- Build an inverse rotamer tree starting from the catalytic site
- Search for fitting backbone templates (geometric hashing)
RosettaMatch (outside-in):
- Place side chains and the transition state model at each position
- Search for transition state model orientations that fit several positions

Find match and redesign surrounding region

Validation 1: enzyme is active

Validation 2: accurate structure prediction

4. Multiple state design: design switches
- Design of a protein that fits two different conformations: manipulate the equilibrium between conformations (e.g. metal binding, phosphorylation, etc.)
- Design of a protein that binds two (or more) different partners: improve sequence recovery by addressing several constraints

Design of protein switches (Ambroggio & Kuhlman, COSB)
(a) parallel-antiparallel helices (oxidation dependent)
(b) trimeric coiled-coil / Zn finger (metal dependent)
(c) homeodomain / Zn finger (metal dependent)

5. Negative design
Problem: optimizing a sequence for a given fold does not guarantee that alternative folds are not also favorable for that sequence
- Solubility: prevent aggregation
- Compactness: prevent molten globule states
- Specificity: negative design prevents alternative conformations

Design of homodimeric coiled-coils (Havranek & Harbury, NSB 2003)
Negative design against the heterodimer: Sequence 2 is better than Sequence 1 because it is specific, even though it is higher in energy
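The point that a higher-energy sequence can still be the better design is easiest to see with an explicit energy gap between the desired state and the best competing state. The energies below are invented for illustration and the function names are hypothetical.

```python
def specificity_gap(e_target, e_competitors):
    """Energy gap between the most stable competing state and the target state.

    Positive values mean the target state is favored over every competitor.
    """
    return min(e_competitors) - e_target

def most_specific(candidates):
    """Pick the candidate with the largest gap; candidates map name -> (e_target, e_competitors)."""
    return max(candidates, key=lambda name: specificity_gap(*candidates[name]))

# Hypothetical energies: homodimer (target) vs. heterodimer (competitor).
CANDIDATES = {
    "sequence 1": (-10.0, [-11.0]),  # more stable homodimer, but the heterodimer wins
    "sequence 2": (-8.0, [-5.0]),    # less stable homodimer, but specific
}

for name, (e_target, e_comp) in CANDIDATES.items():
    print(name, specificity_gap(e_target, e_comp))
print(most_specific(CANDIDATES))  # sequence 2
```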

Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
bZIP transcription factor family:
- Leucine zipper: coiled-coil
- Homodimerize, heterodimerize
- Human: ~53 bZIPs, 20 different classes
Challenge: design of inhibitor-specific leucine zippers (prevent side effects due to binding of the inhibitor to other bZIPs in the genome)

bZIP proteins: basic region and zipper region

The leucine zipper is responsible for dimerization specificity
[Figure: dimers GCN4-GCN4, Jun-Jun, Fos-Jun, Fos-Fos, Jun-Jun]
The bZIP region alone acts as an inhibitor

Hydrophobic packing at a-d positions, salt bridge at e-g positions

Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Challenge: design specific inhibitors for 46 human bZIPs
Scheme:
+ binding to target
- no binding to self
- no binding to the 19 other classes of human bZIP proteins
Tradeoff: maximize affinity and optimize specificity

Design of protein-interaction specificity gives selective bZIP-binding peptides
CLASSY (cluster expansion and linear programming-based analysis of specificity and stability):
- integer linear programming (ILP): find the optimal sequence
- cluster expansion: convert a structure-based interaction model into a sequence-based scoring function (very fast)
→ simultaneous consideration of many different competing sequences becomes possible (efficient negative design)
Here: include an additional constraint, compatibility with the bZIP PSSM
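The speed of a cluster expansion comes from scoring a sequence directly as a sum of precomputed terms, without building a structure. A minimal sketch of evaluating such a function with a constant, single-position, and pair terms follows; the coefficient values and the dictionary layout are hypothetical, and fitting the coefficients to structure-based energies is not shown.

```python
def cluster_expansion_energy(seq, constant, point_terms, pair_terms):
    """Sequence-based energy: constant + single-position terms + position-pair terms.

    point_terms maps (position, aa) -> energy.
    pair_terms maps (pos_i, aa_i, pos_j, aa_j) with pos_i < pos_j -> energy.
    Terms missing from the tables contribute zero.
    """
    energy = constant
    for i, aa in enumerate(seq):
        energy += point_terms.get((i, aa), 0.0)
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy += pair_terms.get((i, seq[i], j, seq[j]), 0.0)
    return energy

# Hypothetical coefficients for a 3-residue toy sequence.
CONSTANT = -1.0
POINT_TERMS = {(0, "L"): -0.5, (1, "E"): 0.2}
PAIR_TERMS = {(0, "L", 2, "L"): -1.0, (1, "E", 2, "K"): -0.8}

print(cluster_expansion_energy("LEL", CONSTANT, POINT_TERMS, PAIR_TERMS))
print(cluster_expansion_energy("LEK", CONSTANT, POINT_TERMS, PAIR_TERMS))
```

Because the score is linear in indicator variables for residues and residue pairs, it can be handed to an ILP solver, which is what makes optimizing against many competitors tractable.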

Design of protein-interaction specificity gives selective bZIP-binding peptides
Approach: "specificity sweep", which minimizes the sacrifice in stability when increasing the energy gaps from competing complexes
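The idea of a specificity sweep can be sketched as follows: for a series of increasing required energy gaps, pick the most stable sequence that still meets the gap. CLASSY solves each step with integer linear programming; this sketch swaps in a plain filter over a short, invented candidate list, so it only illustrates the tradeoff, not the actual optimization.

```python
def specificity_sweep(candidates, gap_targets):
    """For each required gap, return the most stable candidate that satisfies it.

    candidates: list of (name, target_energy, gap_to_best_competitor).
    Returns a list of (required_gap, name or None).
    """
    results = []
    for required_gap in gap_targets:
        allowed = [c for c in candidates if c[2] >= required_gap]
        winner = min(allowed, key=lambda c: c[1])[0] if allowed else None
        results.append((required_gap, winner))
    return results

# Hypothetical candidates: more specific sequences are less stable.
SWEEP_CANDIDATES = [
    ("s1", -10.0, -1.0),
    ("s2", -9.0, 1.0),
    ("s3", -8.0, 3.0),
]

print(specificity_sweep(SWEEP_CANDIDATES, [0.0, 2.0, 4.0]))
# [(0.0, 's2'), (2.0, 's3'), (4.0, None)]
```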

Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)

Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Results:
- Specific design: highest affinity to the target (or a target sibling)
- Good inhibitors: the target binds better to the design than to its original partner

Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Analysis of sequence diversity and specificity: designed sequences are less diverse, but contribute many more interactions
Conclusion: interaction space was not fully sampled by evolution: 1900 new possible interactions
Excellent for synthetic biology!
[Figure legend: natural; designs]