7. Protein Design. Structure prediction and design are inverse problems.


1 7. Protein Design

2 Protein design: structure prediction and design are inverse problems

3 Protein design – why?
- Industrial applications: design of thermostable/super-soluble proteins (e.g. Dantas, Kuhlman & Baker 2003; Malakauskas & Mayo 1998)
- Improve fold prediction: enrich the PSSM of a fold family using designed sequences (e.g. Koehl & Levitt 1999; Kuhlman & Baker 2000)
- Identify functional sites: positions conserved in native but not in designed sequences (e.g. Cheng, Samudrala & Baker 2004)
- Crucial step before functional design (e.g. Liang & Baker 2008; Ashworth & Baker 2007, etc.)

4 Protein Design - Overview
Protein design: design a sequence that fits a given structure
1. Recover sequence profiles (redesign)
2. Design new protein fold
3. Enzyme design
4. Multiple state design
5. Negative design
6. Interface design

5 Design a sequence that fits a given structure
Assumption: design can retrieve the sequences that fit a given structure, i.e. the sequences that occur in nature for this protein fold (ATCSFFGRKL..)

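6 Protein design – an extension of side chain modeling
Given:
- protein backbone
- for each residue: a set of possible conformations (rotamers from a library) for the different amino acids
Wanted: the combination of rotamers that results in the lowest total energy (the global minimum energy conformation, GMEC):
$E_{\mathrm{GMEC}} = \min \left( \sum_i E(i_r) + \sum_{i<j} E(i_r, j_s) \right)$
where $E(i_r)$ is the self energy of rotamer $r$ at position $i$ and $E(i_r, j_s)$ is the pair energy between rotamer $r$ at position $i$ and rotamer $s$ at position $j$. The GMEC defines the designed sequence.
[Figure: self energy and pair energy for residues i, i+1, i+2]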
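A minimal sketch of the GMEC definition above, assuming a toy problem small enough to enumerate. The function names, the table layout (`self_e[i][r]`, `pair_e[(i, r, j, s)]`) and all energy values are hypothetical and not taken from the slides; real design problems are far too large for enumeration, which is why DEE and Monte Carlo are used instead.

```python
from itertools import product


def total_energy(assignment, self_e, pair_e):
    """Self energies plus pair energies (i < j) for one rotamer assignment."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs missing from the table are treated as non-interacting.
            energy += pair_e.get((i, assignment[i], j, assignment[j]), 0.0)
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every rotamer combination and return (assignment, energy)."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    best = min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)


# Toy data (made up): 3 positions with 2 rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 0, 1, 1): 2.0, (1, 1, 2, 0): 1.5}

gmec, gmec_energy = brute_force_gmec(self_e, pair_e)
print(gmec, gmec_energy)
```

In a design setting each rotamer index would also carry an amino acid identity, so the winning assignment fixes the sequence as well as the side chain conformations.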

7 Protein design – an extension of side chain modeling
Sampling: the same techniques are used as in side chain modeling, e.g.
- DEE
- MC, etc.
Scoring: add a term, E_aa, that reflects amino acid preference; E_aa is optimized for recovery of amino acid frequencies in natural sequences
[Figure: residues i, i+1, i+2]
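The Monte Carlo sampling mentioned above can be sketched as a Metropolis search over rotamer assignments. This is an illustrative toy, not the RosettaDesign implementation: the function names, the temperature `kT`, the step count and the energy tables are all assumptions.

```python
import math
import random


def energy(assign, self_e, pair_e):
    """Self plus pair energy of a rotamer assignment (missing pairs count as 0)."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    e += sum(pair_e.get((i, assign[i], j, assign[j]), 0.0)
             for i in range(n) for j in range(i + 1, n))
    return e


def metropolis_design(self_e, pair_e, steps=2000, kT=0.5, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy assignment visited."""
    rng = random.Random(seed)
    current = [0] * len(self_e)
    e_cur = energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        # Propose a new rotamer at one randomly chosen position.
        pos = rng.randrange(len(self_e))
        trial = list(current)
        trial[pos] = rng.randrange(len(self_e[pos]))
        e_trial = energy(trial, self_e, pair_e)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Toy data (made up): 3 positions with 2 rotamers each.
self_e = [[1.0, 0.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {(0, 1, 1, 1): 2.0}
print(metropolis_design(self_e, pair_e))
```

Unlike DEE, a stochastic search of this kind gives no guarantee of reaching the GMEC; it returns a low-energy solution found within the sampling budget.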

8 Fold-tree formulation of design
- Filled colored circles: flexible side chain
- Empty colored circles: flexible amino acid (design)

9 1. Recover sequence profiles with protein design (RosettaDesign)
Native protein sequences are close to optimal for their structures (Kuhlman & Baker 2000)

10 RosettaDesign: Basic approach
Evaluate the best sequence for a given backbone using:
- amino acids: 19 aa (Cys excluded)
- side chains: rotamer library
- scaffold: constant backbone coordinates
- location of the (G)MEC: Monte Carlo procedure for amino acid and rotamer assignment
- energy function: additional term E_aa, a reference energy that reproduces the amino acid frequencies observed in native proteins; trained (optimized) on a set of 30 proteins

11 Computational Experiment
Dataset: 108 sequences with solved structure (resolution ≤ 3 Å), ≤ 30% sequence identity
Results:
- 51% of the core residues in the designed sequences were identical to the naturally occurring residues
- 27% of all residues in the designed sequences were identical to the naturally occurring residues
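The recovery percentages above are fractions of positions where the designed residue equals the native one. A minimal sketch of that measurement follows; the two sequences and the list of core positions are invented for illustration, and how "core" is defined (e.g. by burial) is left as an assumption.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of positions where the designed residue equals the native one.

    `positions` optionally restricts the comparison, e.g. to core residues.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        return 0.0
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


# Hypothetical example sequences and core positions.
native = "ATCSFFGRKL"
designed = "ATVSFYGEKL"
core = [2, 4, 5]

print(sequence_recovery(native, designed))        # all residues
print(sequence_recovery(native, designed, core))  # core residues only
```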

12 Sequence conservation in designed sequences correlates with conservation in protein families (core residues)

13 Analysis of SH3 domain designed sequences
Input:
- 233 known sequences of SH3 domains
- 11 crystal structures
Procedure:
- design 1000 sequences for each structure
- derive the amino acid profile of the 11000 designed sequences
- compare it to the profile of the native sequences
Results:
- good match between the profiles
- evolution has sampled most of the sequence space compatible with the SH3 domain
- equilibrium reached
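The profile comparison in this procedure can be sketched as follows. The slide does not say which similarity measure was used; the overlap score here (shared probability mass) is an assumption chosen for simplicity, and the tiny sequence sets are made up.

```python
from collections import Counter


def column_profile(sequences, position):
    """Amino acid frequencies at one alignment position."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def profile_overlap(p, q):
    """Shared probability mass of two profiles (1.0 means identical)."""
    return sum(min(p.get(aa, 0.0), q.get(aa, 0.0)) for aa in set(p) | set(q))


# Hypothetical aligned sequences, three positions each.
designed = ["ALV", "ALI", "AVV", "GLV"]
natural = ["ALV", "ALV", "SLI", "ALL"]

for pos in range(3):
    overlap = profile_overlap(column_profile(designed, pos),
                              column_profile(natural, pos))
    print(pos, round(overlap, 2))
```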

14 Amino acid profiles for six core residues in SH3 domains
[Figure legend: shaded = designed, empty = natural]

15 Redesign – conclusions
- The templates retain information about their specific sequence
  - use a set of different templates to improve sequence profile recovery
  - add backbone sampling to improve profile recovery
- The energy function is adequate to reproduce a large fraction of the naturally occurring amino acids of a given fold
- Low-energy sequences are close to native (not necessarily the GMEC)
- Stability is the major constraint in the evolution of core residues
- Native protein sequences are close to optimal for their structures

16 First complete redesign of a protein: Zn finger that folds without Zn (Dahiyat & Mayo, 1997)

17 ORBIT Protocol for Zn finger design (Dahiyat & Mayo, 1997)
Restrict allowed amino acids: 3 position-based sets
1) Exposed class: A, S, T, H, D, N, E, Q, K, R
2) Core class: A, V, L, I, F, Y, W
3) Boundary class (in between): combined (1) & (2)
→ 1.9 × 10^27 combinations
Scoring: VdW, solvation, h-bonds and ss propensity
Use a rotamer library and the DEE algorithm to locate the GMEC
Find additional low-energy solutions by local sampling starting from the GMEC
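The size of the sequence space follows directly from the three amino acid sets. In the sketch below, the number of positions per class (1 core, 7 boundary, 18 exposed, remaining positions fixed) is an assumption that does not appear on the slide; it is used here only because it reproduces the quoted 1.9 × 10^27.

```python
EXPOSED = set("ASTHDNEQKR")   # 10 amino acids
CORE = set("AVLIFYW")         # 7 amino acids
BOUNDARY = EXPOSED | CORE     # union of both sets; A is shared, so 16


def sequence_space_size(n_core, n_boundary, n_exposed):
    """Number of amino acid sequences allowed by the position classes."""
    return (len(CORE) ** n_core
            * len(BOUNDARY) ** n_boundary
            * len(EXPOSED) ** n_exposed)


# Assumed class counts (not stated on the slide).
size = sequence_space_size(n_core=1, n_boundary=7, n_exposed=18)
print(f"{size:.1e}")
```

This counts amino acid sequences only; the number of rotamer combinations that DEE has to search is much larger, because each amino acid contributes several rotamers.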

18 Design of a sequence that adopts a zinc finger fold without zinc (Dahiyat & Mayo, 1997)
Local sampling starting from the GMEC reveals the conservation pattern of the designs
[Figure: ranking of predicted sequences; alignment with zif268 second finger; conservation across 1000 simulations]

19 FSD-1 (Full Sequence Design) (Dahiyat & Mayo, 1997)
- Similarity to zif268: 6/28 identical, 11/28 similar
- Core: additional hydrophobic amino acids fill the space vacated by removal of the metal-binding site; C/H replaced with F/A/K
- Helix: stabilized by N-capping interactions
- NMR: RMSD within 2 Å
- No similar sequences found by BLAST

20 Prediction of Functional Residues
Functional residues are:
- conserved within a set of proteins of the same function
- energetically unfavorable
- in general accessible to substrate
Identify positions in the alignment that are:
- conserved in sequence
- not recovered by design (high in energy)
→ "Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design" (Cheng, … & Baker, NAR 2005)

21 Prediction of Functional Residues: Conserved & Energetically Unstable
- Functional residues are more conserved than others
- Functional residues are less well recovered by design than others
(Cheng, … & Baker, NAR 2005)

22 Prediction of Functional Residues: Conserved & Energetically Unstable
Identify positions in the alignment that are:
- conserved in sequence
- not recovered by design (high in energy)
[Figure: arginine kinase, colored according to sequence conservation (red…..blue) and according to predicted functionality (red…..blue); functional residues in spacefill]
(Cheng, … & Baker, NAR 2005)
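The two criteria above (conserved in the native alignment, not recovered by design) can be combined into a simple filter. This is a rough sketch under stated assumptions, not the scoring used by Cheng et al.: conservation is measured here as one minus normalized Shannon entropy, recovery as the frequency of the native consensus residue among designed sequences, and both thresholds are arbitrary.

```python
import math
from collections import Counter


def conservation(column):
    """1 - normalized Shannon entropy of an alignment column (1.0 = invariant)."""
    total = len(column)
    entropy = -sum((n / total) * math.log(n / total, 20)
                   for n in Counter(column).values())
    return 1.0 - entropy


def functional_candidates(native_cols, designed_cols,
                          min_cons=0.8, max_recovery=0.2):
    """Positions conserved in nature but rarely recovered by design."""
    hits = []
    for i, (nat, des) in enumerate(zip(native_cols, designed_cols)):
        consensus = Counter(nat).most_common(1)[0][0]
        recovery = des.count(consensus) / len(des)
        if conservation(nat) >= min_cons and recovery <= max_recovery:
            hits.append(i)
    return hits


# Hypothetical alignment columns (one string per position).
native_cols = ["RRRRR", "LLLLL", "AGSTV"]
designed_cols = ["EEKEE", "LLLIL", "AASAA"]
print(functional_candidates(native_cols, designed_cols))
```

In this toy input, position 0 is invariant in nature but never designed, position 1 is conserved and well recovered (a structural constraint), and position 2 is variable.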

23 2. Design new protein from scratch
(ATCSFFGRKL..)

24 TOP7 – Design of a new fold (Kuhlman, Dantas, … & Baker, Science 2003)
1. Define a new scaffold not observed in nature
2. Find a sequence that will fold into the scaffold
Iterate between structure prediction (with fixed sequence) and sequence design (with fixed structure)

25 Creation scheme of TOP7
1. Derive constraints from sketch
2. Build backbones that fulfill the constraints (150)
3. Design sequences for the backbone templates: all aa at 71/93 positions, polar only for surface positions
4. Optimize backbone conformations: initial perturbation, followed by side chain optimization and backbone torsion angle minimization
5x15 cycles

26 Assessment of Design
(1) Structure: 1.17 Å backbone RMSD between model (blue) and X-ray structure (red); highly accurate!
(2) Stability: stable at 98 °C! Stable at ~5 M Gu-HCl!
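For reference, the backbone RMSD quoted above is the root-mean-square distance between matched atoms. The sketch below assumes the two coordinate sets are already optimally superimposed and list the same atoms in the same order; the superposition step itself (e.g. the Kabsch algorithm) is omitted, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in Å: every atom is displaced by 1 Å along y.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
xray = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(model, xray))
```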

27 TOP7
- No sequence memory → more stringent test of force field and minimization procedure
- Optimized steric packing prevents molten globules
- No similarity to natural sequences (using basic protocols; PSI-BLAST finds somewhat similar sequences)
→ What can we learn from a protein that did not undergo natural selection?

28 Folding of TOP7 (Watters et al., Cell 2007)
- Measure folding pathway using stopped-flow kinetics, circular dichroism, and NMR experiments
- No evolutionary pressure acting on the folding free energy landscape
- ±3 distinct folding phases
- Non-native conformation is stable at equilibrium
→ Folding of Top7 is less cooperative than that of native proteins
→ Cooperative folding & smooth free energy landscapes are not general properties of folding proteins, but a product of natural selection

29 Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Goal: define basic rules that govern simple tertiary motifs and, in turn, more complex structures (independent of the amino acids in the sequence, dependent only on segment lengths)

30 Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Rosetta simulations and statistical analysis (sequence-independent) → dependence of orientation on loop length
(1) ββ rule

31 Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
(2) βα rule

32 Follow-up: Folding rules for the design of ideal structures (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
(3) αβ rule

33 De novo design of 5 folds using these rules (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Ferredoxin-like, Rossmann 2x2, IF3-like, P-loop 2x2, Rossmann 3x1

34 De novo design of 5 folds using these rules (Koga N, Koga R, … & Baker, Nature 2012; Hoecker B, Nature 2012)
Accuracy compared to NMR: Ferredoxin-like, Rossmann 2x2, IF3-like, P-loop 2x2, Rossmann 3x1

35 3. Design of a novel enzyme (Roethlisberger et al. 2008; Liang et al., 2008)
Goal: design artificial enzymes
Enzymes lower the activation barrier by:
- stabilizing the transition state
- shielding reactants

36 Design of a novel enzyme (Roethlisberger et al. 2008; Liang et al., 2008)
Approach:
1. Model the transition state of the reaction (QM)
2. Stabilize it with carefully placed chemical groups around it
3. Graft the resulting active site into an existing protein
4. Alter the sequence of the protein to accommodate the active site

37 Kemp Elimination
Water mediated; model transition state

38 Search for Template
Inside-out:
- Build an inverse rotamer tree starting from the catalytic site
- Search for fitting backbone templates (geometric hashing)
RosettaMatch, outside-in:
- Place side chains and transition state model at each position
- Search for transition state model orientations that fit several positions

39 Find match and redesign surrounding region

40 Validation 1: enzyme is active

41 Validation 2: accurate structure prediction

42 4. Multiple state design – design switches
- Design of a protein that fits two different conformations: manipulate the equilibrium between conformations (e.g. metal binding, phosphorylation, etc.)
- Design of a protein that binds two (or more) different partners: improve sequence recovery by addressing several constraints

43 Design of protein switches (Ambroggio & Kuhlman, COSB 2006)
(a) parallel-antiparallel helices (oxidation dependent)
(b) trimeric coiled-coil – Zn finger (metal dependent)
(c) homeo-domain – Zn finger (metal dependent)

44 5. Negative design
Problem: optimization for a given fold does not guarantee that alternative folds are not also favorable for the sequence
- Solubility: prevent aggregation
- Compactness: prevent molten globule states
- Specificity: negative design prevents alternative conformations

45 Negative design against hetero-dimer: design of homo-dimeric coiled-coils (Havranek & Harbury, NSB 2003)
Sequence 2 is better than Sequence 1: specific, even though higher in energy
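The point of this slide, that a sequence with a worse target energy can still be the better design, can be illustrated by ranking on the energy gap to the competing state instead of on the target energy alone. The energies below are invented, and the gap definition is a simplification rather than the scoring used by Havranek & Harbury.

```python
def specificity_gap(e_target, e_competitors):
    """Energy gap between the best competing state and the target state.

    Positive values mean the target state is preferred.
    """
    return min(e_competitors) - e_target


# Hypothetical energies (lower is more stable).
candidates = {
    "sequence_1": {"homodimer": -10.0, "heterodimer": -11.0},
    "sequence_2": {"homodimer": -8.0, "heterodimer": -3.0},
}


def most_stable(cands):
    """Positive design only: lowest target (homodimer) energy."""
    return min(cands, key=lambda name: cands[name]["homodimer"])


def most_specific(cands):
    """Negative design: largest gap between competitor and target."""
    return max(cands, key=lambda name: specificity_gap(
        cands[name]["homodimer"], [cands[name]["heterodimer"]]))


print(most_stable(candidates), most_specific(candidates))
```

Here `sequence_1` has the lower homodimer energy but prefers the heterodimer, so only the gap-based ranking selects the specific `sequence_2`.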

46 Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
bZIP transcription factor family:
- Leucine zipper: coiled-coil
- Homodimerize, heterodimerize
- Human: ~53 bZIPs, 20 different classes
Challenge: design of specific leucine zipper inhibitors (prevent side effects due to binding of the inhibitor to other bZIPs in the genome)

47 bZIP proteins
[Figure: basic region; zipper region]

48 Leucine zipper is responsible for dimerization specificity
[Figure: GCN4-GCN4, Jun-Jun, Fos-Jun and Fos-Fos dimers]
bZIP region alone acts as inhibitor

49 Hydrophobic packing at a-d positions, salt bridges at e-g positions

50 Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Challenge: design specific inhibitors to 46 human bZIPs
Scheme:
+ binding to target
- no binding to self
- no binding to 19 other classes of human bZIP proteins
Tradeoff: maximize affinity & optimize specificity

51 Design of protein-interaction specificity gives selective bZIP-binding peptides
CLASSY (cluster expansion and linear programming-based analysis of specificity and stability):
- integer linear programming (ILP): find the optimal sequence
- cluster expansion: convert a structure-based interaction model into a sequence-based scoring function (very fast)
→ simultaneous consideration of many different competing sequences becomes possible (efficient negative design)
Here: include an additional constraint: compatibility with the bZIP PSSM

52 Design of protein-interaction specificity gives selective bZIP-binding peptides
Approach: “Specificity Sweep”: minimize the sacrifice in stability when increasing energy gaps from competing complexes
[Figure: sweep steps 1 to 4]
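The idea of the sweep can be sketched as repeatedly asking for the most stable sequence that still meets a progressively larger required gap. CLASSY solves each step with integer linear programming over a cluster-expanded energy function; the sketch below swaps that for a brute-force filter over a short, invented candidate list, so it only illustrates the stability/specificity trade-off.

```python
def specificity_sweep(candidates, gap_steps):
    """For each required gap, pick the most stable candidate that achieves it.

    Returns a list of (required_gap, candidate_name or None).
    """
    results = []
    for required_gap in gap_steps:
        feasible = [c for c in candidates
                    if min(c["competitors"]) - c["target"] >= required_gap]
        best = min(feasible, key=lambda c: c["target"]) if feasible else None
        results.append((required_gap, best["name"] if best else None))
    return results


# Hypothetical energies for the target complex and two competing complexes.
candidates = [
    {"name": "A", "target": -12.0, "competitors": [-11.0, -9.0]},
    {"name": "B", "target": -10.0, "competitors": [-6.0, -7.0]},
    {"name": "C", "target": -8.0, "competitors": [-2.0, -1.0]},
]

for gap, name in specificity_sweep(candidates, [0, 2, 5, 8]):
    print(gap, name)
```

As the required gap grows, the chosen design moves from the most stable candidate to less stable but more specific ones, until no candidate satisfies the constraint.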

53 Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)

54 Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Results:
- Specific design: highest affinity to target (or target sibling)
- Good inhibitors: target binds better to the design than to its original partner

55 Design of protein-interaction specificity gives selective bZIP-binding peptides (Grigoryan et al., Nature 2009)
Analysis of sequence diversity and specificity:
- designed sequences are less diverse, but contribute many more interactions
Conclusion: interaction space was not fully sampled by evolution: 1900 new possible interactions
Excellent for synthetic biology!
[Figure legend: natural, designs]

