Computational Protein Redesign and the Non-Ribosomal Code

Ivelin Georgiev, Cheng-Yu Chen, Bruce R. Donald (Donald Lab)
Thank you! The topic of my presentation today is Computational Protein Redesign and the Non-Ribosomal Code. Specifically, I will tell you about a set of computational protein design techniques developed in our lab and their successful application to the redesign of a nonribosomal peptide synthetase enzyme. Redesigning NRPS domains can provide insight into the structural basis of specificity for these enzymes and help decipher the non-ribosomal code. This is joint work with Cheng-Yu Chen and our adviser, Prof. Bruce Donald.

NonRibosomal Peptide Synthetases (NRPS)
NRPS enzymes are found in some fungi and bacteria. NRPS enzymes make peptide-like products with pharmaceutical properties (antifungal, antineoplastic, antibacterial), e.g. vancomycin, penicillin, gramicidin, bacitracin, cyclosporin, bleomycin, … NRPS are similar to PKS.
Nonribosomal peptide synthetases (NRPS, for short) complement traditional ribosomal peptide synthesis in a variety of bacteria and fungi. NRPS enzymes are large multidomain proteins that produce peptide-like natural products. Among the NRPS products are natural antibiotics, immunosuppressants and antineoplastics. Penicillin is probably the best-known NRPS product (by the way, here is the original paper), but other NRPS products include gramicidin, bacitracin and vancomycin. NRPS are similar to the polyketide synthases (PKS).
As the famous story goes, Fleming went away for a weekend, and when he returned he noticed that bacterial (Staphylococcus aureus) growth was inhibited near a fungal colony. "That's funny..." and the rest is history. Fleming and his colleagues did a series of experiments demonstrating that extracts of the fungus could kill various gram-positive bacteria, including Streptococcus pyogenes, S. viridans, Micrococcus spp., and a few others. Effectiveness against gram-negative bacteria (such as Escherichia coli and Klebsiella pneumoniae) was limited: very high concentrations of penicillin were needed to kill those organisms. We now know that an outer membrane protects gram-negative bacteria against penicillin. Penicillin prevents the cross-linking of elements of the bacterial cell wall, causing bacterial cell lysis: it stops the last step in cell-wall synthesis, the cross-linking of two peptidoglycan strands.

Walsh Group
Here are some examples of PKS and NRPS products.
[Figure (Walsh Group): first row, antibiotics. PKS: macrolides (erythromycin), tetracyclines (doxycycline); immunosuppressants: FK506, rapamycin; epothilone [e'pothuloun]]

Gramicidin S (Walsh Group)
The NRPS product of main interest for our lab is gramicidin S. At the top, you can see some of the steps in the biosynthesis of bacitracin, which are very similar to the steps in the biosynthesis of gramicidin S. Gramicidin S is produced by two enzymes: gramicidin synthetase A (GrsA, three domains) and gramicidin synthetase B (GrsB, thirteen domains). These two enzymes contain a total of five adenylation domains, each of which specifically adenylates a particular amino acid that is then incorporated into the peptide chain. Two of the resulting pentapeptides are then cyclized by the thioesterase (TE) termination domain to make the cyclic decapeptide gramicidin S.

Protein Redesign (NRPS)
GrsA: A(Phe)-T-E; GrsB: C-A(Pro)-T-C-A(Val)-T-C-A(Orn)-T-C-A(Leu)-T-TE
The phenylalanine adenylation domain in GrsA (PheA) is responsible for incorporating this phenylalanine into the peptide. So one can imagine redesigning PheA to switch its specificity toward a non-cognate substrate, so that the new substrate replaces Phe in the peptide.

GrsA: A(Leu)-T-E; GrsB: C-A(Pro)-T-C-A(Val)-T-C-A(Orn)-T-C-A(Leu)-T-TE
For example, if we redesign that domain to specifically adenylate Leu, we may obtain a modified version of gramicidin S in which Phe is replaced by Leu. (Switch back and forth with the previous slide.)

GrsA: A(Leu)-T-E; GrsB: C-A(Pro)-T-C-A(Val)-T-C-A(Glu)-T-C-A(Leu)-T-TE
Further, if we also redesign an adenylation domain in GrsB (here, Orn to Glu), we can get a gramicidin S derivative that differs from gramicidin S in four of the ten positions, since each pentapeptide occurs twice in the cyclic decapeptide. But how can such a redesign be performed?

Sequence Homology (Signature Sequences)
[Figure: the WT module A(Phe)-T-E redesigned by domain swapping (a chimera carrying an A(Leu) domain), by sequence homology (signature sequences), and by structure-based protein design (AA sequence → structure → function)]
Many techniques for protein redesign have been proposed and applied with varying success. Among the most popular are sequence homology, domain swapping and, the approach we use in our lab, structure-based protein design. As we all know, the amino acid sequence of a protein determines its structure, which in turn determines its function, and structure-based protein design tries to exploit exactly this relationship. In contrast to domain swapping and sequence homology, structure-based protein design preserves both domain-domain interface information and structure-specific information.

Structure-based Protein Design
[Figure: the wildtype input structure, a rotamer library and a pairwise energy function feed the protein design algorithm, which outputs mutants with improved stability, altered specificity or a novel function]
The input to a protein design algorithm typically includes the following: a wildtype fixed-backbone structure to be redesigned; a rotamer library of low-energy side-chain conformations, which discretizes the continuous side-chain conformation space and makes the computational search feasible; and an energy function for scoring and ranking the candidate structures. The energy function typically consists of standard molecular mechanics terms (such as vdW, electrostatic and solvation energies), but may also include statistical and other terms. Based on the input model, the protein design algorithm predicts mutations to the wildtype sequence that achieve a desired novel functionality, such as improving the thermostability of the protein, switching the enzyme's specificity toward a novel substrate, or redesigning the protein to perform a completely novel function.

Redesign of GrsA-PheA
Input structure: PDB 1amu (1.9 Å)

Rotamer library: Richardsons' Penultimate Rotamer Library, 150 rotamers [Figure: Phe rotamers]

AA    Rotamers   Dihedrals
Ala   -          -
Val   3          1
Leu   5          2
Ile   7
Phe   4
Tyr
Trp
Cys
Met   13
Ser
Thr
Lys   27
Arg   34
His   8
Asp
Glu
Asn
Gln   9
Gly   -          -

AMBER van der Waals + electrostatic pairwise energy function
$$E = \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon r_{ij}} \right] + E_{\text{dihedral}} + E_{\text{solv}} + E_{\text{ref}}$$
The sum runs over atom pairs i, j at distance r_ij: the first two terms are the Lennard-Jones 12-6 van der Waals energy, and the third is the Coulomb electrostatic energy with partial charges q_i, q_j and dielectric ε. E_dihedral is the dihedral (rotamer deviation) penalty, E_solv is the EEF1 implicit solvation energy, and E_ref collects the amino acid reference energies that model the folded vs. unfolded states.
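As a rough illustration of the van der Waals and electrostatic terms, the sketch below sums Lennard-Jones 12-6 and Coulomb energies over atom pairs. It assumes precomputed pair coefficients A_ij and B_ij (hypothetical inputs, not AMBER's actual parameter tables), charges in units of e, distances in Å, and a Coulomb constant of about 332 kcal·Å/(mol·e²). The dihedral, EEF1 solvation and reference-energy terms are left out.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), the unit convention of AMBER-style force fields.
COULOMB_K = 332.0636


def pair_energy(r, a_ij, b_ij, q_i, q_j, epsilon=1.0):
    """vdW (LJ 12-6) + Coulomb energy, in kcal/mol, for one atom pair at distance r (Angstrom).

    a_ij, b_ij: hypothetical precomputed LJ coefficients (A/r^12 - B/r^6).
    q_i, q_j: partial charges in units of e; epsilon: dielectric constant.
    """
    vdw = a_ij / r**12 - b_ij / r**6
    elec = COULOMB_K * q_i * q_j / (epsilon * r)
    return vdw + elec


def total_pairwise_energy(coords, charges, a, b, epsilon=1.0):
    """Sum pair_energy over all atom pairs i < j; a[i][j], b[i][j] hold pair coefficients."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += pair_energy(r, a[i][j], b[i][j], charges[i], charges[j], epsilon)
    return total
```

With A = 1 and B = 2, the LJ minimum lies at r = (2A/B)^(1/6) = 1 with depth -B²/(4A) = -1, which is a quick sanity check for the functional form.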

Three-Step Enzyme Redesign
1. Hybrid-K* (provable): active-site mutations
2. Entropy step (heuristic): mutable positions
3. GMEC step (provable): bolstering mutations

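Hybrid-K*
K* is a provably accurate approximation to the binding constant computed over conformational ensembles (a Boltzmann-weighted average) rather than a single conformation:
$$K^* = \frac{q_{PL}}{q_P \, q_L}, \qquad q = \sum_{c} \exp\left(-\frac{E_c}{RT}\right)$$
where q_PL, q_P and q_L are the partition functions of the protein-ligand complex, the unbound protein and the unbound ligand.
Example scores: TIAAIC 7.3; GIRMQM 3.1; TGIAIV 2.9; LMLAIS 1.7; TWAIGY 0.3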
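The ensemble-based score can be sketched as K* = q_PL / (q_P q_L), where each partition function q sums exp(-E/RT) over conformations of the complex (PL), the unbound protein (P) and the unbound ligand (L). This is a minimal sketch under stated assumptions: conformation energies are given in kcal/mol, and the computation is done in log space (log-sum-exp) to avoid overflow. Whether the scores listed above are on a log scale is not stated, so the hypothetical function log_k_star returns ln K*.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def log_partition_function(energies, temperature=298.15):
    """ln q, where q = sum_c exp(-E_c / RT), computed with the log-sum-exp trick."""
    rt = R_KCAL * temperature
    exponents = [-e / rt for e in energies]
    m = max(exponents)
    return m + math.log(sum(math.exp(x - m) for x in exponents))


def log_k_star(complex_energies, protein_energies, ligand_energies, temperature=298.15):
    """ln K* = ln q_PL - ln q_P - ln q_L over the supplied conformation energies."""
    return (log_partition_function(complex_energies, temperature)
            - log_partition_function(protein_energies, temperature)
            - log_partition_function(ligand_energies, temperature))
```

In Hybrid-K* the sums run only over the conformations that survive pruning and are enumerated, so the q values here stand for those approximations rather than exact partition functions.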

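Hybrid-K*: Ensembles Method
For each candidate sequence (seq1 … seqn), the conformation space C is processed in stages:
1. Volume filter;
2. DEE pruning, which reduces C to C';
3. A* search, which enumerates the conformations of C' in order of increasing energy lower bounds;
4. Full energy minimization of each enumerated conformation, accumulating the approximate partition function q*.
The result is guaranteed to satisfy q* ≥ (1-ε)q.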
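The guarantee q* ≥ (1-ε)q comes from a stopping rule. Conformations are enumerated in order of increasing energy lower bound B; if k conformations remain and the next bound is B_next, their total Boltzmann weight is at most p* = k·exp(-B_next/RT). Stopping once p* ≤ ε/(1-ε)·q* implies q* ≥ (1-ε)(q* + p*) ≥ (1-ε)q. The sketch below is a minimal version assuming all lower bounds and minimized energies are already available in lists; a real A* search generates conformations lazily, and Hybrid-K* also bounds the contribution of pruned conformations, which this sketch ignores.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def approximate_partition_function(minimized_energies, lower_bounds, epsilon=0.03,
                                   temperature=298.15):
    """Return (q_star, number_enumerated) with q_star >= (1 - epsilon) * q.

    lower_bounds[k] must not exceed minimized_energies[k] (both in kcal/mol).
    """
    rt = R_KCAL * temperature
    # Enumerate in order of increasing lower bound, as the A* search would.
    order = sorted(range(len(lower_bounds)), key=lambda k: lower_bounds[k])
    q_star = 0.0
    for count, k in enumerate(order):
        remaining = len(order) - count
        # All remaining conformations have energy >= lower_bounds[k].
        p_star = remaining * math.exp(-lower_bounds[k] / rt)
        if q_star > 0.0 and p_star <= epsilon / (1.0 - epsilon) * q_star:
            return q_star, count
        q_star += math.exp(-minimized_energies[k] / rt)
    return q_star, len(order)
```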

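2. Entropy step (SCMF)
Self-consistent mean-field (SCMF) theory (Mayo 2001) computes Boltzmann rotamer probabilities at each position from the pairwise energies E(i_r, j_s) and sums them into amino acid probabilities p_i(a). The residue entropy is then
$$S_i = -\sum_a p_i(a) \ln p_i(a)$$
High entropy indicates mutation tolerance, so the high-entropy positions are chosen as mutable positions.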
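A minimal SCMF sketch, assuming self energies self_e[i][r] and pair energies stored once per position pair i < j in a dictionary (a hypothetical layout). Each iteration recomputes every rotamer's mean-field energy from the current rotamer probabilities at the other positions, converts those energies into Boltzmann probabilities, and damps the update. Rotamer probabilities are then summed per amino acid to give the residue entropy S_i. A fixed iteration count stands in for a proper convergence test.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def scmf_entropy(self_e, pair_e, rot_aa, temperature=298.15, iters=200, damping=0.5):
    """Return (rotamer probabilities, per-position amino-acid entropies).

    self_e[i][r]: rotamer-template energy; pair_e[(i, j)][r][s] with i < j: pair energy;
    rot_aa[i][r]: amino-acid type of rotamer r at position i.
    """
    rt = R_KCAL * temperature
    n = len(self_e)
    probs = [[1.0 / len(self_e[i])] * len(self_e[i]) for i in range(n)]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    for _ in range(iters):
        new_probs = []
        for i in range(n):
            # Mean-field energy: self energy plus probability-weighted pair energies.
            mf = [self_e[i][r] + sum(probs[j][s] * pair(i, r, j, s)
                                     for j in range(n) if j != i
                                     for s in range(len(self_e[j])))
                  for r in range(len(self_e[i]))]
            e_min = min(mf)
            weights = [math.exp(-(e - e_min) / rt) for e in mf]
            z = sum(weights)
            # Boltzmann probabilities, mixed with the previous iterate for stability.
            new_probs.append([damping * old + (1.0 - damping) * w / z
                              for old, w in zip(probs[i], weights)])
        probs = new_probs

    entropies = []
    for i in range(n):
        aa_prob = {}
        for r, p in enumerate(probs[i]):
            aa_prob[rot_aa[i][r]] = aa_prob.get(rot_aa[i][r], 0.0) + p
        entropies.append(-sum(p * math.log(p) for p in aa_prob.values() if p > 0.0))
    return probs, entropies
```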

3. GMEC step: find the single lowest-energy conformation, $\min_c E(c)$

Dead-End Elimination (DEE)
Desmet et al., 1992
[Figure: energy lower bound for conformations containing i_r vs. energy upper bound for conformations containing i_t]
Dead-end elimination (DEE) is a provably accurate deterministic algorithm that reduces the search space for protein design problems by pruning rotamers that are provably not part of the GMEC. Specifically, for a given residue position i, DEE compares the pruning candidate rotamer i_r (the notation i_r means rotamer identity r at residue position i) against a competitor rotamer i_t. If a lower bound on the energy of any conformation containing i_r is still greater than an upper bound on the energy of any conformation containing i_t, then we can always obtain a lower-energy conformation by switching from rotamer i_r to rotamer i_t. Rotamer i_r can therefore be provably pruned from further consideration, since it cannot belong to the GMEC:
$$E(i_r) + \sum_{j \neq i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \neq i} \max_s E(i_t, j_s)$$
This pruning step (rotamer pruning: O(q^2 n^2)) is repeated for all rotamers as pruning candidates, for all competitors, and for all residue positions, until no more rotamers can be pruned. This typically leaves a much smaller set of conformations to enumerate afterwards, making the mutation search computationally feasible. Unfortunately, DEE is provably accurate only for a fixed backbone. The question then arises: can there be an algorithm for backbone flexibility with the same provable guarantees that DEE gives for a fixed backbone?
[Figure: DEE pruning, then enumeration of the remaining conformations to find the GMEC]
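The pruning loop described above can be sketched as follows. This is a minimal illustration of the original Desmet criterion with rigid rotamers on a fixed backbone, assuming a hypothetical energy layout: self_e[i][r] for rotamer-template energies and a full, symmetric pair_e[i][j][r][s] table. Bounds at other positions are taken over surviving rotamers only, which is valid because pruned rotamers cannot be in the GMEC.

```python
def dee_prune(self_e, pair_e):
    """Iterate the Desmet DEE criterion until no rotamer can be pruned.

    self_e[i][r]: energy of rotamer r at position i with the template.
    pair_e[i][j][r][s]: pair energy; pair_e[j][i][s][r] must hold the same value.
    Returns a list of surviving rotamer indices (sets) per position.
    """
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Lower bound on conformations containing the candidate i_r.
                    lower_r = self_e[i][r] + sum(
                        min(pair_e[i][j][r][s] for s in alive[j]) for j in range(n) if j != i)
                    # Upper bound on conformations containing the competitor i_t.
                    upper_t = self_e[i][t] + sum(
                        max(pair_e[i][j][t][s] for s in alive[j]) for j in range(n) if j != i)
                    if lower_r > upper_t:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive
```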

Traditional-DEE with Rigid Rotamers/Backbone
[Figure: energies of conformations containing i_r vs. i_t]

MinDEE
[Figure: minimum and maximum energies of conformations containing i_r vs. i_t]

DEE variants and their GMEC guarantees (fixed vs. flexible backbone):
traditional-DEE, no minimization (Desmet 1992): provably correct (√)
traditional-DEE with side-chain dihedral (χ) minimization: not provably correct (X)
MinDEE with χ minimization (JCC'08): provably correct (√)
BD with backbone and χ minimization (Bioinformatics'07): provably correct (√)

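3. GMEC step
Conformation space C → MinDEE pruning, O(n^2 r^2) → reduced space C' → A* search over energy lower bounds → full energy minimization of each enumerated conformation c. The search stops when the lower bound B(c) of the next conformation exceeds the lowest minimized energy found so far, B(c) > E(best); the best conformation found is the minGMEC.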
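The stopping rule B(c) > E(best) can be sketched as below. This minimal version assumes caller-supplied functions lower_bound(c) and minimize(c) (hypothetical names) with lower_bound(c) ≤ minimize(c) for every conformation, and it orders a precomputed list with a heap instead of running a true A* search that expands partial conformations.

```python
import heapq


def min_gmec(conformations, lower_bound, minimize):
    """Return (best_conformation, best_energy).

    Conformations are popped in order of increasing lower bound B(c); the loop stops as
    soon as B(c) > E(best), because no remaining conformation can beat the current best.
    """
    heap = [(lower_bound(c), k, c) for k, c in enumerate(conformations)]
    heapq.heapify(heap)
    best, best_energy = None, float("inf")
    while heap:
        bound, _, conf = heapq.heappop(heap)
        if bound > best_energy:
            break
        energy = minimize(conf)  # full energy minimization of this conformation
        if energy < best_energy:
            best, best_energy = conf, energy
    return best, best_energy
```

The index k in each heap entry breaks ties between equal bounds so that conformations themselves never need to be compared.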

Redesign of GrsA-PheA
[Figure: input structure, rotamer library and pairwise energy function feed the three-step algorithm, which outputs mutation predictions for stability and specificity]

Hybrid-K*: filter performance (conformations remaining; pruning factor, with % pruned)
Initial: 6.8 × 10^8
Volume filter: 2.04 × 10^8; 3.33 (70.0%)
MinDEE filter: 4.13 × 10^6; 49.43 (98.0%)
Steric filter: 3.86 × 10^6; 1.07 (6.5%)
A* filter: 7.82 × 10^4; 49.41 (98.0%)
All filters: 99.99% pruned
Computation: 9 hrs on 24 processors (K* without filters: ≈ 3,263 days)
Top 40 mutations: Hybrid-K* predictions. Experimental validation: the T278M/A301G mutant (Stachelhaus et al., 1999) ranked 3rd; G301 occurs in all known natural Leu adenylation domains.
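The pruning factors are ratios of successive conformation counts, and each percentage is 1 minus the inverse ratio. The short check below recomputes them from the counts listed for the filters; because those counts are rounded, the recomputed MinDEE and A* factors come out near 49.4 rather than exactly 49.43 and 49.41.

```python
def pruning_table(counts):
    """For successive conformation counts, return (pruning factor, percent pruned) per filter."""
    return [(before / after, 100.0 * (1.0 - after / before))
            for before, after in zip(counts, counts[1:])]


def overall_percent_pruned(counts):
    """Percent of the initial conformations removed by all filters together."""
    return 100.0 * (1.0 - counts[-1] / counts[0])


# Counts after each stage: initial, volume, MinDEE, steric, A* filter.
COUNTS = [6.8e8, 2.04e8, 4.13e6, 3.86e6, 7.82e4]

for factor, percent in pruning_table(COUNTS):
    print(f"factor {factor:.2f}, pruned {percent:.1f}%")
print(f"overall {overall_percent_pruned(COUNTS):.2f}%")
```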

Entropy step: mutable residues 45, 187, 207, 210, 238, 277, 447

Entropy (SCMF) and sequence alignment of 402 sequences from AMP-binding domains (Pfam)

Entropy (SCMF) and MinDEE
[Figure: per-position amino acid predictions from SCMF and MinDEE]
Allowed AA types (DEE input):
45: Ala, Leu, Ile, Phe, Asn, Gly
187: Ala, Val, Leu, Ile, Ser, Thr, Asp, Glu, Asn, Gly
207: Ala, Leu, Ile, Ser, Thr, Asn, Gly
210: Ala, Leu, Ile, Phe, Tyr, Asn, Gly
238: Ala, Leu, Ile, Ser, Thr, Asn, Gly
277: Ala, Val, Leu, Ile, Ser, Thr, Asn, Gly
447: Ala, Val, Leu, Ile, Ser, Thr, Asn, Gly

MinDEE and sequence alignment of 402 sequences from AMP-binding domains (Pfam)
[Figure: per-position amino acid comparison]

Three-Step Enzyme Redesign
A top-scoring prediction: T278L/A301G (step 1, active-site mutations) + {V187L, I277L, S447N} (step 3, bolstering mutations)

Binding Pocket of PheA: D235, A236, W239, I299, A301, A322, I330, S331, K517

K* Prediction: Mutants to bind L-Leu
[Figure: predicted mutants; wildtype residues W T I A A I S]

Enzyme-coupled continuous assay
E + ATP + a.a. ⇌ E·a.a.·ATP (K_M) → E·a.a.-AMP + PPi (k_cat)
UDPG + PPi → UTP + glucose 1-phosphate (UDPG pyrophosphorylase)
Glucose 1-phosphate → glucose 6-phosphate (phosphoglucomutase)
Glucose 6-phosphate + NAD+ → 6-phosphogluconate + NADH (G-6-PDH); NADH is detected at 340 nm
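In this coupled assay each PPi released by the adenylation reaction ultimately yields one NADH, so the slope of A340 over time can be converted into a rate with the Beer-Lambert law. The sketch assumes the NADH molar absorptivity at 340 nm (6220 M⁻¹ cm⁻¹), a 1 cm path length by default, and coupling enzymes in excess so that the adenylation step is rate-limiting. The function name is hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def ppi_release_rate(delta_a340_per_min, path_length_cm=1.0):
    """Convert an A340 slope (absorbance units per minute) into a PPi release rate in uM/min.

    Assumes 1:1 stoichiometry: PPi -> glucose 1-phosphate -> glucose 6-phosphate -> NADH.
    """
    molar_per_min = delta_a340_per_min / (NADH_EPSILON_340 * path_length_cm)
    return molar_per_min * 1e6  # M/min to uM/min
```

For example, an A340 slope of 0.0622 per minute in a 1 cm cuvette corresponds to 10 µM PPi released per minute.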

Steady-state Kinetics
E + ATP + a.a. ⇌ E·a.a.·ATP (K_M) → E·a.a.-AMP + PPi (k_cat)
$$v = \frac{V_{max}[S]}{K_M + [S]}, \qquad V_{max} = k_{cat}[E]_0$$
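A minimal way to estimate V_max and K_M from initial rates is the Hanes-Woolf linearization, [S]/v = [S]/V_max + K_M/V_max, fitted by ordinary least squares. This is a swapped-in illustrative technique, not necessarily how the kinetic parameters on these slides were obtained; for noisy data a direct nonlinear fit of the Michaelis-Menten equation is usually preferable. The specificity constant k_cat/K_M is the usual way to compare substrates such as L-Phe and L-Leu.

```python
def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, KM) from a least-squares line through [S]/v versus [S]."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope = 1 / Vmax
    intercept = mean_y - slope * mean_x  # intercept = KM / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def kcat_and_specificity(vmax, km, enzyme_conc):
    """kcat = Vmax / [E]0 and the specificity constant kcat / KM."""
    kcat = vmax / enzyme_conc
    return kcat, kcat / km
```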

Specificity switched from L-Phe to L-Leu

Bolstering mutants
[Figure: activity of the bolstering mutants with L-Phe and L-Leu]

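The Reaction Pathway: random sequential order
Overall: E + ATP + a.a. ⇌ E·a.a.·ATP (K_M) → E·a.a.-AMP + PPi (k_cat)
Substrates bind in random order:
E ⇌ E·ATP (K_ATP), then E·ATP + a.a. ⇌ E·ATP·a.a. (k1[a.a.], k-1)
E ⇌ E·a.a. (K_aa), then E·a.a. + ATP ⇌ E·ATP·a.a. (k3[ATP], k-3)
Adenylation: E·ATP·a.a. ⇌ E·a.a.-AMP·PPi (k2, k-2)
PPi release: E·a.a.-AMP·PPi ⇌ E·AMP-a.a. + PPi (k4, k-4[PPi])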

Stopped-Flow Kinetics
Dead time (before measurement): 1.2 ms; total reaction time: 0.5 to 5 s; sampling points: 1000; number of exponential terms = number of steps

E·ATP + L-Phe ⇌ E·a.a.·ATP (k1, k-1) → E*·a.a.·ATP (k2)
$$k_{obs1} = k_1[\text{L-Phe}] + k_{-1}$$
Fitted signal, two steps: $A e^{-k_{obs1} t} + B e^{-k_{obs2} t} + C$; one step: $A e^{-k_{obs1} t} + C$
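Once k_obs1 has been extracted from each stopped-flow trace, the relation k_obs1 = k1[S] + k-1 is a straight line in substrate concentration: the slope gives the association rate constant k1 and the intercept gives k-1. The sketch below assumes k_obs values are already available (the exponential fitting itself is not shown) and uses ordinary least squares; the function names are hypothetical.

```python
import math


def exp_model(t, a, k_obs1, c, b=0.0, k_obs2=0.0):
    """A*exp(-k_obs1*t) + B*exp(-k_obs2*t) + C; with b = 0 this is the one-step model."""
    return a * math.exp(-k_obs1 * t) + b * math.exp(-k_obs2 * t) + c


def binding_rate_constants(concentrations, k_obs):
    """Least-squares fit of k_obs = k1*[S] + k_minus1; returns (k1, k_minus1)."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(k_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, k_obs))
    k1 = sxy / sxx                    # slope: association rate constant
    k_minus1 = mean_y - k1 * mean_x   # intercept: dissociation rate constant
    return k1, k_minus1
```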

E·ATP + L-Leu ⇌ E·Leu·ATP (k1, k-1)
$$k_{obs1} = k_1[\text{L-Leu}] + k_{-1}$$
[Figure: k_obs1 vs. [L-Leu] for wild-type, A301G, A301G/T278M and A301G/T278L]

[Figure panels: no ATP; saturating ATP]
Binding of ATP causes a conformational change that lets the enzyme screen for the right substrate.

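Free Energy Profile (kcal/mol)
[Figure: free energy profile with E + S (+ATP), transition states E‡S and ES‡, and complex ES; activation barrier ΔG‡ and free energy change ΔG]
$$\Delta G^{\ddagger} = -RT\left[\ln k_1 - \ln\left(\frac{k_B T}{h}\right)\right]$$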
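The Eyring relation above turns a measured rate constant into an activation free energy. A minimal sketch, assuming T = 298.15 K by default and, for a second-order constant such as k1 in M⁻¹ s⁻¹, an implicit 1 M standard state; comparing mutant and wild-type barriers then reduces to ΔΔG‡ = -RT ln(k_mutant/k_wild-type).

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def activation_free_energy(k, temperature=298.15):
    """Eyring: dG_act = -RT [ln k - ln(kB T / h)], returned in kcal/mol."""
    return -R_KCAL * temperature * (math.log(k) - math.log(KB * temperature / H))


def delta_delta_g(k_mutant, k_wild_type, temperature=298.15):
    """Barrier difference (kcal/mol); negative means the mutant's barrier is lower."""
    return -R_KCAL * temperature * math.log(k_mutant / k_wild_type)
```

At 298.15 K, k_B T/h is about 6.2 × 10^12 s⁻¹, so a rate constant of 1 corresponds to a barrier of roughly 17.45 kcal/mol.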

Stability of Wild-type and Mutant PheA

Conclusion
Switching specificity not only improves activity but also affects binding.
Binding of ATP causes a conformational change that allows the enzyme to select its correct substrate.
Active-site mutations lower the energy barrier (activation energy) for the enzyme to bind a different substrate.

What's Next
4- and 5-point mutants (additivity?), other substrates
Bolstering mutations: stability vs. specificity; rate of binding: use of binding energy in catalysis; effects of mutants on PheATE
We still need to determine why the predicted bolstering mutations have such a significant effect on substrate specificity. Do they also affect the stability of the protein? How are stability and specificity related in our experiments? Rate of binding: by comparing the energy profile of each mutant with that of the wild-type enzyme, we would like to learn how each side-chain mutation affects binding.
Permissive mutations. Reference: Eric A. Ortlund, Jamie T. Bridgham, Matthew R. Redinbo, Joseph W. Thornton. Crystal Structure of an Ancient Protein: Evolution by Conformational Epistasis. Science, 2007.

What's Next (John MacMaster)
NMR structures and dynamics of WT GrsA-PheA and mutants. Mention: dynamics and the assembly of the domains (A+T+E).
[Figure: 800 MHz [1H, 15N]-TROSY-HSQC spectrum of perdeuterated, 15N-labeled WT GrsA-PheA]

Acknowledgments: John MacMaster, Tony Yan, Dan Keedy, and all members of the Donald Lab
Funding: NIH

Structure-Based Protein Design
Briefly: computational predictions followed by experimental validation.
[Figure: dry lab, protein design algorithm → mutant 1, mutant 2, …, mutant n; wet lab, kinetics and binding]

Energy function: AA reference energies from native sequence optimization (energy-function dependent)
Ala 1.27, Val -0.04, Leu -0.48, Ile 0.95, Phe -0.85, Tyr -4.38, Trp -5.34, Cys -4.77, Met (n/a), Ser -3.85, Thr -3.99, Lys -8.57, Arg -22.44, Hip -7.70, Asp -9.28, Glu -10.04, Asn -10.30, Gln -10.51, Gly 0.20
Structures used for native sequence optimization (resolution < 1.7 Å): 135L.pdb, 1GBS.pdb, 1IQQ.pdb, 1KOE.pdb, 1MLA.pdb, 1QTS.pdb, 1SJY.pdb, 1VCC.pdb, 1AGI.pdb, 1EW4.pdb, 1GCU.pdb, 1HZT.pdb, 1JWR.pdb, 1LN4.pdb, 1RC9.pdb, 1TP6.pdb, 1ULR.pdb, 1NM8.pdb, 1VF8.pdb, 1OGO.pdb, 2PGE.pdb


GMEC-based vs. ensemble-based design
GMEC-based: a sequence is scored by its single lowest-energy conformation (min over conformations).
Ensemble-based: a sequence is scored by a Boltzmann-weighted average over its conformational ensemble (normalized by 1/Z); K* is a provably accurate approximation to the binding constant via conformational ensembles.

Why provably-accurate algorithms?
[Figure: structure input, rotamer library and energy function feed the protein design algorithm; provable algorithms offer computational guarantees (√), heuristics do not (X); predictions are compared with in vitro/in vivo experiments]
One question we often get is: why use provably accurate algorithms if they are so slow? For a given input model, the advantage of provably accurate algorithms over heuristics is clear: provable algorithms are guaranteed to find the best solution for that model, whereas heuristics are not, and it has in fact been shown that approaches like MC and SCMF can, in some cases, deviate significantly from the best solution. However, since the input model is only an approximation of the real biochemical processes, results from provable algorithms and from heuristics can have similar accuracy when compared with experimental data. If heuristics are used, though, there is no way to separate prediction errors caused by inaccuracies in the model from those caused by inaccuracies in the algorithm. Incorporating experimental feedback into the model is therefore more readily achievable with provable algorithms.

Finding the GMEC is NP-hard! Heuristic methods (MC, SCMF, GA) are faster but not provable; provable methods (DEE) are slower.
Unfortunately, for a given input model with a fixed backbone, a pairwise energy function and a rotamer library, finding the best solution, the global minimum energy conformation (GMEC), is NP-hard. As a result, many heuristic…
[Figure: low-energy conformations vs. the GMEC]

Activity Assays (Cheng-Yu Chen)
Mutants: ↓ specificity for Phe, ↑ specificity for Leu
(I would say use the language from the Update, Sec. 2 and 4.1.)

GrsA-PheA Redesign (Cheng-Yu Chen)
Mutants (switch of specificity), compared with wt: A301G, A301G/T278L, A301G/T278M, A301G/A322V, T278L/A301G/S447N, I277L/T278L/A301G, V187L/T278L/A301G

MinDEE: lower/upper energy bounds
[Figure: pair energy E(i_r, j_s) as a function of the side-chain dihedrals χ_ir and χ_js for rotamers i_r, i_t and j_s]
We can then compute lower and upper energy bounds on the rotamer-to-backbone and pairwise rotamer-to-rotamer interaction energies within the restraining boxes for each rotamer. For example, we compute a lower energy bound for the rotamer pair i_r, j_s by allowing the side-chain dihedrals to flex so as to minimize the interaction energy between these two rotamers within their restraining boxes. The minimization path is shown in red in the figure on the right.

MinDEE pruning criterion (pruning candidate i_r, competitor i_t, witness j_s), in outline:
$$\left[E_{\ominus}(i_r) + \sum_{j \neq i} \min_s E_{\ominus}(i_r, j_s)\right] - \left[E_{\oplus}(i_t) + \sum_{j \neq i} \max_s E_{\oplus}(i_t, j_s)\right] - E_{\Delta} > 0$$
The E⊖ (E-minus) terms are lower bounds on the rotamer-to-template and rotamer-to-rotamer interaction energies that involve the pruning candidate rotamer i_r, so the first bracket is a lower bound on the energies of conformations containing i_r. The E⊕ (E-plus) terms are the corresponding upper energy bounds that involve the competitor rotamer i_t, so the second bracket is an upper bound on the energies of conformations containing i_t. The main difference from the traditional-DEE criterion is the E_Δ term, which accounts for possible energy changes due to rotamer movement.
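The outline of the MinDEE test above can be written as a small predicate. This is a schematic sketch, not the published MinDEE implementation: it assumes the E⊖ and E⊕ bounds (from minimizing or maximizing over each rotamer's restraining box) and the E_Δ term have already been computed, and the argument names are hypothetical. With E_Δ = 0 and bounds equal to the rigid-rotamer energies, it reduces to the traditional Desmet criterion.

```python
def mindee_can_prune(lb_self_r, lb_pairs_r, ub_self_t, ub_pairs_t, e_delta):
    """Schematic MinDEE test: True if candidate i_r can be pruned in favor of competitor i_t.

    lb_self_r: E-minus(i_r), lower bound on the rotamer-template energy of i_r.
    lb_pairs_r[j]: list over rotamers s at position j of E-minus(i_r, j_s).
    ub_self_t, ub_pairs_t: the corresponding E-plus upper bounds for i_t.
    e_delta: precomputed bound on energy changes due to rotamer movement.
    """
    lower_r = lb_self_r + sum(min(row) for row in lb_pairs_r)
    upper_t = ub_self_t + sum(max(row) for row in ub_pairs_t)
    return lower_r - upper_t - e_delta > 0
```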

MinDEE: Side-chain Dihedral Flexibility
[Figure: traditional-DEE vs. MinDEE]
Also add some of Bruce Tidor's summary of the MinDEE algorithm (appropriate manipulation of 1- and 2-body energy bounds) from the K* grant.

Energy Lower Bounds
