OSPREY Tutorial
Ivelin Georgiev, Bruce Donald
Donald Lab, Duke University

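Distribution of Structures
Maximum likelihood (pick the most probable): $\min_c E(c)$, the Global Minimum Energy Conformation (GMEC)
'Bayesian' (weighted average over all conformations): $\langle f \rangle = \frac{1}{Z}\int f(c)\, e^{-E(c)/RT}\, dc$, with $Z = \int e^{-E(c)/RT}\, dc$
Probability ↔ energy, using the Boltzmann distribution: $p(c) = \frac{1}{Z}\, e^{-E(c)/RT}$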
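To make the contrast concrete, here is a minimal Python sketch over a finite list of conformations, where the integral becomes a sum. The toy energies, the kcal/mol units and RT ≈ 0.5925 kcal/mol (about 298 K) are assumptions for illustration. This is not OSPREY code.

```python
import math

# Assumed units: energies in kcal/mol, RT at ~298 K.
RT = 0.5925

def boltzmann_probabilities(energies):
    """Return p(c) = exp(-E(c)/RT) / Z for a finite list of conformations."""
    e_min = min(energies)  # shifting by e_min avoids overflow; it cancels in p(c)
    weights = [math.exp(-(e - e_min) / RT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def gmec_index(energies):
    """Maximum-likelihood choice: index of the lowest-energy conformation."""
    return min(range(len(energies)), key=lambda i: energies[i])

def ensemble_average(values, energies):
    """'Bayesian' choice: Boltzmann-weighted average of a per-conformation value."""
    probs = boltzmann_probabilities(energies)
    return sum(p * v for p, v in zip(probs, values))

energies = [-10.0, -9.9, -5.0]   # toy conformational energies
is_gmec = [1.0, 0.0, 0.0]        # indicator of the lowest-energy conformation
print(gmec_index(energies))      # 0
print(ensemble_average(is_gmec, energies))
```

In this example the second conformation is only 0.1 kcal/mol above the GMEC. As a result, the GMEC carries only about half of the Boltzmann weight, which is the situation where an ensemble view differs most from picking a single best structure.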

Maximum likelihood (GMEC): traditional-DEE, MinDEE, BD
DEE-based predictions aim to identify the single best solution, and in that respect they are similar to a maximum-likelihood approach. In contrast, computational predictions can instead be based on an ensemble of low-energy conformations, as in our K* algorithm. K* is an enzyme-ligand binding prediction algorithm. It computes partition functions over ensembles of conformations, where each conformation's contribution to the partition function is weighted by its Boltzmann factor. The ratio of the partition functions for the bound and unbound states is then used to compute a provably accurate approximation to the binding constant for the given enzyme-ligand complex. As an example, the slide shows what an ensemble of conformations for an enzyme active site may look like when bound to a ligand. In our computational experiments, we use a mix of the DEE-based and K* protein design algorithms.
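The bound/unbound ratio is commonly written as K* = q_PL / (q_P · q_L), where q_PL, q_P and q_L are the partition functions of the complex, the free protein and the free ligand. Below is a minimal sketch that computes it in log space from plain lists of conformational energies. The energies, units and RT value are assumptions. Unlike OSPREY, it enumerates every conformation instead of pruning and running A* search.

```python
import math

RT = 0.5925  # kcal/mol near 298 K (assumed)

def log_partition_function(energies):
    """ln q = ln sum_c exp(-E(c)/RT), using the log-sum-exp trick for stability."""
    e_min = min(energies)
    return -e_min / RT + math.log(sum(math.exp(-(e - e_min) / RT) for e in energies))

def log_kstar(bound, protein, ligand):
    """ln K* = ln q_PL - ln q_P - ln q_L for the given conformational energies."""
    return (log_partition_function(bound)
            - log_partition_function(protein)
            - log_partition_function(ligand))

# Two hypothetical sequences sharing the same unbound ensembles:
print(log_kstar([-12.0, -11.0], [-3.0], [-2.0]) > log_kstar([-10.0], [-3.0], [-2.0]))  # True
```

Working in log space matters in practice: for realistic energies, the raw Boltzmann factors overflow double precision.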

Application: Enzyme-Ligand Binding
Ensemble-based ('Bayesian', weighted average): K*, a provably accurate approximation to the binding constant via conformational ensembles
sequence   K*
TIAAIC     7.3
GIRMQM     3.1
TGIAIV     2.9
LMLAIS     1.7
TWAIGY     0.3

K*: thousands of sequences (s1, s2, …, si, …, sk)
For each sequence si: conformations are pruned by MinDEE/BD and enumerated with A*, giving an ε-approximation of the partition function that is accurate to within a factor of (1 − ε).
J. Comp. Chem. (2008)
[Figure: fraction of conformations evaluated by K* per sequence]
K* is fully compatible with both MinDEE and BD: it uses MinDEE and BD as an initial pruning filter and A* branch-and-bound search in the enumeration stage. Previously, partition functions could be computed for only a handful of sequences. With K*, however, we routinely compute partition functions in mutation searches over up to 20,000 sequences. A main reason computations of this size are feasible is the provable ε-approximation algorithm for partition functions. Each conformation contributes to the partition function exponentially in the negative of its energy. As a result, K* needs to evaluate only a very small subset of low-energy conformations to guarantee that the computed partition function is at least (1 − ε) times the full partition function over all conformations. The figure on the left shows the fraction of conformations evaluated by K* for several hundred test sequences. K* is extremely efficient, evaluating at most 0.5% of all conformations for any given sequence.
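A minimal sketch of an ε-stopping rule of this kind is below. Conformations are visited in order of increasing energy lower bound, with a sort standing in for A* enumeration. Every unseen conformation has energy at least the current lower bound, so the unseen remainder r is bounded, and q ≤ q* + r. Stopping once r ≤ ε·q*/(1 − ε) therefore guarantees q* ≥ (1 − ε)·q. The (lower_bound, energy) input format and RT value are assumptions. The sketch also ignores conformations discarded by the pruning filter, which a full treatment must bound as well.

```python
import math

RT = 0.5925  # kcal/mol near 298 K (assumed)

def epsilon_partition_function(confs, epsilon):
    """Accumulate q* until the unseen conformations provably cannot add more
    than epsilon/(1 - epsilon) * q*, which guarantees q* >= (1 - epsilon) * q.

    confs: list of (lower_bound, energy) pairs with lower_bound <= energy.
    Returns (q_star, number_of_conformations_evaluated).
    """
    ordered = sorted(confs, key=lambda c: c[0])  # stands in for A* enumeration order
    q_star = 0.0
    for i, (lower_bound, energy) in enumerate(ordered):
        remaining = len(ordered) - i
        # Upper bound on the total Boltzmann weight of everything not yet seen.
        max_rest = remaining * math.exp(-lower_bound / RT)
        if q_star > 0 and max_rest <= epsilon * q_star / (1 - epsilon):
            return q_star, i
        q_star += math.exp(-energy / RT)
    return q_star, len(ordered)

confs = [(-10.0, -10.0)] + [(5.0, 5.0)] * 99   # one dominant low-energy conformation
q_star, evaluated = epsilon_partition_function(confs, epsilon=0.03)
print(evaluated)  # 1
```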

Example (PNAS, 2009)
Cheng-Yu Chen, Ivelin Georgiev, Amy Anderson, Bruce Donald

Nonribosomal Peptide Synthetases (NRPS)
NRPS enzymes are found in some fungi and bacteria.
NRPS enzymes make peptide-like products with pharmaceutical properties (antifungal, antineoplastic, antibacterial), e.g. vancomycin, penicillin, gramicidin, bacitracin, cyclosporin, bleomycin, …
Nonribosomal peptide synthetases (NRPS, for short) complement traditional ribosomal peptide synthesis in a variety of bacteria and fungi. NRPS enzymes are large multidomain proteins that produce peptide-like natural products, among them natural antibiotics, immunosuppressants and antineoplastics. Penicillin is probably the best-known NRPS product; other NRPS products include gramicidin, bacitracin and vancomycin. NRPS are similar to the polyketide synthases (PKS).
As the famous story goes, Fleming went away on holiday, and when he returned he noticed that bacterial (Staphylococcus aureus) growth was inhibited near a fungal colony. "That's funny..." and the rest is history. Fleming and his colleagues did a series of experiments showing that extracts of the fungus could kill various gram-positive bacteria, including Staphylococcus pyogenes, S. viridans, Micrococcus spp., and a few others. Effectiveness against gram-negative bacteria (such as Escherichia coli and Klebsiella pneumoniae) was limited: very high concentrations of penicillin were needed to kill those organisms. We now know that an outer membrane protects gram-negative bacteria against penicillin. Penicillin blocks the last step of cell-wall synthesis, the cross-linking of two peptidoglycan strands, and the resulting weakened wall leads to bacterial cell lysis.

NRPS: GrsA-PheA Redesign
Gramicidin S; Phe, Leu
The system in which our lab is mainly interested is the Phe adenylation domain of the NRPS enzyme GrsA, or GrsA-PheA for short. GrsA, in concert with GrsB, makes the antibiotic gramicidin S. Our goal is to redesign GrsA-PheA to switch its specificity toward different non-cognate substrates, which may eventually be incorporated into modified versions of gramicidin S.

Protein Redesign (NRPS)
The three-dimensional structure of the GrsA PheA domain is known [Conti, Stachelhaus, Marahiel, Brick, 1997].

Change specificity from Phe to Leu by allowing any 2 (of 9) mutations
Design site: r = 9 positions, s = 2 mutations; mutations to G, A, V, L, I, F, Y, W, M
Approx. mutation sequences = …; …,000,000 conformations (78,200 after pruning)
Target substrate: Leu
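For a sense of scale, here is some back-of-the-envelope arithmetic for r = 9 positions, s = 2 mutations and a = 9 allowed amino-acid types. Both formulas are illustrative assumptions, not the counting used on the slide. The first lets each chosen position take any allowed type, so it double-counts sequences where a "mutation" keeps the wild type. The second counts distinct sequences with at most s changes, assuming every wild-type residue is among the nine types, which does not hold at every position.

```python
from math import comb

def max_sequences(r, s, a):
    """Upper estimate: choose s of r positions; each takes any of a types."""
    return comb(r, s) * a ** s

def distinct_sequences(r, s, a):
    """Distinct sequences with at most s changed positions, assuming the
    wild-type residue at each position is one of the a allowed types."""
    return sum(comb(r, k) * (a - 1) ** k for k in range(s + 1))

print(max_sequences(9, 2, 9))       # 2916
print(distinct_sequences(9, 2, 9))  # 2377
```

Both figures are in the same range as the roughly 3,000 sequences quoted for the Phe-to-Leu search in the results slides. The number of rotameric conformations is far larger, because every sequence multiplies out over the rotamers at each position.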

Crystal structure: 1amu (1.9 Å), 563 a.a., 65 kD
Binding pocket, with AMP: D235, A236, W239, T278, I299, A301, A322, I330, C331 (and K517)
The slide shows the binding pocket from the crystal structure, with the 9 residues that are important for substrate binding.

Three-Step Enzyme Redesign (Ivelin Georgiev, Cheng-Yu Chen)
1. K*: active-site mutations (provable)
2. Entropy step: mutatable positions (heuristic)
3. MinDEE: bolstering mutations (provable)
K* allows us to predict mutations to the active site of a protein that improve its specificity for a given target substrate. In our most recent work, we have extended K* to a three-step enzyme redesign algorithm. In addition to active-site mutations, it predicts mutations both close to and far away from the active site, for further improvement in substrate specificity. K* is applied in the first step. The second step uses an entropy-based approach to predict residue positions anywhere in the protein that may be tolerant to mutation; I will briefly describe this step shortly. In the third step, MinDEE is used to predict mutations at the residue positions from step 2 for additional improvement in the desired substrate specificity. Step 2 heuristically identifies additional residue positions for mutation; steps 1 and 3 are provable with respect to the input model.
Computational Structure-Based Redesign of Enzyme Activity. PNAS (2009)

T278L/A301G with Leu (AMP and K517 shown)
#1 of 3,000 sequences; 6.8 × 10^8 rotameric conformations
PNAS (2009)

Mutations Outside the Active Site
Boltzmann → rotamer probabilities (SCMF) → AA probabilities → residue entropy → mutatable positions → MinDEE
Positions shown: S447, I277, V187, F45, V238, I207, L210
The entropy step uses an approach based on published work by Steve Mayo and co-workers. Self-Consistent Mean Field (SCMF) is used to estimate amino-acid (AA) probabilities and, from them, the residue entropy of each residue position in the protein. A high-entropy position can accommodate multiple amino acid types, and so may be tolerant to mutation. In Mayo's paper, this approach was used as a preprocessing step for directed evolution experiments. We have modified and extended it to predict high-entropy residue positions that are subsequently redesigned using our MinDEE algorithm.
PNAS (2009)
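The pipeline on this slide can be sketched with a generic self-consistent mean field iteration. Each rotamer's energy in the mean field of the other positions gives Boltzmann rotamer probabilities, which are iterated to self-consistency. The rotamer probabilities are then summed per amino acid and turned into S = −Σ p_a ln p_a. Everything here is an illustrative assumption: the data layout, the damping and tolerance, and RT. It is not OSPREY's doResEntropy or the Mayo lab's implementation.

```python
import math
from collections import defaultdict

RT = 0.5925  # kcal/mol near 298 K (assumed units)

def scmf(rotamers, self_e, pair_e, damping=0.5, tol=1e-8, max_iter=500):
    """Self-consistent mean field estimate of rotamer probabilities.

    rotamers[i]: list of (amino_acid, rotamer_id) allowed at position i
    self_e[i][r]: energy of rotamer r at position i against the fixed template
    pair_e[(i, j)][r][s]: pair energy for i < j (rotamer r at i, s at j);
                          pairs missing from the dict do not interact
    """
    n = len(rotamers)
    p = [[1.0 / len(rots)] * len(rots) for rots in rotamers]  # uniform start
    for _ in range(max_iter):
        new_p, change = [], 0.0
        for i in range(n):
            field = []  # mean-field energy of each rotamer at position i
            for r in range(len(rotamers[i])):
                e = self_e[i][r]
                for j in range(n):
                    table = pair_e.get((min(i, j), max(i, j))) if j != i else None
                    if table is None:
                        continue
                    for s, p_js in enumerate(p[j]):
                        e += p_js * (table[r][s] if i < j else table[s][r])
                field.append(e)
            e_min = min(field)
            weights = [math.exp(-(e - e_min) / RT) for e in field]
            z = sum(weights)
            # Damped step toward the Boltzmann distribution in the current field.
            mixed = [damping * old + (1 - damping) * w / z
                     for old, w in zip(p[i], weights)]
            change = max(change, max(abs(x - y) for x, y in zip(mixed, p[i])))
            new_p.append(mixed)
        p = new_p
        if change < tol:
            break
    return p

def residue_entropy(rotamers_i, p_i):
    """Sum rotamer probabilities into AA probabilities; return (entropy, {aa: p})."""
    aa_prob = defaultdict(float)
    for (aa, _), prob in zip(rotamers_i, p_i):
        aa_prob[aa] += prob
    entropy = -sum(q * math.log(q) for q in aa_prob.values() if q > 0)
    return entropy, dict(aa_prob)
```

Positions whose entropy stays high are the candidates that would be handed to the redesign step. A position dominated by a single amino acid has entropy near zero.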

All top 10 (of 3,000 sequences); 6.8 × 10^8 rotameric conformations
[Figure: A301G/T278L, rate (AU/s × 10^-6) vs. [L-Leu] (mM); normalized kcat/KM for Leu and Phe]
Kd by stopped-flow for A301G/T278D:Leu is improved >1000× over WT:Leu. If you know Vmax and the total enzyme concentration, you can calculate kcat from kcat·[E_total] = Vmax. Also, kcat/KM = v/([E][S]), where [E] is the free enzyme concentration; with total enzyme, this holds when [S] ≪ KM.
PNAS (2009)
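The two kinetic relations above can be checked numerically with the Michaelis-Menten rate law v = Vmax[S]/(KM + [S]). The values below are hypothetical, in arbitrary consistent units, and are not data from the study.

```python
def kcat_from_vmax(vmax, e_total):
    """kcat * [E]_total = Vmax  =>  kcat = Vmax / [E]_total."""
    return vmax / e_total

def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate v = Vmax [S] / (KM + [S])."""
    return vmax * s / (km + s)

# Hypothetical values in consistent units (not experimental data).
vmax, e_total, km = 2.0, 0.5, 0.1
kcat = kcat_from_vmax(vmax, e_total)
print(kcat, kcat / km)  # 4.0 40.0

# At [S] << KM, v / ([E]_total [S]) approaches kcat / KM.
s = 1e-6
print(mm_rate(s, vmax, km) / (e_total * s))
```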

T278D/A301G with Arg (D = aspartic acid, Asp)
Arg: #1 of 2511 sequences; Lys: #4 of 2511 sequences; >9 × 10^8 conformations
Structure labels: AMP, D235, W239, K517, 278D, 301G; L-Arg and L-Lys vs. WT
[Figure: rate (AU/s × 10^-6) vs. [L-Arg] (mM), mutant vs. WT]
PNAS (2009)

Tutorial

Installation, Setup, Running OSPREY

Installation
Supported: 32-bit √, 64-bit √
Requires: Java, mpiJava, MPICH2 (may require special instructions)

Setup: Compute Nodes, Input Structure, Rotamer Library, Energy Function

Compute Nodes
Select the MPI nodes (e.g. linux1, …) in mpd.hosts and start the MPD ring:
    mpdboot -n 5 -f mpd.hosts
Select the job-specific nodes (e.g. linux1, linux2, linux3) in ./machines and run OSPREY with mpirun:
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg
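If you script your runs, a hypothetical Python helper like the one below can keep the host files and commands consistent. It assumes the usual one-host-per-line layout of MPICH2's mpd.hosts and mpirun machinefiles. It only reproduces the flags shown on the slide. The function names are made up for illustration.

```python
from pathlib import Path

def write_host_file(path, hosts):
    """Write one host name per line, the usual layout of mpd.hosts and machinefiles."""
    Path(path).write_text("".join(f"{host}\n" for host in hosts))

def osprey_commands(n_hosts, n_procs, task_args, machinefile="./machines",
                    mpd_hosts="mpd.hosts", heap="1024M"):
    """Return (mpdboot_args, mpirun_args) mirroring the commands on the slide."""
    mpdboot = ["mpdboot", "-n", str(n_hosts), "-f", mpd_hosts]
    mpirun = ["mpirun", "-machinefile", machinefile, "-np", str(n_procs),
              "java", f"-Xmx{heap}", "KStar", "mpi", *task_args]
    return mpdboot, mpirun

boot, run = osprey_commands(5, 5, ["-c", "KStar.cfg"])
print(" ".join(boot))
print(" ".join(run))
```

On a machine with MPICH2 and OSPREY installed, the argument lists could be passed to `subprocess.run(..., check=True)`. Keeping them as lists avoids shell-quoting problems.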

Input Structure: missing atoms
Check the PDB header for missing atoms, for example:
    REMARK 470 MISSING ATOM
    REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;
    REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;
    REMARK 470 I=INSERTION CODE):
    REMARK 470   M RES CSSEQI  ATOMS
    REMARK 470     GLU A  34    CG   CD   OE1  OE2
    REMARK 470     GLU A  63    CD   OE1  OE2
Options: model the missing atoms (e.g. with KiNG), with possible over-constraint; or delete the affected residues, with possible under-constraint.

Input Structure: adding hydrogens
Proteins: MolProbity recommended
General compounds: Accelrys DS Visualizer recommended
Check: protonation states, missing protons

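Input Structure: His residues
Check the His protonation state: HIP (protonated on both ring nitrogens), HIE (protonated on Nε2), HID (protonated on Nδ1).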
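Once hydrogens have been added, the His name can be derived from which ring N-H hydrogens are present. In standard PDB naming these are HD1 on ND1 and HE2 on NE2; HE1 and HD2 are C-H hydrogens present in every form. Below is a minimal sketch assuming those atom names. `his_residue_name` is a hypothetical helper.

```python
def his_residue_name(atom_names):
    """Return the AMBER-style His name implied by the ring N-H hydrogens:
    HD1 (on ND1) and/or HE2 (on NE2)."""
    names = {a.strip().upper() for a in atom_names}
    has_hd1, has_he2 = "HD1" in names, "HE2" in names
    if has_hd1 and has_he2:
        return "HIP"  # doubly protonated, positively charged
    if has_hd1:
        return "HID"  # neutral, H on ND1 (delta)
    if has_he2:
        return "HIE"  # neutral, H on NE2 (epsilon)
    raise ValueError("no ring N-H hydrogen found; add hydrogens first")

print(his_residue_name(["N", "CA", "ND1", "HD1", "NE2", "HE1", "HD2"]))  # HID
```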

Input Structure: steric shell
Keeping only a steric shell of residues close to the design site gives a significant speedup.
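One simple way to pick such a shell is a distance cutoff around the design-site atoms. This is an illustration only. The 8 Å default, the dictionary input format and the name `steric_shell` are assumptions, not how OSPREY defines its shell.

```python
import math

def steric_shell(residues, design_site, cutoff=8.0):
    """Return ids of residues with any atom within `cutoff` Å of any atom of a
    design-site residue; design-site residues are always kept.

    residues: {res_id: [(x, y, z), ...]}; design_site: iterable of res_ids.
    """
    site = set(design_site)
    site_atoms = [xyz for rid in site for xyz in residues[rid]]
    keep = set(site)
    for rid, atoms in residues.items():
        if rid in keep:
            continue
        if any(math.dist(a, b) <= cutoff for a in atoms for b in site_atoms):
            keep.add(rid)
    return keep
```

The all-pairs distance check is quadratic, which is fine for a one-off preprocessing step. For large structures, a spatial grid or k-d tree would be the natural replacement.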

Input Structure: other considerations
Components: protein, ligand, cofactor
Ligand: natural amino acid or small molecule
Water molecules
No chain IDs; residue numbers must be unique
Protein-peptide and protein-protein systems
Connectivity (good input structures)
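Because chain IDs are not used, the same residue number must not appear twice. Below is a minimal check over the fixed PDB columns for residue name, chain, residue number and insertion code. It is a hypothetical helper and treats any residue number shared by more than one (chain, residue name) pair as a clash.

```python
from collections import defaultdict

def duplicate_residue_numbers(pdb_lines):
    """Residue numbers (plus insertion code) used by more than one residue,
    ignoring chain IDs, from fixed-column ATOM/HETATM records."""
    owners = defaultdict(set)
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            res_name = line[17:20].strip()
            chain = line[21:22]
            number = line[22:26].strip() + line[26:27].strip()
            owners[number].add((chain, res_name))
    return sorted(num for num, who in owners.items() if len(who) > 1)
```

A non-empty result means some residues need renumbering before the structure is used as input.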

Input Structure: check and double-check!!!

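Rotamer Library
Proteins: the Richardsons' Penultimate rotamer library
General compounds: rotamers in the same format, for example:
    TYR 2 4           name, # dihedrals, # rotamers
    N CA CB CG        atoms defining dihedral χ1
    CA CB CG CD1      atoms defining dihedral χ2
    62 90             one rotamer (χ1, χ2)
A general compound such as FCL uses the same layout (FCL 2 4, with the same dihedral atoms).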
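Here is a minimal parser for entries laid out as on this slide. Each entry has a header line with name, number of dihedrals and number of rotamers, then one line of four atom names per dihedral, then one line of angles per rotamer. The layout is inferred from the slide. Real rotamer files may contain header or comment lines that this sketch does not handle. `parse_rotamer_block` is a hypothetical name.

```python
def parse_rotamer_block(lines):
    """Parse entries of the form
        NAME n_dihedrals n_rotamers
        <4 atom names>   (n_dihedrals lines)
        <angles>         (n_rotamers lines, n_dihedrals values each)
    into {name: {"dihedrals": [...], "rotamers": [[...], ...]}}."""
    rows = [ln.split() for ln in lines if ln.strip()]  # blank lines are skipped
    library, k = {}, 0
    while k < len(rows):
        name, n_dihed, n_rot = rows[k][0], int(rows[k][1]), int(rows[k][2])
        k += 1
        dihedrals = [rows[k + d][:4] for d in range(n_dihed)]
        k += n_dihed
        rotamers = [[float(x) for x in rows[k + r][:n_dihed]] for r in range(n_rot)]
        k += n_rot
        library[name] = {"dihedrals": dihedrals, "rotamers": rotamers}
    return library

lib = parse_rotamer_block(["TYR 2 1", "N CA CB CG", "CA CB CG CD1", "62 90"])
print(lib["TYR"]["rotamers"])  # [[62.0, 90.0]]
```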

Energy Function
parm96a.dat: atom types, dihedral parameters, vdW parameters; add parameters for new atom types (typically with antechamber)
all_amino94X.in: amino acids: partial charges, connectivity; typically no changes
all_nuc94_and_gr.in: general compounds: partial charges, connectivity; add parameters for new compounds (typically with antechamber)
Partial charges can be modified.
User control: distance-dependent dielectric, dielectric value, vdW radii scaling, solvation energy scaling, dihedral energies switch
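To show what the dielectric and vdW-scaling controls change, here is a sketch of two textbook pairwise terms. The first is Coulomb electrostatics with an optional distance-dependent dielectric ε(r) = ε·r. The second is an AMBER-style 12-6 Lennard-Jones term whose equilibrium distance is multiplied by a radius scale. These are standard forms chosen for illustration. Treat the exact expressions OSPREY evaluates, and the 332.0636 conversion constant (values differ slightly between force-field codes), as assumptions.

```python
COULOMB = 332.0636  # kcal·Å/(mol·e^2); commonly quoted conversion constant (assumed)

def electrostatic(qi, qj, r, dielectric=6.0, distance_dependent=True):
    """Coulomb energy (kcal/mol) for charges in e at distance r (Å).
    With a distance-dependent dielectric, epsilon(r) = dielectric * r."""
    eps = dielectric * r if distance_dependent else dielectric
    return COULOMB * qi * qj / (eps * r)

def lennard_jones(r, r_min, well_depth, radius_scale=1.0):
    """12-6 term E = eps * [(R/r)^12 - 2 (R/r)^6], with the equilibrium
    distance R = radius_scale * r_min (a vdW radii scaling knob)."""
    ratio6 = (radius_scale * r_min / r) ** 6
    return well_depth * (ratio6 ** 2 - 2 * ratio6)
```

Scaling the radii below 1.0 moves the repulsive wall inward, which softens clashes in rigid-rotamer models. The distance-dependent dielectric makes electrostatics fall off as 1/r² instead of 1/r.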

Running OSPREY: GMEC-based, Ensemble-based, Residue entropy

GMEC-based: doDEE (with DACS)
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
Inputs: input structure, rotamer library, energy function, mutation search parameters
doDEE: energy minimization (MinDEE, BD, BRDEE), DACS
Example output:
    1 MET GLY ASP ARG FCL unMinE: minE: bestE:
    2 MET GLY ASP MET FCL unMinE: minE: bestE:
    3 MET GLY ASP ARG FCL unMinE: minE: bestE:
    1 MET GLY SER ARG FCL unMinE: minE: bestE:
    2 MET GLY SER ARG FCL unMinE: minE: bestE:
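As background for the pruning that doDEE performs, here is a minimal sketch of the original rigid-rotamer dead-end elimination criterion of Desmet and co-workers. Rotamer r at position i is eliminated if some other rotamer t at i satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). It is not MinDEE, which adapts the bounds to account for energy minimization, nor BD or BRDEE. The energy-function interfaces are hypothetical.

```python
def _bound(self_e, pair_e, i, r, pick):
    """E(i_r) + sum over other positions j of pick(E(i_r, j_s) over rotamers s)."""
    total = self_e[i][r]
    for j in range(len(self_e)):
        if j != i:
            total += pick(pair_e(i, r, j, s) for s in range(len(self_e[j])))
    return total

def dee_prune(self_e, pair_e):
    """One pass of traditional DEE; returns surviving (position, rotamer) pairs.

    self_e[i][r]: energy of rotamer r at position i with the template.
    pair_e(i, r, j, s): pair energy between rotamer r at i and rotamer s at j.
    """
    alive = set()
    for i in range(len(self_e)):
        rots = range(len(self_e[i]))
        best = {r: _bound(self_e, pair_e, i, r, min) for r in rots}
        worst = {t: _bound(self_e, pair_e, i, t, max) for t in rots}
        for r in rots:
            # r is dead-ending if its best case is worse than some t's worst case.
            if not any(best[r] > worst[t] for t in rots if t != r):
                alive.add((i, r))
    return alive
```

Repeating the pass while restricting the min/max to surviving rotamers prunes further. Because an eliminated rotamer provably cannot be in the GMEC of the rigid model, the search that follows only needs to explore the survivors.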

GMEC-based: genStructDEE (ranked structures)
    java -Xmx1024M KStar -c KStar.cfg genStructDEE System.cfg GenStruct.cfg
Inputs: input structure, rotamer library, energy function, structure generation parameters
genStructDEE: energy minimization (MinDEE, BD, BRDEE)
Output (ranked):
    1 MET GLY SER ARG FCL unMinE: minE: bestE:
    2 MET GLY SER ARG FCL unMinE: minE: bestE:
    3 MET GLY ASP ARG FCL unMinE: minE: bestE:
    1 MET GLY ASP ARG FCL unMinE: minE: bestE:
    2 MET GLY ASP MET FCL unMinE: minE: bestE:
    3 MET GLY ASP ARG FCL unMinE: minE: bestE:
    1 MET GLY SER ARG FCL unMinE: minE: bestE:
    2 MET GLY SER ARG FCL unMinE: minE: bestE:

Ensemble-based: Protein-ligand binding (KSMaster)
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi KSMaster System.cfg MutSearch.cfg
Inputs: bound structure, rotamer library, energy function, K* mutation search parameters
KSMaster: energy minimization (MinDEE, BD, BRDEE), doSinglePartFn
Example output:
    E+24 ILE TRP ILE ALA ALA ILE
    E+24 TRP ASP ILE GLY ALA ILE
    E+24 ILE THR ILE PHE ALA ILE
    E+24 VAL THR ILE PHE ALA ILE
    E+24 ILE THR ILE TYR ALA ILE

Residue entropy: doResEntropy
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi doResEntropy System.cfg ResEntropy.cfg
Inputs: input structure, rotamer library, energy function, mutation search parameters
Output per residue: entropy, residue ID, # of proximate residues, AA probabilities

Some important parameters
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
KStar.cfg:
  Energy function:
    hElect true
    hVDW false
    hSteric false
    distDepDielect true
    dielectConst 6.0
    vdwMult 0.95
    doDihedE true
    doSolvationE true
    solvScale 0.8
  Steric filter:
    stericThresh 0.4
    softStericThresh 1.5
  Rotamer libraries:
    rotFile LovellRotamer.dat
    grotFile GenericRotamers.dat
  Volume filter:
    volFile AAVolumes.dat
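The .cfg files shown in these slides are whitespace-separated "key value" lines, some with multi-word values such as allowed-residue lists. When generating or sanity-checking them with your own scripts, a sketch like the one below is enough. It assumes no comment syntax, and OSPREY's own parser may differ. The helper names are hypothetical.

```python
def parse_cfg(text):
    """Parse whitespace-separated 'key value ...' lines into {key: value string}.
    Blank lines are skipped; multi-word values are re-joined with single spaces."""
    params = {}
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            params[tokens[0]] = " ".join(tokens[1:])
    return params

def cfg_bool(params, key):
    """Interpret 'true'/'false' values (case-insensitive)."""
    return params[key].lower() == "true"

kstar_cfg = parse_cfg("""
distDepDielect true
dielectConst 6.0
vdwMult 0.95
rotFile LovellRotamer.dat
""")
print(cfg_bool(kstar_cfg, "distDepDielect"), float(kstar_cfg["dielectConst"]))  # True 6.0
```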

    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
System.cfg:
  Input PDB:
    pdbName 1amuFH.pdb
  Design site:
    numInAS 4
    residueMap
  Ligand:
    pdbLigNum 566
    ligAA false
  Cofactor:
    numCofRes 1
    cofMap 567

    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
DEE.cfg (partial):
  DACS:
    doDACS true
    distrDACS false
    initDepth 2
    subDepth 1
    diffFact 6
  Minimization:
    doMinimize false
    minimizeBB false
    doBackrubs false
    backrubFile none
  Reference energies:
    useEref true
  Ligand in search:
    ligPresent false
    ligType none
  Allowed mutations:
    resAllowed0 gly ala val leu ile tyr phe trp met
    …
    resAllowed3 gly ala val leu ile tyr phe trp met
  Resuming:
    resumeSearch false
    resumeFilename runInfo.out.partial

    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi KSMaster System.cfg MutSearch.cfg
MutSearch.cfg (partial):
  Volume filter / candidate mutants:
    mutFileName 1amuFCL_2MUT.mut
    numMutations 2
    targetVolume 620.0
    volumeWindow
  Minimization:
    doMinimize false
    minimizeBB false
    doBackrubs false
    backrubFile none
  (1 − ε) accuracy:
    epsilon 0.03
  Inter-mutation:
    gamma 0.01
  At most 1 repeat:
    repeatSearch true
  Unbound structure:
    useUnboundStruct false
    unboundPdbName none
  Allowed mutations:
    resAllowed0 gly ala val leu ile tyr phe trp met
  Resuming:
    resumeSearch false
    resumeFilename 1amuFCL_MutSearch.partial

Citing OSPREY
General citation:
K* and MinDEE:
BD:
BRDEE:
DACS:
Original K* publication:

OSPREY is open source!!!

Acknowledgements
Bruce Donald, Ryan Lilien, Faisal Reza, Kyle Roberts, Daniel Keedy, Pablo Gainza, Donald Lab
Funding: NIH
