9. Protein Docking.

Prediction of protein-protein interactions
How do proteins interact? Can we predict and manipulate those interactions?
Prediction of structure: docking
Prediction of binding
Design: creation of new interactions

Docking vs. ab initio modeling
de novo structure prediction (ROSETTA): sequence (ADEFFGKLSTKK...) -> structure. Building blocks: backbone and side chains. Assessed in CASP.
Docking (ROSETTADOCK): monomers -> complex. Rigid-body degrees of freedom: 3 translations + 3 rotations. Assessed in CAPRI.
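
The six rigid-body degrees of freedom can be made concrete with a small sketch. This is an illustration only, not RosettaDock code: the names rotation_matrix and rigid_body_move, and the choice of successive rotations about the fixed x, y and z axes, are assumptions made for this example.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Return R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def rigid_body_move(coords, angles, shift):
    """Apply 3 rotations and 3 translations to a list of (x, y, z) tuples."""
    rot = rotation_matrix(*angles)
    moved = []
    for p in coords:
        rotated = [sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + shift[i] for i in range(3)))
    return moved

# Toy "ligand" of two atoms: rotate 90 degrees about z, then shift 5 along z.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = rigid_body_move(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
```

Internal distances of the moved partner are unchanged; only its placement relative to the other partner differs, which is what a rigid-body docking search samples.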

Monomers change structure upon binding to partner
Solution 1: tolerate clashes. Fast, but weak discrimination of the correct solution.
Solution 2: model the changes explicitly. Slow, but precise.

Protein-protein docking
Sampling strategies
Fast detection of shape complementarity:
  1. Fast Fourier Transform (FFT): ClusPro
  2. Geometric hashing: PatchDock
High-resolution docking (model changes explicitly):
  3. RosettaDock
Data-driven docking:
  4. HADDOCK
Template-based docking:
  5. COTH

Find shape complementarity: 1. Fast Fourier Transform (FFT)
Ephraim Katzir
Test all possible positions of ligand and receptor: for each rotation R of the ligand, evaluate all translations T of the ligand grid over the receptor grid.
The score as a function of the X, Y and Z translations is a correlation product, which can be calculated by FFT.
Discrete convolution of two grids a and b: $a \otimes b = \mathrm{iFFT}(\mathrm{FFT}(a) \cdot \mathrm{FFT}(b))$
Scheme:
  Discretize receptor (R) and ligand (L) on grids a and b: surface = 1; interior < 0 for R and > 0 for L.
  Fast Fourier Transform: $A = \mathrm{DFT}(a)$, $B = \mathrm{DFT}(b)$.
  Correlation function: $C = A^{*} \cdot B$ ($A^{*}$ is the complex conjugate of $A$), $S = \mathrm{iDFT}(C)$.
  Rotate L, discretize again, and repeat.
High values of the correlation identify translations that place the ligand surface on the receptor surface (binding site) without penetrating the interior.
Using the FFT increases the speed by 10^7.
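
The correlation trick can be checked on a one-dimensional toy grid. The sketch below is an assumption-laden illustration, not code from any docking program: it uses a small pure-Python radix-2 FFT so that it has no dependencies, a circular (wrap-around) grid of length 8, and made-up grid values (surface = 1, receptor interior = -15, ligand interior = +1). Real protocols work on 3D grids with optimized FFT libraries.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def correlate_fft(a, b):
    """Circular correlation c[t] = sum_i a[i] * b[(i + t) % n], computed via FFT."""
    n = len(a)
    A, B = fft(a), fft(b)
    C = [x.conjugate() * y for x, y in zip(A, B)]  # C = A* . B
    return [v.real / n for v in fft(C, inverse=True)]

def correlate_direct(a, b):
    """Same correlation by explicit summation over all translations (O(n^2))."""
    n = len(a)
    return [sum(a[i] * b[(i + t) % n] for i in range(n)) for t in range(n)]

# Toy grids (hypothetical values): 1 = surface, negative = receptor interior.
receptor = [0, 0, 1, -15, -15, 1, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
scores_fft = correlate_fft(receptor, ligand)
scores_direct = correlate_direct(receptor, ligand)
```

Both routes give the same score for every translation; translations that push the ligand into the receptor interior receive strongly negative scores, while surface-on-surface placements score highest.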

Some FFT-based docking protocols
ZDOCK (Weng)
ClusPro and PIPER (Vajda, Camacho, Kozakov)
MolFit (Eisenstein)
DOT (TenEyck)
HEX (Ritchie): FFT in rotation space

Piper – improved discrimination by cluster size
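Low-energy conformations will cluster.
Clusters -> representatives.
Cluster size ~ "region of attraction" ~ entropic contribution to the free energy.
Probability of cluster k: $P_k = Z_k / Z$, where $Z = \sum_{j} \exp(-E_j / RT)$ runs over all N retained conformations and $Z_k = \sum_{j \in k} \exp(-E_j / RT)$ runs over the K conformations in cluster k.
If all retained conformations have about the same energy E: $Z = N \exp(-E / RT)$, $Z_k = K \exp(-E / RT)$, so $P_k = Z_k / Z = K / N$.
Kozakov, Biophysical J., 2005, 89:867.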
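
The relation between cluster size and cluster probability can be verified numerically. This is a minimal sketch under stated assumptions: the function name cluster_probabilities, the toy energies and the value RT = 0.6 (roughly kcal/mol at room temperature) are chosen for illustration and are not taken from PIPER.

```python
import math
from collections import defaultdict

def cluster_probabilities(energies, labels, rt=0.6):
    """Return P_k = Z_k / Z for each cluster label k."""
    e_min = min(energies)  # constant shift for numerical stability; cancels in Z_k / Z
    z_k = defaultdict(float)
    for e, k in zip(energies, labels):
        z_k[k] += math.exp(-(e - e_min) / rt)
    z = sum(z_k.values())
    return {k: v / z for k, v in z_k.items()}

# Four retained conformations with equal energy: three in cluster 0, one in cluster 1.
probs = cluster_probabilities([-10.0, -10.0, -10.0, -10.0], [0, 0, 0, 1])
```

With equal energies the probabilities reduce to K / N (here 3/4 and 1/4), which is why cluster size alone can be used for ranking.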

Piper – improved discrimination by cluster stability
Combine FFT (PIPER) with MC (Rosetta; see later in lecture): re-sampling of each cluster by Monte Carlo minimization.
Figure: stable cluster vs. unstable cluster (scale: 7 Å).
Kozakov, Proteins, 72:993, 2008.

Shape complementarity: 2. Geometric hashing
PatchDock (Wolfson & Nussinov): matching of puzzle pieces.
Define geometric patches (concave, convex, flat).
Surface patch matching.
Filtering and scoring.

Hashing: alpha shapes
Alpha shapes formalize the idea of "shape". In 2D, an "edge" between two points is "alpha-exposed" if there exists a circle of radius alpha such that the two points lie on the boundary of the circle and the circle contains no other points from the point set.
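
The alpha-exposed definition translates directly into a geometric test. The sketch below is an illustration under assumptions, not PatchDock code: the function name is_alpha_exposed and the small numerical tolerance are choices made here. For two points closer than 2 * alpha there are exactly two circles of radius alpha through both, and the edge is exposed if at least one of them is empty.

```python
import math

def is_alpha_exposed(p, q, points, alpha):
    """True if some circle of radius alpha through p and q contains no other point."""
    d = math.dist(p, q)
    if d == 0 or d > 2 * alpha:
        return False  # no circle of radius alpha passes through both points
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    h = math.sqrt(max(0.0, alpha ** 2 - (d / 2) ** 2))  # midpoint-to-center distance
    ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d      # unit normal to the edge
    others = [r for r in points if r != p and r != q]
    for side in (1, -1):
        center = (mx + side * h * ux, my + side * h * uy)
        if all(math.dist(center, r) >= alpha - 1e-12 for r in others):
            return True
    return False

# Unit square plus its center point.
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
outer_edge = is_alpha_exposed(pts[0], pts[1], pts, alpha=1.0)
diagonal = is_alpha_exposed(pts[0], pts[2], pts, alpha=1.0)
```

The outer edge of the square is exposed, while the diagonal is not, because both candidate circles for the diagonal contain other points of the set.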

Hashing – sparse surface representation
Slide from Jens Meiler

Docking with geometric hashing
PATCHDOCK (by Dina Schneidmann ) Fast and versatile approach Speed allows easy extension to multiple protein docking, flexible hinge docking, etc A extension of this protocol, FIREDOCK, includes side chain optimization (RosettaDock-like) – very flexible, fast and accurate protocol

3. Explicit modeling of conformational changes in the monomers: high-resolution docking with RosettaDock
Random start position -> low-resolution Monte Carlo search -> filters -> high-resolution refinement -> clustering -> predictions (x 10^5)

Choosing starting orientations
Euler angles are independent and guarantee a non-biased search.
Global search: random translation, random rotation (Euler angles).
  Tilt direction [0..360°], tilt angle [0..90°], spin angle [0..360°]
Starting perturbation values: -dock_pert n p r
  n: normal perturbation (Å), default 3
  p: parallel perturbation (Å), default 8
  r: rotation perturbation (degrees), default 8
  (Gaussian distribution with zero mean and standard deviation equal to the perturbation value)
Low-resolution perturbation values: 10, 50, 0.7, 5.0
  (10 cycles of 50 steps; translational magnitude 0.7, rotational magnitude 5.0)
  Initial magnitudes are adapted according to the success rate: high success rate, larger magnitudes (x 1.1); low success rate (< 0.5), smaller magnitudes (x 0.9).
High-resolution perturbation values: 50 cycles; translation 0.1, rotation 0.05 rad (about 3 degrees) around the x, y and z axes
  Minimization only if the score difference to the best score is < 15
  Full side-chain repack every 8th cycle (followed by rtmin)

Choosing starting orientations
Local refinement
Translation: 3 Å normal, 8 Å parallel
Rotation: 8°
Tilt direction [0 ± 8°], tilt angle, spin angle
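
Drawing a starting perturbation for local refinement can be sketched as follows. This is an assumption-based illustration, not RosettaDock code: the function name sample_start_perturbation and the dictionary keys are hypothetical, and how the drawn values are applied to the partner (along and perpendicular to the line between the proteins, as the names suggest) is left out.

```python
import random
import statistics

def sample_start_perturbation(rng, normal=3.0, parallel=8.0, rotation=8.0):
    """Draw one perturbation; each value is Gaussian with zero mean and the given std."""
    return {
        "normal_A": rng.gauss(0.0, normal),        # translation, Å
        "parallel_A": rng.gauss(0.0, parallel),    # translation, Å
        "rotation_deg": rng.gauss(0.0, rotation),  # rotation, degrees
    }

rng = random.Random(0)  # fixed seed so the example is reproducible
draws = [sample_start_perturbation(rng) for _ in range(20000)]
normal_spread = statistics.pstdev([d["normal_A"] for d in draws])
rotation_spread = statistics.pstdev([d["rotation_deg"] for d in draws])
```

Over many draws the spreads approach the requested 3 Å and 8 degrees, so most starting positions stay close to the input orientation while a few explore further away.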

Overview of docking algorithm
Random start position -> low-resolution Monte Carlo search -> filters -> high-resolution refinement -> clustering -> predictions (x 10^5)

Low-resolution search
Perturbation, followed by a Monte Carlo search over rigid-body translations and rotations.
Residue-scale interaction potentials.
Protein representation: backbone atoms + average centroids.
Low-resolution perturbation values: 10, 50, 0.7, 5.0 (10 cycles of 50 steps; translational magnitude 0.7, rotational magnitude 5.0).
Initial magnitudes are adapted according to the success rate: high success rate, larger magnitudes (x 1.1); low success rate (< 0.5), smaller magnitudes (x 0.9).
Mimics a physical diffusion process.
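
The adaptive step size can be illustrated on a toy problem. The sketch below makes several assumptions: a one-dimensional coordinate stands in for the rigid-body orientation, the energy function is a made-up parabola, a success rate of exactly 0.5 is treated as "high", and the names adapt and low_res_search are hypothetical. Only the 10 x 50 schedule, the 0.7 starting magnitude and the x 1.1 / x 0.9 factors come from the slide.

```python
import math
import random

def adapt(magnitude, acceptance_rate):
    """Grow the step by 1.1 after a successful cycle, shrink by 0.9 if rate < 0.5."""
    return magnitude * (1.1 if acceptance_rate >= 0.5 else 0.9)

def low_res_search(energy, x0, cycles=10, steps=50, magnitude=0.7, kt=1.0, seed=0):
    """Metropolis Monte Carlo with a step size adapted after every cycle."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(cycles):
        accepted = 0
        for _ in range(steps):
            trial = x + rng.gauss(0.0, magnitude)
            e_trial = energy(trial)
            # Metropolis criterion: always accept downhill, sometimes uphill.
            if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kt):
                x, e = trial, e_trial
                accepted += 1
                if e < best_e:
                    best_x, best_e = x, e
        magnitude = adapt(magnitude, accepted / steps)
    return best_x, best_e, magnitude

# Toy energy with its minimum at x = 4 (hypothetical).
best_x, best_e, final_magnitude = low_res_search(lambda x: (x - 4.0) ** 2, x0=0.0)
```

Adapting the magnitude keeps the acceptance rate near one half: large steps while the search is still far from a minimum, smaller steps once most moves start to be rejected.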

High resolution optimization: Monte Carlo with Minimization (MCM)
Cycles of iterative optimization on the energy landscape (energy vs. rigid-body orientation):
START -> random perturbation -> side-chain optimization -> rigid-body minimization -> MC criterion -> next cycle ... -> FINISH
In the high-resolution refinement step the side-chain atoms are added back. Side chains are selected from a rotamer library that contains a discrete set of likely conformations. A full-atom energy function calibrated on different Rosetta protocols is now used to evaluate and optimize models. It contains mainly van der Waals attractive and repulsive terms, and in addition accounts for solvation and hydrogen bonding. We then use Monte Carlo with minimization (MCM) to search for the minimum-energy conformation in the vast space of possible configurations.
Let's try to find the nearby optimum of a given rigid-body (RB) orientation. For this we make a small change in the RB orientation. This might lead to clashes. Rearrangement of the side chains at the interface can sometimes relieve those clashes. Only then do we further optimize the RB orientation. This mimics conformational changes upon binding at the side-chain level. At the end of such a step, we evaluate whether a better conformation has been obtained.
On the energy landscape, the Monte Carlo cycle looks as follows. Each possible RB orientation of the two monomers can be characterized by its energy and its distance to some arbitrary reference. Each Monte Carlo cycle starts with a small change of the proteins' relative orientation. Instead of minimizing the RB orientation directly, however, we first allow the monomer conformation to adapt to the new orientation by optimizing the side-chain conformations. This can relieve a clash that involves side chains at the interface. It also changes the energy landscape: the energy of a given RB orientation changes if the monomer conformations are different.
We now optimize the RB orientation by steepest-descent minimization. This locates a minimum that might not have been apparent in the previous landscape. Finally, we apply the MC criterion: in addition to downhill moves, moves that increase the energy are accepted with a Boltzmann-dependent probability. Repeated cycles of this procedure allow us to efficiently find the minimum-energy conformation near the starting point.
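
One way to see how the three steps of an MCM cycle fit together is a toy model. Everything in the sketch below is an assumption made for illustration: a single number x stands in for the rigid-body orientation, a list of four numbers stands in for the rotamer library, and the energy function, step size and temperature are invented. It is not the Rosetta energy function or the RosettaDock implementation.

```python
import math
import random

ROTAMERS = [-1.0, 0.0, 1.0, 2.0]  # toy discrete "rotamer library" (hypothetical)

def energy(x, rot):
    """Toy energy of orientation x with side-chain state rot."""
    return (x - 1.0) ** 2 + 2.0 * (rot - x) ** 2

def repack(x):
    """Side-chain optimization: best rotamer at a fixed orientation."""
    return min(ROTAMERS, key=lambda rot: energy(x, rot))

def minimize(x, rot, steps=50, rate=0.1, h=1e-5):
    """Rigid-body minimization by gradient descent with fixed side chains."""
    for _ in range(steps):
        grad = (energy(x + h, rot) - energy(x - h, rot)) / (2 * h)
        x -= rate * grad
    return x

def mcm(x0, cycles=50, step=0.5, kt=0.8, seed=1):
    """Monte Carlo with minimization: perturb, repack, minimize, accept/reject."""
    rng = random.Random(seed)
    x, rot = x0, repack(x0)
    e = energy(x, rot)
    best = (e, x, rot)
    for _ in range(cycles):
        trial_x = x + rng.gauss(0.0, step)      # random perturbation
        trial_rot = repack(trial_x)             # side-chain optimization
        trial_x = minimize(trial_x, trial_rot)  # rigid-body minimization
        trial_e = energy(trial_x, trial_rot)
        # MC criterion: uphill moves accepted with Boltzmann probability.
        if trial_e <= e or rng.random() < math.exp(-(trial_e - e) / kt):
            x, rot, e = trial_x, trial_rot, trial_e
            best = min(best, (e, x, rot))
    return best

start_e = energy(-3.0, repack(-3.0))
best_e, best_x, best_rot = mcm(x0=-3.0)
```

Repacking before minimization matters in this toy model for the same reason as in the lecture: the minimum that the gradient descent finds depends on which side-chain state is in place when it starts.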

Clustering
Compare all top-scoring decoys pairwise.
Cluster decoys hierarchically.
Decoys within e.g. 2.5 Å of each other form a cluster.
Cluster size represents ENTROPY.
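
A distance-cutoff clustering can be sketched in a few lines. The sketch makes assumptions that the slide does not settle: it uses single linkage (two decoys join a cluster if any chain of neighbors within the cutoff connects them), and the Euclidean distance between toy points stands in for the pairwise RMSD between decoys. The name cluster_decoys is hypothetical.

```python
import math

def cluster_decoys(coords, cutoff=2.5):
    """Single-linkage clustering; returns lists of decoy indices, largest first."""
    parent = list(range(len(coords)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all decoys pairwise and merge those within the cutoff.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(coords)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)

# Five toy decoys: three near the origin, two about 10 Å away.
decoys = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
clusters = cluster_decoys(decoys)
```

The largest cluster comes first, so its size can be used directly as the entropy-like ranking criterion mentioned above.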

Assessment 1: Benchmark studies
The benchmark set contains 54 targets for which bound and unbound structures are known.
Bound-Bound: start with the bound complex structure, but remove the side-chain conformations so they must be predicted.
Unbound-Unbound: start with the individually crystallized component proteins in their unbound conformation.
Bound-Unbound (Semibound).
Example targets: α-chymotrypsin + inhibitor, subtilisin + inhibitor, trypsin + inhibitor, barnase + barstar; lysozyme + antibodies, hemagglutinin + antibody, actin + deoxyribonuclease I, subtilisin + prosegment.

Assessment of method on benchmark (54 proteins, Gray et al., 2003)
Overall performance (funnel: 3/5 top-scoring models within 5 Å rmsd):
  Bound docking, perturbation [1]: 42/54
  Unbound docking, perturbation [2]: 32/54
  Unbound docking, global [3]: 28/32
  ...
[1] Three or more of the top five decoys (by score) have rmsd less than 5 Å.
[2] Three or more of the top five decoys (by score) predict more than 25% of the native residue contacts.
[3] The rank of the first cluster with > 25% native residue contacts.
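
The rmsd-based funnel criterion is simple enough to state as code. This is a minimal sketch, assuming that each decoy is given as a (score, rmsd) pair, that lower scores are better, and that "funnel" means at least three of the five top-scoring decoys are within 5 Å; the function name has_funnel and the toy numbers are hypothetical.

```python
def has_funnel(decoys, top_n=5, min_hits=3, rmsd_cutoff=5.0):
    """decoys: list of (score, rmsd in Å) pairs; lower score is better."""
    top = sorted(decoys, key=lambda d: d[0])[:top_n]
    hits = sum(1 for _, rmsd in top if rmsd < rmsd_cutoff)
    return hits >= min_hits

# Toy run: four of the five best-scoring decoys are near-native.
good_run = [(-50.0, 1.2), (-49.0, 2.0), (-48.5, 14.0), (-48.0, 3.1),
            (-47.0, 4.4), (-30.0, 25.0), (-29.0, 18.0)]
# Toy run: only one of the five best-scoring decoys is near-native.
bad_run = [(-50.0, 20.0), (-49.0, 2.0), (-48.5, 14.0), (-48.0, 13.1),
           (-47.0, 9.4), (-30.0, 1.0)]
funnel_found = has_funnel(good_run)
funnel_missed = has_funnel(bad_run)
```

Note that a near-native decoy with a poor score (the last entry of the second run) does not count, because the criterion only looks at the top of the score ranking.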

Limitation of “rotamer-based” modeling
Near-native model with clash vs. non-native model without clash (residues shown: Trp 172, Trp 215).
Orange and red: native complex; blue: docking model. PDB code: 1CHO.

RosettaDock simulation
One model per simulation: energy vs. RMSD (structural similarity to the starting model). The final model is selected based on energy (and/or sample density).
At the end of a simulation we have optimized a large number of random orientations, each yielding a model that is characterized by its energy (y-axis) and its distance to the starting model (x-axis). The distance is measured as the RMSD between the ligands (the second partner) after the receptor structures are superimposed, or as the same measure calculated for the interface residues only.
In a typical plot, the background contains a large number of different orientations, all with similar energy, while a small number of similar models with significantly better energies cluster together at a similar RMSD value. This is the signature of a likely correct prediction, as will become clear in the following. The conformations that are significantly lower in energy are then further characterized by local optimization.
Plot axes: energy vs. rigid-body orientation (RMSD to arbitrary starting structure, Å).

RosettaDock simulation
Initial search: energy vs. RMSD to arbitrary starting structure (Å).
Refinement: energy vs. RMSD to the starting structure of the refinement (Å).

CAPRI Target 12: Cohesin-Dockerin
Side-chain flexibility is important.
Model quality: 0.27 Å interface rmsd; 87% native contacts; 6% wrong contacts; overall rank 1.
Red, orange: X-ray; blue: model; green: unbound.
Carvalho et al. (2003) PNAS

Details of T12 interface
Dockerin and cohesin. Red, orange: X-ray; blue: model.

Similar landscapes for different Rosetta predictions
Docking energy landscape vs. folding energy landscape.
The energy function describes well the principles underlying the correct structure of monomers and complexes.
It is not trivial that two (or three) quite different questions can be addressed by the same energy function. This indicates that it adequately describes some basic physical principles that govern protein structure and protein-protein interactions.

A Challenging Target RF1-HEMK (T20)
Challenge: large complex; RF1 to be modeled from RF2; disordered Q-loop.
Hope: Q235 methylated; a Gln analog in the HemK crystal.
Strategy: trimming, docking, loop modeling, refining.
Figure labels: Q252, RF1, loop1, loop2, Q-loop, Q-235, HemK.
Keys to success: location of the interface with the truncated protein; separate modeling of the large conformational change in the key loop.

Prediction of large conformational change
Gln235 GLN235 C atom shift:14.13Ǻ to 3.91 Ǻ Q-loop global C rmsd: 11.8 Ǻ to 4.8 Ǻ I_rmsd 2.34 Ǻ F_nat 34.2% Red, orange – bound; Green,– unbound; Blue -- model

Docking with backbone minimization
Docking Monte Carlo minimization (MCM): START -> random perturbation -> repack -> minimization (rigid body, backbone, side chain) -> FINISH.
Fold tree diagram (labels: N, C, 1, 1').
Example 2SNI: interface energy vs. interface RMSD, and number of "hits" in the top 10 models. Red: bound rigid; green: unbound rigid; blue: unbound flexible.

Docking with loop minimization
Minimize rigid-body orientation and loop simultaneously. Fold-tree diagram (labels: 1, 1', C, 2, 2', x).
Flexible docking: correctly predicted loop conformation. Plot: all-atom energy vs. interface RMSD.
Red, orange: bound (1T6G, Sansen, S. et al., J.B.C. (2004)); blue: model; green: unbound (1UKR, Krengel U. et al., JMB (1996)).

Docking with loop rebuilding
Example 1BTH: all-atom energy vs. ligand RMSD for bound rigid, unbound rigid, and unbound flexible-loop docking.

Flexible backbone protein–protein docking using ensembles
Incorporate backbone flexibility by using a set of different templates.
Generation of the ensembles: with the Rosetta relax protocol, from NMR ensembles, etc.
Fig. 1. Ligand ensemble generation and CS. (a) Crystal structures are idealized and then relaxed at low and high resolutions to generate an ensemble of 10 structures. (b) Conformers in an NMR ensemble are idealized and refined at high resolution only. (c) During low-resolution docking, conformers are superposed along the current conformer's interface residues, and a conformer is selected from a partition function using Boltzmann-weighted energies.
Chaudhury & Gray (2008)
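
Selecting a conformer "from a partition function using Boltzmann-weighted energies" can be sketched as weighted random sampling. This is an illustration under assumptions, not the published implementation: the function names, the toy energies and kT = 1.0 are hypothetical.

```python
import math
import random

def boltzmann_weights(energies, kt=1.0):
    """Return p_i = exp(-E_i / kT) / Z for each conformer."""
    e_min = min(energies)  # shift for numerical stability; cancels after normalization
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)       # partition function (up to the constant shift)
    return [w / z for w in weights]

def select_conformer(energies, rng, kt=1.0):
    """Pick a conformer index at random, favoring low-energy conformers."""
    probs = boltzmann_weights(energies, kt)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]

# Toy energies of three conformers placed at the current docking position.
energies = [-5.0, -3.0, -3.0]
probs = boltzmann_weights(energies)
chosen = select_conformer(energies, random.Random(0))
```

Low-energy conformers are chosen most often, but higher-energy ones keep a nonzero chance, so the search can still switch backbone templates.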

Sampling among conformers during docking
Exchange between templates during protocol Fig. 2. The flexible docking algorithm. The low-resolution phase models the formation of an encounter complex, and the high-resolution phase models its transition to a bound complex. CS and CS/IF docking include the CS step (green box). IF and CS/IF docking include backbone minimization (orange box).

Evaluation of 4 different protocols
1. Key-lock (KL) model: rigid-backbone docking
2. Conformer selection (CS) model: ensemble docking algorithm
3. Induced fit (IF) model: energy-gradient-based backbone minimization
4. Combined conformer selection/induced fit (CS/IF) model
Can teach us about the possible binding mechanism (e.g. induced fit vs. key-lock).
Figure: histogram of hit quality (quality of the top-ranked decoy for all runs, with at least 5 of the 10 top-scoring decoys being of medium or high quality) for each docking method for (a) crystal structure targets and (b) NMR targets. CAPRI criteria: high-quality decoys are shown in brown, and medium-quality decoys are shown in orange.

RosettaDock 4.0: improvement due to better sampling and scoring
Adaptive conformer selection (ACS): in ensemble docking, the conformer swapping rate is adapted (few/many acceptances -> more/less sampling).
Motif Dock Score (MDS) replaces the low-resolution centroid score, using the residue-pair transform (RPX) score.
  Parameters: aa1, aa2, and the 6D transformation that superimposes the N-Cα-C backbone atoms of the two residues onto each other (for Cα-Cα distances < 10 Å).
  Scores are looked up in a pre-tabulated database.
Marze et al., Bioinformatics 2018

RosettaDock 4.0 calibration: use ensembles for ligand and receptor
Ensemble composition: 40 backrub, 30 relax, 30 normal-mode conformers.
Result: 8x more near-native models retained after the low-resolution step.
Marze et al., Bioinformatics 2018

RosettaDock 4.0
Improved docking at the low-resolution step (faster), resulting in overall improvement (mainly for difficult cases).
N5: number of near-native models (interface RMSD < 5 Å) among the five top-scoring models.
Marze et al., Bioinformatics 2018

RosettaDock: summary
First program to introduce general (side-chain) flexibility during docking.
Advanced the docking field towards unbiased high-resolution modeling.
Many other protocols have since incorporated RosettaDock as a high-resolution final step.
Targeted introduction of backbone flexibility can improve modeling dramatically.

4. Data-driven docking
Challenges: large conformational space to sample; conformational changes of proteins upon binding.
Approach: restrict the search space using prior information.
HADDOCK (High Ambiguity Driven protein-protein Docking)

Scheme of HADDOCK (Bonvin, JACS 2003)
Information about the complex can be retrieved from several sources.

Haddock computational scheme
Derive Ambiguous Interaction Restraints (AIRs):
  Active residues: involved in the interaction, and solvent accessible
  Passive residues: neighbors of active residues
Create a CNS restraints file (as used in NMR structure determination).
Rationale: include the AIRs in the energy function and find the protein complex structure with minimum energy.
Similar to solving a structure by NMR, or to homology modeling with constraints (e.g. MODELLER).
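
The slide does not show how an ambiguous restraint is evaluated. In the HADDOCK publication cited above, an AIR is treated as a single distance restraint on an effective distance that combines all contributing atom-atom distances by an inverse sixth-power sum, so that the shortest distances dominate. The sketch below illustrates that formula only; the function names and the upper_bound parameter are hypothetical and are not CNS or HADDOCK syntax.

```python
def effective_distance(distances):
    """d_eff = (sum of d**-6 over all contributing atom pairs) ** (-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

def air_satisfied(distances, upper_bound):
    """An AIR counts as satisfied if the effective distance is within the bound."""
    return effective_distance(distances) <= upper_bound

# Toy distances (Å) from one active residue to atoms of candidate partner residues.
far_and_near = [4.0, 20.0, 35.0]
d_eff = effective_distance(far_and_near)
```

Because the sum is dominated by the closest pair, one nearby partner residue is enough to satisfy the restraint, which is what makes it "ambiguous": the restraint does not say which partner residue must be contacted.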

Overview of HADDOCK
Start position
Rigid-body energy minimization:
  Rotational minimization
  Rotational and translational minimization
  Align molecules if anisotropic data are available
  Satisfy the maximum number of AIRs
  Retain the top 200
Semi-flexible simulated annealing (SA):
  High-temperature rigid-body search
  Rigid-body SA
  Semi-flexible SA with flexible side chains at the interface
  Semi-flexible SA with fully flexible interface (both backbone and side chains)
Flexible explicit-solvent refinement: improves energy ranking
Clustering
Predictions

5. Template-based docking: COTH
Szilágyi & Zhang (2014). Current Opinion in Structural Biology, 24:10 Mukherjee & Zhang (2011). Structure 19:955

Docking – Summary & Outlook
Efficient search using fast sampling techniques (e.g. FFT, geometric hashing), and/or
restraints to the relevant region (e.g. biological constraints, etc.).
Or: skip this and perform template-based docking.
Challenge: conformational changes in the partners.
Introduction of flexibility improves model quality:
  Full side-chain flexibility (Rosetta)
  Targeted introduction of backbone flexibility
  Larger changes can be incorporated using techniques such as Normal Mode Analysis
