Download presentation
Presentation is loading. Please wait.
1
9. Protein Docking
2
Prediction of protein-protein interactions
How do proteins interact? Can we predict and manipulate those interactions? Prediction of Structure – Docking Prediction of Binding Design – creation of new interactions
3
de novo Structure Prediction (ROSETTA) Docking (ROSETTADOCK)
Docking vs. ab initio modeling de novo Structure Prediction (ROSETTA) Docking (ROSETTADOCK) ADEFFGKLSTKK……. Sequence Monomers + Building Blocks: backbone & side chains Rigid body degrees of freedom 3 translation 3 rotation CASP CAPRI Structure Complex
4
Monomers change structure upon binding to partner
+ = Solution 1: Tolerate clashes Fast Weak discrimination of correct solution + = Solution 2: Model changes + = Slow Precise
5
Protein-protein docking
Sampling strategies Fast detection of shape complementarity Fast Fourier Transform (FFT): Cluspro Geometric hashing: PatchDock High-resolution docking: model changes explicitly 3. Rosettadock Data-driven docking Haddock Template-based docking COTH
6
Find shape complementarity: 1. Fast Fourier Transform (FFT)
Ephraim Katzir +
7
Find shape complementarity - FFT
Ephraim Katzir
8
Find shape complementarity: Fast Fourier Transform (FFT)
Ephraim Katzir Test all possible positions of ligand and receptor: For each rotation of ligand (R) evaluate all translations (T) of ligand grid over receptor grid Y Translation Correlation X Translation Discrete convolution AXB = iFFT(FFT(A)*FFT(B)) z = correlation product: can be calculated by FFT
9
Find shape complementarity: Fast Fourier Transform (FFT)
Ephraim Katzir correlation product can be calculated by FFT: Discrete convolution AXB = iFFT( FFT(A) * FFT(B) ) Discrete convolution AXB = iFFT(FFT(A)*FFT(B))
10
Find shape complementarity: Fast Fourier Transform (FFT)
Ephraim Katzir R R Fast Fourier Transform A=DFT(a) Discretize Correlation function C=A*B S=iDFT(C) L Fast Fourier Transform B=DFT(b) L Rotate L Discretize Surface Interior 1 <0 for R >0 for L From
11
Find shape complementarity: Fast Fourier Transform (FFT)
Y Translation Correlation X Translation Increases the speed by 107 L R IFFT Surface Interior Binding Site From
12
Some FFT-based docking protocols
Zdock (Weng) Cluspro & PIPER (Vajda, Camacho, Kozakov) Molfit (Eisenstein) DOT (TenEyck) HEX (Ritchie) – FFT in rotation space
13
Piper – improved discrimination by cluster size
Low energy conformations will cluster Clusters -> representatives Clusters size ~ “region of attraction” ~ entropic contributions to free energy Pk = Zk / Z, Z = Σj exp(-Ej /RT), Zk = Σj exp(-Ej /RT) Z = N exp(-E /RT), Zk = K exp(-E /RT), Pk = Zk / Z= K / N Kozakov Biophysical J., 2005, 89:867.
14
Piper – improved discrimination by cluster stability
Combine FFT (PIPER) with MC (Rosetta; see later in lecture ): Re-sampling by Monte Carlo Minimization 7 Å Stable cluster Unstable cluster Kozakov. Proteins, 72 :993, 2008
15
Shape complementarity: 2
Shape complementarity: 2. Geometric hashing (patchdock, Wolfson & Nussinov) Matching of puzzle pieces Define geometric patches (concave, convex, flat) Surface patch matching Filtering and scoring From
16
Hashing: alpha shapes Formalizes the idea of “shape”
In 2D an “edge” between two points is “alpha-exposed” if there exists a circle of radius alpha such that the two points lie on the surface of the circle and the circle contains no other points from the point set
17
Hashing – sparse surface representation
Slide from Jens Meiler
18
Docking with geometric hashing
PATCHDOCK (by Dina Schneidmann ) Fast and versatile approach Speed allows easy extension to multiple protein docking, flexible hinge docking, etc A extension of this protocol, FIREDOCK, includes side chain optimization (RosettaDock-like) – very flexible, fast and accurate protocol
19
3. Explicit modeling of conformational changes in the monomers - High-resolution docking with Rosettadock Random Start Position Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Predictions Clustering 105
20
Choosing starting orientations
Euler angles are independent and guarantee non-biased search Global search Random Translation Random Rotation (Euler Angles) STARTING perturbation values: -dock_pert n p r Normal perturbation (A) def 3 Parallel perturbation (A) def 8 Rotation perturbation (degrees) def 8 (gaussian distribution with zero mean and pert std) Low res perturb values: 10,50,0.7,5.0 (10 cycles of 50 steps; 0.7 translational magnitude, 5.0 rotational magnitude) initial magnitudes are adapted acc to success rate (high success: larger magnitudes: *1.1; low success rate (<0.5): smaller values: *0.9 High res perturb values: 50 cycles; 0.1 translation 0.05 rad rotation (around xyz axis) (~ 3degrees) Minimization only if delta score –best <15 Full sc repack every 8th cycle (followed by rtmin) Tilt direction [0..360o] Tilt angle [0:90o] Spin angle [0..360o]
21
Choosing starting orientations
Local Refinement Translation 3Å normal, 8Å parallel Rotation 80 Tilt direction [0±8o] Tilt angle Spin angle
22
Overview of docking algorithm
Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Predictions Clustering 105
23
Low-resolution search
Perturbation Monte Carlo search Rigid body translations and rotations Residue-scale interaction potentials Protein representation: backbone atoms + average centroids Low res perturb values: 10,50,0.7,5.0 (10 cycles of 50 steps; 0.7 translational magnitude, 5.0 rotational magnitude) initial magnitudes are adapted acc to success rate (high success: larger magnitudes: *1.1; low success rate (<0.5): smaller values: *0.9 Mimics physical diffusion process
24
Overview of docking algorithm
Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Predictions Clustering 105
25
High resolution optimization: Monte Carlo with Minimization (MCM)
Cycles of iterative optimization START Random perturbation Side chain optimization Energy Rigid body orientations Random perturbation Side chain optimization Rigid body minimization In the high-resolution refinement step the side chain atoms are added back. Side chains are selected from a rotamer library that contains a discrete set of likely conformations. A full atom energy function calibrated on different Rosetta protocols is now used to evaluate and optimize models. It contains mainly van der waals attractive and repulsive terms, and in addition accounts for solvation and hydrogen bonding. We then use monte-carlo with minimization, in short MCM, to search the minimum energy conformation in the vast space of possible configurations. HANDS Let’s try to find the nearby optimum of a given RB orientation. For this we make a small change in the RB orientation. This might lead to clashes. Rearrangement of the side chains at the interface can sometimes relieve those clashes. Only then we further optimize the RB orientation. This mimics conformational changes upon binding at the side chain level. At the end of such a step, we evaluate whether a better conformation has been obtained. Let’s look at how the monte-carlo cycle that I just described looks on the energy landscape: Each of the possible rigid body orientations of the two monomers can be characterized by its energy, and its distance to some arbitrary reference. Each monte carlo cycle starts with a small change of the protein’s relative orientation in our search for the minimum. Instead of minimizing the RB orientation directly however, we first allow the monomer conformation to adapt to the new orientation, by optimizing the side chain conformations. This allows to relieve a clash that involves the side chains at the interface. This of course also changes the energy landscape: The energy of a given RB orientation is changed if the monomer conformations are different. We now optimize the rb orientation by deepest descent minimization. This locates a minimum that might not have been apparent in the previous landscape. Finally, we apply the MC criterion, accepting in addition to downward moves also moves that increase the energy, by a boltzman-dependent probability. Repeated cycles of this procedure allow us to efficiently find the minimum energy conformation near the starting point. MC Rigid body minimization FINISH
26
Overview of docking algorithm
Random Start Position Filters Low-Resolution Monte Carlo Search High-Resolution Refinement Clustering Predictions 105
27
Clustering Compare all top-scoring decoys pairwise
Cluster decoys hierarchically Decoys within e.g. 2.5Å form a cluster Represents ENTROPY
28
Assessment 1: Benchmark studies
Benchmark set contains 54 targets for which bound and unbound structures are known Bound-Bound Start with bound complex structure, but remove the side chain configurations so they must be predicted α-chymotrypsin + inhibitor subtilisin + inhibitor trypsin + inhibitor barnase + barstar Unbound-Unbound Start with the individually-crystallized component proteins in their unbound conformation Bound-Unbound (Semibound) lysozyme + antibodies hemagglutinin + antibody actin + deoxyribonuclease I subtilisin + prosegment
29
Assessment of method on benchmark (54 proteins, Gray et al., 2003)
Overall performance funnel - 3/5 top-scoring models within 5A rmsd Bound Docking Perturbation1 42/54 Unbound Docking Perturbation2 32/54 Unbound Docking Global3 28/32 …….. More than three of top five decoys (by score) that have rmsd less than 5 Å More than three of top five decoys (by score) that predict more than 25% native residue contacts The rank of the first cluster with >25% native residue contacts
30
Limitation of “rotamer-based” modeling
Near-native model with clash Non-native model without clash Trp 172 Trp 215 Orange and red: native complex; Blue: docking model. PDB code: 1CHO
31
RosettaDock simulation
1 model/simulation: energy vs RMSD (structural similarity to starting model) Final model selected based on energy (and/or sample density) At the end of a simulation, we have optimized a large number of random orientations, each yielding a model that is characterized by its energy (y-axis) and its distance to the starting model (x-axis). The latter is measured as the RMSD between the ligands (the second partner) when the receptor structure is superimposed, or by looking at the same measure calculated for the interface residues only. This is a typical plot. The background contains a large number of different orientations, all with similar energy, while a small number of similar models that contain significantly better energy values are clustered together at a similar RMSD value. This is the signature of a likely correct prediction, as will become clear in the following. The low-energy conformations that are significantly lower in energy are then further characterized by local optimization. Energy Rigid body orientations: RMSD to arbitrary starting structure (Å)
32
RosettaDock simulation
Initial Search Refinement Energy (Å) RMSD to arbitrary starting structure RMSD to starting structure of refinement
33
CAPRI Target 12 Cohesin-Dockerin
Side chain flexibility is important CAPRI Target 12 Cohesin-Dockerin Dockerin 0.27Å interface rmsd 87% native contacts 6% wrong contacts Overall rank 1 Cohesin red,orange– xray blue – model; green – unbound Carvalho et. al (2003) PNAS
34
Details of T12 interface Dockerin Cohesin red,orange– xray
blue - model
35
Similar landscapes for different Rosetta predictions
Docking energy landscape Folding energy landscape Energy function describes well principles underlying the correct structure of monomers and complexes Its not trivial that two (or three) pretty different questions can be addressed by the same energy function, and this indicates that it adequately describes some basic physical principles that govern the protein structure and protein-protein interactions.
36
A Challenging Target RF1-HEMK (T20)
Challenge: Large complex RF1 to be modeled from RF2 Disordered Q-loop Hope: Q235 methylated A Gln analog in HemK crystal Strategy: Trimming – Docking – Loop Modeling - Refining Q252 RF1 loop1 loop2 Q-loop Q-235 HemK Q252 RF1 loop1 loop2 Q-loop Q-235 HemK Q252 Q-235 HemK RF1 loop1 loop2 Q-loop Q252 Q-235 HemK RF1 Keys to success: Location of interface with truncated protein Separate modeling of large conformational change in key loop
37
Prediction of large conformational change
Gln235 GLN235 C atom shift:14.13Ǻ to 3.91 Ǻ Q-loop global C rmsd: 11.8 Ǻ to 4.8 Ǻ I_rmsd 2.34 Ǻ F_nat 34.2% Red, orange – bound; Green,– unbound; Blue -- model
38
Docking with backbone minimization
2SNI N 1 C 1’ Fold tree Interface energy START random perturbation repack minimization FINISH Red: bound rigid Green: unbound rigid Blue: unbound flexible Interface RMSD # of “hits” in top 10 models Rigid-body Backbone Sidechain Docking Monte Carlo Minimization (MCM)
39
Docking with loop minimization
1 1’ C 2 2’ x Fold-tree Minimize rigid-body and loop simultaneously Correctly predicted loop conformation Flexible Docking All-atom energy Interface RMSD Red, orange – bound (1T6G, Sansen, S. et al, J.B.C.(2004)); Blue – model; Green – unbound (1UKR, Krengel U. et al, JMB (1996))
40
Docking with loop rebuilding
Ligand RMSD All-atom energy Bound rigid 1BTH unbound rigid unbound flexible loop
41
Flexible backbone protein–protein docking using ensembles
Incorporate backbone flexibility by using a set of different templates Generation of set of ensembles: with Rosetta relax protocol, from NMR ensembles, etc Fig. 1. Ligand ensemble generation and CS. (a) Crystal structures are idealized and then relaxed at low and high resolutions to generate an ensemble of 10 structures. (b) Conformers in an NMR ensemble are idealized and refined at high resolution only. (c) During low-resolution docking, conformers are superposed along the current conformer's interface residues, and a conformer is selected from a partition function using Boltzmann-weighted energies. Chaudhury & Gray, (2008)
42
Sampling among conformers during docking
Exchange between templates during protocol Fig. 2. The flexible docking algorithm. The low-resolution phase models the formation of an encounter complex, and the high-resolution phase models its transition to a bound complex. CS and CS/IF docking include the CS step (green box). IF and CS/IF docking include backbone minimization (orange box).
43
Evaluation of 4 different protocols
key-lock (KL) model rigid-backbone docking 2. conformer selection (CS) model ensemble docking algorithm induced fit (IF) model energy-gradient-based backbone minimization 4. combined conformer selection/induced fit (CS/IF) model Can teach us about the possible binding mechanism (e.g. induced fit vs key-lock) Histogram of hit quality (quality of the top-ranked decoy for all runs, with at least 5 of the 10 top-scoring decoys being of medium or high quality) for each docking method for (a) crystal structure targets and (b) NMR targets. CAPRI criteria: high-quality decoys are shown in brown, and medium-quality decoys are shown in orange. Brown: high-quality decoys Orange: medium-quality decoys
44
RosettaDock4.0 Improvement due to better sampling and scoring:
Adaptive conformer selection (ACS): Ensemble docking: conformer swapping rate adapted (few/many acceptances more/less sampling) Replace low-resolution centroid score with Motif-Dock Score (MDS, using residue-pair transform score, RPX): Parameters: aa1,aa2, 6D transformation for superimposing N– Cα–C backbone atoms onto each other (for Cα < 10A) Lookup in pre-tabulated database Fig. 2. The flexible docking algorithm. The low-resolution phase models the formation of an encounter complex, and the high-resolution phase models its transition to a bound complex. CS and CS/IF docking include the CS step (green box). IF and CS/IF docking include backbone minimization (orange box). Marze et al. Bioinformatics 2018
45
RosettaDock4.0 Calibration: Use ensembles for ligand and receptor:
Rosetta Rosetta 4.0 Calibration: Use ensembles for ligand and receptor: 40 Backrub 30 relax 30 Normal mode 8x more near-native models retained after low-resolution step Fig. 2. The flexible docking algorithm. The low-resolution phase models the formation of an encounter complex, and the high-resolution phase models its transition to a bound complex. CS and CS/IF docking include the CS step (green box). IF and CS/IF docking include backbone minimization (orange box). Marze et al. Bioinformatics 2018
46
RosettaDock4.0 Improved docking at low-resolution step ( faster), resulting in overall improvement (mainly difficult cases) N5: # of near-native models (interface RMSD <5A) among top5 scoring Fig. 2. The flexible docking algorithm. The low-resolution phase models the formation of an encounter complex, and the high-resolution phase models its transition to a bound complex. CS and CS/IF docking include the CS step (green box). IF and CS/IF docking include backbone minimization (orange box). Marze et al. Bioinformatics 2018
47
RosettaDock - summary First program to introduce general (side chain) flexibility during docking Advanced the docking field towards unbiased high-resolution modeling Many other protocols have since then incorporated RosettaDock as a high-resolution final step Targeted introduction of backbone flexibility can improve modeling dramatically
48
4. Data-driven docking Challenges:
Large conformational space to sample Conformational changes of proteins upon binding Approach: restrict search space by previous information HADDOCK (High Ambiguity Driven protein-protein Docking)
49
Scheme of Haddock Bonvin, JACS 2003
Information about complex can be retrieved from several sources
50
Haddock computational scheme
Derive Ambiguous Interaction Restraints (AIRs): Active residues: involved in interaction, and solvent accessible Passive residues: neighbors of active residues Create CNS restraints file (Used in NMR structure determination) Rational: include AIRs in energy function find protein complex structure with minimum energy Similar to solving a structure by NMR Homology modeling with constraints (e.g. Modeler)
51
Overview of Haddock Start Position Rigid body energy minimization:
rotational minimization rotational & translational Align molecules if anisotropic data is available Satisfy maximum number of AIC Retain top200 Predictions Semi-flexible simulated annealing (SA) High temperature rigid body search Rigid body SA Semi-flexible SA with flexible side-chains at the interface Semi-flexible SA with fully flexible interface (both backbone and side-chains) Explain 105 Flexible explicit solvent refinement Improves energy ranking Clustering
52
5. Template-based docking: COTH
Szilágyi & Zhang (2014). Current Opinion in Structural Biology, 24:10 Mukherjee & Zhang (2011). Structure 19:955
53
Docking – Summary & Outlook
Efficient search using fast sampling techniques (e.g. FFT, Geometric hashing), or/and Restraints to relevant region (e.g. biological constraints, etc) Or: skip this and perform template-based docking Challenge: conformational changes in the partners Introduction of flexibility improves model quality Full side chain flexibility (Rosetta) Targeted introduction of backbone flexibility Larger changes can be incorporated using techniques such as Normal Mode Analysis
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.