Comparative Evaluation of 11 Scoring Functions for Molecular Docking
Authors: Renxiao Wang, Yipin Lu and Shaomeng Wang
Presented by Florian Lenz
Today's Docking Programs
1. Sampling
2. Selecting
Scoring functions are needed for both:
– Guiding the sampling
– Evaluating the results
Previous Studies
Earlier studies compared combinations of docking programs and scoring functions.
– If one combination fails, who is to blame: the scoring function, the docking program, or the combination?
– Even if all functions are tested under the same conditions, an unmonitored sampling process could yield inadequate samples.
Solution
Use only ONE docking program, with a wide range of parameters
Monitor the sampling results
100 different complexes
Three kinds of tests:
– Reproduce experimentally determined structures
– Reproduce experimentally determined binding affinities
– Describe a funnel-shaped energy surface
Selecting the test cases
Starting point: 230 complexes
Only those with a resolution better than 2.5 Å are used (172)
A diverse ensemble is then created from these (100)
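"Better than 2.5 Å" means a numerically smaller resolution value. A minimal Python sketch of this first filtering step, assuming each complex is a hypothetical `Complex` record holding an identifier and a resolution in Å; how the diverse set of 100 is then chosen is not described on the slide and is not shown:

```python
from dataclasses import dataclass


@dataclass
class Complex:
    """Hypothetical record for one protein-ligand complex."""
    pdb_id: str
    resolution: float  # crystallographic resolution in Å; smaller is better


def high_resolution(complexes, cutoff=2.5):
    """Keep complexes whose resolution is better (numerically smaller) than cutoff Å."""
    return [c for c in complexes if c.resolution < cutoff]


# Placeholder entries, not real PDB data
candidates = [Complex("A", 1.8), Complex("B", 2.5), Complex("C", 3.1)]
print([c.pdb_id for c in high_resolution(candidates)])  # ['A']
```

Note the strict comparison: a structure at exactly 2.5 Å is not "better than" 2.5 Å and is dropped.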
Sampling
AutoDock with a genetic algorithm (GA)
Protein conformation is fixed
Ligand:
– Every rotatable single bond may rotate
– Flexibility of cyclic parts is neglected
– Step sizes: translation 0.5 Å, rotation 15°, torsion 15°
Docking box: 30 × 30 × 30 Å around the observed binding position
For each complex: 100 sampled conformations plus the “real” (experimentally observed) conformation
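To make the step sizes concrete, here is a simplified Python sketch of one random move of a ligand pose. It is not AutoDock's implementation: the `Pose` layout (a centre, three orientation angles as a simplification, one angle per rotatable bond) and the uniform random moves are assumptions; only the numeric limits and the rigid protein come from the slide.

```python
import random
from dataclasses import dataclass, field

# Limits taken from the slide; everything else is an illustrative assumption
MAX_TRANSLATION = 0.5  # Å per move
MAX_ROTATION = 15.0    # degrees per move
MAX_TORSION = 15.0     # degrees per move
BOX_EDGE = 30.0        # Å, cubic box centred on the observed binding position


@dataclass
class Pose:
    center: list        # ligand centre (x, y, z) in Å
    orientation: list   # three rotation angles in degrees (simplified representation)
    torsions: list = field(default_factory=list)  # one angle per rotatable single bond; ring bonds stay fixed


def wrap_angle(angle):
    """Map an angle in degrees into the interval [-180, 180]."""
    return (angle + 180.0) % 360.0 - 180.0


def perturb(pose, box_center, rng):
    """Return a new pose after one random move; the protein is rigid and never changes."""
    half = BOX_EDGE / 2
    center = []
    for coord, box_coord in zip(pose.center, box_center):
        moved = coord + rng.uniform(-MAX_TRANSLATION, MAX_TRANSLATION)
        # Clamp the ligand centre so it stays inside the docking box
        center.append(min(max(moved, box_coord - half), box_coord + half))
    orientation = [wrap_angle(a + rng.uniform(-MAX_ROTATION, MAX_ROTATION)) for a in pose.orientation]
    torsions = [wrap_angle(t + rng.uniform(-MAX_TORSION, MAX_TORSION)) for t in pose.torsions]
    return Pose(center, orientation, torsions)


rng = random.Random(42)
start = Pose([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [170.0, -60.0])
moved = perturb(start, box_center=[0.0, 0.0, 0.0], rng=rng)
print(moved)
```

Only the ligand's centre, orientation and acyclic torsions change; this mirrors the slide's choice of a fixed protein and rigid rings.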
Monitoring and Repetition
The aim is not to find the energy minimum, but to create a diverse test set:
– RMSD values must cover a wide range (0 to 15 Å)
– Number of clusters between 30 and 70
– Enough results near the “real” position, and meaningful conformations
Key parameter: length of the GA runs
– Too short: results stay too close to the initial position
– Too long: results accumulate in very few clusters
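The monitoring criteria can be checked automatically once each pose is available as atom coordinates. A minimal Python sketch, assuming poses are lists of (x, y, z) tuples in a shared frame; the greedy clustering method, the 2.0 Å clustering and near-native cutoff, the minimum of 5 near-native poses, and reading "0 to 15 Å" as "some pose reaches 15 Å" are all hypothetical choices, while the 30 to 70 cluster range comes from the slide. The "meaningful conformations" criterion is not checked.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD in Å between two poses of the same ligand (equal-length lists of (x, y, z),
    atoms in the same order). No superposition: all poses share the protein's frame."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = sum((p - q) ** 2 for a, b in zip(pose_a, pose_b) for p, q in zip(a, b))
    return math.sqrt(total / len(pose_a))


def count_clusters(poses, cutoff=2.0):
    """Greedy leader clustering: a pose joins the first cluster whose leader lies
    within `cutoff` Å; otherwise it starts a new cluster."""
    leaders = []
    for pose in poses:
        if not any(rmsd(pose, leader) <= cutoff for leader in leaders):
            leaders.append(pose)
    return len(leaders)


def ensemble_is_acceptable(native, poses, cutoff=2.0, max_rmsd_needed=15.0,
                           cluster_range=(30, 70), min_near_native=5):
    """Check RMSD spread, cluster count and near-native coverage (hypothetical thresholds)."""
    rmsds = [rmsd(p, native) for p in poses]
    n_clusters = count_clusters(poses, cutoff)
    near_native = sum(1 for r in rmsds if r < cutoff)
    return (max(rmsds) >= max_rmsd_needed
            and cluster_range[0] <= n_clusters <= cluster_range[1]
            and near_native >= min_near_native)
```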
Problems with too long or too short runs
The number of generations has to be determined separately for every complex
If even 200 generations do not lead to a satisfactory result, the complex is discarded
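The per-complex tuning amounts to a simple search over run lengths. A minimal Python sketch, assuming a hypothetical `run_ga(generations)` callback that returns an ensemble and an `is_acceptable(ensemble)` callback that applies the monitoring criteria; the intermediate schedule values are made up, only the 200-generation limit comes from the slide.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

E = TypeVar("E")


def choose_generations(run_ga: Callable[[int], E],
                       is_acceptable: Callable[[E], bool],
                       schedule: Sequence[int] = (25, 50, 100, 150, 200),
                       ) -> Optional[Tuple[int, E]]:
    """Try increasing GA run lengths for one complex.

    Returns (generations, ensemble) for the first run length whose ensemble passes
    the check, or None if even the longest run fails, meaning the complex is discarded."""
    for generations in schedule:
        ensemble = run_ga(generations)
        if is_acceptable(ensemble):
            return generations, ensemble
    return None


# Stand-in GA: pretends that runs of at least 100 generations give a usable ensemble
result = choose_generations(lambda n: {"generations": n}, lambda e: e["generations"] >= 100)
print(result)  # (100, {'generations': 100})
```

Because overly long runs also fail (too few clusters), the check itself must reject over-converged ensembles; scanning upward simply stops at the first length that passes.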
Example of a monitored ensemble
The 11 scoring functions
3 force-field-based: AutoDock, G-Score and D-Score
6 empirical: LigScore, PLP, LUDI, F-Score, ChemScore and X-Score
2 knowledge-based: PMF and DrugScore
1st Test: Docking Accuracy
“How close is the ligand in the best-scored solution to its ‘real’ position?”
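Docking accuracy can be summarised as a success rate: for each complex, take the conformation the function scores best and check whether it lies below 2.0 Å RMSD, the threshold used for the outliers later in the talk. A minimal Python sketch, assuming each ensemble is a list of hypothetical `(score, rmsd)` pairs and that lower scores are better (negate first for functions where higher is better); the numbers are made up:

```python
def best_scored_rmsd(ensemble):
    """RMSD of the top-scored conformation; `ensemble` holds (score, rmsd) pairs."""
    _, rmsd = min(ensemble, key=lambda pair: pair[0])
    return rmsd


def success_rate(ensembles, cutoff=2.0):
    """Fraction of complexes whose top-scored conformation lies below `cutoff` Å RMSD."""
    hits = sum(1 for ens in ensembles if best_scored_rmsd(ens) < cutoff)
    return hits / len(ensembles)


ensembles = [
    [(-10.0, 1.0), (-5.0, 8.0), (-7.5, 0.0)],  # best score picks a near-native pose
    [(-3.0, 0.5), (-9.0, 6.0), (-2.0, 0.0)],   # best score picks a decoy
]
print(success_rate(ensembles))  # 0.5
```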
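Type of Interaction vs. Docking Accuracy
Empirical scoring functions of this type sum weighted interaction terms:
$$\text{Score} = C_{\text{VDW}}\,\mathrm{VDW} + C_{\text{H-bond}}\,\mathrm{HB} + C_{\text{hydrophobic}}\,\mathrm{HS} + C_{\text{rotor}}\,\mathrm{RT} + C_0$$
where VDW, HB, HS and RT are the van der Waals, hydrogen-bond, hydrophobic and rotor (frozen rotatable bond) terms, and the C values are fitted weights.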
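The linear form makes it easy to test one interaction type at a time by switching the other terms off. A minimal Python sketch of evaluating Score = C_VDW·VDW + C_H-bond·HB + C_hydrophobic·HS + C_rotor·RT + C_0; the dictionary keys and all numbers are hypothetical placeholders, not fitted coefficients of any of the 11 functions:

```python
def empirical_score(terms, coefficients, c0=0.0):
    """Weighted sum of interaction terms plus a constant.

    Only the terms named in `coefficients` are used, so a single interaction
    type can be isolated by passing a one-entry dictionary."""
    return c0 + sum(weight * terms[name] for name, weight in coefficients.items())


# Hypothetical term values and weights, for illustration only
terms = {"VDW": 10.0, "HB": 2.0, "HS": 5.0, "RT": 3.0}
weights = {"VDW": 0.1, "HB": 0.5, "HS": 0.2, "RT": -0.1}
full = empirical_score(terms, weights, c0=1.0)
vdw_only = empirical_score(terms, {"VDW": 0.1})
print(full, vdw_only)
```

Ranking an ensemble with `vdw_only`-style scores, then with each other term alone, is one way to see which interaction drives docking accuracy.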
Consensus Scoring
Example: a conformation ranked 1st by X-Score and 7th by LigScore gets the average rank (1 + 7)/2 = 4 under the X-Score + LigScore consensus
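Rank averaging, as in the example, can be sketched in a few lines of Python. The sketch assumes lower scores are better for every function and breaks ties by input order; the score lists are invented numbers named after the two functions only for readability, not real X-Score or LigScore output:

```python
def ranks(scores):
    """1-based ranks, rank 1 = lowest (best) score; ties broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def consensus_ranks(score_lists):
    """Average each conformation's rank over several scoring functions."""
    per_function = [ranks(scores) for scores in score_lists]
    n_conformations = len(score_lists[0])
    return [sum(r[j] for r in per_function) / len(per_function)
            for j in range(n_conformations)]


# Seven conformations; conformation 0 is ranked 1st by the first list and 7th by the second
x_score_like = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0]
ligscore_like = [-1.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0]
print(consensus_ranks([x_score_like, ligscore_like])[0])  # 4.0
```

The consensus pick is the conformation with the lowest average rank.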
2nd Test: Binding Affinity Prediction
Compare the ranking by score with the ranking by experimental binding free energy, using the Spearman rank correlation:
$$R_s = 1 - \frac{6\sum_{j=1}^{n} d_j^2}{n(n^2 - 1)}$$
where d_j is the difference between the rank by score and the rank by free energy for complex j, and n is the number of complexes.
Rs = 1 corresponds to a perfect correlation
Rs = -1 corresponds to a perfect inverse correlation
Rs = 0 corresponds to no correlation (complete disorder)
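For untied ranks, Spearman's coefficient is Rs = 1 − 6 Σ d_j² / (n(n² − 1)). A minimal Python sketch of that formula, assuming no tied values and that both the scores and the experimental free energies are oriented so that more negative means stronger binding (flip the sign of one list otherwise); the example numbers are invented:

```python
def simple_ranks(values):
    """1-based ranks in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation via Rs = 1 - 6*sum(d_j^2) / (n*(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    d_squared = sum((a - b) ** 2 for a, b in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


scores = [-9.1, -7.4, -8.0, -5.2]        # predicted, more negative = stronger (assumed)
delta_g = [-11.0, -8.5, -9.0, -6.1]      # experimental, kcal/mol (invented values)
print(spearman_rs(scores, delta_g))      # 1.0
```

With ties, this shortcut is no longer exact and ranks must be averaged before correlating.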
2nd Test: Binding Affinity Prediction
Best result: X-Score (Rs = …)
…th best result: G-Score (Rs = 0.569)
3rd Test: Funnel-Shaped Energy Surface
The theory stems from protein folding: the ligand is guided by decreasing free energy
Scoring functions should therefore show a correlation between RMSD value and score
How does the ligand reach the binding pocket of the protein?
3rd Test: Funnel-Shaped Energy Surface
Example: PDB entry 1cbx (carboxypeptidase with benzylsuccinate)
X-Score (Rs = 0.877) vs. LigScore (Rs = 0.135)
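For the funnel test, Rs is computed within one ensemble, between each conformation's RMSD and its score. Tied scores can occur, so this Python sketch uses average ranks for ties and the Pearson correlation of those ranks, which is the general form of Spearman's Rs (the d_j² shortcut holds only without ties). It assumes lower scores mean better binding, so a funnel-shaped surface gives Rs close to +1; the example values are invented:

```python
import math


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if spread_a == 0 or spread_b == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (spread_a * spread_b)


def funnel_rs(rmsds, scores):
    """Spearman Rs between RMSD and score for one ensemble of conformations."""
    return pearson(average_ranks(rmsds), average_ranks(scores))


rmsds = [0.0, 1.2, 3.5, 7.8, 12.4]
scores = [-11.0, -10.2, -8.7, -6.0, -6.0]  # the two worst poses tie
print(funnel_rs(rmsds, scores))
```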
Side Result: The Outliers
In seven ensembles, none of the 11 functions was able to pick a conformation with an RMSD below 2.0 Å
Analysis of these cases reveals general problems of today's scoring functions:
– Indirect interactions (1CLA, 2CLA, 3CLA)
– A very shallow groove instead of a binding pocket (1THA, 1RGL, 1TET)
Indirect Interactions
Water molecules are not included in the samples
F-Score predicted that the ligand binds on the protein surface
DrugScore, LigScore and PLP found another small cavity in the protein in which to place the ligand
Very shallow groove
The correct “binding pocket” is found
But the overlap is only partial and the orientation is wrong
Most important results
Empirical functions performed best in docking accuracy
Consensus scoring of the six best functions greatly improves the success rate (above 80%)
Prediction of binding affinities was less encouraging
For some complexes, none of the functions could find a good solution
Thank You