Ligand Building with ARP/wARP
Automated Model Building
Given: the native X-ray diffraction data and a phase set
Goal: to rapidly deliver a complete, accurate and error-free model
Building Ligands from Dummy Atoms / Seed Points
Back to about 2000: a side project for a PhD student
Nearest Neighbour Distance Distribution
Given a coordinate error, the inter-atomic distances in a protein model change.
Building a Ligand into a Difference Map
Fit that into that!
Imagine:
 - a ligand consisting of N atoms
 - a density map containing M points
The only thing to do is to correctly select N out of M!
A Simple Example: Select 3 out of 4
The task is to find an equilateral triangle among four points a, b, c, d
Prior knowledge: edges should have a length of 1.0 Å
Reliability: error on data (distances) is 0.01 Å
Distance table (legible entries): a-b 1.07 Å, a-c 0.98 Å, a-d 1.01 Å; residue of the remaining entries: b 7 Å 2.10 Å, c 2 15 Å, d 1 110 5 0
Candidate triangles, each scored by log likelihood and probability: abc *, abd, bcd, acd
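The triangle selection can be sketched as a likelihood ranking over all triples of points. This is a minimal illustration, assuming independent Gaussian errors on each edge length with the stated 0.01 Å standard deviation. The a-b, a-c and a-d distances are the legible values from the slide; the b-c, b-d and c-d values are hypothetical placeholders, because those table entries are truncated in the source. The function names are invented for this sketch.

```python
import math
from itertools import combinations

TARGET = 1.0   # prior knowledge: expected edge length (Å)
SIGMA = 0.01   # stated error on the distances (Å)

# a-b, a-c, a-d are legible on the slide; b-c, b-d, c-d are
# HYPOTHETICAL placeholders chosen only to make the sketch runnable.
DIST = {
    ("a", "b"): 1.07, ("a", "c"): 0.98, ("a", "d"): 1.01,
    ("b", "c"): 0.97, ("b", "d"): 2.10, ("c", "d"): 1.05,
}

def log_likelihood(triangle):
    """Gaussian log likelihood of the three edges (constant term dropped)."""
    return sum(-0.5 * ((DIST[edge] - TARGET) / SIGMA) ** 2
               for edge in combinations(triangle, 2))

def score_triangles(points="abcd"):
    """Return {triangle: (log likelihood, normalized probability)}."""
    scores = {"".join(t): log_likelihood(t) for t in combinations(points, 3)}
    best = max(scores.values())
    # Subtract the best score before exponentiating to avoid underflow.
    weights = {name: math.exp(s - best) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: (scores[name], weights[name] / total) for name in scores}

if __name__ == "__main__":
    for name, (ll, p) in score_triangles().items():
        print(f"{name}: log L = {ll:9.1f}, P = {p:.3g}")
```

Because the placeholder distances are invented, the ranking printed by this sketch says nothing about the ranking shown on the original slide; it only shows how a small distance error translates into a large difference in probability.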
Ligand Building as a Label Swapping Problem
N atoms in the ligand molecule, M points in a density map (labels shown: W, X, Y, Z and A, B, C, D)
Sources of possible prior information:
–Chemical composition of a ligand
–Bonding distances
–Angle-bonded distances
–Chirality
–VdW interactions
Combinatorial explosion
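The size of the label swapping search space can be estimated with a short calculation. This sketch assumes the simplest counting convention: every ligand atom is distinguishable and each map point carries at most one atom, so the count is the number of ordered selections of N points out of M. The slides may count differently, and the 20-atom / 300-point case is a hypothetical illustration, not a number from the talk.

```python
import math

def n_label_assignments(n_atoms, m_points):
    """Ordered selections: ways to give n distinct atom labels to m points."""
    return math.perm(m_points, n_atoms)

def n_point_subsets(n_atoms, m_points):
    """Unordered selections: ways to pick which n of the m points are used."""
    return math.comb(m_points, n_atoms)

if __name__ == "__main__":
    # The triangle example: 3 atoms, 4 candidate points.
    print(n_point_subsets(3, 4), n_label_assignments(3, 4))
    # A hypothetical mid-sized case to show the explosion.
    big = n_label_assignments(20, 300)
    print(f"about 10^{math.log10(big):.1f} labelled assignments")
```

Prior information such as bonding distances and chirality is what makes this space searchable, since most assignments violate at least one restraint and can be discarded early.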
Label Swapping
22-atom molecule of retinoic acid
Initial map: 349 grid points, complexity 10^59
Sparse map: 58 grid points, complexity …
Topological Extension (a branch and bound approach)
Retinoic acid - topological extension
Figure panels: topology of the sparse map; topology of the ligand
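Matching the ligand topology onto the sparse-map topology can be illustrated with a small depth-first search that extends a partial assignment along bonds and abandons branches that cannot improve on the best complete solution. This is a generic branch and bound sketch, not the ARP/wARP implementation: the function name `build_ligand`, the single ideal bond length of 1.5 Å and the 0.4 Å tolerance are all assumptions made for the example.

```python
import math

def build_ligand(bonds, n_atoms, points, ideal=1.5, tol=0.4):
    """Assign each ligand atom to a distinct map point.

    bonds  : list of (i, j) atom index pairs (ligand topology)
    points : list of (x, y, z) sparse-map coordinates in Å
    Returns (assignment dict atom -> point index, cost); the cost is the
    sum of squared deviations of bonded distances from `ideal`.
    """
    nbrs = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # Visit atoms so that each new atom is bonded to an earlier one.
    order = [0]
    while len(order) < n_atoms:
        for a in range(n_atoms):
            if a not in order and nbrs[a] & set(order):
                order.append(a)
                break
        else:
            raise ValueError("ligand graph must be connected")

    best = {"cost": math.inf, "assignment": None}

    def extend(depth, assignment, used, cost):
        if cost >= best["cost"]:          # bound: cannot beat the best so far
            return
        if depth == n_atoms:              # complete assignment found
            best["cost"], best["assignment"] = cost, dict(assignment)
            return
        atom = order[depth]
        placed = [b for b in nbrs[atom] if b in assignment]
        for p in range(len(points)):      # branch over unused points
            if p in used:
                continue
            extra, ok = 0.0, True
            for b in placed:
                d = math.dist(points[p], points[assignment[b]])
                if abs(d - ideal) > tol:  # not a plausible bond: prune
                    ok = False
                    break
                extra += (d - ideal) ** 2
            if ok:
                assignment[atom] = p
                used.add(p)
                extend(depth + 1, assignment, used, cost + extra)
                del assignment[atom]
                used.discard(p)

    extend(0, {}, set(), 0.0)
    return best["assignment"], best["cost"]
```

For a three-atom chain and four points, `build_ligand([(0, 1), (1, 2)], 3, [(0, 0, 0), (1.5, 0, 0), (3.0, 0, 0), (10, 10, 10)])` places the chain on the three collinear points and leaves the distant point unused. A real scoring function would also use angle-bonded distances, chirality and density values.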
Real Space Fit for Final Selection of the Model
22-atom molecule of retinoic acid, among the 100 “top” models:
 - 21 are less than 0.5 Å r.m.s.d. from the final model
 - the “best” model is 0.14 Å r.m.s.d. from the final model
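The r.m.s.d. figures quoted above compare candidate models with the final model atom by atom. A minimal sketch of that comparison, assuming both coordinate sets list the same atoms in the same order and already share the crystal frame (so no superposition is applied); the function names and the 0.5 Å default are taken from the slide text or invented for illustration:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def count_within(models, reference, cutoff=0.5):
    """Number of candidate models closer than `cutoff` Å r.m.s.d. to the reference."""
    return sum(1 for m in models if rmsd(m, reference) < cutoff)
```

Ligands with internal symmetry would need the comparison repeated over equivalent atom namings, which this sketch does not attempt.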
Ligand Building Module in ARP/wARP 6.1
Input: MTZ file, protein without ligand, ligand
 - Take the largest object in the difference map
 - Build the ligand there (label assignment)
 - Real space refinement of the ligand
Ligand Building Module in ARP/wARP 6.1
Columns: Location unknown | Location known
Single known ligand: Yes (if the largest) | No
A ligand out of the list of expected ligands: No
Partially ordered ligand: No
Large-Scale Test
Working sample:
 - PDB and MTZ from the EDS
 - Ligand PDB from HICUP
 - Exclude DNA
 - Exclude ligands covalently bound to the chain
 - Exclude ligands with partial occupancies (3821 structures)
Ligand building: run with default parameters
Performance assessment: name-by-name, nearest neighbour; assume the PDB structure to be correct
Accuracy of Ligand Building Process
 - Atomic scale (correctly built ligand in the correct site)
 - Ligand scale (correct site, incorrectly built ligand)
 - Protein scale (incorrect site)
Size of the Largest Ligand in the Working Sample
2981 structures with …
Plot axes: ligand size, structures
Dependence on Resolution of the Data
Dependence on Ligand Disorder: B factors
Dependence on Ligand Disorder: r.m.s.d. (ligand B factors)
Dependence on Ligand Size
What is the Ligand Site / Largest Object?
Typically it is the largest set (cluster) of connected map points where the density is above a threshold.
In most cases, however, different thresholds give different (and even non-overlapping) clusters.
 - Take the largest object in the difference map
 - Build the ligand there (label assignment)
 - Real space refinement of the ligand
Density Clusters and a Fragmentation Tree
At each density threshold, count the number of clusters. A maximum is typically reached at a density level of ~1.5 sigma.
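Counting clusters per threshold can be sketched with a flood fill over the map grid. This is a minimal illustration under several assumptions: the map is given as a dictionary from integer grid indices to density values in sigma units, points are connected through their six face neighbours, and crystallographic symmetry and cell periodicity are ignored. The function names are invented for the example.

```python
from collections import deque

# Six face neighbours of a grid point (an assumed connectivity rule).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def clusters_above(density, threshold):
    """Connected clusters of grid points with density > threshold, largest first.

    density: dict mapping (i, j, k) grid indices to density values.
    """
    remaining = {p for p, rho in density.items() if rho > threshold}
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = [seed], deque([seed])
        while queue:                       # breadth-first flood fill
            i, j, k = queue.popleft()
            for di, dj, dk in STEPS:
                q = (i + di, j + dj, k + dk)
                if q in remaining:
                    remaining.remove(q)
                    cluster.append(q)
                    queue.append(q)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

def fragmentation_profile(density, thresholds):
    """List of (threshold, number of clusters) pairs."""
    return [(t, len(clusters_above(density, t))) for t in thresholds]
```

The first element returned by `clusters_above` is the largest object at that threshold. Tracking how each cluster splits as the threshold rises is what would turn the profile into a fragmentation tree; that bookkeeping is left out here.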
Fragmentation Tree: an Example
1ED5 (nitric oxide synthase), 1.8 Å resolution, R factor 21 % (with CNS)
Ligands: 2 x HEM and NGR (N-omega-nitro-L-arginine)
Scoring of Density Clusters
Figure panels: looking for HEM, finding HEM; looking for NGR, finding NGR; looking for NGR, finding HEM; looking for HEM, finding NGR
Selection of Correct Density Cluster
Other Lessons?
 - Take the largest object in the difference map
 - Build the ligand there (label assignment)
 - Real space refinement of the ligand
Ligand Building: ARP/wARP 6.1 and perspectives
Columns: Location unknown | Location known
Single known ligand: Yes (if the largest) / Yes / No / Yes
A ligand out of the list of expected ligands: No / Yes / No / Yes
Partially ordered ligand: No / Maybe
ARP/wARP - the people
Developers
EMBL Hamburg: Guillaume Evrard, Johan Hattne, Gerrit Langer, Venkat Parthasarathy, Tilo Strutz, Victor Lamzin and many in-house friends
NKI Amsterdam: Serge Cohen, Diederick De Vries, Marouane Jelloul, Krista Joosten, Tassos Perrakis
Former members and collaborators: Richard Morris, Peter Zwart, Francisco Fernandez, Olga Kirillova, Matheos Kakaris, Gleb Bourenkov, Garib Murshudov, Alexei Vagin, Andrey Lebedev, Peter Briggs, Eleanor Dodson, Keith Wilson, Zbyszek Dauter, Gerard Kleywegt