Ligand Building with ARP/wARP

Automated Model Building
Given the native X-ray diffraction data and a phase set: rapidly deliver a complete, accurate and error-free model.

Building Ligands from Dummy Atoms / Seed Points
Back to about 2000: a side project for a PhD student.

Nearest Neighbour Distance Distribution
Given a coordinate error, the inter-atomic distances in a protein model change.

Building a Ligand into a Difference Map
Fit that into that!
Imagine:
- a ligand consisting of N atoms
- a density map containing M points
The only thing to do is to correctly select N out of M!
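
To make the "select N out of M" statement concrete, here is a minimal Python sketch that counts the candidate selections. The function name `count_selections` is hypothetical, and whether the slides count unordered selections or ordered atom-to-point labellings is not stated, so the sketch simply reports both.

```python
import math

def count_selections(m_points, n_atoms):
    """Count the ways to place n_atoms ligand atoms on m_points map points.

    Returns (unordered, ordered):
      unordered ignores which atom sits on which point,
      ordered counts every distinct atom-to-point labelling.
    """
    return math.comb(m_points, n_atoms), math.perm(m_points, n_atoms)

# Toy case: pick 3 points out of 4.
unordered, ordered = count_selections(4, 3)
print(unordered, ordered)

# The count grows very quickly with the number of map points
# (illustrative sizes, not taken from the slides).
for m in (10, 50, 100):
    print(m, f"10^{math.log10(math.perm(m, 8)):.1f} labellings of 8 atoms")
```

Even for modest sizes the count is far too large for exhaustive enumeration, which is why prior knowledge about the ligand is needed to prune the search.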

A Simple Example: Select 3 out of 4
The task is to find an equilateral triangle among the points a, b, c, d.
Prior knowledge: edges should have a length of 1.0 Å.
Reliability: error on data (distances) is 0.01 Å.

Upper triangle: measured distances. Lower triangle: deviation from 1.0 Å in units of the 0.01 Å error.

        a       b        c        d
a       0       1.07 Å   0.98 Å   1.01 Å
b       7       0        … Å      2.10 Å
c       2       15       0        … Å
d       1       110      5        0

Triangle   Log likelihood   Probability
abc *      …                …
abd        …                …
bcd        …                …
acd        …                …
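
The log-likelihood and probability values for the four triangles did not survive in the transcript. As a hedged illustration only, the sketch below scores each triangle from the deviations listed in the table (7, 2, 1, 15, 110, 5, in units of the 0.01 Å error), assuming independent Gaussian errors on the distances with constant terms dropped. That error model, and the names `deviation_sigma` and `log_likelihood`, are assumptions and not necessarily what the slides used.

```python
from itertools import combinations
import math

# Deviation of each measured edge from the expected 1.0 Å,
# in units of the 0.01 Å error (lower triangle of the table).
deviation_sigma = {
    frozenset("ab"): 7, frozenset("ac"): 2, frozenset("ad"): 1,
    frozenset("bc"): 15, frozenset("bd"): 110, frozenset("cd"): 5,
}

def log_likelihood(triangle):
    """Assumed model: independent Gaussian errors, constants dropped."""
    edges = combinations(triangle, 2)
    return -0.5 * sum(deviation_sigma[frozenset(e)] ** 2 for e in edges)

# Score all four candidate triangles.
scores = {"".join(t): log_likelihood(t) for t in combinations("abcd", 3)}
best = max(scores, key=scores.get)

# Normalise to probabilities over the four candidates
# (shift by the maximum to avoid numerical underflow of the best score).
top = scores[best]
weights = {t: math.exp(s - top) for t, s in scores.items()}
total = sum(weights.values())
probabilities = {t: w / total for t, w in weights.items()}

for name in scores:
    print(name, scores[name], probabilities[name])
```

Under these assumptions the triangle acd, whose three edges all lie within 5 error units of 1.0 Å, dominates the other candidates.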

Ligand Building as a Label Swapping Problem
N atoms in the ligand molecule, M points in a density map (figure labels: WXYZ, ABCD).
Sources of possible prior information:
- Chemical composition of a ligand
- Bonding distances
- Angle-bonded distances
- Chirality
- VdW interactions
Combinatorial explosion

Label Swapping
22-atom molecule of retinoic acid
Initial map: 349 grid points, complexity 10^59
Sparse map: 58 grid points, complexity …
Topological extension (a branch and bound approach)
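
The slides do not show how the topological extension works internally. The sketch below is not the ARP/wARP implementation. It is a simple pruned depth-first search written in the same spirit: atoms are placed on map points one at a time, and a branch is abandoned as soon as a bonded pair is not at a plausible bond distance. The function name `extend_assignments`, the single `bond_length` and the tolerance `tol` are all illustrative assumptions.

```python
import math

def extend_assignments(bonds, n_atoms, points, bond_length=1.5, tol=0.2):
    """Enumerate atom -> point assignments consistent with the bond list.

    bonds  : list of (i, j) atom index pairs that are bonded
    points : list of (x, y, z) map point coordinates
    A partial assignment is pruned as soon as one bonded pair lies
    further than tol from bond_length.
    """
    neighbours = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    results = []

    def consistent(atom, point, assigned):
        # Check only bonds to atoms that already have a point.
        for other in neighbours[atom]:
            if other in assigned:
                d = math.dist(points[point], points[assigned[other]])
                if abs(d - bond_length) > tol:
                    return False
        return True

    def recurse(atom, assigned, used):
        if atom == n_atoms:
            results.append(dict(assigned))
            return
        for p in range(len(points)):
            if p in used or not consistent(atom, p, assigned):
                continue  # prune this branch
            assigned[atom] = p
            used.add(p)
            recurse(atom + 1, assigned, used)
            del assigned[atom]
            used.remove(p)

    recurse(0, {}, set())
    return results

# Toy case: a 3-atom chain and 4 map points, one of them far away.
chain_bonds = [(0, 1), (1, 2)]
map_points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
solutions = extend_assignments(chain_bonds, 3, map_points)
print(solutions)
```

In the toy case only the two orientations of the chain along the three collinear points survive, out of 24 possible labellings. A real implementation would also use the other priors listed on the slides, such as angle-bonded distances and chirality.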

Retinoic acid - topological extension
Panels: topology of the sparse map; topology of the ligand.

Real Space Fit for Final Selection of the Model
22-atom molecule of retinoic acid. Among the 100 "top" models:
- 21 are less than 0.5 Å r.m.s.d. from the final model
- the "best" model is 0.14 Å r.m.s.d. from the final model
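
For reference, the r.m.s.d. figures quoted above compare atomic positions of a candidate model with those of the final model. The sketch below pairs atoms by name and assumes both models are already in the same crystal frame, so no superposition is applied. The function name `rmsd` and the toy coordinates are illustrative and not taken from the slides.

```python
import math

def rmsd(model, reference):
    """Root-mean-square deviation over atoms paired by name.

    model, reference : dict mapping atom name -> (x, y, z) in Å
    Only atoms present in both dictionaries are compared.
    """
    common = sorted(set(model) & set(reference))
    if not common:
        raise ValueError("no atoms in common")
    total = sum(math.dist(model[n], reference[n]) ** 2 for n in common)
    return math.sqrt(total / len(common))

# Toy two-atom example with made-up coordinates.
reference = {"C1": (0.0, 0.0, 0.0), "C2": (1.5, 0.0, 0.0)}
model = {"C1": (0.0, 0.3, 0.0), "C2": (1.5, 0.0, 0.4)}
value = rmsd(model, reference)
print(f"{value:.3f} Å")
```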

Ligand Building Module in ARP/wARP 6.1
Input: MTZ file, protein without ligand, ligand.
- Take the largest object in the difference map
- Build the ligand there (label assignment)
- Real space refinement of the ligand

Ligand Building Module in ARP/wARP 6.1
Columns: location unknown / location known
- Single known ligand: Yes (if the largest) / No
- A ligand out of the list of expected ligands: No
- Partially ordered ligand: No

Large-Scale Test
Working sample (3821 structures):
- PDB and MTZ from the EDS
- Ligand PDB from HICUP
- Exclude DNA
- Exclude ligands covalently bound to the chain
- Exclude ligands with partial occupancies
Ligand building: run with default parameters.
Performance assessment: assume the PDB structure to be correct; compare name-by-name and by nearest neighbour.

Accuracy of Ligand Building Process
- Atomic scale: ligand correctly built into the correct site
- Ligand scale: correct site, incorrectly built ligand
- Protein scale: incorrect site

Size of the Largest Ligand in the Working Sample
2981 structures with ligand size …; … structures

Dependence on Resolution of the Data

Dependence on Ligand Disorder B factors

Dependence on Ligand Disorder R.m.s.d (Ligand_Bfactors)

Dependence on Ligand Size

What is the Ligand Site / Largest Object?
Typically it is the largest set (cluster) of connected map points where the density is above a threshold. In most cases, however, different thresholds give different (and even non-overlapping) clusters.
- Take the largest object in the difference map
- Build the ligand there (label assignment)
- Real space refinement of the ligand

Density Clusters and a Fragmentation Tree
At each density threshold, count the number of clusters. A maximum is typically reached at a density level of ~1.5 sigma.
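
The cluster counting described above can be sketched as a flood fill over grid points above a threshold, repeated at several thresholds. This is a minimal illustration and not the ARP/wARP code. The function name `clusters_above`, the 6-neighbour connectivity and the toy density values are assumptions.

```python
from collections import deque

def clusters_above(density, threshold):
    """Connected clusters of grid points with density above threshold.

    density : dict mapping (i, j, k) grid index -> density value
    Points are connected if they share a face (6-connectivity, assumed).
    Returns a list of sets of grid indices, largest cluster first.
    """
    selected = {p for p, rho in density.items() if rho > threshold}
    clusters = []
    while selected:
        seed = selected.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for n in ((i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                      (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)):
                if n in selected:
                    selected.remove(n)
                    cluster.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

# Toy map: five points in a row with made-up density values.
toy_map = {(i, 0, 0): rho for i, rho in enumerate([2.0, 1.0, 2.0, 0.5, 3.0])}

# Count clusters at a series of thresholds, as in a fragmentation tree.
counts = {t: len(clusters_above(toy_map, t)) for t in (0.4, 0.8, 1.5, 2.5)}
print(counts)
```

Raising the threshold first splits one large object into several clusters and then makes the clusters vanish, so the cluster count passes through a maximum. The first element of the returned list is the "largest object" at that threshold.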

Fragmentation Tree: an Example
1ED5 (nitric oxide synthase), 1.8 Å resolution, R factor 21 % (with CNS).
Ligands: 2 x HEM and NGR (N-omega-nitro-L-arginine).

Scoring of Density Clusters
Panels:
- Looking for HEM, finding HEM
- Looking for NGR, finding NGR
- Looking for NGR, finding HEM
- Looking for HEM, finding NGR

Selection of Correct Density Cluster

Other Lessons?
- Take the largest object in the difference map
- Build the ligand there (label assignment)
- Real space refinement of the ligand

Ligand Building: ARP/wARP 6.1 and perspectives
Columns: location unknown / location known (values in the order listed on the slide)
- Single known ligand: Yes (if the largest); Yes; No; Yes
- A ligand out of the list of expected ligands: No; Yes; No; Yes
- Partially ordered ligand: No; May be

ARP/wARP - the people
Developers
EMBL Hamburg: Guillaume Evrard, Johan Hattne, Gerrit Langer, Venkat Parthasarathy, Tilo Strutz, Victor Lamzin and many in-house friends
NKI Amsterdam: Serge Cohen, Diederick De Vries, Marouane Jelloul, Krista Joosten, Tassos Perrakis
Former members and collaborators: Richard Morris, Peter Zwart, Francisco Fernandez, Olga Kirillova, Matheos Kakaris, Gleb Bourenkov, Garib Murshudov, Alexei Vagin, Andrey Lebedev, Peter Briggs, Eleanor Dodson, Keith Wilson, Zbyszek Dauter, Gerard Kleywegt