MSDChem and the chemistry of the wwPDB
EMBO, 22nd-26th September 2008, EMBL-EBI, Hinxton, UK
EBI is an Outstation of the European Molecular Biology Laboratory.

The PDB chemical components
- The PDB holds more than the 3-D folding of standard polymers
- It gives insight into interesting special chemistry
  - Bound ligands
  - Modified amino acids
- Non-standard chemical components are often the most interesting
- The PDB ligand dictionary has served for many years as the reference dictionary for the chemical definition of 3-letter codes in the PDB data

The ligand dictionary has been maintained by the curators at all wwPDB sites
- Problems accumulated
  - Duplicate entries
  - Impossible chemistry
  - The definition of what a 3-letter code represents was not clear and consistent
  - Stereochemistry was ignored

The MSDChem database
- The database that supported the chemical component dictionary in the MSD
- The curation team had an explicit, clear definition of a ligand right from the start
  - A distinct stereoisomer: connectivity, bond orders, absolute stereo-descriptors of atoms and bonds
- This was reflected in the design and the implementation of the MSDChem database

MSDChem ligand definition
- The ligand identity
  - Atoms, elements, bonds and bond orders
  - Atom and bond absolute stereo-descriptors (Cahn-Ingold-Prelog)
  - Equivalent to a canonical stereo-SMILES or InChI string
- Example (stereo-descriptors per atom)
  - DCF: C4' R, C3' S, C1' R
  - DCM: C4' S, C3' R, C1' S

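The identity definition above can be sketched as a hashable key. This is a minimal illustration, not the MSDChem implementation: the function `identity_key` and the partial skeleton are hypothetical, and the sketch assumes atom names are already consistent between records. A real comparison would need graph canonicalisation, as in a canonical SMILES or InChI string. Only the stereo-descriptors for DCF and DCM are taken from the slide.

```python
def identity_key(atoms, bonds, stereo):
    """Build a hashable key from the three parts of the ligand identity.

    atoms  : dict mapping atom name -> element symbol
    bonds  : iterable of (atom name, atom name, bond order)
    stereo : dict mapping atom name -> absolute descriptor ('R' or 'S')
    """
    atom_part = frozenset(atoms.items())
    # A bond is undirected, so its two atom names go into a frozenset.
    bond_part = frozenset((frozenset((a, b)), order) for a, b, order in bonds)
    stereo_part = frozenset(stereo.items())
    return (atom_part, bond_part, stereo_part)


# Partial, illustrative skeleton only: the full atom and bond lists are omitted.
atoms = {"C1'": "C", "C3'": "C", "C4'": "C"}
bonds = [("C3'", "C4'", 1)]

# Stereo-descriptors as listed on the slide.
dcf = identity_key(atoms, bonds, {"C4'": "R", "C3'": "S", "C1'": "R"})
dcm = identity_key(atoms, bonds, {"C4'": "S", "C3'": "R", "C1'": "S"})

# Index of known ligands by identity; a new ligand is a duplicate only if
# its key is already present under another code.
known = {dcf: "DCF"}
print(known.get(dcm))  # no match: same skeleton, different stereochemistry
print(known.get(dcf))  # match: identical identity
```

Keying on the full identity means that two records differing only in a stereo-descriptor never collide, which is the behaviour the definition asks for.
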
Other properties
- Atom names, and atom/bond ordering
- Representative coordinates
Derived properties
- Aromatic bonds
- SMILES and InChI strings
- Systematic names
- Idealised coordinates
- Rings and planes
- Atom energy types

Ligand curation
- For known ligands
  - Coordinates are checked against the ligand definition (program DOHLC)
  - Atom labelling is checked
  - A new ligand may have to be defined
- For a new ligand
  - Fundamental properties are checked
  - Derived properties are generated
  - Is it identical to an existing ligand with another code? (DOHLC)
[Figure labels: 3TH; Not possible; New ligand; Actually it is 6CP]

Ligands in the wwPDB
- Improvement of the chemical dictionary
  - A core task of the wwPDB remediation project
  - Remaining issues and data errors were fixed
    - Duplicate identical ligands
    - No representative coordinates
    - Wrong valences
  - The definition of the ligand identity and the deviations were agreed among the wwPDB sites
- The wwPDB invested significantly in this area with a new software toolkit (ChemComp)
  - It replaced most of the MSDChem backend

Additional investment in chemical software
- Use of chemical software packages
  - CACTVS
  - OpenEye
  - CORINA
  - LexiChem
- MSDChem is not a separate data resource
  - It is just a load of the wwPDB ligand dictionary into Oracle
- IUPAC atom names, deoxy-bases, better chemical names

Difficult issues
- Molecules too big to be a single chemical component
- Special chemistry (like metal complexes)
- Limitations of chemical software
- Legacy chemical components that are hard to deal with (like ions)
- Components that have never been fully observed
- Modified components

The MSDChem web application
- Public pages for the wwPDB ligand dictionary
  - Based on an Oracle database load
  - Various search options
  - Visualisation and navigation
  - Export to other formats
- Has been running for almost 6 years
- Is used and referred to by
  - Ligand Depot (RCSB equivalent)
  - ChEBI at EBI
  - PubChem at NCBI
  - HIC-Up and others

Statistics
- Daily average load of MSDChem
  - ~400 queries
  - ~100 distinct IP addresses

Search following references
- Most common case: search for a 3-letter code seen in a PDB file
- Search for a chemical name, or part of it, found in the literature
- All known names are searched
  - Common, PDB
  - Systematic
  - A synonym

MSDChem search
[Screenshot: search form with fields for 3-letter code and chemical name (common, PDB; systematic; a synonym)]

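Searching a name, or part of one, across all known name types can be sketched as a case-insensitive substring match. This is an assumption about the matching rule, not a description of the MSDChem query; the function `search_names` and the records in `example_dictionary` are hypothetical placeholders.

```python
def search_names(query, dictionary):
    """Return (code, name type, name) for every name containing the query."""
    needle = query.strip().lower()
    hits = []
    for code, names in dictionary.items():
        for kind, values in names.items():
            for value in values:
                if needle in value.lower():
                    hits.append((code, kind, value))
    return hits


# Hypothetical records: codes and names are placeholders, not real ligands.
example_dictionary = {
    "AAA": {
        "common": ["example acid"],
        "systematic": ["2-exampleoic acid"],
        "synonym": ["exacid"],
    },
    "BBB": {
        "common": ["sample amine"],
        "systematic": ["1-sampleamine"],
        "synonym": [],
    },
}

for code, kind, name in search_names("Acid", example_dictionary):
    print(code, kind, name)
```

Returning the name type with each hit lets a result list show whether the match came from a common name, a systematic name or a synonym.
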
Ligand details
- For every kind of search there is a result list
  - Summary information
  - Preview icon of the molecule
- Links to pages for every chemical component
  - With detailed images
  - Links for more information about atoms, bonds etc.
  - Various options for 3-D visualisation
  - Download options for common chemical formats

[Screenshots: results overview; ligand overview; ligand details]

Visualisation and export
- Coordinates
  - Ideal
  - Representative
- Chemical formats
  - PDB
  - Molfile (SDF)

Searching for chemical composition
- Often aspects of the composition are known but not the exact structure
  - Like particular elements (metals etc.)
  - Or particular chemical fragments
- User-friendly expression-building pages based on formula or fragments
- Visually browse through the results

Formula range
- The expression can be built with a web form
- Example: O1-4 N3-100 F0
  - 1 to 4 oxygens
  - More than 3 nitrogens
  - No fluorine
  - Anything else

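A formula-range expression like the example above can be checked against a molecular formula with a few lines of code. This is a minimal sketch, not the MSDChem query engine: the functions `parse_formula`, `parse_ranges` and `matches` are hypothetical, the bounds are assumed to be inclusive, and the formulas in the usage lines are made up for illustration.

```python
import re


def parse_formula(formula):
    """Count atoms per element in a formula such as 'C10 H12 N4 O2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def parse_ranges(expression):
    """Turn 'O1-4 N3-100 F0' into {'O': (1, 4), 'N': (3, 100), 'F': (0, 0)}."""
    ranges = {}
    for token in expression.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"cannot parse range token: {token!r}")
        element, low, high = match.groups()
        # A single number means an exact count, so low and high coincide.
        ranges[element] = (int(low), int(high) if high is not None else int(low))
    return ranges


def matches(formula, expression):
    """True if every constrained element count lies within its range.

    Elements not named in the expression are unconstrained ("anything else").
    """
    counts = parse_formula(formula)
    return all(
        low <= counts.get(element, 0) <= high
        for element, (low, high) in parse_ranges(expression).items()
    )


expression = "O1-4 N3-100 F0"
print(matches("C10 H12 N4 O2", expression))     # oxygens and nitrogens in range
print(matches("C10 H12 N4 O2 F1", expression))  # contains fluorine
```

Treating unnamed elements as unconstrained mirrors the "anything else" line of the example.
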
Fragment search
- Web form
- Significant fragments
- Example
  - More than 2 benzimidazoles
  - No piperazine
  - Anything else

Searching for parts of structure
- An outline of the structure, or of some characteristic part, is known
- Looking for variants of molecules
  - Load the known target and remove the unimportant parts
  - Perform a subgraph search
- Looking for chemical components with similar fragments and localised chemistry
  - Load the known target and perform a fingerprint search

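A fingerprint search ranks dictionary entries by how much of their fragment content they share with the target. The slides do not say which similarity measure MSDChem uses; the sketch below swaps in the widely used Tanimoto coefficient on sets of fragment keys. The functions `tanimoto` and `rank_by_similarity`, the codes and the fragment sets are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of fragment keys."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(target_fp, library, threshold=0.5):
    """Return (code, score) pairs at or above the threshold, best first."""
    hits = []
    for code, fingerprint in library.items():
        score = tanimoto(target_fp, fingerprint)
        if score >= threshold:
            hits.append((code, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical library: placeholder codes with made-up fragment sets.
library = {
    "AAA": {"benzene", "amide", "hydroxyl"},
    "BBB": {"benzene", "amide", "carboxyl", "hydroxyl"},
    "CCC": {"piperazine"},
}
target = {"benzene", "amide", "carboxyl"}

for code, score in rank_by_similarity(target, library):
    print(code, score)
```

Unlike a subgraph search, this comparison never inspects connectivity, so it is cheap but only finds components with similar fragment content rather than an exact embedded substructure.
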
Substructure search
- Applet to draw the diagram
- Load and modify an existing ligand
- May take a couple of minutes

Links to the PDB
- MSDChem strictly searches the reference dictionary
- But it provides links to the PDB entries that include a ligand or a set of ligands
  - From ligand details pages
  - And from any query results page
- Links to the summary pages for the entries (MSD Atlas pages)
- Or to instances of the ligands in entries along with their environment and interactions (MSDmotif)

Link to PDB
- From any result page
  - Like a fragment search
- Link to PDB entries with such ligands

Link to binding sites
- Details: interactions of these ligands in entries
- Statistics: search within results

Ligand index and download
- Download of the complete archive
  - Compressed tar of Molfiles (SDF)
  - CML (ChEBI style)
  - MSDChem XML
  - Relational database
- Just listings
  - SMILES strings and names

Summary
- The wwPDB ligand dictionary provides the chemistry of the PDB
- The MSDChem backend has been merged in the remediation project
- The state of the dictionary has improved
- The MSDChem web application provides searching of the dictionary
  - Name
  - Formula
  - Substructure
  - Fragments and similarity