EBI is an Outstation of the European Molecular Biology Laboratory. MSDchem and the chemistry of the wwPDB EMBO 22nd-26th September 2008 EMBL-EBI Hinxton.

Presentation transcript:

EBI is an Outstation of the European Molecular Biology Laboratory. MSDchem and the chemistry of the wwPDB EMBO 22nd-26th September 2008 EMBL-EBI Hinxton UK

The PDB Chemical components
- The PDB holds more than the 3-D folding of standard polymers
- It gives insight into interesting special chemistry
  - Bound ligands
  - Modified amino acids
- Non-standard chemical components are often the most interesting
- The PDB ligand dictionary has served for many years as the reference dictionary for the chemical definition of 3-letter codes in the PDB data

- The ligand dictionary has been maintained by the curators at all wwPDB sites
- Problems accumulated
  - Duplicate entries
  - Impossible chemistry
  - The definition of what a 3-letter code represents was not clear and consistent
  - Stereochemistry was ignored

The MSDchem database
- The database that supported the chemical component dictionary in the MSD
- The curation team had an explicit, clear definition of a ligand right from the start
  - A distinct stereoisomer: connectivity, bond orders, absolute stereo-descriptors of atoms and bonds
- This was reflected in the design and implementation of the MSDChem database

MSDchem ligand definition
- The ligand identity
  - Atoms, elements, bonds and bond orders
  - Atom and bond absolute stereo-descriptors (Cahn-Ingold-Prelog)
  - Equivalent to a canonical stereo SMILES or InChI string
- Example: DCF (C4' R, C3' S, C1' R) versus DCM (C4' S, C3' R, C1' S)

- Other properties
  - Atom names, and atom/bond ordering
  - Representative coordinates
- Derived properties
  - Aromatic bonds
  - SMILES and InChI strings
  - Systematic names
  - Idealised coordinates
  - Rings and planes
  - Atom energy types

Ligand curation
- For known ligands, coordinates are checked against the ligand definition (program DOHLC)
  - Atom labelling is checked
  - A new ligand may have to be defined
- For a new ligand
  - Fundamental properties are checked
  - Derived properties are generated
  - Is it identical to an existing ligand with another code? (DOHLC)
- Figure labels: 3TH; Not possible; New ligand; Actually it is 6CP

Ligands in the wwPDB
- Improvement of the chemical dictionary
  - A core task of the wwPDB remediation project
  - Remaining issues and data errors were fixed
    - Duplicate identical ligands
    - No representative coordinates
    - Wrong valences
- The definition of the ligand identity and the deviations were agreed among the wwPDB partners
- The wwPDB invested significantly in this area with a new software toolkit (ChemComp)
  - Replaced most of the MSDChem backend

- Additional investment in chemical software
  - Use of chemical software packages: CACTVS, OpenEye, CORINA, LexiChem
- MSDChem is not a separate data resource
  - Just a load of the wwPDB ligand dictionary into Oracle
- IUPAC atom names, deoxy-bases, better chemical names

Difficult Issues
- Molecules too big to be a single chemical component
- Special chemistry (like metal complexes)
- Limitations of chemical software
- Legacy chemical components that are hard to deal with (like ions)
- Components that have never been fully observed
- Modified components

The MSDChem web application
- Public pages for the wwPDB ligand dictionary
- Based on an Oracle database load
- Various search options
- Visualisation and navigation
- Export to other formats
- Has been running for almost 6 years
- Is used and referenced by
  - Ligand Depot (RCSB equivalent)
  - ChEBI at EBI
  - PubChem at NCBI
  - HIC-Up and others

Statistics
- Daily average load of MSDChem
  - ~400 queries
  - ~100 distinct IP addresses

Search following references
- Most common case: search for a 3-letter code seen in a PDB file
- Search for a chemical name, or part of one, found in the literature
- All known names are searched
  - Common, PDB
  - Systematic
  - A synonym

MSDChem search
- 3-letter code
- Chemical name
  - Common, PDB
  - Systematic
  - A synonym

Ligand details
- For every kind of search there is a result list
  - Summary information
  - Preview icon of the molecule
- Links to pages for every chemical component
  - With detailed images
  - Links for more information about atoms, bonds etc.
  - Various options for 3-D visualisation
  - Download options for common chemical formats

[Figure: screenshots of the results overview, ligand overview and ligand details pages]

Visualisation - Export
- Coordinates
  - Ideal
  - Representative
- Chemical formats
  - PDB
  - Molfile (SDF)

Searching for chemical composition
- Often aspects of the composition are known but not the exact structure
  - Like particular elements (metals etc.)
  - Or particular chemical fragments
- User-friendly expression-building pages based on formula or fragments
- Visually browse through the results

Formula range
- Expression can be built with a web form
- Example: O1-4 N3-100 F0
  - 1 to 4 oxygens
  - At least 3 nitrogens (3 to 100)
  - No fluorine
  - Anything else
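The formula-range example above can be sketched in a few lines of Python. This is only an illustration, not the MSDChem implementation: the function names are hypothetical, and it assumes that bounds are inclusive, that a bare count such as F0 means "exactly that many", and that elements not mentioned in the expression are unconstrained ("anything else").

```python
import re

def parse_formula_range(expression):
    """Parse e.g. 'O1-4 N3-100 F0' into {element: (min_count, max_count)}.

    Assumed syntax (not taken from MSDChem documentation): an element
    symbol followed by either one count or an inclusive 'low-high' range.
    """
    ranges = {}
    for token in expression.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"cannot parse token: {token!r}")
        element, low, high = match.groups()
        low = int(low)
        high = int(high) if high is not None else low
        if low > high:
            raise ValueError(f"empty range in token: {token!r}")
        ranges[element] = (low, high)
    return ranges

def matches_formula_range(composition, ranges):
    """Return True if every constrained element count lies within its range.

    composition maps element symbols to atom counts; elements missing
    from it count as zero, and unconstrained elements are ignored.
    """
    return all(
        low <= composition.get(element, 0) <= high
        for element, (low, high) in ranges.items()
    )

ranges = parse_formula_range("O1-4 N3-100 F0")
guanine = {"C": 5, "H": 5, "N": 5, "O": 1}   # 1 oxygen, 5 nitrogens, no fluorine
adenine = {"C": 5, "H": 5, "N": 5}           # no oxygen, so it fails O1-4
print(matches_formula_range(guanine, ranges))
print(matches_formula_range(adenine, ranges))
```

Treating unlisted elements as unconstrained keeps the expression short: only the elements the user cares about need to be named.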

Fragment search
- Web form
- Significant fragments
- Example:
  - More than 2 benzimidazoles
  - No piperazine
  - Anything else

Searching for parts of structure
- An outline of the structure, or of some characteristic part, is known
- Looking for variants of molecules
  - Load the known target and remove the unimportant parts
  - Perform a subgraph search
- Looking for chemical components with similar fragments and localised chemistry
  - Load the known target and perform a fingerprint search
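A fingerprint search of the kind mentioned above can be illustrated as follows. The slides do not say how MSDChem builds or compares fingerprints, so this sketch makes its own assumptions: a fingerprint is a plain set of fragment labels, similarity is the Tanimoto coefficient, and the ligand codes and fragment names in the example are invented.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient: shared features divided by all features present."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def fingerprint_search(target, library, threshold=0.5):
    """Return (code, similarity) pairs at or above threshold, best first.

    library maps a ligand code to its fingerprint (a set of features).
    """
    hits = []
    for code, fingerprint in library.items():
        score = tanimoto(target, fingerprint)
        if score >= threshold:
            hits.append((code, score))
    # Sort by descending similarity, then by code for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

# Hypothetical library: codes and fragment labels are illustrative only.
library = {
    "AAA": {"benzene", "amide", "hydroxyl"},
    "BBB": {"benzene", "amide"},
    "CCC": {"phosphate"},
}
target = {"benzene", "amide"}
for code, score in fingerprint_search(target, library):
    print(code, round(score, 2))
```

Unlike a subgraph search, which demands that the query be embedded exactly in each hit, this kind of comparison ranks components by how much localised chemistry they share with the target.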

Substructure search
- Applet to draw a diagram
- Load and modify an existing ligand
- May take a couple of minutes

Links to the PDB
- MSDchem searches strictly the reference dictionary
- But it provides links to the PDB entries that include a ligand or a set of ligands
  - From ligand details pages
  - And from any query results page
- Links to the summary pages for the entries (MSD Atlas pages)
- Or to instances of the ligands in entries, along with their environment and interactions (MSDmotif)

Link to PDB
- From any result page
  - Like a fragment search
- Link to PDB entries with such ligands

Link to Binding sites
- Details: interactions of these ligands in entries
- Statistics: search within results

Ligand index - download
- Download of the complete archive
  - Compressed tar of Molfiles (SDF)
  - CML (ChEBI style)
  - MSDChem XML
  - Relational database
- Just listings
  - SMILES strings and names
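As a rough illustration of working with the compressed tar of Molfiles, the sketch below walks an archive and reads the atom and bond counts from each member. It assumes one V2000 molfile per archive member, with the counts on the fourth line in two 3-character fields; the actual layout of the MSDChem download is not described in the slides, and the function names and the file name HOH.mol are hypothetical.

```python
import io
import tarfile

def read_counts(molfile_text):
    """Return (atom_count, bond_count) from the counts line of a V2000 molfile."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("molfile is too short to contain a counts line")
    counts_line = lines[3]
    # V2000 counts line: atoms in columns 1-3, bonds in columns 4-6.
    return int(counts_line[0:3]), int(counts_line[3:6])

def summarise_archive(fileobj):
    """Map each file in a (possibly compressed) tar to its atom and bond counts."""
    summary = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            text = archive.extractfile(member).read().decode("ascii", errors="replace")
            summary[member.name] = read_counts(text)
    return summary

# Build a tiny in-memory archive so the example runs without any download.
water = "\n".join([
    "HOH",
    "",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
])
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    payload = water.encode("ascii")
    info = tarfile.TarInfo(name="HOH.mol")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)
print(summarise_archive(buffer))
```

For a real download, the same function could be given a file opened in binary mode instead of the in-memory buffer.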

Summary
- The wwPDB ligand dictionary provides the chemistry of the PDB
- The MSDChem backend has been merged into the remediation project
- The state of the dictionary has improved
- The MSDChem web application provides searching of the dictionary
  - Name
  - Formula
  - Substructure
  - Fragments, similarity