Dimitris Dimitropoulos

Presentation transcript:

Dimitris Dimitropoulos Chemistry & the PDB MSDchem

The chemical database

MSDchem ligand dictionary
- Complete, clean, up-to-date collection of all the chemical species and small molecules in the PDB
- A ligand in MSDchem is a complete, distinct stereoisomer of a chemical compound
  - Atoms and element types
  - Bonds and bond orders
  - Stereo configuration of atoms and bonds in cases of stereoisomers (R/S – E/Z)
- Atom names and coordinates are not fundamental properties
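The ligand definition above can be sketched as a small data structure. This is only an illustration, not the MSDchem schema: the class names, the `structure_key` helper, the atom names and the `XXX` code are hypothetical, and the atom list is the heavy-atom skeleton of the thioalanine (ALT) SMILES quoted later in the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    element: str          # element type, e.g. "C"
    stereo: str = ""      # "R", "S", or "" when not a stereocentre


@dataclass(frozen=True)
class Bond:
    a: int                # index of the first atom
    b: int                # index of the second atom
    order: int            # 1, 2 or 3
    stereo: str = ""      # "E", "Z", or ""


@dataclass
class Ligand:
    code: str
    atoms: tuple[Atom, ...]
    bonds: tuple[Bond, ...]
    # Atom names are labels only; they are not part of the identity.
    atom_names: dict[int, str] = field(default_factory=dict)

    def structure_key(self):
        """Identity of the species: atoms, bonds and stereo, no names.

        Note: this key depends on atom ordering; a real dictionary
        would need a canonical form or a graph comparison.
        """
        return (self.atoms, self.bonds)


def thioalanine(code, stereo, names=None):
    """Heavy atoms of CC(N)C(O)=S with a stereo label on the alpha carbon."""
    atoms = (Atom("C"), Atom("C", stereo), Atom("N"),
             Atom("C"), Atom("O"), Atom("S"))
    bonds = (Bond(0, 1, 1), Bond(1, 2, 1), Bond(1, 3, 1),
             Bond(3, 4, 1), Bond(3, 5, 2))
    return Ligand(code, atoms, bonds, names or {})


s_form = thioalanine("ALT", "S", {0: "CB", 1: "CA"})      # hypothetical names
s_renamed = thioalanine("ALT", "S", {0: "C3", 1: "C2"})   # same species, renamed
r_form = thioalanine("XXX", "R")                          # hypothetical code

same_species = s_form.structure_key() == s_renamed.structure_key()
distinct_isomer = s_form.structure_key() != r_form.structure_key()
```

Renaming atoms leaves the key unchanged, while flipping the stereo label gives a different ligand, which is the distinction the slide draws.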

Role in the MSD database
- An integral component in the core of the MSD database
- Relational reference from entities where a molecule or atom name is used in the PDB (protein residues and atoms)
- A coordinate line such as
  HETATM 4342 C2 PLA 86 14.227 11.195 -8.256 1.00 67.95 C
  cannot be loaded if the “PLA” ligand is not defined or does not include a “C2” atom
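The referential rule for the HETATM line can be illustrated with a minimal check. This is a sketch under stated assumptions: the dictionary contents are hypothetical apart from the `PLA`/`C2` pair named on the slide, and the line is split on whitespace because that is how it appears in this transcript (real PDB files use fixed columns).

```python
# Hypothetical ligand dictionary: ligand code -> set of defined atom names.
# Only "PLA" and "C2" come from the slide; the other names are made up.
LIGAND_ATOMS = {"PLA": {"C1", "C2", "O1"}}

LINE = "HETATM 4342 C2 PLA 86 14.227 11.195 -8.256 1.00 67.95 C"


def check_record(line, dictionary):
    """Return (ligand code, atom name) or raise if the reference is broken."""
    fields = line.split()
    # Simplification: whitespace-separated fields, not fixed PDB columns.
    atom_name, res_name = fields[2], fields[3]
    if res_name not in dictionary:
        raise LookupError(f"ligand {res_name!r} is not defined")
    if atom_name not in dictionary[res_name]:
        raise LookupError(f"ligand {res_name!r} has no atom {atom_name!r}")
    return res_name, atom_name


accepted = check_record(LINE, LIGAND_ATOMS)
```

In a relational database the same rule would normally be enforced by a foreign-key constraint rather than by application code.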

Chemistry and PDB
- Eliminate chemical inconsistencies from new PDB entries
- Structure and derived properties of a ligand apply automatically to residues and bound molecules that reference it
- The basic structure is carefully determined during curation, and a rich set of derived attributes is calculated for each ligand
- Graph isomorphism is applied to check the consistency of the PDB, taking stereo-configuration into account
- Old legacy PDB entries are chemically “corrected” when loaded into the MSD database
- In thousands of cases errors are identified and corrected, most often involving inconsistent naming or a different stereo-configuration
- Exchanged in cooperation with RCSB and the wwPDB
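The graph isomorphism check mentioned above can be sketched with a plain backtracking search. This is not the MSD implementation, which the slides do not describe. It assumes each atom carries an (element, stereo label) pair and each bond an order, and it treats the R/S label as part of the atom label, which is a simplification of real stereo handling.

```python
def make_bonds(*triples):
    """Build {frozenset({i, j}): order} from (i, j, order) triples."""
    return {frozenset((i, j)): order for i, j, order in triples}


def isomorphic(atoms_a, bonds_a, atoms_b, bonds_b):
    """True if a label- and bond-order-preserving atom mapping exists."""
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    if sorted(atoms_a) != sorted(atoms_b):
        return False
    n = len(atoms_a)
    mapping = {}          # atom index in A -> atom index in B
    used = set()          # atom indices in B already taken

    def extend(i):
        if i == n:
            return True
        for j in range(n):
            if j in used or atoms_a[i] != atoms_b[j]:
                continue
            # Bonds (and non-bonds) to already mapped atoms must agree.
            if all(bonds_a.get(frozenset((i, k))) ==
                   bonds_b.get(frozenset((j, mapping[k])))
                   for k in mapping):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.discard(j)
        return False

    return extend(0)


# Thioalanine heavy atoms, CC(N)C(O)=S, alpha carbon labelled "S".
ref_atoms = [("C", ""), ("C", "S"), ("N", ""), ("C", ""), ("O", ""), ("S", "")]
ref_bonds = make_bonds((0, 1, 1), (1, 2, 1), (1, 3, 1), (3, 4, 1), (3, 5, 2))

# Same molecule with the atoms listed in a different order.
alt_atoms = [("S", ""), ("C", ""), ("O", ""), ("C", "S"), ("N", ""), ("C", "")]
alt_bonds = make_bonds((0, 1, 2), (1, 2, 1), (1, 3, 1), (3, 4, 1), (3, 5, 1))

# Same connectivity, opposite label on the alpha carbon.
r_atoms = [("C", ""), ("C", "R"), ("N", ""), ("C", ""), ("O", ""), ("S", "")]

same = isomorphic(ref_atoms, ref_bonds, alt_atoms, alt_bonds)
other_isomer = isomorphic(ref_atoms, ref_bonds, r_atoms, ref_bonds)
```

The search is exponential in the worst case, which is acceptable for small ligands; larger molecules would call for a dedicated algorithm or library.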

More than just the PDB codes
- All ligands are modelled as separate inter-related ligands and the appropriate one is referenced
- No distinction is made in the PDB between ribo- and deoxyribonucleotides (all are identified with the same residue name, i.e. A, C, G, T, U, I)
- Modified nucleic acids are given as +A etc. regardless of modification
- No distinction between different topological variants (12 different variants can be found for HIS in the PDB)

Derived information
External scientific software (CACTVS, VEGA, CORINA, ACD-labs, CCP4, OELIB), together with in-house development, has been used to derive:
- Stereochemistry (R/S – E/Z)
  Ligand  C4'  C3'  C1'
  DCM     S    R    S
  DCF     R    S    R
- SMILES and detailed GIFs
- Systematic IUPAC names
Example: THIOALANINE (ALT)
  SMILES: CC(N)C(O)=S
  Stereo SMILES: C[C@H](N)C(O)=S
  Systematic name: (2S)-2-aminopropanethioic O-acid

Derived information
- Fingerprints: a bit string in hexadecimal form that indicates the presence or absence of segments from predefined lists
- Useful for fast search and classification
- Different libraries of predefined lists can be set
- Currently calculated for the CACTVS library (500 segments)
(Figure: Molecule, Segments, BitString, Fingerprint: 2A)
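How a hexadecimal fingerprint supports fast search can be shown with two small helpers. The slides do not say which similarity measure MSDchem uses; the Tanimoto coefficient and the subset screen below are common choices and are assumptions here. `2A` is the value shown on the slide, while `2E` is a made-up second fingerprint.

```python
def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two hexadecimal fingerprint strings."""
    a, b = int(fp_a, 16), int(fp_b, 16)
    union = popcount(a | b)
    if union == 0:
        return 1.0        # convention for two empty fingerprints
    return popcount(a & b) / union


def may_contain(target_fp, query_fp):
    """Cheap screen: every segment bit of the query is set in the target.

    A True result only means the target is a candidate; an exact
    substructure match still has to be confirmed on the structures.
    """
    t, q = int(target_fp, 16), int(query_fp, 16)
    return (t & q) == q


similarity = tanimoto("2A", "2E")      # 0x2A = 101010, 0x2E = 101110
candidate = may_contain("2E", "2A")
```

Because both operations are bitwise, a whole dictionary can be screened quickly before the more expensive structure comparison is run on the survivors.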

Search options
- By ligand code
- By ligand name or synonym
- By formula or formula range
- By non-stereo substructure
- By non-stereo superstructure
- By exact stereo or non-stereo structure
- By fingerprint similarity

Results of ‘is superstructure of’
Click on EAA

EAA details 3-chloro-phenol

Results Viewers

PDB residue KWT
<chemComp>
  <code>KWT</code>
  <name>(1S,6BR,9AS,11R,11BR)-9A,11B-DIMETHYL-1-[(METHYLOXY)METHYL]-3,6,9-TRIOXO-1,6,6B,7,8,9,9A,10,11,11B-DECAHYDRO-3H-FURO[4,3,2-DE]INDENO[4,5-H][2]BENZOPYRAN-11-YL ACETATE</name>
  <nAtomsAll>55</nAtomsAll>
  <nAtomsNh>31</nAtomsNh>
  <overallCharge>0</overallCharge>
  <stereoSmiles>COC[C@H]1OC(=O)c2coc3C(=O)C4=C([C@@H](C[C@@]5(C)[C@H]4CCC5=O)OC(C)=O)[C@]1(C)c23</stereoSmiles>
  <systematicName>(1S,6bR,9aS,11R,11bR)-1-(methoxymethyl)-9a,11b-dimethyl-3,6,9-trioxo-1,6,6b,7,8,9,9a,10,11,11b-decahydro-3H-furo[4,3,2-de]indeno[4,5-h]isochromen-11-yl acetate</systematicName>
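A record of this shape can be read with Python's standard XML parser. The sketch below uses a shortened copy of the KWT record with only the numeric fields; the closing `</chemComp>` tag is added for the example because the record on the slide is cut off, and reading `nAtomsNh` as the non-hydrogen atom count is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's record; the closing tag is added here.
RECORD = """
<chemComp>
  <code>KWT</code>
  <nAtomsAll>55</nAtomsAll>
  <nAtomsNh>31</nAtomsNh>
  <overallCharge>0</overallCharge>
</chemComp>
"""

root = ET.fromstring(RECORD)

summary = {
    "code": root.findtext("code"),
    "atoms_all": int(root.findtext("nAtomsAll")),
    # Assumption: "Nh" stands for non-hydrogen (heavy) atoms.
    "atoms_heavy": int(root.findtext("nAtomsNh")),
    "charge": int(root.findtext("overallCharge")),
}

# Under that assumption the difference is the hydrogen count.
hydrogens = summary["atoms_all"] - summary["atoms_heavy"]
```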

Future targets
- Identify and model protein inhibitors as ligands
- Pre-classify functional groups for ligands and ligand atoms based on substructure fragments
- Optimise and boost the performance of substructure searches
- Enhance visualisation and integration with other MSD tools