Standards for Digital Data Representation 1) The IUPAC/NIST Chemical Identifier 2) IUPAC Terminology NSF Workshop Constructing a Kinetics Database NIST,

Presentation transcript:

Standards for Digital Data Representation 1) The IUPAC/NIST Chemical Identifier 2) IUPAC Terminology NSF Workshop Constructing a Kinetics Database NIST, April 19-20, 2004

The News
Bad News:
–There are more problems than you thought
Good News:
–NIST/IUPAC are trying to solve them for you

Data Tags STM – Scientific, Technical, Medical ‘Publication’ thermokinetics spectroscopy synthesis Chemistry

Data Tags IUPAC/NIST Chemical Identity – INChI Interdisciplinary Terms – Gold & Green STM – Scientific, Technical, Medical ‘Publication’ Chemistry

A Digital ‘Name’ for a Chemical Entity
Convert chemical structure to digital ‘signature’
To allow computers to:
–Organize chemical data
–Disseminate data (queries)
–Manage quality control

Current Representations are Inadequate
Drawing – for humans only
CAS registry number
–Arbitrary value (hard to find and confirm)
–CAS Indexer may not match Specialist
–Expensive, imprecise, incomplete, no hierarchy
Connection Table
–One compound – Many representations
–Embedded ambiguities
‘Canonical’ Connection Table
–No open standard
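The "One compound – Many representations" point can be illustrated with a minimal sketch. The dictionary layout, the names `table_a`, `table_b`, `relabel` and `same_compound`, and the brute-force search are all assumptions made for illustration; they are not part of INChI or of any connection-table standard.

```python
from itertools import permutations

# Heavy-atom skeleton of ethanol (C-C-O), written twice with different
# atom numbering. Both tables describe the same compound.
table_a = {"atoms": ["C", "C", "O"], "bonds": {(0, 1), (1, 2)}}
table_b = {"atoms": ["O", "C", "C"], "bonds": {(0, 1), (1, 2)}}

# Dimethyl ether (C-O-C): same atoms, different connectivity.
ether = {"atoms": ["C", "O", "C"], "bonds": {(0, 1), (1, 2)}}


def relabel(table, order):
    """Return a copy of the table in which atom i is moved to order[i]."""
    atoms = [None] * len(table["atoms"])
    for old, new in enumerate(order):
        atoms[new] = table["atoms"][old]
    bonds = {tuple(sorted((order[a], order[b]))) for a, b in table["bonds"]}
    return {"atoms": atoms, "bonds": bonds}


def same_compound(t1, t2):
    """Brute-force test: does some renumbering of t1 reproduce t2 exactly?"""
    n = len(t1["atoms"])
    if n != len(t2["atoms"]):
        return False
    return any(relabel(t1, p) == t2 for p in permutations(range(n)))


naive_match = table_a == table_b               # direct comparison of the tables
true_match = same_compound(table_a, table_b)   # comparison up to renumbering
ether_match = same_compound(table_a, ether)    # a genuinely different compound
```

A direct comparison treats the two ethanol tables as different, while the renumbering search recognizes them as the same skeleton and still rejects the ether. Trying every permutation is only feasible for a handful of atoms, which is one reason a canonical numbering is wanted.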

Reactive Intermediates
Ions, radicals, excited states
–In principle, no problem
Equilibrated species
–Must specify variability precisely
Weakly bound complexes
–OK if orientation is omitted
Transition states
–Maybe not necessary in data compilation

ChemWeb, 3/2002

Nature, May 23, 2002

Requirements
Different compounds have different identifiers
–All distinguishing structural information is included
(Figure labels: INChI - 1, INChI - 2)

Requirements
One compound has only one identifier
–Include only necessary information
(Figure label: Same INChI)
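The two requirements can be read as a pair of checks on any table of (compound, identifier) assignments. The sketch below assumes a list of such pairs is available, with the compound names standing in for an independent ground truth; the function name `check_requirements` and the placeholder identifiers `ID-A`, `ID-B`, `ID-C` are hypothetical.

```python
def check_requirements(records):
    """Check identifier assignments against the two requirements.

    records: iterable of (compound, identifier) pairs.
    Returns (not_unique, collisions):
      not_unique - compounds that received more than one identifier
      collisions - identifiers shared by more than one compound
    """
    by_compound, by_identifier = {}, {}
    for compound, identifier in records:
        by_compound.setdefault(compound, set()).add(identifier)
        by_identifier.setdefault(identifier, set()).add(compound)
    not_unique = {c: ids for c, ids in by_compound.items() if len(ids) > 1}
    collisions = {i: cs for i, cs in by_identifier.items() if len(cs) > 1}
    return not_unique, collisions


# Placeholder identifiers, not real INChI strings.
good = [("ethanol", "ID-A"), ("ethanol", "ID-A"), ("dimethyl ether", "ID-B")]
bad = [("ethanol", "ID-A"), ("ethanol", "ID-C"), ("dimethyl ether", "ID-A")]

good_result = check_requirements(good)
bad_result = check_requirements(bad)
```

Both dictionaries come back empty for `good`. For `bad`, ethanol appears under two identifiers and `ID-A` is shared by two compounds, so each requirement is violated once.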

Two Problems
Chemicals
–Fast isomerization (esp. H-atoms)
–Unconventional connectivity
Chemists
–Differing conventions (depend on discipline, education and convenience)
–Imprecision/uncertainty

3 Steps to INChI
Chemistry
–‘Normalize’ Input Structure: implement chemical rules
Math
–‘Canonicalize’ (label the atoms): equivalent atoms get the same label
Format
–‘Serialize’ Labeled Structure: output as character string (‘name’)
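The three steps can be sketched as a toy pipeline. This is not the INChI algorithm: normalization here only drops bond orders, canonical labeling is a brute-force search over all atom orderings, hydrogens and the formula are left out, and the output format is invented. All function names are hypothetical.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Step 1 (chemistry): drop bond orders, keep only which atoms are joined."""
    return list(atoms), {frozenset((a, b)) for a, b, _order in bonds}


def canonicalize(atoms, edges):
    """Step 2 (math): pick the renumbering with the smallest description.

    Brute force over every ordering, so only usable for tiny skeletons.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        new_atoms = [None] * len(atoms)
        for old, new in enumerate(perm):
            new_atoms[new] = atoms[old]
        new_edges = sorted(tuple(sorted(perm[i] for i in e)) for e in edges)
        candidate = (new_atoms, new_edges)
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(atoms, edges):
    """Step 3 (format): write the labeled structure as one character string."""
    atom_part = "".join(atoms)
    bond_part = ",".join(f"{a + 1}-{b + 1}" for a, b in edges)
    return f"{atom_part}/{bond_part}"


def toy_identifier(atoms, bonds):
    return serialize(*canonicalize(*normalize(atoms, bonds)))


# Acetaldehyde heavy atoms drawn with two different numberings: (a, b, order).
drawing_1 = toy_identifier(["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])
drawing_2 = toy_identifier(["O", "C", "C"], [(0, 1, 2), (1, 2, 1)])
# Dimethyl ether: same atoms, oxygen in the middle.
ether_id = toy_identifier(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
```

The two drawings of the same skeleton collapse to one string and the ether gets a different one. Because hydrogens are omitted, this toy cannot tell acetaldehyde from ethanol; distinctions of that kind belong to further layers.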

Normalize Simplify
Divide structure into ‘layers’
–Each layer ‘refines’ structure
Ignore ‘Electron Density’
–Ignore bond type and electron location
Stereochemistry
–sp² and sp³ only
–Free rotation around single bonds

Chemical Substances “Layers”
formula, connectivity, stereo, isotope
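The layer idea is easiest to see by splitting an identifier string into its parts. The sketch below uses the present-day InChI syntax (layers separated by "/", each introduced by a lowercase letter), which is an assumption relative to this 2004 talk: the beta-era INChI format may have differed. The example string is the standard InChI for ethanol; `split_layers` and the grouping in `LAYER_GROUPS` are illustrative names, and the grouping is deliberately coarse.

```python
# Coarse mapping from present-day InChI layer prefixes to the four layer
# names on the slide. Prefixes not listed here are reported as "other".
LAYER_GROUPS = {
    "c": "connectivity",  # skeleton connections
    "h": "connectivity",  # hydrogen atoms
    "b": "stereo",        # double-bond stereo
    "t": "stereo",        # tetrahedral stereo
    "m": "stereo",
    "s": "stereo",
    "i": "isotope",
}


def split_layers(inchi):
    """Split an InChI string into (layer name, content) pairs."""
    prefix, _, body = inchi.partition("=")
    parts = body.split("/")
    if prefix != "InChI" or len(parts) < 2:
        raise ValueError("not an InChI string")
    layers = [("version", parts[0]), ("formula", parts[1])]
    for part in parts[2:]:
        layers.append((LAYER_GROUPS.get(part[:1], "other"), part[1:]))
    return layers


ethanol_layers = split_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

For ethanol this yields the version, the formula `C2H6O`, and two connectivity entries (`1-2-3` for the skeleton and `3H,2H2,1H3` for the hydrogens). Stereo and isotope layers appear only for structures that need them, which matches the idea that each layer refines the one before it.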

4 Connectivity ‘Sublayers’
Disconnect H-atoms and metals
–Create skeleton
Reconnect Fixed H-atoms
–Represent multiple species
Reconnect mobile H-atoms
–A single species
Reconnect metal-nonmetal bonds
–Represent bonds to metals

Ignore Electron Density
Not required for compound identification
–Represent ‘excited states’
Simplify representations
–Delocalization, aromaticity, zwitterions, coordination …

Münchnones Simplify - Ignore Electrons

Mobile H-atom (Tautomer) Sublayer H-migration between 1,3 heteroatoms
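A minimal sketch of the mobile-H idea: hydrogens on heteroatoms in a 1,3 relationship (two heteroatoms attached to the same atom) are pooled, so tautomers that differ only in where those hydrogens sit get the same key. The rule is taken literally from the slide text and is far simpler than real tautomer detection; the choice of N, O and S as heteroatoms, the fixed atom numbering, and the name `mobile_h_key` are assumptions.

```python
HETERO = {"N", "O", "S"}  # assumed heteroatom set for this toy


def mobile_h_key(atoms, edges, h_counts):
    """Pool H counts over heteroatoms that share a common neighbor."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Tiny union-find: merge heteroatoms attached to the same center.
    group = list(range(len(atoms)))

    def find(i):
        while group[i] != i:
            i = group[i]
        return i

    for center in range(len(atoms)):
        hetero = [n for n in sorted(neighbors[center]) if atoms[n] in HETERO]
        for other in hetero[1:]:
            group[find(other)] = find(hetero[0])

    pooled, members = {}, {}
    for i, count in enumerate(h_counts):
        root = find(i)
        pooled[root] = pooled.get(root, 0) + count
        members.setdefault(root, []).append(i)
    return tuple(sorted((tuple(m), pooled[r]) for r, m in members.items()))


# Acetamide skeleton: C0-C1, C1-O2, C1-N3 (same numbering in both forms).
atoms = ["C", "C", "O", "N"]
edges = [(0, 1), (1, 2), (1, 3)]
amide = mobile_h_key(atoms, edges, [3, 0, 0, 2])    # CH3-C(=O)-NH2
imidic = mobile_h_key(atoms, edges, [3, 0, 1, 1])   # CH3-C(OH)=NH
anion = mobile_h_key(atoms, edges, [3, 0, 0, 1])    # one mobile H removed
```

The amide and imidic acid forms give the same key because O and N share two pooled hydrogens, while removing a hydrogen changes the key. Keeping the original per-atom counts alongside the pooled key would correspond to recording the hydrogens as fixed.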

Nitrobenzene

MSG tautomeric

MSG fixed

Ferrocene

Auxiliary Output
Confirmation
–Label stereogenic atoms
–Identify equivalent atoms
Warnings/Errors
–Unusual valences
–Unrecognized input
‘Reversibility’
–Coordinates
–Bond/Charge Location

Testing - OK

Beta Testing

Performance: Most Challenging NCI-NIH Structure
50 ms – 2 GHz PC

INChI FAQs
How can you represent chemistry without electrons?
–Chemistry is not represented, just identity
–Whole molecule properties may be added (state, phase, ...)
Do big molecules have big INChIs?
–Yes, just like systematic names
How to handle other tautomer types, substructures, ...?
–Other software
Is INChI reversible?
–Partly - contains only data needed for ‘naming’
–Auxiliary fields can carry structure depiction information
Is INChI extensible?
–New layers can add refinement

Started Oct. 2002

/ Miloslav Nic, Jiri Jirat, Czech Republic

Converted - XML

My Point of View
A forest of data dictionaries is growing
–Horizontally and vertically
We need to consider forest management
Some day all reusable data will be tagged