Published by Jasmin Walton. Modified over 9 years ago.
1
Standards for Digital Data Representation
1) The IUPAC/NIST Chemical Identifier
2) IUPAC Terminology
NSF Workshop: Constructing a Kinetics Database
NIST, April 19-20, 2004
2
The News
Bad news:
– There are more problems than you thought
Good news:
– NIST/IUPAC are trying to solve them for you
3
Data Tags
STM (Scientific, Technical, Medical) ‘Publication’
Chemistry: thermokinetics, spectroscopy, synthesis
4
Data Tags
Chemical identity: IUPAC/NIST INChI
Interdisciplinary terms: IUPAC Gold & Green Books
STM (Scientific, Technical, Medical) ‘Publication’
Chemistry
6
A Digital ‘Name’ for a Chemical Entity
Convert a chemical structure to a digital ‘signature’ so that computers can:
– Organize chemical data
– Disseminate data (queries)
– Manage quality control
7
Current Representations Are Inadequate
Drawing
– For humans only
CAS registry number
– Arbitrary value (hard to find and confirm)
– CAS indexer may not match the specialist
– Expensive, imprecise, incomplete, no hierarchy
Connection table
– One compound, many representations
– Embedded ambiguities
‘Canonical’ connection table
– No open standard
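To illustrate "one compound, many representations", here is a minimal Python sketch using a hypothetical toy connection table (element symbols plus 0-based bond index pairs, hydrogens omitted); it is not any real file format. The same ethanol skeleton, entered in two atom orders, serializes to two different strings, so a raw connection table cannot serve as an identifier.

```python
def naive_serialize(atoms, bonds):
    """Write a connection table in the order the atoms were entered.

    atoms: element symbols; bonds: (i, j) pairs of 0-based atom indices.
    """
    atom_part = ",".join(atoms)
    pairs = sorted(tuple(sorted(b)) for b in bonds)
    bond_part = ",".join(f"{i}-{j}" for i, j in pairs)
    return f"{atom_part};{bond_part}"


# Ethanol's heavy atoms (hydrogens omitted), entered in two different orders.
ethanol_a = (["C", "C", "O"], [(0, 1), (1, 2)])  # C0-C1-O2
ethanol_b = (["O", "C", "C"], [(0, 1), (1, 2)])  # O0-C1-C2

print(naive_serialize(*ethanol_a))  # C,C,O;0-1,1-2
print(naive_serialize(*ethanol_b))  # O,C,C;0-1,1-2
```

Both strings describe the same molecule, yet a simple string comparison treats them as different compounds.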
8
Reactive Intermediates
Ions, radicals, excited states
– In principle, no problem
Equilibrated species
– Must specify variability precisely
Weakly bound complexes
– OK if orientation is omitted
Transition states
– Maybe not necessary in a data compilation
9
ChemWeb, 3/2002
11
Nature, May 23, 2002
12
Requirements
Different compounds have different identifiers
– All distinguishing structural information is included
[Figure: structure 1 = INChI-1; structure 2 = INChI-2]
13
Requirements
One compound has only one identifier
– Include only necessary information
[Figure: several representations of one compound = the same INChI]
14
Two Problems
Chemicals
– Fast isomerization (especially of H atoms)
– Unconventional connectivity
Chemists
– Differing conventions (depending on discipline, education and convenience)
– Imprecision/uncertainty
15
3 Steps to INChI
Chemistry: ‘Normalize’ the input structure
– Implement chemical rules
Math: ‘Canonicalize’ (label the atoms)
– Equivalent atoms get the same label
Format: ‘Serialize’ the labeled structure
– Output as a character string (the ‘name’)
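The three steps can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption for illustration: the input format, the names `normalize`, `canonicalize`, `serialize` and `toy_identifier`, and the brute-force labeling that tries every atom order, which is factorial in cost and far slower than the graph-canonicalization methods used in practice. On small examples it does meet both requirements stated earlier: two atom orders of ethanol give one string, while its isomer dimethyl ether gives another.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Toy 'chemistry' step: drop H atoms and bond orders.

    atoms: element symbols; bonds: (i, j, order) triples, 0-based indices.
    Returns the heavy-atom skeleton as (atoms, set of frozenset bonds).
    """
    keep = [i for i, el in enumerate(atoms) if el != "H"]
    remap = {old: new for new, old in enumerate(keep)}
    skel_atoms = [atoms[i] for i in keep]
    skel_bonds = {
        frozenset((remap[i], remap[j]))
        for i, j, _order in bonds
        if i in remap and j in remap
    }
    return skel_atoms, skel_bonds


def canonicalize(atoms, bonds):
    """Toy 'math' step: try every atom ordering and keep the smallest
    (element tuple, sorted bond tuple). Factorial cost: tiny molecules only."""
    best = None
    for perm in permutations(range(len(atoms))):  # perm[new] = old index
        pos = {old: new for new, old in enumerate(perm)}
        labeled_atoms = tuple(atoms[old] for old in perm)
        labeled_bonds = tuple(sorted(tuple(sorted(pos[k] for k in b)) for b in bonds))
        key = (labeled_atoms, labeled_bonds)
        if best is None or key < best:
            best = key
    return best


def serialize(canonical):
    """Toy 'format' step: elements, then 1-based bonds, as one string."""
    atoms, bonds = canonical
    return ",".join(atoms) + "/" + ",".join(f"{i + 1}-{j + 1}" for i, j in bonds)


def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))


# Ethanol in two atom orders (only the O-H hydrogen drawn), and dimethyl ether.
ethanol_1 = (["C", "C", "O", "H"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
ethanol_2 = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

print(toy_identifier(*ethanol_1))       # C,C,O/1-2,1-3
print(toy_identifier(*ethanol_2))       # C,C,O/1-2,1-3
print(toy_identifier(*dimethyl_ether))  # C,C,O/1-3,2-3
```

Because the minimum is taken over all orderings, the result cannot depend on the order in which the atoms were entered.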
16
Normalize: Simplify
Divide the structure into ‘layers’
– Each layer ‘refines’ the structure
Ignore ‘electron density’
– Ignore bond type and electron location
Stereochemistry
– sp² and sp³ only
– Free rotation around single bonds
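Ignoring bond type can be shown with benzene's two Kekulé structures. In this minimal Python sketch (hypothetical `(i, j, order)` bond triples and a hypothetical helper `drop_bond_orders`), the two bond lists differ, but once bond orders are dropped the skeletons are identical, so the normalized structure no longer distinguishes the two drawings.

```python
# Benzene ring atoms 0..5; bond (i, i+1 mod 6) with alternating orders.
ring = [(i, (i + 1) % 6) for i in range(6)]
kekule_1 = [(i, j, 2 if i % 2 == 0 else 1) for i, j in ring]
kekule_2 = [(i, j, 1 if i % 2 == 0 else 2) for i, j in ring]


def drop_bond_orders(bonds):
    """Keep only which atom pairs are bonded, not how (single/double/...)."""
    return {frozenset((i, j)) for i, j, _order in bonds}


print(kekule_1 == kekule_2)  # False: the drawings differ
print(drop_bond_orders(kekule_1) == drop_bond_orders(kekule_2))  # True
```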
17
Chemical Substances: “Layers”
formula, connectivity, stereo, isotope
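One way to picture the layers is as fields joined coarsest first. The Python sketch below builds only the formula layer, using the Hill convention (C, then H, then the other elements alphabetically; strictly alphabetical when there is no carbon), and joins layers with `/`. The separator, the helper names `hill_formula` and `layered_id`, and the hand-written connectivity strings are assumptions for illustration, not the INChI string syntax. Ethanol and dimethyl ether share the formula layer; only the connectivity layer separates them.

```python
from collections import Counter


def hill_formula(atoms):
    """Formula layer in Hill order: C, then H, then the rest alphabetically;
    with no carbon present, every element alphabetically."""
    counts = Counter(atoms)
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def layered_id(layers):
    """Join the non-empty layers, coarsest first, with a hypothetical '/' separator."""
    return "/".join(layer for layer in layers if layer)


ethanol = ["C", "C", "O"] + ["H"] * 6
dimethyl_ether = ["C", "O", "C"] + ["H"] * 6

print(hill_formula(ethanol), hill_formula(dimethyl_ether))  # C2H6O C2H6O
print(hill_formula(["H", "H", "O"]))                        # H2O
# Hand-written connectivity strings, for illustration only.
print(layered_id([hill_formula(ethanol), "1-2,1-3"]))         # C2H6O/1-2,1-3
print(layered_id([hill_formula(dimethyl_ether), "1-3,2-3"]))  # C2H6O/1-3,2-3
```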
18
4 Connectivity ‘Sublayers’
Disconnect H atoms and metals
– Create the skeleton
Reconnect fixed H atoms
– Represent multiple species
Reconnect mobile H atoms
– A single species
Reconnect metal–non-metal bonds
– Represent bonds to metals
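The first sublayer step can be sketched in Python as follows, assuming a hypothetical `(i, j)` bond list and an illustrative, deliberately incomplete `METALS` set. H atoms are turned into per-atom counts and bonds to metals are cut, leaving the skeleton's connected components; sodium acetate splits into the acetate skeleton and a lone Na atom. The later reconnection sublayers are not shown.

```python
from collections import defaultdict

# Illustrative subset only; a real implementation needs a full metal list.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn"}


def disconnect(atoms, bonds):
    """Disconnect H atoms and metals to expose the skeleton.

    atoms: element symbols; bonds: (i, j) pairs, 0-based.
    Returns (components, h_counts): connected groups of the non-H atoms once
    metal bonds are cut, and how many H atoms each atom carried.
    """
    h_counts = defaultdict(int)
    adj = {i: set() for i, el in enumerate(atoms) if el != "H"}
    for i, j in bonds:
        a, b = atoms[i], atoms[j]
        if a == "H" and b == "H":
            continue  # H2: no non-H atom to attach to
        if a == "H":
            h_counts[j] += 1  # remembered as a count, not a bond
        elif b == "H":
            h_counts[i] += 1
        elif a in METALS or b in METALS:
            continue  # metal bond cut at this stage
        else:
            adj[i].add(j)
            adj[j].add(i)

    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first walk over the remaining bonds
            atom = stack.pop()
            comp.append(atom)
            for nb in adj[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components, dict(h_counts)


# Sodium acetate, CH3-C(=O)-O-Na (bond orders omitted).
atoms = ["C", "C", "O", "O", "Na", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5), (0, 6), (0, 7)]
print(disconnect(atoms, bonds))  # ([[0, 1, 2, 3], [4]], {0: 3})
```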
19
Ignore Electron Density
Not required for compound identification
– Represent ‘excited states’
Simplify representations
– Delocalization, aromaticity, zwitterions, coordination …
20
Münchnones
Simplify: Ignore Electrons
21
Mobile H-atom (Tautomer) Sublayer
H-atom migration between heteroatoms in 1,3 positions
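The 1,3 rule can be sketched as a pattern search: a heteroatom X carrying an H, single-bonded to an atom A that is double-bonded to another heteroatom Y, so the H can move to give X=A-Y(H), as in amide/imidic acid tautomerism. This Python sketch is an assumption for illustration: the heteroatom set, the input format and the name `mobile_h_candidates` are hypothetical, and the actual INChI mobile-H rules are more detailed.

```python
from collections import defaultdict

HETEROATOMS = {"N", "O", "S"}  # illustrative choice of heteroatoms


def mobile_h_candidates(atoms, bonds, h_counts):
    """Find 1,3 H-shift candidates X(H)-A=Y, where X and Y are heteroatoms.

    atoms: element symbols; bonds: (i, j, order) triples, 0-based;
    h_counts: {atom index: number of attached H}.
    Returns (x, a, y) triples: H could move from X to Y, giving X=A-Y(H).
    """
    order = {}
    neighbors = defaultdict(list)
    for i, j, o in bonds:
        order[(i, j)] = order[(j, i)] = o
        neighbors[i].append(j)
        neighbors[j].append(i)

    found = []
    for a in range(len(atoms)):
        for x in neighbors[a]:
            for y in neighbors[a]:
                if (
                    x != y
                    and atoms[x] in HETEROATOMS
                    and atoms[y] in HETEROATOMS
                    and h_counts.get(x, 0) > 0
                    and order[(a, x)] == 1
                    and order[(a, y)] == 2
                ):
                    found.append((x, a, y))
    return found


# Formamide H-C(=O)-NH2: N2 can pass an H to O1 via C0.
print(mobile_h_candidates(["C", "O", "N"], [(0, 1, 2), (0, 2, 1)], {0: 1, 2: 2}))
# [(2, 0, 1)]
# Ethanol CH3-CH2-OH: no second heteroatom, so no candidate.
print(mobile_h_candidates(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)], {0: 3, 1: 2, 2: 1}))
# []
```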
22
Nitrobenzene
23
MSG tautomeric
24
MSG fixed
25
Ferrocene
26
Auxiliary Output
Confirmation
– Label stereogenic atoms
– Identify equivalent atoms
Warnings/errors
– Unusual valences
– Unrecognized input
‘Reversibility’
– Coordinates
– Bond/charge location
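Identifying equivalent atoms can be approximated by iterative refinement of atom labels, in the spirit of the Morgan algorithm. This Python sketch is an assumption for illustration, not the INChI algorithm, and `equivalence_classes` is a hypothetical name: it starts from element symbols and repeatedly relabels each atom by its own label plus its neighbors' labels until the partition stops changing. Atoms that end in different classes are never equivalent, but in some highly symmetric graphs refinement alone can leave non-equivalent atoms in one class, so a complete implementation needs further steps.

```python
def equivalence_classes(atoms, bonds):
    """Group atoms by iterative refinement (Morgan / Weisfeiler-Lehman style).

    Start from element symbols, then repeatedly relabel each atom by its own
    label plus the sorted labels of its neighbors, until the number of
    classes stops growing. atoms: element symbols; bonds: (i, j) pairs.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    labels = list(atoms)
    n_classes = len(set(labels))
    while True:
        signatures = [
            (labels[i], tuple(sorted(labels[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ]
        # Compress each distinct signature to a small integer label.
        table = {sig: k for k, sig in enumerate(sorted(set(signatures)))}
        labels = [table[sig] for sig in signatures]
        if len(table) == n_classes:
            break  # no class was split: the partition is stable
        n_classes = len(table)

    classes = {}
    for atom, label in enumerate(labels):
        classes.setdefault(label, []).append(atom)
    return sorted(classes.values())


print(equivalence_classes(["C", "C", "C"], [(0, 1), (1, 2)]))  # [[0, 2], [1]]
print(equivalence_classes(["C", "C", "O"], [(0, 1), (1, 2)]))  # [[0], [1], [2]]
```

In propane's skeleton the two terminal carbons fall into one class, matching the molecule's symmetry.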
27
Testing - OK
28
Beta Testing
29
Performance: Most Challenging NCI-NIH Structure
50 ms on a 2 GHz PC
30
INChI FAQs
How can you represent chemistry without electrons?
– Chemistry is not represented, just identity
– Whole-molecule properties (state, phase, ...) may be added
Do big molecules have big INChIs?
– Yes, just like systematic names
How are other tautomer types, substructures, ... handled?
– By other software
Is INChI reversible?
– Partly: it contains only the data needed for ‘naming’
– Auxiliary fields can carry structure depiction information
Is INChI extensible?
– New layers can add refinement
31
Started Oct. 2002
36
http://www.nicmila.org/Gold/Output
Miloslav Nic, Jiri Jirat, Czech Republic
37
Converted to XML
41
My Point of View
A forest of data dictionaries is growing
– Horizontally and vertically
We need to consider forest management
Someday all reusable data will be tagged