Published by Blaise Greer, modified over 9 years ago
1
SDF file analysis
Creation, composition, checking
2
Concerning chemical table files
Chemical table files contain information about chemicals
Various formats: RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard
Molfile and SDF, the two formats covered here
3
MDL Molfile
A file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule
Most cheminformatics programs and some computational chemistry programs can read it
Standard version: V2000
Contains a header and a connection table
4
MDL Molfile content
Header (lines 1-3): 1 molecule name, 2 user/program/date/etc. information, 3 comment (blank); here the header reads "Generated by Molgen 5.0"
Connection table, Ctab (lines 4-25):
Line 4, counts line (11 atoms, 9 bonds, ..., V2000 standard):
11 9 0 0 0 0
Lines 5-15, atom block (one line per atom: x, y, z, element, etc.):
-0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
Lines 16-24, bond block (one line per bond: 1st atom, 2nd atom, bond type, etc.):
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
4 5 1 0 0 0 0
4 10 1 0 0 0 0
5 11 1 0 0 0 0
Line 25: M END
Line 26: $$$$ (delimiter, present only in SDF)
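As a sketch of how a program reads this layout, the Python below parses one V2000 record using the fixed column positions of the format (x, y and z in three 10-character fields, the element symbol starting at column 32, and three-character fields for the atom numbers and bond type in the bond block). It also runs the first checks a file check would do: the counts line must match the blocks, every bond must point to an existing atom, and an M END line must follow. Only the fields named on the slide are read; charges, stereo flags and property lines are ignored, and the function names are our own.

```python
from collections import Counter

def parse_v2000(text):
    """Parse one V2000 molfile (or the molfile part of an SDF record).

    Returns (name, atoms, bonds): atoms are (element, x, y, z) tuples and
    bonds are (first_atom, second_atom, bond_type) with 1-based atom numbers.
    """
    lines = text.splitlines()
    counts = lines[3]                                  # line 4: counts line
    if "V2000" not in counts:
        raise ValueError("this sketch only reads V2000 connection tables")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:                  # atom block, fixed columns
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:   # bond block
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    # Consistency checks: the counts line must agree with what follows it.
    if len(atoms) != n_atoms or len(bonds) != n_bonds:
        raise ValueError("file is shorter than the counts line announces")
    for a, b, _ in bonds:
        if not (1 <= a <= n_atoms and 1 <= b <= n_atoms):
            raise ValueError(f"bond {a}-{b} points to a missing atom")
    if not any(l.split() == ["M", "END"] for l in lines[4 + n_atoms + n_bonds:]):
        raise ValueError("no 'M  END' line after the bond block")
    return lines[0].strip(), atoms, bonds

def hill_formula(atoms):
    """Molecular formula in Hill order: C, then H, then the rest alphabetically."""
    counts = Counter(el for el, *_ in atoms)
    if "C" in counts:
        order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "")
                   for el in order if counts[el])
```

For the molecule above this gives 11 atoms, 9 bonds and the formula C2H6O3, which can then be compared with the Stoichiometry item stored in the SDF record.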
5
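MDL SDF file
SDF = structure-data file
Wraps the molfile format: each record is a molfile, optionally followed by data items, and records are separated by the $$$$ delimiter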
6
SDF content §1: molecular information
Header (lines 1-3):
1 file name: ./MinCheck/C2_H6_N0_O3_F0_S0_1.log
2 user/program/date/etc. information: OpenBabel04161413273D (program name, date/time stamp 0416141327 and the 3D flag)
3 command: Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
Connection table, Ctab (lines 4-25):
Line 4, counts line (11 atoms, 9 bonds, ..., V2000 standard):
11 9 0 0 0 0 0 0 0 0999 V2000
Lines 5-15, atom block (one line per atom: x, y, z, element, etc.):
0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
Lines 16-24, bond block (one line per bond: 1st atom, 2nd atom, bond type, etc.):
1 6 1 0 0 0 0
2 1 2 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
3 1 1 0 0 0 0
4 5 1 0 0 0 0
7 2 1 0 0 0 0
10 4 1 0 0 0 0
11 5 1 0 0 0 0
Line 25: M END
7
SDF content §2: input and calculated parameters
In the file, each item is introduced by a ">" line followed by its value:
Scale factor: 0.96
Stoichiometry: C2H6O3
Charge: 0
Multiplicity: 1
Molecular mass: 78.03169
DegreeOfFreedom: 27 (3N - 6 = 3 × 11 - 6 vibrational degrees of freedom, one per frequency below)
Permanent dipole moment: 1.475
ABC (cm-1), rotational constants: 14.133 1.731 1.655
Scaled freq (cm-1): 49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0
IR intensities (rel.): 4.5 3.8 6.6 7.8 25.1 93.3 16.9 79.8 60.8 214.2 73.0 2.9 55.0 16.5 33.8 210.3 56.9 126.8 4.4 22.8 90.0 19.2 0.4 8.3 59.4 559.4 26.8
Temp (K): 298.150
Pressure (atm): 1.00000
DfHg_G3MP2B3 (kJ/mol): -269.7
Scaled S (J/(mol·K)): 363.4
Unscaled CV (J/(mol·K)): 98.9
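A minimal reader for these data items is sketched below in Python. It assumes the usual SD-file conventions: the molfile part ends at the M END line, each data item starts with a line beginning with ">" that carries the field name in angle brackets (for example "> <Charge>"), the value lines follow, a blank line ends the item, and $$$$ ends the record. The slide shows only the ">" markers, so the field names in your file are an assumption; headers without a bracketed name get hypothetical placeholder names item1, item2, and so on.

```python
import re

def read_sdf(text):
    """Split SD-file text into records and return (molblock, data) pairs.

    molblock is the molfile part (up to and including the M  END line);
    data maps each data-item name to its (possibly multi-line) value.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":            # record delimiter
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):      # tolerate a missing final $$$$
        records.append(current)
    return [_split_record(lines) for lines in records]

def _split_record(lines):
    end = next((i for i, l in enumerate(lines) if l.split() == ["M", "END"]), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):             # data header line
            match = re.search(r"<([^>]*)>", line)
            name = match.group(1) if match else f"item{len(data) + 1}"
            data[name] = []
        elif not line.strip():               # a blank line closes the item
            name = None
        elif name is not None:
            data[name].append(line)
    molblock = "\n".join(lines[:end + 1])    # header lines are kept even if blank
    return molblock, {k: "\n".join(v) for k, v in data.items()}
```

Values are returned as strings; multi-value items such as the frequency list can then be split with `.split()` and converted with `float`.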
8
SDF content §3: molecular descriptors
MPD: 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
MNA: -C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O))
SMI: C(=C)O.OO
MolRT: 3
InChI: InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
InChIKey: JJZZTHKXWWHOAE-UHFFFAOYSA-N
MCDL: CH;CHH;3OH[2,3;;;5]
$$$$ (end of record)
9
Molecular fragment schemes
Developed in the 1950s; screens (structural keys, fingerprints) were developed in the 1970s
Generally long strings that can be stored efficiently in compressed form
Play an important role in efficient substructure searching of large chemical databases, similarity searching, clustering of large data sets, assessing chemical diversity, and SAR and QSAR studies
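To make the idea of screens concrete, here is a small Python sketch of a hashed fingerprint: each fragment string is hashed to one bit of a fixed-width bit string, similarity is measured with the Tanimoto coefficient, and a substructure screen rejects any target that lacks a bit the query sets. The SHA-1 hash, the 64-bit width and the choice of fragments are illustrative assumptions, not a particular published scheme. The screen is only a necessary condition: hash collisions let non-matching targets through, so an atom-by-atom match must still follow, and it is only meaningful for fragment types that are guaranteed to appear in every superstructure of the query.

```python
import hashlib

def fold(fragments, n_bits=64):
    """Hash each fragment string to one bit of an n_bits-wide fingerprint."""
    fp = 0
    for frag in fragments:
        digest = hashlib.sha1(frag.encode("utf-8")).digest()
        fp |= 1 << (int.from_bytes(digest[:4], "big") % n_bits)
    return fp

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two bit fingerprints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

def passes_screen(query_fp, target_fp):
    """Screen: a target can only contain the query if it sets every query bit."""
    return (query_fp & ~target_fp) == 0
```

A deterministic hash is used instead of Python's built-in `hash()`, whose string hashing changes between interpreter runs, so fingerprints stay comparable across sessions and files.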
10
Images of the optimized structure (depicted differently)
GaussView, ChemDraw, www.chemicalize.org (searched by InChI)
11
MPD (MOLPRINT 2D)
MPD = Molecular Populational Dynamics
A molecular similarity searching technique based on atom environments
Atom environments are count vectors of the heavy atoms found at given topological distances from each heavy atom of a molecule
> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
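The entries above can be decoded by hand. Each space-separated entry appears to describe one atom, in molfile order: its type code, then "distance-count-type" triples sorted by distance and type. The first entry, 2;1-1-2;1-1-9;1-1-13;2-3-13;, reads as a type-2 atom with one type-2, one type-9 and one type-13 neighbour, plus three type-13 atoms two bonds away, which fits atom 1 of the molfile (a C bonded to C, O and H, with three H atoms one bond further out). The Python sketch below regenerates the whole string from the bond list by breadth-first search to test that reading. It is our interpretation of the output, not the MOLPRINT 2D code: the type codes are copied from the string rather than computed, and only distances 1 and 2 are used, as in the example.

```python
from collections import Counter, deque

# Atom type codes for atoms 1-11, copied from the MPD string (not computed here).
TYPES = [2, 2, 9, 8, 8, 13, 13, 13, 13, 13, 13]
# Bonds from the molfile above (1-based atom numbers; bond orders not needed).
BONDS = [(1, 2), (1, 3), (1, 6), (2, 7), (2, 8), (3, 9), (4, 5), (4, 10), (5, 11)]

def adjacency(n_atoms, bonds):
    adj = {i: [] for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def environment(center, types, adj, max_dist=2):
    """Count atom types at topological distances 1..max_dist from `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:                          # breadth-first search, depth-limited
        atom = queue.popleft()
        if dist[atom] == max_dist:
            continue
        for nb in adj[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    counts = Counter((d, types[a - 1]) for a, d in dist.items() if d > 0)
    triples = "".join(f"{d}-{n}-{t};" for (d, t), n in sorted(counts.items()))
    return f"{types[center - 1]};" + triples

def mpd_string(types, bonds):
    adj = adjacency(len(types), bonds)
    return " ".join(environment(i, types, adj) for i in range(1, len(types) + 1))
```

With these inputs, `mpd_string(TYPES, BONDS)` reproduces the MPD field shown above, which supports the reading of the format; for a second isomer, the type codes would again have to come from its own MPD field.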
12
MNA
MNA = Multilevel Neighbourhoods of Atoms
2D molecular fragments suitable for use in QSAR modelling
Output: a complete descriptor fingerprint per molecule
Fragment: starting at the origin atom, each atom is appended to the descriptor, immediately followed by a parenthesized list of its neighbours
> -C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O))
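Read against the molfile, the example string can be regenerated from connectivity alone. The Python sketch below builds each atom's level-2 descriptor recursively: the atom symbol, then its neighbours' level-1 descriptors in parentheses. Three details are assumptions inferred from this one example rather than taken from a specification: neighbours are ordered by atomic number (ties broken alphabetically), every atom is prefixed with "-" because that is all this ring-free example shows, and duplicate descriptors are dropped, which turns the 11 atoms into the 8 descriptors listed. Bond orders are not encoded (the C=C bond appears as "-C"), matching the output shown.

```python
# Element symbols and bonds of atoms 1-11, copied from the molfile above.
ELEMENTS = ["C", "C", "O", "O", "O", "H", "H", "H", "H", "H", "H"]
BONDS = [(1, 2), (1, 3), (1, 6), (2, 7), (2, 8), (3, 9), (4, 5), (4, 10), (5, 11)]
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}

def adjacency(n_atoms, bonds):
    adj = {i: [] for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def mna(atom, elements, adj, level):
    """MNA-style descriptor of `atom`, nesting neighbour descriptors `level` deep."""
    symbol = elements[atom - 1]
    if level == 0:
        return "-" + symbol
    # Neighbours (including the atom we came from) sorted by atomic number.
    parts = sorted((ATOMIC_NUMBER[elements[n - 1]], mna(n, elements, adj, level - 1))
                   for n in adj[atom])
    return "-" + symbol + "(" + "".join(desc for _, desc in parts) + ")"

def mna_string(elements, bonds, level=2):
    adj = adjacency(len(elements), bonds)
    descriptors = (mna(i, elements, adj, level) for i in range(1, len(elements) + 1))
    return " ".join(dict.fromkeys(descriptors))   # drop repeats, keep first-seen order
```

`mna_string(ELEMENTS, BONDS)` returns exactly the MNA field above, so two isomers can be compared descriptor by descriptor once both are generated the same way.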
13
SMILES (SMI)
SMILES = Simplified Molecular Input Line Entry System
A linear text format that can describe the connectivity and chirality of a molecule
Specifically, it represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an "actual substance"
> C(=C)O.OO
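SMILES leaves most hydrogens implicit, and the valence model fills them in. The Python sketch below handles only a deliberately tiny subset (organic-subset atoms without brackets, branches, explicit -, = and # bonds, and "." for disconnected components; no rings, aromatic atoms, charges or stereo) and adds implicit hydrogens up to each atom's lowest fitting normal valence. In C(=C)O.OO the dot separates ethenol, C(=C)O, from hydrogen peroxide, OO, and the implied hydrogens bring the formula to the C2H6O3 stoichiometry stored in the SDF record. For real files a full toolkit parser should be used; the function names here are our own.

```python
import re
from collections import Counter

# Normal valences of the SMILES "organic subset" atoms, lowest first.
VALENCES = {"B": [3], "C": [4], "N": [3, 5], "O": [2], "P": [3, 5],
            "S": [2, 4, 6], "F": [1], "Cl": [1], "Br": [1], "I": [1]}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#().]")

def parse_simple_smiles(smiles):
    """Return (elements, bonds) for a ring-free, bracket-free, non-aromatic SMILES."""
    elements, bonds, branch_points = [], [], []
    prev, order, pos = None, 1, 0
    for m in TOKEN.finditer(smiles):
        if m.start() != pos:                  # something the subset does not cover
            raise ValueError(f"unsupported syntax at position {pos}: {smiles[pos:]!r}")
        pos, tok = m.end(), m.group()
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branch_points.append(prev)        # remember where the branch starts
        elif tok == ")":
            prev = branch_points.pop()        # continue from the branch point
        elif tok == ".":
            prev = None                       # next atom starts a new component
        else:
            elements.append(tok)
            atom = len(elements) - 1
            if prev is not None:
                bonds.append((prev, atom, order))
            prev, order = atom, 1
    if pos != len(smiles):
        raise ValueError(f"unsupported syntax at position {pos}: {smiles[pos:]!r}")
    return elements, bonds

def implicit_hydrogens(elements, bonds):
    """Hydrogens needed to bring each atom up to its lowest fitting normal valence."""
    used = [0] * len(elements)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [next((v for v in VALENCES[el] if v >= u), u) - u
            for el, u in zip(elements, used)]

def hill_formula(counts):
    if "C" in counts:
        order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "")
                   for el in order if counts[el])

elements, bonds = parse_simple_smiles("C(=C)O.OO")
counts = Counter(elements)
counts["H"] += sum(implicit_hydrogens(elements, bonds))
print(hill_formula(counts))  # prints C2H6O3
```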
14
MolRT (easter egg, it’s molarity…)
15
InChI
InChI = IUPAC International Chemical Identifier
A reliable, computerized method to represent chemical identity
A detailed representation of the chemical structure
A simple but unique identifier for molecules (like a barcode)
Different layers, separated by delimiters (/):
Main layer
Charge layer
Stereochemical layer
Isotopic layer
Fixed-H layer
Reconnected layer
> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
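Because the layers are "/"-separated and, after the formula, each starts with a one-letter prefix, an InChI is easy to split. The Python sketch below does that and also totals the element counts of the dot-separated main-layer formula; for the example, C2H4O plus H2O2 gives C2H6O3, matching the SDF stoichiometry. It is a string-level sketch with our own function names, not an InChI validator.

```python
import re
from collections import Counter

def split_inchi(inchi):
    """Split an InChI into its version, main-layer formula and prefixed layers."""
    prefix, sep, rest = inchi.partition("=")
    if prefix != "InChI" or not sep:
        raise ValueError("not an InChI string")
    version, formula, *others = rest.split("/")
    # Later layers start with a one-letter prefix (c, h, q, p, b, t, m, s, i, f, r).
    # Keep an ordered list, because some prefixes can occur more than once.
    layers = [(layer[0], layer[1:]) for layer in others if layer]
    return version, formula, layers

def formula_counts(formula):
    """Total element counts of a dot-separated formula such as 'C2H4O.H2O2'."""
    total = Counter()
    for component in formula.split("."):
        mult, body = re.fullmatch(r"(\d*)(.*)", component).groups()
        for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", body):
            total[element] += (int(mult) if mult else 1) * (int(n) if n else 1)
    return total
```

For the example, the version is "1S" (standard InChI, version 1), the formula is "C2H4O.H2O2", and the remaining layers are the connectivity ("c") and hydrogen ("h") layers, with ";" separating the two components inside each layer.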
16
InChIKey
A shortened, hashed form of the InChI that is easier to use in web searches
Its length is fixed at 27 characters
The first 14 characters encode the molecular skeleton (connectivity)
The next block contains 8+1 characters: the first 8 characters encode stereochemistry and isotopic substitution information; the +1 character gives the kind of InChIKey (S = standard, N = non-standard)
The next character gives the InChI version used
The final character is a protonation indicator
> JJZZTHKXWWHOAE-UHFFFAOYSA-N
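The fixed layout described above can be checked with a single regular expression. The Python sketch below splits a key into its blocks and compares first blocks, which is a quick test of whether two structures share the same skeleton (stereoisomers do, constitutional isomers do not). Keys are hashes, so this is a screening check rather than proof of identity; the function and field names are our own.

```python
import re

# 14 letters, hyphen, 8 letters + kind flag (S/N) + version letter, hyphen, 1 letter.
INCHIKEY = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")

def parse_inchikey(key):
    """Split an InChIKey into its blocks; reject keys that do not fit the layout."""
    match = INCHIKEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, kind, version, protonation = match.groups()
    return {
        "skeleton": skeleton,          # hash of the connectivity (main) layer
        "other_layers": other_layers,  # hash of stereo, isotopic and other layers
        "standard": kind == "S",
        "version": version,            # 'A' corresponds to InChI version 1
        "protonation": protonation,    # 'N' means no protonation change
    }

def same_skeleton(key1, key2):
    """True if two keys share the first block, i.e. the same connectivity hash."""
    return parse_inchikey(key1)["skeleton"] == parse_inchikey(key2)["skeleton"]
```

Because the pattern fixes every block length, a match also guarantees the 27-character total.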
17
MCDL
MCDL = Modular Chemical Descriptor Language; first published in 2001
Developed for the linear representation of structural and other chemical information in chemical databases
Similar to InChI: both languages are modular; constitution, connectivity and stereochemistry are represented by individual "modules"
MCDL places hydrogen atoms directly in the structural fragments, whereas InChI uses a separate layer
> CH;CHH;3OH[2,3;;;5]
18
Other useful links and references
Todeschini R, Consonni V.: Molecular Descriptors for Chemoinformatics, 2nd revised and enlarged edition, Wiley-VCH, Weinheim, 2009. ISBN 978-3-527-31852-0.
Bender A, Mussa HY, Glen RC, Reiling S.: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance, J Chem Inf Comput Sci. 2004 Sep-Oct; 44(5):1708-18.
Gakh AA, Burnett MN.: Modular Chemical Descriptor Language (MCDL): composition, connectivity, and supplementary modules, J Chem Inf Comput Sci. 2001 Nov-Dec; 41(6):1494-9.
http://arxiv.org/ftp/arxiv/papers/1311/1311.3723.pdf
http://openbabel.org/wiki/Multilevel_Neighborhoods_of_Atoms
http://openbabel.org/wiki/SMILES
http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html
http://www.inchi-trust.org/ (and references therein)
http://www.iupac.org/home/publications/e-resources/inchi/download.html (and references therein)
http://www.chemspider.com/inchi-resolver/
19
Your objectives for today
To check your .sdf file for two chosen isomers
To collect all the codes
To compare them with each other and find the differences
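For the comparison step, a small helper like the Python sketch below lists every code that differs between two records, after checking that they share a stoichiometry (as isomers must). It assumes each record is a dict from data-item names to values, and that the names match the labels on the descriptor slide (MPD, MNA, SMI, MolRT, InChI, InChIKey, MCDL) plus a "Stoichiometry" item; these names are assumptions, so adjust them if your file uses others.

```python
# Data-item names: the labels from the descriptor slide plus "Stoichiometry".
# These are assumptions; rename them to match the fields in your own .sdf file.
CODE_FIELDS = ["MPD", "MNA", "SMI", "MolRT", "InChI", "InChIKey", "MCDL"]

def compare_codes(first, second, fields=CODE_FIELDS):
    """Return {field: (value_in_first, value_in_second)} for each code that differs."""
    return {field: (first.get(field), second.get(field))
            for field in fields
            if first.get(field) != second.get(field)}

def compare_isomers(first, second, formula_field="Stoichiometry"):
    """Check that two records share a formula (as isomers must), then diff their codes."""
    if first.get(formula_field) != second.get(formula_field):
        raise ValueError("the two records have different formulas, so they are not isomers")
    return compare_codes(first, second)
```

A field missing from one record shows up as None in the result, which also flags naming mismatches between the two files.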
20
Thank you for your attention!