Standardizer: Molecular Cosmetics for Chemoinformatics
György Pirok, Nóra Máté, István Cseh, Szilárd Dóránt, Péter Kovács, Szabolcs Csepregi, Ferenc Csizmadia
Why standardize structures?
Canonicalisation: uniformization of structures without changing their chemical content, in order to recognize duplicates and functional groups (aromatization, mesomers, tautomers, ...)
Beautification: making structures visually more attractive (dearomatization, cleaning coordinates, wedge orientation, ...)
Modification: converting structures by modifying their original content, as a preparation step for further chemoinformatics tasks (transformations, removing stereo, removing R-groups, ...)
Standardization actions are often difficult to categorize.
Canonicalisation
Hydrogens: making hydrogens explicit; making hydrogens implicit
Resonant structures: aromatizing Kekulé rings; converting to canonical mesomer form; transforming to user-defined mesomer form
Tautomers: converting to canonical tautomer form; transforming to user-defined tautomer form
Other: removing small fragments; removing user-defined fragments; expanding stoichiometry; setting the chiral flag
Mesomers
Tautomers: oxo-enol, enamine-imine
Tautomers: pyridone-pyridol
Fragment removal
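Fragment removal typically keeps only the main component of a multi-component record. The Java sketch below is a hypothetical illustration, not ChemAxon's implementation: it assumes structures arrive as SMILES strings, splits them on the '.' separator (assuming no ring-closure bond spans two fragments), and keeps the fragment with the most heavy atoms. The atom counter is a deliberate simplification that only understands organic-subset atoms and counts every bracket atom, even [H], as one atom.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: keep the largest fragment of a dot-disconnected SMILES string. */
class LargestFragment {

    /** Approximate heavy-atom count: organic-subset atoms plus bracket atoms. */
    static int heavyAtoms(String smiles) {
        int count = 0;
        for (int i = 0; i < smiles.length(); i++) {
            char c = smiles.charAt(i);
            if (c == '[') {
                int end = smiles.indexOf(']', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unclosed bracket atom in " + smiles);
                }
                count++;          // every bracket atom counts once, even [H]
                i = end;          // continue after the closing bracket
            } else if ("BCNOPSFI".indexOf(c) >= 0) {
                count++;
                boolean twoLetter = i + 1 < smiles.length()
                        && ((c == 'C' && smiles.charAt(i + 1) == 'l')
                            || (c == 'B' && smiles.charAt(i + 1) == 'r'));
                if (twoLetter) {
                    i++;          // skip the second letter of Cl or Br
                }
            } else if ("bcnops".indexOf(c) >= 0) {
                count++;          // aromatic organic-subset atoms
            }
            // bond symbols, branches and ring-closure digits are ignored
        }
        return count;
    }

    /** Splits on '.' and keeps the fragment with the most heavy atoms. */
    static String keepLargest(String smiles) {
        return Arrays.stream(smiles.split("\\."))
                .max(Comparator.comparingInt(LargestFragment::heavyAtoms))
                .orElse(smiles);
    }

    public static void main(String[] args) {
        System.out.println(keepLargest("CC(=O)[O-].[Na+]"));   // CC(=O)[O-]
    }
}
```

Heavy-atom count is only one possible size criterion; a production tool may also weigh molecular mass or ignore fragments from a user-defined list.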
Specific counterion removal
Solvent removal
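Specific counterion removal and solvent removal differ from largest-fragment selection: only fragments on a user-supplied list are deleted. A minimal Java sketch, assuming SMILES input and exact string matching of fragments against hypothetical example lists; a real tool would compare canonical forms or use substructure matching rather than raw strings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: drop listed counterions and solvents from a dot-disconnected SMILES. */
class ListedFragmentRemoval {

    // Example lists only; in practice these would be user-defined.
    static final Set<String> COUNTERIONS = Set.of("[Na+]", "[K+]", "[Cl-]", "[Br-]");
    static final Set<String> SOLVENTS = Set.of("O", "CCO", "CO", "ClCCl");

    /** Removes every fragment that appears verbatim in the unwanted set. */
    static String removeListed(String smiles, Set<String> unwanted) {
        String kept = Arrays.stream(smiles.split("\\."))
                .filter(f -> !unwanted.contains(f))
                .collect(Collectors.joining("."));
        // Never delete everything: if only listed fragments were present, keep the input.
        return kept.isEmpty() ? smiles : kept;
    }

    public static void main(String[] args) {
        String record = "CC(=O)[O-].[Na+].O";
        String noIons = removeListed(record, COUNTERIONS);   // CC(=O)[O-].O
        String clean = removeListed(noIons, SOLVENTS);       // CC(=O)[O-]
        System.out.println(clean);
    }
}
```

The "keep the input if nothing would remain" rule means that, for example, sodium chloride itself or pure ethanol is not erased by its own counterion or solvent list.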
Stoichiometry expansion: expanding salt stoichiometry
Stoichiometry expansion: expanding reaction stoichiometry
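Stoichiometry expansion replaces a coefficient with explicit copies of the component, for example a salt with two sodium ions, or the balanced reaction 2 H2 + O2 -> 2 H2O. The Java sketch below assumes a hypothetical input of (coefficient, SMILES) terms and writes dot-disconnected SMILES or reaction SMILES; it does not reproduce the representation Standardizer itself reads.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expand stoichiometric coefficients into explicit copies. */
class StoichiometryExpansion {

    /** One component with its stoichiometric coefficient (illustrative input format). */
    static final class Term {
        final int coefficient;
        final String smiles;

        Term(int coefficient, String smiles) {
            this.coefficient = coefficient;
            this.smiles = smiles;
        }
    }

    /** Repeats every component coefficient times and joins the copies with '.'. */
    static String expand(List<Term> terms) {
        List<String> copies = new ArrayList<>();
        for (Term t : terms) {
            if (t.coefficient < 1) {
                throw new IllegalArgumentException("coefficient must be positive for " + t.smiles);
            }
            for (int i = 0; i < t.coefficient; i++) {
                copies.add(t.smiles);
            }
        }
        return String.join(".", copies);
    }

    /** Expands both sides of a reaction and writes reaction SMILES (reactants>>products). */
    static String expandReaction(List<Term> reactants, List<Term> products) {
        return expand(reactants) + ">>" + expand(products);
    }

    public static void main(String[] args) {
        // Salt: disodium sulfate as 2 Na+ and one sulfate anion
        System.out.println(expand(List.of(new Term(2, "[Na+]"), new Term(1, "[O-]S(=O)(=O)[O-]"))));
        // Reaction: 2 H2 + O2 -> 2 H2O
        System.out.println(expandReaction(
                List.of(new Term(2, "[H][H]"), new Term(1, "O=O")),
                List.of(new Term(2, "O"))));
    }
}
```

After expansion, atom counts on both sides of a balanced reaction agree, which is what makes the expanded form convenient for later atom-mapping or mass-balance checks.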
Beautification
Hydrogens: making hydrogens implicit
Resonant structures: converting aromatic rings to Kekulé format
Cleaning: calculating 2D coordinates; template-based cleaning; 3D geometry optimization; reallocating wedge bonds
Groups: contracting, expanding or ungrouping abbreviated and multiple groups
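Converting aromatic rings to Kekulé format amounts to choosing double bonds so that every aromatic atom that needs one gets exactly one, that is, finding a perfect matching on the aromatic bonds. The Java sketch below finds such a matching by simple backtracking; it assumes a hypothetical graph input (atom indices and an aromatic bond list) and that every listed atom needs exactly one double bond, which holds for benzene carbons but not for atoms such as the NH of pyrrole, which would have to be left out of the atom set. Real toolkits generally use more efficient matching algorithms; this is only an illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: assign Kekulé double bonds by backtracking perfect matching. */
class Kekulizer {

    /**
     * @param atomCount number of aromatic atoms, each needing exactly one double bond
     * @param bonds     aromatic bonds as pairs of atom indices
     * @return indices into bonds chosen as double bonds, or null if no assignment exists
     */
    static List<Integer> kekulize(int atomCount, int[][] bonds) {
        int[] partner = new int[atomCount];
        Arrays.fill(partner, -1);               // -1 means "no double bond yet"
        List<Integer> doubles = new ArrayList<>();
        return match(0, partner, bonds, doubles) ? doubles : null;
    }

    private static boolean match(int atom, int[] partner, int[][] bonds, List<Integer> doubles) {
        while (atom < partner.length && partner[atom] != -1) {
            atom++;                             // skip atoms that already have a double bond
        }
        if (atom == partner.length) {
            return true;                        // every atom is covered exactly once
        }
        for (int b = 0; b < bonds.length; b++) {
            int u = bonds[b][0], v = bonds[b][1];
            int other = (u == atom) ? v : (v == atom) ? u : -1;
            if (other < 0 || partner[other] != -1) {
                continue;                       // bond not on this atom, or neighbour already taken
            }
            partner[atom] = other;              // tentatively make this bond double
            partner[other] = atom;
            doubles.add(b);
            if (match(atom + 1, partner, bonds, doubles)) {
                return true;
            }
            partner[atom] = -1;                 // undo and try the next bond
            partner[other] = -1;
            doubles.remove(doubles.size() - 1);
        }
        return false;
    }

    public static void main(String[] args) {
        // Benzene: six aromatic carbons in a ring
        int[][] ring = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        System.out.println(kekulize(6, ring));  // [0, 2, 4]
    }
}
```

An odd ring in which every atom demands a double bond, such as a five-membered all-carbon ring, has no perfect matching, so the sketch returns null; that signals an input that cannot be kekulized under the stated assumption.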
Template-based Cleaning: 2D-coordinate calculation of macrocycles or bridged systems
Template-based Cleaning: aligning search results to the query
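Aligning a search hit to the query can be framed as finding the rotation and translation that best superimpose the hit's matched atoms on the query's 2D coordinates in the least-squares sense. The Java sketch below assumes the atom-to-atom matching is already known (for example from the substructure search) and uses the closed-form 2D solution theta = atan2(sum(px*qy - py*qx), sum(px*qx + py*qy)) on centred coordinates. It ignores reflections and is an illustration of rigid alignment, not ChemAxon's cleaning algorithm.

```java
/** Hypothetical sketch: least-squares rigid 2D alignment of matched atom coordinates. */
class Align2D {

    /**
     * @param query   query coordinates {x, y}, one row per matched atom
     * @param matched hit coordinates of the same matched atoms, in the same order
     * @param all     all hit atom coordinates to transform
     * @return transformed copy of all
     */
    static double[][] align(double[][] query, double[][] matched, double[][] all) {
        int n = query.length;
        if (n == 0 || matched.length != n) {
            throw new IllegalArgumentException("need non-empty, equally long point lists");
        }
        double qcx = 0, qcy = 0, pcx = 0, pcy = 0;
        for (int i = 0; i < n; i++) {           // centroids of both matched point sets
            qcx += query[i][0] / n;
            qcy += query[i][1] / n;
            pcx += matched[i][0] / n;
            pcy += matched[i][1] / n;
        }
        double dot = 0, cross = 0;
        for (int i = 0; i < n; i++) {           // dot and cross sums of centred pairs
            double px = matched[i][0] - pcx, py = matched[i][1] - pcy;
            double qx = query[i][0] - qcx, qy = query[i][1] - qcy;
            dot += px * qx + py * qy;
            cross += px * qy - py * qx;
        }
        double theta = Math.atan2(cross, dot);  // rotation minimizing the squared distances
        double c = Math.cos(theta), s = Math.sin(theta);
        double[][] out = new double[all.length][2];
        for (int i = 0; i < all.length; i++) {
            double x = all[i][0] - pcx, y = all[i][1] - pcy;
            out[i][0] = c * x - s * y + qcx;    // rotate about the hit centroid,
            out[i][1] = s * x + c * y + qcy;    // then move onto the query centroid
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] query = {{0, 0}, {1, 0}};            // horizontal query bond
        double[][] matched = {{0, 0}, {0, 1}};          // same bond, drawn vertically in the hit
        double[][] hit = {{0, 0}, {0, 1}, {0, 2}};      // hit has one extra atom
        for (double[] p : align(query, matched, hit)) {
            System.out.printf("%.3f %.3f%n", p[0], p[1]);
        }
    }
}
```

Unmatched hit atoms are carried along by the same transformation, so the common substructure appears in the query's orientation while the rest of the hit keeps its own layout relative to it.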
Canonicalization During Database Import
[Diagram: the client sends input structures to the server, where Standardizer (JChem Base / Cartridge) applies the canonicalization configuration; both the original structures and the canonicalized structures are stored in the relational database.]
Sending a Query to the Database
[Diagram: the client sends the query structure to the server; Standardizer (JChem Base / Cartridge) canonicalizes it with the canonicalization configuration, and the canonicalized query is compared to the canonicalized structures in the relational database.]
Displaying Result Structures
[Diagram: the original structures are read from the relational database on the server, beautified by Standardizer (JChem Base / Cartridge) using the beautification configuration, and the beautified structures are sent to the client.]
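The three database diagrams describe one pattern: structures are canonicalized once at import, queries are canonicalized with the same configuration before comparison, and hits are returned from the stored originals, optionally beautified. A minimal in-memory Java sketch of that pattern, in which a HashMap stands in for the relational database and JChem Base / Cartridge, and the canonicalize and beautify functions are hypothetical placeholders for Standardizer configurations (here a toy fragment sort and the identity):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the import / query / display workflow. */
class StandardizedStore {

    private final UnaryOperator<String> canonicalize;  // stands in for the canonicalization configuration
    private final UnaryOperator<String> beautify;      // stands in for the beautification configuration
    private final Map<String, List<String>> table = new HashMap<>();  // canonical form -> originals

    StandardizedStore(UnaryOperator<String> canonicalize, UnaryOperator<String> beautify) {
        this.canonicalize = canonicalize;
        this.beautify = beautify;
    }

    /** Import: keep the original structure, indexed by its canonical form. */
    void insert(String original) {
        table.computeIfAbsent(canonicalize.apply(original), k -> new ArrayList<>()).add(original);
    }

    /** Query: canonicalize the query the same way, compare canonical forms, return beautified originals. */
    List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (String original : table.getOrDefault(canonicalize.apply(query), List.of())) {
            hits.add(beautify.apply(original));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy canonicalization: fragment order no longer matters (placeholder only)
        UnaryOperator<String> sortFragments = s -> {
            String[] parts = s.split("\\.");
            Arrays.sort(parts);
            return String.join(".", parts);
        };
        StandardizedStore db = new StandardizedStore(sortFragments, UnaryOperator.identity());
        db.insert("[Na+].[Cl-]");
        db.insert("CCO");
        System.out.println(db.search("[Cl-].[Na+]"));   // [[Na+].[Cl-]]
    }
}
```

The key design point is that import and query must share one canonicalization configuration; if they differ, equivalent structures receive different keys and duplicates or hits are missed.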
Modification: custom transformations
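API and command-line interface
Java API:
    // load the standardization configuration from an XML file
    Standardizer st = new Standardizer(new File("standardize.xml"));
    // apply the configured actions to mol, a previously loaded molecule
    st.standardize(mol);
Command line:
    standardize input.sdf -c config.xml -o output.smiles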
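Both the API call and the command-line tool are driven by a configuration file that lists standardization actions. A hedged Java sketch of that design, in which a configuration is an ordered list of action names resolved against a registry of string-to-string functions; the action names, the registry and the SMILES-based actions are all hypothetical and do not reproduce the Standardizer XML format or API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical sketch: a configuration-driven chain of standardization actions. */
class ActionPipeline {

    // Registry of available actions (illustrative names, not Standardizer's)
    static final Map<String, UnaryOperator<String>> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("removeWater", s -> Arrays.stream(s.split("\\."))
                .filter(f -> !f.equals("O"))
                .collect(Collectors.joining(".")));
        REGISTRY.put("sortFragments", s -> Arrays.stream(s.split("\\."))
                .sorted()
                .collect(Collectors.joining(".")));
    }

    private final List<UnaryOperator<String>> steps;

    /** Resolves the configured action names once, failing fast on unknown names. */
    ActionPipeline(List<String> config) {
        steps = config.stream().map(name -> {
            UnaryOperator<String> action = REGISTRY.get(name);
            if (action == null) {
                throw new IllegalArgumentException("unknown action: " + name);
            }
            return action;
        }).collect(Collectors.toList());
    }

    /** Applies the actions in the configured order. */
    String standardize(String smiles) {
        String result = smiles;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        ActionPipeline p = new ActionPipeline(List.of("removeWater", "sortFragments"));
        System.out.println(p.standardize("[Na+].O.[Cl-]"));   // [Cl-].[Na+]
    }
}
```

Resolving the configuration before any structure is processed means a typo in the configuration fails immediately instead of partway through a large input file; keeping the steps ordered matters because standardization actions generally do not commute.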
Live Demonstration
Applications: Virtual Synthesis
Applications: Structure Databases
How Can ChemAxon Help?
Free for non-commercial websites
Free for academic teaching and research (Academic Package)
The free Academic Package is to be extended to cover academic networks: campus-wide roll-out
Acknowledgments
Ferenc Csizmadia, Nóra Máté, István Cseh, Szabó Attila, Szilárd Dóránt, Péter Kovács, Szabolcs Csepregi