AMBIT Software for Data Management and (Q)SAR Applications

Nina Jeliazkova
Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria

Joanna Jaworska
Central Product Safety, Procter and Gamble, Belgium

QSAR May, Lyon
AMBIT is available online at

Introduction: why AMBIT?
- The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in silico methods (ICCA Workshop in Setubal 2002, OECD)
- Efficient use of existing information on chemicals requires better ways of storing it
  - Standardized formats
  - Computer-automated verification of structures
  - Capability to store large amounts of data
- Taking advantage of the rapidly evolving field of data mining and extraction of relevant information

Content
- Overview of AMBIT functional modules
- Technology choice and software capabilities
- Demonstration of the current state
  - Web application
    - Online similarity search
  - Standalone applications
    - Ambit Database Tools
      - Descriptor search
      - Experimental data search
      - Similarity search
      - Verhaar classification scheme
    - AmbitDiscovery
      - Applicability domain
      - Grouping by different methods

Software overview
- Database and search engine (EM9-1a,b, 2, 3)
  - Searches by CAS, SMILES, name
  - Substructure search
  - Similarity search
- Data import and export, format conversions (EM9-1, 2, 3)
- Applicability domain (EM9-1a)
- Similarity assessment (EM9-1b)

AMBIT Database today
- Not restricted to these datasets!
- Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA dataset …)

AMBIT: more about the internals
- Open source, relying on open standards
- Modular approach
- Standalone and web versions
- Implemented in Java
  - Platform independent (the same application runs on Windows, Unix, Mac …)
  - Suitable for web applications
  - The cheminformatics functionality relies on the open source Java library, The Chemistry Development Kit
- Based on a Relational Database Management System
  - Allows much faster and more convenient access to the data than flat text files
  - Our choice is MySQL, which is the most popular open source relational database
- Chemical Markup Language (CML)
  - Acknowledged method of encoding chemical data in XML
  - Being adopted by a large number of chemical organisations, from government, through commercial to academia
  - The choice of CML for the internal format makes the database independent of the software that accesses it, in contrast to some proprietary solutions

AMBIT: information stored
- Structures, stored internally in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
- Multiple 3D structures per compound
- Identifiers (SMILES, InChI, CAS or other registry numbers; unlimited number of arbitrary identifiers and synonyms)
- Inventory indicator
- Descriptors (unlimited number of arbitrary descriptors)
- Experimental data (flexible templates for experimental data)
- QSAR models
- Literature references
- Fingerprints and atom environments for fast substructure and similarity search
- Other information generated in order to accelerate specific queries
The complete documentation of AMBIT Database is available at
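
As an illustration of storing a structure as compressed CML, here is a minimal Python sketch. It is not AMBIT's code: the use of zlib, the helper names (pack, unpack, element_counts) and the tiny hand-written CML record are all assumptions made for the example; the slide does not say which compression AMBIT applies.

```python
import zlib
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written CML for the heavy atoms of ethanol (hydrogens omitted for brevity).
ETHANOL_CML = (
    '<molecule xmlns="http://www.xml-cml.org/schema" id="ethanol">'
    "<atomArray>"
    '<atom id="a1" elementType="C"/>'
    '<atom id="a2" elementType="C"/>'
    '<atom id="a3" elementType="O"/>'
    "</atomArray>"
    "<bondArray>"
    '<bond atomRefs2="a1 a2" order="1"/>'
    '<bond atomRefs2="a2 a3" order="1"/>'
    "</bondArray>"
    "</molecule>"
)


def pack(cml: str) -> bytes:
    """Compress a CML string into a blob suitable for a binary database column."""
    return zlib.compress(cml.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Restore the original CML string from a compressed blob."""
    return zlib.decompress(blob).decode("utf-8")


def element_counts(cml: str) -> dict:
    """Count atoms per element type in a CML molecule."""
    root = ET.fromstring(cml)
    counts = {}
    for atom in root.findall("cml:atomArray/cml:atom", {"cml": CML_NS}):
        element = atom.get("elementType")
        counts[element] = counts.get(element, 0) + 1
    return counts


blob = pack(ETHANOL_CML)
restored = unpack(blob)
```

Because the stored blob is plain CML once decompressed, any XML-aware tool can read it, which is the independence from the accessing software that the CML choice is meant to give.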

AMBIT Database schema
- Descriptors repository
- Compounds repository
- QSAR models repository
- Experimental results repository
- Users repository
- Literature references repository
- Queries

AMBIT: selected functionalities
- Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
- Search
  - Simple search (CAS, SMILES, chemical name)
  - Descriptor search
  - Experimental data search
  - Substructure and similarity search
- Grouping
  - Verhaar classification scheme
  - Similarity (see J. Jaworska's presentation tomorrow)
- QSAR
  - Applicability domain assessment

AMBIT Online - Similarity search

AMBIT Online - Query result

Links to other databases - KEGG

Information about QSAR models

AMBIT Database Tools
Standalone application

AMBIT User Interface
Example: Search by descriptor ranges
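
A search by descriptor ranges can be pictured as a query over a relational store. The sketch below is only an illustration: it uses Python's built-in sqlite3 instead of MySQL, and the two-table schema, the function names and the three demo compounds with approximate logP and molecular weight values are assumptions, not the AMBIT schema or data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compound (id INTEGER PRIMARY KEY, cas TEXT, smiles TEXT);
        CREATE TABLE descriptor_value (
            compound_id INTEGER REFERENCES compound(id),
            name TEXT,
            value REAL
        );
        """
    )
    # Illustrative rows only: benzene, ethanol, phenol.
    conn.executemany(
        "INSERT INTO compound VALUES (?, ?, ?)",
        [(1, "71-43-2", "c1ccccc1"), (2, "64-17-5", "CCO"), (3, "108-95-2", "Oc1ccccc1")],
    )
    conn.executemany(
        "INSERT INTO descriptor_value VALUES (?, ?, ?)",
        [
            (1, "logP", 2.13), (1, "MW", 78.11),
            (2, "logP", -0.31), (2, "MW", 46.07),
            (3, "logP", 1.46), (3, "MW", 94.11),
        ],
    )
    return conn


def search_by_ranges(conn, ranges):
    """Return (cas, smiles) of compounds whose descriptors fall inside every given range.

    `ranges` maps a descriptor name to an inclusive (low, high) pair.
    """
    clauses, params = [], []
    for name, (low, high) in ranges.items():
        clauses.append(
            "EXISTS (SELECT 1 FROM descriptor_value d "
            "WHERE d.compound_id = c.id AND d.name = ? "
            "AND d.value BETWEEN ? AND ?)"
        )
        params.extend([name, low, high])
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT c.cas, c.smiles FROM compound c WHERE {where} ORDER BY c.id"
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
hits = search_by_ranges(conn, {"logP": (1.0, 3.0), "MW": (70.0, 90.0)})
```

Each range becomes one EXISTS clause, so adding a descriptor to the query narrows the result without changing the table layout, which is what allows an unlimited number of arbitrary descriptors per compound.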

AMBIT Discovery
Software for applicability domain and grouping
Methods:
- Descriptor space
  - Ranges
  - Euclidean distance
  - City-block distance
  - Probability density
  - Options
    - Threshold
    - Preprocessing (e.g. PCA)
    - Center
    - More…
- Structural similarity
  - Fingerprints
    - Consensus fingerprint + Tanimoto distance
    - Consensus fingerprint + missing fragments
  - Atom environments
    - Consensus atom environments + Hellinger distance
    - kNN + Tanimoto distance
- Ranking
Results from several methods can be combined.
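
Some of the descriptor-space and fingerprint methods listed above can be sketched in a few lines of Python. This is a generic illustration, not the AmbitDiscovery implementation: the function names are hypothetical, fingerprints are modelled as sets of on-bit indices, and the threshold rule (largest training-set distance to the centroid) is an assumption, since the slide only names "Threshold" as an option.

```python
import math


def descriptor_ranges(training):
    """Per-descriptor (min, max) over the training set rows."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_ranges(x, ranges):
    """Ranges method: every descriptor of x lies within the training range."""
    return all(low <= v <= high for v, (low, high) in zip(x, ranges))


def centroid(training):
    """Mean descriptor vector of the training set."""
    n = len(training)
    return [sum(col) / n for col in zip(*training)]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def city_block(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def in_distance_domain(x, training, distance=euclidean):
    """Distance method with an assumed threshold: x is in the domain if it is
    no farther from the training centroid than the farthest training point."""
    center = centroid(training)
    threshold = max(distance(t, center) for t in training)
    return distance(x, center) <= threshold


def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    similarity = len(a & b) / union if union else 1.0
    return 1.0 - similarity


# Tiny made-up training set with two descriptors per compound.
training = [(1.0, 10.0), (3.0, 20.0), (2.0, 30.0)]
ranges = descriptor_ranges(training)
```

With this training set, a query such as (2.0, 15.0) passes the ranges check, while (4.0, 15.0) fails because its first descriptor exceeds the training maximum. Passing city_block as the distance argument switches the distance method without other changes.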

AMBIT Discovery
Data visualisation

AMBIT Discovery
Results (exported to an MS Excel file)

Similarity based on mechanistic understanding
Verhaar scheme: 34 rules, 5 classes
- Class 1: Narcosis or baseline toxicity
- Class 2: Less inert compounds
- Class 3: Unspecific reactivity
- Class 4: Compounds and groups of compounds acting by a specific mechanism
- Class 5: Not possible to classify according to these rules
Reference: Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, 1992

Verhaar scheme implementation
- Modular approach
- Can be used:
  - Within AMBIT Database Tools
  - As an extension to ToxTree

Summary
- Many tools have been developed, and we are working on their seamless integration
- Both the standalone and web applications are in beta stage and are being extensively tested
- Synergies with other projects
  - LRI Cefic gold standard BCF database will be stored in AMBIT
  - LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  - ECB Cramer rules software for TTC (human health) - ToxTree
  - Fraunhofer Institute subchronic toxicity database (human health)
- Approaches to similarity assessment will be further extended and tested in the context of category development / read-across (ECB funded project)
- Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results

Acknowledgment
This work is funded by CEFIC LRI EEM-9, "Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions"

The Chemistry Development Kit
- CDK is a freely available open source Java library for structural chemo- and bioinformatics
- Originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University's Bioinformatics Center
- Maintained and enhanced by more than 20 developers from both academic and industrial institutions all over the world
- Used in more than 10 different academic and industrial projects worldwide
- Provides methods for many common tasks in molecular informatics
  - SMILES parsing and generation
  - Substructure searching
  - 2D and 3D rendering of chemical structures
  - I/O routines (format conversions)
  - 3D builder
  - QSAR module, etc.

Thank you! Questions?