AMBIT Software for Data Management and (Q)SAR Applications. Nina Jeliazkova, Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria.


1 AMBIT Software for Data Management and (Q)SAR Applications
Nina Jeliazkova, Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria. E-mail: nina@acad.bg
Joanna Jaworska, Central Product Safety, Procter and Gamble, Belgium

2 Introduction: why AMBIT?
(QSAR2006, 8-12 May, Lyon. AMBIT is available online at http://ambit.acad.bg)
- The scarcity of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD)
- Efficient use of existing information on chemicals requires better ways of:
  - Storage: standardized formats, computer-automated verification of structures, capability to store large amounts of data
  - Taking advantage of the rapidly evolving field of data mining and extraction of relevant information

3 Content
- Overview of AMBIT functional modules
- Technology choice and software capabilities
- Demonstration of the current state
  - Web application
    - Online similarity search
  - Standalone applications
    - Ambit Database Tools: descriptor search, experimental data search, similarity search, Verhaar classification scheme
    - AmbitDiscovery: applicability domain, grouping by different methods

4 Software overview
- Database
- Search engine: searches by CAS, SMILES, name; substructure search; similarity search (EM9-1a,b, 2,3)
- Data import and export, format conversions (EM9-1,2,3)
- Applicability domain (EM9-1a)
- Similarity assessment (EM9-1b)

5 AMBIT Database Today Not restricted to these datasets! Any dataset can be imported! (e.g. DSSTox, AQUIRE, LLNA dataset …)

6 AMBIT: more about the internals
- Open source, relying on open standards
- Modular approach
- Standalone and web versions
- Implemented in Java, i.e.:
  - Platform independent (the same application runs on Windows, Unix, Mac …)
  - Suitable for web applications
- The cheminformatics functionality relies on the open source Java library The Chemistry Development Kit (CDK), http://cdk.sourceforge.net/
- The software is based on a relational database management system
  - Allows much faster and more convenient access to the data than flat text files
  - Our choice is the MySQL database (www.mysql.com), which is the most popular open source relational database
- Chemical Markup Language (CML)
  - Acknowledged method of encoding chemical data in XML
  - Being adopted by a large number of chemical organisations, from government through commercial to academia
  - The choice of CML for the internal format makes the database independent of the software that accesses it, in contrast to some proprietary solutions

7 AMBIT: information stored
- Structures, internally stored in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
- Multiple 3D structures per compound
- Identifiers (SMILES, InChI, CAS or other registry numbers; unlimited number of arbitrary identifiers and synonyms)
- Inventory indicator
- Descriptors (unlimited number of arbitrary descriptors)
- Experimental data (flexible templates for experimental data)
- QSAR models
- Literature references
- Fingerprints and atom environments for fast substructure and similarity search
- Other information generated in order to accelerate specific queries
- The complete documentation of the AMBIT Database is available at http://ambit.acad.bg/docs
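
The slide above mentions fingerprints stored for fast substructure and similarity search. As a rough illustration of why precomputed fingerprints speed up both queries, here is a minimal Java sketch using only java.util.BitSet. It is not AMBIT's code: the class and method names are hypothetical, the bit positions are arbitrary, and a real system would derive the bits from the structure with a cheminformatics library such as CDK.

```java
import java.util.BitSet;

/** Illustrative fingerprint screening; names and bit positions are hypothetical. */
class FingerprintSketch {

    /** Builds a fingerprint from the indices of the bits that are set. */
    static BitSet fingerprint(int... setBits) {
        BitSet fp = new BitSet();
        for (int bit : setBits) {
            fp.set(bit);
        }
        return fp;
    }

    /**
     * Substructure prescreen: a target can contain the query only if every
     * bit set in the query is also set in the target. Passing the screen is
     * necessary but not sufficient; an atom-by-atom match must still follow.
     */
    static boolean passesSubstructureScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits required by the query but absent in the target
        return missing.isEmpty();
    }

    /** Tanimoto coefficient |A and B| / |A or B|; defined here as 1.0 for two empty fingerprints. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionCount = union.cardinality();
        return unionCount == 0 ? 1.0 : (double) common.cardinality() / unionCount;
    }

    public static void main(String[] args) {
        BitSet query = fingerprint(1, 4);
        BitSet target = fingerprint(1, 4, 7, 9);
        System.out.println(passesSubstructureScreen(query, target));
        System.out.println(tanimoto(query, target));
    }
}
```

The screen discards most non-matching compounds with a few bitwise operations, so the expensive graph matching runs only on the survivors. The same stored bits serve similarity ranking through the Tanimoto coefficient.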

8 AMBIT Database schema
Schema diagram with the following parts: Descriptors Repository, Compounds Repository, QSAR models Repository, Experimental Results Repository, Users Repository, Literature References Repository, Queries

9 AMBIT selected functionalities
- Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
- Search
  - Simple search (CAS, SMILES, chemical name)
  - Descriptor search
  - Experimental data search
  - Substructure and similarity search
- Grouping
  - Verhaar classification scheme
  - Similarity (see J. Jaworska presentation tomorrow)
- QSAR
  - Applicability domain assessment

10 AMBIT Online - Similarity search

11 AMBIT Online - Query result

12 Links to other databases - KEGG

13 Information about QSAR models

14 AMBIT Database Tools Standalone application

15 AMBIT User Interface Example: Search by descriptor ranges
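
A search by descriptor ranges can be pictured as keeping only the compounds whose descriptor values all fall inside user-given intervals. The Java sketch below does this over an in-memory map. It is an assumption-laden illustration, not how AMBIT Database Tools queries its MySQL database: the class, the compound ids and the descriptor values are all made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative descriptor range search; all names and values are hypothetical. */
class DescriptorRangeSearch {

    /** Closed interval [min, max] for one descriptor. */
    static final class Range {
        final double min;
        final double max;

        Range(double min, double max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(double value) {
            return value >= min && value <= max;
        }
    }

    /**
     * Returns the ids of compounds whose values lie inside every requested
     * range. A compound that lacks a requested descriptor is not a hit.
     */
    static List<String> search(Map<String, Map<String, Double>> compounds,
                               Map<String, Range> query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> compound : compounds.entrySet()) {
            boolean match = true;
            for (Map.Entry<String, Range> condition : query.entrySet()) {
                Double value = compound.getValue().get(condition.getKey());
                if (value == null || !condition.getValue().contains(value)) {
                    match = false;
                    break;
                }
            }
            if (match) {
                hits.add(compound.getKey());
            }
        }
        return hits;
    }

    /** Helper that builds one compound's descriptor map. */
    static Map<String, Double> descriptors(double logP, double mw) {
        Map<String, Double> d = new LinkedHashMap<>();
        d.put("logP", logP);
        d.put("MW", mw);
        return d;
    }

    /** Made-up descriptor values for three placeholder compounds. */
    static Map<String, Map<String, Double>> sampleCompounds() {
        Map<String, Map<String, Double>> compounds = new LinkedHashMap<>();
        compounds.put("compound-A", descriptors(1.5, 120.0));
        compounds.put("compound-B", descriptors(4.2, 310.0));
        compounds.put("compound-C", descriptors(2.8, 95.0));
        return compounds;
    }

    /** Example query: 1.0 <= logP <= 3.0 and 100 <= MW <= 200. */
    static Map<String, Range> sampleQuery() {
        Map<String, Range> query = new LinkedHashMap<>();
        query.put("logP", new Range(1.0, 3.0));
        query.put("MW", new Range(100.0, 200.0));
        return query;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleCompounds(), sampleQuery()));
    }
}
```

In a database-backed tool the same conditions would more likely be translated into a query with one range condition per descriptor, so that the filtering happens on the server.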

16 AMBIT Discovery: software for applicability domain and grouping
Methods:
- Descriptor space
  - Ranges
  - Euclidean distance
  - City-block distance
  - Probability density
  - Options: threshold, preprocessing (e.g. PCA), center, more…
- Structural similarity
  - Fingerprints
    - Consensus fingerprint + Tanimoto distance
    - Consensus fingerprint + missing fragments
  - Atom environments
    - Consensus atom environments + Hellinger distance
    - kNN + Tanimoto distance
    - Ranking
Results from several methods can be combined.
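
Two of the descriptor-space methods listed above, ranges and distance to a center with a threshold, can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not AMBIT Discovery's implementation: the center is taken to be the column mean of the training descriptors, the threshold is supplied by the caller, no preprocessing such as PCA is applied, and all names are hypothetical.

```java
/** Illustrative descriptor-space applicability domain checks; names are hypothetical. */
class ApplicabilityDomainSketch {

    /** Ranges method: each query descriptor must lie within the training min and max. */
    static boolean inRanges(double[][] training, double[] query) {
        for (int j = 0; j < query.length; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : training) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            if (query[j] < min || query[j] > max) {
                return false;
            }
        }
        return true;
    }

    /** Column means of the training descriptors, used here as the center (an assumption). */
    static double[] center(double[][] training) {
        double[] c = new double[training[0].length];
        for (double[] row : training) {
            for (int j = 0; j < c.length; j++) {
                c[j] += row[j] / training.length;
            }
        }
        return c;
    }

    /** Euclidean distance between two descriptor vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** City-block (Manhattan) distance between two descriptor vectors of equal length. */
    static double cityBlock(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += Math.abs(a[j] - b[j]);
        }
        return sum;
    }

    /** Distance method: in domain when the Euclidean distance to the center is at most the threshold. */
    static boolean inDistanceDomain(double[][] training, double[] query, double threshold) {
        return euclidean(query, center(training)) <= threshold;
    }

    public static void main(String[] args) {
        double[][] training = {{0, 0}, {2, 0}, {0, 2}, {2, 2}};
        double[] query = {3, 1};
        System.out.println(inRanges(training, query));
        System.out.println(inDistanceDomain(training, query, 1.5));
    }
}
```

The two checks can disagree: the ranges method accepts any point inside the bounding box of the training set, including sparsely populated corners, whereas the distance method draws a sphere around the center. That difference is one reason for combining results from several methods.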

17 AMBIT Discovery Data visualisation

18 AMBIT Discovery Results (exported to an MS Excel file)

19 Similarity based on mechanistic understanding
Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992
- Verhaar scheme
- 34 rules
- 5 classes
  - Class 1: Narcosis or baseline toxicity
  - Class 2: Less inert compounds
  - Class 3: Unspecific reactivity
  - Class 4: Compounds and groups of compounds acting by a specific mechanism
  - Class 5: Not possible to classify according to these rules
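
The five classes above can be represented as a small enumeration with a catch-all. The Java sketch below assumes that the class tests are tried in order from Class 1 to Class 4 and that Class 5 is the fallback when none applies, which is what the wording of Class 5 suggests. The 34 rules themselves are not reproduced: the predicates are toy placeholders, and every name is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative skeleton of a five-class scheme; the predicates are placeholders, not the real rules. */
class VerhaarSketch {

    enum VerhaarClass {
        CLASS_1("Narcosis or baseline toxicity"),
        CLASS_2("Less inert compounds"),
        CLASS_3("Unspecific reactivity"),
        CLASS_4("Compounds and groups of compounds acting by a specific mechanism"),
        CLASS_5("Not possible to classify according to these rules");

        final String description;

        VerhaarClass(String description) {
            this.description = description;
        }
    }

    /**
     * Returns the first class whose test accepts the compound, or CLASS_5 if
     * none does. An EnumMap iterates in enum order, so CLASS_1 is tried first.
     */
    static <T> VerhaarClass classify(T compound, Map<VerhaarClass, Predicate<T>> classTests) {
        for (Map.Entry<VerhaarClass, Predicate<T>> entry : classTests.entrySet()) {
            if (entry.getValue().test(compound)) {
                return entry.getKey();
            }
        }
        return VerhaarClass.CLASS_5;
    }

    /** Toy tests keyed on a label string; stand-ins for the structural rules. */
    static Map<VerhaarClass, Predicate<String>> placeholderTests() {
        Map<VerhaarClass, Predicate<String>> tests = new EnumMap<>(VerhaarClass.class);
        tests.put(VerhaarClass.CLASS_1, label -> label.equals("toy-1"));
        tests.put(VerhaarClass.CLASS_2, label -> label.equals("toy-2"));
        tests.put(VerhaarClass.CLASS_3, label -> label.equals("toy-3"));
        tests.put(VerhaarClass.CLASS_4, label -> label.equals("toy-4"));
        return tests;
    }

    public static void main(String[] args) {
        VerhaarClass result = classify("toy-3", placeholderTests());
        System.out.println(result + ": " + result.description);
    }
}
```

Keeping the class tests behind a generic predicate interface is one way to obtain the kind of modularity that lets the same scheme run inside different host applications.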

20 Verhaar scheme implementation
- Modular approach
- Can be used:
  - Within AMBIT Database Tools
  - As an extension to ToxTree http://ecb.jrc.it/qsar/ toxtree

21 Summary
- Many tools have been developed, and we are working on their seamless integration
- Both the standalone and the web application are in beta stage and are being extensively tested
- Synergies with other projects
  - LRI Cefic gold standard BCF database will be stored in AMBIT
  - LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  - ECB Cramer rules software for TTC (human health): ToxTree
  - Fraunhofer Institute subchronic toxicity database (human health)
  - Approaches to similarity assessment will be further extended and tested in the context of category development / read-across (ECB funded project)
- Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results

22 Acknowledgment
This work is funded by CEFIC LRI EEM-9, "Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions".

23 The Chemistry Development Kit http://cdk.sourceforge.net
- CDK is a freely available open source Java library for structural chemo- and bioinformatics.
- Originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University's Bioinformatics Center.
- Maintained and enhanced by more than 20 developers from both academic and industrial institutions all over the world.
- Used in more than 10 different academic and industrial projects worldwide.
- Provides methods for many common tasks in molecular informatics:
  - SMILES parsing and generation
  - Substructure searching
  - 2D and 3D rendering of chemical structures
  - I/O routines (format conversions)
  - 3D builder
  - QSAR module, etc.

24 Thank you! Questions?

