AMBIT Chemoinformatics Software for Data Management
Joanna Jaworska (P&G, Brussels, Belgium) and Nina Jeliazkova (Ideaconsult Ltd., Bulgaria)


Introduction – why Ambit?
The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD).
There is a growing realization that efficient use of existing information on chemicals requires better ways of:
− storage: standardized formats, computer-automated verification of structures, and the capacity to store large amounts of data;
− taking advantage of the rapidly evolving field of data mining and extraction of relevant information.

IT strategy
Ambit provides building blocks for a Decision Support System:
− High emphasis on interoperability for “plug and play”
− Flexibility through modular design
− Transparency: open source, relying on open standards. Open source software lowers the barrier for users, facilitates dissemination and enables the reproducibility of models and results.
− The cheminformatics functionality relies on the open source Java library The Chemistry Development Kit (CDK).
− The software is based on the MySQL database, which is the most popular open source relational database.
− Chemical Markup Language (CML) is an acknowledged method of encoding chemical data in XML and is being adopted by a large number of chemical organisations, from government through commercial to academia. The choice of CML as the internal format makes the database independent of the software used to access it, in contrast to some proprietary solutions.
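To illustrate why a CML internal format keeps stored data independent of any one program, here is a minimal sketch that writes a molecule as CML using only Python's standard XML library and reads it back with a generic parser. It covers only the core `molecule`, `atomArray` and `bondArray` elements; the helper names are hypothetical, and this is not AMBIT's own CML handling.

```python
import xml.etree.ElementTree as ET

# Namespace URI of Chemical Markup Language documents
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialize CML elements without a prefix


def q(tag: str) -> str:
    """Qualify a local tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"


def molecule_to_cml(mol_id, atoms, bonds):
    """Serialize a molecule as a minimal CML string.

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples with 0-based atom indices
    """
    mol = ET.Element(q("molecule"), id=mol_id)
    atom_array = ET.SubElement(mol, q("atomArray"))
    for idx, symbol in enumerate(atoms, start=1):
        ET.SubElement(atom_array, q("atom"), id=f"a{idx}", elementType=symbol)
    bond_array = ET.SubElement(mol, q("bondArray"))
    for i, j, order in bonds:
        ET.SubElement(bond_array, q("bond"),
                      atomRefs2=f"a{i + 1} a{j + 1}", order=str(order))
    return ET.tostring(mol, encoding="unicode")


def element_symbols(cml_text):
    """Read element symbols back with a generic XML parser (no chemistry toolkit)."""
    root = ET.fromstring(cml_text)
    return [atom.get("elementType") for atom in root.iter(q("atom"))]


# Ethanol heavy atoms: C-C-O joined by single bonds
ethanol = molecule_to_cml("ethanol", ["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(element_symbols(ethanol))  # ['C', 'C', 'O']
```

Any XML-aware tool can read the result back without chemistry-specific software, which is the point the slide makes about avoiding proprietary formats.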

Ambit – Overview
AMBIT software is a set of libraries and tools providing various cheminformatics functionalities for data management. The AMBIT system consists of a database and functional modules allowing a variety of flexible searches and mining of the data stored in the database. The unique feature of AMBIT is the ability to store multifaceted information about chemical structures and provide a searchable interface linking these diverse components.

Ambit overview
The AMBIT database:
− stores chemical structures, their identifiers such as CAS RN and InChI, attributes such as molecular descriptors, experimental data together with test descriptions, and literature references. The database can also store QSAR models. In addition, the software can generate a suite of 2D and 3D molecular descriptors;
− can be searched by identifiers, attribute value or range, experimental data value or range, user-defined structure and substructure, and structural similarity.
The AMBIT database contains chemical compounds with data imported from over a dozen databases. The number of compounds is growing all the time, and one of the system’s great strengths is that any dataset can be imported for comparison and analysis.
− AMBIT Database Tools 1.10 allows users to create a local database and import their own sets of chemical compounds.
− AMBIT Discovery performs chemical grouping and assesses the applicability domain of a QSAR, offering a variety of methods, including different approaches to similarity assessment: statistical approaches that rely on ‘descriptor space’, approaches based on mechanistic understanding, and approaches based on structural similarity.
− ToxTree is a flexible, user-friendly application that integrates structure-based classification schemes. Currently 3 schemes are available: Verhaar for fish toxicity, Cramer rules for estimating toxic hazard in humans, and BfR rules for skin irritation. ToxTree implements a plug-in mechanism that allows it to be extended with modules developed later, without recompiling the application. ToxTree and AMBIT modules can be integrated into one another.
− Toxmatch is a standalone application for pairwise similarity assessment, intended for read-across.
− A QSAR database is under development; it will store model information in the QSAR Model Reporting Format (QMRF), with a large effort on standardization.
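The ‘descriptor space’ approach to applicability domain can be illustrated with its simplest variant, a range (bounding-box) check: a query compound is inside the domain only if each of its descriptors lies between the minimum and maximum found in the training set. This is a sketch under assumptions, not AMBIT Discovery's implementation: descriptors are taken as already calculated and stored as name-to-value dictionaries, and the descriptor names and values are placeholders.

```python
from typing import Dict, List, Tuple

Descriptors = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def descriptor_ranges(training: List[Descriptors]) -> Ranges:
    """Min and max of each descriptor over the training set (a bounding box)."""
    names = training[0].keys()
    return {n: (min(c[n] for c in training), max(c[n] for c in training)) for n in names}


def out_of_range(query: Descriptors, ranges: Ranges) -> List[str]:
    """Names of descriptors for which the query falls outside the training range."""
    return [n for n, (lo, hi) in ranges.items() if not lo <= query[n] <= hi]


def in_domain(query: Descriptors, ranges: Ranges) -> bool:
    """Inside the range-based domain only if no descriptor is out of range."""
    return not out_of_range(query, ranges)


# Placeholder descriptor values, for illustration only
training_set = [{"logP": 1.0, "MW": 100.0}, {"logP": 3.0, "MW": 250.0}, {"logP": 2.2, "MW": 180.0}]
ranges = descriptor_ranges(training_set)
print(in_domain({"logP": 2.0, "MW": 150.0}, ranges))     # True
print(out_of_range({"logP": 4.5, "MW": 150.0}, ranges))  # ['logP']
```

Reporting which descriptors are out of range, rather than only a yes/no answer, helps explain why a prediction may be unreliable; distance-based or probabilistic domains are common refinements of this box.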

AMBIT Database Today
Not restricted to these datasets! Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA …).

AMBIT Database Schema

Experimental results repository

Ambit database
Two user interfaces to the database:
− Online: a more restricted interface
− Standalone: full interface; can be used for storing and managing confidential data
− Common to both: can link with other databases and pull information via web services

AMBIT database functionalities
− Storage: information about chemicals (name and structure), descriptors, experimental data and QSAR models. Examples with tailored templates: BCF golden database (LRI project, EURAS); QSAR database with QMRF (ECB funded)
− Conversion: between different computer formats of structures; CAS to structure
− Calculation: a variety of descriptors; the available list is growing thanks to contributions to CDK
− Search: identification search (CAS, SMILES, chemical name), descriptor search, experimental data search, substructure and similarity search, and complex searches with multiple criteria (standalone)
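A complex search with multiple criteria can be sketched as a list of predicates combined with logical AND, one per descriptor or experimental-data condition. The `Record` layout, the criterion helpers and the endpoint name are hypothetical, and the values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """One compound entry with calculated descriptors and experimental results."""
    compound_id: str
    descriptors: Dict[str, float] = field(default_factory=dict)
    experimental: Dict[str, float] = field(default_factory=dict)


Criterion = Callable[[Record], bool]


def descriptor_between(name: str, lo: float, hi: float) -> Criterion:
    """Descriptor criterion: value within [lo, hi]; records lacking it do not match."""
    return lambda r: name in r.descriptors and lo <= r.descriptors[name] <= hi


def experimental_below(endpoint: str, limit: float) -> Criterion:
    """Experimental-data criterion: measured value strictly below limit."""
    return lambda r: endpoint in r.experimental and r.experimental[endpoint] < limit


def search(records: List[Record], criteria: List[Criterion]) -> List[str]:
    """Ids of records that satisfy every criterion (logical AND)."""
    return [r.compound_id for r in records if all(c(r) for c in criteria)]


# Invented values, for illustration only
db = [
    Record("A", {"logP": 2.0}, {"LC50": 5.0}),
    Record("B", {"logP": 4.0}, {"LC50": 1.0}),
    Record("C", {"logP": 2.5}),  # no experimental data stored
]
print(search(db, [descriptor_between("logP", 1.0, 3.0), experimental_below("LC50", 10.0)]))  # ['A']
```

In a database-backed system these conditions would normally be translated into a single query rather than filtered in memory; the predicate form just makes the AND-combination explicit.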

What kinds of searches are desired?
− Detailed analyses for pairwise similarity
− Similarity of a compound to compounds in the database
− Similarity of a compound to a reference set
− Similarity of a set of compounds to compounds in the database
− Grouping based on chemical class
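Similarity of a compound to compounds in the database can be sketched as a ranked Tanimoto search over binary fingerprints. Assumptions: fingerprints are already calculated and represented as sets of on-bit positions, the 0.7 default cut-off is arbitrary, and the Tanimoto coefficient is one common choice rather than necessarily the measure AMBIT uses in every mode. The fingerprints in the example are toy values.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B|; two empty fingerprints count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_search(query: Fingerprint, database: Dict[str, Fingerprint],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Compounds with similarity >= threshold, most similar first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    hits = [(cid, s) for cid, s in scored if s >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy fingerprints, for illustration only
db = {"X": frozenset({1, 2, 3, 4}), "Y": frozenset({1, 2, 3, 5}), "Z": frozenset({7, 8})}
print(similarity_search(frozenset({1, 2, 3, 4}), db, threshold=0.5))  # [('X', 1.0), ('Y', 0.6)]
```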

AMBIT Online: Searching for basic information

AMBIT Online: Similarity search

AMBIT Online: Query result

Links to other databases: (example: KEGG)

Link to AQUIRE

Information about QSAR models

Ambit Database Tools 1.20 Standalone application available at

Ambit converter (Batch search)
− Ambit converter can open CML, CSV, HIN, ICHI, INCHI, MDL MOL, MDL SDF, MOL2, PDB, SMI, TXT and XYZ file types
− Ambit converter can save SDF, MOL, CSV, TXT and SMI file types
− CAS-SMILES conversion based on a database lookup
− Descriptor calculation
− Cramer rules, Verhaar scheme
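CAS-SMILES conversion by database lookup benefits from validating the CAS RN first, because the last digit of a CAS Registry Number is a check digit: the other digits, weighted 1, 2, 3, ... from the right, sum to a number whose last digit must equal it. The sketch below checks this and then looks the number up; the in-memory dictionary stands in for the AMBIT database, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# CAS RN: 2-7 digits, 2 digits, 1 check digit, separated by hyphens
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Check the CAS RN format and its check digit."""
    m = CAS_PATTERN.fullmatch(cas.strip())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight the digits 1, 2, 3, ... starting from the right; the sum mod 10 is the check digit
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def cas_to_smiles(cas_numbers: Iterable[str], lookup: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each CAS RN to SMILES; None when the number is invalid or not in the lookup table."""
    return {c.strip(): (lookup.get(c.strip()) if is_valid_cas(c) else None) for c in cas_numbers}


# A tiny in-memory table (water, ethanol, benzene) standing in for the database
table = {"7732-18-5": "O", "64-17-5": "CCO", "71-43-2": "c1ccccc1"}
print(cas_to_smiles(["64-17-5", "50-00-0", "7732-18-4"], table))
# {'64-17-5': 'CCO', '50-00-0': None, '7732-18-4': None}
```

Here 50-00-0 is a valid number that is simply missing from the table, while 7732-18-4 fails the check digit; a batch converter could report these two cases differently.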

Ambit Database Tools 1.20
Import to database:
− Compounds: several file formats
− Descriptors: SDF, CSV, TXT
− Experimental data: SDF, CSV, TXT
− QSAR models: SDF, CSV, TXT
Database processing:
− Calculate SMILES/fingerprints/atom environments: necessary in order to perform substructure and similarity searches; should be invoked after importing compounds into the database
− Descriptor calculation
− Distance calculation: used to speed up queries on distances between heavy atoms
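Importing experimental data essentially maps file columns onto database tables. The sketch assumes a CSV file with `compound_id`, `endpoint`, `value` and `unit` columns and uses Python's built-in SQLite as a stand-in for MySQL; the one-table schema is hypothetical and far simpler than the AMBIT schema, which also keeps test descriptions and literature references. The values are placeholders.

```python
import csv
import io
import sqlite3

# Hypothetical single-table layout, for illustration only
SCHEMA = """CREATE TABLE IF NOT EXISTS experiment (
    compound_id TEXT NOT NULL,
    endpoint    TEXT NOT NULL,
    value       REAL NOT NULL,
    unit        TEXT
)"""


def import_experimental_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load rows with columns compound_id, endpoint, value, unit; return the number imported."""
    conn.execute(SCHEMA)
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["compound_id"].strip(), r["endpoint"].strip(), float(r["value"]), r["unit"].strip())
            for r in reader]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO experiment (compound_id, endpoint, value, unit) VALUES (?, ?, ?, ?)", rows)
    return len(rows)


# Placeholder values, for illustration only
data = """compound_id,endpoint,value,unit
cmpd-1,LC50,10.0,mg/L
cmpd-2,LC50,5.0,mg/L
"""
conn = sqlite3.connect(":memory:")
print(import_experimental_csv(conn, data))  # 2
print(conn.execute("SELECT compound_id FROM experiment WHERE value < 8").fetchall())  # [('cmpd-2',)]
```

Parsing the whole file before inserting means a malformed value aborts the import before anything is written, rather than leaving a half-imported dataset.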

Ambit Database Tools 1.20
− perform a CAS RN search in the database (submenu "Search -> CAS RN search");
− perform a SMILES search in the database (submenu "Search -> SMILES");
− perform a molecular formula search in the database (submenu "Search -> Molecular formula");
− define structure, descriptor, distance-based and experimental data criteria and perform searches in the database.
Output: on screen or to file.
The user can select between the different datasets in the AMBIT database; subsequent searches are performed only within the selected dataset.
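A molecular formula search should not depend on how the formula is written, so one approach is to compare element counts rather than strings. The sketch also mirrors the rule that searches run only within the selected dataset. Formulas with parentheses, isotopes or charges are out of scope, and the tuple layout, dataset names and function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O' (no parentheses or charges)."""
    counts = Counter()
    for symbol, num in ELEMENT_COUNT.findall(formula.replace(" ", "")):
        counts[symbol] += int(num) if num else 1
    return counts


def formula_search(query: str, compounds: Iterable[Tuple[str, str, str]],
                   dataset: Optional[str] = None) -> List[str]:
    """Ids whose formula has the same element counts as the query.

    compounds: (compound_id, dataset_name, formula) tuples.
    dataset: if given, only compounds of that dataset are searched.
    """
    target = parse_formula(query)
    return [cid for cid, ds, formula in compounds
            if (dataset is None or ds == dataset) and parse_formula(formula) == target]


compounds = [("1", "setA", "C2H6O"), ("2", "setA", "C6H6"),
             ("3", "setB", "CH3CH2OH"), ("4", "setB", "C2H6O")]
print(formula_search("C2H6O", compounds))                  # ['1', '3', '4']
print(formula_search("C2H6O", compounds, dataset="setB"))  # ['3', '4']
```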

AMBIT User Interface Example: Search by structure
− Exact search
− Substructure search
− Similarity search: fingerprints, atom environments
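Precalculated fingerprints pay off in substructure search: if a target lacks any bit that the query sets, it cannot contain the query, so most of the database can be discarded before the expensive atom-by-atom comparison. The sketch assumes a path- or fragment-based fingerprint, for which every feature of a substructure is also a feature of the whole molecule; the `verify` callback is a hypothetical placeholder for an exact subgraph match (for example, one provided by CDK).

```python
from typing import Callable, Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def screen(query_fp: Fingerprint, database: Dict[str, Fingerprint]) -> List[str]:
    """Keep only targets whose fingerprint contains every bit of the query fingerprint."""
    return [cid for cid, fp in database.items() if query_fp <= fp]


def substructure_search(query_fp: Fingerprint, database: Dict[str, Fingerprint],
                        verify: Callable[[str], bool]) -> List[str]:
    """Cheap fingerprint screen first, then the expensive exact check on survivors only."""
    return [cid for cid in screen(query_fp, database) if verify(cid)]


# Toy fingerprints; the lambda stands in for a real atom-by-atom (subgraph) match
db = {"A": frozenset({1, 2, 3}), "B": frozenset({1, 3}), "C": frozenset({1, 2, 3, 9})}
print(screen(frozenset({1, 2}), db))                                       # ['A', 'C']
print(substructure_search(frozenset({1, 2}), db, lambda cid: cid == "A"))  # ['A']
```

The screen can produce false positives (C passes but fails the exact check) but, under the stated fingerprint assumption, no false negatives, which is why it is safe as a prefilter.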

AMBIT User Interface Example: Search by descriptors

AMBIT User Interface Example: Search by experimental data

Similarity based on toxicity mechanism: Verhaar scheme
Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4.
The rules assign compounds to 5 classes:
− Class 1: Narcosis or baseline toxicity
− Class 2: Less inert compounds
− Class 3: Unspecific reactivity
− Class 4: Compounds and groups of compounds acting by a specific mechanism
− Class 5: Not possible to classify according to these rules
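Mechanism-based similarity then reduces to grouping: compounds assigned the same Verhaar class are candidates for the same group. The sketch assumes the class assignments already exist (for example, as output of a Verhaar classification in ToxTree) and does not implement the rules themselves; the enum member names are hypothetical, and keeping class 5 apart is a design choice, since for those compounds the rules could not assign a mechanism.

```python
from collections import defaultdict
from enum import IntEnum
from typing import Dict, List, Tuple


class VerhaarClass(IntEnum):
    NARCOSIS = 1               # narcosis or baseline toxicity
    LESS_INERT = 2             # less inert compounds
    UNSPECIFIC_REACTIVITY = 3  # unspecific reactivity
    SPECIFIC_MECHANISM = 4     # acting by a specific mechanism
    NOT_CLASSIFIED = 5         # not possible to classify according to the rules


def group_by_mechanism(
        assignments: Dict[str, VerhaarClass]) -> Tuple[Dict[VerhaarClass, List[str]], List[str]]:
    """Group compound ids by Verhaar class; class 5 compounds are returned separately."""
    groups = defaultdict(list)
    unclassified = []
    for cid, cls in assignments.items():
        if cls is VerhaarClass.NOT_CLASSIFIED:
            unclassified.append(cid)
        else:
            groups[cls].append(cid)
    return dict(groups), unclassified


# Hypothetical, precomputed class assignments
assignments = {"a": VerhaarClass.NARCOSIS, "b": VerhaarClass.NARCOSIS,
               "c": VerhaarClass.SPECIFIC_MECHANISM, "d": VerhaarClass.NOT_CLASSIFIED}
groups, unclassified = group_by_mechanism(assignments)
print(groups[VerhaarClass.NARCOSIS], unclassified)  # ['a', 'b'] ['d']
```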

Chemical similarity assessment using the database
− Exact and substructure search based on 2D structure
− Structural similarity search (various methods)
− Criteria on descriptors
− Based on mechanistic understanding (Verhaar scheme)

Another view on similarity assessments with Toxmatch and Discovery
− Discovery: similarity to a set (summary representation)
− Toxmatch: pairwise similarities; similarity to a set (nearest neighbours)
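The two views of similarity to a set can be contrasted in a short sketch: the nearest-neighbour view compares the query with its most similar members, while a summary representation first condenses the set into one profile. The consensus fingerprint used here as the summary, the majority threshold and the default k=3 are assumptions for illustration; the representations actually used in Discovery and Toxmatch may differ. Fingerprints are toy sets of on-bit positions, and the reference set must be non-empty.

```python
from collections import Counter
from typing import FrozenSet, List

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient; two empty fingerprints count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_neighbour_similarity(query: Fingerprint, ref: List[Fingerprint], k: int = 3) -> float:
    """Mean Tanimoto similarity to the k most similar members of a non-empty set."""
    scores = sorted((tanimoto(query, fp) for fp in ref), reverse=True)[:k]
    return sum(scores) / len(scores)


def consensus_fingerprint(ref: List[Fingerprint], min_fraction: float = 0.5) -> Fingerprint:
    """Summary representation: bits set in at least min_fraction of the members."""
    counts = Counter(bit for fp in ref for bit in fp)
    return frozenset(bit for bit, n in counts.items() if n / len(ref) >= min_fraction)


def summary_similarity(query: Fingerprint, ref: List[Fingerprint]) -> float:
    """Similarity of the query to the set's single summary fingerprint."""
    return tanimoto(query, consensus_fingerprint(ref))


ref_set = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 5})]
query = frozenset({1, 2})
print(round(nearest_neighbour_similarity(query, ref_set, k=2), 3))  # 0.667
print(summary_similarity(query, ref_set))                            # 1.0
```

Here the consensus profile happens to equal the query, so the summary view reports perfect similarity while no single member is closer than about 0.67; differences like this are why looking at both views can be informative.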

Thank you! Questions?