AMBIT Software for Data Management and (Q)SAR Applications Nina Jeliazkova Bulgarian Academy of Sciences Institute for Parallel Processing Sofia Bulgaria.

Presentation transcript:

AMBIT Software for Data Management and (Q)SAR Applications

Nina Jeliazkova, Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria
Joanna Jaworska, Central Product Safety, Procter and Gamble, Belgium

QSAR May, Lyon
AMBIT is available online at

Introduction – why AMBIT?

• The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD)
• Realization that efficient use of existing information on chemicals requires better ways for:
  - Storage: standardized formats, computer-automated verification of structures, capability to store large amounts of data
  - Taking advantage of the rapidly evolving field of data mining and extraction of relevant information

Content

• Overview of AMBIT functional modules
• Technology choice and software capabilities
• Demonstration of the current state
  - Web application
    - Online similarity search
  - Standalone applications
    - Ambit Database Tools: descriptor search, experimental data search, similarity search, Verhaar classification scheme
    - AmbitDiscovery: applicability domain, grouping by different methods

Software overview

• Database
• Search engine
  - Searches by CAS, SMILES, name
  - Substructure search
  - Similarity search
  (EM9-1a,b, 2,3)
• Data import and export, format conversions (EM9-1,2,3)
• Applicability domain (EM9-1a)
• Similarity assessment (EM9-1b)

AMBIT Database Today

Not restricted to these datasets! Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA dataset …).

AMBIT: more about the internals…

• Open source, relying on open standards
• Modular approach
• Standalone and web versions
• Implemented in Java, i.e.
  - Platform independent (the same application runs on Windows, Unix, Mac …)
  - Suitable for web applications
• The cheminformatics functionality relies on the open source Java library, the Chemistry Development Kit
• The software is based on a relational database management system
  - Allows much faster and more convenient access to the data than flat text files
  - Our choice is the MySQL database, which is the most popular open source relational database
• Chemical Markup Language (CML)
  - Acknowledged method of encoding chemical data in XML
  - Being adopted by a large number of chemical organisations, from government through commercial to academia
  - The choice of CML for the internal format makes the database independent of the software that accesses it, in contrast to some proprietary solutions

AMBIT: information stored

• Structures are stored internally in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
• Multiple 3D structures per compound
• Identifiers (SMILES, InChI, CAS or other registry numbers; unlimited number of arbitrary identifiers and synonyms)
• Inventory indicator
• Descriptors (unlimited number of arbitrary descriptors)
• Experimental data (flexible templates for experimental data)
• QSAR models
• Literature references
• Fingerprints and atom environments for fast substructure and similarity search
• Other information generated in order to accelerate specific queries
• The complete documentation of the AMBIT Database is available at
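To make the idea of storing structures as compressed CML concrete, here is a minimal Python sketch. It is an illustration only: the slides do not say which compression scheme AMBIT uses, so zlib is an assumption, the CML fragment is hand-written for the example, and the helper names (pack, unpack, element_types) are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written CML fragment (assumption: two heavy atoms joined by a double bond).
CML = (
    f'<molecule xmlns="{CML_NS}" id="m1">'
    '<atomArray>'
    '<atom id="a1" elementType="C"/>'
    '<atom id="a2" elementType="O"/>'
    '</atomArray>'
    '<bondArray>'
    '<bond atomRefs2="a1 a2" order="2"/>'
    '</bondArray>'
    '</molecule>'
)


def pack(cml: str) -> bytes:
    """Compress a CML string into a blob suitable for a binary database column."""
    return zlib.compress(cml.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Restore the original CML string from a compressed blob."""
    return zlib.decompress(blob).decode("utf-8")


def element_types(cml: str) -> list:
    """Read the element symbols back out of a CML document."""
    root = ET.fromstring(cml)
    return [atom.get("elementType")
            for atom in root.findall(".//cml:atom", {"cml": CML_NS})]


blob = pack(CML)
restored = unpack(blob)
```

Because the stored blob is plain XML once decompressed, any XML-aware tool can read it, which is the software independence the slides attribute to the choice of CML.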

AMBIT Database schema

• Compounds repository
• Descriptors repository
• QSAR models repository
• Experimental results repository
• Literature references repository
• Users repository
• Queries

AMBIT selected functionalities

• Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
• Search
  - Simple search (CAS, SMILES, chemical name)
  - Descriptor search
  - Experimental data search
  - Substructure and similarity search
• Grouping
  - Verhaar classification scheme
  - Similarity (see J. Jaworska's presentation tomorrow)
• QSAR
  - Applicability domain assessment

AMBIT Online – Similarity search
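A fingerprint-based similarity search can be sketched in a few lines of Python. This is not AMBIT's code: the slide does not state which measure the online search uses, so the Tanimoto coefficient is an assumption here, the fingerprints are toy sets of "on" bit positions rather than real structural fingerprints, and the names tanimoto and similarity_search are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(a | b)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs with score >= threshold, best match first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy library: each fingerprint is the set of bit positions that are switched on.
LIBRARY = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3}),
    "C": frozenset({7, 8}),
}

hits = similarity_search(frozenset({1, 2, 3, 4}), LIBRARY)
# hits == [("A", 1.0), ("B", 0.75)]
```

Compound C shares no bits with the query, scores 0.0 and falls below the threshold, so only A and B are returned.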

AMBIT Online – Query result

Links to other databases – KEGG

Information about QSAR models

AMBIT Database Tools: standalone application

AMBIT user interface example: search by descriptor ranges
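A search by descriptor ranges maps naturally onto a relational query. The Python sketch below uses the standard-library sqlite3 module as a stand-in for MySQL; the two-table schema, the function names and the descriptor names are assumptions made for illustration and are not the actual AMBIT schema. The molecular weights and logP values are approximate literature values, included only to make the example runnable.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE compound (id INTEGER PRIMARY KEY, cas TEXT, smiles TEXT);
        CREATE TABLE descriptor_value (
            compound_id INTEGER REFERENCES compound(id),
            name TEXT,
            value REAL
        );
    """)
    con.executemany("INSERT INTO compound VALUES (?, ?, ?)", [
        (1, "71-43-2", "c1ccccc1"),    # benzene
        (2, "64-17-5", "CCO"),         # ethanol
        (3, "108-95-2", "Oc1ccccc1"),  # phenol
    ])
    con.executemany("INSERT INTO descriptor_value VALUES (?, ?, ?)", [
        (1, "MW", 78.11), (1, "logP", 2.13),
        (2, "MW", 46.07), (2, "logP", -0.31),
        (3, "MW", 94.11), (3, "logP", 1.46),
    ])
    return con


def search_by_ranges(con, ranges):
    """Return CAS numbers of compounds whose descriptors fall in ALL ranges.

    ranges maps a descriptor name to an inclusive (low, high) pair.
    """
    if not ranges:
        raise ValueError("at least one descriptor range is required")
    clauses, params = [], []
    for name, (low, high) in ranges.items():
        clauses.append("(d.name = ? AND d.value BETWEEN ? AND ?)")
        params += [name, low, high]
    sql = (
        "SELECT c.cas FROM compound c "
        "JOIN descriptor_value d ON d.compound_id = c.id "
        f"WHERE {' OR '.join(clauses)} "
        "GROUP BY c.id HAVING COUNT(DISTINCT d.name) = ? "
        "ORDER BY c.id"
    )
    return [row[0] for row in con.execute(sql, params + [len(ranges)])]


con = build_demo_db()
matches = search_by_ranges(con, {"logP": (1.0, 3.0), "MW": (70.0, 100.0)})
# matches == ["71-43-2", "108-95-2"]
```

Keeping one row per (compound, descriptor) pair is what allows an unlimited number of arbitrary descriptors: a new descriptor is a new row, not a new column.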

AMBIT Discovery: software for applicability domain and grouping

Methods:
• Descriptor space
  - Ranges
  - Euclidean distance
  - City-block distance
  - Probability density
  - Options:
    - Threshold
    - Preprocessing (e.g. PCA)
    - Center
    - More…
• Structural similarity
  - Fingerprints
    - Consensus fingerprint + Tanimoto distance
    - Consensus fingerprint + missing fragments
  - Atom environments
    - Consensus atom environments + Hellinger distance
    - kNN + Tanimoto distance
• Ranking

Results from several methods can be combined.
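The descriptor-space methods listed above can be illustrated with a small Python sketch. It is a simplified reading of the slide, not AMBIT Discovery's implementation: the center is taken as the training-set mean and the threshold as the largest training-set distance to that center, both of which are assumptions (the slide lists "Center" and "Threshold" as options without fixing them). All function names are hypothetical.

```python
import math


def fit_ranges(train):
    """Per-descriptor (min, max) over the training set."""
    return [(min(col), max(col)) for col in zip(*train)]


def in_ranges(x, ranges):
    """Ranges method: every descriptor must lie inside the training range."""
    return all(low <= v <= high for v, (low, high) in zip(x, ranges))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def fit_distance_domain(train, dist=euclidean):
    """Center = training mean; threshold = largest training distance to it."""
    n = len(train)
    center = [sum(col) / n for col in zip(*train)]
    threshold = max(dist(x, center) for x in train)
    return center, threshold


def in_distance_domain(x, center, threshold, dist=euclidean):
    """Distance method: the query must be no farther out than the threshold."""
    return dist(x, center) <= threshold


# Toy training set with two descriptors per compound.
TRAIN = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
ranges = fit_ranges(TRAIN)
center, threshold = fit_distance_domain(TRAIN)

query = (1.0, 2.3)
by_ranges = in_ranges(query, ranges)                        # False: 2.3 > 2.0
by_distance = in_distance_domain(query, center, threshold)  # True: 1.3 <= sqrt(2)
```

The two methods disagree on this query, which is one reason the slide notes that results from several methods can be combined.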

AMBIT Discovery: data visualisation

AMBIT Discovery: results (exported to an MS Excel file)

Similarity based on mechanistic understanding

Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, 1992

• Verhaar scheme
  - 34 rules
  - 5 classes
    - Class 1: Narcosis or baseline toxicity
    - Class 2: Less inert compounds
    - Class 3: Unspecific reactivity
    - Class 4: Compounds and groups of compounds acting by a specific mechanism
    - Class 5: Not possible to classify according to these rules

Verhaar scheme implementation

• Modular approach
• Can be used:
  - Within AMBIT Database Tools
  - As an extension to ToxTree

Summary

• Many tools were developed and we are working on their seamless integration
• Both the standalone and web applications are in beta stage and are being extensively tested
• Synergies with other projects
  - LRI Cefic gold standard BCF database will be stored in AMBIT
  - LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  - ECB Cramer rules software for TTC (human health): ToxTree
  - Fraunhofer Institute subchronic toxicity database (human health)
  - Approaches to similarity assessment will be further extended and tested in the context of category development / read-across (ECB funded project)
• Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results

Acknowledgment

This work is funded by CEFIC LRI EEM-9, "Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions".

The Chemistry Development Kit

• CDK is a freely available open source Java library for structural chemo- and bioinformatics.
• Originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University's Bioinformatics Center.
• Maintained and enhanced by more than 20 developers from both academic and industrial institutions all over the world.
• Used in more than 10 different academic and industrial projects worldwide.
• Provides methods for many common tasks in molecular informatics:
  - SMILES parsing and generation
  - Substructure searching
  - 2D and 3D rendering of chemical structures
  - I/O routines (format conversions)
  - 3D builder
  - QSAR module, etc.

Thank you! Questions?