Using ChemAxon Toolkits in the Lead Discovery Database at GNF (Genomics Institute of the Novartis Research Foundation)


Using ChemAxon Toolkits in the Lead Discovery Database at GNF
Hayk Asatryan, Dimitri Petrov, S. Frank Yan, Andrey Santrosyan, Kaisheng Chen, Shumei Jiang, Jeff Janes, Yingyao Zhou
Genomics Institute of the Novartis Research Foundation
May 2005, Budapest, Hungary

Scope of the Lead Discovery Database (LDDB)
Compound Management Center; HTS Center; Program Management; Quality Control; Hit Picking; Hit to Lead Tracking; MedChem; QSAR; Analytical Chemistry; ADME/Tox; PK/PD; Data Processing; Optimized Leads

Architecture of LDDB
Data Mining; Oracle + DayCart; Web Browser; Marvin Applet; Tomcat; Apache; CGI, Servlet
Compound Registration; Normalization (Novartis); Mol2smi (Daylight)
Desktop Edit/Visualization Tools: ChemDraw, ISIS Draw, Accord for Excel
JChem API; Daylight Toolkits

Problems of the Heterogeneous Setup
MolSmart solution for [H]N([H])c1ccccc1: instead of [H]N([H])c1ccccc1, [N;!H0;!H1]c1ccccc1
Mol2Smi: aromatization & chirality. Nitrogen E/Z isomerism: C\C(=N/O)c1ccccc1.C\C(=N\O)c1ccccc1
ChemDraw & Marvin
MolSmart: chirality
Uncompleted asymmetric center (fixed in the latest Marvin), draw
Input: C\C=C/C
Output: [#6]C=C[#6]

Problems of the Heterogeneous Setup (cont.)
Daylight & ChemAxon (discuss later)
Accord: chirality, display
Pricing considerations
[Figure: ChemDraw, Accord for Excel, Marvin]

JChem Cartridge: initial testing (July 2004)
Daylight's substructure search: 5-6 seconds
JCart substructure search: sec (caching the whole structure table in Oracle)
Similarity search is approximately minutes (1.76 million)
JChem results: 10.6 minutes for 3 million structures (3 GHz Pentium 4)
[Table: Smiles, Count, Time (ms)]
NC(=O)C(=NOC)c1csc(N)n
OC(=O)c1cc(O)c(O)c(O)c
OC1CC(C)(N)C(O)C(C)O
c1ccc(OS(=O)(=O)O)cc
C2OC(n1ccc(=O)[nH]c1=O)C(O)C2O
Cc1ccc(N(CCCl)CCCl)cc
OC1OC(C)C(O)C(OC)C1OC
C1OC(CO)C(O)C(OC(N)=O)C1O
NC(=O)C(N)Cc1cnc[nH]
OC(=O)c1cc(OC)c(OC)c(OC)c
OC1OC(CO)C(O)C(OC(N)=O)C1O
c1c(C)[nH]c(=O)[nH]c1=O
C(=O)NC(C=O)CCCNC(=N)N

Brainstorming
Initial attempts: reduce SOAP's overhead; tune the fingerprint parameters when creating the structure table.
Observation: the SELECT statement used in screening takes 16-17 s:
select cd_id, cd_smiles from SCOTT.NCI_3M where BITAND(cd_fp1, ) = AND BITAND(cd_fp2, ) = AND …
During screening, CPU usage is only 30-40%; the time goes mainly to I/O activity.
Second attempts: failed to PIN the fingerprint column alone into memory in Oracle.
Solution: preliminary studies show that the substructure search drops below 1 sec. The cache will consume only around 100 MB per million structures, which is more scalable.
Challenge: structure-synchronization issues.
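The BITAND test in the screening query can be illustrated outside the database. The following is a minimal sketch, assuming each structure's fingerprint is held in memory as a tuple of integers (one per cd_fp column); the names `passes_screen`, `screen`, and `cache`, and the toy bit patterns, are hypothetical and are not part of JChem or of the LDDB code.

```python
from typing import Dict, List, Tuple

# One integer per fingerprint column (cd_fp1, cd_fp2, ...); toy widths only.
Fingerprint = Tuple[int, ...]


def passes_screen(target: Fingerprint, query: Fingerprint) -> bool:
    """Return True when every bit set in the query is also set in the target.

    This mirrors the SQL condition BITAND(cd_fpN, q) = q for each column.
    """
    if len(target) != len(query):
        raise ValueError("fingerprints must have the same number of words")
    return all((t & q) == q for t, q in zip(target, query))


def screen(cache: Dict[int, Fingerprint], query: Fingerprint) -> List[int]:
    """Return the ids of cached structures that survive the bit screen."""
    return sorted(cd_id for cd_id, fp in cache.items() if passes_screen(fp, query))


# Hypothetical in-memory cache: cd_id -> fingerprint words.
cache = {
    1: (0b1011, 0b0110),
    2: (0b0011, 0b0110),
    3: (0b1111, 0b0000),
}
query = (0b1001, 0b0010)

candidates = screen(cache, query)
print(candidates)
```

A screen of this kind only narrows the candidate list; each surviving structure still needs a full substructure comparison, and the cache has to be kept in step with the structure table, which is the synchronization challenge noted on the slide.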

Performance Testing: Substructure Search
[Plots: JChem Cartridge (sec) vs. DayCart (sec)]

Performance Testing: Substructure Search (cont.)
[Plot annotations: 0.2 sec screening; ms/hit]

New Cartridge Features: SQL Filtering
Using filtering can dramatically improve performance.
SQL cost: 240 sec
select count(*) from cpd where jc_compare(jc_smiles,'c1ccccc1','sep=! t:s')=1
SQL cost: 0.25 sec
select * from cpd where jc_compare(jc_smiles,'c1ccccc1','sep=! t:s! filterQuery:select c.rowid from cpd_instance i, cpd c where i.plate_sid= and i.cpd_sid=c.cpd_sid')=1
Challenge: when is SQL filtering appropriate?
SQL cost: 25 sec
select * from cpd where jc_compare(jc_smiles,'CCCCCCOc1cccc(C=NOCC(O)COc2cccc(c2)C(C)C)c1','sep=! t:s!filterQuery:select c.rowid from cpd c where cpd_sid>0')=1
SQL cost: 0.25 sec
select count(*) from cpd where jc_contains(jc_smiles,'CCCCCCOc1cccc(C=NOCC(O)COc2cccc(c2)C(C)C)c1')=1
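The effect of a row filter can be shown with a toy model. This sketch counts how many rows reach an expensive structure comparison with and without a pre-computed set of row ids. The function `search`, the stand-in `matches` predicate, and the row numbers are illustrative assumptions; this does not reproduce how the cartridge evaluates filterQuery.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def search(
    rowids: Iterable[int],
    matches: Callable[[int], bool],
    filter_rowids: Optional[Set[int]] = None,
) -> Tuple[List[int], int]:
    """Run the expensive `matches` test, optionally only on pre-filtered rows.

    Returns the matching row ids and the number of expensive checks performed.
    """
    hits: List[int] = []
    checks = 0
    for rowid in rowids:
        if filter_rowids is not None and rowid not in filter_rowids:
            continue  # cheap rejection, no structure comparison needed
        checks += 1
        if matches(rowid):
            hits.append(rowid)
    return hits, checks


# Toy table of 1000 rows; the stand-in predicate matches half of them,
# like an unselective query such as benzene.
all_rows = range(1000)
is_match = lambda rowid: rowid % 2 == 0

# Hypothetical result of a relational filter (e.g. the compounds on one plate).
plate_rows = {10, 11, 12, 13}

unfiltered_hits, unfiltered_checks = search(all_rows, is_match)
filtered_hits, filtered_checks = search(all_rows, is_match, plate_rows)

print(len(unfiltered_hits), unfiltered_checks)
print(filtered_hits, filtered_checks)
```

In this model a filter that admits nearly every row (such as cpd_sid>0) saves no comparisons at all, which is consistent with the slide's last two timings, where the unfiltered jc_contains query was the faster one.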

DayCart to JCart Migration Challenges
Identical structures or not? Two structures that Daylight considers identical should ideally remain identical in JChem, and vice versa.
Example 1: Aromatic Sulfur
COC1=NC(=NS(=N1)C2CCCCC2)Cl
COc1nc(Cl)ns(n1)C2CCCCC2
Solution: JChem supports Daylight rules

Challenges: Identical Structures? (cont.)
Example 2: Isotope Bug
Example 3: Standardization
CC1=CC=N(C)C=C1
Cc1cc[n+](C)cc1

Challenges: Identical Structures? (cont.)
Example 4: Non-standard Bond
Brc1ccc2[N]c3nc4ccccc4nc3-c2c1
Brc1ccc2Nc3nc4ccccc4nc3-c2c1
Example 4: Chirality
Incomplete Structures in Database
*c1ccc(COCC2CCC=CO2)cc1 or NULL (not supported by JCart)

Migration Challenges: LogP
For 50% of the compounds, the two LogP values agree within 30%.
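An agreement figure of this kind can be computed as follows. The slide does not define "within 30%", so this sketch assumes a relative difference measured against the larger magnitude of the two values; the function names and the sample pairs are hypothetical and are not LDDB data.

```python
from typing import List, Tuple


def agree_within(a: float, b: float, tol: float = 0.30) -> bool:
    """True when |a - b| is at most `tol` times the larger magnitude.

    Assumed definition of "agree within 30%"; the slide does not specify one.
    """
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        return True  # both values are exactly zero
    return abs(a - b) <= tol * scale


def fraction_agreeing(pairs: List[Tuple[float, float]], tol: float = 0.30) -> float:
    """Fraction of (value_a, value_b) pairs that agree within the tolerance."""
    if not pairs:
        raise ValueError("need at least one pair of values")
    return sum(agree_within(a, b, tol) for a, b in pairs) / len(pairs)


# Made-up LogP pairs (first toolkit, second toolkit) for illustration only.
pairs = [(2.0, 2.4), (1.0, 2.0), (3.5, 3.4), (0.5, -0.5)]
fraction = fraction_agreeing(pairs)
print(fraction)
```

A relative tolerance behaves poorly for LogP values near zero, where small absolute differences look large; an absolute tolerance in log units would be an equally reasonable choice.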

Applications in LDDB
Structure Display: instead of using Marvin applets for structure display, LDDB uses a structure image servlet. This strategy improves display speed and avoids undesirable browser caching.
Structure Search: in-house and vendor collections, followed by hit analysis. Click on an image for a Marvin applet pop-up.

Ongoing Developments …
Other applications: R-group analysis; most-common substructure analysis; database-wide clustering analysis
Thank You!