Using ChemAxon Toolkits in the Lead Discovery Database at GNF (Genomics Institute of the Novartis Research Foundation)


1 Using ChemAxon Toolkits in the Lead Discovery Database at GNF
  Genomics Institute of the Novartis Research Foundation
  Hayk Asatryan, Dimitri Petrov, S. Frank Yan, Andrey Santrosyan, Kaisheng Chen, Shumei Jiang, Jeff Janes, Yingyao Zhou
  May 2005, Budapest, Hungary

2 Scope of the Lead Discovery Database (LDDB)
  Compound Management Center; HTS Center; Program Management; Quality Control; Hit Picking; Hit to Lead Tracking; MedChem; QSAR; Analytical Chemistry; ADME/Tox; PK/PD; Data Processing; Optimized Leads

3 Architecture of LDDB
  Data Mining
  Oracle + DayCart
  Web Browser; Marvin Applet
  Tomcat; Apache; CGI, Servlet
  Compound Registration; Normalization (Novartis); Mol2smi (Daylight)
  Desktop Edit/Visualization Tools: ChemDraw, ISIS Draw, Accord for Excel
  JChem API; Daylight Toolkits

4 Problems of the Heterogeneous Setup
  Mol2Smi – aromatization & chirality
    Nitrogen E/Z isomerism: C\C(=N/O)c1ccccc1.C\C(=N\O)c1ccccc1
  ChemDraw & Marvin
  MolSmart – chirality
    Uncompleted asymmetric center (fixed in the latest Marvin), draw
    Input: C\C=C/C
    Output: [#6]C=C[#6]
  MolSmart solution for [H]N([H])c1ccccc1
    Instead of [H]N([H])c1ccccc1: [N;!H0;!H1]c1ccccc1

5 Problems of the Heterogeneous Setup (cont.)
  Daylight & ChemAxon (discuss later)
  Accord: chirality, display
  Pricing considerations
  [Figure panels: ChemDraw, Accord for Excel, Marvin]

6 JChem Cartridge – initial testing (July 2004)
  Daylight's substructure search: 5-6 seconds
  JCart substructure search: 10-12 seconds (caching the whole structure table in Oracle)
  Similarity search is approximately 40-50 minutes (1.76 million)
  JChem results: 10.6 minutes for 3 million structures (3 GHz Pentium 4)

  Smiles                               Count   Time (ms)
  NC(=O)C(=NOC)c1csc(N)n1                 52       12044
  OC(=O)c1cc(O)c(O)c(O)c1                530       12091
  OC1CC(C)(N)C(O)C(C)O1                    3       11420
  c1ccc(OS(=O)(=O)O)cc1                   48       10873
  C2OC(n1ccc(=O)[nH]c1=O)C(O)C2O           0       10310
  Cc1ccc(N(CCCl)CCCl)cc1                  36       11482
  OC1OC(C)C(O)C(OC)C1OC                  283       14216
  C1OC(CO)C(O)C(OC(N)=O)C1O                2       10857
  NC(=O)C(N)Cc1cnc[nH]1                  315       18011
  OC(=O)c1cc(OC)c(OC)c(OC)c1             436       11779
  OC1OC(CO)C(O)C(OC(N)=O)C1O               2       10764
  c1c(C)[nH]c(=O)[nH]c1=O                  0       10919
  C(=O)NC(C=O)CCCNC(=N)N                1835       15184

7 Brainstorming
  Initial attempts
    Reduce SOAP's overhead.
    Tune the fingerprint parameters when creating the structure table.
  Observation
    The SELECT statement used in screening takes 16-17 s:
      select cd_id, cd_smiles from SCOTT.NCI_3M where BITAND(cd_fp1,2144163094) = 2144163094 AND BITAND(cd_fp2,1689182963) = 1689182963 AND …
    During screening, CPU usage is only 30-40%; the time goes mainly to I/O activity.
  Second attempts
    Failed to pin the fingerprint columns alone into memory in Oracle.
  Solution
    Preliminary studies show that the substructure search drops below 1 sec.
    The cache will consume only around 100 MB per million structures, which is more scalable.
    Challenge: structure-synchronization issues.
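The screening step in the SELECT statement above is a bitwise subset test: a structure survives only if every bit set in the query fingerprint is also set in its own fingerprint. The following is a minimal Python sketch of that test run against an in-memory fingerprint cache. The names `passes_screen` and `screen`, and the tuple-of-integers cache layout, are assumptions made for illustration; they are not the JChem implementation.

```python
from typing import Iterable, List, Sequence, Tuple

# One fingerprint = a sequence of integer words (cd_fp1, cd_fp2, ... in the slide).
Fingerprint = Sequence[int]


def passes_screen(fp: Fingerprint, query_fp: Fingerprint) -> bool:
    """Mirror BITAND(cd_fpN, q) = q for every fingerprint word."""
    return all((word & q) == q for word, q in zip(fp, query_fp))


def screen(cache: Iterable[Tuple[int, Fingerprint]],
           query_fp: Fingerprint) -> List[int]:
    """Return the cd_id of every cached structure that survives screening.

    `cache` is a hypothetical in-memory list of (cd_id, fingerprint) pairs.
    Survivors still need a full substructure match afterwards.
    """
    return [cd_id for cd_id, fp in cache if passes_screen(fp, query_fp)]


if __name__ == "__main__":
    cache = [
        (1, (0b1111, 0b1010)),
        (2, (0b0101, 0b1010)),
        (3, (0b1111, 0b0000)),
    ]
    print(screen(cache, (0b0101, 0b1000)))
```

Because the test touches only the fingerprint words, it avoids the disk I/O that the slide identifies as the bottleneck. At the quoted figure of about 100 MB per million structures, a 3-million-structure table would need roughly 300 MB of cache.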

8 Performance Testing – Substructure Search
  [Figure: JChem Cartridge (JCart) search time in seconds compared with DayCart search time in seconds]

9 Performance Testing – Substructure Search (cont.)
  0.2 sec screening + 0.375 ms/hit
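Read as a linear cost model, the annotation on this slide gives a fixed screening cost plus a per-hit cost. The sketch below turns it into a small Python function. Treating the two numbers as a simple additive model is an assumption, and the function name `estimated_search_seconds` is hypothetical.

```python
SCREENING_SECONDS = 0.2       # fixed screening cost quoted on the slide
SECONDS_PER_HIT = 0.375e-3    # 0.375 ms per hit, converted to seconds


def estimated_search_seconds(hits: int) -> float:
    """Estimate substructure search time, assuming cost grows linearly with hits."""
    if hits < 0:
        raise ValueError("hits must be non-negative")
    return SCREENING_SECONDS + SECONDS_PER_HIT * hits


if __name__ == "__main__":
    for hits in (0, 100, 1000, 10000):
        print(hits, round(estimated_search_seconds(hits), 4))
```

Under this reading, a query with 1,000 hits would take about 0.575 seconds, and the per-hit term starts to dominate only beyond a few hundred hits.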

10 New Cartridge Features – SQL Filtering
  Using filtering can dramatically improve performance.
    SQL cost: 240 sec
      select count(*) from cpd where jc_compare(jc_smiles,'c1ccccc1','sep=! t:s')=1
    SQL cost: 0.25 sec
      select * from cpd where jc_compare(jc_smiles,'c1ccccc1','sep=! t:s! filterQuery:select c.rowid from cpd_instance i, cpd c where i.plate_sid=268191 and i.cpd_sid=c.cpd_sid')=1
  Challenge: when is SQL filtering appropriate?
    SQL cost: 25 sec
      select * from cpd where jc_compare(jc_smiles,'CCCCCCOc1cccc(C=NOCC(O)COc2cccc(c2)C(C)C)c1','sep=! t:s!filterQuery:select c.rowid from cpd c where cpd_sid>0')=1
    SQL cost: 0.25 sec
      select count(*) from cpd where jc_contains(jc_smiles,'CCCCCCOc1cccc(C=NOCC(O)COc2cccc(c2)C(C)C)c1')=1

11 DayCart to JCart Migration Challenges
  Identical structures or not? Two structures that Daylight considers identical should ideally remain identical in JChem, and vice versa.
  Example 1: Aromatic Sulfur
    COC1=NC(=NS(=N1)C2CCCCC2)Cl
    COc1nc(Cl)ns(n1)C2CCCCC2
  Solution: JChem supports Daylight rules

12 Challenges – Identical Structures? (cont.)
  Example 2: Isotope Bug
    [2H][C@H]1O[C@H]1COCc2ccccc2
    C(OCc1ccccc1)[C@H]2CO2
  Example 3: Standardization
    CC1=CC=N(C)C=C1
    Cc1cc[n+](C)cc1

13 Challenges – Identical Structures? (cont.)
  Example 4: Non-standard Bond
    Brc1ccc2[N]c3nc4ccccc4nc3-c2c1
    Brc1ccc2Nc3nc4ccccc4nc3-c2c1
  Example 4: Chirality
    C[C@]1(O)C[C@@]23C[C@H](O)C4C(CCC5=CC(=O)CC[C@]45C)C3CCC12
  Incomplete Structures in Database
    *c1ccc(COCC2CCC=CO2)cc1 or NULL (not supported by JCart)

14 Migration Challenges – LogP
  For 50% of the compounds, both LogP values agree within 30%.
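An agreement statistic of this kind can be computed as the fraction of compounds whose two LogP values differ by no more than a relative tolerance. The slide does not say how "within 30%" was measured, so the sketch below assumes the difference is taken relative to the larger of the two magnitudes. The function name `fraction_within` and the sample values are hypothetical.

```python
from typing import Iterable, Tuple


def fraction_within(pairs: Iterable[Tuple[float, float]],
                    tolerance: float = 0.30) -> float:
    """Fraction of (logp_a, logp_b) pairs that agree within `tolerance`.

    Assumption: agreement means |a - b| <= tolerance * max(|a|, |b|).
    Two zero values count as agreeing.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    agree = 0
    for a, b in pairs:
        scale = max(abs(a), abs(b))
        if scale == 0 or abs(a - b) <= tolerance * scale:
            agree += 1
    return agree / len(pairs)


if __name__ == "__main__":
    # Made-up LogP pairs for illustration only, not data from the slide.
    sample = [(2.0, 2.5), (1.0, 3.0), (0.0, 0.0), (4.0, 3.0)]
    print(fraction_within(sample))
```

A relative measure behaves poorly for LogP values near zero, where small absolute differences become large percentages; an absolute tolerance in log units would be a reasonable alternative definition.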

15 Applications in LDDB
  Structure Display
    Instead of using Marvin applets for structure display, LDDB uses a structure image servlet. This strategy improves display speed and avoids undesirable browser caching.
  Structure Search
    In-house and vendor collections, followed by hit analysis.
    Click on an image for a Marvin applet pop-up.

16 Ongoing Developments …
  Other applications
    R-group analysis
    Most-common substructure analysis
    Database-wide clustering analysis
  Thank You! ZHOU@GNF.ORG

