JChem Base chemical database

JChem Base chemical database Szilárd Dóránt May, 2005

Contents
- Introduction
- Structural overview
- Compatibility
- Administration
- JChem tables
- Fingerprints
- Structural search
- Structure cache
- Standardization
- Search options
- JSP example
- API examples
- Performance
- Future plans

Introduction
JChem Base provides high-performance, Java-based tools for the storage, search, and retrieval of chemical structures and associated data. These components can be integrated into web-based or standalone applications alongside other ChemAxon tools.

Structural overview
- Application, or web application (JSP) used from a web browser
- JChem Base API: chemical logic, structure cache
- JDBC driver: standard interface to the RDBMS
- RDBMS (e.g. Oracle, MySQL): storage and security

Compatibility and integration
- File formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV
- Integration: 100% Java, extensive API, JChem Cartridge for Oracle
- Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, DB2, etc.
- Operating systems: Windows, Linux, Mac OS X, Solaris

Administration with JChemManager
User interface for:
- creating tables
- import
- export
- deleting rows
- dropping tables
Most functions are also available from the command line.

The property table
The property table stores information about JChem structure tables, including:
- Fingerprint parameters
- Custom standardization rules
- Recent changes (to optimize cache updates)
- Other table options and information
- Database-related licence keys
More than one property table can be used; each property table represents a particular JChem environment.

The structure of JChem tables

  Column name     Explanation
  cd_id           unique numeric identifier in the table
  cd_structure    the imported structure in the original format, without modifications (except for the removal of data fields)
  cd_smiles       the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
  cd_formula      the formula of the standardized structure
  cd_molweight    the molecular weight of the standardized structure
  cd_hash         hash code used for duplicate filtering (PERFECT search)
  cd_flags        can store row-specific options, e.g. overriding the chiral flag
  cd_timestamp    the date and time of the insertion of the row
  cd_fp…          fingerprint columns
  [user fields]   custom data fields can be added by the user
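Because a structure table is an ordinary RDBMS table, the calculated columns can also be read with plain JDBC. The sketch below is only an illustration under assumptions: the table name `structures` is borrowed from the API examples later in this deck, the column types (`cd_id` integer, `cd_molweight` numeric) are assumed, and the class and method names are hypothetical. It needs an open `java.sql.Connection` to run.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StructureTableQuery {
    // Assumed table name "structures"; column names are taken from the slide above.
    static final String SQL =
        "SELECT cd_id, cd_formula, cd_molweight FROM structures "
        + "WHERE cd_molweight < ? ORDER BY cd_molweight";

    /** Prints id, formula and molecular weight of every row lighter than maxWeight. */
    static void printLightCompounds(Connection con, double maxWeight) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setDouble(1, maxWeight);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("cd_id") + "\t"
                        + rs.getString("cd_formula") + "\t"
                        + rs.getDouble("cd_molweight"));
                }
            }
        }
    }
}
```

A query of this kind only reads data fields; structure searching itself goes through the JChem Base API shown in the API examples.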

Chemical Hashed Fingerprints
- Chemical Hashed Fingerprints encode structural patterns in bit strings.
- If structure A is a substructure of structure B, every bit that is set in A's fingerprint is also set in B's fingerprint.
- The Tanimoto similarity of hashed fingerprints can be used for diversity analysis and similarity search.
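The two fingerprint operations can be sketched with `java.util.BitSet`. This is not JChem code: the class name, method names, and bit positions are invented for illustration, and the Tanimoto coefficient is the standard definition (bits set in both fingerprints divided by bits set in either). Returning 1.0 for two empty fingerprints is a convention chosen here.

```java
import java.util.BitSet;

final class FingerprintSketch {
    /** Screening test: true if every bit set in the query is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);          // bits the query has but the target lacks
        return missing.isEmpty();
    }

    /** Tanimoto similarity: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 1.0 : (double) both.cardinality() / union;
    }

    /** Helper: builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet();
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet query = bits(3, 17, 42);        // arbitrary illustrative bits
        BitSet target = bits(3, 17, 42, 99);
        System.out.println(passesScreen(query, target)); // true
        System.out.println(passesScreen(target, query)); // false
        System.out.println(tanimoto(query, target));     // 0.75
    }
}
```

Passing the screen is necessary but not sufficient for a substructure match, because different patterns can hash to the same bit. A candidate that passes still has to be confirmed by an exact search.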

Structural search in database
A two-stage method provides optimal performance:
1. Rapid pre-screening reduces the number of possible hit candidates.
   - Chemical Hashed Fingerprints are used for substructure and superstructure searches.
   - A hash code is used for duplicate filtering (usually during compound registration).
2. A graph search algorithm is used to determine the final hit list.
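The screen-then-verify pattern can be shown with a toy analogue in which plain strings stand in for molecules. Everything here is an assumption made for illustration: the "fingerprint" hashes adjacent character pairs into 64 bits, and `String.contains` stands in for the graph search. None of it is JChem's actual algorithm.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class TwoStageSearchSketch {
    static final int FP_BITS = 64;

    /** Toy fingerprint: one hashed bit per pair of adjacent characters. */
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(FP_BITS);
        for (int i = 0; i + 1 < s.length(); i++) {
            fp.set(Math.floorMod(s.substring(i, i + 2).hashCode(), FP_BITS));
        }
        return fp;
    }

    /** Stage 1 test: every query bit must also be set in the candidate. */
    static boolean passesScreen(BitSet query, BitSet candidate) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(candidate);
        return missing.isEmpty();
    }

    /** Returns the table entries that contain the query, in table order. */
    static List<String> search(String query, List<String> table) {
        BitSet queryFp = fingerprint(query);
        List<String> hits = new ArrayList<>();
        for (String candidate : table) {
            // Stage 1: cheap bitwise pre-screen (a real system would use stored fingerprints).
            if (!passesScreen(queryFp, fingerprint(candidate))) {
                continue;
            }
            // Stage 2: exact check, standing in for the graph search.
            if (candidate.contains(query)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("CCO", List.of("CCO", "OCC", "CCCO", "CN")));
    }
}
```

The pre-screen can never reject a true hit, because every character pair of the query also occurs in any string that contains it. It only removes candidates cheaply, so the expensive exact check runs on fewer entries.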

Structure Cache
- Contains fingerprints for screening and ChemAxon Extended SMILES for ABAS
- Instant access to the structures for the search process
- Reduced load on the database server
- Incremental update ensures minimum overhead after changes in the table
- Small memory footprint due to SMILES compression and an optimized storage technique
- Approximately 100MB of memory is needed for 1 million typical drug-like structures (using 512-bit fingerprints)
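A rough breakdown of the 100MB figure can be worked out as follows. The assumptions are not from the slides: 1 MB is taken as 10^6 bytes, and fingerprints are taken to be held as raw bits with no per-object overhead. The slides do not say how the rest is used, so the remainder is only an estimate of the average space left per structure for compressed SMILES and bookkeeping.

```java
final class CacheMemoryEstimate {
    /** Bytes needed to hold one fingerprint of the given bit length. */
    static long fingerprintBytes(int fingerprintBits) {
        return fingerprintBits / 8;
    }

    /** Bytes needed for the fingerprints of all structures. */
    static long totalFingerprintBytes(int fingerprintBits, long structures) {
        return fingerprintBytes(fingerprintBits) * structures;
    }

    /** Average bytes per structure left over after the fingerprints are accounted for. */
    static long remainingBytesPerStructure(long totalBytes, int fingerprintBits, long structures) {
        return (totalBytes - totalFingerprintBytes(fingerprintBits, structures)) / structures;
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long total = 100_000_000L;   // 100MB, assuming 1 MB = 10^6 bytes
        System.out.println(fingerprintBytes(512));                              // 64
        System.out.println(totalFingerprintBytes(512, structures));             // 64000000
        System.out.println(remainingBytesPerStructure(total, 512, structures)); // 36
    }
}
```

Under these assumptions the fingerprints take about 64MB, which leaves roughly 36 bytes per structure for everything else.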

Standardization
Default standardization includes:
- Hydrogen removal
- Aromatization
Custom standardization can be specified for each table by supplying an XML configuration file at table creation or in the "Regenerate" dialog of JChem Manager (jcman).
http://www.jchem.com/doc/user/Standardizer.html

Custom Standardization Example
[Figure: a structure before and after custom standardization]

Database search options
- Maximum search time / number of hits
- SQL SELECT statement for pre-filtering
- Ordering of results
- Result table
- Inverse hit list
- Chemical Terms filter constraint

JSP example application
Open source, customizable
Features:
- Substructure, Superstructure, Exact and Similarity search
- Molecular Descriptor similarity search with descriptor coloring
- Substructure hit alignment and coloring, inverse hit list
- Chemical Terms filter
- Import / Export
- Export of hits
- Insert / Modify / Delete structures

API example: connecting to a database

    import chemaxon.jchem.db.ConnectionHandler;
    import java.sql.Connection;

    ConnectionHandler ch = new ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();

    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    // …

    // closing the connection:
    ch.close();

API example: database import

    import chemaxon.jchem.db.Importer;

    // ch is the connected ConnectionHandler
    Importer importer = new Importer();
    importer.setConnectionHandler(ch);
    importer.setInput("sample.sdf");
    // importer.setInput(is); // alternatively a stream can also be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false); // can filter duplicates

    // specifying SDFile field - table field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);

    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");

API example: database export

    import chemaxon.jchem.db.Exporter;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    // ch is the connected ConnectionHandler
    Exporter exporter = new Exporter();
    exporter.setConnectionHandler(ch);
    exporter.setTableName("structures");

    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");

    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");

    int exportedCount = exporter.writeAll();
    os.close(); // release the output file
    System.out.println("Exported " + exportedCount + " structures");

API example: database search

    import chemaxon.jchem.db.JChemSearch;

    JChemSearch searcher = new JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");

    // a query that returns cd_id values can be used for prefiltering
    // (cd_id is qualified because both tables have a cd_id column):
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE "
        + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");

    searcher.setWaitingForResult(true); // otherwise runs in a separate thread
    searcher.setStructureCaching(true); // caching speeds up the search
    searcher.run();

    // getting the results as cd_id values:
    int[] results = searcher.getResults();

API example: inserting a structure

    import chemaxon.jchem.db.UpdateHandler;

    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1"); // the structure

    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, new Double(8.5));
    uh.setDuplicateFiltering(true); // filtering duplicate structures

    // getting back the cd_id of the inserted structure:
    int id = uh.execute(true);
    if (id > 0) {
        System.out.println("Inserted, cd_id value : " + id);
    } else {
        System.out.println("Already exists with cd_id value : " + (-id));
    }

    // storing update information, the database connection remains open:
    uh.close();

Performance (1)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i

Compound registration (elapsed time):

  Number of compounds   Duplicates not checked   Duplicates checked
  10,000                32s                      45s
  100,000               4min 11s                 6min 20s
  200,000               8min 17s                 12min 26s

Substructure search in a table of 3 million compounds (query structures were shown as images):

  Number of hits    Search time (s)
  12                0.1
  936               0.9
  (not preserved)   1.2
  49740             10.7

Latest speed benchmarks:
http://www.jchem.com/FAQ.html#benchmark2
http://www.jchem.com/FAQ.html#benchmark3

Performance (2)
Similarity search: Tanimoto > 0.8
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
Query / Number of hits / Search time (s): 1.3, 336, 156, 1.5, 24 (query images and row alignment not preserved)
Latest speed benchmarks:
http://www.jchem.com/FAQ.html#benchmark2
http://www.jchem.com/FAQ.html#benchmark3

Future plans
- Additional layer: JChem Server (later also as grid)
- Structural keys as optional extension to current fingerprints
- Tables for storing query structures
- Tables for storing general (Markush) structures
- Partial clean option for hit alignment
- Installer
- etc.

Summary
ChemAxon's JChem Base toolkit provides sophisticated methods for handling chemical structures and associated data. The use of fingerprints and the structure cache provides high search performance.

Links
- JChem home page: www.jchem.com
- Live demos: www.jchem.com/examples
- API documentation: www.jchem.com/doc/api
- Brochure: www.chemaxon.com/brochures/JChemBase.pdf

Thank you for your attention
Máramaros köz 3/a, Budapest, 1037 Hungary
info@chemaxon.com
www.chemaxon.com