JChem Base chemical database


1 JChem Base chemical database
Szilárd Dóránt, May 2005

2 Contents
- Introduction
- Structural overview
- Compatibility
- Administration
- JChem tables
- Fingerprints
- Structural search
- Structure cache
- Standardization
- Search options
- JSP example
- API examples
- Performance
- Future plans

3 Introduction
JChem Base provides high-performance, Java-based tools for the storage, search, and retrieval of chemical structures and associated data. These components can be integrated into web-based or standalone applications together with other ChemAxon tools.

4 Structural overview
Layers, from client to storage:
- Standalone application, or web browser connecting to a web application (JSP)
- JChem Base API: chemical logic, structure cache
- JDBC driver: standard interface to the RDBMS
- RDBMS (e.g. Oracle, MySQL, etc.): storage and security

5 Compatibility and integration
File formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV
Integration: 100% Java, extensive API, JChem Cartridge for Oracle
Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, DB2, etc.
Operating systems: Windows, Linux, Mac OS X, Solaris

6 Administration with JChemManager
User interface for:
- creating tables
- import
- export
- deleting rows
- dropping tables
Most functions are also available from the command line.

7 The property table
The property table stores information about JChem structure tables, including:
- Fingerprint parameters
- Custom standardization rules
- Recent changes (to optimize cache updates)
- Other table options and information
- Database-related licence keys
More than one property table can be used; each property table represents a particular JChem environment.

8 The structure of JChem tables
Column name     Explanation
cd_id           unique numeric identifier in the table
cd_structure    the imported structure in its original format, without modifications (except for the removal of data fields)
cd_smiles       the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
cd_formula      the formula of the standardized structure
cd_molweight    the molecular weight of the standardized structure
cd_hash         hash code used for duplicate filtering (PERFECT search)
cd_flags        can store row-specific options, e.g. overriding the chiral flag
cd_timestamp    the date and time of the insertion of the row
cd_fp…          fingerprint columns
[user fields]   custom data fields can be added by the user

9 Chemical Hashed Fingerprints
Chemical Hashed Fingerprints encode structural patterns in bit strings.
If structure A is a substructure of structure B, every bit that is set in A's fingerprint is also set in B's fingerprint.
Tanimoto similarity of hashed fingerprints can be used for diversity analysis and similarity search.
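The two fingerprint properties above can be illustrated with plain java.util.BitSet objects. This is only a minimal sketch, not the JChem implementation: the class and method names are hypothetical, the bit positions are arbitrary, and the Tanimoto coefficient is computed in its usual form, (bits set in both) / (bits set in either). Returning 1.0 for two empty fingerprints is a convention chosen here.

```java
import java.util.BitSet;

// Hypothetical helper class; it does not use or reproduce any JChem API.
class FingerprintSketch {

    // Builds a 512-bit fingerprint with the given bit positions set.
    static BitSet fingerprint(int... setBits) {
        BitSet fp = new BitSet(512);
        for (int bit : setBits) {
            fp.set(bit);
        }
        return fp;
    }

    // Screening condition: every bit set in the query is also set in the target.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);          // bits in query that the target lacks
        return missing.isEmpty();
    }

    // Tanimoto coefficient: common bits divided by bits set in either fingerprint.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        int c = common.cardinality();
        int union = a.cardinality() + b.cardinality() - c;
        return union == 0 ? 1.0 : (double) c / union;   // assumption: empty vs. empty = 1.0
    }

    public static void main(String[] args) {
        BitSet a = fingerprint(3, 17, 42);                 // "substructure" fingerprint
        BitSet b = fingerprint(3, 17, 42, 100, 256, 300);  // "superstructure" fingerprint
        System.out.println(passesScreen(a, b));  // prints true
        System.out.println(passesScreen(b, a));  // prints false
        System.out.println(tanimoto(a, b));      // prints 0.5
    }
}
```

Passing the screen does not prove a substructure relationship, because different patterns can hash to the same bits; it only rules out targets that cannot match.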

10 Structural search in database
A two-stage method provides optimal performance:
1. Rapid pre-screening reduces the number of possible hit candidates:
   - Chemical Hashed Fingerprints are used for substructure and superstructure searches
   - A hash code is used for duplicate filtering (usually during compound registration)
2. A graph search algorithm is used to determine the final hit list
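The control flow of such a two-stage search might look like the sketch below. It is a toy stand-in and not how JChem works internally: "structures" are plain strings, the expensive graph search is replaced by String.contains, and the fingerprint hashes every two-character window into one of 512 bits. All names are hypothetical. The stand-in keeps the key property of the screen: a target that contains the query always passes it, so the screen can produce false candidates but never loses a hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy illustration only: substring matching stands in for graph-based substructure search.
class TwoStageSearchSketch {
    static final int FP_BITS = 512;
    static final int PATTERN_LEN = 2;

    // Hashes every 2-character window of the string into one fingerprint bit.
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(FP_BITS);
        for (int i = 0; i + PATTERN_LEN <= s.length(); i++) {
            int hash = s.substring(i, i + PATTERN_LEN).hashCode();
            fp.set(Math.floorMod(hash, FP_BITS));
        }
        return fp;
    }

    // Stage 1 test: all query bits must be present in the target fingerprint.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    // Returns the row indices of all targets that contain the query.
    static List<Integer> search(String query, List<String> table) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < table.size(); id++) {
            String target = table.get(id);
            // Stage 1: cheap fingerprint screen (a real system would store
            // target fingerprints instead of recomputing them per search).
            if (!passesScreen(queryFp, fingerprint(target))) {
                continue;
            }
            // Stage 2: exact check, run only on the surviving candidates.
            if (target.contains(query)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("c1ccccc1O", "CCO", "c1ccccc1", "CCN");
        System.out.println(search("c1ccccc1", table)); // prints [0, 2]
    }
}
```

The benefit comes from stage 1 being a few bitwise operations per row, so the costly exact comparison runs on only a small fraction of the table.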

11 Structure Cache
Contains fingerprints for screening and ChemAxon Extended SMILES for atom-by-atom search (ABAS)
- Instant access to the structures for the search process
- Reduced load on the database server
- Incremental update ensures minimum overhead after changes in the table
- Small memory footprint due to SMILES compression and an optimized storage technique
- Approximately 100 MB of memory needed for 1 million typical drug-like structures (using 512-bit fingerprints)
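As a rough plausibility check of the memory figure above, the arithmetic can be written out. The sketch assumes 1 MB = 1,000,000 bytes and that everything beyond the raw fingerprint bits is compressed SMILES plus bookkeeping; neither assumption is stated in the slides, and the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate only; not a description of the actual cache layout.
class CacheMemoryEstimate {

    // Raw storage for the fingerprints alone, in bytes.
    static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * (fingerprintBits / 8);
    }

    // Bytes per structure left over once the fingerprints are accounted for.
    static long remainingBytesPerStructure(long totalBytes, long structures, int fingerprintBits) {
        return (totalBytes - fingerprintBytes(structures, fingerprintBits)) / structures;
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long totalBytes = 100_000_000L;  // "approximately 100 MB", taken as decimal megabytes
        System.out.println(fingerprintBytes(structures, 512));                        // prints 64000000
        System.out.println(remainingBytesPerStructure(totalBytes, structures, 512));  // prints 36
    }
}
```

Under these assumptions the fingerprints account for about 64 MB, leaving roughly 36 bytes per structure for the compressed SMILES and any overhead.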

12 Standardization
Default standardization includes:
- Hydrogen removal
- Aromatization
Custom standardization can be specified for each table by supplying an XML configuration file at table creation or in the "Regenerate" dialog of JChem Manager (jcman).

13 Custom Standardization Example
(Figure: a structure before and after custom standardization.)

14 Database search options
- Maximum search time / number of hits
- SQL SELECT statement for pre-filtering
- Ordering of results
- Result table
- Inverse hit list
- Chemical Terms filter constraint

15 JSP example application
Open source, customizable
Features:
- Substructure, superstructure, exact, and similarity search
- Molecular Descriptor similarity search with descriptor coloring
- Substructure hit alignment and coloring, inverse hit list
- Chemical Terms filter
- Import / Export
- Export of hits
- Insert / Modify / Delete structures

16 API example : connecting to a database
    ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();

    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();

    // closing the connection:
    ch.close();

17 API example : database import
    // ch: a connected ConnectionHandler
    Importer importer = new chemaxon.jchem.db.Importer();
    importer.setConnectionHandler(ch);
    importer.setInput("sample.sdf");
    // importer.setInput(is);  // alternatively a stream can also be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false);  // can filter duplicates

    // specifying SDFile field - table field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);

    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");

18 API example : database export
    // ch: a connected ConnectionHandler
    // requires java.io.FileOutputStream and java.io.OutputStream
    Exporter exporter = new chemaxon.jchem.db.Exporter();
    exporter.setConnectionHandler(ch);
    exporter.setTableName("structures");

    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");

    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");

    int exportedCount = exporter.writeAll();
    os.close();  // release the file handle once the export has finished
    System.out.println("Exported " + exportedCount + " structures");

19 API example : database search
    JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");

    // a query that returns cd_id values can be used for prefiltering
    // (cd_id is qualified because both tables have a cd_id column):
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE " +
        "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");

    searcher.setWaitingForResult(true);  // otherwise runs in a separate thread
    searcher.setStructureCaching(true);  // caching speeds up the search
    searcher.run();

    // getting the results as cd_id values:
    int[] results = searcher.getResults();

20 API example : inserting a structure
    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1");  // the structure

    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, Double.valueOf(8.5));

    uh.setDuplicateFiltering(true);  // filtering duplicate structures

    // getting back the cd_id of the inserted structure;
    // a negative value means the structure already exists:
    int id = uh.execute(true);
    if (id > 0) {
        System.out.println("Inserted, cd_id value : " + id);
    } else {
        System.out.println("Already exists with cd_id value : " + (-id));
    }

    // storing update information, the database connection remains open:
    uh.close();

21 Performance (1)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i

Compound registration:
Number of compounds   Elapsed time, duplicates not checked   Elapsed time, duplicates checked
10,000                32s                                    45s
100,000               4min 11s                               6min 20s
200,000               8min 17s                               12min 26s

Substructure search in a table of 3 million compounds (query structures not preserved):
Number of hits    Search time (s)
12                0.1
936               0.9
(not preserved)   1.2
49740             10.7

Latest speed benchmarks:

22 Performance (2)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i

Similarity search, Tanimoto > 0.8 (query structures not preserved):
Number of hits   Search time (s)
24               1.5
156              (not preserved)
336              1.3

Latest speed benchmarks:

23 Future plans
- Additional layer: JChem Server (later also as grid)
- Structural keys as optional extension to current fingerprints
- Tables for storing query structures
- Tables for storing general (Markush) structures
- Partial clean option for hit alignment
- Installer
- etc.

24 Summary
ChemAxon's JChem Base toolkit provides sophisticated methods for handling chemical structures and associated data. The use of fingerprints and the structure cache provides high search performance.

25 Links
- JChem home page:
- Live demos:
- API documentation:
- Brochure:

26 Thank you for your attention
Máramaros köz 3/a Budapest, 1037 Hungary

