Slide 1 Structural Search Using ChemAxon Tools
Szabolcs Csepregi*, Szilárd Dóránt, Nóra Máté, Miklós Vargyas, Péter Kovács, György Pirok, Ferenc Csizmadia
First presented at Applications of Cheminformatics and Chemical Modelling to Drug Discovery · 8-19 Nov 2004
Updated April 2005
Slide 2 Contents
Structural search in cheminformatics
The JChem suite of tools
Structural search in JChem
Interfaces
Database solutions: JChem Base, Cartridge
Standardization
Search features
MCS/MCES and Library MCS
R-group decomposition
The Chemical Terms language
Future plans
All examples generated by ChemAxon's Marvin
Slide 3 Structural search in cheminformatics
A few examples to highlight the diversity of applications:
– Compound registration: duplicate checking
– Database search, e.g. imidazole derivatives
– Pharmacophoric group identification (JChem Screen, JKlustor)
– Functional group identification
– Cleavage bond identification (JChem Fragmenter)
– Virtual reaction processing (JChem Reactor)
– Standardization (canonicalization of structures, JChem Standardizer)
– Toxic fragment identification (superstructure search)
Slide 4 Search types in JChem
ABAS (Atom By Atom Search) or structural search:
– Exact
– Substructure
– Superstructure
– MC(E)S: maximum common (edge) substructure
– R-group decomposition (identify ligands of a given scaffold)
Similarity search:
– Different descriptors
– Different metrics
Slide 5 ABAS search interfaces
JSP (Java Server Pages): web GUI for database
– Similarity & structural search
– Substructure highlighting
– Additional constraints
– Insert, modify, delete
Command line utility: jcsearch, for files and DB
Java API
– isMatching(): only to check matching
– findFirst(), findNext(), findAll(): enumerate all possible matchings
Cartridge: access all functionality from SQL
Chemical Terms
Slide 6 ABAS options
General options:
– Order sensitive hits e.g. Pre-assignment of query and target atoms
– Consider stereo or not, absolute stereo (ignore chiral flag)
– Timeout limit
– Exact charge/radical/isotope/query features/bond/stereo matching
– Double bond stereo: no check/marked/all double bonds
– Chemical Terms filter expression
– etc.
Database search:
– Maximum search time/number of hits
– Additional SQL SELECT expression for prefiltering
– Output table
– Reverse hits mode
Slide 7 Structural search in database
Search: two-stage method:
– Rapid pre-screening based on chemical hashed fingerprints
– ABAS (isMatching)
Duplicate check at compound registration:
– Hash code: primary filter
– ABAS (isMatching)
Standardization
Caching of structures and fingerprints allows top performance
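The two-stage idea on this slide can be illustrated with a minimal Python sketch. It assumes fingerprints are held as integers used as bit sets, and it uses a hypothetical `abas` callable as a stand-in for the atom-by-atom check; neither reflects JChem's actual implementation or API. The screen relies on the usual property of substructure fingerprints: a target can contain the query only if every bit set in the query fingerprint is also set in the target fingerprint.

```python
from typing import Callable, Iterable, List, Tuple


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """Return True if every bit set in query_fp is also set in target_fp."""
    return (query_fp & target_fp) == query_fp


def two_stage_search(
    query_fp: int,
    targets: Iterable[Tuple[str, int]],
    abas: Callable[[str], bool],
) -> List[str]:
    """Screen by fingerprint first, then confirm survivors with `abas`.

    `abas` is a hypothetical placeholder for the expensive atom-by-atom
    matching step; it receives a target id and returns True on a match.
    """
    hits = []
    for target_id, target_fp in targets:
        if not passes_screen(query_fp, target_fp):
            continue  # cheap rejection, no atom-by-atom search needed
        if abas(target_id):
            hits.append(target_id)
    return hits


# Toy data: fingerprints are made-up bit patterns, not real structures.
targets = [("t1", 0b1110), ("t2", 0b0110), ("t3", 0b1011)]
abas_calls = []


def toy_abas(target_id: str) -> bool:
    """Record which targets reach the second stage; pretend t2 fails."""
    abas_calls.append(target_id)
    return target_id != "t2"


hits = two_stage_search(0b0110, targets, toy_abas)
```

In this toy run only `t1` and `t2` reach the second stage, because `t3` lacks one of the query bits. The screen can produce false positives (here `t2`), which is why the atom-by-atom step is still required.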
Slide 8 Import with JChem Base Manager
Slide 9 JChem Base molecular file formats and integration
Import formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV, CML, PDB, Sybyl molfile, XYZ, Gaussian cube
Image formats for export (JPG, PNG, SVG)
Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, DB2, etc.
OS: any operating system running Java: Windows, Linux, Mac OS X, Solaris, etc.
Slide 10 JChem Base performance (1)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
Compound registration (elapsed time):
Number of compounds   Duplicates not checked   Duplicates checked
10,000                32s                      45s
100,000               4min 11s                 6min 20s
200,000               8min 17s                 12min 26s
Substructure search in a table of 3 million compounds:
[Table: Query, Number of hits, Search time (s); rows not preserved]
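The registration timings on this slide can be converted into throughput to make the cost of duplicate checking easier to compare across table sizes. The following is a small arithmetic sketch in Python; the helper names are hypothetical and the figures are only those quoted on the slide.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xmin Ys' timing into seconds."""
    return 60 * minutes + seconds


# compounds -> (elapsed s without duplicate check, elapsed s with check)
timings = {
    10_000: (to_seconds(0, 32), to_seconds(0, 45)),
    100_000: (to_seconds(4, 11), to_seconds(6, 20)),
    200_000: (to_seconds(8, 17), to_seconds(12, 26)),
}


def throughput(compounds: int, elapsed_s: int) -> float:
    """Compounds registered per second."""
    return compounds / elapsed_s


for compounds, (unchecked_s, checked_s) in timings.items():
    overhead = checked_s / unchecked_s - 1.0
    print(
        f"{compounds:>7,}: "
        f"{throughput(compounds, unchecked_s):6.1f}/s unchecked, "
        f"{throughput(compounds, checked_s):6.1f}/s checked, "
        f"duplicate check adds {overhead:.0%}"
    )
```

On these figures the duplicate check adds roughly 40 to 50 percent to the elapsed time at every table size shown.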
Slide 11 JChem Base performance (2)
Similarity search: Tanimoto > 0.8
Server parameters: Windows XP; 1 CPU: Intel P4 3.0GHz; 2GB RAM; Oracle 9i
[Table: Query, Number of hits, Search time (s); rows not preserved]
Slide 12 JChem Cartridge for Oracle
Slide 13 JChem Cartridge for Oracle
Oracle is extended to support chemical database operations using the JChem Cartridge for Oracle.
Examples:
Substructure search displaying ID, SMILES codes, and molweight:
  SELECT cd_id, cd_smiles, cd_molweight
  FROM my_structures
  WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Similarity search filtered by predicted pKa values, displaying predicted logP and logD values:
  SELECT cd_id, jc_logP(cd_smiles), jc_logD(cd_smiles, 7.4)
  FROM my_structures
  WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8
  AND jc_pKa(cd_smiles, 'acidic', 1) < 4;
Slide 14 JChem Cartridge for Oracle
Chemical Terms examples:
Number of compounds in table nci_10m containing benzene and conforming to Lipinski's rule of five:
  SELECT count(*)
  FROM nci_10m
  WHERE jc_compare(structure, 'c1ccccc1','sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1
Compounds in table nci_10m containing 3-bromoindole, restricting TPSA, molecular weight, and rotatable bond and aromatic ring counts:
  SELECT cd_structure
  FROM nci_10m
  WHERE jc_compare(structure, 'Brc1cnc2ccccc12','sep=! t:s!ctFilter:(PSA() <= 200) && (rotatableBondCount() <= 10) && (mass() <= 500) && (aromaticRingCount() <= 4) ') = 1
New interface to ChemAxon API features from SQL, accessible from non-Java programs as well.
Enhanced performance of certain SQL queries.
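The ctFilter expression in the first query encodes the four rule-of-five thresholds. The same predicate can be sketched in Python, assuming the property values have already been calculated elsewhere; the `Properties` record and its field names are hypothetical and are not part of any ChemAxon API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Precomputed molecular properties (hypothetical container)."""
    mass: float
    logp: float
    donor_count: int
    acceptor_count: int


def passes_rule_of_five(p: Properties) -> bool:
    """Mirror of the filter on the slide:
    (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5)
    && (acceptorCount() <= 10)
    """
    return (
        p.mass <= 500
        and p.logp <= 5
        and p.donor_count <= 5
        and p.acceptor_count <= 10
    )


# Illustrative values only; they are not calculated for any real compound.
small = Properties(mass=180.2, logp=1.2, donor_count=1, acceptor_count=4)
heavy = Properties(mass=650.0, logp=1.2, donor_count=1, acceptor_count=4)

results = [passes_rule_of_five(small), passes_rule_of_five(heavy)]
```

Because the thresholds use `<=`, a value exactly at a limit (for example a mass of 500) still passes, matching the comparison operators in the slide's expression.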
Slide 15 Query features 1. Atomic features
Query atom types: any, hetero, list, not list
Pseudo atoms, e.g. Resin
Explicit lone pairs (match implied lone pairs as well)
Charge, isotope, radical
Query properties:
Symbol   Description
H        Total hydrogen count
a        Aromatic
A        Aliphatic
R        Ring count in SSSR
r        Ring size in SSSR
v        Valence
X        Connectivity
Slide 16 Query features 2. Atomic SMARTS features
SMARTS atoms: additional query properties:
Symbol     Description
D          Degree
h          Implicit H count
&  ;  ,  ! Logical operators
$( )       Recursive SMARTS
+0, -0     Zero charge
Example: carbonyl C, but not amide
Slide 17 Query features 3. Bond features & components
Query bond types: any, single or double, single or aromatic, double or aromatic
Bond topology: chain/ring
SMARTS bonds:
Symbol       Description
-  =  #      Single, double, triple
:            Aromatic
&  ,  ;  !   Logical bond
/  \  /?  \? Directional bond (cis/trans)
Component level grouping:
Symbol     Description
(C.C)      Same component
(C).(C)    Different component
C.C        No component restrictions
Slide 18 Stereo searching 1. Double bonds
Levels of check:
– All
– Only marked double bonds (MDL: stereo care flag)
– None
[Table: Depiction, Meaning; depictions not preserved. Meanings listed: Cis, Trans, Cis or trans (unknown), Not trans, Not cis]
Slide 19 Stereo searching 2. Tetrahedral chirality
Stereo bond types: Up, Down, Up or down
Relative stereo configuration
Chiral flag model
Enhanced stereo representation: AND, OR, ABS groups
Slide 20 Reaction search
Reactants, agents, products
Transformation recognition (mapping)
Stereospecific reactions (inversion, retention)
Reactant grouping
Slide 21 R-group search
Scaffold, R-group definitions
Monovalent, divalent R-groups
R-logic:
– Occurrence
– If-then
– RestH
Slide 22 Hydrogens
H representations:
– Explicit
– Implicit
– Query H count (total or implicit)
Considered in ABAS
[Example table: columns Explicit H, Implicit H, Query H count; rows for query and target; structures not preserved]
Slide 23 Standardization
Explicit hydrogen removal
Aromatic bonds
Mesomers
Tautomers
Counterions
Stereo representation
Slide 24 Standardization - Aromaticity
Representations: Kekulé, aromatic
Example: the two Kekulé representations below don't match
[Figure: two Kekulé representations; not preserved]
Two options available: ChemAxon & Daylight aromatization
Slide 25 Standardization example
[Figure: structure before and after standardization; not preserved]
Slide 26 Similarity search
Descriptors:
– Chemical hashed fingerprint
– 2D (topological) pharmacophore fingerprint
– BCUT
– Structural keys
– Hypothesis fingerprints: minimum, average
Dissimilarity metrics:
– Tanimoto: standard, scaled, asymmetric
– Euclidean: standard, normalized, weighted, asymmetric
– Optimized for a set of actives
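Two of the listed metrics can be sketched in their standard textbook form. The Python below assumes binary fingerprints are held as integers for Tanimoto and that descriptor vectors are plain sequences of floats for Euclidean; the scaled, asymmetric, normalized, and weighted variants named on the slide are not covered, and none of this reflects how JChem computes them.

```python
import math
from typing import Sequence


def tanimoto_similarity(a: int, b: int) -> float:
    """Standard Tanimoto coefficient on bit sets: |A & B| / |A | B|."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    # Convention chosen for this sketch: two empty fingerprints are identical.
    return common / union if union else 1.0


def tanimoto_dissimilarity(a: int, b: int) -> float:
    """Dissimilarity as one minus the Tanimoto coefficient."""
    return 1.0 - tanimoto_similarity(a, b)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Toy fingerprints: 2 bits in common, 4 bits set in the union.
similarity = tanimoto_similarity(0b1110, 0b0111)
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

A similarity threshold such as Tanimoto > 0.8 corresponds to a dissimilarity below 0.2 under this definition.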
Slide 27 MC(E)S 1. Pairs of molecules
The largest connected common subgraph
Application: reaction automapping in Marvin
Slide 28 MCS 2. Library MCS
The LibMCS program rapidly creates a hierarchy of MCSs on a library.
Applications:
– Identification of the most frequently occurring MCS
– Focused set analysis
– Clustering based on common substructures
Slide 29 Hierarchy calculation performance
[Table: Library, Library size, Time (s), Clusters, Top level clusters, No. of levels; numeric values not preserved]
Libraries:
– NCI (small molecules, random, diverse sets)
– d2 inhibitors (medium sized molecules, low diversity)
– Thrombin inhibitors (medium sized molecules, medium diversity)
Slide 30 R-group decomposition
JChem is able to identify the ligands of a given scaffold at specified substitution positions.
[Figure: Query (scaffold), Library, Result; not preserved]
Slide 31 Applications of Chemical Terms (CT)
– Virtual synthesis: reaction and synthesis rules
– Pharmacophore analysis: pharmacophore definitions
– Drug design: goal functions
– Structural search: advanced query expressions, e.g. in the Cartridge
Slide 32 Chemical Terms
Elements of the language:
– Structure matching functions (describing functional groups, reaction sites, similarity…)
– Property calculations (partial charge distribution, pKa, logP, electrophilicity, etc.)
– Arithmetic and logic operators
Chemical Terms examples:
Searching:
  match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);
Goal functions:
  inhibitor = inhibitor.mol;
  (similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);
Filtering:
  (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10);
Slide 33 Chemical Terms
Some available functions:
– Structural search (match, matchcount)
– Partial charge distribution
– pKa, logP, logD, major microspecies
– Polarizability
– Topological Polar Surface Area
– Number of rotatable bonds, rings, aromatic rings, etc.
– Number of HB donors/acceptors
– Exact mass
– Arithmetic and logic operators
– Etc.
Extensible: your own Java plugins can be easily added.
Slide 34 Future plans
More query features (e.g. link nodes, ring bond count, unsaturated atom)
Flexible search options: tautomeric search, ignore bond types, salts, etc.
Search targets having R-groups (Markush structures)
etc.
Slide 35 Summary
Structural search provides a useful set of tools for chemists and cheminformaticians.
The ChemAxon JChem suite contains a broad range of chemical search facilities, and the presented benchmark results illustrate the performance of JChem search.
The new Chemical Terms language complements structural search and makes data mining easier.
Slide 36 Links
Home page –
Forum –
Animated demos and tutorials –
Presentations and posters –
Slide 37 Thank you for your attention
Máramaros köz 3/a, Budapest, 1037 Hungary