Java Solutions for Cheminformatics March 2005
About Us Molecule Drawing and Visualization Structure Searching Cartridge Structure Standardization Molecular Predictions Chemical Expressions Screening Clustering Fragment Analysis Virtual Synthesis Current Developments
History
Formed: 1998, Budapest, Hungary
Skills base: chemistry, software development, predictive tools
Aim: platform-independent software for chemistry
Highlights
- 1998: Custom projects
- 1999: Java tools for sketching/viewing structures
- 2000: Structure database support
- 2001: Clustering and diversity analysis
- 2003: Pharmacophore screening, property predictions, reaction processing, fragmenting
- 2004: Cartridge technology, virtual synthesis, improved SMARTS support
People
Developers: 17 (7 PhD, 10 MSc)
Technical expertise
- Cheminformatics
- Synthetic and physical chemistry
- Virtual drug design
- Java
- Web technology
Business Support: 3 (1 MSc, 2 BSc)
Commercial expertise
- Negotiation & contracting
- Relationship management
- Collaboration steering and development
- Strategic marketing
- Mutually beneficial (win-win) business relationships
Selected Application Areas Global licenses Custom development projects Value added constructions Websites/portal front and back end Educational
Product development
Marvin (chemical drawing): Applets, Molfiles, stereo support, Windows, Unix; SMILES, SMARTS, PDB, R-groups, isotopes, shortcuts, Marvin Beans; ball and stick, JPG, PNG, SVG, cut & paste with Isis/ChemDraw, 2D cleaning, (de)aromatization, reactions; SDF, RDF, XYZ animations, CML, templates, compressed formats, Swing, 3D rendering; Mac support, signed applets, Java Web Start, atom mapping; partial charge, pKa, logP, logD, 3D generation, radicals, S-groups; 2004: Marvin file format, enhanced stereo, enhanced SMARTS support, shapes, text boxes, multiple groups, TPSA, donor/acceptor...
JChem (structure database and cheminformatics toolkit): Oracle, MySQL, SQL Server, Access, hashed fingerprints, substructure and similarity searching; DB2, PostgreSQL, R-group searching; reaction searching, reaction processing, pharmacophore analysis, screening, standardization, fragmentation; clustering, diversity; cartridge, enhanced stereo searching, recursive SMARTS, chemical expressions, virtual synthesis...
Current Products Overview
Multiple Deployment Formats Applications Java Applets Signed Java Applets Java Web Start Java Beans Plugins JSP
Why ChemAxon? Sophisticated virtual chemistry technology Platform independence and Web (Java) High performance tools (speed, capacity) Client oriented development Comprehensive API for the developers Detailed documentation Competitive prices Fast and reliable support
Product Support: developers supporting developers
- Fast response to support questions: within 24 hours at most (and fast solutions too)
- Final and beta releases available online
- Detailed documentation available online and extensive help bundled with the software
- Skilled, relevant human support (direct developer-to-developer contact)
- Product development driven by support requests
Molecule Drawing and Visualization
Operating Systems
100% pure Java
- Windows: 95, 98, Me, NT, 2000, XP
- Macintosh: OS 9, OS X
- Unix: Linux, Solaris, Irix, etc.
Web Browsers Internet Explorer Netscape Mozilla Safari Opera
Marvin Various file formats Isotopes, charges, radicals Alias, pseudo atoms Templates Abbreviated groups Reactions Atom maps R-groups Stereo bonds, stereo configurations (R/S, E/Z) Enhanced stereo (ABS/AND/OR) SMARTS properties (atoms, bonds, recursive SMARTS) Chemical error checking Generic atoms and bonds Atom lists and not lists 2D cleaning 3D cleaning Various 3D models Shapes, text boxes Plugins
Various File Formats
Isotopes, Charges, Radicals
Templates
Abbreviated Groups
R-groups
Reactions
Rendered 3D displays with MarvinSpace
Structure Cleaning
SMILES: CC(C)NCC(O)COC1=C2C=C(C)NC2=CC=C1
Figure panels: topology, 2D, 3D
Structure Searching
JChem Base Features
- Rapid fingerprint-based database scanning
- Sophisticated graph-based searching
- Integration with databases: Oracle, MS SQL Server, DB2, MySQL, PostgreSQL, InterBase, Access
- Custom standardization
- JChem Cartridge for searching in Oracle
- JSP integration
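The two-stage idea behind fingerprint-based scanning followed by graph-based searching can be sketched in a few lines. This is only an illustration under stated assumptions: fingerprints are modeled as Python integers used as bit masks, and a target can contain the query as a substructure only if every bit set in the query is also set in the target. The names `passes_screen` and `screen`, and the toy fingerprints, are hypothetical and are not part of the JChem API.

```python
def passes_screen(query_fp: int, target_fp: int) -> bool:
    """True if every bit set in the query fingerprint is also set in the target."""
    return (query_fp & target_fp) == query_fp


def screen(query_fp: int, database: dict) -> list:
    """Return the IDs that survive the bit screen, in database order.

    Survivors are only candidates: a graph-based atom-by-atom match
    would still be needed to confirm a real substructure hit.
    """
    return [mol_id for mol_id, fp in database.items() if passes_screen(query_fp, fp)]


# Toy 6-bit fingerprints (hypothetical values, for illustration only).
database = {"mol1": 0b101101, "mol2": 0b001001, "mol3": 0b111111}
query = 0b001101

candidates = screen(query, database)
print(candidates)
```

The screen can only produce false positives, never false negatives, which is why it is cheap to run over a whole table before the slower graph search.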
Import with JChem Base Manager
Query Features
- Exact structure
- Substructure
- Atom lists and not lists
- Explicit hydrogens
- Generic atoms
- Generic bonds
- SMARTS atom properties: aliphatic, aromatic; hydrogen count; connection count; valence; ring count; smallest ring size; recursive SMARTS
- Stereo atoms
- Stereo bonds
- R-group queries: R-groups; occurrence; if / then conditions; RestH
- Reaction search: transformation recognition; component identification; stereospecific reactions (inversion, retention)
- Diastereomers: enhanced stereo groups (Abs, And, Or)
JChem Base JSP Integration Thin client support: only a web browser and Java required
Cartridge Technology
JChem Cartridge for Oracle
Oracle can be extended to support chemical database operations using the JChem Cartridge for Oracle.
Examples:
Substructure search displaying ID, SMILES codes, and molecular weight:
    SELECT cd_id, cd_smiles, cd_molweight
    FROM my_structures
    WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Finding benzene derivatives that conform to Lipinski's rule of five:
    SELECT count(*)
    FROM my_structures
    WHERE jc_compare(structure, 'c1ccccc1','sep=!t:s!ctFilter: (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1;
Example Oracle search returning similar structures with logP > 1 that were acquired after April 14th, displayed in MarvinView.
Structure Standardization
Standardization Explicit hydrogens Aromatic bonds Mesomers Tautomers Counterions
Standardization Example (figure: before and after)
Molecular Predictions
Calculator Plugins
Available calculations
- Elemental analysis
- Charge distribution
- Polarizability
- pKa
- logP
- logD
- Polar surface area
- Hückel analysis
- H-bond donor-acceptor
- Major microspecies
- Refractivity
Calculation interface
- Marvin GUI
- Command line
- Chemical Terms
- API
Elemental Analysis
Polar Surface Area
Partial Charge Distribution
Partial Charge Distribution Calculation
Partial Equalization of Orbital Electronegativities (PEOE)
Orbital electronegativity as defined by Mulliken.
Orbital electronegativity of atom i: $\chi_i = a_t + b_t q_i + c_t q_i^2$, where $q_i$ is the partial charge.
The partial charge of atom i is calculated iteratively, based on Gasteiger's method:
$\chi_i^{(0)} = a_t$
$q_i^{(0)} = 0$
$q_i^{(n+1)} = q_i^{(n)} + \sum_k \alpha^{n} \, \dfrac{\chi_i - \chi_k}{\max(\chi_i, \chi_k)}$
where k is the index of a neighbor of atom i and $\alpha^{n}$ is the damping factor of iteration n.
Polarizability
logP
logP Example
$\log P = \sum_i f_i$, where $f_i$ is the atomic logP increment
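The additive scheme, logP as a sum of atomic increments, can be illustrated with a short sketch. The atom type names and increment values below are hypothetical placeholders chosen for readability; they are not the parameters used by the logP plugin.

```python
# Hypothetical atomic logP increments (illustrative values only).
HYPOTHETICAL_INCREMENTS = {
    "C_aliphatic": 0.50,
    "C_aromatic": 0.30,
    "O_hydroxyl": -0.40,
    "N_amine": -1.00,
}


def estimate_logp(atom_types):
    """Sum the increment of every atom type in the molecule: logP = sum_i f_i."""
    return sum(HYPOTHETICAL_INCREMENTS[t] for t in atom_types)


# A two-carbon alcohol described by its (hypothetical) atom types.
ethanol_like = ["C_aliphatic", "C_aliphatic", "O_hydroxyl"]
print(round(estimate_logp(ethanol_like), 2))
```

In a real increment scheme the atom typing step carries most of the chemistry; the summation itself stays this simple.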
Validation of the logP prediction
logD
logD Example
logD is computed using micro ionization constants ($k_i$), micro partition coefficients ($p_i$), and pH.
Figure: ionization scheme with sites labeled 1, 2, 3, microspecies (0) to (7), and micro ionization constants k1 to k7.
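For the simplest possible case, a monoprotic acid with one neutral and one ionized species, the combination of ionization constant, partition coefficients, and pH can be sketched as follows. This is a textbook simplification rather than the plugin's multi-site microspecies treatment; the function name and the example values are assumptions made for illustration.

```python
import math


def logd_monoprotic_acid(ph, pka, log_p_neutral, log_p_ion):
    """logD of a monoprotic acid HA from species fractions and partition coefficients.

    D = f_neutral * P_neutral + f_ion * P_ion, where the fractions follow
    from the Henderson-Hasselbalch ratio [A-]/[HA] = 10**(pH - pKa).
    """
    ratio = 10 ** (ph - pka)
    f_neutral = 1 / (1 + ratio)
    f_ion = ratio / (1 + ratio)
    d = f_neutral * 10 ** log_p_neutral + f_ion * 10 ** log_p_ion
    return math.log10(d)


# Hypothetical acid: pKa 4, neutral form logP 2, ionized form logP -2.
for ph in (1.0, 4.0, 7.4):
    print(ph, round(logd_monoprotic_acid(ph, 4.0, 2.0, -2.0), 2))
```

With more ionizable sites the same weighted sum runs over every microspecies, which is where the micro ionization constants of the scheme come in.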
pKa
pKa Plugin - Microconstants
Micro ionization constants (logk) are calculated from regression equations that have three types of calculated parameters:
- Polarizabilities
- Partial charges
- Intramolecular interactions
pKa Plugin - Macroconstants
Macro ionization constants (pKa) are calculated from the microconstants (logk).
Figure: ionization scheme
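How macroconstants follow from microconstants can be shown for a molecule with two acidic sites, A and B. The sketch below uses the standard relations for a diprotic acid: the first macroconstant is the sum of the two first-step microconstants, and the reciprocal of the second macroconstant is the sum of the reciprocals of the two second-step microconstants. The function name, the use of micro pK values as inputs, and the numbers are assumptions for illustration, not the plugin's interface.

```python
import math


def macro_pka_diprotic(pk1, pk2, pk12, pk21):
    """Macro pKa values of a two-site acid from micro pK values.

    pk1, pk2   : first deprotonation at site A or at site B
    pk12, pk21 : second deprotonation (B after A, A after B)
    """
    k1, k2, k12, k21 = (10 ** -p for p in (pk1, pk2, pk12, pk21))
    ka1 = k1 + k2                      # both routes contribute to the first step
    ka2 = 1 / (1 / k12 + 1 / k21)      # both monoanions feed the second step
    return -math.log10(ka1), -math.log10(ka2)


# Hypothetical symmetric diacid: equivalent sites, micro pK 4 then 5.
pka1, pka2 = macro_pka_diprotic(4.0, 4.0, 5.0, 5.0)
print(round(pka1, 3), round(pka2, 3))
```

For equivalent sites the macro values are shifted from the micro values by log10(2) in opposite directions, a purely statistical effect.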
Hydrogen Bonds in pKa Calculation
$\log k = a\,(q_i - q_k) + b$, where a and b are regression parameters
Intramolecular hydrogen bonds are also taken into account.
Validation of the pKa prediction
Chemical Expressions
Chemical Terms
Elements of the language
- structure matching functions (describing functional groups, reaction sites, similarity…)
- property calculations (partial charge distribution, pKa, logP, electrophilicity…)
- arithmetic and logical operators
Chemical Terms examples
- searching:
    match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);
- goal functions:
    inhibitor = inhibitor.mol;
    (similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);
- filtering:
    (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10);
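The filtering expression above is a conjunction of four property thresholds. As a rough illustration of the same logic outside Chemical Terms, the sketch below applies it to property values that are assumed to be already computed. The `Properties` class, the function name, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed molecular properties (hypothetical container)."""
    mass: float
    logp: float
    donor_count: int
    acceptor_count: int


def passes_rule_of_five(p: Properties) -> bool:
    """Same thresholds as the Chemical Terms filtering example."""
    return (p.mass <= 500 and p.logp <= 5
            and p.donor_count <= 5 and p.acceptor_count <= 10)


# Illustrative values only, not measured or predicted data.
small_molecule = Properties(mass=180.2, logp=1.2, donor_count=1, acceptor_count=4)
large_lipophile = Properties(mass=650.0, logp=6.1, donor_count=2, acceptor_count=8)

print(passes_rule_of_five(small_molecule), passes_rule_of_five(large_lipophile))
```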
Applications of Chemical Terms (CT)
- virtual synthesis: reaction and synthesis rules
- pharmacophore analysis: pharmacophore definitions
- drug design: goal functions
- structure searching: advanced query expressions
Screening
Pharmacophore Mapping hydrophobic (h) aromatic (r) acceptor (a) acceptor / donor (a/d) donor / cationic (d/c) donor / aromatic (d/r) atom type colors pharmacophore type colors
Topological Pharmacophore Fingerprint
Figure: structure with atoms labeled by pharmacophore type (h, r, d/+, r/d, d/a)
Hypothesis Fingerprints
Hypothesis | Advantages | Disadvantages
Minimum | strict selection of common features | very sensitive to one missing feature
Average | not that sensitive to outliers | less selective if actives are similar
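The trade-off between the two hypothesis types becomes concrete with a small sketch. It assumes that each active compound is described by a count-based fingerprint of equal length and that a hypothesis is built bin by bin, taking either the minimum or the mean over the actives. The function names and the toy fingerprints are hypothetical.

```python
def minimum_hypothesis(fingerprints):
    """Per-bin minimum: keeps only features present in every active."""
    return [min(column) for column in zip(*fingerprints)]


def average_hypothesis(fingerprints):
    """Per-bin mean: tolerates a feature missing from a few actives."""
    return [sum(column) / len(column) for column in zip(*fingerprints)]


# Three toy fingerprints; the second active lacks the feature in bin 1.
actives = [
    [2, 1, 0, 3],
    [2, 0, 0, 4],
    [1, 1, 0, 3],
]

print(minimum_hypothesis(actives))
print(average_hypothesis(actives))
```

Bin 1 drops to zero in the minimum hypothesis because a single active misses it, while the average hypothesis keeps a reduced weight there.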
Dissimilarity Metrics
- Euclidean: standard, normalized, weighted, asymmetric
- Tanimoto: standard, scaled, asymmetric
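The standard forms of the two metric families can be written down briefly. The sketch below shows the plain Euclidean distance and a Tanimoto dissimilarity (one minus the Tanimoto coefficient in its dot-product form); the normalized, weighted, scaled, and asymmetric variants listed on the slide are not covered. Function names and vectors are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Standard Euclidean distance between two fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto coefficient, with T = a.b / (a.a + b.b - a.b)."""
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denominator = aa + bb - ab
    if denominator == 0:          # two all-zero vectors are treated as identical
        return 0.0
    return 1 - ab / denominator


query = [1, 1, 0, 1]
target = [1, 0, 0, 1]

print(euclidean(query, target), round(tanimoto_dissimilarity(query, target), 4))
```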
Screening Optimization
- 10,000 test compounds (from NCI): 9,700 for validation, 300 for optimization
- 50 active compounds (β-adrenoreceptor antagonists): 1/3 training set, 1/3 spikes, 1/3 query set
- Phases: training, validation
Screening Validation: β2-adrenoreceptor antagonists
All compounds: 9,700
Known active compounds: 18
Minimum hypothesis | before optimization | after optimization
all hits | 2,476 | 18
known active hits | 15 | 18
enrichment | |
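Enrichment relates the share of actives among the hits to the share of actives in the whole screened set. The sketch below assumes that common definition, which may differ from the one used for the slide, and it uses made-up round numbers rather than the validation results.

```python
def enrichment(active_hits, all_hits, actives, all_compounds):
    """Enrichment factor: (active_hits / all_hits) / (actives / all_compounds)."""
    if all_hits == 0 or actives == 0:
        raise ValueError("enrichment is undefined without hits or actives")
    return (active_hits * all_compounds) / (all_hits * actives)


# Hypothetical screen: 10,000 compounds, 20 actives, 200 hits of which 10 are active.
factor = enrichment(active_hits=10, all_hits=200, actives=20, all_compounds=10_000)
print(factor)
```

A value of 1 means the screen does no better than random picking; larger values mean the hit list is richer in actives than the library.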
Active Hit Distribution: β2-adrenoreceptor antagonists
18 active compounds were mixed with 9,700 random NCI molecules and sorted by pharmacophore similarity.
Screening Validation (10,000 NCI compounds)
Columns: family | actives | before optimization: all hits, active hits, enrichment | after optimization: all hits, active hits, enrichment
Rows: ACE76, Angiotensin D delta FTP mGluR NPY Y thrombin
Optimized Screening JSP Example
Optimized Screening JSP Example Hits
Clustering
JKlustor: Jarvis-Patrick, Ward
Ward Clustering Features
- Ward's minimum variance method
- Murtagh's reciprocal nearest neighbor (RNN) algorithm
- $O(n^2)$ time complexity
- $O(n)$ memory complexity
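Ward's criterion merges, at each step, the two clusters whose union increases the within-cluster variance the least. The sketch below illustrates that criterion with a deliberately naive search over all cluster pairs; it is not Murtagh's RNN algorithm and does not reach the time or memory bounds listed above. All names and the sample points are hypothetical.

```python
from itertools import combinations


def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    squared = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * squared


def ward_cluster(points, n_clusters):
    """Naive agglomerative Ward clustering; returns sorted lists of point indices."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest Ward merge cost.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(len(clusters[ij[0]][0]), clusters[ij[0]][1],
                                     len(clusters[ij[1]][0]), clusters[ij[1]][1]),
        )
        (members_a, cen_a), (members_b, cen_b) = clusters[i], clusters[j]
        na, nb = len(members_a), len(members_b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(cen_a, cen_b)]
        merged = (members_a + members_b, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)


points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
print(ward_cluster(points, 2))
```

The RNN formulation reaches the same kind of hierarchy by following chains of nearest neighbors instead of rescanning every pair, which is what makes large compound sets tractable.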
Ward Pharmacophore Clustering Example
8 active compound sets:
- 5-HT3 antagonists
- ACE inhibitors
- angiotensin 2 antagonists
- D2 antagonists
- delta antagonists
- FTP antagonists
- mGluR1 antagonists
- thrombin inhibitors
Ward Centroids
A Ward Cluster D2 antagonists
Maximum Common Substructure Clustering
Drug Design
RECAP fragmentation example
Figure labels: amide:1, amine:2, ether:2, ether:1, amine:1, amide:2
Virtual Synthesis
The Ideal Virtual Reaction
- Generic (simple): the equation describes the transformation only; a few hundred generic reactions can form the basic armory of a preparative chemist
- Specific (complex): chemoselective (recognizes reactive and inactive functional groups), regioselective ("knows" directing rules), stereoselective (inversion/retention)
- Customizable: to improve reaction model quality
Reaction Modeling: Customizable Reaction Engine!
- Processing selective "smart" reactions
- Batch mode (sequential or combinatorial combinations)
- Reverse direction
- High performance (speed and capacity)
Chemoselective Reaction Definition
REACTIVITY: !match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1) && !match(ratom(3), "[N,O,S:1][C,P,S]=[N,O,S]", 1)
Reactants 2920 amines, alcohols and thiols 369 isocyanates and isothiocyanates
Chemoselective Reaction Products 1,264,391 single site products
Regioselectivity (Markovnikov, Zaitsev)
r1: addition reaction definition with the Markovnikov rule.
r2: elimination reaction definition with Zaitsev's rule.
Selectivity expressions: SELECTIVITY: hcount(ratom(2)) and SELECTIVITY: -hcount(ratom(2))
Regioselective Reaction Example
Chlorine migration in four steps by consecutive elimination and addition reactions.
Figure step labels: r2, r1, r2, r1, r1
Regioselectivity (SeAr)
Reaction definition for the electrophilic aromatic bromination of the benzene ring. The expression defines a regioselectivity rule for the major product.
SELECTIVITY: -charge(ratom(1))
TOLERANCE: 0.0045
Regioselectivity (SeAr) Products
The virtual bromination of toluene with the above reaction definition gives the ortho and para isomers as main products, whereas bromine is directed into the meta position in the case of nitrobenzene.
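One way to read a selectivity expression combined with a tolerance is sketched below. It assumes that each candidate reaction site gets a score from the selectivity expression, that the highest score marks the main product, and that sites whose score lies within the tolerance of the best one are also reported as main products. This interpretation, the function name, and the partial charges are assumptions for illustration; they are not taken from the reaction engine.

```python
def main_product_sites(site_scores, tolerance):
    """Sites whose selectivity score is within `tolerance` of the best score."""
    best = max(site_scores.values())
    return sorted(site for site, score in site_scores.items()
                  if best - score <= tolerance)


# Hypothetical partial charges on the ring carbons of a toluene-like substrate.
charges = {"ortho": -0.060, "meta": -0.050, "para": -0.058}

# Score = -charge, mirroring SELECTIVITY: -charge(ratom(1)).
scores = {site: -charge for site, charge in charges.items()}

print(main_product_sites(scores, tolerance=0.0045))
print(main_product_sites(scores, tolerance=0.0))
```

With a tolerance of zero only the single best site survives, which matches the idea of asking for exactly one main product per substrate.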
Regioselectivity (SeAr) Example Products 1,198 monobrominated main products (tolerance is set to zero)
Virtual Synthesis: Customizable Synthesis Engine!
- Multiple steps
- Flexible compound dispatching
- Synthesis rules
- Synthesis tree building
- Memory, file and database mode
- Graphical synthesis browser
- Building block coloring
Synthesis Example
Derek S. Tan, Michael A. Foley, Matthew D. Shair, Stuart L. Schreiber*, J. Am. Chem. Soc., 1998, 120,
Figure labels: lactone aminolysis, alkyne coupling, esterification
Synthesis Definition
Component set definition
- Set1: A
- Set2: B1, B2, B3
- Set3:
- Set4: D1, D2
- Set5:
- Set6: F1, F2
- Set7:
"Smart" reaction library
- R1: alkyl iodide + alkyne >> alkyl-alkyne
- R2: lactone + amine >> amide
- R3: alcohol + carboxylic acid >> ester
Synthesis route definition
- Step1: A + B >> C (R1)
- Step2: C + D >> E (R2)
- Step3: E + F >> G (R3)
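The size of the product space implied by this definition can be checked with a short enumeration. The sketch assumes a fully combinatorial route in which every member of each building block set is used at its step and every reaction gives exactly one product; real runs may yield more or fewer products depending on reactivity and selectivity rules. The labels are placeholders, not structures.

```python
from itertools import product

# Building block sets from the component set definition (labels only).
set1 = ["A"]
set2 = ["B1", "B2", "B3"]
set4 = ["D1", "D2"]
set6 = ["F1", "F2"]


def enumerate_routes(first, second, third, fourth):
    """List every Step1 -> Step2 -> Step3 combination as a nested label."""
    return [f"(({a}+{b})+{d})+{f}"
            for a, b, d, f in product(first, second, third, fourth)]


routes = enumerate_routes(set1, set2, set4, set6)
print(len(routes))
print(routes[0], routes[-1])
```

Because the count is a plain product of set sizes, adding one more building block to any set multiplies the final library accordingly.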
Synthesis Browser
Current Developments
Recent Developments Automatic searching of low-energy conformers Improved Oracle cartridge Structure searching combined with chemical calculations Exhaustive Synthesis for metabolism applications R-group decomposition Maximum common substructure search in molecule pairs and in libraries
Current Developments MarvinSpace, an OpenGL based 3D molecule and surface visualisation engine for small and macromolecules Instant JChem Base, a desktop and enterprise chemical database client with form builder IUPAC naming plugin Isoelectric point plugin Random Synthesis for building up a diverse virtual space of synthetically feasible compounds Extension of the reaction library Further descriptors in the Topology Analysis plugin
Future Plans Metabolic transformation library Diverse database of synthetically accessible compounds Search in Markush compounds Peptide builder Fragment-based activity analysis of compound libraries AnalogMaker (fragment based random evolutionary analog design) Retrosynthesis
Visit us Home page – Forum – Animated demos and tutorials – Presentations and posters –
Máramaros köz 3/a Budapest, 1037 Hungary Thank you for your attention