Chemoinformatics and Metabolism. Paula de Matos. EBI is an Outstation of the European Molecular Biology Laboratory.
Chemoinformatics and Metabolism Group Research: Indexing, searching and dissemination of chemical information; Cheminformatics Algorithms and Toolkits; Natural Products and Metabolomics.
Chemical Entities of Biological Interest (ChEBI): A database containing a freely available, manually annotated dictionary of molecular entities focused on ‘small’ chemical compounds. Provides a method to navigate the chemical space via an ontology. ChEBI aims to provide a central, definitive reference of chemical nomenclature.
Dictionary Resource for Nomenclature
What does ChEBI cover? Mostly small entities. Big entities too, like alumina, amylose, metaborate. Excludes proteins and nucleic acids.
ChEBI Web Services: Programmatic access to a ChEBI entry. SOAP-based Java implementation. Clients currently available in Java and Perl. Four methods with which to access data: getLiteEntity, getCompleteEntity, getOntologyParents, getOntologyChildren. Documented at
ChEBI Status
ChEBI further info Mailing lists: Submitting data
The Chemistry Development Kit (CDK): An Open Source Java-Library for Structural Chemo- and Bioinformatics
> Lines of Code, >900 Classes, >9000 Methods
Applications: Library Generation, Virtual Screening, Molecular Property Prediction, Visualization
(1) Steinbeck, C.; Hoppe, C.; Kuhn, S.; Guha, R.; Willighagen, E. L. Current Pharmaceutical Design 2006, 12,
(2) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. Journal of Chemical Information and Computer Sciences 2003, 43,
The Chemistry Development Kit (CDK)
Input/Output: I/O (CML, MDL Molfile, SDF, PDB); SMILES; InChI
Visualization: Structure-Diagram-Layout (SDG); 2D Rendering; 3D Rendering
Modelling: 3D Model-Builder; Atom-Typing; Force-Field; Representation of Biomolecular Structures
Chemical Graphs: Isomorphism detection; Maximum-Common-Substructure Searches; SMARTS- and Substructure searches; Ring searches; Aromaticity detection
Deterministic Isomer generator; Stochastic Structure Generators via Simulated Annealing, Genetic Algorithms; Library Enumeration
Properties: Fingerprinting; >70 QSAR-Descriptors; QSAR model building
Example: Structure Diagram Generation
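Example: Fingerprinting
Bitscreen coding for structural features (figure labels: COOH, heteroaryl, O-alkyl, NH2, alkyl)
    import java.util.BitSet;
    import org.openscience.cdk.fingerprint.Fingerprinter;
    import org.openscience.cdk.fingerprint.FingerprinterTool;
    import org.openscience.cdk.interfaces.IMolecule;
    import org.openscience.cdk.templates.MoleculeFactory;

    // Indole contains a pyrrole ring, so pyrrole is the substructure here.
    IMolecule superstructure = MoleculeFactory.makeIndole();
    IMolecule substructure = MoleculeFactory.makePyrrole();
    Fingerprinter fingerprinter = new Fingerprinter();
    BitSet superBS = fingerprinter.getFingerprint(superstructure);
    BitSet subBS = fingerprinter.getFingerprint(substructure);
    // True when every bit set in subBS is also set in superBS.
    boolean isSubset = FingerprinterTool.isSubset(superBS, subBS);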
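The subset test that closes the fingerprinting example can be illustrated without CDK on the classpath. The sketch below is only an illustration of the bit-screen idea, assuming that a candidate can contain the query as a substructure only if every bit set in the query fingerprint is also set in the candidate fingerprint. The class name BitScreenSketch, the helper bits, and the bit positions are hypothetical and made up for the example; this is not CDK's implementation of FingerprinterTool.isSubset.

```java
import java.util.BitSet;

/** Sketch of a fingerprint bit screen. Names are hypothetical, not CDK API. */
final class BitScreenSketch {

    /** Returns true when every bit set in candidateSub is also set in candidateSuper. */
    static boolean isSubset(BitSet candidateSuper, BitSet candidateSub) {
        BitSet overlap = (BitSet) candidateSuper.clone(); // copy so the inputs stay unchanged
        overlap.and(candidateSub);                        // keep only bits present in both
        return overlap.equals(candidateSub);              // all query bits survived the AND
    }

    /** Builds a toy fingerprint from explicit bit positions (made-up values). */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        BitSet superBS = bits(1, 4, 7, 9); // stands in for the larger structure
        BitSet subBS = bits(4, 9);         // stands in for the query substructure
        System.out.println(isSubset(superBS, subBS)); // true
        System.out.println(isSubset(subBS, superBS)); // false
    }
}
```

A screen of this kind can only rule candidates out. A candidate that passes would still need a full substructure match, since unrelated structures can share fingerprint bits.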
CDK in numbers: registered developers on SF; 86 people subscribed to cdk-devel list; 111 people subscribed to cdk-user list; ,966 downloads since 2001; CDK article (2003) cited 68 times
CDK info Project home page: Mailing list: Documentation
OrChem: Oracle chemistry plug-in using the Chemistry Development Kit (CDK), providing substructure and similarity searches for chemical graphs. OrChem is suitable for Oracle 11g and onwards. Not an Oracle data cartridge: it doesn't need Oracle's extensibility architecture because its Java components run as Java stored procedures inside the Oracle standard JVM (Aurora).
Problem: Chemical substructure or similarity searching is computationally expensive, especially on a large dataset.
OrChem database structure
Example OrChem Queries
Similarity search:
    select * from table(
      orchem_simsearch.search(
        'OC4=C(C(=C3OC(C)(COC=1C=CC(=CC=1)CC2C(=O)NC(=O)S2)CCC3=C4C)C)C', 'SMILES', 0.8, null, 'N')
    );
Substructure search:
    select orchem_subsearch.search(molfile, 'MOL', 50, 'Y')
      from compounds
     where molregno = 12345;
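The similarity query above passes a numeric cutoff of 0.8. The slides do not say which similarity measure OrChem applies, so the sketch below assumes a Tanimoto coefficient over fingerprint bit sets, a common choice for this kind of search. The class TanimotoSketch, its methods, and the bit positions are hypothetical and are not part of OrChem or CDK.

```java
import java.util.BitSet;

/** Sketch of a Tanimoto similarity cutoff on fingerprints. Names are hypothetical. */
final class TanimotoSketch {

    /** Tanimoto coefficient: shared bits divided by the bits set in either fingerprint. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // copy so the inputs stay unchanged
        common.and(b);
        int shared = common.cardinality();
        int union = a.cardinality() + b.cardinality() - shared;
        return union == 0 ? 0.0 : (double) shared / union; // two empty fingerprints score 0
    }

    /** Returns true when the candidate reaches the similarity cutoff. */
    static boolean passes(BitSet query, BitSet candidate, double cutoff) {
        return tanimoto(query, candidate) >= cutoff;
    }

    /** Builds a toy fingerprint from explicit bit positions (made-up values). */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        BitSet query = bits(1, 2, 3, 4, 5);
        BitSet close = bits(1, 2, 3, 4); // 4 shared bits out of 5 in the union
        BitSet far = bits(1, 8, 9);      // 1 shared bit out of 7 in the union
        System.out.println(passes(query, close, 0.8)); // true
        System.out.println(passes(query, far, 0.8));   // false
    }
}
```

Under this assumption, a higher cutoff returns fewer and closer matches.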
Fingerprint distribution
Parallel vs. non-parallel: Performance of substructure search on 3.5 million compounds
Substructure benchmarking: Performance of substructure search on 3.5 million compounds
Similarity Benchmarking
OrChem info Mailing list: