Present and future of informatics in chemistry
Symposium in Honor of Gary Wiggins
Division of Chemical Information, 233rd ACS National Meeting, Chicago
Phil McHale, Elsevier MDL
25 March 2007
Copyright Elsevier MDL 2007

Outline
- Informatics in chemistry?
- Where have we got to?
- What can we do now?
- What's left to do?
- Where are we going?

Informatics in chemistry?
- Cheminformatics vs. chemoinformatics
- Structure representation
- Information acquisition
- Information management
- Information use

This Awful Neologism …
Date: Fri, 17 Oct 1997
From: Wendy Warr
Subject: Re: Cheminformatics/Two new refs.

I wonder if any of the sources define this awful neologism ("chemoinformatics" or "cheminformatics"). Does it really differ from "chemical information" or "computational chemistry". As I have said before, I suspect that it is merely an image-enhancing name for some practitioners of computational chemistry.

O or X 2 O?
Data copyrighted (C) by Molinspiration Cheminformatics.

The Building Blocks
- Molecules: 2D, 3D, stereoisomers, conformers, polymers, mixtures, formulations, sequences, combichem libraries, virtual libraries, Markush …
- Reactions: reagents, products, catalysts, solvents, reacting centers, transition states, metabolic pathways …
- Nomenclature, fragment codes, line notations, graphics, file formats

Representing Chemistry: Benzene?
- Connection table: Benzene -ISIS D V C C C C C C M END
- Benzene
  - ID #: MUSE
  - CAS #:
  - Other Names: Benzol, Cyclohexa-1,3,5-triene
- Line notation
  - Wiswesser: RH
  - MDL SMILES: c1ccccc1
  - InChI: InChI=1/C6H6/c /h1-6H

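To make the connection-table idea concrete, here is a minimal Python sketch. It is not the MDL molfile format: the `ATOMS`/`BONDS` layout, the `VALENCE` table, and the function names are all assumptions made for illustration. It stores benzene as a list of atoms plus a list of bonds (one Kekule form), infers the implicit hydrogens, and recovers the molecular formula C6H6 that appears in the InChI on the slide.

```python
from collections import Counter

# Hypothetical minimal connection table for benzene (not the molfile layout):
# atoms are element symbols, bonds are (atom_a, atom_b, bond_order).
ATOMS = ["C", "C", "C", "C", "C", "C"]
BONDS = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]

# Assumed default valences for neutral atoms, used to infer implicit hydrogens.
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Return the number of implicit hydrogens on each atom."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[element] - u for element, u in zip(atoms, used)]


def hill_formula(atoms, bonds):
    """Build a molecular formula in Hill order from the connection table."""
    counts = Counter(atoms)
    counts["H"] += sum(implicit_hydrogens(atoms, bonds))
    elements = [e for e in counts if counts[e] > 0]
    if "C" in elements:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = sorted(e for e in elements if e not in ("C", "H"))
        ordered = ["C"] + (["H"] if "H" in elements else []) + rest
    else:
        ordered = sorted(elements)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in ordered)


print(hill_formula(ATOMS, BONDS))
```

The same atoms-plus-bonds graph underlies each notation on the slide; the line notations are just different serializations of it.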
A Previous UI

But have we really progressed?
Subject: Re: Beilstein R-groups
From: Dana Roth
Reply-To: CHEMICAL INFORMATION SOURCES DISCUSSION LIST
Date: Fri, 16 Mar :57:

Howard: we are still teaching v.6 since most people here are using MACs. From my little experience with v.7, it appears that the structure editor is the same. I just followed these instructions (which I borrowed many years ago from Andrea Twiss-Brooks) in v.7 and it works fine.

=================
Creating User Defined Groups and Atom Lists

Atoms: Click on the atom in the structure, which needs to be variable. Type 'A1' in the Atom Box and click OK to make the change. Next, click the 'An' button in the Tool Box (left side), and the 'Atom List Number' box will appear. Click OK to display a 'Define Atom List A1' periodic table. Click as many elements or element groups as needed and click OK. A list of all the selected atoms will appear in the Structure Editor window.

Groups: Click the atom, which will be the variable group in the structure. Type 'G1' in the Atom Box and click OK to effect the change. Next, draw a group in the Structure Editor window, 'Select' a group structure (i.e. by double clicking an atom or bond with the select tool) and click the 'Gn' button in the tool box. Set G=1 and click OK. Repeat for additional groups. One atom in each group must be designated as the attachment point. Click on this atom (with the Edit tool), to display the 'Atom Attributes' box. Click 'Set User Defined' and then click 'Attachments'. Click '1' in the 'Attachment Points' box and click OK (in that box). Then click OK in the 'Atom Attributes' box.

After drawing the structure, click on the Crossed Red Arrows -> Beilstein Commander.

Information Acquisition: Structure tools and presentation
- Structure drawing
- Name <-> structure converters
- Virtual chemistry: de novo structure generation, enumeration
- Chemical OCR: dead structure -> live structure
- Text mining: text -> structure
- Renderers: on screen, in print, within applications, 2D, 3D, shapes, animations

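Enumeration, listed above under virtual chemistry, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any MDL product works: the scaffold is a SMILES string with named placeholders, `enumerate_library` is a hypothetical helper, and the G1/G2 substituent lists are made up. Nothing is validated, canonicalized, or de-duplicated, which a real enumerator would need to do.

```python
from itertools import product


def enumerate_library(scaffold, groups):
    """Yield one SMILES string per combination of substituents.

    scaffold: SMILES template with named placeholders such as {G1}.
    groups:   mapping of placeholder name -> list of substituent SMILES fragments.
    """
    names = sorted(groups)
    for combo in product(*(groups[name] for name in names)):
        yield scaffold.format(**dict(zip(names, combo)))


# Hypothetical generic structure: a para-disubstituted benzene with two variable groups.
SCAFFOLD = "{G1}c1ccc({G2})cc1"
GROUPS = {
    "G1": ["C", "CC"],       # methyl, ethyl
    "G2": ["F", "Cl", "O"],  # fluoro, chloro, hydroxy
}

library = list(enumerate_library(SCAFFOLD, GROUPS))
for smiles in library:
    print(smiles)
```

The library size is the product of the group list lengths (here 2 x 3 = 6), which is why virtual libraries grow so quickly and why performance and scalability matter for them.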
Data Management
- Structure storage systems: online, in-house, local, distributed, open, closed, proprietary systems, Oracle cartridges
- Registration, novelty check, definitions, business rules
- Search systems
  - Molecules, reactions
  - 2D, 3D, conformations
  - Exact, substructure, similarity, fuzzy, shape, property-based, pharmacophores
- Pre/post-search processing: fingerprints, clustering, filtering, diversity analysis
- Performance and scalability: virtual chemistry

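Similarity search over fingerprints, mentioned above, is commonly scored with the Tanimoto coefficient: the number of shared bits divided by the number of bits set in either fingerprint. The Python sketch below shows only that arithmetic. The fingerprints are made-up sets of bit positions, the molecule names are placeholders, and `rank_by_similarity` is a hypothetical helper rather than any vendor's search API.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here (an assumption) for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(query_fp, database, threshold=0.5):
    """Return (name, score) pairs scoring at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints for illustration only.
DATABASE = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2},
    "mol_c": {7, 8},
}

for name, score in rank_by_similarity({1, 2, 3}, DATABASE):
    print(name, round(score, 3))
```

Because the score needs only set intersections and unions, it is cheap enough to serve as a pre-search filter before more expensive substructure or 3D comparisons.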
Information Use: What we can do now
- "Publish" information in lab notebooks, databases, reports, papers, patents
- Detect, analyze and harvest structures and reactions from printed materials
- Create, maintain, publish and link to databases
- Search, browse and analyze structures and reactions in databases and documents
- Link structures with their properties and with other disciplines: pathways, proteins, genes
- Virtual chemistry and screening
- Predict/calculate properties, activity, reactivity, drug-likeness
- Render, share and communicate
- Collaborate and reuse

Sample workflows
- Finding out what's known about a molecule
- Exploring possible synthetic routes to a target molecule
- Assessing metabolic and toxic liabilities and outcomes

Search MDL Compound Index

Links to all indexed content

Exploring Possible Syntheses

Evaluating Metabolic and Toxic Liabilities
- From one parent in MDL Metabolite
- From another parent in MDL Metabolite
- From Corporate Database
- Link to Toxicity
- Transformation Details

Evaluating Toxicity Information
- Link to Toxicity

What's left to do?
- Structure Representation
  - Generic structures and patents
  - More stereochemistry
  - Organometallics, composites, stuff
  - Biomolecules
  - Transition states, reaction mechanisms, pathways
- Information Acquisition
  - Authoring tools
  - Annotation: semantics
  - Web 2.0: social networking, wikis

What else is left to do?
- Information Management
  - Integration
  - Performance
  - Timeliness
  - Accessibility
  - Portability
- Information Use
  - Better predictors: activity, ADMET, reactivity
  - Better virtual screening
  - Presenting QSAR results that chemists can act on
  - Capturing and automating intellectual processes: synthesis design
  - Knowledge extraction, inference generation

Where are we going?
- Automated data capture and indexing
  - Papers, patents, theses …
- Robust predictors and inference generators
- Blurring of boundaries
  - Internal and external information
  - Text and structures
  - Publications and databases
  - Small molecules and -omics
  - Mash ups
- in cranio >> in silico >> in vitro

Thanks Gary