Bringing cheminformatics toolkits into tune
Noel M. O’Boyle (Open Babel)
Molecular Informatics Open Source Software, May 2011, EMBL-EBI, Cambridge, UK

Toolkits, toolkits and more toolkits
Commercial cheminformatics toolkits:
Open Source cheminformatics toolkits: OpenBabel, PerlMol, OASA, CDK

The importance of being interoperable
Good for users
– Can take advantage of complementary features
  CDK: Gasteiger π charges, maximal common substructure, shape similarity with ultrafast shape descriptors, mass-spectrometry analysis
  RDKit: RECAP fragmentation, calculation of R/S, atom pair fingerprints, shape similarity with volume overlap
  OpenBabel: several forcefields, crystallography, large number of file formats, conformer searching, InChIKey
– Can choose between different implementations
  Faster SMARTS searching, better 2D depiction, more accurate 3D structure generation
– Avoid vendor lock-in
Good for developers
– Less reinvention of the wheel, more time to spend on developing complementary features
– Avoid balkanisation of the field
– Bigger pool of users

J. Chem. Inf. Model., 2006, 46, 991 http://

Bringing it all together with Cinfony
Different languages
– Java (CDK, OPSIN), C++ (Open Babel, RDKit, Indigo)
– Use Python, a higher-level language that can bridge to both
Different APIs
– Each toolkit uses different commands to carry out the same tasks
– Implement a common API
Different chemical models
– Different internal representation of a molecule
– Use an existing method for storage and transfer of chemical information: chemical file formats
  MDL mol file for 2D and 3D, SMILES for 0D
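The exchange rule in the last point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cinfony's own code: `ToyMolecule`, `to_molfile` and `exchange` are hypothetical names, the molecule is assumed to already know its own SMILES, and the molfile writer emits only a bare V2000 counts line, atom block and bond block (no charges, isotopes or stereo).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Fixed-width V2000 field tails, all zero in this bare-bones writer.
COUNTS_TAIL = "  0" * 8 + "999 V2000"
ATOM_TAIL = " 0" + "  0" * 11


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for one toolkit's internal molecule."""
    smiles: str  # assumed to be supplied by the owning toolkit
    atoms: List[str] = field(default_factory=list)  # element symbols
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (atom1, atom2, order), 1-based
    coords: Optional[List[Tuple[float, float, float]]] = None  # None means 0D


def to_molfile(mol: ToyMolecule, title: str = "") -> str:
    """Write a minimal MDL V2000 molfile (no charges, isotopes or stereo)."""
    lines = [title, "  toy", ""]  # header block: title, program, comment
    lines.append("%3d%3d" % (len(mol.atoms), len(mol.bonds)) + COUNTS_TAIL)
    for symbol, (x, y, z) in zip(mol.atoms, mol.coords):
        lines.append("%10.4f%10.4f%10.4f %-3s" % (x, y, z, symbol) + ATOM_TAIL)
    for a1, a2, order in mol.bonds:
        lines.append("%3d%3d%3d  0" % (a1, a2, order))
    lines.append("M  END")
    return "\n".join(lines)


def exchange(mol: ToyMolecule) -> Tuple[str, str]:
    """Pick the interchange format: SMILES for 0D, molfile when coordinates exist."""
    if mol.coords is None:
        return "smi", mol.smiles
    return "mol", to_molfile(mol)


flat = ToyMolecule("CO", atoms=["C", "O"], bonds=[(1, 2, 1)])
placed = ToyMolecule("CO", atoms=["C", "O"], bonds=[(1, 2, 1)],
                     coords=[(0.0, 0.0, 0.0), (1.43, 0.0, 0.0)])
print(exchange(flat))       # ('smi', 'CO')
print(exchange(placed)[0])  # mol
```

The receiving toolkit would then parse the returned string with its own reader. Using a text format as the go-between means neither toolkit needs to know anything about the other's internal data structures.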

Cinfony API

One API to rule them all
Example: create a Molecule from a SMILES string

OpenBabel:
    import openbabel
    mol = openbabel.OBMol()
    obconversion = openbabel.OBConversion()
    obconversion.SetInFormat("smi")
    obconversion.ReadString(mol, SMILESstring)

CDK (Java, reached from Python through JPype; cdk is the org.openscience.cdk package):
    builder = cdk.DefaultChemObjectBuilder.getInstance()
    sp = cdk.smiles.SmilesParser(builder)
    mol = sp.parseSmiles(SMILESstring)

RDKit:
    from rdkit import Chem
    mol = Chem.MolFromSmiles(SMILESstring)

Indigo:
    from indigo import Indigo
    indigo = Indigo()
    mol = indigo.loadMolecule(SMILESstring)

Cinfony:
    mol = toolkit.readstring("smi", SMILESstring)
where toolkit is either obabel, cdk, indy or rdk
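How a common API can hide these differences is sketched below with two hypothetical "native" toolkits (`NativeAMol`, `NativeAConverter` and `native_b_parse_smiles` are invented for illustration and only store the SMILES text). Each wrapper exposes the same `readstring(format, string)` call and keeps the native object reachable. This is an illustration of the adapter idea, not Cinfony's implementation.

```python
from types import SimpleNamespace


# --- Two hypothetical native toolkits with deliberately different APIs ---
class NativeAMol:
    """Toolkit A: create an empty molecule, then fill it with a converter."""
    def __init__(self):
        self.smiles = None


class NativeAConverter:
    def __init__(self):
        self.in_format = None

    def set_in_format(self, fmt):
        self.in_format = fmt

    def read_string(self, mol, text):
        if self.in_format != "smi":
            raise ValueError("this toy converter only reads SMILES")
        mol.smiles = text


def native_b_parse_smiles(text):
    """Toolkit B: one factory function returning its own structure."""
    return {"smiles": text}


# --- The common layer ---
class Molecule:
    """Wrapper returned by every toolkit module."""
    def __init__(self, native, smiles):
        self.native = native  # keeps the underlying object reachable
        self.smiles = smiles


def _readstring_a(fmt, text):
    mol, conv = NativeAMol(), NativeAConverter()
    conv.set_in_format(fmt)
    conv.read_string(mol, text)
    return Molecule(mol, mol.smiles)


def _readstring_b(fmt, text):
    if fmt != "smi":
        raise ValueError("this toy parser only reads SMILES")
    native = native_b_parse_smiles(text)
    return Molecule(native, native["smiles"])


toolkit_a = SimpleNamespace(readstring=_readstring_a)
toolkit_b = SimpleNamespace(readstring=_readstring_b)

for toolkit in (toolkit_a, toolkit_b):
    mol = toolkit.readstring("smi", "CCC")  # identical call for both toolkits
    print(type(mol.native).__name__, mol.smiles)
```

The calling code in the loop never changes. Only the small per-toolkit `readstring` functions know about converters or factory functions.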

Design of Cinfony API
API is small (“fits your brain”)
Covers core functionality of toolkits
– Corollary: need to access underlying toolkit for additional functionality
Makes it easy to carry out common tasks
API is stable
Makes it easy to find relevant methods
– Example: add hydrogens to a molecule
  CDK:
    atommanip = cdk.tools.manipulator.AtomContainerManipulator
    atommanip.convertImplicitToExplicitHydrogens(molecule)
  Cinfony:
    molecule.addh()
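The benefit of `molecule.addh()` is that the wrapper puts a short, discoverable method name in front of whatever the native toolkit calls the operation, while the native object stays available for anything outside the small API. The following is a minimal sketch with hypothetical `NativeContainer` and `NativeManipulator` classes standing in for a real toolkit. Bonds are not modelled, so the added hydrogens are not attached to anything.

```python
class NativeContainer:
    """Hypothetical native molecule: heavy atoms plus implicit hydrogen counts."""
    def __init__(self, heavy_atoms, implicit_h):
        self.atoms = list(heavy_atoms)
        self.implicit_h = list(implicit_h)  # one count per atom


class NativeManipulator:
    """Hypothetical static utility class in a verbose 'manipulator' style."""
    @staticmethod
    def convert_implicit_to_explicit_hydrogens(container):
        for count in container.implicit_h:
            container.atoms.extend(["H"] * count)  # bonds are not modelled in this toy
        # Every atom, old or new, now carries no implicit hydrogens.
        container.implicit_h = [0] * len(container.atoms)


class Molecule:
    """Small common API; anything else is reached through .native."""
    def __init__(self, native):
        self.native = native

    def addh(self):
        NativeManipulator.convert_implicit_to_explicit_hydrogens(self.native)


propane = Molecule(NativeContainer(["C", "C", "C"], [3, 2, 3]))
propane.addh()
print(len(propane.native.atoms))  # 11
```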

cinfony.toolkit

Classes        Purpose
Molecule       Wraps Molecule objects, and provides methods that act on molecules
Atom           Wraps Atom objects in the underlying toolkit
Outputfile     Handle multi-molecule output files
Fingerprint    Binary fingerprints, and calculating similarity
Smarts         SMARTS searching
MoleculeData   Provide dictionary access to the tag fields of SDF and MOL2 files

Functions      Purpose
readfile       Read Molecules from a file
readstring     Read a Molecule from a string

Variables      Purpose
descs          A list of available descriptors
forcefields    A list of available forcefields
fps            A list of available fingerprints
informats      A list of input formats
outformats     A list of output formats
ob, cdk, indigo, etc.   Direct access to the underlying library
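MoleculeData exposes the tag fields of an SD file as a dictionary. To show what those tags look like on disk, here is a minimal stdlib parser for the data items of a single SD record. `read_sd_tags` is a hypothetical helper, not part of Cinfony. It ignores the MOL2 case and assumes, as in the SD format, that each data item starts with a `>` header line naming the field in angle brackets and ends at a blank line.

```python
import re

# A data header starts with '>' and names the field in angle brackets, e.g. "> <NAME>".
_HEADER = re.compile(r"^>.*?<([^>]*)>")


def read_sd_tags(record: str) -> dict:
    """Return the data items of one SD record as a {field name: value} dict."""
    lines = record.splitlines()
    # Data items can only appear after the connection table's "M  END" line.
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        return {}
    data, key, value = {}, None, []
    for line in lines[end + 1:]:
        if line.startswith("$$$$"):  # end of this record
            break
        header = _HEADER.match(line)
        if header:
            key, value = header.group(1), []
        elif key is not None and not line.strip():
            data[key] = "\n".join(value)  # a blank line closes the data item
            key, value = None, []
        elif key is not None:
            value.append(line)
    if key is not None:  # tolerate a missing final blank line
        data[key] = "\n".join(value)
    return data


record = """ethanol
  toy

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NAME>
ethanol

> <SOURCE>
example

$$$$
"""
print(read_sd_tags(record))  # {'NAME': 'ethanol', 'SOURCE': 'example'}
```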

cinfony.toolkit.Molecule

Attributes   Purpose
atoms        A list of atoms in the Molecule
data         A dictionary of data items (SD file tags)
formula      Molecular formula
molwt        Molecular weight
title        Title

Functions    Purpose
addh         Add hydrogens
calcdesc     Calculate descriptor values
calcfp       Calculate a molecular fingerprint
draw         Create a 2D depiction
localopt     Optimize the coordinates using a forcefield
make3D       Generate 3D coordinates
removeh      Remove hydrogens
write        Write a molecule to a file or string

Examples of use
Chemistry Toolkit Rosetta (Andrew Dalke)

Combining toolkits
    >>> from cinfony import rdk, cdk, obabel
    >>> obabelmol = obabel.readstring("smi", "CCC")
    >>> rdkmol = rdk.Molecule(obabelmol)
    >>> rdkmol.draw(show=False, filename="propane.png")
    >>> print(cdk.Molecule(rdkmol).calcdesc())
    {'chi0C': ..., 'BCUT.4': ..., 'rotatableBondsCount': 2, 'mde.9': 0.0, 'mde.8': 0.0, ...}
1. Import Cinfony
2. Read in a molecule from a SMILES string with Open Babel
3. Convert it to an RDKit Molecule
4. Create a 2D depiction of the molecule with RDKit
5. Convert it to a CDK Molecule and calculate descriptor values

Comparing toolkits
    >>> from cinfony import rdk, cdk, obabel, indy, webel
    >>> for toolkit in [rdk, cdk, obabel, indy, webel]:
    ...     mol = toolkit.readstring("smi", "CCC")
    ...     print(mol.molwt)
    ...     mol.draw(filename="%s.png" % toolkit.__name__)
    ...
1. Import Cinfony
2. For each toolkit:
   Read in a molecule from a SMILES string
   Print its molecular weight
   Create a 2D depiction
Useful for sanity checks, and for identifying limitations and bugs
– Calculating the molecular weight (implicit hydrogens, isotopes)
– Comparison of descriptor values (should be highly correlated)
– Comparison of depictions
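The molecular weight check shows why toolkits can disagree. If one toolkit counts implicit hydrogens and another does not, or one uses isotope-specific masses, the numbers differ even for propane. The following is a minimal sketch of that arithmetic, using a small hand-written table of conventional atomic weights (an assumption, not any toolkit's data). `molwt` here is a hypothetical helper, not the Cinfony attribute of the same name.

```python
import re

# Conventional atomic weights for a few elements (small assumed table).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molwt(formula: str) -> float:
    """Sum atomic weights over a simple formula such as 'C3H8' (no brackets or charges)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


print(round(molwt("C3H8"), 3))  # propane with its hydrogens: 44.097
print(round(molwt("C3"), 3))    # same carbon skeleton, hydrogens left out: 36.033
```

A gap of about 8 units for a three-carbon molecule is exactly the kind of discrepancy that a side-by-side loop over toolkits makes obvious.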

Cinfony and the Web

Webel - Chemistry for Web 2.0
Webel is a Cinfony module that runs entirely using web services
– CDK webservices by Rajarshi Guha, hosted by Ola Spjuth at Uppsala University
– NCI/CADD Chemical Identifier Resolver by Markus Sitzmann (uses Cactvs for much of the backend)
Easy to install – no dependencies
Can be used in environments where installing a cheminformatics toolkit is not possible
Web services may provide additional services not available elsewhere
Example: how similar is aspirin to Dr. Scholl’s Wart Remover Kit?
    >>> from cinfony import webel
    >>> aspirin = webel.readstring("name", "aspirin")
    >>> wartremover = webel.readstring("name",
    ...                                "Dr. Scholl’s Wart Remover Kit")
    >>> print(aspirin.calcfp() | wartremover.calcfp())
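The `|` in `aspirin.calcfp() | wartremover.calcfp()` is assumed here to follow the Pybel convention, where `|` between two fingerprints returns their Tanimoto coefficient: the number of bits set in both, divided by the number set in either. The following is a minimal sketch of that operator on a hypothetical toy `Fingerprint` class that holds only the set of "on" bit positions.

```python
class Fingerprint:
    """Hypothetical toy fingerprint holding the set of 'on' bit positions."""
    def __init__(self, on_bits):
        self.bits = frozenset(on_bits)

    def __or__(self, other):
        """Tanimoto coefficient: shared on-bits divided by on-bits in either."""
        either = self.bits | other.bits
        if not either:
            return 0.0  # assumed convention for two empty fingerprints
        return len(self.bits & other.bits) / len(either)


fp1 = Fingerprint([1, 2, 3, 5])
fp2 = Fingerprint([2, 3, 4])
print(fp1 | fp2)  # 2 shared bits out of 5 set in either: 0.4
```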

Cheminformatics in the browser
To try it, just Google “webel silverlight”

Cinfony makes it easy to...
– Start using a new toolkit
– Carry out common tasks
– Combine functionality from different toolkits
– Compare results from different toolkits
– Do cheminformatics through the web, and on the web

Food for thought
Inclusion of cheminformatics toolkits in Linux distributions
– “apt-get install cinfony”
– DebiChem can help
Binary versions for Linux
API stability – and associated version numbering
– Needed to handle dependencies
– “Sorry - This version of Cinfony will work only with the 1.2.x series of Toolkit Y”
What other toolkits or functionality should Cinfony support?
Would be nice if various toolkits promoted Cinfony
– Even nicer if they ran the test suite, fixed problems, and added new features (new fps, etc.)!
Using Cinfony, it’s easy for toolkits to test against other toolkits
– Quality control
RDKit - Java bindings on Windows
Licensing of Cinfony’s components
– Related point: Science is BSD
Let’s support Python 3 already
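The "1.2.x series" message implies a dependency check at import time. The following is a minimal sketch of such a check, assuming dotted version strings and the "major.minor.x" series notation used in the quoted message. `in_series` and `require_series` are hypothetical helpers that illustrate the idea; they are not existing Cinfony functions.

```python
def in_series(version: str, series: str) -> bool:
    """Return True if a version such as '1.2.7' belongs to a series such as '1.2.x'."""
    parts = series.split(".")
    if parts[-1] != "x":
        raise ValueError("series must end in '.x', e.g. '1.2.x'")
    prefix = parts[:-1]
    # Compare component by component, so '1.20.1' is not mistaken for '1.2.x'.
    return version.split(".")[: len(prefix)] == prefix


def require_series(toolkit_name: str, version: str, series: str) -> None:
    """Refuse to run against an unsupported toolkit release (hypothetical check)."""
    if not in_series(version, series):
        raise ImportError(
            f"Sorry - this version of Cinfony will work only with the "
            f"{series} series of {toolkit_name} (found {version})"
        )


require_series("Toolkit Y", "1.2.7", "1.2.x")  # passes silently
```

Comparing split components rather than string prefixes avoids a common pitfall: `"1.20.1".startswith("1.2")` is true, but 1.20.1 is not in the 1.2.x series.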

Acknowledgements
CDK: Egon Willighagen, Rajarshi Guha
Open Babel: Chris Morley, Tim Vandermeersch
RDKit: Greg Landrum
Indigo: Dmitry Pavlov
OASA: Beda Kosata
OPSIN: Daniel Lowe
JPype: Steve Ménard
Chemical Identifier Resolver: Markus Sitzmann
Interactive Tutorial: Michael Foord
Image: Tintin44 (Flickr)
Cinfony: Chem. Cent. J., 2008, 2, 24.

Cheminformatics in the browser
As Webel is pure Python, it can run in places where traditional cheminformatics software cannot...
– ...such as in a web browser
Microsoft have developed a browser plugin called Silverlight for developing applications for the web
– It can run a Python interpreter (IronPython)
So you can use Webel in Silverlight applications
Michael Foord has developed an interactive Python tutorial using Silverlight
I have combined this with Webel to develop an interactive cheminformatics tutorial

Performance

import this
In VirtualBox, double-click on MIOSS, then open Applications/Accessories/Terminal:
    user:~$ cd apps/cinfony
    user:~/apps/cinfony$ ./myjython.sh
    Jython (Release_2_5_2:7206, Mar )
    >>> from cinfony import cdk, indy, opsin, webel
    >>>
See the API and “How to Use” documentation