Published by Wilfred Newman. Modified over 9 years ago.
1
Chemical Informatics and Cyber-infrastructure Building Blocks
Chemical informatics resources:
- Deluge of experimental data
  - More than 100,000 compounds screened by 10 publicly funded high-throughput screening centers using various assay techniques (molecular to cellular)
  - Molecular Libraries Screening Center Network
- Chemical databases maintained by various groups
  - NIH PubChem, NIH DTP
- Chemical informatics and computational chemistry
  - Data clustering, data mining, descriptor calculations, toxicity prediction, docking, molecular modeling, and quantum chemistry
- Visualization tools
- Web resources: journal articles, etc.
A Chemical Informatics Grid will need to integrate these into a common, loosely coupled, open, distributed computing environment.
2
Our Solution Stack
- Web and Grid services
  - Domain-specific Web Services: VOTables, CDK services
  - Grid services and cyber-infrastructure for computationally intensive applications: clustering, quantum chemistry
- Workflow and service management
  - We work with Taverna
  - Many solutions: Kepler, BPEL engines, etc.
- Portals and other user interfaces
  - Portlets and other user interfaces
  - Rich desktop apps
  - Ubiquitous clients
[Diagram: three stacked layers, top to bottom: Portals and Other User Interfaces; Workflow and Service Management; Web and Grid Services]
Each level is a subject for research and development, as is their integration.
3
Wrapping Science Applications as Services
Science Grid services typically must wrap legacy applications written in C or Fortran.
You must handle such problems as:
- Specifying several input and output files (these may need to be staged in)
- Launching executables and monitoring their progress
- Specifying environment variables
Often these applications also come with shell scripts that do miscellaneous tasks.
How do you convert this to WSDL? Or (equivalently) how do you automatically generate the XML job description for WS-GRAM?
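The wrapping problems listed on this slide (staging files, setting environment variables, launching and monitoring an executable) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the wrapper used in the project: the function name `run_wrapped`, the file names, and the `JOB_TAG` variable are all hypothetical, and a one-line Python script stands in for a legacy C or Fortran binary.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped(command, input_files, output_names, extra_env=None, timeout=60):
    """Stage inputs into a scratch directory, run the command there, collect outputs.

    command      -- argv list for the (hypothetical) legacy executable
    input_files  -- mapping of staged file name -> file contents
    output_names -- file names the executable is expected to write
    extra_env    -- environment variables to add for this run only
    """
    workdir = Path(tempfile.mkdtemp(prefix="wrapped-job-"))
    try:
        # Stage in: write every input file into the scratch directory.
        for name, text in input_files.items():
            (workdir / name).write_text(text)

        # Environment: inherit the caller's variables, then add job-specific ones.
        env = dict(os.environ)
        env.update(extra_env or {})

        # Launch and wait; the timeout is a crude form of progress monitoring.
        proc = subprocess.run(
            command, cwd=workdir, env=env,
            capture_output=True, text=True, timeout=timeout,
        )

        # Stage out: read back whichever expected output files exist.
        outputs = {}
        for name in output_names:
            path = workdir / name
            if path.exists():
                outputs[name] = path.read_text()

        return {
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "outputs": outputs,
        }
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


# Stand-in for a legacy program: upper-cases in.txt and appends $JOB_TAG.
LEGACY_STANDIN = (
    "import os; "
    "open('out.txt', 'w').write(open('in.txt').read().upper() + os.environ['JOB_TAG'])"
)

if __name__ == "__main__":
    result = run_wrapped(
        [sys.executable, "-c", LEGACY_STANDIN],
        input_files={"in.txt": "cco"},
        output_names=["out.txt"],
        extra_env={"JOB_TAG": "-demo"},
    )
    print(result["returncode"], result["outputs"])
```

A service layer (WSDL or a WS-GRAM job description) would then need to expose the same four pieces of information that `run_wrapped` takes as arguments.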
4
Flow Chart of SMILES to Cluster Partitioned (BCI Web Service)
[Flow chart. Stage labels: Generating Fingerprints; Clustering Fingerprints; Generating the best levels; Extracting individual cluster partitions. The first two stages together are labeled "SMILES to DKM".
Boxes, in the order recovered from the slide: SMILES string; Makebits, with Dictionary (default); Fingerprint (*.scn); DivKmeans; Cluster Hierarchy (*.dkm); Optclus (best level); RNNclus; Extracted Cluster Hierarchy (*.clu); One Column Process; Merge Process; New SMILES string.]
5
BCI Clustering Service Methods
Service Method | Description | Input | Output
makebitsGenerate | Generate fingerprints from a SMILES structure | SMI string | Fingerprint string
divkmGenerate | Cluster fingerprints with Divkmeans | SCN string | Clustered hierarchy
smile2dkm | Makebits + divkm | SMI string | Clustered hierarchy
optclusGenerate | Generate the best levels in a hierarchy | DKM string | Best partition cluster level
rnnclusGenerate | Extract individual cluster partitions | DKM string | Indiv. cluster partitions
smile2ClusterPartitioned | Generate a new SMILES structure w/ extra col. | SMI string | New SMILES structure
6
Submitting Applications with Condor
- We are working to use Condor-G as a simple bridge to the NSF’s TeraGrid for job submission.
- Condor has a Web Service interface (called BirdBath) that we are using to construct Java portlets.
- We are investigating how to construct Condor ClassAds using GPIR.
  - Required for Condor matchmaking
  - But no facility for this is built in to the TeraGrid.
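For orientation, a Condor-G job is described by a plain-text submit description. The sketch below renders one from Python. It does not use BirdBath or GPIR; it only builds the text that would be handed to `condor_submit`. The helper name `condor_g_submit`, the gatekeeper host, and the executable path are hypothetical, and the exact `grid_resource` form depends on the Condor and Globus versions deployed at a site.

```python
def condor_g_submit(executable, arguments, grid_resource, job_name="job"):
    """Render a minimal Condor-G (grid universe) submit description as text."""
    lines = [
        "universe = grid",
        f"grid_resource = {grid_resource}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"output = {job_name}.out",
        f"error = {job_name}.err",
        f"log = {job_name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper and program; replace with site-specific values.
    print(condor_g_submit(
        executable="/usr/local/bin/cluster_job",
        arguments="input.smi",
        grid_resource="gt2 gatekeeper.example.org/jobmanager-pbs",
        job_name="cluster",
    ))
```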
7
[Diagram: two job-submission configurations. Panel "Condor Only": (Portal) Client, Condor Master, Condor. Panel "Condor-G and Globus": (Portal) Client, Condor-G, Globus, LSF, PBS, TeraGrid.]
8
VOTables: Handling Tabular Data
- Developed by the Virtual Observatory community for encoding astronomy data.
- The VOTable format is an XML representation of the tabular data (data coming from BCI, NIH DTP databases, and so on).
- VOTables-compatible tools have been built.
  - We just inherit them.
- The SAVOT and JAVOT Java parser APIs for VOTable allow us to easily build VOTable-based applications:
  - Web Services
  - Spreadsheet
  - Plotting applications (VOPlot and TopCat are two)
9
mrtd1.txt: SMILES representation of chemical compounds along with their properties
10
Votable.xml: XML representation of the mrtd1.txt file
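As a rough idea of what such a conversion involves, the sketch below builds a minimal VOTable document (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD elements) from rows of compound data using only the Python standard library. It is not the SAVOT or JAVOT API. The column names (SMILES, Mass, PSA) and the sample rows are assumptions made for illustration; the actual layout of mrtd1.txt is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows):
    """Serialize tabular data as a minimal VOTable XML string.

    fields -- list of (column name, VOTable datatype) pairs
    rows   -- list of row tuples, one value per field
    """
    votable = ET.Element("VOTABLE", {"version": "1.1"})
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")

    # One FIELD element per column; character columns are variable-length.
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"
        ET.SubElement(table, "FIELD", attrs)

    # Row data goes in DATA/TABLEDATA as TR rows of TD cells.
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)

    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    # Illustrative columns and rows, not taken from mrtd1.txt.
    fields = [("SMILES", "char"), ("Mass", "double"), ("PSA", "double")]
    rows = [("CCO", 46.07, 20.23), ("c1ccccc1", 78.11, 0.0)]
    print(rows_to_votable(fields, rows))
```

A production file would also carry the VOTable namespace and column metadata such as units, which this sketch omits.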
11
VOPlot application using the generated votable.xml file: graph plotted with Mass on the X-axis and PSA on the Y-axis
12
More Services: WWMM Services
Service | Description | Input | Output
InChIGoogle | Search an InChI structure through Google | inchiBasic type | Search result in HTML format
InChIServer | Generate InChI | version format | An InChI structure
OpenBabelServer | Transform a chemical format to another using Open Babel | format inputData outputData options | Converted chemical structure string
CMLRSSServer | Generate CMLRSS feed from CML data | mol, title description link, source | Converted CMLRSS feed of CML data
13
CDK-Based Services
- Common Substructure: Calculates the common substructure between two molecules.
- CDKsim: Takes two SMILES and evaluates the Tanimoto coefficient (ratio of intersection to union of their fingerprints).
- CDKdesc: Calculates a variety of molecular and atomic descriptors for QSAR modeling.
- CDKws: Fingerprint generation.
- CDKsdg: Creates a JPEG of the compound's 2D structure.
- CDKStruct3D: Generates 3D coordinates of a molecule from its SMILES.
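The Tanimoto coefficient mentioned for CDKsim is simple to state in code. The sketch below is not the CDK implementation (CDK is a Java library); it assumes each fingerprint is given as the set of its "on" bit positions, and the choice to return 0.0 for two empty fingerprints is a convention picked here for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of set-bit positions.

    T = |A intersect B| / |A union B|
    """
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Both fingerprints empty: the ratio is undefined; 0.0 is an assumed convention.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Two shared bits out of five distinct bits overall.
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
```

Values range from 0 (no bits in common) to 1 (identical fingerprints).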
14
ToxTree Service
- The Threshold of Toxicological Concern (TTC) establishes a level of exposure for all chemicals below which there would be no appreciable risk to human health.
- ToxTree implements the Cramer Decision Tree approach to estimate TTC.
- We have converted this into a service.
  - Uses SMILES as input.
  - Note that the GUI must be separated from the library for it to be a service.
http://ecb.jrc.it/QSAR/home.php?CONTENU=/QSAR/qsar_tools/qsar_tools_toxtree.php
15
OSCAR3 Service
- Oscar3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (e.g., journal articles).
- It identifies (or attempts to identify):
  - Chemical names: singular nouns, plurals, verbs, etc., as well as formulae and acronyms.
  - Chemical data: spectra, melting/boiling point, yield, etc. in experimental sections.
  - Other entities: things like N(5)-C(3) and so on.
- Results are exported as an XML file.
- There is a larger effort, SciBorg, in this area: http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html
- It also has potentially very interesting workflows: http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3