Integrating Chemistry Scholarship with Web Architectures, Grid Computing and Semantic Web Sashi Kiran Challa, Marlon Pierce, Suresh Marru Indiana University,

Integrating Chemistry Scholarship with Web Architectures, Grid Computing and Semantic Web Sashi Kiran Challa, Marlon Pierce, Suresh Marru Indiana University, Bloomington

Microsoft Research’s ORECHEM Project “A collaboration between chemistry scholars and information scientists to develop and deploy the infrastructure, services, and applications to enable new models for research and dissemination of scholarly materials in the chemistry community.”

OAI-ORE and ORE-Chem: The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. It is based around the ORE Model, which introduces the Resource Map (ReM). A ReM makes it possible to associate an identity with an aggregation of resources and to make assertions about its structure and semantics. ReMs are expressed in ATOM/XML, RDF/XML, N3 and Turtle formats. We want to use and extend this model to describe all aspects of crystallography experiments: publication links and metadata, data, etc.
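To make the Resource Map idea concrete, here is a minimal sketch that serializes a ReM as Turtle. Only the `ore:` vocabulary terms (`ore:ResourceMap`, `ore:Aggregation`, `ore:describes`, `ore:aggregates`) come from the OAI-ORE specification; the class name `ResourceMapSketch` and all URIs are hypothetical, and a real ReM would usually also carry metadata such as a creator and modification date.

```java
import java.util.List;

/**
 * Sketch only: builds an OAI-ORE Resource Map in Turtle for one aggregation.
 * ResourceMapSketch and every URI passed to it are hypothetical.
 */
final class ResourceMapSketch {
    static final String ORE = "http://www.openarchives.org/ore/terms/";

    static String toTurtle(String remUri, String aggregationUri, List<String> resources) {
        StringBuilder ttl = new StringBuilder();
        ttl.append("@prefix ore: <").append(ORE).append("> .\n\n");
        // The Resource Map is the document; it describes exactly one Aggregation.
        ttl.append('<').append(remUri).append("> a ore:ResourceMap ;\n");
        ttl.append("    ore:describes <").append(aggregationUri).append("> .\n\n");
        // The Aggregation has its own identity and lists the resources it aggregates.
        ttl.append('<').append(aggregationUri).append("> a ore:Aggregation");
        for (String r : resources) {
            ttl.append(" ;\n    ore:aggregates <").append(r).append('>');
        }
        ttl.append(" .\n");
        return ttl.toString();
    }
}
```

Keeping the ReM URI separate from the Aggregation URI is the key design point of ORE: the map can be replaced or re-serialized (ATOM, RDF/XML, Turtle) without changing the identity of the aggregation it describes.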

Figure (from Carl Lagoze’s OreCHEM eScience presentation slides): ORECHEM partners and data. Labels: PSU; Cambridge; Indiana; Southampton; NMR spectra and structural data; experiment data; bibliographic metadata; citations; figures; tables; chunks; reactions; molecular compounds; workflows, TeraGrid services; triplestore on Azure cloud.

Our Objective
To build a pipeline to:
– Fetch ATOM feeds.
– Transform ATOM feeds into triples and store them in a triple store (using GRDDL/Saxon-HE).
– Extract crystallographically obtained 3D coordinates.
– Submit compute-intensive electronic structure calculations and geometry optimization tasks to tools like Gaussian09 on TeraGrid.
– Transform the Gaussian output into triples and store them in a triple store.

OREChem-Computation Workflow (figure): ATOM feeds from eCrystals or CrystalEye → extract moiety feeds in CML format → convert CML to Gaussian input format → Gaussian on TeraGrid → Gaussian output to RDF triples (N3 files or RDF/XML) → triplestore. Legend: Implemented / Yet to Implement / From Partners. Other label: Moiety files.
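The workflow above is a linear chain in which each step consumes the previous step's output (usually a URL). A minimal sketch of that chaining, assuming each stage can be modeled as a string-to-string function; the class `OreChemPipeline` and any stage names are hypothetical stand-ins for the REST services, not the project's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Sketch: a named, ordered chain of stages. In the real workflow each stage
 * would be an HTTP call to one of the services; here they are plain functions.
 */
final class OreChemPipeline {
    private final Map<String, UnaryOperator<String>> stages = new LinkedHashMap<>();
    private final List<String> trace = new ArrayList<>();

    /** Appends a stage; stages run in the order they were added. */
    OreChemPipeline stage(String name, UnaryOperator<String> step) {
        stages.put(name, step);
        return this;
    }

    /** Feeds input through every stage and records "name -> output" for each step. */
    String run(String input) {
        String current = input;
        for (Map.Entry<String, UnaryOperator<String>> e : stages.entrySet()) {
            current = e.getValue().apply(current);
            trace.add(e.getKey() + " -> " + current);
        }
        return current;
    }

    List<String> trace() {
        return trace;
    }
}
```

Keeping a trace of intermediate outputs is one simple way to record the input and output URLs of each step, which the future-work slide suggests storing in the triple store.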

RESTful Web services
– REST is the way the Web already works.
– A URI for each resource.
– A uniform interface: HTTP GET/POST/PUT/DELETE.
– Easy to build using Java APIs: JAX-RS, e.g. Jersey (server and client).
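A toy illustration of the uniform interface: the URI names the resource and the HTTP method alone determines the operation. This sketch has no networking; `handle()` just returns the status and body a server would send. The class `UniformInterface` and the status-code choices are illustrative assumptions, not part of the ORECHEM services.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: in-memory resources addressed by URI path and manipulated only via HTTP verbs. */
final class UniformInterface {
    static final class Response {
        final int status;
        final String body;
        Response(int status, String body) { this.status = status; this.body = body; }
    }

    private final Map<String, String> resources = new HashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    Response handle(String method, String uri, String body) {
        switch (method) {
            case "GET":     // read the current representation
                return resources.containsKey(uri)
                        ? new Response(200, resources.get(uri)) : new Response(404, "");
            case "PUT": {   // create or replace the resource at a client-chosen URI
                boolean existed = resources.containsKey(uri);
                resources.put(uri, body);
                return new Response(existed ? 200 : 201, body);
            }
            case "POST": {  // add to a collection; the server mints the new URI
                String child = uri + "/" + nextId.incrementAndGet();
                resources.put(child, body);
                return new Response(201, child);
            }
            case "DELETE": {
                if (!resources.containsKey(uri)) return new Response(404, "");
                resources.remove(uri);
                return new Response(204, "");
            }
            default:        // any other method is not part of this interface
                return new Response(405, "");
        }
    }
}
```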

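Jersey: a resource class whose methods declare their response media types with @Produces. One method, annotated @Produces("text/plain"), returns a String; another, annotated @Produces("application/json"), returns a JSONArray and takes a String num_entries parameter.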
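Jersey uses `@Produces` to pick a method based on the client's `Accept` header. The following is a rough, dependency-free sketch of that content negotiation, not Jersey's actual algorithm: it assumes q-values can be ignored (first acceptable range wins) and renders the same list of entries as either text/plain or a JSON array. The class `ProducesSketch` is hypothetical; it needs Java 11+.

```java
import java.util.List;

/** Sketch: choose text/plain or application/json from an Accept header, then render. */
final class ProducesSketch {
    static final List<String> PRODUCED = List.of("text/plain", "application/json");

    /** First produced type matched by the Accept header, or null (a server would send 406). */
    static String negotiate(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.isBlank()) return PRODUCED.get(0);
        for (String range : acceptHeader.split(",")) {
            String r = range.split(";")[0].trim();   // drop parameters such as q=0.5
            for (String type : PRODUCED) {
                boolean wildcardMatch = r.endsWith("/*")
                        && type.startsWith(r.substring(0, r.length() - 1));
                if (r.equals("*/*") || r.equals(type) || wildcardMatch) return type;
            }
        }
        return null;
    }

    /** Same data, two representations; JSON escaping here covers only \ and ". */
    static String render(List<String> entries, String mediaType) {
        if (mediaType.equals("application/json")) {
            StringBuilder json = new StringBuilder("[");
            for (int i = 0; i < entries.size(); i++) {
                if (i > 0) json.append(',');
                String e = entries.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
                json.append('"').append(e).append('"');
            }
            return json.append(']').toString();
        }
        return String.join("\n", entries);
    }
}
```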

ORECHEM REST Services (web service: description; input; output)
– InChIExtractor: extracts InChIs by parsing the ATOM feed entries. Input: ATOM feed URL. Output: string of InChIs.
– InChIto3D: generates 3D coordinates of an InChI (Open Babel). Input: InChI string. Output: 3D coordinates in CML format.
– CML2Gauss: generates a Gaussian input file (JUMBO converters). Input: 3D coordinates (CML). Output: Gaussian input file URL.
– ATOM2RDF: converts ATOM to RDF/XML using a Saxon XSLT (or GRDDL) transformation. Input: ATOM feed URL. Output: RDF/XML triples file URL.
– RDFIntoVirtuoso: puts the triples into the triple store (Jackrabbit WebDAV client). Input: RDF/XML triples file URL (POST). Output: GRAPH IRI for SPARQL queries.
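A minimal sketch of the InChIExtractor step using only the JDK's DOM parser. The assumption (not stated on the slide) is that InChIs appear in the text of `atom:entry` elements (title, summary or content); attribute values are not scanned. The class name `InChIExtractorSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Sketch: collect InChI strings from the text of every Atom entry in a feed. */
final class InChIExtractorSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    // Standard ("InChI=1S/...") or non-standard ("InChI=1/...") InChI, up to whitespace.
    static final Pattern INCHI = Pattern.compile("InChI=1S?/\\S+");

    static List<String> extract(String atomXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Refuse DOCTYPEs so a remote feed cannot trigger external-entity expansion.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
            Set<String> found = new LinkedHashSet<>();   // de-duplicated, in feed order
            NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
            for (int i = 0; i < entries.getLength(); i++) scan(entries.item(i), found);
            return new ArrayList<>(found);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse ATOM feed", e);
        }
    }

    /** Scans each text node separately so text from adjacent elements never runs together. */
    private static void scan(Node node, Set<String> found) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            Matcher m = INCHI.matcher(node.getNodeValue());
            while (m.find()) found.add(m.group());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) scan(c, found);
    }
}
```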

ORECHEM REST Services, continued (web service: description; input; output)
– FeedsHarvester: fetches the moiety feeds from CrystalEye (crystal-eye harvester). Input: harvester name and number of feeds to be fetched (e.g. &numofentries=5). Output: URLs of the cml.xml files.
– CML2GaussianSemCompChem: generates a Gaussian input file (Semantic Comp Chem). Input: cml.xml file URL (POST). Output: URL of the Gaussian input file.
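Services such as FeedsHarvester take their inputs as URL query parameters. A small sketch of building such a request URL with proper percent-encoding; the base URL and the `harvester` parameter name are hypothetical (only `numofentries` appears on the slide), and `ServiceUrlSketch` is an illustrative name. Requires Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch: append URL-encoded query parameters (in map iteration order) to a service URL. */
final class ServiceUrlSketch {
    static String withQuery(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) return baseUrl;
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

Passing a `LinkedHashMap` keeps the parameters in insertion order, which makes the generated URLs stable and easy to log.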

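Testing Services (Jersey Client API)
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.WebResource;
import javax.ws.rs.core.MediaType;

public class JerseyClient {
    public static void main(String[] args) {
        Client client = Client.create();
        // Service base URL elided on the slide.
        WebResource cml2gauss = client.resource("..." + "/CML2GaussianSemCompChem/gauss/inputgenerator");
        String cmlfileURL = "..." + "orechem/moieties/ic sup1_comp9_" + "moiety_1.complete.cml.xml";
        // POST the CML file URL; the service replies with the URL of the generated Gaussian input file.
        String gaussURL = cml2gauss.accept(MediaType.TEXT_PLAIN_TYPE, MediaType.APPLICATION_XML_TYPE)
                .post(String.class, cmlfileURL);
        System.out.println(gaussURL);
    }
}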
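For comparison, the same call can be made with the JDK's own `java.net.http` client (Java 11+) instead of Jersey. This is a sketch under assumptions: the service URL passed in is hypothetical, and sending the body as `text/plain` is a guess at what the service accepts.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: POST a CML file URL to a CML-to-Gaussian service using only the JDK. */
final class Cml2GaussClientSketch {
    /** Builds the request; the Accept header mirrors the Jersey client call. */
    static HttpRequest buildRequest(String serviceUrl, String cmlFileUrl) {
        return HttpRequest.newBuilder(URI.create(serviceUrl))
                .header("Accept", "text/plain, application/xml")
                .header("Content-Type", "text/plain")   // assumption about the service
                .POST(HttpRequest.BodyPublishers.ofString(cmlFileUrl))
                .build();
    }

    /** Sends the request and returns the response body (expected: the Gaussian input file URL). */
    static String post(String serviceUrl, String cmlFileUrl) throws IOException, InterruptedException {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(buildRequest(serviceUrl, cmlFileUrl), HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + resp.statusCode() + ": " + resp.body());
        }
        return resp.body();
    }
}
```

Separating `buildRequest` from `post` lets the request be inspected or tested without a running service.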

TeraGrid

OREChem Workflow in XBaya

Triple Store
A triple store is a framework for storing and querying RDF data. It provides persistent storage of, and access to, RDF graphs.
Commercial: AllegroGraph, BigOWLIM, Virtuoso
Open source: Jena SDB, Sesame, Virtuoso, Intellidimension
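To show what "storing and querying RDF" means at its simplest, here is a toy in-memory triple store that supports single triple-pattern matching, with `null` playing the role of a SPARQL variable. It is purely illustrative (the class `ToyTripleStore` is hypothetical); real stores such as Virtuoso add persistence, indexing and full SPARQL.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Sketch: a set of (subject, predicate, object) triples with wildcard pattern matching. */
final class ToyTripleStore {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object x) {
            if (!(x instanceof Triple)) return false;
            Triple t = (Triple) x;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o + " ."; }
    }

    // A set, because an RDF graph never contains the same triple twice.
    private final Set<Triple> triples = new LinkedHashSet<>();

    void add(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

    /** Returns all triples matching the pattern; a null position matches anything. */
    List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.s)) && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                out.add(t);
            }
        }
        return out;
    }

    int size() { return triples.size(); }
}
```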

Virtuoso Triple Store
An ORDBMS extended into a triple store.
– Command-line loaders; isql utility (interactive SQL access to a database).
– SPARQL support, with a web server for running SPARQL queries.
– Uploading of data over HTTP and through a WebDAV browser.
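Uploading over WebDAV amounts to an HTTP PUT of the RDF file into a DAV folder. A sketch with the JDK client (Java 11+), assuming basic authentication and a hypothetical folder URL such as a user's `rdf_sink` collection; the exact path and credentials depend on the Virtuoso installation, and `DavUploadSketch` is an illustrative name (the project itself used the Jackrabbit WebDAV client).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: build a WebDAV PUT that uploads an RDF/XML document into a DAV folder. */
final class DavUploadSketch {
    static HttpRequest putRdf(String davFolderUrl, String fileName, String rdfXml,
                              String user, String password) {
        // HTTP Basic authentication: base64("user:password").
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        String target = davFolderUrl.endsWith("/") ? davFolderUrl + fileName
                                                   : davFolderUrl + "/" + fileName;
        return HttpRequest.newBuilder(URI.create(target))
                .header("Authorization", "Basic " + token)
                .header("Content-Type", "application/rdf+xml")
                .PUT(HttpRequest.BodyPublishers.ofString(rdfXml))
                .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.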

What’s in the Triple Store (RDF graph)
– Experiments performed on a particular crystal.
– Journal articles containing this crystal (research groups working with the crystal).
– Moieties in the crystal, and their energies, geometries, vibrational frequencies, etc.
All of this information in the triple store can be queried using a single GRAPH IRI.

Virtuoso Triple Store: GRAPH IRI
The GRAPH IRI is used to run SPARQL queries on the RDF triples.
* Unique for every uploaded file.
* A common GRAPH IRI for all the data uploaded into rdf_sink (virt:rdf_graph, virt:rdf_sponger).
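A sketch of querying by GRAPH IRI over the standard SPARQL protocol: either scope the whole query with the `default-graph-uri` parameter, or name the graph inside the query with a `GRAPH` clause. The endpoint (Virtuoso's default is often `http://localhost:8890/sparql`) and the graph IRI are assumptions; `SparqlUrlSketch` is a hypothetical name. Requires Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: build SPARQL-protocol GET URLs scoped to a graph IRI. */
final class SparqlUrlSketch {
    /** Option 1: restrict the default graph of the query via the protocol parameter. */
    static String queryUrl(String endpoint, String graphIri, String query) {
        return endpoint
                + "?default-graph-uri=" + URLEncoder.encode(graphIri, StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Option 2: name the graph explicitly inside the query text. */
    static String inGraph(String graphIri, String pattern) {
        return "SELECT * WHERE { GRAPH <" + graphIri + "> { " + pattern + " } }";
    }
}
```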

Future Work
Real future work (through Dec 2010):
– Use the OGCE workflow interpreter engine to run the workflow as a service.
– Integrate with simple visualization services (Jmol).
– Store input and output URLs persistently in the triple store, anticipating higher-level services.
– Better support for REST services in OGCE GFAC and XBaya.
Hopeful future work (next year):
– Integrate with services from GridChem/ParamChem.
– Handle larger-scale job submission.
– Develop a full gateway for public browsing and retrieval.
– Investigate push-style publish/subscribe solutions for notifications. We have a great deal of experience with JMS and Web Services for this, but highly scalable REST messaging for RSS/Atom is emerging (PubSubHubbub and Twitter live feeds, for example). The OGCE messaging system has been prototyped with REST interfaces for a small iPlant collaboration.

More Information
Come by the IU booth for more information on the OGCE tools used here:
– Mini-symposium: noon on Tuesday.
– Interactive presentations all week at the flat-screen kiosk.
– NCSA walk-up demos: 1-2 PM on Wednesday.
Source code for our ORE-Chem services is available from SourceForge.
Contact:

Thank You

Future Work
Google’s PubSubHubbub: as soon as a feed is published, the hub notifies the subscriber, which can then fetch the new entry and start the pipeline.
Publisher → Hub → Subscriber
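A sketch of the subscriber's two duties in PubSubHubbub: send a form-encoded subscription request (`hub.mode`, `hub.topic`, `hub.callback`) to the hub, and answer the hub's verification-of-intent callback by echoing `hub.challenge` only for topics it actually requested. The class `PushSubscriberSketch` and the URLs are hypothetical; unsubscribe handling and lease renewal are left out. Requires Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;

/** Sketch: PubSubHubbub subscription request body and verification-of-intent check. */
final class PushSubscriberSketch {
    /** Body of the POST to the hub (Content-Type: application/x-www-form-urlencoded). */
    static String subscribeBody(String topicUrl, String callbackUrl) {
        return "hub.mode=subscribe"
                + "&hub.topic=" + enc(topicUrl)
                + "&hub.callback=" + enc(callbackUrl);
    }

    /**
     * Handles the hub's GET to our callback. Returns the challenge to echo with HTTP 200,
     * or null to refuse (respond 404) when we never asked for this topic.
     */
    static String verify(Map<String, String> query, Set<String> wantedTopics) {
        if (!"subscribe".equals(query.get("hub.mode"))) return null;
        String topic = query.get("hub.topic");
        if (topic == null || !wantedTopics.contains(topic)) return null;
        return query.get("hub.challenge");
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Once verified, new entries are pushed to the callback URL as they are published, so the pipeline can start without polling the feed.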

Questions?

ATOM to RDF/XML
– GRDDL transformation (Jena GRDDL Reader). GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. XSLT stylesheet: atom-grddl.xsl.
GRDDLReader grddl = new GRDDLReader();
grddl.read(defaultmodel, atomfeedURL);
GRDDL W3C documentation:
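One way GRDDL ties a document to its transformation is a `grddl:transformation` attribute on the root element, holding a space-separated list of stylesheet references resolved against the document's URI. The sketch below only performs that discovery step with the JDK DOM parser; it does not reproduce Jena's GRDDL reader, and the class name `GrddlDiscoverySketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Sketch: find the XSLT transformations a document declares via grddl:transformation. */
final class GrddlDiscoverySketch {
    static final String GRDDL_NS = "http://www.w3.org/2003/g/data-view#";

    static List<URI> transformations(String xml, URI baseUri) {
        Element root;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            root = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        // getAttributeNS returns "" when the attribute is absent.
        String value = root.getAttributeNS(GRDDL_NS, "transformation");
        List<URI> result = new ArrayList<>();
        for (String ref : value.trim().split("\\s+")) {
            if (!ref.isEmpty()) result.add(baseUri.resolve(ref));   // relative refs use the doc URI
        }
        return result;
    }
}
```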

ORE Representation of an Aggregation of a Moiety in Turtle format

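ATOM to RDF/XML
– Saxon XSLT Transformation:
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Applies the XSLT stylesheet (xslstream) to the ATOM feed (atomstream); returns RDF/XML bytes.
static byte[] atomToRdf(InputStream xslstream, InputStream atomstream) throws TransformerException {
    ByteArrayOutputStream transformOutputStream = new ByteArrayOutputStream();
    // Saxon-HE is used if it is registered as the JAXP TransformerFactory on the classpath.
    TransformerFactory factory = TransformerFactory.newInstance();
    StreamSource xslSource = new StreamSource(xslstream);
    StreamSource xmlSource = new StreamSource(atomstream);
    StreamResult outResult = new StreamResult(transformOutputStream);
    Transformer transformer = factory.newTransformer(xslSource);
    transformer.transform(xmlSource, outResult);
    return transformOutputStream.toByteArray();
}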
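A self-contained, runnable variant of the same JAXP pattern. It swaps in the JDK's built-in XSLT 1.0 processor instead of Saxon-HE, and a hypothetical toy stylesheet instead of atom-grddl.xsl: each Atom entry becomes an `rdf:Description` identified by its `atom:id`, with the entry title as `dcterms:title`. `AtomToRdfSketch` is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: ATOM to RDF/XML with a toy XSLT 1.0 stylesheet and the JDK's default processor. */
final class AtomToRdfSketch {
    // Toy stylesheet (not atom-grddl.xsl): one rdf:Description per atom:entry.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:a='http://www.w3.org/2005/Atom'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:dc='http://purl.org/dc/terms/' exclude-result-prefixes='a'>"
        + "<xsl:output method='xml' indent='no'/>"
        + "<xsl:template match='/a:feed'><rdf:RDF>"
        + "<xsl:for-each select='a:entry'>"
        + "<rdf:Description rdf:about='{a:id}'>"
        + "<dc:title><xsl:value-of select='a:title'/></dc:title>"
        + "</rdf:Description>"
        + "</xsl:for-each></rdf:RDF></xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String atomXml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader(atomXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Because only `TransformerFactory.newInstance()` is involved, placing Saxon-HE on the classpath (if it registers itself with JAXP) switches processors without code changes, which also enables XSLT 2.0 stylesheets.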

OGCE-Workflow Suite
Tools to wrap command-line applications as lightweight web services, compose workflows from those web services, and execute and monitor the workflows.
1) GFAC: allows users to wrap any command-line application as a web service.
2) XRegistry: the information repository of the workflow suite, enabling users to register, search and access application service and workflow deployment descriptions.
3) XBaya: a Java Web Start workflow composer, used for composing workflows from web services created by GFAC, and for running and monitoring those workflows.
Source: Open Grid Computing Environments Wiki
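The core idea behind GFAC, turning a command-line application into a callable operation, can be sketched with `ProcessBuilder`: the service's input becomes extra arguments and its output is the program's stdout. This is a hypothetical illustration (`CommandLineServiceSketch` is not GFAC's API) and omits what GFAC actually adds, such as remote job submission, staging and the web-service layer. Requires Java 10+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: a fixed command line exposed as an "invoke(args) -> stdout" operation. */
final class CommandLineServiceSketch {
    private final List<String> baseCommand;

    CommandLineServiceSketch(List<String> baseCommand) {
        this.baseCommand = List.copyOf(baseCommand);
    }

    String invoke(List<String> args) {
        List<String> cmd = new ArrayList<>(baseCommand);
        cmd.addAll(args);
        try {
            // Merge stderr into stdout so one read drains everything and cannot deadlock.
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            String out;
            try (InputStream in = p.getInputStream()) {
                out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            int exit = p.waitFor();
            if (exit != 0) throw new IllegalStateException("exit code " + exit + ": " + out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```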

Experiments, protocols? (Experimental data)
Who? Where? When? (Bibliographic data)
Moieties, their energies, latent heats of fusion, vibrational frequencies? (Molecular properties, etc.)

ORE representation of a Resource Map in Turtle format

Gaussian Input File
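A Gaussian input file has a fixed section layout: route section (starting with `#`), blank line, title, blank line, charge and spin multiplicity, one line per atom with Cartesian coordinates, and a terminating blank line. The sketch below writes that layout; the route string passed in (method, basis set, job type) is the caller's choice and any example route is hypothetical, as is the class name `GaussianInputSketch`. Optional Link 0 lines such as `%chk=` are omitted.

```java
import java.util.List;
import java.util.Locale;

/** Sketch: write a minimal Gaussian input file from element symbols and coordinates. */
final class GaussianInputSketch {
    static final class Atom {
        final String element;
        final double x, y, z;   // Cartesian coordinates in Angstrom
        Atom(String element, double x, double y, double z) {
            this.element = element; this.x = x; this.y = y; this.z = z;
        }
    }

    static String write(String route, String title, int charge, int multiplicity, List<Atom> atoms) {
        StringBuilder gjf = new StringBuilder();
        gjf.append(route).append("\n\n");   // route section, then blank line
        gjf.append(title).append("\n\n");   // title section, then blank line
        gjf.append(charge).append(' ').append(multiplicity).append('\n');
        for (Atom a : atoms) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
            gjf.append(String.format(Locale.ROOT, "%-2s %12.6f %12.6f %12.6f\n",
                    a.element, a.x, a.y, a.z));
        }
        gjf.append('\n');                   // molecule specification ends with a blank line
        return gjf.toString();
    }
}
```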

Moiety and its 3D coordinates: every atom and its X, Y, Z coordinates; bond orders; SMILES and InChI representations. Currently ~30,000 moieties in the CrystalEye repository.
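In CML, a moiety's atoms typically appear as `atom` elements carrying `elementType` and the 3D coordinates in `x3`, `y3`, `z3` attributes. A sketch that reads those attributes with the JDK DOM parser, matching `atom` in any namespace and skipping atoms without 3D coordinates; the class `CmlAtomsSketch` is hypothetical, and bonds, SMILES and InChI are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: extract element types and x3/y3/z3 coordinates from CML atom elements. */
final class CmlAtomsSketch {
    static final class CmlAtom {
        final String id, element;
        final double x3, y3, z3;
        CmlAtom(String id, String element, double x3, double y3, double z3) {
            this.id = id; this.element = element; this.x3 = x3; this.y3 = y3; this.z3 = z3;
        }
    }

    static List<CmlAtom> read(String cml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            NodeList nodes = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(cml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagNameNS("*", "atom");   // any namespace, local name "atom"
            List<CmlAtom> atoms = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element a = (Element) nodes.item(i);
                if (!a.hasAttribute("x3")) continue;        // no 3D coordinates on this atom
                atoms.add(new CmlAtom(a.getAttribute("id"), a.getAttribute("elementType"),
                        Double.parseDouble(a.getAttribute("x3")),
                        Double.parseDouble(a.getAttribute("y3")),
                        Double.parseDouble(a.getAttribute("z3"))));
            }
            return atoms;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read CML atoms", e);
        }
    }
}
```

The resulting element symbols and coordinates are exactly what a CML-to-Gaussian conversion step needs for the molecule specification.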

OGCE-Workflow Suite: see Suresh Marru’s presentation “OGCE Workflow Toolkit for Multi-Disciplinary Science Applications”.

XBaya Workflow Composer

Acknowledgements
Dr. Marlon Pierce, Assistant Director, Community Grids Lab, Pervasive Technology Institute, Indiana University
Dr. David J. Wild, Assistant Professor of Informatics & Computing, Director of the Cheminformatics Program, School of Informatics and Computing, Indiana University
Suresh Marru, Research Scientist, Pervasive Technology Institute, Indiana University
ORECHEM group: Dr. Carl Lagoze (Cornell University), Dr. Peter Murray-Rust, Nick Day, Jim Downing (University of Cambridge), Mark Borkum (University of Southampton), Na Li (Penn State), Alex, Lee Dirks (Microsoft Research)
Jaliya Ekanayake, Scott Beason, and all the members of the Pervasive Technology Institute

Future Work
– Wrap the tool that generates triples from Gaussian output into a REST service.
– Install the Virtuoso triple store on the Azure cloud.
– Fetch and process the feeds from Southampton and Penn State.

Virtuoso Triple Store
Implementing a SPARQL-compliant RDF triple store using a SQL ORDBMS. Both Windows and Linux versions have been installed and tested; the Linux version is currently in use.
Conductor:
SPARQL endpoint: