Wrapping analytical services for caBIG Taverna-caGrid technical review meeting Stian Soiland-Reyes, myGrid University of Manchester, UK 2009-01-23

Presentation transcript:

Wrapping analytical services for caBIG Taverna-caGrid technical review meeting Stian Soiland-Reyes, myGrid University of Manchester, UK

Agenda
- Project overview
- Primary goals
- Service selection
- Services identified
- Architecture
- Service outputs
- UML model
- Template workflow
- Work so far
- Implementation plan

Project overview
- Taverna caGrid cooperation
  - Taverna workbench enhancements for caGrid
  - Grid-enabling analytical services
  - caGrid security support for Taverna
- This presentation deals with the analytical services

Primary goals
- Identify two publicly available analytical web services currently accessible through Taverna
- caGrid-enable the services, semantically describing them using caBIG’s infrastructure
- Demonstrate building of workflows combining the new services with existing caBIG services

Service selection
- Selected services in collaboration with the caGrid Workflow working group, led by Juli
- Winners:
  - NCBI Blast hosted by EBI
  - InterProScan hosted by EBI

Why these services?
- Freely available
- Highly reliable, hosted by EBI
- Widely used by the scientific community
- Can be combined with existing caBIG tools in biologically meaningful workflows
  - caBIO, GridPIR, etc.

Services identified: NCBI Blast
- A popular similarity search tool using local sequence alignment
- Supports sequences of proteins, DNA, RNA
- Searches sequences in a whole range of databases
  - SWISSPROT, UNIPROT, NCBI, EMBL, etc.
- SOAP web service hosted by EMBL-EBI

Services identified: InterProScan
- Integrates various databases of protein domains and functional sites
- Searches using protein signature recognition methods
- SOAP web service hosted by EMBL-EBI

Architecture

Architecture as pseudo code

    class CaGridClient:
        def main():
            endpointReference = wrappedService.invoke(inputs)
            endpointReference.subscribe()

        def resourcePropertyChanged():
            outputs = endpointReference.getResourceProperty()
            print "Result", outputs

    class WrappedService:
        def invoke(inputs):
            convertedInputs = dataConverter.convertFromCaGrid(inputs)
            jobId = serviceInvoker.invoke(convertedInputs)
            endpointReference = new EndpointReference(jobId)
            return endpointReference

        def outputReturned(jobId, outputs):
            convertedOutputs = dataConverter.convertToCaGrid(outputs)
            endpointReference.setResourceProperty(convertedOutputs)

    class ServiceInvoker:
        def invoke(convertedInputs):
            jobId = originalService.invoke(convertedInputs)
            return jobId
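The pseudo code above can be turned into a small runnable Python sketch. This is an in-memory simulation only, not the project's implementation: `OriginalService`, its `finish` method, the FASTA-like input format and the comma-separated output format are all assumptions made for illustration, and no real caGrid, WS-Resource or EBI API is called.

```python
import itertools


class EndpointReference:
    """Handle for one submitted job; its resource property holds the result."""

    def __init__(self, job_id):
        self.job_id = job_id
        self._value = None
        self._subscribers = []

    def subscribe(self, callback):
        # The callback plays the role of resourcePropertyChanged().
        self._subscribers.append(callback)

    def get_resource_property(self):
        return self._value

    def set_resource_property(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(self)


class DataConverter:
    """Hypothetical conversions; the real formats are not given in the slides."""

    def convert_from_cagrid(self, inputs):
        return ">{id}\n{sequence}".format(**inputs)

    def convert_to_cagrid(self, outputs):
        return {"hits": outputs.split(",")}


class OriginalService:
    """Stand-in for the third-party service: accepts jobs, finishes them on demand."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}
        self.listener = None

    def invoke(self, converted_inputs):
        job_id = "job-%d" % next(self._ids)
        self.pending[job_id] = converted_inputs
        return job_id

    def finish(self, job_id, raw_outputs):
        # Simulates the remote job completing some time after submission.
        del self.pending[job_id]
        self.listener(job_id, raw_outputs)


class ServiceInvoker:
    def __init__(self, original_service):
        self._original = original_service

    def invoke(self, converted_inputs):
        return self._original.invoke(converted_inputs)


class WrappedService:
    def __init__(self, invoker, converter):
        self._invoker = invoker
        self._converter = converter
        self._references = {}

    def invoke(self, inputs):
        converted = self._converter.convert_from_cagrid(inputs)
        job_id = self._invoker.invoke(converted)
        reference = EndpointReference(job_id)
        self._references[job_id] = reference
        return reference

    def output_returned(self, job_id, outputs):
        converted = self._converter.convert_to_cagrid(outputs)
        self._references[job_id].set_resource_property(converted)


# Client side: submit, subscribe, and receive the result once the job finishes.
original = OriginalService()
wrapped = WrappedService(ServiceInvoker(original), DataConverter())
original.listener = wrapped.output_returned

results = []
reference = wrapped.invoke({"id": "query1", "sequence": "MKT"})
reference.subscribe(lambda ref: results.append(ref.get_resource_property()))
original.finish(reference.job_id, "hitA,hitB")
print("Result", results[0])
```

Keeping one `EndpointReference` per job ID lets `output_returned` find the right resource when several jobs are in flight, a detail the pseudo code leaves implicit.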

Output InterProScan (Untranslated)

    xsi:noNamespaceSchemaLoca.. /Header>
    <protein id="unipro
    <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core" type="Domain" parent_id="IPR015874">
    Molecular Function protease inhibitor activity
    <match id="G3DSA: " name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
    <location start="77" end="128" score=" E-5" status="T" evidence="Gene3D" />
    <location start="30" end="72" score=" E-5" status="T" evidence="HMMPfam" />
    <location start="79" end="126" score=" E-14" status="T" evidence="HMMPfam" />
    <interpro id="IPR008198" name="Proteinase inhibitor I17" type="Domain" parent_id="IPR008197">
    ...
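As an illustration of the kind of translation the data converter would perform on this output, here is a minimal Python sketch that reads a simplified InterProScan-style fragment into typed objects. The `Domain` and `Location` classes and the `parse_domains` function are hypothetical and are not the project's UML model. The sample XML is a cut-down, well-formed fragment that uses only attributes readable in the transcript; truncated attributes such as the scores and the match id are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Simplified sample; assumption: real output has more elements and attributes.
SAMPLE = """
<protein>
  <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core"
            type="Domain" parent_id="IPR015874">
    <match name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
      <location start="77" end="128" status="T" evidence="Gene3D" />
    </match>
  </interpro>
</protein>
"""


@dataclass
class Location:
    start: int
    end: int
    evidence: str


@dataclass
class Domain:
    interpro_id: str
    name: str
    locations: List[Location] = field(default_factory=list)


def parse_domains(xml_text: str) -> List[Domain]:
    """Collect every <interpro> entry together with its <location> elements."""
    root = ET.fromstring(xml_text.strip())
    domains = []
    for entry in root.iter("interpro"):
        domain = Domain(entry.get("id"), entry.get("name"))
        for loc in entry.iter("location"):
            domain.locations.append(
                Location(int(loc.get("start")), int(loc.get("end")), loc.get("evidence"))
            )
        domains.append(domain)
    return domains


domains = parse_domains(SAMPLE)
for d in domains:
    print(d.interpro_id, d.name, [(l.start, l.end) for l in d.locations])
```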

UML model: wrapped InterproScan

UML model: wrapped NCBIBlast

Template workflow
- EBI_dbfetch_fetchBatch will be replaced with the caBIG service caBIO
- This workflow uses both NCBIBlast and InterproScan, which will be replaced with the wrapped services

Work so far
- Identified services and example workflow
- Described services (Deliverable 3.2)
- Modelled service inputs and outputs in UML according to caGrid guidelines
  - Still a few tweaks needed for WS-Resource usage
- Architecture and implementation plan for wrapping services (Deliverable 3.3)
  - JavaDoc needs updating for WS-Resource

Implementation plan
- Generate Common Data Elements for inputs and outputs and verify Silver compatibility
- Generate semantically annotated XMIs
- Submit Silver compatibility review package
- Implement and deploy wrapped services
  - Using Introduce and possibly gRAVI
  - Implement, test, deploy
  - We’ll start with this before submitting CDEs
- Build caGrid-based workflow using services

Any questions..?