caBIG Workflow
University of Chicago, USA; University of Manchester, UK


Agenda
- caBIG workflows: the "BIG" picture
- caBIG workflow infrastructure
  - Semantic service discovery
  - Composing the workflow
  - Invoking stateful and secure services
- Workflow execution service
- Discovering and executing caBIG workflows using the caGrid portal
- Examples of caBIG workflows
- Future directions

The caGrid ecosystem and the role of workflow
[Figure: the caGrid ecosystem: data, instruments and computation resources; virtualization, security, connectivity and discovery; composition, orchestration, reuse and community; the scientific workflow lifecycle (reuse, generate)]
- Workflow as consumer: easily reuse services for complex experiments.
- Workflow as contributor: "best practice" workflows are wrapped as services.

The caBIG Workflow System
[Figure: caGrid workflow lifecycle: discovery, composition, execution, reuse and community (reuse, generate)]
- Discovery: service discovery based on cancer research metadata.
- Composition: data-flow modeling flavor; the caGrid activity provides state management (WSRF), security (GSI) and implicit iteration to handle parallel execution.
- Execution: the Workflow Execution Service, with WSRF and GSI enforcement.
- Reuse and community: a "Facebook" for caGrid workflows; workflows in the caGrid Portal.
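Implicit iteration is what lets a data-flow workflow fan out without explicit loops. The sketch below illustrates the idea under stated assumptions: `implicit_iterate` is a hypothetical helper, not Taverna's API. It assumes that a service declared for single values receives lists on some ports and is therefore called once per combination of items, either as a cross product or, alternatively, as a dot product (items paired by position).

```python
from itertools import product
from typing import Any, Callable, Dict, List


def implicit_iterate(service: Callable[..., Any],
                     inputs: Dict[str, Any],
                     list_ports: List[str],
                     strategy: str = "cross") -> List[Any]:
    """Call `service` once per combination of items on the list-valued ports.

    Hypothetical helper: ports named in `list_ports` carry lists although the
    service expects a single value; every other port is passed through as-is.
    """
    lists = [inputs[p] for p in list_ports]
    if strategy == "cross":
        combos = product(*lists)  # every combination of items
    elif strategy == "dot":
        if len({len(items) for items in lists}) > 1:
            raise ValueError("dot product needs lists of equal length")
        combos = zip(*lists)  # items paired by position
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    results = []
    for combo in combos:
        call_args = dict(inputs)
        call_args.update(zip(list_ports, combo))  # one item per list port
        results.append(service(**call_args))
    return results
```

Each call is independent of the others, which is what allows an engine to run them in parallel. This sketch simply runs them in sequence.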

Lymphoma Prediction Workflow
Scientific value: use gene-expression patterns associated with two lymphoma types to predict the type of an unknown sample. The workflow connects a caGrid data service (caArray) with analytical services (PreProcess, SVM and KNN from GenePattern).
Major steps:
1. Query training data from experiments stored in caArray.
2. Preprocess, i.e., normalize, the microarray data.
3. Predict the lymphoma type using the SVM and KNN services.
Extension: the workflow was generalized into a cancer-type prediction routine that can be used on other caArray data sets.
*Figure from M. A. Shipp, Nature Medicine, 2002.
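Here is a toy, self-contained stand-in that makes the preprocessing and prediction steps concrete. It is not GenePattern's PreProcess or KNN implementation. Per-gene z-score standardization substitutes for the PreProcess step, and only the KNN branch is shown (the SVM branch is omitted). The data, labels (`type_A`, `type_B`) and function names are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Sample = List[float]  # expression values, one per gene


def fit_scaler(samples: List[Sample]) -> Tuple[List[float], List[float]]:
    """Per-gene mean and standard deviation over the training samples."""
    n = len(samples)
    columns = list(zip(*samples))
    means = [sum(col) / n for col in columns]
    # Fall back to 1.0 for constant genes to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def scale(sample: Sample, means: List[float], stds: List[float]) -> Sample:
    return [(x - m) / s for x, m, s in zip(sample, means, stds)]


def knn_predict(train: List[Sample], labels: List[str], query: Sample, k: int = 3) -> str:
    """Majority label among the k training samples nearest to `query` (Euclidean)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train, labels))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def predict_lymphoma_type(train: List[Sample], labels: List[str],
                          unknown: Sample, k: int = 3) -> str:
    # "Preprocess" step: standardize with statistics from the training data only.
    means, stds = fit_scaler(train)
    train_scaled = [scale(s, means, stds) for s in train]
    # "Predict" step: KNN on the standardized profiles.
    return knn_predict(train_scaled, labels, scale(unknown, means, stds), k)
```

In the real workflow, the training samples come from the caArray query step, and each step runs as a separate grid service.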

[Figure: Lymphoma Prediction Workflow: microarray from tumor tissue, microarray preprocessing, lymphoma prediction]

Lymphoma type prediction
Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)

caGrid Workflow Infrastructure
- Semantics-based service query
- Workflow composition

Configuring Taverna
A caGrid configuration is a set of caGrid services belonging to the same grid. Taverna provides two default caGrid configurations:
- NCI Production caGrid v1.3
- Training caGrid
Other "caGrids" can be defined through the preferences.

Semantic Service Discovery
Semantic search queries the Index Service for registered caGrid services that match various search criteria: service name, inputs, outputs, research center, class names, concept codes, etc.
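Conceptually, semantic search is a filter over service metadata. This is a minimal sketch, assuming the metadata has already been fetched from the Index Service into hypothetical `ServiceMetadata` records. The real caGrid metadata model is richer, and the service names and concept code in the usage below are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceMetadata:
    name: str
    research_center: str
    inputs: List[str] = field(default_factory=list)         # input class names
    outputs: List[str] = field(default_factory=list)        # output class names
    concept_codes: List[str] = field(default_factory=list)  # semantic annotations


def semantic_search(services: List[ServiceMetadata],
                    name: Optional[str] = None,
                    research_center: Optional[str] = None,
                    input_class: Optional[str] = None,
                    output_class: Optional[str] = None,
                    concept_code: Optional[str] = None) -> List[ServiceMetadata]:
    """Return the services matching every criterion that was supplied (AND semantics)."""
    def matches(s: ServiceMetadata) -> bool:
        if name is not None and name.lower() not in s.name.lower():
            return False  # case-insensitive substring match on the name
        if research_center is not None and research_center != s.research_center:
            return False
        if input_class is not None and input_class not in s.inputs:
            return False
        if output_class is not None and output_class not in s.outputs:
            return False
        if concept_code is not None and concept_code not in s.concept_codes:
            return False
        return True

    return [s for s in services if matches(s)]
```

For example, `semantic_search(services, output_class="Experiment")` keeps only the services that declare an `Experiment` output.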

Adding caGrid services directly
If the user knows the WSDL URL of a caGrid service, the service can be added directly.

caBIG services palette
Services found by semantic search or added directly appear in Taverna's Service Panel, ready to be dragged and dropped into caGrid workflows.

Stateful caGrid services
Taverna supports stateful caGrid services that implement the WSRF specification. Taverna detects whether a service is WSRF-compliant and, if so, adds a special input port, 'EndpointReference', to it. The EPR (endpoint reference) can be passed around the workflow as a normal parameter.
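The point of the EndpointReference port is that state lives in a resource on the server, and the EPR is simply a value that identifies that resource. Here is a minimal in-memory sketch under that assumption. `EndpointReference`, `input_ports` and `CounterService` are hypothetical illustrations, not the WSRF or Taverna APIs, and the URL is a placeholder.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EndpointReference:
    address: str       # URL of the service
    resource_key: str  # which stateful resource behind that URL


def input_ports(declared_inputs: List[str], is_wsrf: bool) -> List[str]:
    """Mimic the slide: WSRF-compliant services get an extra 'EndpointReference' port."""
    ports = list(declared_inputs)
    if is_wsrf:
        ports.append("EndpointReference")
    return ports


class CounterService:
    """Toy stateful service: each resource holds a running total."""

    def __init__(self, address: str) -> None:
        self.address = address
        self._totals: Dict[str, int] = {}
        self._ids = itertools.count(1)

    def create_resource(self) -> EndpointReference:
        key = f"res-{next(self._ids)}"
        self._totals[key] = 0
        return EndpointReference(self.address, key)

    def add(self, epr: EndpointReference, value: int) -> int:
        if epr.address != self.address or epr.resource_key not in self._totals:
            raise KeyError(f"unknown resource: {epr}")
        self._totals[epr.resource_key] += value
        return self._totals[epr.resource_key]
```

An EPR returned by `create_resource` is an ordinary value, so it can flow between processors like any other workflow parameter. Two EPRs for the same service keep separate state.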

Secure caGrid services
Taverna can invoke secure caGrid services that require the user to log in to caGrid. Taverna interacts with caGrid's GAARDS infrastructure to obtain the user's proxy:
1. Authenticate the user with the user's affiliated Authentication Service.
2. Obtain the user's proxy from the Dorian Service.
The default proxy lifetime is 12 hours.

Using secure caGrid services
This involves:
1. Configuring a secure caGrid service from Taverna.
2. Logging on to the selected caGrid to obtain a proxy certificate.
3. Saving and managing caGrid proxies, usernames and passwords.

Configuring secure services (1/2)
The Authentication Service and Dorian Service URLs are required to obtain the user's proxy. They can be configured:
- globally, for all services from the same caGrid (in the preferences), or
- individually, for a particular caGrid service (this overrides the configuration from the preferences).
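The override rule amounts to a two-level lookup. This sketch assumes hypothetical names (`SecurityConfig`, `resolve_security_config`) and uses placeholder URLs. How Taverna actually stores these preferences is not shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SecurityConfig:
    authentication_service_url: str
    dorian_service_url: str


def resolve_security_config(service_url: str,
                            grid_name: str,
                            grid_preferences: Dict[str, SecurityConfig],
                            service_overrides: Dict[str, SecurityConfig]) -> SecurityConfig:
    """Per-service configuration wins; otherwise fall back to the grid-wide preferences."""
    override = service_overrides.get(service_url)
    if override is not None:
        return override
    try:
        return grid_preferences[grid_name]
    except KeyError:
        raise LookupError(
            f"no security configuration for grid {grid_name!r} or service {service_url!r}"
        ) from None
```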

Configuring secure services (2/2)
- View the secure service's details.
- Configure the service's security properties.

Logging on to caGrid
The user is prompted for a caGrid username and password the first time any secure service is invoked from a workflow.

Credential management (1/2)
Taverna obtains the user's proxy from the Dorian Service using the user's caGrid username and password. Proxies are saved and managed by the Credential Manager, and the caGrid username and password can also be remembered.
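Combined with the prompt-on-first-use behavior and the 12-hour default proxy lifetime, credential management can be sketched as a per-grid cache. Everything here is hypothetical. `obtain_proxy` stands in for the Authentication Service plus Dorian exchange, `prompt` stands in for the login dialog, and the injectable `clock` exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, Tuple

DEFAULT_PROXY_LIFETIME_S = 12 * 60 * 60  # 12 hours, the default stated on the slides


class CredentialManager:
    """Caches one proxy per grid and asks for a password only when it must."""

    def __init__(self,
                 obtain_proxy: Callable[[str, str, str], object],
                 prompt: Callable[[str], Tuple[str, str]],
                 remember_password: bool = False,
                 lifetime_s: float = DEFAULT_PROXY_LIFETIME_S,
                 clock: Callable[[], float] = time.time) -> None:
        self._obtain_proxy = obtain_proxy  # (grid, username, password) -> proxy
        self._prompt = prompt              # grid -> (username, password)
        self._remember = remember_password
        self._lifetime = lifetime_s
        self._clock = clock
        self._proxies: Dict[str, Tuple[object, float]] = {}    # grid -> (proxy, expiry)
        self._passwords: Dict[str, Tuple[str, str]] = {}       # grid -> remembered login

    def get_proxy(self, grid: str) -> object:
        cached = self._proxies.get(grid)
        if cached is not None and self._clock() < cached[1]:
            return cached[0]  # still valid: no login needed
        creds = self._passwords.get(grid)
        if creds is None:
            creds = self._prompt(grid)  # first secure call, or nothing remembered
            if self._remember:
                self._passwords[grid] = creds
        proxy = self._obtain_proxy(grid, *creds)
        self._proxies[grid] = (proxy, self._clock() + self._lifetime)
        return proxy
```

With `remember_password=True`, an expired proxy is renewed silently. Otherwise, the user is prompted again after expiry.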

Workflow execution service
The Taverna Workflow Service wraps the Taverna execution engine into a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus and getOutput for user-submitted workflows.
[Figure: clients (Client API, Taverna Workbench, Workflow Portlet) call createResource, startWorkflow, getStatus and getOutput on the Workflow Service; the service holds stateful resources (resource properties) addressed by EPR; the Taverna engine invokes data services, analytical services, and caGrid and other services.]

Workflow execution service
The Taverna Workflow Service:
- provides stateful resources that execute the workflows;
- supports the caGrid security architecture (GSI security);
- allows programmatic submission of workflows.
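A client drives the four operations in order: create a resource, start the workflow, poll the status, and fetch the output. This sketch assumes hypothetical Python names that mirror those operations, string status values `Running`/`Done`/`Failed`, and an opaque string for the EPR. The real service is a WSRF grid service called through caGrid client stubs. `FakeWorkflowService` is only an in-memory stand-in for trying out the loop.

```python
import time
from typing import Callable, Dict, Protocol


class WorkflowService(Protocol):
    """Hypothetical Python mirror of the four operations named on the slide."""
    def create_resource(self, workflow_xml: str) -> str: ...  # returns an EPR-like handle
    def start_workflow(self, epr: str, inputs: Dict[str, str]) -> None: ...
    def get_status(self, epr: str) -> str: ...
    def get_output(self, epr: str) -> Dict[str, str]: ...


def run_workflow(service: WorkflowService, workflow_xml: str, inputs: Dict[str, str],
                 poll_interval_s: float = 5.0, timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    epr = service.create_resource(workflow_xml)  # one stateful resource per run
    service.start_workflow(epr, inputs)
    waited = 0.0
    while True:
        status = service.get_status(epr)
        if status == "Done":
            return service.get_output(epr)
        if status == "Failed":
            raise RuntimeError(f"workflow {epr} failed")
        if waited >= timeout_s:
            raise TimeoutError(f"workflow {epr} still {status} after {waited}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


class FakeWorkflowService:
    """In-memory stand-in that reports 'Done' after a fixed number of polls."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done
        self._inputs: Dict[str, Dict[str, str]] = {}

    def create_resource(self, workflow_xml: str) -> str:
        return "epr-1"

    def start_workflow(self, epr: str, inputs: Dict[str, str]) -> None:
        self._inputs[epr] = dict(inputs)

    def get_status(self, epr: str) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "Running"
        return "Done"

    def get_output(self, epr: str) -> Dict[str, str]:
        return {"echo": self._inputs[epr]["experimentId"]}
```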

Access Taverna workflows via the caGrid portal (view 1)
The Taverna Workflow Portlet is deployed in the caGrid Portal on the training grid.
URL:
The portlet currently lists a few workflows, with their descriptions, which can be browsed from the above URL. Users can select a workflow they are interested in running.

Access Taverna workflows via the caGrid portal (view 2)
URL:
Based on the number of input ports in the workflow, the portlet prompts the user to enter the input values in text boxes. For example, the Lymphoma workflow takes only one input: an Experiment ID that identifies the caArray experiment used for data collection. Click Submit after entering the data.

Access Taverna workflows via the caGrid portal (views 3, 4 and 5)
URL:
The portlet stores user-submitted workflows in the current portal session, and users can view all active and completed workflows in the session. Clicking the Output button shows the output of a workflow. The portlet provides workflow-specific view resolvers to render the outputs; for example, the Lymphoma workflow currently displays its output in an HTML table.
Acknowledgement: Manav Kher, Joshua Phillips (SemanticBits)
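Workflow-specific view resolvers can be modeled as a registry keyed by workflow name, with a generic fallback. This sketch assumes, hypothetically, that the Lymphoma output is a list of (sample, predicted type) pairs. The decorator and function names are illustrative and are not taken from the portlet's code.

```python
import html
from typing import Callable, Dict, List, Tuple

ViewResolver = Callable[[object], str]
_RESOLVERS: Dict[str, ViewResolver] = {}


def register_view(workflow_name: str) -> Callable[[ViewResolver], ViewResolver]:
    """Register a renderer for one workflow's output."""
    def decorator(fn: ViewResolver) -> ViewResolver:
        _RESOLVERS[workflow_name] = fn
        return fn
    return decorator


def default_view(output: object) -> str:
    # Generic fallback: show the raw output, HTML-escaped.
    return f"<pre>{html.escape(str(output))}</pre>"


@register_view("Lymphoma")
def lymphoma_table(output: List[Tuple[str, str]]) -> str:
    rows = "".join(
        f"<tr><td>{html.escape(sample)}</td><td>{html.escape(label)}</td></tr>"
        for sample, label in output
    )
    return "<table><tr><th>Sample</th><th>Predicted type</th></tr>" + rows + "</table>"


def render_output(workflow_name: str, output: object) -> str:
    return _RESOLVERS.get(workflow_name, default_view)(output)
```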

Workflow execution service plug-in
- Submit the workflow to an execution service.
- Retrieve the execution result asynchronously.

Examples of caBIG workflows: caDSR
Scientific value: find all the UML packages related to a given context ('caCore'). This is not a real scientific experiment, but it is simple and important in caGrid.
Steps:
1. Query the Project object.
2. Transform the data.
3. Query the Package objects and get the result.
[Figure legend: workflow input; caGrid services; "shim" services; workflow output]
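"Shim" services perform the small data transformations needed between two caGrid services. Here is a sketch of the transformation step. It assumes a hypothetical, simplified XML shape for the Project results and uses a plain dictionary in place of a real CQL query for Packages.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def projects_to_package_queries(project_result_xml: str) -> List[Dict[str, str]]:
    """Shim step: build one Package query per Project found in the previous result.

    Assumes (hypothetically) results shaped like
    <Results><Project id="..." shortName="..."/>...</Results>.
    """
    root = ET.fromstring(project_result_xml)
    queries = []
    for project in root.iter("Project"):
        project_id = project.get("id")
        if project_id is None:
            continue  # skip malformed entries rather than querying with no id
        queries.append({"target": "Package", "project_id": project_id})
    return queries
```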

Protein sequence information query
Scientific value: query protein sequence information from three caGrid data services (caBIO, CPAS and GridPIR) in order to analyze a protein sequence from different data sources.
Steps:
1. Query CPAS and get the id, name and value of the sequence.
2. Query caBIO and GridPIR using the id or name obtained from CPAS.
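The second step depends only on the CPAS result, so the caBIO and GridPIR queries can run concurrently. This sketch injects the three service calls as hypothetical callables (`query_cpas`, `query_cabio`, `query_gridpir`). Real calls would go through the caGrid data service clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Record = Dict[str, str]


def protein_sequence_workflow(search_term: str,
                              query_cpas: Callable[[str], Record],
                              query_cabio: Callable[[str, str], Record],
                              query_gridpir: Callable[[str, str], Record]) -> Dict[str, Record]:
    # Step 1: CPAS returns the id, name and value of the sequence.
    seq = query_cpas(search_term)
    # Step 2: both follow-up queries need only the CPAS result, so run them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cabio = pool.submit(query_cabio, seq["id"], seq["name"])
        gridpir = pool.submit(query_gridpir, seq["id"], seq["name"])
        return {"CPAS": seq, "caBIO": cabio.result(), "GridPIR": gridpir.result()}
```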

Microarray clustering*
Scientific value: a common routine to group genes or experiments into clusters with similar profiles, in order to identify functional groups of genes.
Steps:
1. Query and retrieve the microarray data of interest from a caArrayScrub data service at Columbia University.
2. Preprocess (normalize) the microarray data using the GenePattern analytical service at the Broad Institute of MIT and Harvard.
3. Run hierarchical clustering using the geWorkbench analytical service at Columbia University.
[Figure legend: workflow input/output; caGrid services; "shim" services; others]
*Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster. Orchestrating caGrid Services in Taverna. ICWS 2008.
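For readers unfamiliar with the clustering step, here is a small pure-Python agglomerative clustering of gene profiles (average linkage, Euclidean distance) after per-gene z-score normalization. It is a stand-in for illustration only and is not geWorkbench's or GenePattern's implementation, whose distance metrics and linkage options may differ.

```python
import math
from typing import List

Profile = List[float]


def normalize_rows(rows: List[Profile]) -> List[Profile]:
    """Z-score each gene's profile so that clustering compares shapes, not magnitudes."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row)) or 1.0
        out.append([(x - mean) / sd for x in row])
    return out


def average_linkage(a: List[int], b: List[int], data: List[Profile]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    return sum(math.dist(data[i], data[j]) for i in a for j in b) / (len(a) * len(b))


def hierarchical_clusters(data: List[Profile], k: int) -> List[List[int]]:
    """Merge the two closest clusters until only k remain; return row indices."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], data)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)
```

For example, genes with profiles `[1, 2, 3, 4]` and `[2, 4, 6, 8]` become identical after normalization, so they merge before either joins a decreasing profile.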

Execution trace
[Figure: execution trace; execution result as XML, 1936 gene expressions]

caGrid workflows in myExperiment
caGrid workflows covered:
- Data service workflows: caDSR query; protein sequence query
- Data + analytical service workflows: microarray clustering; lymphoma type classification
caGrid workflows are uploaded to myExperiment and are accessible from:

Future Directions
- More guidance in workflow modeling: leverage caDSR, EVS and the workflows at myExperiment.
- A friendlier user interface: a CQL builder for caGrid data services; more shim services for data transformation.
- More features: integration with caGrid Transfer to access data; browsing and executing workflows from the caGrid portal; enhanced security support.
- More workflows of real scientific value.

More information
caGrid workflow:
Our team:
- University of Manchester, UK: Carole Goble, Stian Soiland-Reyes, Alexandra Nenadic
- University of Chicago, USA: Wei Tan, Dinanath Sulakhe, Ravi Madduri